Version: edge

Codecs

TL;DR: Codecs are used to describe how to decode data from the wire and encode it back to wire format.

Tremor connects to external systems using connectors. Connectors use codecs to transform the data Tremor receives from connected system participants into a structured value that forms the payload of each and every Tremor event.

Codecs are the means of turning the (mostly binary) data from the wire (e.g. from a TCP connection) into structured values for Tremor events and back into binary wire format. Each connector can be configured with a codec.

Usage

If you expect JSON data from a UDP connection, you need to configure the json codec.

Example:

define connector udp_example from udp_server
with
  codec = "json",
  config = {
    "url": "localhost:12345"
  }
end;

This udp_example connector is configured to expect exactly one JSON document for each UDP datagram, with no leading or trailing bytes.
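To illustrate what "one JSON document per datagram" means in practice, here is a minimal Python sketch of the two directions a codec covers. It is not Tremor's implementation; the function names `decode_datagram` and `encode_event` are hypothetical and only model the behaviour described above, using Python's standard `json` module.

```python
import json


def decode_datagram(payload: bytes):
    """Decode a datagram that must contain exactly one JSON document.

    json.loads raises json.JSONDecodeError (a ValueError subclass) if
    anything other than whitespace follows the first document.
    """
    return json.loads(payload.decode("utf-8"))


def encode_event(value) -> bytes:
    """Encode a structured value back into compact JSON wire bytes."""
    return json.dumps(value, separators=(",", ":")).encode("utf-8")


# Wire bytes -> structured value -> wire bytes
event = decode_datagram(b'{"snot": "badger", "count": 3}')
wire = encode_event(event)
```

A datagram carrying two concatenated documents, such as `{"a": 1}{"b": 2}`, is rejected by this sketch, which is why delimited streams need the preprocessing step described in the next section.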

Codecs and Preprocessors

If you expect line-delimited JSON instead, with 1 document per line, you need to add a preprocessor that separates the wire data by newline and feeds each line to the codec.

Preprocessors perform various kinds of preprocessing on the wire data, e.g. splitting data by some separator or decompressing data, and multiple can be configured to operate in a chain. The result of this chain, one or multiple chunks of binary data, is passed on to the codec.

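The following example does this with the TCP connector rather than UDP, since TCP delivers a continuous byte stream that has to be split into individual lines.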

Example:

define connector line_delimited_json_via_tcp from tcp_server
with
  preprocessors = [
    {
      "name": "separate",
      "config": {
        "separator": "\n"
      }
    }
  ],
  codec = "json",
  config = {
    "url": "localhost:65535"
  }
end;

This line_delimited_json_via_tcp connector is now configured to expect one JSON document per line from each accepted TCP connection, just by adding the separate preprocessor.
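The interplay between a separating preprocessor and the codec can be sketched in Python as follows. This is an illustration under assumptions, not Tremor's code: the `Separate` class and `decode_stream` function are hypothetical, and the sketch assumes that a preprocessor buffers incomplete trailing data until the next separator arrives.

```python
import json
from typing import Iterable, List


class Separate:
    """Hypothetical stand-in for a separator-based preprocessor."""

    def __init__(self, separator: bytes = b"\n"):
        self.separator = separator
        self.buffer = b""

    def process(self, data: bytes) -> List[bytes]:
        # Append the new data, emit every complete chunk, and keep the
        # unterminated remainder buffered for the next call.
        self.buffer += data
        *complete, self.buffer = self.buffer.split(self.separator)
        return complete


def decode_stream(reads: Iterable[bytes]) -> list:
    """Feed raw reads through the preprocessor, then decode each line."""
    pre = Separate()
    events = []
    for data in reads:
        for line in pre.process(data):
            if line:  # skip empty lines in this sketch
                events.append(json.loads(line))
    return events


# Two TCP reads; the second JSON document is split across them.
reads = [b'{"a": 1}\n{"b"', b': 2}\n{"c": 3}\n']
events = decode_stream(reads)
```

Note that the codec never sees a partial document: the boundaries of TCP reads are arbitrary, so it is the preprocessor's job to restore the line boundaries before decoding.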

Codecs and Postprocessors

If we want to send out line-delimited JSON where each JSON document is base64-encoded, we need to use postprocessors. Postprocessors perform some action on the binary data a codec produces. For example, they can split or join the data, compress it, or prepend a length prefix.

Example:

define connector my_tcp_client from tcp_client
with
  codec = "json",
  postprocessors = [
    "base64",
    "separate"
  ],
  config = {
    "url": "localhost:9200"
  }
end;

This my_tcp_client connector is configured to use two postprocessors in a chain. First, each event is encoded using the json codec. Then the encoded binary data is base64-encoded by the base64 postprocessor. Finally, each resulting chunk of base64 data is separated from the next by a line delimiter that the separate postprocessor appends.
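The order of operations in this chain can be modelled with a short Python sketch. The function names are hypothetical and the sketch only mirrors the sequence described above (codec first, then each postprocessor in the configured order); it assumes the separator is a single newline byte.

```python
import base64
import json


def json_codec_encode(value) -> bytes:
    """Codec step: structured value -> JSON bytes."""
    return json.dumps(value, separators=(",", ":")).encode("utf-8")


def base64_post(data: bytes) -> bytes:
    """First postprocessor: base64-encode the codec output."""
    return base64.b64encode(data)


def separate_post(data: bytes, separator: bytes = b"\n") -> bytes:
    """Second postprocessor: terminate each chunk with the separator."""
    return data + separator


def to_wire(value) -> bytes:
    """Apply the codec, then the postprocessors in their listed order."""
    data = json_codec_encode(value)
    for post in (base64_post, separate_post):
        data = post(data)
    return data


wire = to_wire({"a": 1})
```

The order matters: if the two postprocessors were swapped, the newline would be base64-encoded along with the document and the receiver would no longer see one chunk per line.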

Codecs share similar concepts with extractors, but differ in their application. Codecs are applied to external data as it is ingested by or egressed from a running Tremor process. Extractors, on the other hand, are used in scripts to extract structured data from values, e.g. strings, that are already part of a Tremor event.

Data Format

Tremor's internal data representation is JSON-like. The supported value types are:

  • String (UTF-8 encoded)
  • Numeric (float, integer)
  • Boolean
  • Null
  • Array
  • Record (string keys)
  • Binary (raw bytes)
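
As a rough illustration of this value model, the following Python sketch checks whether a Python object fits the listed types. The mapping to Python types (`str`, `int`/`float`, `bool`, `None`, `list`, `dict`, `bytes`) and the function name `is_tremor_like_value` are assumptions made for this example, not part of Tremor.

```python
def is_tremor_like_value(value) -> bool:
    """Return True if value only uses the JSON-like types listed above."""
    # Scalars: null, string, boolean, numeric, and binary.
    if value is None or isinstance(value, (str, bool, int, float, bytes)):
        return True
    # Arrays may hold any supported value, including nested ones.
    if isinstance(value, list):
        return all(is_tremor_like_value(item) for item in value)
    # Records require string keys and supported values.
    if isinstance(value, dict):
        return all(
            isinstance(key, str) and is_tremor_like_value(item)
            for key, item in value.items()
        )
    return False


ok = is_tremor_like_value({"name": "snot", "tags": [1, 2.5, True, None, b"\x00"]})
bad = is_tremor_like_value({1: "integer keys are not allowed in records"})
```

The binary type is the main difference from plain JSON: a value containing raw bytes is valid in this model but cannot be serialized by a JSON codec without first being converted, for example to a base64 string.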