
azure_table_storage

BETA

This component is mostly stable but breaking changes could still be made outside of major version releases if a fundamental problem with the component is found.

Stores messages in an Azure Table Storage table.

Introduced in version 1.0.0.

# Common config fields, showing default values
output:
  label: ""
  azure_table_storage:
    storage_account: ""
    storage_access_key: ""
    storage_connection_string: ""
    storage_sas_token: ""
    table_name: ${! metadata("kafka_topic") } # No default (required)
    partition_key: ""
    row_key: ""
    properties: {}
    max_in_flight: 64
    batching:
      count: 0
      byte_size: 0
      period: ""
      jitter: 0
      check: ""

Only one authentication method is required: either storage_connection_string, or storage_account together with storage_access_key. If both are set, storage_connection_string takes priority.
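As a rough sketch of the two authentication styles described above, the fragment below sets the account and key, with the connection string shown as a commented-out alternative. The account name, table name, and environment variable names are placeholders chosen for illustration, not values from this documentation.

```yaml
output:
  azure_table_storage:
    # Option 1: account name plus access key (both values are hypothetical).
    storage_account: examplestorageaccount
    storage_access_key: ${AZURE_STORAGE_ACCESS_KEY}
    # Option 2: a connection string instead. If set, it takes priority
    # over the two fields above.
    # storage_connection_string: ${AZURE_STORAGE_CONNECTION_STRING}
    table_name: events
    partition_key: ${! json("date") }
    row_key: ${! uuid_v4() }
```

Reading the secret from an environment variable keeps it out of the config file itself.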

To set table_name, partition_key and row_key you can use function interpolations, which are evaluated per message of a batch.

If properties is not set in the config, all the JSON fields of the message are marshalled and stored in the table, which will be created if it does not exist.

Object and array fields are marshalled as strings. For example:

The JSON message:

{
  "foo": 55,
  "bar": {
    "baz": "a",
    "bez": "b"
  },
  "diz": ["a", "b"]
}

Will store the following properties in the table:

foo: '55'
bar: '{ "baz": "a", "bez": "b" }'
diz: '["a", "b"]'

It's also possible to use function interpolations to get or transform the property values, e.g.:

properties:
  device: '${! json("device") }'
  timestamp: '${! json("timestamp") }'
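To show how these pieces might fit together, here is a minimal sketch of a complete output that derives the keys and an explicit set of properties from each message. It assumes messages carry `date`, `device` and `timestamp` fields (taken from the examples on this page); the table name and the environment variable are hypothetical.

```yaml
output:
  azure_table_storage:
    # Hypothetical environment variable holding the connection string.
    storage_connection_string: ${AZURE_STORAGE_CONNECTION_STRING}
    table_name: telemetry                           # hypothetical table name
    partition_key: ${! json("date") }               # evaluated per message
    row_key: ${! json("device") }-${! uuid_v4() }   # device name plus a random suffix
    properties:
      # Only these two properties are configured explicitly.
      device: '${! json("device") }'
      timestamp: '${! json("timestamp") }'
```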

Performance

This output benefits from sending multiple messages in flight in parallel for improved performance. You can tune the max number of in flight messages (or message batches) with the field max_in_flight.

This output also benefits from sending messages as a batch for improved performance. Batches can be formed at both the input and output level. You can find out more in the batching documentation.
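A minimal sketch of how the two tuning knobs above could be combined is shown below. The numbers are illustrative starting points rather than recommendations, and the table name and environment variable are hypothetical.

```yaml
output:
  azure_table_storage:
    storage_connection_string: ${AZURE_STORAGE_CONNECTION_STRING}
    table_name: events
    partition_key: ${! json("date") }
    row_key: ${! uuid_v4() }
    max_in_flight: 16   # up to 16 batches in flight in parallel
    batching:
      count: 50         # flush once 50 messages have accumulated
      period: 1s        # flush an incomplete batch after 1s
```

Setting a period alongside count means a slow trickle of messages is still flushed instead of waiting for the count to be reached.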

Fields

storage_account

The storage account to access. This field is ignored if storage_connection_string is set.

Type: string
Default: ""

storage_access_key

The storage account access key. This field is ignored if storage_connection_string is set.

Type: string
Default: ""

storage_connection_string

A storage account connection string. This field is required if storage_account and storage_access_key / storage_sas_token are not set.

Type: string
Default: ""

storage_sas_token

The storage account SAS token. This field is ignored if storage_connection_string or storage_access_key is set.

Type: string
Default: ""
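As a hedged illustration of SAS-based authentication, the fragment below pairs the token with storage_account, on the assumption (suggested by the field descriptions above, but not stated outright) that the account name is still needed. All values are placeholders.

```yaml
output:
  azure_table_storage:
    storage_account: examplestorageaccount        # hypothetical account name
    storage_sas_token: ${AZURE_STORAGE_SAS_TOKEN} # hypothetical environment variable
    # storage_access_key and storage_connection_string are left unset,
    # since either one would cause the SAS token to be ignored.
    table_name: events
```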

table_name

The table to store messages into. This field supports interpolation functions.

Type: string

# Examples

table_name: ${! metadata("kafka_topic") }

table_name: ${! json("table") }

partition_key

The partition key. This field supports interpolation functions.

Type: string
Default: ""

# Examples

partition_key: ${! json("date") }

row_key

The row key. This field supports interpolation functions.

Type: string
Default: ""

# Examples

row_key: ${! json("device")}-${!uuid_v4() }

properties

A map of properties to store into the table. This field supports interpolation functions.

Type: object
Default: {}

transaction_type

Type of transaction operation. This field supports interpolation functions.

Type: string
Default: "INSERT"
Options: INSERT, INSERT_MERGE, INSERT_REPLACE, UPDATE_MERGE, UPDATE_REPLACE, DELETE.

# Examples

transaction_type: ${! json("operation") }

transaction_type: ${! metadata("operation") }

transaction_type: INSERT
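Because this field is interpolated, the operation can be chosen per message. The sketch below assumes an upstream component sets a metadata key named `operation` (a hypothetical name) and uses the Bloblang `or` method to fall back to INSERT when that key is missing.

```yaml
output:
  azure_table_storage:
    storage_connection_string: ${AZURE_STORAGE_CONNECTION_STRING}
    table_name: events
    partition_key: ${! json("date") }
    row_key: ${! json("device") }
    # Use the "operation" metadata value when present, otherwise INSERT.
    transaction_type: ${! metadata("operation").or("INSERT") }
```

The metadata value would need to be one of the options listed above for the write to be accepted.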

max_in_flight

The maximum number of parallel message batches to have in flight at any given time.

Type: int
Default: 64

timeout

The maximum period to wait on an upload before abandoning it and reattempting.

Type: string
Default: "5s"

batching

Allows you to configure a batching policy.

Type: object

# Examples

batching:
  byte_size: 5000
  count: 0
  period: 1s

batching:
  count: 10
  period: 1s

batching:
  check: this.contains("END BATCH")
  count: 0
  period: 1m

batching:
  count: 10
  jitter: 0.1
  period: 10s

batching.count

The number of messages at which the batch should be flushed. Setting this to 0 disables count-based batching.

Type: int
Default: 0

batching.byte_size

The number of bytes at which the batch should be flushed. Setting this to 0 disables size-based batching.

Type: int
Default: 0

batching.period

A period in which an incomplete batch should be flushed regardless of its size.

Type: string
Default: ""

# Examples

period: 1s

period: 1m

period: 500ms

batching.jitter

A non-negative factor that adds a random delay to batch flush intervals, where the delay is chosen uniformly at random between 0 and jitter * period. For example, with period: 100ms and jitter: 0.1, each flush is delayed by a random duration between 0 and 10ms.

Type: float
Default: 0

# Examples

jitter: 0.01

jitter: 0.1

jitter: 1

batching.check

A Bloblang query that should return a boolean value indicating whether a message should end a batch.

Type: string
Default: ""

# Examples

check: this.type == "end_of_transaction"

batching.processors

A list of processors to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op.

Type: array

# Examples

processors:
  - archive:
      format: concatenate

processors:
  - archive:
      format: lines

processors:
  - archive:
      format: json_array