parquet_encode
This component is experimental and therefore subject to change or removal outside of major version releases.
Encodes Parquet files from a batch of structured messages.
Introduced in version 1.0.0.
# Common config fields, showing default values
label: ""
parquet_encode:
  schema: [] # No default (required)
  default_compression: uncompressed

# All config fields, showing default values
label: ""
parquet_encode:
  schema: [] # No default (required)
  default_compression: uncompressed
  default_encoding: DELTA_LENGTH_BYTE_ARRAY
This processor uses https://github.com/parquet-go/parquet-go, which is itself experimental. Changes could therefore be made to how this processor functions outside of major version releases.
Examples
- Writing Parquet Files to AWS S3
In this example we use the batching mechanism of an aws_s3 output to collect a batch of messages in memory. The parquet_encode processor then converts the batch into a Parquet file, which the output uploads.
output:
  aws_s3:
    bucket: TODO
    path: 'stuff/${! timestamp_unix() }-${! uuid_v4() }.parquet'
    batching:
      count: 1000
      period: 10s
      processors:
        - parquet_encode:
            schema:
              - name: id
                type: INT64
              - name: weight
                type: DOUBLE
              - name: content
                type: BYTE_ARRAY
            default_compression: zstd
Fields
schema
Parquet schema.
Type: array
schema[].name
The name of the column.
Type: string
schema[].type
The type of the column. Only applicable for leaf columns with no child fields. Some logical types, such as UTF8, can be specified here.
Type: string
Options: BOOLEAN, INT32, INT64, FLOAT, DOUBLE, BYTE_ARRAY, UTF8.
schema[].repeated
Whether the field is repeated.
Type: bool
Default: false
schema[].optional
Whether the field is optional.
Type: bool
Default: false
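As a rough sketch of how repeated and optional might be combined, the fragment below declares one required column, one optional column, and one repeated column. The column names (id, nickname, tags) are hypothetical. It assumes that an optional column tolerates a missing value and that a repeated column takes an array of values in the structured message.

```yaml
parquet_encode:
  schema:
    # Required column: neither optional nor repeated (both default to false).
    - name: id
      type: INT64
    # Optional column (hypothetical name): assumed to accept a missing value.
    - name: nickname
      type: UTF8
      optional: true
    # Repeated column (hypothetical name): assumed to take an array of strings.
    - name: tags
      type: UTF8
      repeated: true
```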
schema[].fields
A list of child fields.
Type: array
# Examples
fields:
  - name: foo
    type: INT64
  - name: bar
    type: BYTE_ARRAY
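To show where a list of child fields sits within a full schema, here is a minimal sketch of a nested column. The names (id, location, lat, lon) are hypothetical. Because type only applies to leaf columns, the parent column is assumed to omit type and carry only fields.

```yaml
parquet_encode:
  schema:
    # Leaf column with a primitive type.
    - name: id
      type: INT64
    # Parent column (hypothetical name): no type, only child fields.
    - name: location
      fields:
        - name: lat
          type: DOUBLE
        - name: lon
          type: DOUBLE
```

Under these assumptions, a message such as {"id": 1, "location": {"lat": 51.5, "lon": -0.1}} would match the schema.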
default_compression
The default compression type to use for fields.
Type: string
Default: "uncompressed"
Options: uncompressed, snappy, gzip, brotli, zstd, lz4raw.
default_encoding
The default encoding type to use for fields. A custom default encoding is only necessary when consuming data with libraries that do not support DELTA_LENGTH_BYTE_ARRAY, and is therefore best left unset where possible.
Type: string
Default: "DELTA_LENGTH_BYTE_ARRAY"
Requires version 1.0.0 or newer
Options: DELTA_LENGTH_BYTE_ARRAY, PLAIN.
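If a downstream reader cannot handle DELTA_LENGTH_BYTE_ARRAY, a configuration along these lines might be used to fall back to PLAIN while still compressing the output. This is a sketch only: the column names are hypothetical and the choice of snappy is arbitrary.

```yaml
parquet_encode:
  schema:
    - name: id        # hypothetical column
      type: INT64
    - name: content   # hypothetical column
      type: BYTE_ARRAY
  # Any of the listed compression options could be used here.
  default_compression: snappy
  # Override the default only when the consuming library requires it.
  default_encoding: PLAIN
```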