
opensnowcat

EXPERIMENTAL

This component is experimental and therefore subject to change or removal outside of major version releases.

Processes OpenSnowcat/Snowplow enriched TSV events: converts enriched TSV to flattened JSON, filters events, and transforms sensitive fields for privacy compliance.

Introduced in version 1.12.0.

# Common config fields, showing default values
label: ""
opensnowcat:
  output_format: tsv

This processor provides comprehensive event processing capabilities:

Features

Format Conversion

  • Convert enriched TSV to flattened JSON with automatic context extraction
  • Maintain TSV format for OpenSnowcat/Snowplow downstream compatibility

Event Filtering

  • Drop events based on field values (IP addresses, user agents, etc.)
  • Filter by schema property paths in contexts, derived_contexts, and unstruct_event
  • OR logic: an event is dropped if ANY filter matches (see the sketch below)
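
As a minimal sketch, a configuration that drops events from an internal IP range or from bot user agents could look like this. The contains key and exact criterion shape are assumptions based on the substring matching described above; the matched values are illustrative:

pipeline:
  processors:
    - opensnowcat:
        output_format: json
        filters:
          drop:
            user_ipaddress:
              contains: "10.0."   # assumed criterion shape; drops events whose IP contains this substring
            useragent:
              contains: "bot"     # either match is enough to drop the event (OR logic)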

Field Transformations

Transform sensitive fields for PII compliance and privacy:

  • hash: Hash field values using configurable algorithms (MD5, SHA-1, SHA-256, SHA-384, SHA-512) with salt
  • redact: Replace field values with a fixed string (e.g., "[REDACTED]")
  • anonymize_ip: Mask IP addresses while preserving network information (supports both IPv4 and IPv6)
    • IPv4: Mask last N octets using anon_octets parameter
    • IPv6: Mask last N segments using anon_segments parameter

All transformations support both direct TSV columns and schema property paths, as in the sketch below.
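
For instance, IP anonymization on the user_ipaddress column might be configured as in this sketch (the placement under filters.transform.fields follows the field reference below; the octet and segment counts are illustrative):

opensnowcat:
  filters:
    transform:
      fields:
        user_ipaddress:
          strategy: anonymize_ip
          anon_octets: 2      # mask the last two octets of IPv4 addresses
          anon_segments: 4    # mask the last four segments of IPv6 addresses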

Examples

Converts OpenSnowcat/Snowplow enriched TSV events to flattened JSON format, extracting all contexts, derived contexts, and unstruct events into top-level fields.

pipeline:
  processors:
    - opensnowcat:
        output_format: json

Fields

output_format

Output format for processed events.

Type: string
Default: "tsv"

Options:

  • enriched_json: Convert to database-optimized nested JSON with key-based schema structure. Each schema becomes a key (vendor_name) containing version and data array. Compatible with BigQuery, Snowflake, Databricks, Redshift, PostgreSQL, ClickHouse, and Iceberg tables. Enables simple queries without UNNEST and schema evolution without table mutations.
  • json: Convert enriched TSV to flattened JSON with contexts, derived_contexts, and unstruct_event automatically flattened into top-level objects.
  • tsv: Maintain enriched TSV format without conversion.
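
For example, selecting the warehouse-oriented format is just a matter of changing output_format:

pipeline:
  processors:
    - opensnowcat:
        output_format: enriched_json

Each attached schema then appears under its own vendor_name key holding the version and a data array, as summarized above.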

filters

Filter and transformation configurations

Type: object

filters.drop

Map of field names to filter criteria. Events matching ANY criteria will be dropped (OR logic). Supports both regular TSV columns (e.g., user_ipaddress, useragent) and schema property paths (e.g., com.snowplowanalytics.snowplow.ua_parser_context.useragentFamily). Each filter uses 'contains' for substring matching.

Type: object
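
As a sketch reusing the assumed contains shape from the earlier example, a drop rule on a schema property path could look like this (the matched value is illustrative):

opensnowcat:
  filters:
    drop:
      com.snowplowanalytics.snowplow.ua_parser_context.useragentFamily:
        contains: "HeadlessChrome"   # drops events whose parsed user-agent family contains this substring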

filters.transform

Field transformation configuration for anonymization, hashing, and redaction

Type: object

filters.transform.salt

Global default salt for hashing operations. Can be overridden per field.

Type: string

filters.transform.hash_algo

Global default hash algorithm. Can be overridden per field.

Type: string
Default: "SHA-256"
Options: MD5, SHA-1, SHA-256, SHA-384, SHA-512.

filters.transform.fields

Map of field names to transformation configurations. Each field configuration accepts the following keys (see the sketch after this list):

  • strategy (required): Transformation type - "hash", "redact", or "anonymize_ip"
  • hash_algo (optional): Algorithm for hash strategy - "MD5", "SHA-1", "SHA-256", "SHA-384", "SHA-512" (overrides global default)
  • salt (optional): Salt for hash strategy (overrides global default)
  • redact_value (optional): Replacement value for redact strategy (default: "[REDACTED]")
  • anon_octets (optional): Number of IPv4 octets to mask for anonymize_ip strategy (default: 0)
  • anon_segments (optional): Number of IPv6 segments to mask for anonymize_ip strategy (default: 0)

Supports both TSV columns (e.g., user_id, user_ipaddress) and schema property paths (e.g., com.vendor.schema.field).

Type: object
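
A sketch combining the global defaults with per-field overrides (the field names and values are illustrative, and com.vendor.schema.field stands in for a real schema property path):

opensnowcat:
  filters:
    transform:
      salt: "global-salt"            # global default salt for hashing
      hash_algo: "SHA-256"           # global default algorithm
      fields:
        user_id:
          strategy: hash             # hashed with the global salt and SHA-256
        user_ipaddress:
          strategy: hash
          hash_algo: "SHA-512"       # overrides the global default
          salt: "per-field-salt"
        useragent:
          strategy: redact
          redact_value: "[REMOVED]"  # default replacement is "[REDACTED]"
        com.vendor.schema.field:
          strategy: redact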

schema_discovery

Schema discovery configuration

Type: object

schema_discovery.enabled

Enable schema discovery feature

Type: bool
Default: false

schema_discovery.flush_interval

Interval between schema discovery flushes

Type: string
Default: "5m"

schema_discovery.endpoint

HTTP endpoint to send schema discovery data

Type: string
Default: "https://api.snowcatcloud.com/internal/schema-discovery"

schema_discovery.template

Template for the schema discovery payload. Use the {{SCHEMAS}} variable for the schema list

Type: string
Default: "{\"schemas\": {{SCHEMAS}}}"