opensnowcat

EXPERIMENTAL

This component is experimental and therefore subject to change or removal outside of major version releases.

Processes OpenSnowcat/Snowplow enriched TSV events: converts enriched TSV to flattened JSON, filters events, and transforms sensitive fields for privacy compliance.

Introduced in version 1.12.0.

# Common config fields, showing default values
label: ""
opensnowcat:
  output_format: tsv

This processor supports format conversion, event filtering, and field transformations:

Features

Format Conversion

  • Convert enriched TSV to flattened JSON with automatic context extraction
  • Maintain TSV format for OpenSnowcat/Snowplow downstream compatibility

Event Filtering

  • Drop events based on field values (IP addresses, user agents, etc.)
  • Filter by schema property paths in contexts, derived_contexts, and unstruct_event
  • OR logic: event is dropped if ANY filter matches

Field Transformations

Transform sensitive fields for PII compliance and privacy:

  • hash: Hash field values using configurable algorithms (MD5, SHA-1, SHA-256, SHA-384, SHA-512) with salt
  • redact: Replace field values with a fixed string (e.g., "[REDACTED]")
  • anonymize_ip: Mask IP addresses while preserving network information (supports both IPv4 and IPv6)
    • IPv4: Mask last N octets using anon_octets parameter
    • IPv6: Mask last N segments using anon_segments parameter

All transformations support both direct TSV columns and schema property paths.

Examples

Converts OpenSnowcat/Snowplow enriched TSV events to flattened JSON format, extracting all contexts, derived contexts, and unstruct events into top-level fields.

pipeline:
  processors:
    - opensnowcat:
        output_format: json

Fields

output_format

Output format for processed events.

Type: string
Default: "tsv"

Option  Summary
json    Convert enriched TSV to flattened JSON with contexts, derived_contexts, and unstruct_event automatically flattened into top-level objects.
tsv     Maintain enriched TSV format without conversion.

filters

Filter and transformation configurations.

Type: object

filters.drop

Map of field names to filter criteria. Events matching ANY criteria will be dropped (OR logic). Supports both regular TSV columns (e.g., user_ipaddress, useragent) and schema property paths (e.g., com.snowplowanalytics.snowplow.ua_parser_context.useragentFamily). Each filter uses 'contains' for substring matching.

Type: object
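
As an illustration of the OR logic described above, the following sketch drops any event whose IP address or parsed user agent family contains one of the listed substrings. The field names come from the examples in this section. The shape of the `contains` value (a list of strings) and the sample substrings are assumptions, so verify them against your installed version.

```yaml
pipeline:
  processors:
    - opensnowcat:
        output_format: tsv
        filters:
          drop:
            # Regular TSV column: sample substrings, chosen for illustration only.
            user_ipaddress:
              contains: ["127.0.0.1", "192.168."]
            # Schema property path inside contexts/derived_contexts.
            com.snowplowanalytics.snowplow.ua_parser_context.useragentFamily:
              contains: ["bot"]
```

Because matching is by substring, a short pattern can match more than intended: "192.168." also appears inside an address such as 5.192.168.1. Prefer patterns that are as specific as the data allows.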

filters.transform

Field transformation configuration for anonymization, hashing, and redaction.

Type: object

filters.transform.salt

Global default salt for hashing operations. Can be overridden per field.

Type: string

filters.transform.hash_algo

Global default hash algorithm. Can be overridden per field.

Type: string
Default: "SHA-256"
Options: MD5, SHA-1, SHA-256, SHA-384, SHA-512.
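
A minimal sketch of how the global defaults might be used: a single salt and algorithm are set once under `transform`, and a field that uses the hash strategy without its own settings picks them up. The salt string is a placeholder and the choice of `user_id` is only for illustration.

```yaml
pipeline:
  processors:
    - opensnowcat:
        filters:
          transform:
            # Global defaults for every field that uses the hash strategy.
            salt: "replace-with-a-secret-salt"  # placeholder, not a real value
            hash_algo: SHA-512
            fields:
              user_id:
                strategy: hash  # no per-field salt or hash_algo, so the globals apply
```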

filters.transform.fields

Map of field names to transformation configurations. Each field entry accepts the following keys, of which only strategy is required:

  • strategy (required): Transformation type - "hash", "redact", or "anonymize_ip"
  • hash_algo (optional): Algorithm for hash strategy - "MD5", "SHA-1", "SHA-256", "SHA-384", "SHA-512" (overrides global default)
  • salt (optional): Salt for hash strategy (overrides global default)
  • redact_value (optional): Replacement value for redact strategy (default: "[REDACTED]")
  • anon_octets (optional): Number of IPv4 octets to mask for anonymize_ip strategy (default: 0)
  • anon_segments (optional): Number of IPv6 segments to mask for anonymize_ip strategy (default: 0)

Supports both TSV columns (e.g., user_id, user_ipaddress) and schema property paths (e.g., com.vendor.schema.field).

Type: object
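
The options listed above can be combined in one configuration. The sketch below applies a different strategy to each of three fields and overrides the global salt and algorithm for one of them. All values are illustrative assumptions: the salts are placeholders, `com.vendor.schema.field` is the placeholder path used in the text above, and the octet and segment counts are arbitrary.

```yaml
pipeline:
  processors:
    - opensnowcat:
        output_format: json
        filters:
          transform:
            salt: "replace-with-a-secret-salt"  # global default (placeholder)
            fields:
              # Hash with a per-field algorithm and salt that override the globals.
              user_id:
                strategy: hash
                hash_algo: SHA-384
                salt: "replace-with-a-per-field-salt"
              # Mask the last octet of IPv4 addresses and the last four segments of IPv6 addresses.
              user_ipaddress:
                strategy: anonymize_ip
                anon_octets: 1
                anon_segments: 4
              # Replace a schema property value with a custom fixed string.
              com.vendor.schema.field:
                strategy: redact
                redact_value: "[REMOVED]"
```

Since anon_octets and anon_segments are both documented with a default of 0, it is probably safest to set them explicitly whenever anonymize_ip is used.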