opensnowcat
This component is experimental and therefore subject to change or removal outside of major version releases.
Processes OpenSnowcat/Snowplow enriched TSV events: converts enriched TSV to flattened JSON, filters events, and transforms sensitive fields for privacy compliance.
Introduced in version 1.12.0.
# Common config fields, showing default values
label: ""
opensnowcat:
  output_format: tsv
# All config fields, showing default values
label: ""
opensnowcat:
  output_format: tsv
  filters:
    drop: {} # No default (optional)
    transform:
      salt: "" # No default (optional)
      hash_algo: SHA-256
      fields: {} # No default (optional)
This processor provides comprehensive event processing capabilities:
Features
Format Conversion
- Convert enriched TSV to flattened JSON with automatic context extraction
- Maintain TSV format for OpenSnowcat/Snowplow downstream compatibility
Event Filtering
- Drop events based on field values (IP addresses, user agents, etc.)
- Filter by schema property paths in contexts, derived_contexts, and unstruct_event
- OR logic: event is dropped if ANY filter matches
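The OR logic above can be illustrated with a small Python sketch. This is not the processor's implementation; `should_drop` and the plain-dict event representation are hypothetical names chosen for illustration, and the only behavior assumed is the substring matching and any-match semantics described on this page.

```python
from typing import Mapping, Sequence


def should_drop(event: Mapping[str, str], drop_rules: Mapping[str, Sequence[str]]) -> bool:
    """Return True if any configured field contains any of its substrings."""
    for field, needles in drop_rules.items():
        value = event.get(field) or ""  # missing or empty fields never match
        if any(needle in value for needle in needles):
            return True  # OR logic: the first matching rule is enough
    return False


rules = {
    "user_ipaddress": ["127.0.0.1", "192.168.", "10.0."],
    "useragent": ["bot"],
}

print(should_drop({"user_ipaddress": "192.168.1.20", "useragent": "Mozilla/5.0"}, rules))  # True
print(should_drop({"user_ipaddress": "203.0.113.7", "useragent": "Mozilla/5.0"}, rules))   # False
```

Because the match is a plain substring test, a pattern such as "10.0." also matches "110.0.5.1" in this sketch, so patterns are best kept as specific as the data allows.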
Field Transformations
Transform sensitive fields for PII compliance and privacy:
- hash: Hash field values with a configurable algorithm (MD5, SHA-1, SHA-256, SHA-384, SHA-512) and an optional salt
- redact: Replace field values with a fixed string (e.g., "[REDACTED]")
- anonymize_ip: Mask IP addresses while preserving network information (supports both IPv4 and IPv6)
  - IPv4: Mask last N octets using the anon_octets parameter
  - IPv6: Mask last N segments using the anon_segments parameter
All transformations support both direct TSV columns and schema property paths.
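To make the anonymize_ip strategy concrete, here is a minimal Python sketch of masking the trailing octets or segments. It is an illustration only: the function name, the "x" mask character, the expansion of IPv6 addresses to their full eight-segment form, and the choice to leave non-IP values untouched are all assumptions, not documented behavior of the processor.

```python
import ipaddress
from typing import List


def _mask_tail(parts: List[str], count: int, mask: str) -> List[str]:
    """Replace the last `count` items of `parts` with `mask`."""
    count = max(0, min(count, len(parts)))
    if count == 0:
        return list(parts)
    return parts[:-count] + [mask] * count


def anonymize_ip(value: str, anon_octets: int = 0, anon_segments: int = 0, mask: str = "x") -> str:
    try:
        ip = ipaddress.ip_address(value)
    except ValueError:
        return value  # assumption: values that are not IP addresses pass through unchanged
    if ip.version == 4:
        return ".".join(_mask_tail(str(ip).split("."), anon_octets, mask))
    # IPv6: expand "::" shorthand so there are always eight segments to count from
    return ":".join(_mask_tail(ip.exploded.split(":"), anon_segments, mask))


print(anonymize_ip("192.168.1.20", anon_octets=2))   # 192.168.x.x
print(anonymize_ip("2001:db8::1", anon_segments=3))  # 2001:0db8:0000:0000:0000:x:x:x
```

With both parameters at their default of 0, the sketch returns the address unchanged, which matches the documented defaults for anon_octets and anon_segments.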
Examples
- TSV > JSON
- Filter IP
- Schema Filter
- Transform
- Advanced Transforms
- Combined
Converts OpenSnowcat/Snowplow enriched TSV events to flattened JSON format, extracting all contexts, derived contexts, and unstruct events into top-level fields.
pipeline:
  processors:
    - opensnowcat:
        output_format: json
Drops events whose user_ipaddress contains a loopback or private-network prefix, keeping the remaining events in TSV format.
pipeline:
  processors:
    - opensnowcat:
        output_format: tsv
        filters:
          drop:
            user_ipaddress:
              contains: ["127.0.0.1", "192.168.", "10.0."]
Filters events based on schema property values. The property path omits the schema version. The processor automatically searches the contexts, derived_contexts, and unstruct_event fields for a matching vendor, schema name, and property name (case-sensitive).
pipeline:
  processors:
    - opensnowcat:
        output_format: tsv
        filters:
          drop:
            com.snowplowanalytics.snowplow.ua_parser_context.useragentFamily:
              contains: ["Chrome", "Firefox"]
            user_ipaddress:
              contains: ["10.0."]
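The following Python sketch shows one way a version-less schema property path could be resolved against self-describing JSON in the contexts, derived_contexts, and unstruct_event columns. It assumes the usual Iglu URI layout (`iglu:vendor/name/format/version`) and a dict-based event; the helper names `schema_key`, `entities`, and `property_values` are hypothetical, and the processor's real lookup may differ.

```python
import json
from typing import Dict, Iterator, Mapping


def schema_key(iglu_uri: str) -> str:
    """'iglu:com.acme/checkout/jsonschema/1-0-0' -> 'com.acme.checkout' (version dropped)."""
    parts = iglu_uri.split(":", 1)[-1].split("/")
    if len(parts) < 2:
        return ""
    return f"{parts[0]}.{parts[1]}"


def entities(column_json: str) -> Iterator[Dict]:
    """Yield the self-describing entities wrapped inside one column's JSON."""
    if not column_json:
        return
    data = json.loads(column_json).get("data")
    # contexts hold a list of entities; unstruct_event holds a single entity
    for item in data if isinstance(data, list) else [data]:
        if isinstance(item, dict) and "schema" in item:
            yield item


def property_values(event: Mapping[str, str], path: str) -> Iterator[str]:
    """Yield every value of `path` ('vendor.schema_name.property') found in the event."""
    wanted_schema, _, prop = path.rpartition(".")
    for column in ("contexts", "derived_contexts", "unstruct_event"):
        for entity in entities(event.get(column, "")):
            payload = entity.get("data")
            if (schema_key(entity["schema"]) == wanted_schema
                    and isinstance(payload, dict) and prop in payload):
                yield str(payload[prop])  # comparison is case-sensitive throughout


event = {
    "derived_contexts": json.dumps({
        "schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0",
        "data": [{
            "schema": "iglu:com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-0-0",
            "data": {"useragentFamily": "Chrome"},
        }],
    })
}
path = "com.snowplowanalytics.snowplow.ua_parser_context.useragentFamily"
print(list(property_values(event, path)))  # ['Chrome']
```

A drop rule would then apply its `contains` substrings to each yielded value, in the same way as for a regular TSV column.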
Transforms sensitive fields using several strategies: hashes user identifiers, anonymizes IP addresses, and redacts network identifiers. Useful for GDPR and other privacy compliance requirements.
pipeline:
  processors:
    - opensnowcat:
        output_format: json
        filters:
          transform:
            salt: "your-secret-salt-here"
            hash_algo: SHA-256
            fields:
              user_id:
                strategy: hash
              user_ipaddress:
                strategy: anonymize_ip
                anon_octets: 2
                anon_segments: 3
              network_userid:
                strategy: redact
                redact_value: "[REDACTED]"
Combines multiple transformation strategies with field-specific configurations. Uses different hash algorithms for different fields and supports both IPv4 and IPv6 anonymization.
pipeline:
  processors:
    - opensnowcat:
        output_format: tsv
        filters:
          transform:
            salt: "global-default-salt"
            hash_algo: SHA-256
            fields:
              user_id:
                strategy: hash
                hash_algo: SHA-512
                salt: "user-specific-salt"
              user_ipaddress:
                strategy: anonymize_ip
                anon_octets: 2
                anon_segments: 4
              domain_userid:
                strategy: hash
              network_userid:
                strategy: redact
                redact_value: "REDACTED"
              user_fingerprint:
                strategy: hash
                hash_algo: MD5
Drops unwanted events while transforming sensitive fields in the remaining events. Useful for processing only relevant data while maintaining privacy.
pipeline:
  processors:
    - opensnowcat:
        output_format: json
        filters:
          drop:
            user_ipaddress:
              contains: ["127.0.0.1", "10.0.", "192.168."]
            com.snowplowanalytics.snowplow.ua_parser_context.useragentFamily:
              contains: ["bot", "crawler", "spider"]
          transform:
            salt: "production-salt-v1"
            hash_algo: SHA-256
            fields:
              user_id:
                strategy: hash
              user_ipaddress:
                strategy: anonymize_ip
                anon_octets: 2
                anon_segments: 3
Fields
output_format
Output format for processed events.
Type: string
Default: "tsv"
Option | Summary |
---|---|
json | Convert enriched TSV to flattened JSON with contexts, derived_contexts, and unstruct_event automatically flattened into top-level objects. |
tsv | Maintain enriched TSV format without conversion. |
filters
Filter and transformation configurations
Type: object
filters.drop
Map of field names to filter criteria. Events matching ANY criteria will be dropped (OR logic). Supports both regular TSV columns (e.g., user_ipaddress, useragent) and schema property paths (e.g., com.snowplowanalytics.snowplow.ua_parser_context.useragentFamily). Each filter uses 'contains' for substring matching.
Type: object
filters.transform
Field transformation configuration for anonymization, hashing, and redaction
Type: object
filters.transform.salt
Global default salt for hashing operations. Can be overridden per field.
Type: string
filters.transform.hash_algo
Global default hash algorithm. Can be overridden per field.
Type: string
Default: "SHA-256"
Options: MD5, SHA-1, SHA-256, SHA-384, SHA-512.
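As a rough illustration of salted hashing with these algorithm names, the Python sketch below maps each option to its `hashlib` equivalent. How the processor actually combines the salt with the value is not stated on this page; prepending the salt and returning a lowercase hex digest are assumptions made for the example, and `hash_value` is a hypothetical helper.

```python
import hashlib

# Config option names mapped to the names hashlib expects
HASHLIB_NAMES = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-384": "sha384",
    "SHA-512": "sha512",
}


def hash_value(value: str, salt: str = "", hash_algo: str = "SHA-256") -> str:
    """Return a hex digest of salt + value (assumption: the salt is prepended)."""
    digest = hashlib.new(HASHLIB_NAMES[hash_algo])
    digest.update((salt + value).encode("utf-8"))
    return digest.hexdigest()


same_a = hash_value("user-123", salt="global-default-salt")
same_b = hash_value("user-123", salt="global-default-salt")
other = hash_value("user-123", salt="user-specific-salt")

print(same_a == same_b)  # True: same input and salt give the same digest
print(same_a == other)   # False: changing the salt changes the digest
print(len(hash_value("user-123", hash_algo="SHA-512")))  # 128 hex characters
```

The deterministic output is what lets hashed identifiers still be joined or counted downstream, provided the salt and algorithm stay fixed.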
filters.transform.fields
Map of field names to transformation configurations. Each field must specify a strategy and may set the optional settings that apply to it:
- strategy (required): Transformation type - "hash", "redact", or "anonymize_ip"
- hash_algo (optional): Algorithm for hash strategy - "MD5", "SHA-1", "SHA-256", "SHA-384", "SHA-512" (overrides global default)
- salt (optional): Salt for hash strategy (overrides global default)
- redact_value (optional): Replacement value for redact strategy (default: "[REDACTED]")
- anon_octets (optional): Number of IPv4 octets to mask for anonymize_ip strategy (default: 0)
- anon_segments (optional): Number of IPv6 segments to mask for anonymize_ip strategy (default: 0)
Supports both TSV columns (e.g., user_id, user_ipaddress) and schema property paths (e.g., com.vendor.schema.field).
Type: object
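The override rules for per-field settings can be sketched in Python as a merge of documented defaults, global values, and field-level values. `resolve_field_config` is a hypothetical helper, not part of the processor; the defaults it uses are the ones listed above.

```python
from typing import Any, Dict


def resolve_field_config(transform: Dict[str, Any], field: str) -> Dict[str, Any]:
    """Merge documented defaults, global settings, and per-field overrides."""
    field_cfg = transform["fields"][field]
    if "strategy" not in field_cfg:
        raise ValueError(f"field {field!r} must specify a strategy")
    resolved = {
        "hash_algo": transform.get("hash_algo", "SHA-256"),  # global default algorithm
        "salt": transform.get("salt", ""),                   # global default salt
        "redact_value": "[REDACTED]",
        "anon_octets": 0,
        "anon_segments": 0,
    }
    resolved.update(field_cfg)  # per-field values win over global ones
    return resolved


transform = {
    "salt": "global-default-salt",
    "hash_algo": "SHA-256",
    "fields": {
        "user_id": {"strategy": "hash", "hash_algo": "SHA-512", "salt": "user-specific-salt"},
        "domain_userid": {"strategy": "hash"},
    },
}

print(resolve_field_config(transform, "user_id")["hash_algo"])        # SHA-512
print(resolve_field_config(transform, "domain_userid")["hash_algo"])  # SHA-256
print(resolve_field_config(transform, "domain_userid")["salt"])       # global-default-salt
```

In this sketch, user_id uses its own algorithm and salt, while domain_userid falls back to the global values, mirroring the Advanced Transforms example.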