gcp_bigquery_select
This component is mostly stable but breaking changes could still be made outside of major version releases if a fundamental problem with the component is found.
Executes a SELECT
query against BigQuery and creates a message for each row received.
Introduced in version 1.0.0.
# Config fields, showing default values
input:
label: ""
gcp_bigquery_select:
project: "" # No default (required)
table: bigquery-public-data.samples.shakespeare # No default (required)
columns: [] # No default (required)
where: type = ? and created_at > ? # No default (optional)
auto_replay_nacks: true
job_labels: {}
priority: ""
args_mapping: root = [ "article", now().ts_format("2006-01-02") ] # No default (optional)
prefix: "" # No default (optional)
suffix: "" # No default (optional)
Once the rows from the query are exhausted, this input shuts down, allowing the pipeline to gracefully terminate (or the next input in a sequence to execute).
Examples
- Word counts
Here we query the public corpus of Shakespeare's works to generate a stream of the top 10 words that are 3 or more characters long:
input:
gcp_bigquery_select:
project: sample-project
table: bigquery-public-data.samples.shakespeare
columns:
- word
- sum(word_count) as total_count
where: length(word) >= ?
suffix: |
GROUP BY word
ORDER BY total_count DESC
LIMIT 10
args_mapping: |
root = [ 3 ]
Fields
project
GCP project where the query job will execute.
Type: string
table
Fully-qualified BigQuery table name to query.
Type: string
# Examples
table: bigquery-public-data.samples.shakespeare
columns
A list of columns to query.
Type: array
where
An optional where clause to add. Placeholder arguments are populated with the args_mapping
field. Placeholders should always be question marks (?
).
Type: string
# Examples
where: type = ? and created_at > ?
where: user_id = ?
auto_replay_nacks
Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to false
these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation.
Type: bool
Default: true
job_labels
A list of labels to add to the query job.
Type: object
Default: {}
priority
The priority with which to schedule the query.
Type: string
Default: ""
args_mapping
An optional Bloblang mapping which should evaluate to an array of values matching in size to the number of placeholder arguments in the field where
.
Type: string
# Examples
args_mapping: root = [ "article", now().ts_format("2006-01-02") ]
prefix
An optional prefix to prepend to the select query (before SELECT).
Type: string
suffix
An optional suffix to append to the select query.
Type: string