aws_s3
This component is experimental and therefore subject to change or removal outside of major version releases.
Performs an S3 GetObject operation using the bucket + key provided in the config and replaces the original message parts with the content retrieved from S3.
Introduced in version 1.4.0.
# Common config fields, showing default values
label: ""
aws_s3:
  bucket: "" # No default (required)
  key: "" # No default (required)
  force_path_style_urls: false
  scanner:
    to_the_end: {}
# All config fields, showing default values
label: ""
aws_s3:
  bucket: "" # No default (required)
  key: "" # No default (required)
  force_path_style_urls: false
  delete_objects: false
  region: ""
  endpoint: ""
  credentials:
    profile: ""
    id: ""
    secret: ""
    token: ""
    from_ec2_role: false
    role: ""
    role_external_id: ""
  scanner:
    to_the_end: {}
This aws_s3 processor is offered as an alternative to streaming-objects-on-upload-with-sqs.
This aws_s3 processor may be preferable to the aws_s3 input with the field sqs in the following situations:
- You require data from the SQS message as well as the S3 object data
- You are using a pattern similar to streaming-objects-on-upload-with-sqs but with a different queue technology such as RabbitMQ
- You need to access some data from S3 as part of the processing stage of your config
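As a rough illustration of the second situation, the sketch below pairs a RabbitMQ queue with the aws_s3 processor. The queue name, the broker URL, and the bucket and key fields on the incoming message are all assumptions made for the example, not something this page prescribes.

```yaml
input:
  amqp_0_9:
    urls:
      - amqp://guest:guest@localhost:5672/ # hypothetical broker URL
    queue: s3-upload-events # hypothetical queue name

pipeline:
  processors:
    # Assumes each queue message is JSON shaped like
    # {"bucket": "...", "key": "..."}; adjust the paths to your payload.
    - aws_s3:
        bucket: ${! this.bucket }
        key: ${! this.key }
```

After the processor runs, the message content is the downloaded object rather than the original queue payload.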
Scanner
Note that this processor is unusual in that it has a scanner field.
This means that, depending on the scanner used, it can change the number of messages.
Therefore, if you plan on using it inside a branch processor, you need a scanner that doesn't alter the number of messages, such as the default to_the_end scanner.
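A minimal sketch of the branch case described above, assuming the incoming message is JSON with an object_key field (a hypothetical name) and that the downloaded object should be attached to the original message as a string. The bucket name is also a placeholder.

```yaml
pipeline:
  processors:
    - branch:
        processors:
          - aws_s3:
              bucket: example-bucket # hypothetical bucket name
              key: ${! this.object_key } # hypothetical message field
              # Keep the default scanner so the branch yields exactly
              # one message per input message.
              scanner:
                to_the_end: {}
        # Copy the downloaded bytes back onto the original message.
        result_map: root.s3_content = content().string()
```

A scanner that splits the object into multiple messages would change the message count inside the branch, which is the situation the note above warns against.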
Metadata
This processor adds the following metadata fields to each message:
- s3_key
- s3_bucket
- s3_last_modified_unix
- s3_last_modified (RFC3339)
- s3_content_type
- s3_content_encoding
- s3_content_length
- s3_version_id
- All user defined metadata
You can access these metadata fields using function interpolation. Note that user defined metadata is case insensitive within AWS, and it is likely that the keys will be received in a capitalized form. If you wish to make them consistent, you can map all metadata keys to lower or uppercase using a Bloblang mapping such as meta = meta().map_each_key(key -> key.lowercase()).
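The following sketch puts the two ideas together: it normalises metadata keys with the mapping quoted above, then reads the s3_key metadata field through function interpolation. The bucket name, the object_key message field, and the file output with its path are assumptions chosen only to make the example complete.

```yaml
pipeline:
  processors:
    - aws_s3:
        bucket: example-bucket # hypothetical bucket name
        key: ${! this.object_key } # hypothetical message field
    # Lowercase every metadata key so later lookups are predictable.
    - mapping: |
        meta = meta().map_each_key(key -> key.lowercase())

output:
  file:
    # Function interpolation reading the s3_key metadata field.
    path: /tmp/downloads/${! meta("s3_key") }
```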
Examples
- Amazon SQS Extended Client Library
This example shows how to create a Bento config that connects to SQS queues that use the Amazon SQS Extended Client Library.
input:
  label: sqs_extended_client
  aws_sqs:
    url: https://sqs.${AWS_REGION}.amazonaws.com/${AWS_ACCOUNT_ID}/sqs-extended-client-queue
  processors:
    - switch:
        # check it's a large message:
        - check: this.0 == "com.amazon.sqs.javamessaging.MessageS3Pointer"
          # use the aws_s3 processor to download it
          processors:
            - aws_s3:
                bucket: ${! this.1.s3BucketName }
                key: ${! this.1.s3Key }
Fields
bucket
The bucket to perform the GetObject operation on.
Type: string
key
The key of the object you wish to retrieve.
Type: string
force_path_style_urls
Forces the client API to use path style URLs for downloading keys, which is often required when connecting to custom endpoints.
Type: bool
Default: false
delete_objects
Whether to delete downloaded objects from the bucket once they are processed.
Type: bool
Default: false
Requires version 1.5.0 or newer
region
The AWS region to target.
Type: string
Default: ""
endpoint
Allows you to specify a custom endpoint for the AWS API.
Type: string
Default: ""
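As an illustration of how region, endpoint and force_path_style_urls can combine, here is a sketch that targets an S3-compatible service running locally. The endpoint URL, region, bucket and key are placeholder values, and whether path style URLs are actually needed depends on the service you connect to.

```yaml
pipeline:
  processors:
    - aws_s3:
        bucket: example-bucket # hypothetical bucket name
        key: reports/latest.json # hypothetical object key
        region: us-east-1 # placeholder region
        endpoint: http://localhost:9000 # hypothetical custom endpoint
        # Often required when connecting to custom endpoints.
        force_path_style_urls: true
```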
credentials
Optional manual configuration of AWS credentials to use. More information can be found in this document.
Type: object
credentials.profile
A profile from ~/.aws/credentials to use.
Type: string
Default: ""
credentials.id
The ID of credentials to use.
Type: string
Default: ""
credentials.secret
The secret for the credentials being used.
This field contains sensitive information that usually shouldn't be added to a config directly, read our secrets page for more info.
Type: string
Default: ""
credentials.token
The token for the credentials being used, required when using short term credentials.
Type: string
Default: ""
credentials.from_ec2_role
Use the credentials of a host EC2 machine configured to assume an IAM role associated with the instance.
Type: bool
Default: false
Requires version 1.0.0 or newer
credentials.role
A role ARN to assume.
Type: string
Default: ""
credentials.role_external_id
An external ID to provide when assuming a role.
Type: string
Default: ""
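A hedged sketch of the credentials block using the fields documented above. The id and secret are read from environment variables rather than written into the config, in line with the note on the secret field. The variable names, role ARN and external ID are illustrative assumptions.

```yaml
pipeline:
  processors:
    - aws_s3:
        bucket: example-bucket # hypothetical bucket name
        key: ${! this.object_key } # hypothetical message field
        region: eu-west-1 # placeholder region
        credentials:
          # Environment variable interpolation keeps secrets out of the file.
          id: ${AWS_ACCESS_KEY_ID}
          secret: ${AWS_SECRET_ACCESS_KEY}
          # Optionally assume a role using those credentials.
          role: arn:aws:iam::123456789012:role/example-reader # placeholder ARN
          role_external_id: example-external-id # placeholder value
```

Not every field has to be set at once. For example, a config that relies on from_ec2_role or on a named profile would typically leave id and secret empty.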
scanner
The scanner by which the stream of bytes consumed will be broken out into individual messages. Scanners are useful for processing large sources of data without holding the entirety of it within memory. For example, the csv scanner allows you to process individual CSV rows without loading the entire CSV file in memory at once.
Type: scanner
Default: {"to_the_end":{}}
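To make the csv case concrete, the sketch below swaps the default scanner for the csv scanner so that each row of the downloaded object becomes its own message. The bucket and key are placeholders, and the csv scanner is assumed to run with its default options.

```yaml
pipeline:
  processors:
    - aws_s3:
        bucket: example-bucket # hypothetical bucket name
        key: exports/users.csv # hypothetical object key
        # Emit one message per CSV row instead of one message
        # containing the whole file.
        scanner:
          csv: {}
```

Because this configuration changes the number of messages, it is not suitable inside a branch processor.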