aws_s3

EXPERIMENTAL

This component is experimental and therefore subject to change or removal outside of major version releases.

Performs an S3 GetObject operation using the bucket + key provided in the config and replaces the original message parts with the content retrieved from S3.

Introduced in version 1.4.0.

```yaml
# Common config fields, showing default values
label: ""
aws_s3:
  bucket: "" # No default (required)
  key: "" # No default (required)
  force_path_style_urls: false
  scanner:
    to_the_end: {}
```

This aws_s3 processor is offered as an alternative to streaming-objects-on-upload-with-sqs.

This aws_s3 processor may be preferable to the aws_s3 input with the field sqs in the following situations:

  • You require data from the SQS message as well as the S3 object data
  • You are using a pattern similar to streaming-objects-on-upload-with-sqs but with a different queue technology, such as RabbitMQ
  • You need to access some data from S3 as part of the processing stage of your config (see the sketch after this list)
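
As a minimal sketch of that last case, the following config fetches an object mid-pipeline. The bucket name is hypothetical, and it assumes each incoming message carries the object key in a `path` field:

```yaml
pipeline:
  processors:
    - aws_s3:
        bucket: my-example-bucket # hypothetical bucket name
        key: ${! json("path") }   # assumes the message holds the object key in a `path` field
```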

Scanner

Note that this processor is unusual in that it has a scanner field, which means that, depending on the scanner used, it can change the number of messages. Therefore, if you plan to use it inside a branch processor you will need a scanner that doesn't alter the number of messages, such as the default to_the_end scanner.
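
For example, a sketch of the branch pattern might look like the following (the bucket name and the `path` field holding the object key are assumptions). Because to_the_end yields exactly one message per object, the branch's one-to-one contract holds:

```yaml
pipeline:
  processors:
    - branch:
        processors:
          - aws_s3:
              bucket: my-example-bucket # hypothetical bucket name
              key: ${! json("path") }   # hypothetical field holding the object key
              scanner:
                to_the_end: {} # produces exactly one message per object
        result_map: root.object_contents = content().string()
```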

Metadata

This processor adds the following metadata fields to each message:

- s3_key
- s3_bucket
- s3_last_modified_unix
- s3_last_modified (RFC3339)
- s3_content_type
- s3_content_encoding
- s3_content_length
- s3_version_id
- All user defined metadata

You can access these metadata fields using function interpolation. Note that user defined metadata is case insensitive within AWS, and it is likely that the keys will be received in a capitalized form. If you wish to make them consistent, you can map all metadata keys to lower or upper case using a Bloblang mapping such as `meta = meta().map_each_key(key -> key.lowercase())`.
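
As an illustrative sketch (the bucket name and the `path` field are assumptions), the following config downloads an object, lowercases all metadata keys with the mapping shown above, and then reads two of the built-in fields:

```yaml
pipeline:
  processors:
    - aws_s3:
        bucket: my-example-bucket # hypothetical bucket name
        key: ${! json("path") }   # hypothetical field holding the object key
    - mapping: |
        # normalise user defined metadata keys, which AWS may capitalise
        meta = meta().map_each_key(key -> key.lowercase())
        # read built-in metadata fields set by the processor
        root.fetched_from = meta("s3_bucket") + "/" + meta("s3_key")
```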

Examples

This example shows how to create a Bento config that allows connecting to SQS queues that are using the Amazon SQS Extended Client Library:

```yaml
input:
  label: sqs_extended_client

  aws_sqs:
    url: https://sqs.${AWS_REGION}.amazonaws.com/${AWS_ACCOUNT_ID}/sqs-extended-client-queue

  processors:
    - switch:
        # check it's a large message:
        - check: this.0 == "com.amazon.sqs.javamessaging.MessageS3Pointer"
          # use the aws_s3 processor to download it
          processors:
            - aws_s3:
                bucket: ${! this.1.s3BucketName }
                key: ${! this.1.s3Key }
```

Fields

bucket

The bucket to perform the GetObject operation on.

Type: string

key

The key of the object you wish to retrieve.

Type: string

force_path_style_urls

Forces the client API to use path style URLs for downloading keys, which is often required when connecting to custom endpoints.

Type: bool
Default: false

delete_objects

Whether to delete downloaded objects from the bucket once they are processed.

Type: bool
Default: false
Requires version 1.5.0 or newer

region

The AWS region to target.

Type: string
Default: ""

endpoint

Allows you to specify a custom endpoint for the AWS API.

Type: string
Default: ""

credentials

Optional manual configuration of AWS credentials to use. More information can be found in this document.

Type: object

credentials.profile

A profile from ~/.aws/credentials to use.

Type: string
Default: ""

credentials.id

The ID of credentials to use.

Type: string
Default: ""

credentials.secret

The secret for the credentials being used.

Secret

This field contains sensitive information that usually shouldn't be added to a config directly, read our secrets page for more info.

Type: string
Default: ""

credentials.token

The token for the credentials being used, required when using short term credentials.

Type: string
Default: ""

credentials.from_ec2_role

Use the credentials of a host EC2 machine configured to assume an IAM role associated with the instance.

Type: bool
Default: false
Requires version 1.0.0 or newer

credentials.role

A role ARN to assume.

Type: string
Default: ""

credentials.role_external_id

An external ID to provide when assuming a role.

Type: string
Default: ""

scanner

The scanner by which the stream of bytes consumed will be broken out into individual messages. Scanners are useful for processing large sources of data without holding the entirety of it within memory. For example, the csv scanner allows you to process individual CSV rows without loading the entire CSV file in memory at once.

Type: scanner
Default: {"to_the_end":{}}