aws_s3

EXPERIMENTAL

This component is experimental and therefore subject to change or removal outside of major version releases.

Performs an S3 GetObject operation using the bucket + key provided in the config and replaces the original message parts with the content retrieved from S3.

Introduced in version 1.4.0.

```yaml
# Common config fields, showing default values
label: ""
aws_s3:
  bucket: "" # No default (required)
  key: "" # No default (required)
  force_path_style_urls: false
  scanner:
    to_the_end: {}
```

This aws_s3 processor is offered as an alternative to streaming-objects-on-upload-with-sqs.

This aws_s3 processor may be preferable to the aws_s3 input with the field sqs in the following situations:

  • You require data from the SQS message as well as the S3 object data
  • You are using a pattern similar to streaming-objects-on-upload-with-sqs but with a different queue technology, such as RabbitMQ
  • You need to access some data from S3 as part of the processing stage of your config (see the sketch after this list)
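
As a minimal sketch of that last case, the following config fetches an object mid-pipeline. The bucket name is hypothetical, and it assumes each incoming message carries the object key in a `path` field:

```yaml
pipeline:
  processors:
    - aws_s3:
        bucket: my-example-bucket # hypothetical bucket name
        key: ${! json("path") }   # assumes the message holds the object key in a `path` field
```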

Scanner

Note that this processor is unusual in that it has a scanner field, which means that, depending on the scanner used, it can change the number of messages. Therefore, if you plan to use it inside a branch processor you will need a scanner that doesn't alter the number of messages, such as the default to_the_end scanner.
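
For example, a sketch of the branch pattern might look like the following (the bucket name and the `path` field holding the object key are assumptions). Because to_the_end yields exactly one message per object, the branch's one-to-one contract holds:

```yaml
pipeline:
  processors:
    - branch:
        processors:
          - aws_s3:
              bucket: my-example-bucket # hypothetical bucket name
              key: ${! json("path") }   # hypothetical field holding the object key
              scanner:
                to_the_end: {} # produces exactly one message per object
        result_map: root.object_contents = content().string()
```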

Metadata

This processor adds the following metadata fields to each message:

- s3_key
- s3_bucket
- s3_last_modified_unix
- s3_last_modified (RFC3339)
- s3_content_type
- s3_content_encoding
- s3_content_length
- s3_version_id
- All user defined metadata

You can access these metadata fields using function interpolation. Note that user defined metadata is case insensitive within AWS, and it is likely that the keys will be received in a capitalized form. If you wish to make them consistent, you can map all metadata keys to lower or upper case using a Bloblang mapping such as `meta = meta().map_each_key(key -> key.lowercase())`.
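
As an illustrative sketch (the bucket name and the `path` field are assumptions), the following config downloads an object, lowercases all metadata keys with the mapping shown above, and then reads two of the built-in fields:

```yaml
pipeline:
  processors:
    - aws_s3:
        bucket: my-example-bucket # hypothetical bucket name
        key: ${! json("path") }   # hypothetical field holding the object key
    - mapping: |
        # normalise user defined metadata keys, which AWS may capitalise
        meta = meta().map_each_key(key -> key.lowercase())
        # read built-in metadata fields set by the processor
        root.fetched_from = meta("s3_bucket") + "/" + meta("s3_key")
```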

Examples

This example shows how to create a Bento config that allows connecting to SQS queues that are using the Amazon SQS Extended Client Library:

```yaml
input:
  label: sqs_extended_client

  aws_sqs:
    url: https://sqs.${AWS_REGION}.amazonaws.com/${AWS_ACCOUNT_ID}/sqs-extended-client-queue

  processors:
    - switch:
        # check it's a large message:
        - check: this.0 == "com.amazon.sqs.javamessaging.MessageS3Pointer"
          # use the aws_s3 processor to download it
          processors:
            - aws_s3:
                bucket: ${! this.1.s3BucketName }
                key: ${! this.1.s3Key }
```

Fields

bucket

The bucket to perform the GetObject operation on.

Type: string

key

The key of the object you wish to retrieve.

Type: string

force_path_style_urls

Forces the client API to use path style URLs for downloading keys, which is often required when connecting to custom endpoints.

Type: bool
Default: false

delete_objects

Whether to delete downloaded objects from the bucket once they are processed.

Type: bool
Default: false
Requires version 1.5.0 or newer

region

The AWS region to target.

Type: string
Default: ""

endpoint

Allows you to specify a custom endpoint for the AWS API.

Type: string
Default: ""

credentials

Optional manual configuration of AWS credentials to use. More information can be found in this document.

Type: object

credentials.profile

A profile from ~/.aws/credentials to use.

Type: string
Default: ""

credentials.id

The ID of credentials to use.

Type: string
Default: ""

credentials.secret

The secret for the credentials being used.

Secret

This field contains sensitive information that usually shouldn't be added to a config directly, read our secrets page for more info.

Type: string
Default: ""

credentials.token

The token for the credentials being used, required when using short term credentials.

Type: string
Default: ""

credentials.from_ec2_role

Use the credentials of a host EC2 machine configured to assume an IAM role associated with the instance.

Type: bool
Default: false
Requires version 1.0.0 or newer

credentials.role

A role ARN to assume.

Type: string
Default: ""

credentials.role_external_id

An external ID to provide when assuming a role.

Type: string
Default: ""

scanner

The scanner by which the stream of bytes consumed will be broken out into individual messages. Scanners are useful for processing large sources of data without holding the entirety of it within memory. For example, the csv scanner allows you to process individual CSV rows without loading the entire CSV file in memory at once.

Type: scanner
Default: {"to_the_end":{}}