Previously we designed a simple serverless batch process to extract data from a data warehouse. That design lacked an intuitive way to notify data consumers when new data is available, requiring consumers to do one of the following:

  1. Consume data regardless of when it was last updated
  2. Create logic to identify when new data is available

In this post, we will discuss adding an S3 bucket notification to inform subscribers when new data is available for consumption. This eliminates the need to develop a custom solution to this problem on our own.

What is an S3 bucket notification?

When an action is taken on an S3 object, a corresponding event is created. That event can send notifications to downstream applications, enabling an event-driven workflow. Read the AWS S3 documentation to learn more about the supported S3 event types.
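To make the event-driven workflow concrete, here is a minimal sketch of how a downstream consumer might parse an S3 object-creation event. The `Records` / `s3.bucket.name` / `s3.object.key` structure is the documented shape of an S3 event notification; the bucket and key values used below are hypothetical.

```python
import json

def extract_new_objects(event_json: str):
    """Return (bucket, key) pairs for object-creation records
    in an S3 event notification payload."""
    event = json.loads(event_json)
    return [
        (r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
        for r in event.get("Records", [])
        if r.get("eventName", "").startswith("ObjectCreated")
    ]

# Example payload (trimmed to the fields we read):
sample = json.dumps({
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "my-data-bucket"},
            "object": {"key": "data-extract/extract.txt"},
        },
    }]
})
```

Note that S3 URL-encodes object keys in event payloads, so a real consumer may need to decode them before use.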

Understanding our S3 bucket

Before creating our notification, we need to understand the structure of our bucket and identify which events to send notifications for. Understanding the structure of our S3 bucket is critical to the reliability of our notifications and the future use of the bucket itself.

To understand the structure of our bucket we need to answer the following questions:

  • Where in our S3 bucket are we storing the extracted data? We must know where we are storing our data before creating S3 events. Although notifications are configured at the bucket level, we can specify a prefix (a file path) to be more specific. This is important when creating new bucket events, since a single bucket can’t have overlapping rules.
  • What is the naming convention of the objects we are saving? Does every file have the same name, or are we adding a unique identifier to each file? What kind of file are we saving (.txt, .csv, .gz, etc.)? We can use the file extension of our extract file as a suffix filter, similar to how we use a prefix as an extra level of filtering.
  • Are any other applications using the same bucket and/or prefix? If so, how we set up our notifications will potentially be affected. We want to ensure notifications are only sent when our data is loaded.
  • Are there any existing event notifications? If other notifications exist, we need to make sure our notifications don’t conflict with them and potentially cause unexpected results. As mentioned earlier, a bucket cannot have overlapping rules, so if an existing rule conflicts with our new rule, we will have to modify our rule or create a new bucket.
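The overlap constraint in the last two questions can be sketched as a small check: two prefix/suffix filters for the same event type conflict when some object key could match both. This is a rough approximation of the validation S3 performs, not its exact rules.

```python
def filters_overlap(rule_a, rule_b):
    """Return True if two (prefix, suffix) filter pairs could both
    match the same object key.

    Two prefixes can co-match only if one is a prefix of the other;
    likewise for suffixes. If both hold, a key of the form
    longer_prefix + anything + longer_suffix matches both rules.
    """
    pa, sa = rule_a
    pb, sb = rule_b
    prefixes_overlap = pa.startswith(pb) or pb.startswith(pa)
    suffixes_overlap = sa.endswith(sb) or sb.endswith(sa)
    return prefixes_overlap and suffixes_overlap
```

For example, rules on the same prefix but different extensions (`.txt` vs `.csv`) do not conflict, while a broader prefix with the same extension does.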

Designing our S3 bucket notification

For our example we will create our S3 bucket notification with the following assumptions:

  • Our batch process uploads data to its own folder within our bucket, with the same file name (versioning is enabled).
  • No other applications are using the same prefix, but they will be using the same bucket.
  • There aren’t any existing events associated with our bucket.

With these assumptions, we can now design our S3 bucket notification. When creating an S3 event using the AWS console, we have to specify a name, the events, a prefix, a suffix, and a destination for our notification. In this case, we will create a notification for all object-creation events. Our prefix will be the file path to which our batch process uploads files (e.g. data-extract). Since our batch process uploads text files, we will add ‘.txt’ as the suffix for our event, which will ignore any object created in the ‘data-extract’ folder that is not a text file. Finally, for our event destination, we will send notifications to a Simple Notification Service (SNS) topic. The SNS topic to which we send notifications is the service our data consumers subscribe to for updates.
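The same configuration can be expressed in code rather than the console. The sketch below builds the payload for the S3 `PutBucketNotificationConfiguration` API; the bucket name and SNS topic ARN in the commented call are hypothetical placeholders.

```python
def build_notification_config(topic_arn, prefix, suffix):
    """Build a NotificationConfiguration that sends all object-creation
    events matching prefix/suffix to an SNS topic."""
    return {
        "TopicConfigurations": [
            {
                "TopicArn": topic_arn,
                "Events": ["s3:ObjectCreated:*"],  # all creation events
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": prefix},
                            {"Name": "suffix", "Value": suffix},
                        ]
                    }
                },
            }
        ]
    }

# Applying it would look like this (requires boto3, credentials, and an
# SNS topic policy that allows S3 to publish):
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_notification_configuration(
#     Bucket="our-extract-bucket",  # hypothetical bucket name
#     NotificationConfiguration=build_notification_config(
#         "arn:aws:sns:us-east-1:123456789012:data-extract-updates",  # hypothetical ARN
#         "data-extract/",
#         ".txt",
#     ),
# )
```

Note that `PutBucketNotificationConfiguration` replaces the bucket’s entire notification configuration, so any existing rules must be included in the payload to be preserved.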

Conclusion

With the help of AWS S3 and SNS, we are able to design an event-driven process that informs consumers of new data. Adding S3 bucket notifications to our batch process allows us to focus on extracting the necessary data, and our consumers to focus on using it. If you have further questions regarding this addition to our serverless batch design, please do not hesitate to contact us.

About the author