“Data is at the center of every application, process, and business decision,” wrote Swami Sivasubramanian, VP of Database, Analytics, and Machine Learning at AWS, and I couldn’t agree more. A common pattern customers use today is to build data pipelines to move data from Amazon Aurora to Amazon Redshift. These solutions help them gain insights to grow sales, reduce costs, and optimize their businesses.
To help you focus on creating value from data instead of preparing data for analysis, we announced Amazon Aurora zero-ETL integration with Amazon Redshift at AWS re:Invent 2022 and in public preview for Amazon Aurora MySQL-Compatible Edition in June 2023.
Now generally available: Amazon Aurora MySQL zero-ETL integration with Amazon Redshift
Today, we announced the general availability of Amazon Aurora MySQL zero-ETL integration with Amazon Redshift. With this fully managed solution, you no longer need to build and maintain complex data pipelines in order to derive time-sensitive insights from your transactional data to inform critical business decisions.
This zero-ETL integration between Amazon Aurora and Amazon Redshift unlocks opportunities for you to run near real-time analytics and machine learning (ML) on petabytes of transactional data in Amazon Redshift. As this data gets written into Aurora, it will be available in Amazon Redshift within seconds.
It also enables you to run consolidated analytics from multiple Aurora MySQL database clusters in Amazon Redshift to derive holistic insights across many applications or partitions. Amazon Aurora MySQL zero-ETL integration with Amazon Redshift processes over 1 million transactions per minute (an equivalent of 17.5 million insert/update/delete row operations per minute) from multiple Aurora databases and makes them available in Amazon Redshift in less than 15 seconds (p50 latency lag).
Furthermore, you can take advantage of the analytics and built-in ML capabilities of Amazon Redshift, such as materialized views, cross-Region data sharing, and federated access to multiple data stores and data lakes.
Let’s get started
In this article, I’ll highlight some steps along with information on how you can get started easily. I will use my existing Amazon Aurora MySQL serverless database and Amazon Redshift data warehouse.
To get started, I need to navigate to Amazon RDS and select Create zero-ETL integration on the Zero-ETL integrations page.
On the Create zero-ETL integration page, I need to follow a few steps to configure the integration for my Amazon Aurora database cluster and my Amazon Redshift data warehouse.
First, I define an identifier for my integration and select Next.
On the next page, I need to select the source database by selecting Browse RDS databases.
Here, I can select my existing database as the source.
The next step asks me the target Amazon Redshift data warehouse. Here, I have the flexibility to choose the Amazon Redshift Serverless or RA3 data warehouse in my account or in different account. I select Browse Redshift data warehouses.
Then, I choose the target data warehouse.
Because Amazon Aurora needs to replicate into the data warehouse, we need to add an additional resource policy and add the Aurora database as an authorized integration source in the Amazon Redshift data warehouse.
I can solve this by manually updating in the Amazon Redshift console or let Amazon RDS fix it for me. I tick the checkbox.
On the next page, it shows me the changes that Amazon RDS will perform for us. I select Continue.
On the next page, I can configure the tags and also the encryption. By default, zero-ETL integration encrypts your data using AWS Key Management Service (AWS KMS), and I have the option to use my own key.
Then, I need to review all the configurations and select Create zero-ETL integration to create the integration.
After a few minutes, my zero-ETL integration is sucessfully created. Then, I switch to Amazon Redshift, and on the Zero-ETL integrations page, I can see that I have my recently created zero-ETL integration.
Since the integration does not yet have a target database inside Amazon Redshift, I need to create one.
Now the integration configuration is complete. On this page, I can see the integration status is active, and there is one table that has been replicated.
For testing, I create a new table in my Amazon Aurora database and insert a record into this table.
Then I switched to the Redshift query editor v2 inside Amazon Redshift. Here I can make a connection to the database that I formed as part of the integration. By running a simple query, I can see that my data is already available inside Amazon Redshift.
I found this zero-ETL integration very convenient for two reasons. First, I could unify all data from multiple database clusters together and analyze it in aggregate. Second, within seconds of the transactional data being written into Amazon Aurora MySQL, this zero-ETL integration seamlessly made the data available in Amazon Redshift.
Things to know
Availability – Amazon Aurora zero-ETL integration with Amazon Redshift is available in US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Paciﬁc (Singapore), Asia Paciﬁc (Sydney), Asia Paciﬁc (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).
Supported Database Engines – Amazon Aurora zero-ETL Integration with Amazon Redshift currently supports MySQL-compatible editions of Amazon Aurora. Support for Amazon Aurora PostgreSQL-Compatible Edition is a work in progress.
Pricing – Amazon Aurora zero-ETL integration with Amazon Redshift is provided at no additional cost. You pay for existing Amazon Aurora and Amazon Redshift resources used to create and process the change data created as part of a zero-ETL integration.
We’re one step closer to helping you focus more on creating value from data instead of preparing it for analysis. To learn more on how to get started, please visit the Amazon Aurora MySQL zero-ETL integration with Amazon Redshift page.