AWS vs Azure vs Google Cloud Platform - Database
Choosing the right cloud platform provider can be a daunting task. Take the big three, AWS, Azure, and Google Cloud Platform: each offers a huge number of products and services, but understanding how they enable your specific needs is not easy. Since most organisations plan to migrate existing applications, it is important to understand how these systems will operate in the cloud. Through our work helping customers move to the cloud, we have compared all three providers' offerings in relation to three typical migration scenarios:
- Lift and shift - the cloud service can support running legacy systems with minimal change
- Consume PaaS services - the cloud offering is a managed service that can be consumed by existing solutions with minimal architectural change
- Re-architect for cloud - the cloud technology is typically used in solution architectures that have been optimised for cloud
Choosing the right strategy will depend on the nature of the applications being migrated, the business landscape and internal constraints.
In this series, we're comparing cloud services from AWS, Azure and Google Cloud Platform. A full breakdown and comparison of cloud providers and their services is available in this handy poster.
We have grouped all services into 10 categories:
- Compute
- Storage & Content Delivery
- Database
- Analytics & Big Data
- Internet of Things
- Mobile Services
- Networking
- Security & Identity
- Management & Monitoring
- Hybrid
In this post we are looking at...
Database
For many, migrating applications to the cloud will include a database of one form or another. As we have seen in our compute comparison, all three cloud providers can run database software on virtual server infrastructure, so this is always an option. It's worth checking your software terms and conditions if you wish to bring your own existing licences, though. Both Azure Marketplace and AWS Marketplace have a good selection of pre-configured database images from a variety of vendors, often coming with a pay-as-you-go licence. Google's marketplace, Cloud Launcher, has in comparison a somewhat limited range of database images to choose from.
Things start to get a bit more interesting when you compare platform-as-a-service database offerings, and that is going to be the main focus for this post.
AWS
Amazon's Relational Database Service (RDS) offers a range of managed databases including SQL Server, MySQL, PostgreSQL, Oracle and MariaDB. RDS takes care of provisioning, patching and day-to-day maintenance, so it can be an attractive step up from running your own instance in EC2. It's worth noting that many of these databases were originally designed to run on-premise, so they lack many of the features of native cloud solutions. In addition, RDS also offers Aurora, a high performance, MySQL-compatible database designed for the cloud. RDS comes in a range of instance types offering up to 40 vCPUs and 244 GB of memory. It relies on Elastic Block Store for data and log storage and offers a provisioned IOPS option. All database types support multi-zone replication, with MySQL, MariaDB, PostgreSQL and Aurora supporting cross-region read replicas.
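As a rough illustration, provisioning a Multi-AZ RDS instance with boto3 (the AWS SDK for Python) might look like the sketch below; the instance identifier, credentials and sizes are placeholders, not recommendations.

```python
# Minimal sketch: a Multi-AZ MySQL instance on RDS with provisioned IOPS storage.
import boto3

rds = boto3.client("rds", region_name="eu-west-1")

rds.create_db_instance(
    DBInstanceIdentifier="example-mysql",      # hypothetical name
    Engine="mysql",
    DBInstanceClass="db.m4.large",
    AllocatedStorage=100,                      # GB, backed by Elastic Block Store
    StorageType="io1",                         # provisioned IOPS storage
    Iops=1000,
    MultiAZ=True,                              # synchronous standby in a second zone
    MasterUsername="admin",
    MasterUserPassword="change-me",            # placeholder only
)

# Read replicas (MySQL, MariaDB, PostgreSQL, Aurora) are created from a source instance.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="example-mysql-replica",
    SourceDBInstanceIdentifier="example-mysql",
)
```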
Database Migration Service allows you to move on-premise relational data to the cloud. This works by setting up a replica of an on-premise database in AWS, thus reducing the amount of downtime required to migrate applications. Database Migration Service can also replicate to different target database platforms.
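To give a feel for the API, a replication task can be created with boto3 along these lines; the endpoint and replication instance ARNs are placeholders and would need to be set up first.

```python
# Hedged sketch of a DMS task that copies everything then keeps replicating changes.
import boto3

dms = boto3.client("dms", region_name="eu-west-1")

dms.create_replication_task(
    ReplicationTaskIdentifier="onprem-to-rds",
    SourceEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:SOURCE",    # placeholder
    TargetEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:TARGET",    # placeholder
    ReplicationInstanceArn="arn:aws:dms:eu-west-1:123456789012:rep:INSTANCE",  # placeholder
    MigrationType="full-load-and-cdc",   # initial copy plus ongoing change replication
    TableMappings='{"rules": [{"rule-type": "selection", "rule-id": "1", '
                  '"rule-name": "all-tables", "object-locator": '
                  '{"schema-name": "%", "table-name": "%"}, "rule-action": "include"}]}',
)
```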
DynamoDB is a fully managed NoSQL database offering high scale, low cost document and key-value storage. DynamoDB supports primary indexes and multiple secondary indexes on documents up to 400 KB in size. The service automatically replicates data across three facilities in a single AWS Region. DynamoDB reads are eventually consistent by default, with an option for strongly consistent reads where required. Updates are not atomic, but there is a client library that provides support for ACID transactions.
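The sketch below (using boto3, with illustrative table and attribute names) shows a table with a primary index, one global secondary index, and both read consistency modes.

```python
import boto3

dynamodb = boto3.resource("dynamodb", region_name="eu-west-1")

table = dynamodb.create_table(
    TableName="Orders",
    AttributeDefinitions=[
        {"AttributeName": "OrderId", "AttributeType": "S"},
        {"AttributeName": "CustomerId", "AttributeType": "S"},
    ],
    KeySchema=[{"AttributeName": "OrderId", "KeyType": "HASH"}],   # primary index
    GlobalSecondaryIndexes=[{
        "IndexName": "ByCustomer",                                 # secondary index
        "KeySchema": [{"AttributeName": "CustomerId", "KeyType": "HASH"}],
        "Projection": {"ProjectionType": "ALL"},
        "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    }],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
table.wait_until_exists()

table.put_item(Item={"OrderId": "o-1", "CustomerId": "c-42", "Total": 100})

table.get_item(Key={"OrderId": "o-1"})                        # eventually consistent (default)
table.get_item(Key={"OrderId": "o-1"}, ConsistentRead=True)   # strongly consistent
```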
DynamoDB Streams allows developers to work with a time-ordered sequence of events raised as changes are made to the database. This can be used to enable cross region replication (via a client library), notifications and usage analysis.
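Consuming a stream with boto3 might look something like this; the stream ARN is a placeholder that would normally come from the table description.

```python
import boto3

streams = boto3.client("dynamodbstreams", region_name="eu-west-1")
stream_arn = "arn:aws:dynamodb:eu-west-1:123456789012:table/Orders/stream/LABEL"  # placeholder

shards = streams.describe_stream(StreamArn=stream_arn)["StreamDescription"]["Shards"]
for shard in shards:
    iterator = streams.get_shard_iterator(
        StreamArn=stream_arn,
        ShardId=shard["ShardId"],
        ShardIteratorType="TRIM_HORIZON",        # start from the oldest available record
    )["ShardIterator"]
    for record in streams.get_records(ShardIterator=iterator)["Records"]:
        print(record["eventName"], record["dynamodb"].get("Keys"))
```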
An alternative low cost NoSQL option is SimpleDB. SimpleDB offers document storage with automatic indexing but comes with a storage limitation of 10 GB per domain (a collection of items) and a limited request rate of under 25 writes/second. Increased scale and performance can be achieved by partitioning data over multiple domains. Data stored in a domain will reside in a single region, with replicas being maintained across multiple locations for high availability.
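A simple way to partition over domains is to hash the item name, as in this hedged sketch (domain names and items are illustrative):

```python
import zlib
import boto3

sdb = boto3.client("sdb", region_name="eu-west-1")
domains = ["catalogue-0", "catalogue-1", "catalogue-2"]

for name in domains:
    sdb.create_domain(DomainName=name)

def put_item(item_name, attributes):
    # Spread items across domains to stay within per-domain storage and write limits.
    domain = domains[zlib.crc32(item_name.encode()) % len(domains)]
    sdb.put_attributes(
        DomainName=domain,
        ItemName=item_name,
        Attributes=[{"Name": k, "Value": str(v), "Replace": True}
                    for k, v in attributes.items()],
    )

put_item("sku-123", {"title": "Widget", "price": "9.99"})
```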
Redshift is Amazon's fully managed petabyte scale data warehouse. To achieve this scale Redshift is based on a massively parallel processing architecture (MPP) that distributes query execution across multiple nodes. Standard SQL-based BI tools are supported through ODBC and JDBC connections. It is possible to create a Redshift cluster with up to 128 nodes with pricing based on total node compute hours. Replicas are automatically maintained across the nodes in a cluster with snapshot backups stored in S3.
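Since Redshift speaks the PostgreSQL wire protocol, a standard Python driver such as psycopg2 can query a cluster directly; the endpoint, database and credentials below are placeholders.

```python
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.eu-west-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439,
    dbname="analytics",
    user="awsuser",
    password="change-me",
)
with conn, conn.cursor() as cur:
    # Query execution is distributed across the nodes of the cluster by Redshift.
    cur.execute("SELECT region, SUM(total) FROM sales GROUP BY region;")
    for row in cur.fetchall():
        print(row)
```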
Where fast in-memory caching is required, AWS provides ElastiCache. ElastiCache can run either a Memcached or Redis backing store.
Azure
SQL Database is a fully managed relational database based on SQL Server. Most migrations from on-premise SQL Server to SQL Database will require few, if any, changes. Azure SQL Database has matured into a fully featured cloud database offering active geo-replication (creating up to 4 online secondaries which can be used for queries and failover) and fully automatic backups with point-in-time restore. Microsoft have recently added Elastic database pools, which allow organisations to optimise cloud costs by running multiple databases against the same set of resources thus maximising utilisation and reducing costs. Elastic database pools can work well for database-per-tenant scenarios. Microsoft have also released Stretch Database which allows on-premise SQL Server instances to transparently store portions of data in Azure SQL Database. For full replication of on-premise data to the cloud Azure SQL Data Sync can be used. Like AWS Database Migration Service this can be useful to reduce downtime during migrations.
Microsoft defines database performance characteristics in terms of Database Transaction Units (DTUs): the more DTUs, the better the performance. SQL Database is offered in three tiers; the most powerful, P15, offers 1 TB of storage and 4,000 DTUs. This measure can be a little confusing at first and requires some benchmarking before settling on the most suitable tier. For now, just think of a P15 as a really big box.
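Because an Azure SQL Database looks like SQL Server to clients, existing tooling and drivers work unchanged. As a minimal sketch (server, database and credentials are placeholders, and the ODBC driver name depends on what is installed locally):

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=example-server.database.windows.net;"   # placeholder
    "DATABASE=exampledb;UID=appuser;PWD=change-me;Encrypt=yes;"
)
cursor = conn.cursor()

# sys.dm_db_resource_stats reports recent resource consumption, which helps when
# benchmarking a workload before settling on a service tier.
cursor.execute(
    "SELECT TOP 5 end_time, avg_cpu_percent, avg_data_io_percent "
    "FROM sys.dm_db_resource_stats ORDER BY end_time DESC;"
)
for row in cursor.fetchall():
    print(row)
```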
Azure has a number of NoSQL options. Cosmos DB is Azure's flagship high performance, highly available database. Cosmos DB supports multiple database APIs, including document (SQL), graph (Gremlin), MongoDB, Cassandra, etcd and key-value (table) stores. The service supports automatic index creation (no need to define secondary indexes) and comes with a comprehensive set of consistency levels.
Cosmos DB will automatically replicate data to any supported region and it is possible to define connection policies that ensure clients connect to the nearest available instance. The Cosmos DB document store supports server-side stored procedures, triggers and user-defined functions written in JavaScript, the only document database of the three providers to support this. Cross-document atomic updates can be implemented via stored procedures and triggers. Cosmos DB is designed to scale linearly to meet the needs of the application. Data is partitioned into collections, with each collection being assigned a number of Request Units (RUs). RUs represent a measure of the resources required to perform a database operation; assigning more RUs to a collection will provide more throughput.
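As a sketch of the document (SQL) API using the azure-cosmos Python SDK (account endpoint, key and names are placeholders):

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(
    "https://example-account.documents.azure.com:443/",   # placeholder endpoint
    credential="PRIMARY-KEY",                             # placeholder key
)

database = client.create_database_if_not_exists("appdb")
orders = database.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/customerId"),
    offer_throughput=400,          # Request Units assigned to this collection
)

orders.upsert_item({"id": "o-1", "customerId": "c-42", "total": 100})

# Queries are served by automatically created indexes; none need defining up front.
for item in orders.query_items(
        query="SELECT * FROM c WHERE c.customerId = @id",
        parameters=[{"name": "@id", "value": "c-42"}],
        enable_cross_partition_query=True):
    print(item)
```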
Table Storage is a simple low cost key/value store. It is part of Azure Storage and offers the same high availability options as blob storage from local redundancy up to read-access geo-redundancy. Data is organised into tables, partitions and rows with data being indexed by a partition key and row key combination. Atomic updates are supported for entities in the same partition.
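A minimal sketch using the azure-data-tables SDK, with a placeholder connection string, shows the partition key/row key addressing and a batched update within a single partition:

```python
from azure.data.tables import TableServiceClient

service = TableServiceClient.from_connection_string("DefaultEndpointsProtocol=...")  # placeholder
table = service.create_table_if_not_exists("orders")

table.create_entity({"PartitionKey": "c-42", "RowKey": "o-1", "Total": 100})

# Batched operations are atomic only when every entity shares the same partition key.
table.submit_transaction([
    ("create", {"PartitionKey": "c-42", "RowKey": "o-2", "Total": 25}),
    ("create", {"PartitionKey": "c-42", "RowKey": "o-3", "Total": 75}),
])

entity = table.get_entity(partition_key="c-42", row_key="o-1")
print(entity["Total"])
```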
Azure Synapse Analytics is a complete data analytics platform service. It is covered here because it includes Azure Synapse SQL Pools, formerly SQL Data Warehouse. Synapse SQL Pools is Microsoft's warehouse-as-a-service, built on a massively parallel processing (MPP) architecture. Developers can use the SQL Server tools, including T-SQL, that they are familiar with. Like its relational cousin, Synapse SQL Pools is measured using its own performance unit, the Data Warehouse Unit (DWU). Using PolyBase, Synapse SQL Pools also allows users to query, combine and analyse data across a range of stores such as Azure Data Lake Store, Blob Storage and HDFS (HDInsight).
For high performance caching, Azure offers Azure Redis Cache.
Google Cloud Platform
Cloud SQL is a managed MySQL database. The second generation version of Cloud SQL (currently in beta) supports MySQL 5.6 and 5.7 and promises significant performance improvements over the first generation. It also adds point-in-time restore and instant backups. Available instance sizes range from 10 GB to an impressive 10 TB of storage, with up to 16 vCPUs and 104 GB of RAM. Automatic zone redundancy comes built in. Organisations migrating from a standard MySQL database to Cloud SQL should be aware of the differences between the two platforms.
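Because Cloud SQL is standard MySQL under the hood, any MySQL client works; here is a minimal sketch using PyMySQL with a placeholder instance address and credentials (the Cloud SQL Proxy is a common alternative connection path).

```python
import pymysql

conn = pymysql.connect(
    host="203.0.113.10",       # placeholder instance IP
    user="appuser",
    password="change-me",
    database="appdb",
)
with conn.cursor() as cur:
    cur.execute("SELECT VERSION();")   # e.g. 5.6 or 5.7 on a second generation instance
    print(cur.fetchone())
conn.close()
```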
Cloud Datastore is Google's general purpose NoSQL document database. Cloud Datastore automatically indexes each property for each type of entity and allows composite indexes to be defined via a YAML configuration file. Cloud Datastore provides strong consistency for entities that have been looked up by key or by ancestor; all other queries are eventually consistent. ACID transactions are supported and data is automatically replicated across multiple data centres within a single region.
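The sketch below, using the google-cloud-datastore client with illustrative kinds and keys, shows an ancestor key giving a strongly consistent query alongside an eventually consistent one.

```python
from google.cloud import datastore

client = datastore.Client(project="example-project")    # placeholder project

parent = client.key("Customer", "c-42")
order = datastore.Entity(key=client.key("Order", "o-1", parent=parent))
order.update({"total": 100, "status": "new"})

# ACID transaction wrapping the write.
with client.transaction():
    client.put(order)

strong = list(client.query(kind="Order", ancestor=parent).fetch())   # strongly consistent
eventual = list(client.query(kind="Order").fetch())                  # eventually consistent
```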
If you are processing very large scale workloads then Bigtable may be a better NoSQL store. Bigtable is a battle-hardened, cloud-scale database which has been used to drive many of Google's own products. With an Apache HBase API, Bigtable integrates naturally with Hadoop and other big data products. It comes with options for SSD and HDD backed storage, with SSD providing 10,000 queries per second on a single node cluster and a linear improvement in performance as nodes are added. There are, by design, a few limitations to be aware of: Bigtable supports only one index per table, updates are only atomic at row level and there is no built-in replication across zones or regions.
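A short sketch with the google-cloud-bigtable client (project, instance and table ids are placeholders) illustrates the single row-key index and row-level atomicity:

```python
from google.cloud import bigtable
from google.cloud.bigtable import column_family

client = bigtable.Client(project="example-project", admin=True)   # placeholder
instance = client.instance("example-instance")                    # placeholder

table = instance.table("events")
table.create(column_families={"metrics": column_family.MaxVersionsGCRule(1)})

# All cells written through a single row mutation are applied atomically.
row = table.direct_row(b"device#1234#2016-06-01T12:00")
row.set_cell("metrics", b"temperature", b"21.5")
row.set_cell("metrics", b"humidity", b"40")
row.commit()

print(table.read_row(b"device#1234#2016-06-01T12:00"))
```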
BigQuery is Google's fully managed, petabyte scale data warehouse. Data is loaded either through a job or via a streaming API. Streaming is used for near real-time analytics or to stream data to other regions. Customers pay for the storage they use, the number of queries and the number of streaming inserts. Data held in BigQuery is automatically replicated across data centres in a single region.
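As a sketch with the google-cloud-bigquery client (project, dataset and table names are placeholders), a streaming insert can be followed almost immediately by a SQL query over the same table:

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")     # placeholder
table_id = "example-project.analytics.events"           # placeholder

# Streaming API: rows become queryable in near real time.
errors = client.insert_rows_json(table_id, [{"user": "c-42", "action": "click"}])
assert not errors, errors

# Standard SQL query; queries are billed per execution.
job = client.query(
    "SELECT action, COUNT(*) AS n "
    "FROM `example-project.analytics.events` GROUP BY action"
)
for row in job.result():
    print(row["action"], row["n"])
```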
Caching is provided by Memcache, part of the App Engine product.
Conclusion
There are certainly plenty of choices when it comes to databases in the cloud. All three providers boast impressive relational, NoSQL and petabyte scale data warehouse offerings. RDS supports an impressive range of managed relational stores, while Azure SQL Database is probably the most advanced managed relational database available today. With more workloads moving to the cloud, the need for NoSQL databases will become ever more important, and again all providers have a good range of options to satisfy most performance/cost requirements. Of all the NoSQL products on offer it's hard not to be impressed by Cosmos DB; once again Microsoft is showing the rest how PaaS should be done. Azure also has the best out-of-the-box support for cross-region geo-replication across its database offerings.
Finally, if choosing a provider isn't hard enough, your choices don't end there. Many solution architectures are adopting polyglot persistence in order to optimise for features, scale, and cost. This approach is often employed during cloud migrations.
Next, we will be looking at Analytics & Big Data.