Heuristics for choosing Google Cloud persistence layer services
Are you confused by the multitude of Google Cloud persistence layer options available to you when designing or migrating your applications to Google Cloud? If so, a few heuristics (rules of thumb) can help you quickly narrow down your options and choose the one that reduces cost and complexity.
There are several options available to you for storing your data in Google Cloud.
- Google Cloud Storage
- Google Cloud SQL
- Google Cloud Spanner
- Google Bigtable
- Google BigQuery
- Google Firestore
- Google Memorystore
You could use a flowchart like this one to decide which storage option to choose. But understanding WHY (teach a man to fish) will give you more tools to decide for yourself when the decision is not as black and white.
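As a rough text stand-in for such a flowchart, the rules of thumb in this post can be condensed into a first-pass Python function. This is a sketch under stated assumptions: the `Workload` flags, the `suggest_storage` name, and the order in which the branches are checked are hypothetical choices made for illustration, not an official decision procedure.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical yes/no answers describing an application's data needs."""
    needs_cache: bool = False          # hot, short-lived data kept in memory
    olap_analytics: bool = False       # data warehouse / OLAP queries
    relational_oltp: bool = False      # transactional relational data (ACID)
    needs_global_scale: bool = False   # more than one Cloud SQL instance can offer
    sparse_wide_rows: bool = False     # e.g. a sparse adjacency matrix
    document_tree: bool = False        # nested, schema-flexible documents
    data_as_files: bool = False        # mp3, jpeg, csv, micro-batches


def suggest_storage(w: Workload) -> str:
    """Return a first-pass suggestion; the branch order is an assumption."""
    if w.needs_cache:
        return "Memorystore"
    if w.olap_analytics:
        return "BigQuery"
    if w.relational_oltp:
        # Spanner when Cloud SQL's constraints are not acceptable.
        return "Cloud Spanner" if w.needs_global_scale else "Cloud SQL"
    if w.sparse_wide_rows:
        return "Bigtable"
    if w.document_tree:
        return "Firestore"
    if w.data_as_files:
        return "Cloud Storage"
    return "No clear match: revisit the requirements"
```

For example, `suggest_storage(Workload(relational_oltp=True))` returns `"Cloud SQL"`. Real workloads often set several flags at once, which is exactly where the reasoning in the sections below matters more than the branch order.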
Google Cloud Storage is an object storage service that is
- Global
- Scalable
- Highly durable (eleven 9s, i.e. 99.999999999% annual durability)
Heuristic: why would you choose Google Cloud Storage? Think of files. This is easy to picture if your data is already mp3, jpeg, csv, and so on. Taking it a little further, if you want to process micro-batches, it can be useful to think of each micro-batch as a file, which makes Cloud Storage a natural fit. So if you can configure or transform your data as files, this could be a good option, as it is generally the least expensive and highest-throughput storage option available in Google Cloud.
One caveat: Google Cloud Storage is different from block storage. Block storage (such as a Persistent Disk) can be mounted on a virtual machine, and its files can be accessed through the file system (open a file, read a file, etc.). Google Cloud Storage, on the other hand, is accessed through its APIs and client libraries.
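To make the "think of files" heuristic concrete, here is a minimal sketch that turns one micro-batch of records into a CSV object and uploads it with the `google-cloud-storage` Python client library instead of writing to a mounted file system. The bucket name, the `events` prefix, and the date-based naming scheme are assumptions for illustration. The upload function needs that package and application default credentials, so the client is imported only when it is called; the naming and serialization helpers run anywhere.

```python
import csv
import io
from datetime import datetime, timezone


def micro_batch_object_name(prefix: str, batch_start: datetime) -> str:
    """Hypothetical naming scheme: <prefix>/YYYY/MM/DD/HHMMSS.csv in UTC."""
    ts = batch_start.astimezone(timezone.utc)
    return f"{prefix}/{ts:%Y/%m/%d/%H%M%S}.csv"


def rows_to_csv_bytes(rows: list) -> bytes:
    """Serialize a list of dicts (one micro-batch) to CSV bytes with a header row."""
    if not rows:
        return b""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def upload_micro_batch(bucket_name: str, prefix: str, batch_start: datetime, rows: list) -> str:
    """Upload one micro-batch as a single object and return its gs:// URI."""
    from google.cloud import storage  # needs google-cloud-storage and credentials

    name = micro_batch_object_name(prefix, batch_start)
    blob = storage.Client().bucket(bucket_name).blob(name)
    blob.upload_from_string(rows_to_csv_bytes(rows), content_type="text/csv")
    return f"gs://{bucket_name}/{blob.name}"
```

Because each micro-batch becomes one object, a downstream job can pick up a day's batches by listing objects under a prefix such as `events/2024/01/02/`.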
Google Cloud SQL is a fully managed relational database service available for
- MySQL
- PostgreSQL
- SQL Server
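Heuristic: this one is very simple. If you want OLTP and ACID transaction support, this is the option. The constraints to remember are that a Cloud SQL instance scales vertically (by moving to a larger machine) rather than horizontally, and that its primary instance runs in a single region rather than being globally distributed.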
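What "ACID transaction support" buys you is easiest to see with a transfer between two accounts: both updates must succeed, or neither may be applied. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Cloud SQL's MySQL, PostgreSQL, or SQL Server engines so that it runs anywhere. Against Cloud SQL you would use that engine's driver (whose placeholder style may differ), but the commit-or-rollback pattern is the same. The `accounts` table and the `make_demo_db`, `transfer`, and `balance` functions are hypothetical.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a hypothetical accounts table in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    with conn:
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from src to dst: both updates commit together or not at all."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        debit = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (amount, src, amount),
        )
        if debit.rowcount != 1:
            raise ValueError("unknown source account or insufficient funds")
        credit = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        if credit.rowcount != 1:
            raise ValueError("unknown destination account")  # rollback undoes the debit


def balance(conn: sqlite3.Connection, account: str) -> int:
    return conn.execute("SELECT balance FROM accounts WHERE id = ?", (account,)).fetchone()[0]
```

After `transfer(conn, "alice", "bob", 30)`, a follow-up `transfer(conn, "alice", "carol", 10)` fails on the credit step, and the rollback leaves Alice's balance at 70 instead of silently losing 10.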
Google Cloud Spanner is a relational database that is
- Horizontally scalable
- Globally distributed
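Spanner's documentation describes dividing table data into splits, each holding a contiguous range of primary keys, which can be assigned to different servers. To make "horizontally scalable" concrete, the toy class below models only that idea in a single process: rows go to splits by key range, and scaling out means cutting a range in two. It is a simplified illustration, not Spanner's implementation or API, and the `RangePartitionedTable` name is hypothetical.

```python
import bisect


class RangePartitionedTable:
    """Toy model of range partitioning (not Spanner's implementation or API).

    split_points ["m"] means split 0 holds keys < "m" and split 1 holds keys >= "m".
    """

    def __init__(self, split_points: list) -> None:
        self.split_points = sorted(split_points)
        self.splits = [{} for _ in range(len(self.split_points) + 1)]

    def split_for(self, key: str) -> int:
        return bisect.bisect_right(self.split_points, key)

    def insert(self, key: str, row: dict) -> None:
        self.splits[self.split_for(key)][key] = row

    def add_split_point(self, point: str) -> None:
        """Scale out: cut the split containing `point` in two at that key."""
        if point in self.split_points:
            return
        idx = self.split_for(point)
        old = self.splits[idx]
        left = {k: v for k, v in old.items() if k < point}
        right = {k: v for k, v in old.items() if k >= point}
        bisect.insort(self.split_points, point)
        self.splits[idx:idx + 1] = [left, right]

    def split_sizes(self) -> list:
        return [len(s) for s in self.splits]
```

In this model each split could be served by a different node, so cutting ranges and adding nodes lets capacity grow with the data rather than with the size of a single machine.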
This is the option to choose if the constraints of Cloud SQL are not acceptable. The constraints on using Cloud Spanner are
Google Bigtable is a NoSQL database that is
- Fully managed
- Horizontally scalable
Why would you use it? Think of hash tables. As described in the original Bigtable paper, it powers web indexing and numerous other critical services Google provides. Its data model is a
- Sparse
- Distributed
- Persistent
- Multi-dimensional
- Sorted map
Does that sound like a sparse adjacency matrix? If you can model your data as an adjacency matrix (or it already is one) and you want efficient storage and retrieval, this is the tool for you.
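The Bigtable paper describes the map as indexed by a row key, a column key, and a timestamp, with each value an uninterpreted array of bytes. The toy class below models that in memory so the adjacency-matrix analogy is visible: the row key is a source node, the column `edges:<target>` is an edge, and only edges that exist are stored. It deliberately ignores the "distributed" and "persistent" parts, is not the Bigtable client API, and the `ToyBigtable` name and `node#`/`edges:` naming are hypothetical.

```python
class ToyBigtable:
    """In-memory model of Bigtable's data model, not its client API.

    Each cell is addressed by (row key, "family:qualifier" column, timestamp)
    and holds bytes. Only written cells are stored (sparse), and scans return
    rows in row-key order (sorted).
    """

    def __init__(self) -> None:
        self._rows = {}  # row key -> column -> timestamp -> value

    def put(self, row: str, column: str, value: bytes, ts: int) -> None:
        self._rows.setdefault(row, {}).setdefault(column, {})[ts] = value

    def get_latest(self, row: str, column: str):
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None  # absent cells simply do not exist
        return versions[max(versions)]  # newest timestamp wins

    def scan(self, start_row: str, end_row: str):
        """Yield (row, {column: latest value}) for start_row <= row < end_row."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                cells = self._rows[row]
                yield row, {col: vs[max(vs)] for col, vs in cells.items()}
```

Storing the graph a→b, a→c, c→a writes only three cells, no matter how many nodes the matrix has, and a row-range scan such as `scan("node#a", "node#b")` reads one node's whole adjacency row.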
Google BigQuery is a data warehouse that is
- Serverless
- Horizontally scalable
- Cost effective
This is a simple choice as well. If you have data warehousing and/or online analytical processing (OLAP) needs, this is the default choice.
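The shape of an OLAP workload differs from OLTP: instead of touching a few rows inside a transaction, a query scans many rows and aggregates a few columns. The sketch below builds such a query and runs it with the `google-cloud-bigquery` client library. The table `my-project.sales.orders` and its `order_ts` (TIMESTAMP) and `amount` columns are hypothetical, and the run function needs that package and credentials, so the client is imported only when it is called.

```python
def daily_revenue_sql(table: str) -> str:
    """A typical OLAP query over a hypothetical orders table: scan, group, aggregate."""
    return (
        "SELECT DATE(order_ts) AS day, SUM(amount) AS revenue\n"
        f"FROM `{table}`\n"
        "GROUP BY day\n"
        "ORDER BY day"
    )


def run_daily_revenue(table: str) -> list:
    """Run the query with the BigQuery client and return (day, revenue) pairs."""
    from google.cloud import bigquery  # needs google-cloud-bigquery and credentials

    client = bigquery.Client()
    rows = client.query(daily_revenue_sql(table)).result()  # waits for the query job
    return [(row["day"], row["revenue"]) for row in rows]
```

There is no server or cluster to size in this code, which is what "serverless" means in practice for the caller.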
Google Firestore is a NoSQL document database that is
- Fully managed
- Serverless
- Cloud native
Why would you use it? Think of configuration files or lookup tables. If you need
- A tree structure (not relational)
- Flexible schema changes
- ACID transactions
then Firestore is the better choice.
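The three needs above map onto Firestore's model: documents live in collections and can hold subcollections (a tree), documents carry no fixed schema, and reads and writes can run in a transaction. The sketch below stores a per-environment configuration document and bumps a `version` field transactionally with the `google-cloud-firestore` client library. The `configs/{app}/environments/{env}` layout, the `version` field, and the function names are hypothetical; the write function needs that package and credentials, so the client is imported only when it is called.

```python
def config_doc_path(app: str, env: str) -> str:
    """Hypothetical layout alternating collection/document: configs/{app}/environments/{env}."""
    return f"configs/{app}/environments/{env}"


def save_config_and_bump_version(app: str, env: str, settings: dict) -> int:
    """Merge a nested settings dict, then increment `version` inside a transaction."""
    from google.cloud import firestore  # needs google-cloud-firestore and credentials

    db = firestore.Client()
    ref = db.document(config_doc_path(app, env))
    ref.set(settings, merge=True)  # no fixed schema: new fields are simply added

    @firestore.transactional
    def bump(transaction, doc_ref):
        snapshot = doc_ref.get(transaction=transaction)  # reads come before writes
        new_version = (snapshot.to_dict() or {}).get("version", 0) + 1
        transaction.update(doc_ref, {"version": new_version})
        return new_version

    return bump(db.transaction(), ref)
```

Adding a new nested setting later, such as `{"features": {"dark_mode": True}}`, needs no migration: the next `set(..., merge=True)` simply writes it.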
Google Memorystore is an in-memory data store service that is
- Scalable
- Secure
- Highly available
Why would you use it? If you need a cache, this is the option for you.
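A common way to use a cache is the cache-aside pattern: read from the cache first, and on a miss load from the source of truth and store the result with an expiry. Memorystore for Redis speaks the Redis protocol, so a standard Redis client such as redis-py's `redis.Redis(host=..., port=6379)` could be passed in, with the host being your instance's IP address. The sketch below accepts any client with compatible `get`/`set(..., ex=...)` methods so it can be exercised without a server; `get_with_cache`, `CacheClient`, and the key naming are hypothetical.

```python
import json
from typing import Any, Callable, Optional, Protocol


class CacheClient(Protocol):
    """The subset of a Redis client (e.g. redis.Redis) this sketch relies on."""

    def get(self, name: str) -> Optional[bytes]: ...

    def set(self, name: str, value: bytes, ex: Optional[int] = None) -> Any: ...


def get_with_cache(
    cache: CacheClient,
    key: str,
    load: Callable[[str], Any],
    ttl_seconds: int = 300,
) -> Any:
    """Cache-aside read: serve from the cache, else load, cache with a TTL, and return."""
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    value = load(key)  # e.g. a read from Cloud SQL or Firestore
    cache.set(key, json.dumps(value).encode("utf-8"), ex=ttl_seconds)  # expires after the TTL
    return value
```

Only the first read of a key reaches the backing database; later reads within the TTL are served from memory.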
The above are just rules of thumb about the storage options and what each one is intended for. Real-life production applications are rarely that black and white and often fall between these choices.
The intention here is to help you see for yourself what your application's needs look and feel like. Hopefully this helps you choose the most cost-effective and suitable persistence layer for your next application. Cheers!