When we move from on-premises infrastructure to the cloud, the number of storage options can be overwhelming, and selecting the right one for a particular use case can be time-consuming. A clear picture of all the options, with their use cases and alternatives, builds a strong foundation for deciding which storage to choose based on requirements.
In this article, I will attempt to provide a clear picture of the storage options on Google Cloud and their common use cases.
If we divide storage by the type of data we will store in GCP, there are two types:
- Storage Options for Structured Data
- Storage Options for Unstructured Data
For unstructured data, we can use either block storage or object storage.
We choose a suitable storage type based on our use case.
If we choose block storage, the storage type can be Persistent Disk or Local Disk (Local SSD). For object storage, we have Google Cloud Storage as our storage type.
Persistent Disk
- Persistent Disk is attached to a Google Compute Engine instance (VM).
- It has a size limit of 64 TB per disk, and the capacity has to be allocated in advance. It's not a pay-as-you-go model: you pay for the size you provision, not for what you actually use.
- Persistent Disk can be HDD or SSD, depending on cost and performance requirements.
- Persistent Disk can be regional (e.g., us-central1, us-west1) or zonal (e.g., us-central1-a, us-central1-b, us-central1-c).
Google Cloud Storage (GCS)
- GCS isn't tied to a VM and can be used as the storage layer for many use cases.
- It's practically infinitely scalable, so there is no restriction on total size.
- It's a pay-as-you-go model, which means you only pay for what you store.
- A GCS bucket can be regional, dual-region, or multi-region, and its data can be accessed from anywhere.
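To make the difference between the two billing models concrete, here is a minimal sketch. The function names and the price are hypothetical and chosen only for illustration; they are not real GCP prices or APIs. The 64 TB limit comes from the list above, assuming 1 TB = 1024 GB.

```python
# Size limit for a single Persistent Disk, taken from the text (64 TB).
MAX_DISK_GB = 64 * 1024


def provisioned_monthly_cost(provisioned_gb: float, price_per_gb: float) -> float:
    """Persistent Disk style: billed on the capacity allocated in advance."""
    if provisioned_gb > MAX_DISK_GB:
        raise ValueError("a single disk cannot exceed 64 TB")
    return provisioned_gb * price_per_gb


def pay_as_you_go_monthly_cost(stored_gb: float, price_per_gb: float) -> float:
    """GCS style: billed on the data actually stored."""
    return stored_gb * price_per_gb


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    price = 0.05         # assumed price per GB per month
    disk_size_gb = 1000  # capacity allocated up front
    data_gb = 200        # data actually written

    print("provisioned:", provisioned_monthly_cost(disk_size_gb, price))
    print("pay as you go:", pay_as_you_go_monthly_cost(data_gb, price))
```

With the same amount of data, the provisioned model charges for the full 1000 GB while the pay-as-you-go model charges for only the 200 GB stored.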
Okay, but when to use what?
If you are using Compute Engine and each VM needs its own storage, it is better to go with Persistent Disk (SSD or HDD), but in scenarios where you need global access to the data, GCS should be the choice.
In most scenarios, you might want to use a combination of both, based on the data type.
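The rule of thumb above can be written down as a small helper. This is only a sketch of the guidance in this article; the function and parameter names are made up for illustration, and real decisions usually involve more factors.

```python
def choose_unstructured_storage(vm_needs_own_disk: bool, needs_global_access: bool) -> list[str]:
    """Return the storage types suggested by the rule of thumb above."""
    choices = []
    if vm_needs_own_disk:
        # Storage attached to a Compute Engine VM.
        choices.append("Persistent Disk (SSD or HDD)")
    if needs_global_access:
        # Data that must be reachable from anywhere, independent of a VM.
        choices.append("Google Cloud Storage")
    return choices


if __name__ == "__main__":
    # A workload with both needs ends up using a combination of the two.
    print(choose_unstructured_storage(vm_needs_own_disk=True, needs_global_access=True))
```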
Structured data can be stored based on its usage.
If the requirement is to choose storage for Online Transaction Processing (OLTP) systems, we have the following options:
Cloud SQL
- Cloud SQL is managed MySQL, PostgreSQL, or Microsoft SQL Server on GCP.
- It's best suited as a database for online transaction processing systems (for example, financial transaction systems, e-commerce sales, travel reservation systems).
Cloud Spanner
- Cloud Spanner is Google's proprietary database, which is also suitable for online transaction processing.
- It's a globally distributed database system with a very high availability SLA (99.999%, which means about 5 minutes of downtime per year).
- It's horizontally scalable with high read-write performance.
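The "5 minutes per year" figure follows directly from the availability percentage. A quick sketch of the arithmetic (the function name is my own, and leap years are ignored):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Yearly downtime permitted by an availability SLA, in minutes."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # 99.999% availability leaves roughly 5.26 minutes of downtime per year.
    print(round(allowed_downtime_minutes(99.999), 2))
```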
Okay, but when to use what?
Cloud Spanner is built for a very niche use case. If you have a massive amount of data that has to span the globe with high performance, Cloud Spanner is the choice. Otherwise, Cloud SQL should be the choice. Also, Cloud Spanner is costly compared to Cloud SQL, so choose wisely.
If the requirement is to choose storage for Online Analytical Processing (OLAP) systems, we have the following options:
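As a sketch, the guidance above boils down to a single condition. The function and parameter names are hypothetical, and what counts as "massive" is left to the reader, since it depends on the workload.

```python
def choose_oltp_database(massive_data: bool, must_span_globe: bool) -> str:
    """Pick an OLTP database following the rule of thumb in this article."""
    if massive_data and must_span_globe:
        # The niche case: huge, globally distributed data with high performance.
        return "Cloud Spanner"
    # Everything else: the cheaper, managed relational option.
    return "Cloud SQL"


if __name__ == "__main__":
    print(choose_oltp_database(massive_data=True, must_span_globe=True))
    print(choose_oltp_database(massive_data=False, must_span_globe=False))
```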
BigQuery
- BigQuery is a data warehouse solution on GCP.
- We can store petabytes of data and query and analyze it using SQL within minutes.
- With on-demand pricing, we pay for the amount of data processed by each query.
- Many business intelligence tools have connectors for BigQuery, so we can connect without much trouble.
BigTable
- BigTable is another offering as a database for analytical use cases.
- BigTable is a NoSQL database built upon Google's proprietary distributed file system, Colossus.
- BigTable can be compared with open-source HBase, a distributed database built on top of the Hadoop Distributed File System (HDFS).
- Note that neither BigTable nor HBase provides a SQL interface to query the database, since they are not SQL databases.
- BigTable is a horizontally scalable NoSQL database used for low-latency use cases. Google Maps, Gmail, and YouTube use BigTable internally.
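Coming back to BigQuery's pay-per-data-processed billing mentioned above, a rough cost estimate is just the bytes scanned times a unit price. The sketch below assumes the price is quoted per TiB and takes it as a parameter; the value used in the example is a placeholder, not an actual BigQuery price.

```python
BYTES_PER_TIB = 1024 ** 4


def estimate_query_cost(bytes_processed: int, price_per_tib: float) -> float:
    """Estimate the cost of one query from the amount of data it processes."""
    return bytes_processed / BYTES_PER_TIB * price_per_tib


if __name__ == "__main__":
    # Hypothetical: a query scanning 250 GiB at an assumed price of 5.0 per TiB.
    scanned = 250 * 1024 ** 3
    print(estimate_query_cost(scanned, price_per_tib=5.0))
```

Because the bill depends on the data scanned rather than the data stored, selecting only the columns you need is the usual way to keep this number small.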
Okay, when to use what?
BigQuery is more suitable when we need a SQL interface to run analytical queries on the underlying data. A typical use case is business intelligence, where data from various systems is stored in a BigQuery data warehouse and connected to business intelligence tools like Tableau or Looker to analyze it and build dashboards. On the other hand, if we need a NoSQL database with high scalability and throughput for key-value data, BigTable is the choice. BigTable is a low-latency database and is suitable when the application has strict latency requirements.
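The same comparison as a small sketch. The names are hypothetical, and the helper only encodes the two cases described above; anything else is left as an open question rather than guessed.

```python
def choose_analytics_store(needs_sql_analytics: bool, needs_low_latency_key_value: bool) -> str:
    """Pick an analytical store following the comparison in this article."""
    if needs_sql_analytics:
        # SQL queries, dashboards, BI tools such as Tableau or Looker.
        return "BigQuery"
    if needs_low_latency_key_value:
        # High-throughput, low-latency access to key-value data.
        return "BigTable"
    raise ValueError("no requirement given; gather more details first")


if __name__ == "__main__":
    print(choose_analytics_store(needs_sql_analytics=True, needs_low_latency_key_value=False))
    print(choose_analytics_store(needs_sql_analytics=False, needs_low_latency_key_value=True))
```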
I hope this blog was helpful. I appreciate your time. Thank you for reading.