Google Cloud Platform Tutorial
If you do not understand some parts of it, you can go back to the
relevant sections. And if that is not enough, visit the links to the
documentation that I have provided.
At the end of the day, you are not paid just for what you know but
for your thought process and the decisions you make. That is why it
is vitally important that you exercise this skill.
At the end of the article, I'll provide more resources and next steps
if you want to continue learning about GCP.
Preemptible VMs
You can use preemptible virtual machines to save up to 80% of
your costs. They are ideal for fault-tolerant, non-critical
applications. You can save the progress of your job in a persistent
disk using a shut-down script to continue where you left off.
Google may stop your instances at any time (with a 30-second
warning) and will always stop them after 24 hours.
There are two ways to provide your shutdown script:
Paste your code into the field provided when you are creating your
instance in the Google Console.
Use the metadata server URL to point your instance to a script
stored in Google Cloud Storage.
The latter is preferred because it makes it easier to create many
instances and to manage the script.
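For example, a minimal sketch of creating a preemptible instance
that points to a shutdown script stored in Cloud Storage (the
instance, zone, and bucket names are hypothetical):
# Create a preemptible instance whose shutdown script lives in GCS
gcloud compute instances create my-preemptible-vm \
    --zone=us-central1-a \
    --preemptible \
    --metadata=shutdown-script-url=gs://my-bucket/shutdown.sh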
To estimate your costs, use the Price Calculator. This helps prevent
surprises with your bills. You can also create budget alerts so that
you are notified when your spending reaches a threshold.
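As a sketch, a budget alert can also be created from the command
line (the billing account ID, name, and amounts are hypothetical):
# Alert when 90% of a 500 USD monthly budget is spent
gcloud billing budgets create \
    --billing-account=0X0X0X-0X0X0X-0X0X0X \
    --display-name="monthly-dev-budget" \
    --budget-amount=500USD \
    --threshold-rule=percent=0.9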
How to manage resources in GCP
In this section, I will explain how you can manage and administer
your Google Cloud resources.
Resource Hierarchy
There are four types of resources that can be managed through
Resource Manager:
You can create super admin accounts that have access to every
resource in your organization. Since they are very powerful, make
sure you follow Google's best practices.
Labels
Labels are key-value pairs you can use to organize your resources in
GCP. Once you attach a label to a resource (for instance, to a virtual
machine), you can filter based on that label. This is also useful to
break down your bills by label.
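For example, a minimal sketch of labeling an existing instance and
then filtering by that label (the instance, zone, and label values
are hypothetical):
# Attach labels to an existing instance
gcloud compute instances add-labels my-vm \
    --zone=us-central1-a \
    --labels=env=dev,team=backend
# List only the instances carrying that label
gcloud compute instances list --filter="labels.env=dev"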
Cloud IAM
Simply put, Cloud IAM controls who can do what on which
resource. A resource can be a virtual machine, a database instance,
a user, and so on.
It is important to note that permissions are not directly assigned to
users. Instead, they are bundled into roles, which are assigned
to members. A policy is a collection of one or more bindings of a
set of members to a role.
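For example, a minimal sketch of binding a member to a role at the
project level (the project, user, and role here are hypothetical):
# Grant a user read-only access to Compute Engine resources in a project
gcloud projects add-iam-policy-binding my-project \
    --member=user:jane@example.com \
    --role=roles/compute.viewer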
Identities
In a GCP project, identities are represented by Google accounts,
created outside of GCP, and defined by an email address (not
necessarily @gmail.com). There are different types:
Cloud Logging
Cloud Logging is GCP's centralized solution for real-time log
management. For each of your projects, it allows you to store,
search, analyze, monitor, and alert on logging data:
Note: you may want to log your billing data for analysis. In this
case, you do not create a sink. You can directly export your reports
to BigQuery.
Cloud Monitoring
Cloud Monitoring lets you monitor the performance of your
applications and infrastructure, visualize it in dashboards,
create uptime checks to detect resources that are down and alert you
based on these checks so that you can fix problems in your
environment. You can monitor resources in GCP, AWS, and even
on-premise.
It is recommended to create a separate project for Cloud Monitoring
since it can keep track of resources across multiple projects.
Alerts
To receive alerts, you must declare an alerting policy. An alerting
policy defines the conditions under which a service is considered
unhealthy. When the conditions are met, a new incident will be
created and notifications will be sent (via email, Slack, SMS,
PagerDuty, etc).
A policy belongs to an individual workspace, which can contain a
maximum of 500 policies.
Trace
Trace helps find bottlenecks in your services. You can use this
service to figure out how long it takes to handle a request, which
microservice takes the longest to respond, where to focus to reduce
the overall latency, and so on.
It is enabled by default for applications running on Google App
Engine (GAE) - Standard environment - but can be used for
applications running on GCE, GKE, and Google App Engine
Flexible.
Error Reporting
Error Reporting will aggregate and display errors produced in
services written in Go, Java, Node.js, PHP, Python, Ruby, or .NET,
running on GCE, GKE, GAE, Cloud Functions, or Cloud Run.
Debug
Debug lets you inspect the application's state without stopping your
service. Currently supported for Java, Go, Node.js and Python. It is
automatically integrated with GAE but can be used on GCE, GKE,
and Cloud Run.
Profiler
Profiler continuously gathers CPU usage and memory-allocation
information from your applications. To use it, you need to install a
profiling agent.
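The following object lifecycle configuration for a Cloud Storage
bucket deletes live objects 30 days after creation, deletes a version
as soon as two newer versions of the same object exist, and deletes
non-live (archived) versions after 180 days: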
{
"lifecycle":{
"rule":[
{
"action":{
"type":"Delete"
},
"condition":{
"age":30,
"isLive":true
}
},
{
"action":{
"type":"Delete"
},
"condition":{
"numNewerVersions":2
}
},
{
"action":{
"type":"Delete"
},
"condition":{
"age":180,
"isLive":false
}
}
]
}
}
It can be applied through gsutil or a REST API call. Rules can also
be created through the Google Console.
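A minimal sketch of applying the configuration above with gsutil
(the file and bucket names are hypothetical):
# Apply the lifecycle configuration stored in lifecycle.json to a bucket
gsutil lifecycle set lifecycle.json gs://my-bucket
# Check the configuration currently in effect
gsutil lifecycle get gs://my-bucket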
Permissions in GCS
In addition to IAM roles, you can use Access Control Lists (ACLs)
to manage access to the resources in a bucket.
Use IAM roles when possible, but remember that ACLs grant
access to buckets and individual objects, while IAM roles are
project or bucket wide permissions. Both methods work in
tandem.
To grant temporary access to users outside of GCP, use Signed
URLs.
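For example, a minimal sketch of generating a signed URL that
expires after 10 minutes, using a service account key (the key file,
bucket, and object names are hypothetical):
gsutil signurl -d 10m service-account-key.json gs://my-bucket/report.pdf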
Bucket lock
Bucket locks allow you to enforce a minimum retention
period for objects in a bucket. You may need this for auditing or
legal reasons.
Once a bucket is locked, it cannot be unlocked. To delete it, you
first need to remove all objects in the bucket, which you can only
do after they have all reached the retention period specified by the
retention policy. Only then can you delete the bucket.
You can include the retention policy when you are creating the
bucket or add a retention policy to an existing bucket (it
retroactively applies to existing objects in the bucket too).
Cloud SQL
Cloud SQL provides access to a managed MySQL or PostgreSQL
database instance in GCP. Each instance is limited to a single
region and has a maximum capacity of 30 TB.
Google will take care of the installation, backups, scaling,
monitoring, failover, and read replicas. For availability reasons,
replicas must be defined in the same region but in a different zone
from the primary instance.
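As a sketch, creating a highly available MySQL instance with an
automatic standby in a different zone of the same region might look
like this (the instance name, region, and tier are hypothetical):
gcloud sql instances create my-instance \
    --database-version=MYSQL_8_0 \
    --tier=db-n1-standard-1 \
    --region=us-central1 \
    --availability-type=REGIONAL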
Cloud Spanner
Cloud Spanner is globally available and can scale (horizontally)
very well.
These two features allow it to support use cases that Cloud SQL
cannot, but they also make it more expensive. Cloud Spanner is not
an option for lift-and-shift migrations.
Datastore
Datastore is a completely no-ops, highly-scalable document
database ideal for web and mobile applications: game states,
product catalogs, real-time inventory, and so on.
Bigtable
Bigtable is a NoSQL database ideal for analytical workloads where
you can expect a very high volume of writes, reads in the
milliseconds, and the ability to store terabytes to petabytes of
information. It's great for:
Financial analysis
IoT data
Marketing data
Bigtable requires the creation and configuration of your nodes (as
opposed to the fully-managed Datastore or BigQuery). You can add
or remove nodes in your cluster with zero downtime. The simplest
way to interact with Bigtable is the command-line tool cbt.
Bigtable's performance will depend on the design of your database
schema.
You can only define one key per row and must keep all the
information associated with an entity in the same row. Think of it as
a hash table.
Tables are sparse: if there is no information associated with a
column, no space is required.
To make reads more efficient, try to store related entities in adjacent
rows.
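A minimal sketch of this model using cbt (the project, instance,
table, and row key are hypothetical):
# Create a table with one column family
cbt -project my-project -instance my-instance createtable user-events
cbt -project my-project -instance my-instance createfamily user-events clicks
# Write a row whose key groups related data for the same entity together
cbt -project my-project -instance my-instance set user-events "user1234#20240101" clicks:page=/home
# Read rows back; rows with similar key prefixes are stored adjacently
cbt -project my-project -instance my-instance read user-events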
Since this topic is worth an article on its own, I recommend you
read the documentation.
Memorystore
Memorystore provides managed versions of Redis and Memcached
(in-memory databases), resulting in very fast performance. Instances
are regional, like Cloud SQL, and have a capacity of up to 300 GB.
As usual, there are quotas and limits to what you can do in a VPC,
amongst them:
In the next sections, I will discuss how to connect your VPC(s) with
networks outside of GCP. There are three options:
Cloud VPN
Cloud Interconnect
Cloud Peering
Each of them has different capabilities, use cases, and prices, which
I will describe in the following sections.
Cloud VPN
With Cloud VPN, your traffic travels through the public internet
over an encrypted tunnel. Each tunnel has a maximum capacity of 3
Gb per second, and you can use a maximum of 8 tunnels for better
performance. These two characteristics make VPN the cheapest
option.
You can define two types of routes between your VPC and your on-
premise networks:
Static routes. You have to manually define and update them, for
example when you add a new subnet. This is not the preferred
option.
Dynamic routes. Routes are automatically handled (defined and
updated) for you using Cloud Router. This is the preferred option
when BGP is available.
Your traffic gets encrypted and decrypted by VPN Gateways (in
GCP, they are regional resources).
Cloud Interconnect
With Cloud VPN, traffic travels through the public internet. With
Cloud Interconnect, there is a direct physical connection between
your on-premises network and your VPC. This option will be more
expensive but will provide the best performance.
There are two types of interconnect available, depending on how
you want your connection to GCP to materialize:
Dedicated interconnect. There is "a direct cable" connecting your
infrastructure and GCP. This is the fastest option, with a capacity of
10 to 200 Gb per second. However, it is not available everywhere:
at the time of this writing, only in 62 locations in the world.
Partner interconnect. You connect through a service provider.
This option is more geographically available, but not as fast as a
dedicated interconnect: from 50 Mb per second to 10 Gb per
second.
Cloud Peering
Cloud peering is not a GCP service, but you can use it to connect
your network to Google's network and access services like YouTube,
Drive, or GCP services.
There are different types of load balancers. They differ in the type
of traffic (HTTP vs TCP/UDP - Layer 7 or Layer 4), whether they
handle external or internal traffic, and whether their scope is
regional or global:
Cloud DNS
Cloud DNS is Google's managed Domain Name System
(DNS) host, both for internal and external (public) traffic. It will
map URLs like https://www.freecodecamp.org/ to an IP address. It
is the only service in GCP with 100% SLA - it is available 100% of
the time.
Google Cloud CDN
Cloud CDN is Google's Content Delivery Network. If you have data
that does not change often (images, videos, CSS, etc.), it makes
sense to cache it close to your users. Cloud CDN provides 90 edge
Points of Presence (POPs) to cache the data close to your end-users.
After the first request, static data can be stored in a POP, usually
much closer to your user than your main servers. Thus, in
subsequent requests, you can retrieve the data faster from the POP
and reduce the load on your backend servers.
Local SSD
Local SSDs are attached to a VM to which they provide high-
performance ephemeral storage. As of now, you can attach up to
eight 375GB local SSDs to the same instance. However, this data
will be lost if the VM is killed.
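For example, a sketch of attaching two local SSDs to a new instance
(the instance name and zone are hypothetical):
gcloud compute instances create my-vm \
    --zone=us-central1-a \
    --local-ssd=interface=NVME \
    --local-ssd=interface=NVME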
Cloud Storage
We have extensively covered GCS in a previous section. GCS is not
a filesystem, but you can use GCS-Fuse to mount GCS buckets as
filesystems in Linux or macOS systems. You can also let apps
download and upload data to GCS using standard filesystem
semantics.
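For example, a minimal sketch of mounting a bucket with GCS-Fuse
on Linux (the bucket name and mount point are hypothetical):
# Mount the bucket so apps can read and write it like a local directory
mkdir -p /mnt/my-bucket
gcsfuse my-bucket /mnt/my-bucket
# Unmount it when you are done
fusermount -u /mnt/my-bucket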
How to back up your VM's data: Snapshots
Snapshots are backups of your disks. To reduce space, they are
created incrementally: each new snapshot only stores the data that
has changed since the previous one.
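For example, a sketch of creating a snapshot of a disk from the
command line (the disk, zone, and snapshot names are hypothetical):
gcloud compute disks snapshot my-disk \
    --zone=us-central1-a \
    --snapshot-names=my-disk-backup-001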
Images
Images refer to the operating system images needed to create boot
disks for your instances. There are two types of images:
Instance groups
Instance groups let you treat a group of instances as a single unit
and they come in two flavors:
Shielded VMs
Prevent instances from being reached from the public internet
Trusted images to make sure your users can only create disks from
images in specific projects
App Engine
App Engine is a great choice when you want to focus on the code
and let Google handle your infrastructure. You just need to choose
the region where your app will be deployed (this cannot be changed
once it is set). Amongst its main use cases are websites, mobile
apps, and game backends.
You can easily update the version of your app that is running via
the command line or the Google Console.
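As a sketch, deploying a new version and then shifting traffic to it
might look like this (the version ID is hypothetical, and an
app.yaml is assumed to exist in the current directory):
# Deploy a new version without sending traffic to it yet
gcloud app deploy app.yaml --version=v2 --no-promote
# Once verified, send all traffic of the default service to the new version
gcloud app services set-traffic default --splits=v2=1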
You can interact with your data in BigQuery using SQL via:
The Google Console.
The command line, running commands like bq query 'SELECT field
FROM ...'.
The REST API.
Code using client libraries.
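A minimal sketch of the command-line option (the dataset and table
names are hypothetical):
bq query --use_legacy_sql=false \
    'SELECT COUNT(*) AS total FROM my_dataset.events'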
User-Defined Functions allow you to combine SQL queries with
JavaScript functions to create complex operations.
BigQuery is a columnar data store: records are stored in columns.
Tables are collections of columns and datasets are collections of
tables.
Jobs are actions to load, export, query, or copy data that BigQuery
runs on your behalf.
Views are virtual tables defined by a SQL query. They are useful for
sharing data with others when you want to control exactly what
they have access to.
Your costs depend on how much data you store and stream into
BigQuery and how much data you query. To reduce costs,
BigQuery automatically caches previous queries (per user). This
behavior can be disabled.
A common pattern is to stream data into Pub/Sub, let's say from IoT
devices, process it in Dataflow, and store it for analysis in
BigQuery.
However, Pub/Sub does not guarantee that the order in which
messages are published to a topic will be the order in which they
are consumed. If ordering matters, it can be handled in Dataflow.
Cloud Dataproc
Cloud Dataproc is Google's managed Hadoop and Spark service. It
lets you create and manage your clusters easily and turn them off
when you are not using them, to reduce costs.
Dataproc can only be used to process batch data, while Dataflow
can handle also streaming data.
Reduce costs by turning your cluster off when you are not using it.
Leverage Google's infrastructure
Use some preemptible virtual machines to reduce costs
Add larger (SSD) persistent disks to improve performance
BigQuery can replace Hive and BigTable can replace HBase
Cloud Storage replaces HDFS. Just upload your data to GCS and
change the prefixes hdfs:// to gs://
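For example, a minimal sketch of creating a small cluster and
turning it off when you are done (the cluster name and region are
hypothetical):
# Create a cluster with two workers
gcloud dataproc clusters create my-cluster --region=us-central1 --num-workers=2
# ... submit your Hadoop/Spark jobs ...
# Delete the cluster to stop paying for it
gcloud dataproc clusters delete my-cluster --region=us-central1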
If you are not migrating an existing Hadoop or Spark setup, you
should choose Cloud Dataflow instead.
Dataprep
Cloud Dataprep provides you with a web-based interface to clean
and prepare your data before processing. The input and output
formats include, among others, CSV, JSON, and Avro.
After defining the transformations, a Dataflow job will run. The
transformed data can be exported to GCS, BigQuery, etc.
Cloud Composer
Cloud Composer is Google's fully-managed Apache
Airflow service to create, schedule, monitor, and manage
workflows. It handles all the infrastructure for you so that you can
concentrate on combining the services I have described above to
create your own workflows.
Under the hood, a GKE cluster will be created with Airflow in it
and GCS will be used to store files.
AI Platform
AI Platform provides you with a fully-managed platform to use
machine learning libraries like TensorFlow. You just need to focus
on your model and Google will handle all the infrastructure needed
to train it.
After your model is trained, you can use it to get online and batch
predictions.
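As a sketch, submitting a training job from the command line might
look like this (the job name, bucket, package path, module, and
region are all hypothetical):
gcloud ai-platform jobs submit training my_training_job \
    --region=us-central1 \
    --module-name=trainer.task \
    --package-path=./trainer \
    --staging-bucket=gs://my-bucket \
    --runtime-version=2.1 \
    --python-version=3.7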
Cloud AutoML
Google lets you use your data to train their models. You can
leverage models to build applications that are based on natural
language processing (for example, document classification or
sentiment analysis applications), speech processing, machine
translation, or video processing (video classification or object
detection).
How to explore and visualize your data
in GCP
Cloud Data Studio
Data Studio lets you create visualizations and dashboards based
on data that resides in Google services (YouTube Analytics, Sheets,
AdWords, local upload), Google Cloud Platform (BigQuery, Cloud
SQL, GCS, Spanner), and many third-party services, storing your
reports in Google Drive.
Data Studio is not part of GCP but of G Suite, so its permissions
are not managed using IAM.
There are no additional costs for using Data Studio, other than the
storage of the data, queries in BigQuery, and so on. Caching can be
used to improve performance and reduce costs.
Cloud Datalab
Datalab lets you explore, analyze, and visualize data in BigQuery,
ML Engine, Compute Engine, Cloud Storage, and Stackdriver.
It is based on Jupyter notebooks and supports Python, SQL, and
JavaScript code. Your notebooks can be shared via Cloud Source
Repositories.
Security in GCP
Encryption on Google Cloud Platform
Google Cloud encrypts data both at rest (data stored on disk) and in
transit (data traveling over the network), using AES implemented
via BoringSSL.
You can manage the encryption keys yourself (both storing them in
GCP or on-premise) or let Google handle them.
Encryption at rest
GCP encrypts data stored at rest by default. Your data will be
divided into chunks. Each chunk is distributed across different
machines and encrypted with a unique key, called a data
encryption key (DEK).
Keys are generated and managed by Google but you can also
manage the keys yourself, as we will see later in this guide.
Encryption in Transit
To add an extra security layer, all communications between two
GCP services or from your infrastructure to GCP are encrypted at
one or more network layers. Your data would not be compromised
if your messages were to be intercepted.
The DEKs used to encrypt your data are also encrypted using key
encryption keys (KEKs), in a process called envelope encryption.
By default, KEKs are rotated every 90 days.
Cloud Armor
Cloud Armor protects your infrastructure from distributed denial of
service (DDoS) attacks. You define rules (for example to whitelist
or deny certain IP addresses or CIDR ranges) to create security
policies, which are enforced at the Point of Presence level (closer to
the source of the attack).
Cloud Armor gives you the option of previewing the effects of your
policies before activating them.
You can specify what type of data you are interested in, called an
info type, define your own types (based on dictionaries of words
and phrases or on regular expressions), or let Google scan for the
default info types, which can be time-consuming for large amounts
of data.
For each result, DLP will return the likelihood that the piece of
data matches a certain info type: LIKELIHOOD_UNSPECIFIED,
VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or
VERY_LIKELY.
Question 1
Your customer is moving their corporate applications to Google
Cloud. The security team wants detailed visibility of all resources in
the organization. You use the Resource Manager to set yourself up
as the Organization Administrator.
Question 2
Your company wants to try out the cloud with low risk. They want
to archive approximately 100 TB of their log data to the cloud and
test the serverless analytics features available to them there, while
also retaining that data as a long-term disaster recovery backup.
Question 3
Your company wants to track whether someone is present in a
meeting room reserved for a scheduled meeting.
B. Have devices poll for connectivity to Cloud SQL and insert the
latest messages on a regular interval to a device-specific table.
Question 4
To reduce costs, the Director of Engineering has required all
developers to move their development infrastructure resources from
on-premises virtual machines (VMs) to Google Cloud.
These resources go through multiple start/stop events during the
day and require the state to persist.
A. Use persistent disks to store the state. Start and stop the VM as
needed.
E. Store all state in a Local SSD, snapshot the persistent disks and
terminate the VM.
Question 5
The database administration team has asked you to help them
improve the performance of their new database server running on
Compute Engine.
Question 6
Your organization has a 3-tier web application deployed in the same
Google Cloud Virtual Private Cloud (VPC).
C. Add tags to each tier and set up routes to allow the desired traffic
flow.
D. Add tags to each tier and set up firewall rules to allow the
desired traffic flow.
Question 7
You are developing an application on Google Cloud that will label
famous landmarks in users’ photos. You are under competitive
pressure to develop a predictive model quickly. You need to keep
service costs low.
B. Build an application that calls the Cloud Vision API. Pass client
image locations as base64-encoded strings.
Question 8
You set up an autoscaling managed instance group to serve web
traffic for an upcoming launch.
You have verified that the appropriate web response is coming from
each instance using the curl command. You want to ensure that the
backend is configured correctly.
Question 9
You created a job that runs daily to import highly sensitive data
from an on-premises location to Cloud Storage. You also set up a
streaming data insert into Cloud Storage via a Kafka node that is
running on a Compute Engine instance.
You need to encrypt the data at rest and supply your own
encryption key. Your key should not be stored in the Google Cloud.
Question 10
You are designing a relational data repository on Google Cloud to
grow as needed. The data will be transactionally consistent and
added from any location in the world. You want to monitor and
adjust node count for input traffic, which can spike unpredictably.
C. Use Cloud Bigtable for storage. Monitor data stored and increase
node count if more than 70% is utilized.
Answers
1. B
2. A, E
3. C
4. A, D
5. C
6. D
7. B
8. C
9. D
10. B
Take a pen and a piece of paper and try to come up with your own
solution based on the services I have described here. If you get
stuck, the following questions might help:
Do not worry if your solution does not look like Google's. This is
just one possible solution. Learning to design complex systems is a
skill that takes a lifetime to master. Luckily, you're headed in the
right direction.
Conclusion
This guide will help you get started on GCP and give you a broad
perspective of what you can do with it.