GCP
Editorial Board
Spandana Chatterjee
Apress, Pune, Maharashtra, India
Melissa Duffy
Apress, New York, USA
Miriam Haidara
Apress, Dordrecht, The Netherlands
Celestin Suresh John
Apress, Pune, Maharashtra, India
Susan McDermott
Suite 4600, Apress, New York, NY, USA
Aditee Mirashi
Apress, Heidelberg, Baden-Württemberg, Germany
Divya Modi
Apress, Pune, Maharashtra, India
Mark Powers
Apress, New York, NY, USA
Shiva Ramachandran
Apress, New York, NY, USA
James Robinson-Prior
Apress, London, UK
Smriti Srivastava
Apress, Seattle, WA, USA
Dario Cabianca
Georgetown, KY, USA
ISSN 2731-8761    e-ISSN 2731-877X
Certification Study Companion Series
ISBN 978-1-4842-9353-9    e-ISBN 978-1-4842-9354-6
https://doi.org/10.1007/978-1-4842-9354-6
© Dario Cabianca 2023
This work is subject to copyright. All rights are solely and exclusively
licensed by the Publisher, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any
other physical way, and transmission or information storage and retrieval,
electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service
marks, etc. in this publication does not imply, even in the absence of a
specific statement, that such names are exempt from the relevant protective
laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate at
the date of publication. Neither the publisher nor the authors or the editors
give a warranty, expressed or implied, with respect to the material
contained herein or for any errors or omissions that may have been made.
The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
The registered company address is: 1 New York Plaza, New York, NY
10004, U.S.A.
Introduction
This book is about preparing you to pass the Google Cloud Professional
Cloud Network Engineer certification exam and—most importantly—to get
you started for an exciting career as a Google Cloud Platform (GCP)
network engineer.
This book is intended for a broad audience of cloud solution architects (in
any of the three public cloud providers), as well as site reliability, security,
network, and software engineers with foundational knowledge of Google
Cloud and networking concepts. Basic knowledge of the OSI model, RFC
1918 (the private address space specification), and the TCP/IP, TLS (or SSL),
and HTTP protocols is a plus, although it is not required.
I used the official exam guide to organize the content and to present it in a
meaningful way. As a result, the majority of the chapters are structured to
map one to one with each exam objective and to provide detailed coverage
of each topic, as defined by Google. The exposition of the content for most
of the key topics includes a theoretical part, which is focused on conceptual
knowledge, and a practical part, which is focused on the application of the
acquired knowledge to solve common use cases, usually by leveraging
reference architectures and best practices. This approach will help you
gradually set context, get you familiarized with the topic, and lay the
foundations for more advanced concepts.
Given the nature of the exam, whose main objective is to validate your
ability to design, engineer, and architect efficient, secure, and cost-effective
network solutions with GCP, I have developed a bias for diagrams, infographic
content, and other illustrative material to help you “connect the dots” and
visually build knowledge.
Another important aspect of the exposition includes the use of the Google
Cloud Command Line Interface (gcloud CLI) as the main tool to solve the
presented use cases. This choice is deliberate, and the rationale behind it is
twofold. On the one hand, the exam has a number of questions that require
you to know the gcloud CLI commands. On the other hand, the alternatives
to the gcloud CLI are the console and other tools that enable Infrastructure
as Code (IaC), for example, HashiCorp Terraform. The former leverages
the Google Cloud user interface and is subject to frequent changes without
notice. The latter is a product that is not in the scope of the exam.
A Google Cloud free account is recommended to make the best use of this
book. This approach will teach you how to use the gcloud CLI and will let
you practice the concepts you learned. Chapter 1 will cover this setup and
will provide an overview of the exam, along with the registration process.
If you want to become an expert on shared Virtual Private Cloud (VPC)
networks, I also recommend that you create a Google Workspace account
with your own domain. Although this is not free, the price is reasonable,
and you will have your own organization that you can use to create multiple
GCP users and manage IAM (Identity and Access Management) policies
accordingly.
In Chapter 2, you will learn the important factors you need to consider to
design the network architecture for your workloads. The concept of a
Virtual Private Cloud (VPC) network as a logical routing domain will be
first introduced, along with a few reference topologies. Other important
GCP constructs will be discussed, for example, projects, folders,
organizations, billing accounts, Identity and Access Management (IAM)
allow policies, and others, to help you understand how to enable separation
of duties—also known as microsegmentation—effectively. Finally, an
overview of hybrid and multi-cloud deployments will be provided to get
you familiarized with the GCP network connectivity products.
Chapter 3 is your VPC “playground.” In this chapter, you’ll use the gcloud
CLI to perform a number of operations on VPCs and their components.
You will learn the construct of a subnetwork, intended as a partition of a
VPC, and you will create, update, delete, and peer VPCs. We will deep dive
in the setup of a shared VPC, which we’ll use as a reference for the
upcoming sections and chapters. The concepts of Private Google Access
and Private Service Connect will be introduced and implemented. A
detailed setup of a Google Kubernetes Engine (GKE) cluster in our shared
VPC will be implemented with examples of internode connectivity. The
fundamental concepts of routing and firewall rules will be discussed, with
emphasis on their applicability scope, which is the entire VPC.
Chapter 5 will cover all the load balancing services you need to know to
pass the exam, beginning from the nine different “flavors” of GCP load
balancers. A number of deep dive examples on how to implement global,
external HTTP(S) load balancers with different backend types will be
provided. You will become an expert at choosing the right load balancer
based on a set of business and technical requirements, which is exactly
what you are expected to know during the exam and at work.
In Chapter 7, you will learn how to implement the GCP products that
enable hybrid and multi-cloud connectivity. These include the two
“flavors” of Cloud Interconnect (Dedicated and Partner) and the two
flavors of Cloud VPN (HA and Classic).
The last chapter (Chapter 8) concludes our study by teaching you how to
perform network operations as a means to proactively support and optimize
the network infrastructure you have designed, architected, and
implemented.
Each chapter (other than Chapter 1) includes at the end a few questions
(and the correct answers) to help you consolidate your knowledge of the
covered exam objective.
As in any discipline, you will need to supplement what you learned with
experience. The combination of the two will make you a better GCP
network engineer. I hope this book will help you achieve your Google
Cloud Professional Cloud Network Engineer certification and, most
importantly, will equip you with the tools and the knowledge you need to
succeed at work.
Acknowledgments
This book is the result of the study, work, and research I accomplished over
the past two years. I could not have written this book without the help of
family, friends, colleagues, and experts in the field of computer networks
and computer science.
When my friend, former colleague, and author Tom Nelson first introduced
me to Apress in August 2021, I had no idea I was about to embark on this
wonderful journey.
I am also grateful to Luca Prete for his article “GCP Routing Adventures
(Vol. 1)” he posted on Medium, which helped me explain in a simple yet
comprehensive way the concept of BGP (Border Gateway Protocol) routing
mode, as it pertains to VPCs.
Last, I cannot express enough words of gratitude for the Late Prof.
Giovanni Degli Antoni (Gianni), who guided me through my academic
career in the University of Milan, and my beloved parents Eugenia and
Giuseppe, who always supported me in my academic journey and in life.
Table of Contents
Chapter 1: Exam Overview
Exam Content
Exam Subject Areas
Exam Format
Supplementary Study Materials
Sign Up for a Free Tier
Register for the Exam
Schedule the Exam
Rescheduling and Cancellation Policy
Exam Results
Retake Policy
Summary
Chapter 2: Designing, Planning, and Prototyping a Google Cloud Network
Designing an Overall Network Architecture
High Availability, Failover, and Disaster Recovery Strategies
DNS (Domain Name System) Strategy
Security and Data Exfiltration Requirements
Load Balancing
Applying Quotas per Project and per VPC
Container Networking
SaaS, PaaS, and IaaS Services
Designing Virtual Private Cloud (VPC) Instances
VPC Specifications
Subnets
IP Address Management and Bring Your Own IP (BYOIP)
Standalone vs. Shared VPC
Multiple vs. Single
Regional vs. Multi-regional
VPC Network Peering
Firewalls
Custom Routes
Designing a Hybrid and Multi-cloud Network
Drivers for Hybrid and Multi-cloud Networks
Overall Goals
Designing a Hybrid and Multi-cloud Strategy
Dedicated Interconnect vs. Partner Interconnect
Direct vs. Carrier Peering
IPsec VPN
Bandwidth and Constraints Provided by Hybrid Connectivity Solutions
Cloud Router
Multi-cloud and Hybrid Topologies
Regional vs. Global VPC Routing Mode
Failover and Disaster Recovery Strategy
Accessing Google Services/APIs Privately from On-Premises Locations
IP Address Management Across On-Premises Locations and Cloud
Designing an IP Addressing Plan for Google Kubernetes Engine (GKE)
GKE VPC-Native Clusters
Optimizing GKE IP Ranges
Expanding GKE IP Ranges
Public and Private Cluster Nodes
Control Plane Public vs. Private Endpoints
Summary
Exam Questions
Question 2.1 (VPC Peering)
Question 2.2 (Private Google Access)
Chapter 3: Implementing Virtual Private Cloud Instances
Configuring VPCs
Configuring VPC Resources
Configuring VPC Peering
Creating a Shared VPC and Sharing Subnets with Other Projects
Using a Shared VPC
Sharing Subnets Using Folders
Configuring API Access to Google Services (e.g., Private Google Access, Public Interfaces)
Expanding VPC Subnet Ranges After Creation
Configuring Routing
Static vs. Dynamic Routing
Global vs. Regional Dynamic Routing
Routing Policies Using Tags and Priorities
Internal Load Balancer As a Next Hop
Custom Route Import/Export over VPC Peering
Configuring and Maintaining Google Kubernetes Engine Clusters
VPC-Native Clusters Using Alias IP Ranges
Clusters with Shared VPC
Creating Cluster Network Policies
Private Clusters and Private Control Plane Endpoints
Adding Authorized Networks for Cluster Control Plane Endpoints
Configuring and Managing Firewall Rules
Target Network Tags and Service Accounts
Priority
Protocols and Ports
Direction
Firewall Rules Logs
Firewall Rule Summary
Exam Questions
Question 3.1 (Routing)
Question 3.2 (Firewall Rules)
Question 3.3 (Firewall Rules, VPC Flow Logs)
Question 3.4 (Firewall Rules, Target Network Tags)
Chapter 4: Implementing Virtual Private Cloud Service Controls
VPC Service Controls Introduction
Creating and Configuring Access Levels and Service Perimeters
Perimeters
Access Levels
Service Perimeter Deep Dive
Enabling Access Context Manager and Cloud Resource Manager APIs
Creating an Access Policy for the Organization
Creating an Access Level
Creating a Perimeter
Testing the Perimeter
Deleting the Buckets
VPC Accessible Services
Perimeter Bridges
Audit Logging
Dry-Run Mode
Dry-Run Concepts
Dry-Run Perimeter Deep Dive
Setting Up Private Connectivity to Google APIs
Updating the Access Level
Updating the Perimeter
Testing the Perimeter
Creating a Dry-Run Perimeter by Limiting VPC Allowed Services
Testing the Dry-Run Perimeter
Enforcing the Dry-Run Perimeter
Testing the Enforced Perimeter
Cleaning Up
Final Considerations
Shared VPC with VPC Service Controls
VPC Peering with VPC Service Controls
Exam Questions
Question 4.1 (Perimeter with Shared VPC)
Question 4.2 (Dry-Run)
Chapter 5: Configuring Load Balancing
Google Cloud Load Balancer Family
Backend Services and Network Endpoint Groups (NEGs)
Firewall Rules to Allow Traffic and Health Checks to Backend Services
Configuring External HTTP(S) Load Balancers Including Backends and Backend Services with Balancing Method, Session Affinity, and Capacity Scaling/Scaler
External TCP and SSL Proxy Load Balancers
Network Load Balancers
Internal HTTP(S) and TCP Proxy Load Balancers
Load Balancer Summary
Protocol Forwarding
Accommodating Workload Increases Using Autoscaling vs. Manual Scaling
Configuring Cloud Armor Policies
Security Policies
Web Application Firewall (WAF) Rules
Attaching Security Policies to Backend Services
Configuring Cloud CDN
Interaction with HTTP(S) Load Balancer
Enabling and Disabling Cloud CDN
Cacheable Responses
Using Cache Keys
Cache Invalidation
Signed URLs
Custom Origins
Best Practices
Use TLS Everywhere
Restrict Ingress Traffic with Cloud Armor and Identity-Aware Proxy (IAP)
Enable Cloud CDN for Cacheable Content
Enable HTTP/2 As Appropriate
Optimize Network for Performance or Cost Based on Your Requirements
Leverage User-Defined HTTP Request Headers to Manage Metadata
Exam Questions
Question 5.1 (Backend Services)
Question 5.2 (Backend Services, Max CPU %, Capacity)
Question 5.3 (Backend Services, Canary A/B Testing)
Question 5.4 (HTTPS Load Balancer, Cloud CDN)
Question 5.5 (HTTPS Load Balancer, Autoscale)
Chapter 6: Configuring Advanced Network Services
Configuring and Maintaining Cloud DNS
Managing Zones and Records
Migrating to Cloud DNS
DNS Security Extensions (DNSSEC)
Forwarding and DNS Server Policies
Integrating On-Premises DNS with Google Cloud
Split-Horizon DNS
DNS Peering
Private DNS Logging
Configuring Cloud NAT
Architecture
Creating a Cloud NAT Instance
Addressing and Port Allocations
Customizing Timeouts
Logging and Monitoring
Restrictions per Organization Policy Constraints
Configuring Network Packet Inspection
Configuring Packet Mirroring
Packet Mirroring in Single and Multi-VPC Topologies
Capturing Relevant Traffic Using Packet Mirroring Source and Traffic Filters
Routing and Inspecting Inter-VPC Traffic Using Multi-NIC VMs (e.g., Next-Generation Firewall Appliances)
Configuring an Internal Load Balancer As a Next Hop for Highly Available Multi-NIC VM Routing
Exam Questions
Question 6.1 (Cloud DNS)
Question 6.2 (Cloud NAT)
Question 6.3 (Cloud DNS)
Chapter 7: Implementing Hybrid Connectivity
Configuring Cloud Interconnect
Dedicated Interconnect Connections and VLAN Attachments
Partner Interconnect Connections and VLAN Attachments
Configuring a Site-to-Site IPsec VPN
High Availability VPN (Dynamic Routing)
Classic VPN (e.g., Route-Based Routing, Policy-Based Routing)
Configuring Cloud Router
Border Gateway Protocol (BGP) Attributes (e.g., ASN, Route Priority/MED, Link-Local Addresses)
Default Route Advertisements via BGP
Custom Route Advertisements via BGP
Deploying Reliable and Redundant Cloud Routers
Exam Questions
Question 7.1 (Interconnect Attachments)
Question 7.2 (Cloud VPN)
Question 7.3 (Cloud VPN)
Question 7.4 (Partner Interconnect)
Question 7.5 (Cloud Router)
Chapter 8: Managing Network Operations
Logging and Monitoring with Google Cloud’s Operations Suite
Reviewing Logs for Networking Components (e.g., VPN, Cloud Router, VPC Service Controls)
Monitoring Networking Components (e.g., VPN, Cloud Interconnect Connections and Interconnect Attachments, Cloud Router, Load Balancers, Google Cloud Armor, Cloud NAT)
Managing and Maintaining Security
Firewalls (e.g., Cloud-Based, Private)
Diagnosing and Resolving IAM Issues (e.g., Shared VPC, Security/Network Admin)
Maintaining and Troubleshooting Connectivity Issues
Draining and Redirecting Traffic Flows with HTTP(S) Load Balancing
Monitoring Ingress and Egress Traffic Using VPC Flow Logs
Monitoring Firewall Logs and Firewall Insights
Managing and Troubleshooting VPNs
Troubleshooting Cloud Router BGP Peering Issues
Monitoring, Maintaining, and Troubleshooting Latency and Traffic Flow
Testing Latency and Network Throughput
Using Network Intelligence Center to Visualize Topology, Test Connectivity, and Monitor Performance
Exam Questions
Question 8.1 (VPC Flow Logs, Firewall Rules Logs)
Question 8.2 (Firewall Rules Logs)
Question 8.3 (IAM)
Question 8.4 (IAM)
Question 8.5 (Troubleshooting VPN)
Index
About the Author
Dario Cabianca
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
D. Cabianca, Google Cloud Platform (GCP) Professional Cloud Network Engineer Certification
Companion, Certification Study Companion Series
https://doi.org/10.1007/978-1-4842-9354-6_1
1. Exam Overview
Dario Cabianca1
(1) Georgetown, KY, USA
You are starting your preparation for the Google Professional Cloud
Network Engineer certification. This certification validates your ability
to implement and manage network architectures in Google Cloud.
In this chapter, we will set the direction for getting ready for the exam. We
will outline resources that will aid you in your learning strategy. We will
explain how you can obtain access to a free tier Google Cloud account,
which will allow you to practice what you have learned. We will provide
links to useful additional study materials, and we will describe how to sign
up for the exam.
Exam Content
The exam does not cover cloud service fundamentals, but some
questions on the exam assume knowledge of these concepts. Some of the
broad knowledge areas that you are expected to be familiar with are
The main subject areas that are covered on the exam are listed in Table 1-1.
Table 1-1 Exam subject areas
Google doesn’t publish the weighting ranges for these domains, nor does it
tell you how you scored in each domain. The outcome of the exam is
pass/fail and is provided immediately upon submitting your exam.
You are expected to learn all the topics in the exam study guide, which are
included in this study companion.
Exam Format
The exam consists of 50–60 questions, to be answered within a two-hour
time limit. All questions are in one of the following formats:
Multiple choice: Select the most appropriate answer.
Multiple select: Select all answers that apply. The question will tell you
how many answers are to be selected.
As long as you come well prepared to take the exam, you should be able to
complete all questions within the allotted time. Some questions on the exam
may be unscored items to gather statistical information. These items are not
identified to you and do not affect your score.
The registration fee for the Google Professional Cloud Network Engineer
certification exam is $200 (plus tax where applicable).
For the latest information about the exam, navigate to the Google Cloud
Certifications page at the following URL:
https://cloud.google.com/certification/cloud-network-engineer.
To be well prepared for this exam, you will utilize this study companion as
well as other materials, including hands-on experience, on-demand training
courses, the Google Cloud documentation, and other self-study assets.
Google Cloud offers a free tier program with the following benefits:
There is no charge to use the 20+ products up to their specified free usage
limits. The free usage limits do not expire, but they are subject to change.
Keep in mind that even with a free tier account, you are still required to
provide a valid credit card, although charges do not start until you use your
allotted $300 credit or your 90-day free trial expires.
Fill out the form and click “Save” as indicated in Figure 1-2.
Figure 1-2 Creating a Webassessor account
Upon creating your Webassessor account, you are all set and you can
schedule your exam.
Scroll down until you find the Google Cloud Certified Professional
Cloud Network Engineer (English) exam in the list, as indicated in Figure 1-
4. You will see a blue “Buy Exam” button. In my case, since I am already
certified, the button is unavailable. Click the “Buy Exam” button.
Figure 1-4 Selecting the exam
You will be asked whether you want to take the exam at a test center
(Onsite Proctored) or online at home (Remote Proctored). Select your
preferred choice.
Regardless of where you will take the exam, you will need to present a
government-issued identification (ID) before you start your exam.
If you will take your exam online at your home, you will also need a
personal computer or a Mac with a reliable webcam and Internet
connection, and a suitable, distraction-free room or space where you will be
taking your exam.
Tip: If you take your exam online, make sure you use your own personal
computer or Mac to take the exam. Do not attempt to take the exam using
your company’s laptop or a computer in the office. This is because a
company-owned computer typically uses a VPN (virtual private network)
client and software to provide an extra layer of protection to prevent
corporate data exfiltration. This software can interfere with the software
you need to download and install in order to take your exam.
Depending on your selection, the next screen asks you to select a test center
location as indicated in Figure 1-5.
Finally, you will be directed to check out where you will pay your exam fee
($200 plus taxes).
If you need to make any changes to your scheduled exam date or time, you
need to log in to your Webassessor account and click Reschedule or Cancel
next to your scheduled exam.
Exam Results
You are expected to take the exam at the scheduled place and time. After
the completion of the exam, you will immediately receive a Pass/Fail result.
If you achieve a Pass result, your transcript will record the exam as Pass,
and a few days later (it may take a week or even longer), you will receive
an email confirming the result, which includes a link to a Google Cloud
Perks website where you can select a gift.
If you fail, your transcript will record the exam as Fail, and you will also
receive an email to confirm the result. Don't give up if you don’t pass the
exam on your first try. Review all the study materials again, taking into
consideration any weak areas that you have identified after reviewing your
scoring feedback, and retake the exam.
Retake Policy
If you don't pass an exam, you can take it again after 14 days. If you don't
pass the second time, you must wait 60 days before you can take it a third
time. If you don't pass the third time, you must wait 365 days before taking
it again.
All attempts, regardless of exam language and delivery method (onsite or
online testing), count toward the total permissible attempts, and the waiting
period between attempts still applies. Circumventing this retake policy by
registering under a different name or any other means is a violation of the
Google Cloud Exam Terms and Conditions and will result in a denied or
revoked certification.
Summary
In this chapter, we covered all the areas that will help prepare you for the
Google Professional Cloud Network Engineer certification exam. We
provided an overview of the exam content and the type of questions you
will find on the exam. We explained how to access free training resources
from Google Cloud and how to sign up for a free tier Google Cloud
account. The free tier account will allow you to gain hands-on experience
working with Google Cloud.
Then, you will learn how to design a Google Cloud network and how to
pick and choose each component based on your workloads’ requirements
and the preceding areas.
Next, you will learn how to apply these concepts to the design of hybrid and
multi-cloud networks, which are prevalent nowadays. The Google Cloud
network connectivity products will be introduced, and their applicability
and scope will be explained with a number of reference architectures.
Finally, container networking will be presented, and you will see how
Google Cloud provides native networking capabilities that address most of
the container networking concerns.
Two key factors in designing the network architecture of your workload are
resilience and high availability. The network architecture of your workload
will vary if your workload serves, at peak time, a few dozen concurrent
requests originating from a single region, as opposed to several thousand
concurrent requests from multiple regions with an expected 99.99%
availability and a very low-latency (e.g., sub-millisecond) response.
In the former scenario, you will not need to worry about multi-regional
failover because one of your business requirements states your visitors
originate from a single region. In the latter scenario, multi-regional failover
and a selection of highly performant cloud services will be required to meet
the requirements of global, high availability as well as low latency.
Disaster Recovery
High Availability
Availability is one of the most used SLIs and is defined as the fraction of
the time that a service is usable, for example, 99%.
The smaller your RTO and RPO values are (i.e., the faster your
workload must recover from an interruption), the more your workload will
cost to run, as illustrated in Figure 2-2.
Figure 2-2 Recoverability cost based on RTO/RPO
Now that you are familiar with basic Site Reliability Engineering (SRE)
terms, let’s see how your workload requirements are mapped to SLOs and
how SLOs can be used to define the topology of your network.
The requirements your workload needs to fulfill will ultimately translate into
one or more SLOs. SLOs are a means to define thresholds (a lower bound
and an upper bound) for a given SLI, for example, availability, and they are
intended to determine whether a service performs in an acceptable manner
with respect to a given metric.
Once your SLOs are defined, for example:
Figure 2-3 shows how you choose the GCP services you need to
consider to meet your workload availability and disaster recovery
requirements.
Figure 2-3 Process to select GCP services to meet DR and HA requirements
You may be wondering how to select the GCP services you need to use in
order to meet your workload SLOs. To answer this question, you need to
understand how the availability SLI is related to zones and regions. Google
Cloud generally designs its products to deliver the levels of availability for
zones and regions shown in Table 2-1.
Table 2-1 Availability design goals for zonal and regional GCP services

GCP Service Locality | Examples | Availability Design Goal | Implied Downtime
Regional | Regional Cloud Storage, Replicated Regional Persistent Disk, Regional Google Kubernetes Engine | 99.99% | 52 minutes/year
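The "Implied Downtime" column follows directly from the availability design goal: a 99.99% goal allows at most (1 − 0.9999) × 365 × 24 × 60 ≈ 52.6 minutes of unavailability per year, which is where the 52 minutes/year figure comes from. By the same arithmetic, a 99.9% design goal corresponds to roughly 8.8 hours of downtime per year.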
As a result, the selection of the GCP services for your workload is based
upon the comparison of the GCP availability design goals against your
acceptable level of downtime, as formalized by your workload SLOs.
For example, if your workload has an availability SLO greater than 99.99%,
you’ll probably want to exclude zonal GCP services, because zonal GCP
services are designed for an availability of 99.9% only, as indicated in
Table 2-1.
Figure 2-4 illustrates how GCP compute, storage, and network services
are broken down by locality (zonal, regional, multi-regional).
Figure 2-4 Breakdown of main GCP services by locality
Figure 2-5 shows the tasks you need to complete to execute an effective
DNS strategy for your workload overall network architecture.
Figure 2-5 DNS strategy process
Another factor that you need to account for in designing an overall network
architecture for your workload is the protection of your workload sensitive
data.
In this section, you will learn what you need to do in order to effectively
prevent exfiltration of sensitive data from your workload.
With a VPC service perimeter, you can lock down GCP services similar to
how firewalls protect your VMs.
Since any GCP service can be consumed in the form of one or more REST
(Representational State Transfer) API calls (HTTP requests), by limiting
who can consume the APIs of a given GCP service, you are essentially
establishing a logical perimeter for GCP service access. With a VPC service
perimeter, you can force one or more GCP APIs to be accessed only by a
limited number of authorized GCP projects.
Any API request coming from entities outside the service perimeter is
unauthorized and will result in a 403 HTTP Status Code—Forbidden.
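As a preview of what Chapter 4 covers in detail, a service perimeter can be created with the gcloud CLI along the lines of the following sketch. The project number, policy ID, and perimeter name are placeholders, and only the Cloud Storage API is restricted here:

# Enable the APIs required to manage service perimeters.
gcloud services enable accesscontextmanager.googleapis.com \
    cloudresourcemanager.googleapis.com

# Create a perimeter that allows storage.googleapis.com to be called
# only from within the listed project.
gcloud access-context-manager perimeters create my_perimeter \
    --title="my perimeter" \
    --resources=projects/123456789012 \
    --restricted-services=storage.googleapis.com \
    --policy=POLICY_ID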
Figure 2-7 illustrates some of the most common use cases where data
exfiltration can be prevented by using a VPC service perimeter. More
details will be covered in the next chapters.
Figure 2-7 Enforcing a service perimeter to prevent data exfiltration
These are
1. Access from the Internet or from unauthorized VPC networks
2. Access from authorized VPC networks to GCP resources (e.g., VMs) in authorized projects
3. Copy data with GCP service-to-service calls, for example, from bucket1 to bucket2
Load Balancing
Unlike other public cloud providers, GCP offers tiered network services.
Conversely, the standard tier leverages the public Internet to carry traffic
between your workload GCP services and your users. While the use of the
Internet provides the benefit of a lower cost when compared to using the
Google global backbone (premium tier), choosing the standard tier results
in lower network performance and availability similar to other public cloud
providers.
Think of choosing between premium and standard tiers as booking a rail
ticket. If cost is your main concern and travel time is not as important, you
should consider traveling with a “standard” train that meets your budget.
Otherwise, if travel time is critical and you are willing to pay extra, you
should consider traveling with a high-speed train, which leverages state-of-
the-art high-speed rail infrastructure, that is, premium tier.
Category | Premium Tier | Standard Tier
Performance | High performance, low latency | Lower performance, higher latency
Service Level | Global SLA | Regional SLA
If performance and reliability are your key drivers, then you should opt for
a load balancer that utilizes the premium tier.
If cost is your main driver, then you should opt for a load balancer that
utilizes the standard tier.
The decision tree in Figure 2-10 will help you select the network tier
that best suits the load balancing requirements for your workload.
Figure 2-10 Network service tiers decision tree
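For reference, the network tier is selected per resource. A sketch of reserving a regional external IP address on each tier might look like the following; the names and region are placeholders:

# Reserve a regional external IP address served from the standard tier.
gcloud compute addresses create web-ip-standard \
    --region=us-central1 \
    --network-tier=STANDARD

# Reserve the equivalent address on the premium tier (the default).
gcloud compute addresses create web-ip-premium \
    --region=us-central1 \
    --network-tier=PREMIUM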
One of the drivers that helps shape your workload overall network
architecture is definitely cost. Cost is everywhere: you pay for infrastructure
(physical or virtual), you pay for software (commercial off the shelf or do-
it-yourself), and you pay for the life cycle of your software and your
infrastructure.
One of the benefits of the cloud is that you can limit the cost of your
resource usage by specifying quotas.
Container Networking
Google Cloud allows you to deploy your containers using one of these
three container services (Figure 2-11).
GKE sits in the middle between Cloud Run and Google Compute Engine.
As a result, it provides the right mix of manageability and network controls
you need in order to properly design your workload network architecture.
In this section, we will focus on GKE, and we will address some of the
architectural choices you will need to make when designing an overall
network architecture for your workload.
So, you chose to use GKE as the compute service for your workload, and you
also understood at a high level what its components do and how they
interact with each other. Now, the question is how do you design your
workload overall network architecture using GKE? And, most importantly,
where do you start?
One of the best practices to properly plan IP allocation for your pods,
services, and nodes is to use VPC-native clusters, which as of the writing of
this book are enabled by default.
GKE clusters are very dynamic systems. Pods and their constituents (e.g.,
containers, volumes, etc.) are ephemeral entities, which are created,
scheduled, consumed, and ultimately terminated by Kubernetes during its
orchestration process. Any pod in a cluster is designed to be able to
communicate with any other pod in the cluster (without Network Address
Translation (NAT)), whether it be in the same node or in a different node
that belongs to your GKE cluster. As your cluster scales out in response to
high demand, new pods are created and new routes are consumed, thereby
rapidly approaching the quota per project.
When the route quota is reached, just like for other quotas GCP returns a
“quota exceeded” error message.
With VPC-native clusters, you no longer need to worry about running out
of route quotas in response to scaling, nor do you need to worry about
running out of IP addresses for your pods and services.
All three of these IP ranges must be disjoint, that is, they cannot overlap. You
have the option to let GKE manage them for you, when you create your
cluster, or you can manage them yourself.
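For illustration, a VPC-native cluster that uses user-managed secondary ranges could be created along these lines. This is only a sketch: the cluster, network, subnet, and range names are placeholders, and the pods and services secondary ranges are assumed to already exist on the subnet:

# Create a VPC-native (alias IP) regional cluster whose pod and service
# IP ranges map to the subnet's secondary ranges.
gcloud container clusters create my-cluster \
    --region=us-central1 \
    --network=my-vpc \
    --subnetwork=my-subnet \
    --enable-ip-alias \
    --cluster-secondary-range-name=pods \
    --services-secondary-range-name=services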
Finally, with VPC-native clusters, your pods and services are more closely
integrated with your VPC network than with route-based clusters. This
allows for advanced VPC capabilities like shared VPC, container-native
load balancing, and many others. Let’s touch on container-native load
balancing and why this capability is relevant to your workload’s overall
network architecture. We will learn in detail how the other capabilities work
in the upcoming chapters.
Note: To learn more about ingress, use the official Kubernetes reference:
https://kubernetes.io/docs/concepts/services-networking/ingress/.
iptables rules programmed on each node routed requests to the pods. This
approach was effective for small clusters serving a limited number of
incoming requests, but it turned out to be inefficient for larger clusters due
to suboptimal data paths with unnecessary hops between nodes.
Scale
Performance
Reduced network cost
Shareable Features | Private Features | Global Features
Firewall rules, routes, VPNs configured once | VPC Service Controls for enhanced perimeter security | No cross-region VPNs required
VPC Specifications
A VPC is intended to be a logical routing domain, which allows implicit
connectivity among any compute resource hosted in one of its partitions or
subnets.
1. Global: VPCs are global resources, that is, they can span across multiple regions. VPCs are composed of a number of IP range partitions—denoted by CIDR blocks—which are known as subnets. The acronym “CIDR” stands for Classless Inter-domain Routing. More information on CIDR block notation is provided in the following section.
2. Regional subnets: Subnets are regional resources, that is, a subnet is limited to one region.
3. Firewall rules: Traffic to and from compute resources (e.g., VMs, pods) is controlled by firewall rules, which are defined on a per-VPC basis.
4. Implicit connectivity: Compute resources hosted in a (subnet of a) VPC are allowed to communicate with each other by default, unless firewall rules are configured to deny such traffic.
Subnets
When you create a VPC, you don’t specify an IP range for the VPC itself.
Instead, you use subnets to define which partition of your VPC is
associated with which IP range. As per the VPC specifications, remember that
subnets are regional resources, while VPCs are global.
Note: From now on, we will be using the terms “IP range” and “CIDR
block” interchangeably. For more information about CIDR block notation,
refer to https://tools.ietf.org/pdf/rfc4632.pdf.
Each subnet must have a primary IP range and, optionally, one or more
secondary IP ranges for alias IP. The per-network limits describe the
maximum number of secondary ranges that you can define for each subnet.
Primary and secondary IP ranges must be RFC 1918 addresses.
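To make this concrete, here is a sketch of how a custom-mode VPC and a subnet with a primary range and two secondary ranges could be created with the gcloud CLI; all names and CIDR blocks below are placeholders:

# Create a custom-mode VPC; no subnets are created automatically.
gcloud compute networks create my-vpc --subnet-mode=custom

# Add a regional subnet with a primary range and two secondary (alias IP) ranges.
gcloud compute networks subnets create my-subnet \
    --network=my-vpc \
    --region=us-central1 \
    --range=10.10.0.0/24 \
    --secondary-range=pods=10.20.0.0/20,services=10.30.0.0/24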
External IP Addresses
The Internet
Resources in another network (VPC)
Services other than Compute Engine (e.g., APIs hosted in other clouds)
In other words, only resources with an external IP address can send and
receive traffic directly to and from outside your VPC.
Internal IP Addresses
Just like for external IP addresses, Compute Engine supports static and
ephemeral internal IP addresses:
Figure 2-17 Two VPCs in the same project connected to the Internet
Another important factor you need to consider during your VPC design is
how you want to enable separation of duties (or concerns).
On one side, a standalone VPC might be a reasonable option for your use
case because your organization is small, with a low growth rate, and you
have a central team that manages the network and the security of all your
workloads.
At the other extreme is a large organization, with a high growth rate and
multiple lines of business (LOBs), which requires a high degree of
separation of duties.
In the next sections, you will learn what design considerations you need to
account for to determine whether to choose a standalone or a shared VPC.
Additionally, you will learn the most common “flavors” of a shared VPC
and what each flavor is best suited for.
Standalone
Start with a single VPC network for resources that have common
requirements.
For many simple use cases, a single VPC network provides the features that
you need while being easier to create, maintain, and understand than the
more complex alternatives. By grouping resources with common
requirements and characteristics into a single VPC network, you begin to
establish the VPC network border as the perimeter for potential issues.
For an example of this configuration, see the single project, single VPC
network reference architecture:
https://cloud.google.com/architecture/best-practices-vpc-design#single-project-single-vpc.
Factors that might lead you to create additional VPC networks include
scale, network security, financial considerations, compliance, operational
requirements, and Identity and Access Management (IAM).
Shared
The Shared VPC model allows you to export subnets from a VPC network
in a host project to other service projects in the same organization or tenant.
With Shared VPC, VMs in the service projects can connect to the shared
subnets of the host project.
Use the Shared VPC model to centralize network administration when
multiple teams work together, for example, the developers of an N-tier
application.
In this scenario, network policy and control for all networking resources are
centralized and easier to manage. Service project departments can configure
and manage their compute resources, enabling a clear separation of
concerns for different teams in the organization.
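At a high level, and ahead of the detailed walk-through in Chapter 3, the Shared VPC model is enabled with a handful of gcloud commands similar to the following sketch; the project IDs, subnet, region, and group are placeholders:

# Designate the host project for Shared VPC.
gcloud compute shared-vpc enable host-project-id

# Attach a service project to the host project.
gcloud compute shared-vpc associated-projects add service-project-id \
    --host-project=host-project-id

# Let a service project team use a specific shared subnet.
gcloud compute networks subnets add-iam-policy-binding my-subnet \
    --region=us-central1 \
    --member="group:dev-team@example.com" \
    --role="roles/compute.networkUser"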
Single and multiple VPC common use cases are provided as reference
architectures as follows.
Your workload business requirements will help you determine how resilient
your network architecture needs to be. Remember, we discussed in the
“High Availability, Failover, and Disaster Recovery Strategies” section how
SLIs and SLOs are derived by your business continuity and availability
requirements and how these metrics (SLIs) and their respective tolerance
thresholds (SLOs) help you shape the most suitable overall network
architecture and the most appropriate selection of services (zonal, regional,
multi-regional).
The good news is that in either scenario, that is, highly resilient
architectures (multi-regional VPCs, with multi-regional services) or less
resilient architectures (multi-zonal, with regional services), Google Cloud
has the right combination of products and services that meet your business
continuity and availability requirements.
Another key factor that makes your life easier in designing VPC instances
is the fact that Google Cloud VPCs are global resources, which by
definition span multiple regions without using the Internet and—most
importantly—without you having to worry about connecting them
somehow (e.g., using IPsec VPN tunnels, Interconnect, or other means).
This is a remarkable advantage, which will simplify your VPC network
design and will reduce implementation costs.
Note: For more information about VPC pricing, refer to the official Google
Cloud page: https://cloud.google.com/vpc/pricing.
VPC Network Peering
VPC network peering allows you to extend the advantages of private (RFC
1918) connectivity beyond the boundaries of a single organization. With
this model, you can design a hub and spoke topology where the hub hosts
common core services in the same VPC, also known as the producer VPC.
Typical core services include authentication, directory services, proxies, and
others, which are cross-cutting in nature and thereby are suitable for
consumption at scale across different enterprises.
For the sake of designing VPC networks, peering provides the unique
advantage of letting you use internal IP address (RFC 1918) connectivity
between peered VPCs. Traffic between peered VPCs stays in the Google
Global Backbone network, that is, it does not traverse the Internet.
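Because peering must be established from both sides, each VPC creates its own peering configuration. A sketch with placeholder project and network names:

# Run in the producer project.
gcloud compute networks peerings create producer-to-consumer \
    --network=producer-vpc \
    --peer-project=consumer-project-id \
    --peer-network=consumer-vpc

# Run in the consumer project; the peering becomes active once both sides exist.
gcloud compute networks peerings create consumer-to-producer \
    --network=consumer-vpc \
    --peer-project=producer-project-id \
    --peer-network=producer-vpc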
Examples
Figure 2-21 illustrates how SaaS and native tenant isolation work.
Note that this wouldn’t have happened had the two consumers been part of
the same tenant (or organization).
SaaS services are deployed in the producer VPCs, which are hosted in the
“blue” organization (Org1).
Since the two consumers have an overlapping CIDR block, the producer
project requires two VPCs. The same VPC cannot be peered with multiple
consumers that use overlapping IP addresses.
The consumer in the “green” tenant (Org3) can consume SaaS services
deployed in a subnet in VPC2.
Likewise, the consumer in the “red” tenant (Org2) can consume SaaS
services deployed in a subnet in VPC1.
The example in Figure 2-23 illustrates how VPC network peering can
be used in conjunction with shared VPC. Two isolated environments (i.e.,
production and non-production) require common shared services, which are
deployed in a producer VPC (VPC3 in Project3) and are peered with the
production VPC and the non-production VPC (consumer VPCs). The
consumer VPCs are two shared VPCs, which expose their subnets to a
number of service projects (for the sake of simplicity, only one service
project has been displayed, but the number can—and should—be greater
than one).
Figure 2-23 Multiple shared, peered VPCs with multiple service projects
Firewalls
VPC scope: By default, firewall rules are applied to the whole VPC
network.
Network tag scope: Filtering can be accomplished by applying
firewall rules to a set of VMs by tagging the VMs with a network tag.
Service account scope: Firewall rules can also be applied to a set of
VMs, which are associated to a specific service account. You will need
to indicate whether or not the service account and the VMs are billed in
the same project, and choose the service account name in the
source/target service account field.
Internal traffic: You can also use firewall rules to control internal
traffic between VMs by defining a set of permitted source machines in
the rule.
Exam tip: You cannot delete the implied rules, but you can override them
with your own rules. Google Cloud always blocks some traffic, regardless
of firewall rules; for more information, see blocked traffic:
https://cloud.google.com/vpc/docs/firewalls#blockedtraffic.
To monitor which firewall rule allowed or denied a particular connection,
see Firewall Rules Logging:
https://cloud.google.com/vpc/docs/firewall-rules-logging.
Example
The example in Figure 2-25 illustrates how two firewall rules can be used to
1. Allow a Global HTTPS load balancer Front End (GFE) to access its backend, represented by a managed instance group with VMs denoted by the web-server network tag
2. Deny the Global HTTPS load balancer Front End (GFE) direct access to the VMs in a database server farm, denoted by the db-server network tag
Likewise, the database instances are billed in their own service project
ServiceProject2, which is also connected to the shared VPC.
The host project handles the network administration of the subnets exposed
to the two service projects. This includes the network security aspects,
which are described by the two aforementioned firewall rules. The firewall
table in the figure is deliberately extended to the width of the shared VPC to
emphasize its global distribution scope, which applies to the entire VPC
network.
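As a sketch of how the two rules above could be expressed with the gcloud CLI (the rule and network names and the port are placeholders; the source ranges are the load balancer and health check ranges documented by Google):

# Allow the Google front ends and health checks to reach the web tier.
gcloud compute firewall-rules create allow-gfe-to-web \
    --network=shared-vpc \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:443 \
    --source-ranges=130.211.0.0/22,35.191.0.0/16 \
    --target-tags=web-server

# Explicitly deny the same sources direct access to the database tier.
gcloud compute firewall-rules create deny-gfe-to-db \
    --network=shared-vpc \
    --direction=INGRESS \
    --action=DENY \
    --rules=tcp \
    --source-ranges=130.211.0.0/22,35.191.0.0/16 \
    --target-tags=db-server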
More details about firewall configurations and how to effectively use them
to secure the perimeter of your VPC will be provided in the next chapters.
Custom Routes
One common problem you need to address when designing your VPC
network instances is how to link them together efficiently and securely.
Finally, for any other scenarios where system-generated or peer routes are
not suitable to meet your workload requirements—for example, hybrid or
multi-cloud workloads—GCP allows you to create your own custom routes.
These can be of type static or dynamic. The former type (static route)
supports a predefined number of destinations and is best suited for simple
network topologies that don’t change very often. The latter type (dynamic
route) leverages a new resource, that is, Cloud Router, which is intended to
add and remove routes automatically in response to changes. Cloud Router
leverages the Border Gateway Protocol (BGP) to exchange routes with a
peer of a BGP session. More information will be provided in the next
chapter.
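For example, a static custom route that sends traffic destined to an on-premises range through a Cloud VPN tunnel could be sketched as follows; the network, tunnel, region, and destination range are placeholders:

# Static custom route with a VPN tunnel as the next hop.
gcloud compute routes create route-to-onprem \
    --network=my-vpc \
    --destination-range=192.168.0.0/16 \
    --next-hop-vpn-tunnel=my-tunnel \
    --next-hop-vpn-tunnel-region=us-central1 \
    --priority=1000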
Within the context of GCP, the term hybrid cloud denotes a setup in which
common or interconnected workloads are deployed across multiple
computing environments, one hosted in a public cloud and at least one
being private in your data center, that is, on-premises.
The term multi-cloud denotes setups that combine at least two public cloud
providers, including potentially private computing environments.
Business Requirements
Development Requirements
From a development standpoint, common requirements and drivers include
Operational Requirements
Architectural Requirements
On the architecture side, the biggest constraints often stem from existing
systems and can include
Overall Goals
So, you’ve done your research using the rationalization process we just
described and determined that some of your workloads—for the time being
—need to operate in a hybrid (or multi-cloud) network. Now what?
Assuming you already have created your organization in GCP, the next step
is to implement your hybrid (or multi-cloud) network topology, and to do
that, you need to decide how your company’s on-premises data center(s)
will connect to GCP.
GCP offers a number of options, and the choice you need to make
depends on
Latency: For example, do you need high availability and low latency
(e.g., < 2 milliseconds) for your workload?
Cost: For example, is cost a priority, or are you willing to pay more
for lower latency, stronger security, and better resilience?
Security: For example, do you need to fulfill security requirements
due to compliance?
Resilience: For example, how resilient does your workload need to
be?
A key factor in this use case is the fact that this application is internal, that
is, all users will access the application from the company’s network (e.g.,
RFC 1918 IP address space) or its extension to GCP. No access from the
Internet or other external networks is allowed.
Another key factor is the fact that you need fast connectivity between your
company’s data center and GCP to effectively leverage the HSM in order to
authenticate users and authorize their access to the requested resource.
Now, imagine the application we used before grows in popularity and needs
to be upgraded to support external users, who will access it from the
Internet. This new use case will require internal access from your
company’s network and external access from the Internet.
We learned that with Cloud Interconnect internal users can access the
application from the company’s network, all of them using RFC 1918 IP
address space. What about external users who need access from the
Internet? This is where peering comes into play.
1. IP transit: Buy services made available by an Internet Service Provider (ISP), which is connected to Google (ASN 15169).
2. Peering: Connect directly to Google (ASN 15169) in one of the Google Edge PoPs around the world.
As you can imagine, peering offers lower latency because you avoid the ISP
mediation. Therefore, if your workload requires high-throughput, low-
latency, non-RFC 1918 connectivity, then peering is your best choice.
Just like Interconnect, peering comes in two flavors, direct and carrier.
Direct peering is best suited when your company already has a footprint in
one of Google’s PoPs, for example, it already uses a Dedicated Interconnect
circuit.
Carrier peering is best suited when your company chooses to let a carrier
manage a peering connection between its network and GCP.
IPsec VPN
Both Interconnect and peering leverage connectivity between your
company’s data centers and a GCP PoP without using the Internet. This is a
great benefit in terms of performance and security. By using a dedicated
communication channel, whether directly or using a carrier, you avoid extra
hops while minimizing the chances of data exfiltration.
However, this comes at the expense of high costs. As of the writing of this
book, a 10 Gbps circuit price is $2.328 per hour using Dedicated
Interconnect. A 100 Gbps circuit price is $18.05 per hour. Additionally, you
need to account for costs related to VLAN (Virtual Local Area Network)
attachments (which is where your traffic exchange link is established), as
well as egress costs from your VPCs to your company’s data centers on-
premises.
If you are looking for a more cost-effective solution for your private-to-
private workloads and are willing to sacrifice performance and security for
cost, then a virtual private network (VPN) IPsec tunnel is the way to go.
GCP offers a service called Cloud VPN, which provides just that. With
Cloud VPN, your on-premises resources can connect to the resources
hosted in your VPC using a tunnel that traverses the Internet. RFC 1918
traffic is encrypted and routed to its RFC 1918 destination via a router on-
premises and a router on GCP.
If you need resilience in addition to cost-effectiveness, GCP offers a highly
available VPN solution, called HA VPN, which comes with 99.99%
uptime.
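Chapter 7 covers Cloud VPN in depth; as a preview, the Google Cloud building blocks of an HA VPN setup look roughly like the following sketch. The names, region, and ASN are placeholders, and the on-premises side still needs its own peer gateway, tunnels, and BGP configuration:

# Create the HA VPN gateway (two interfaces, hence two external IP addresses).
gcloud compute vpn-gateways create ha-vpn-gw \
    --network=my-vpc \
    --region=us-central1

# Create the Cloud Router that will exchange routes over BGP.
gcloud compute routers create vpn-router \
    --network=my-vpc \
    --region=us-central1 \
    --asn=64514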
The decision tree in Figure 2-31 can help you determine the first
selection of GCP network connectivity products that fit your workload
network requirements.
Cloud Router
Cloud Router was briefly presented in the “Custom Routes” section, when
the definition of dynamic route was introduced. In this and the upcoming
sections, we will learn more about this product and why it represents an
integral component of your multi-cloud and hybrid strategy.
First and foremost, let’s summarize a few important concepts we already
know about Virtual Private Cloud (VPC) networks and how a Google Cloud
VPC is different from other public cloud providers’ VPC networks.
VPC Routing
1. By design, it provides internal routing connectivity between all its subnets in any region.
2. Internal connectivity means RFC 1918 IP addressing, that is, traffic stays within the Google Global Backbone and does not traverse the Internet.
3. A VPC lives within one (and one only) project, which represents a container you use for billing, securing, and grouping the Google Cloud resources for your workload.
4. A project may have more than one VPC. In this case, the VPCs within your project are completely disjoint from each other, and their subnets might even have overlapping CIDR ranges, as shown in Figure 2-33. Still, they wouldn’t be able to connect to each other—unless you choose to do so (e.g., using an IPsec VPN tunnel and a NAT to disambiguate the overlapping CIDR ranges).
Figure 2-33 Two VPCs in the same project with overlapping CIDR 10.140.0.0/24
Figure 2-34 Cloud Router in a VPC configured with global dynamic routing
Note: The term “topology” derives from the Greek words topos and logos,
which mean “locus” (i.e., the place where something is situated or occurs)
and “study,” respectively. Therefore, in the context of network engineering,
a topology is a blueprint for networks, whose logical or physical elements
are combined in accordance with a given pattern.
In the next sections, we will review a few hybrid and multi-cloud network
topologies, which will help you execute your workload migration or
modernization strategy. The term “private computing environment” will be
used to denote your company on-premises data centers or another cloud.
Mirrored
Meshed
A meshed topology is the simplest way to extend your network from your
company’s data center(s) into Google Cloud or from another public cloud
provider. The outcome of this topology is a network that encompasses all
your computing environments, whether they be in Google Cloud, on-
premises, or in other clouds.
1. A peering router must be physically installed in a common Cloud Interconnect Point of Presence (PoP) facility and configured with two local address links:
   a. A link to Google Cloud
   b. A link to the desired public cloud provider
2. You (the customer) are responsible for installing, configuring, and maintaining the peering router. A similar design is available for Partner Interconnect connectivity, where the partner manages the router, and the router can be virtualized.
3. Make sure VPC1 and VPC2 have no overlapping CIDRs. Otherwise, a NAT (Network Address Translation) router will be required.
4. A VLAN attachment is a construct that tells your Cloud Router which VPC network can be reached through a BGP session. Once a VLAN attachment is associated to a Cloud Router, it automatically allocates an ID and a BGP peering IP address, which are required for your peering router (in the common Cloud Interconnect PoP facility) to establish a BGP session with it. If your VPC is configured with global dynamic routing, any Cloud Router in the VPC automatically advertises to your peering router (over a BGP session) the VPC subnets that are in other regions. Put differently, this setup (global dynamic routing) is a cost-effective way to interconnect your multi-cloud VPCs because you only need one Cloud Router in your VPC, which leverages the Google Global Backbone to advertise to its counterpart—that is, the peering router—all routes for all subnets located in any region of your VPC. A sketch of the corresponding gcloud commands follows this list.
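A sketch of creating the Cloud Router and a Dedicated Interconnect VLAN attachment with the gcloud CLI; the names, region, ASN, and the interconnect connection are placeholders:

# Cloud Router that will run the BGP session for the attachment.
gcloud compute routers create ic-router \
    --network=vpc1 \
    --region=us-east1 \
    --asn=65001

# VLAN attachment on an existing Dedicated Interconnect connection;
# the attachment allocates the VLAN ID and BGP peering addresses.
gcloud compute interconnects attachments dedicated create vlan-attachment-1 \
    --router=ic-router \
    --region=us-east1 \
    --interconnect=my-interconnect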
Gated Egress
With a gated egress topology, you want to expose APIs to your GCP
workloads, which act as consumers. This is typically achieved by deploying
an API gateway in your private computing environment to act as a façade
for your workloads within.
Figure 2-38 displays this setup, with the API gateway acting as a façade
and placed in the private computing environment, whereas the consumers
are GCP VMs hosted in service projects of a shared VPC.
All traffic uses RFC 1918 IP addresses, and communication from your
private computing environment to GCP (i.e., ingress with respect to GCP)
is not allowed.
Gated Ingress
1. eth0: Connected to a Transit VPC, which receives incoming RFC 1918 traffic from the private computing environment
2. eth1: Connected to a Shared VPC, where your workloads operate
Figure 2-39 Reference architecture for gated ingress topology
All traffic uses RFC 1918 IP addresses, and communication from GCP to
your private computing environment (i.e., egress with respect to GCP) is
not allowed.
This topology lets your workloads take advantage of the gated egress and
gated ingress benefits at the same time. As a result, it is best suited when
your hybrid or multi-cloud workloads need to consume cross-boundary
APIs, but also expose APIs themselves.
Figure 2-40 Reference architecture for gated egress and ingress topology
Handover
Data pipelines are established to load data (in batches or in real time) from
your private compute environment into a Google Cloud Storage bucket or a
Pub/Sub topic. GCP workloads (e.g., Hadoop) process the data and deliver
it in the proper format and channel.
This feature controls the behavior of all the Cloud Routers created in your
VPC, by determining whether or not they should take advantage of the
Google Global Backbone in the way they advertise subnet routes to their
BGP peer counterparts.
Regional: This is the default option. With this setting, all Cloud
Routers in this VPC advertise to their BGP peers subnets from their local
region only and program VMs with the router's best-learned BGP routes
in their local region only.
Global: With this setting, all Cloud Routers in this VPC advertise to
their BGP peers all subnets from all regions and program VMs with the
router's best learned BGP routes in all regions.
You can set the bgp-routing-mode flag at any time, that is, either when
creating or when updating your VPC network.
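For example, assuming a VPC named your-vpc (a placeholder name), the routing mode can be set at creation time or changed later:

# Create a custom-mode VPC with global dynamic routing.
gcloud compute networks create your-vpc \
    --subnet-mode=custom \
    --bgp-routing-mode=global

# Switch an existing VPC back to regional dynamic routing.
gcloud compute networks update your-vpc \
    --bgp-routing-mode=regional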
A product like Cloud Router with its unique ability to advertise subnet
routes across the regions of your VPC empowers you with the right tool to
develop your workload resilience and high availability strategy in a cost-
effective, efficient, and easy-to-implement manner.
Additionally, when you supplement Cloud Router with the proper selection
of hybrid and multi-cloud network connectivity products we reviewed
before, you are well on your way to a resilient, robust, secure, performant, and
scalable network architecture with a differentiated level of sophistication.
In the next two sections, we will present two reference topologies aimed at
helping you achieve failover, disaster recovery, and high availability (HA)
for your workloads.
1.
VPC1 is configured to use global dynamic routing mode.
2.
Each Google Cloud region has its own Cloud VPN instance (defined
in this use case as HA VPN Gateway), that is, CloudVPN1 and
CloudVPN2.
3.
CloudVPN1 and CloudVPN2 have two external IP addresses each,
one per IPsec tunnel.
4.
Each on-premises data center has its own VPN Gateway, that is,
VPNgateway1 and VPNgateway2.
5.
As we will see in the next section, an Inter-region cost is
automatically added for optimal path selection. For the time being,
think of this Inter-region cost as a “toll” you have to pay when your
traffic is routed to another region (or continent). More details will be
provided in Chapters 3 and 7.
Redundant VPC
1.
The four Cloud Router instances R1, R2, R3, R4 are distributed in
two regions.
2.
VPC1 is configured to use global dynamic routing mode.
3.
As a result of point #2, R1, R2, R3, R4 advertise routes for all VPC
subnets, that is, Subnet1 in Region1, Subnet2 in Region2.
4.
R3, R4 advertise routes using Dedicated Interconnect. This is the
primary network connectivity product selected for this reference
topology.
5.
R1, R2 advertise routes using Cloud VPN. This is the secondary
(backup) network connectivity product selected for this reference
topology.
6.
Cloud Router instances are automatically configured to add an Inter-
region cost when they advertise subnet routes for subnets outside of
their region. This value (e.g., 103 in our reference topology) is
automatically added to the advertised route priority—the MED
(Multi-exit Discriminator). The higher the MED, the lower the
advertised route priority. This behavior ensures optimal path selection
when multiple routes are available.
7.
When the on-premises BGP peer routers PeerRouter1 and
PeerRouter2 learn about Subnet2 routes in Region2, they
favor the routes advertised over Dedicated Interconnect rather than VPN,
because those routes carry a lower MED.
Let’s review now how this reference topology provides failover and disaster
recovery for ingress route advertisement.
In this scenario (Figure 2-44), the advertised route priority can be left
equal (e.g., MED=100) for all advertised on-premises subnet routes.
Figure 2-44 Redundant VPC exhibiting ingress route advertisement
By using the same MED, Google Cloud automatically adds that Inter-region
cost (103) when R1, R2 in Region1 program routes in VPC1 whose
destinations are in Subnet2 ranges.
This setup requires that your workload uses at least a VM (or another type
of compute service) with an external IP address. Exposing your workloads
to the Internet always presents, at a minimum, security risks and
unpredictable latency due to the ever-evolving topology of the Internet and
the massive amount of data that traverses its many nodes.
What if you want to mitigate these security risks and latency concerns by
choosing not to use external IP addresses?
This is where Private Google Access and Private Service Connect come
into play.
NoteMost, but not all, Google Cloud services expose their API for internal
access. The list of supported services can be found here:
https://cloud.google.com/vpc/docs/configure-private-service-connect-apis#supported-apis.
Put differently, a VM no longer needs an external IP address in order to
consume Google APIs and services, provided the subnet it belongs to has
been configured to use Private Google Access and a few DNS and route
configurations are made.
Keep in mind that with this setup, the on-premises DNS server has to be
configured to map *.googleapis.com requests to
private.googleapis.com, which resolves to the 199.36.153.8/30 IP
address range.
Figure 2-46 The public DNS A records for private.googleapis.com and restricted.googleapis.com
1.
VM1A, VM1B, VM1C, VM1D can access Bucket1 by consuming the
private.googleapis.com endpoint. This is because they all
share access to subnet1, which is configured to allow Private
Google Access.
2.
VM2B can also access Bucket1 by consuming the
private.googleapis.com endpoint. This time—as you may
have noticed—VM2B shares access to subnet2, which is configured
to deny Private Google Access. However, VM2B can leverage its
external IP address to access Bucket1. Even with its external IP
address, traffic remains within Google Cloud without traversing the
Internet.
3.
All VMs that share access to subnet2 and do not have an external
IP address (i.e., VM2A, VM2C, VM2D) cannot access Bucket1. This
is because subnet2 is configured to deny Private Google Access.
Figure 2-47 shows the previous example updated with a PSC endpoint.
Finally, you must configure your on-premises DNS so that it can make
queries to your private DNS zones. If you've implemented the private DNS
zones using Cloud DNS, then you need to complete the following steps:
1.
In VPC1, create an inbound server policy.
2.
In VPC1, identify the inbound forwarder entry points in the regions
where your Cloud VPN tunnels or Cloud Interconnect attachments
(VLANs) are located.
3.
Configure on-premises DNS name servers to forward the DNS names
for the PSC endpoints to an inbound forwarder entry point in the
same region as the Cloud VPN tunnel or Cloud Interconnect
attachment (VLAN) that connects to VPC1.
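For step 1, a minimal sketch using Cloud DNS (the policy name and description are placeholders of our choosing):

# Create an inbound DNS server policy on VPC1 so that on-premises name
# servers can forward queries to Cloud DNS over VPN or Interconnect.
gcloud dns policies create inbound-from-onprem \
    --description="Inbound forwarding for on-premises resolvers" \
    --networks=VPC1 \
    --enable-inbound-forwarding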
NoteWith PSC, the target service or API does not need to be hosted or
managed by Google. A third-party service provider can be used instead.
This point and the fact that PSC is fully managed make PSC the
recommended way to let workloads consume services and APIs privately.
IP Address Management Across On-Premises Locations and
Cloud
When it comes to using IP addresses with GKE, you will have to address a
supply and demand challenge.
First and foremost, you should use GKE VPC-native clusters instead of
route-based clusters. You learned the basics about GKE VPC-native clusters
in the “Container Networking” section.
gcloud container clusters create CLUSTER_NAME \
    --enable-ip-alias \
    --create-subnetwork name=SUBNET_NAME,range=NODE_IP_RANGE \
    --cluster-ipv4-cidr=POD_IP_RANGE \
    --services-ipv4-cidr=SERVICES_IP_RANGE
A VPC-native GKE cluster uses three unique subnet IP address ranges:
1.
It uses the subnet's primary IP address range, that is,
NODE_IP_RANGE, for all node IP addresses.
2.
It uses one secondary IP address range, that is, POD_IP_RANGE, for
all pod IP addresses.
3.
It uses another secondary IP address range, that is,
SERVICES_IP_RANGE, for all service (cluster IP) addresses.
Figure 2-48 shows how the aforementioned three subnet IP ranges are
allocated with respect to worker nodes, pods, and services.
An early allocation of IP ranges for the pods in your cluster has a twofold
benefit: preventing conflict with other resources in your cluster’s VPC
network and allocating IP addresses efficiently. For this reason, VPC-native
is the default network mode for all clusters in GKE versions 1.21.0-
gke.1500 and later.
So now that you have chosen to use VPC-native clusters, how do you go
about deciding how much IP space your GKE cluster effectively needs?
At the same time, you will also want to mitigate the risk of IP address
exhaustion by making sure whatever masks you choose for your cluster
nodes, pods, and services are not too big for what your workload really
needs.
In CIDR notation, the mask is denoted by the digit after the “/”, that is, /x
where 0 ≤ x ≤ 32.
Assuming you have proper IAM permissions, the preceding code will create
a new subnet your-subnet and a standard (zonal) GKE cluster in
us-west1-a. The cluster is VPC-native because the flag
--enable-ip-alias is set, resulting in two secondary ranges in the
subnet your-subnet, one for the pod IPs and another for the service IPs.
The cluster will use the IP range 10.4.32.0/28 for its worker nodes, the
IP range 10.0.0.0/24 for its pods, and the IP range 10.4.0.0/25 for
its services.
Finally, each worker node will host no more than 16 pods, as illustrated
in Figure 2-49.
Now, let’s say you want to decrease the pod density from 16 to 8. This may
be because your performance metrics showed you don’t actually need 16
pods per node to meet your business and technical requirements. Reducing
the pod density will allow you to make better use of your preallocated IP
range 10.0.0.0/24 for pods.
You can reduce the pod density from 16 to 8 by creating a node pool in
your existing cluster:
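A minimal sketch of such a node pool creation (the cluster, pool, and zone names are placeholders; the flag that matters here is --max-pods-per-node):

gcloud container node-pools create pool-low-density \
    --cluster=your-cluster \
    --zone=us-west1-a \
    --max-pods-per-node=8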
In Figure 2-50, you can visually spot how the reduction of pod density
applies to the newly created node pool.
Figure 2-50 Flexible pod density configuration by adding a node pool
When you introduce node pools to reduce the pod density of an existing
cluster, GKE automatically assigns a smaller CIDR block to each node in
the node pool based on the value of max-pods-per-node. In our
example, GKE has assigned /28 blocks to Node0, Node1, Node2,
Node3.
Exam tipYou can only set the maximum number of pods per node at
cluster creation or after creating a cluster by using a node pool.
In the previous section, you learned how to use the flexible pod density
GKE feature to efficiently allocate RFC 1918 IP addresses for the pods, the
services, and the nodes of your GKE cluster.
Non-RFC 1918
The first way to expand your GKE cluster’s IP ranges is by using non-RFC
1918 reserved ranges.
Figure 2-51 shows this list of non-RFC 1918 ranges.
From a routing perspective, these IP addresses are treated like RFC 1918
addresses (i.e., 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), and subnet
routes for these ranges are exchanged by default over VPC peering.
However, since these addresses are reserved, they are not advertised over
the Internet, and when you use them, the traffic stays within your GKE
cluster and your VPC network.
Privately Used Public IP Addresses
Unlike non-RFC 1918 addresses, subnet routes for privately used public
addresses are not exchanged over VPC peering by default. Put differently,
peer VPC networks must be explicitly configured to import them in order to
use them.
Last, make sure that these IP addresses don’t overlap with any of your VPC
primary and secondary subnet ranges.
By default, you can configure access from the Internet to your cluster's
workloads. Routes are not created automatically.
Think of the control plane as the brain of your GKE cluster, where all
decisions are made to allocate, schedule, and dispose of resources for your
containerized workload (e.g., worker nodes, pods, volumes, etc.).
Conversely, think of the data plane as the body of your GKE cluster, where
the actual work happens by letting the pods in your worker node(s)
compute, process, and store your workload data.
To answer this question, you need to decide who is authorized to access the
control plane and from where. The first concern—the “who”—is a security
concern, which is an important area of a broader topic, that is, Identity and
Access Management (IAM), and will not be covered in this book. This
book focuses on the networking aspects of your GCP workload. As a result,
this section will cover the “where,” which is the location of the user
requesting access to the control plane, and how the request reaches its
destination and is eventually fulfilled.
Private GKE clusters expose their control plane using a public and a private
endpoint. This feature, combined with the fact that all worker nodes do not
have external IPs, differentiates private clusters from other GKE clusters.
There are three ways you can access a private cluster endpoint. Let’s review
them.
This is the most secure private cluster configuration, where the cluster
public endpoint is deliberately disabled to protect its control plane from
requests originating from the Internet. Figure 2-52 illustrates this
configuration.
Notice in the figure the only entry point to the cluster control plane is its
private endpoint, which is implemented by an internal TCP/UDP network
load balancer in the control plane's VPC network.
By default, any worker node, pod, or service in the cluster has access to the
private endpoint. In fact, for private clusters, this is the only way worker
nodes, pods, and services access the control plane.
Since the GKE cluster is private and its control plane public endpoint has
been disabled, the only CIDR blocks you can specify as values of the --
master-authorized-networks flag are RFC 1918 ranges.
This configuration is best suited when your GKE workload requires the
highest level of restricted access to the control plane by blocking any access
from the Internet.
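As a hedged sketch, a private cluster with its public endpoint disabled might be created along these lines (the cluster name, zone, and CIDR ranges below are illustrative placeholders):

gcloud container clusters create private-cluster \
    --zone=us-west1-a \
    --enable-ip-alias \
    --enable-private-nodes \
    --enable-private-endpoint \
    --master-ipv4-cidr=172.16.0.32/28 \
    --enable-master-authorized-networks \
    --master-authorized-networks=10.0.0.0/24
# With --enable-private-endpoint, only the RFC 1918 ranges listed in
# --master-authorized-networks can reach the control plane.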
There are scenarios where you want to access your private cluster control
plane from the Internet, yet in a restrained manner.
Figure 2-53 GKE private cluster with limited public endpoint access
Subsequently, you leverage the --master-authorized-networks
flag to selectively list the CIDR blocks authorized to access your cluster
control plane.
Figure 2-53 illustrates this configuration: the non-RFC 1918 CIDR block
198.37.218.0/24 is authorized to access the control plane, whereas the
client 198.37.200.103 is denied access.
This is a good choice if you need to administer your private cluster from
source networks that are not connected to your cluster's VPC network using
Cloud Interconnect or Cloud VPN.
This is the least secure configuration in that access to the GKE cluster’s
control plane is allowed from any IP address (0.0.0.0/0) through the HTTPS
protocol.
Figure 2-54 GKE private cluster with unlimited public endpoint access
Summary
This chapter walked you through the important things you need to consider
when designing, planning, and prototyping a network in Google Cloud.
The key drivers that shape the design of the overall network architecture
were introduced, that is, high availability, resilience, performance, security,
and cost. You learned how tweaking your workload’s nonfunctional
requirements has an impact on the resulting network architecture.
Whether your workloads are cloud-native (born in the cloud), or they are
being migrated from your company on-premises data centers (hybrid), or
even from other clouds (multi-cloud), you learned how Google Cloud
provides products, services, and reference architectures to meet your needs.
Last, you learned how Google Cloud, as the creator of Kubernetes, offers a
diverse set of unique features that help you choose the best network
capabilities for your containerized workloads, including VPC-native
clusters, container-native load balancing, flexible pod density, and private
clusters.
In the next chapter, we will deep dive into VPC networks and introduce the
tools you need to build the VPCs for your workloads.
Exam Questions
Your company just moved to GCP. You configured separate VPC networks
for the Finance and Sales departments. Finance needs access to some
resources that are part of the Sales VPC. You want to allow the private RFC
1918 address space traffic to flow between Sales and Finance VPCs without
any additional cost and without compromising the security or performance.
What should you do?
A.
Create a VPN tunnel between the two VPCs.
B.
Configure VPC peering between the two VPCs.
C.
Add a route on both VPCs to route traffic over the Internet.
D.
Create an Interconnect connection to access the resources.
Rationale
A is not correct because VPN will hinder the performance and will add
additional cost.
B is CORRECT because VPC network peering allows traffic to flow
between two VPC networks over private RFC 1918 address space
without compromising the security or performance at no additional
cost.
C is not correct because RFC 1918 is a private address space and cannot
be routed via the public Internet.
D is not correct because Interconnect will cost a lot more to do the same
work.
You are configuring a hybrid cloud topology for your organization. You are
using Cloud VPN and Cloud Router to establish connectivity to your on-
premises environment. You need to transfer data from on-premises to a
Cloud Storage bucket and to BigQuery. Your organization has a strict
security policy that mandates the use of VPN for communication to the
cloud. You want to follow Google-recommended practices. What should
you do?
A.
Create an instance in your VPC with Private Google Access enabled.
Transfer data using your VPN connection to the instance in your
VPC. Use gsutil cp files gs://bucketname and bq --
location=[LOCATION] load --source_format=
[FORMAT] [DATASET].[TABLE] [PATH_TO_SOURCE]
[SCHEMA] on the instance to transfer data to Cloud Storage and
BigQuery.
B. Use nslookup -q=TXT spf.google.com to obtain the API
IP endpoints used for Cloud Storage and BigQuery from Google’s
netblock. Configure Cloud Router to advertise these netblocks to
your on-premises router using a flexible routing advertisement. Use
gsutil cp files gs://bucketname and bq --
location=[LOCATION] load --source_format=
[FORMAT] [DATASET].[TABLE] [PATH_TO_SOURCE]
[SCHEMA] on-premises to transfer data to Cloud Storage and
BigQuery.
C.
Configure Cloud Router (in your GCP project) to advertise
199.36.153.4/30 to your on-premises router using a flexible
routing advertisement (BGP). Modify your on-premises DNS server
CNAME entry from *.googleapis.com to
restricted.googleapis.com. Use gsutil cp files
gs://bucketname and bq --location=[LOCATION]
load --source_format=[FORMAT] [DATASET].
[TABLE] [PATH_TO_SOURCE] [SCHEMA] on-premises to
transfer data to Cloud Storage and BigQuery.
D.
Use gsutil cp files gs://bucketname and bq --
location=[LOCATION] load --source_format=
[FORMAT] [DATASET].[TABLE] [PATH_TO_SOURCE]
[SCHEMA] on-premises to transfer data to Cloud Storage and
BigQuery.
Rationale
A is not correct because it adds additional operational complexity and
introduces a single point of failure (Instance) to transfer data. This is not
Google-recommended practice for on-premises private API access.
B is not correct because these netblocks can change, and there is no
guarantee these APIs will not move to different netblocks.
C is CORRECT because it enables on-premises Private Google
Access, allowing VPN and Interconnect customers to reach APIs
such as BigQuery and Google Cloud Storage natively across an
Interconnect/VPN connection. The CIDR block 199.36.153.4/30 is
obtained when you try to resolve
restricted.googleapis.com. You need this CIDR block when
adding a custom static route to enable access to Google-managed
services that VPC Service Controls supports. Google Cloud Storage
and BigQuery APIs are eligible services to secure the VPC perimeter
using VPC Service Controls. Therefore, the CNAME type DNS
records should resolve to restricted.googleapis.com.
D is not correct because it will utilize an available Internet link to
transfer data (if there is one). This will not satisfy the security
requirement of using the VPN connection to the cloud.
In the previous chapter, you learned about Virtual Private Cloud networks
(VPCs) and how to design one or more VPCs from a set of business and
technical requirements for your workload. We showed how every detail of
your VPC design and connectivity traces back to one or more requirements.
At this point, the overall network architecture and topology should have
been designed. With reference to the painting analogy, you should have
your canvas, your shapes, and colors selected.
In this chapter, we will take a step further and teach you how to build
your VPCs and, most importantly, how to establish connectivity among
your VPCs and other networks, in accordance with the topologies
developed in Chapter 2.
Configuring VPCs
As your workload gets more hits, incoming traffic increases, and your VPC
may need to gradually expand to more zones and eventually to other
regions in order for your workload to perform and be resilient.
For example, you can extend the IP address range of a subnet anytime. You
can even add a new subnet after a VPC has been created. When you add a
new subnet, you specify a name, a region, and at least a primary IP address
range according to the subnet rules.
There are some constraints each subnet must satisfy, for example, no
overlapping IP address ranges within the same VPC, unique names if the
subnets are in the same region and project, and others we will review later.
The idea is that you can scale and extend your VPC with few configuration
changes and, most importantly, without the need to recreate it.
There are a number of ways to build and configure VPCs. In this book, you
will learn how to build and configure the VPCs for your workloads using
the gcloud command-line interface (CLI).
You may wonder why we chose to use the gcloud CLI when there are
many other options to build and configure VPCs (and other infrastructure)
like Terraform and other tools, including the console.
The main reason is because you are expected to know the gcloud CLI to
pass the exam, and not necessarily Terraform or other tools.
Another reason is because the gcloud CLI is easy to use and comes free
with your (or your company’s) GCP account. It simply runs using a Cloud
Shell terminal from a modern browser upon sign-in to the console or from a
terminal session in your local machine. The latter requires you to install the
gcloud CLI.
NoteTo learn how to install the gcloud CLI on your machine, see
https://cloud.google.com/sdk/gcloud#download_and_install_the.
Last, the gcloud command line is less likely to change than other
tools—and typically, changes are backward compatible.
Figure 3-1 shows how to check the gcloud CLI version. Google
maintains the gcloud CLI and all the software available when you use a
Cloud Shell terminal.
Figure 3-1 Using the gcloud CLI from the Google Cloud Shell
All gcloud commands follow this informal syntax:
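Roughly speaking, a command consists of an optional release track, one or more command groups, the command itself, positional arguments, and flags, along these lines:

gcloud [alpha|beta] GROUP [SUBGROUP ...] COMMAND [POSITIONAL_ARGUMENTS] [--FLAGS]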
In the upcoming sections, you will learn how to create a VPC and how to
configure some of its key components, that is, subnets, firewall rules, and
routes.
Creating VPCs
In Google Cloud, VPCs come in two flavors: auto mode (default) and
custom mode.
Auto-mode VPCs automatically create a subnet in each region for you. You
don’t even have to worry about assigning an IP range to each subnet. GCP
will automatically assign an RFC 1918 IP range from a predefined pool of
RFC 1918 IP addresses.
If you want more control in the selection of subnets for your VPC and their
IP ranges, then use custom-mode VPCs. With custom-mode VPCs, you first
create your VPC, and then you manually add subnets to it, as documented in
the following sections.
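For example, a custom-mode VPC (the name your-first-vpc matches the example used later in this chapter) might be created as follows; an auto-mode VPC simply swaps the --subnet-mode value:

# Custom-mode VPC: no subnets are created automatically.
gcloud compute networks create your-first-vpc \
    --subnet-mode=custom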
Creating Subnets
As shown in Figure 3-3, to create a subnet you must specify at a minimum
the VPCs the subnet is a part of—you learned in Chapter 2 that subnets are
partitions of a VPC—as well as the primary IP range in CIDR notation.
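A minimal sketch, assuming the subnet lives in your-first-vpc in us-east1 (the region is an assumption for illustration) and uses the 192.168.0.0/27 primary range referenced in the exam tip that follows:

gcloud compute networks subnets create your-subnet \
    --network=your-first-vpc \
    --region=us-east1 \
    --range=192.168.0.0/27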
Exam tipYou cannot use the first two and the last two IP addresses in a
subnet’s primary IPv4 range. This is because these four IP addresses are
reserved by Google for internal use. In our preceding example, you cannot
use 192.168.0.0 (network), 192.168.0.1 (default gateway),
192.168.0.30 (reserved by Google for future use), and
192.168.0.31 (broadcast). The same constraint does not apply to
secondary IP ranges of a subnet (see the next paragraph for more
information about secondary ranges).
In addition to a primary IP range, a subnet can optionally be assigned up to
two secondary IP ranges. You learned in Chapter 2 that the first and the
second secondary IP ranges are used by GKE to assign IP addresses to its
pods and its services, respectively. More information about secondary IP
ranges will be provided in the upcoming sections.
Listing Subnets
You can list the subnets of your VPC as displayed in Figure 3-4.
Listing VPCs
Likewise, you can list the VPCs in your GCP project as shown in Figure 3-
5.
Figure 3-5 Listing the VPCs in your GCP project
Notice the list shows your-first-vpc and the default VPC, which has
one subnet in each region.
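Both listings can be produced along these lines (the filter expression is just one of several ways to restrict the output):

# List all VPCs in the current project.
gcloud compute networks list

# List only the subnets that belong to your-first-vpc.
gcloud compute networks subnets list --filter="network:your-first-vpc"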
Deleting VPCs
If you want to delete a VPC, you need to make sure any resource that uses
the VPC has been deleted first.
1)
Lower latency and higher security: Traffic between two peered
VPCs is encrypted by default and always remains in the Google
Global Backbone—without traversing the Internet.
2)
Lower cost: Since two peered VPCs use internal IP addressing to
communicate with each other, egress costs are lower than with external IP
addresses or VPN, both of which rely on connectivity to the Internet.
In the examples in Figures 3-8 to 3-11, you will create two VPCs with
two subnets each. You will then create two VMs, one in each VPC. Finally,
you will peer the two VPCs and verify that the two VMs can communicate
with each other.
Figure 3-8 Creating the first VPC and the first subnet
NoteUnlike default VPCs, custom VPCs require that you explicitly add
firewall rules to allow ssh (Secure Shell) or rdp (Remote Desktop
Protocol) access to the VPC.
Figure 3-9 Creating the second subnet in the first VPC
Figure 3-13 shows the current setup. The other three VMs shown in
each subnet are for illustrative purposes to emphasize the internal routing
capability of a VPC.
Figure 3-13 A project containing two VPCs with eight VMs in each VPC
Figure 3-14 illustrates the creation of the two firewall rules. Notice the
direction (ingress or egress) is omitted because ingress is the
default value. Also, as you will learn later in this chapter, firewall rules
apply to the entire VPC.
Figure 3-14 Enabling ssh and ICMP (Internet Control Message Protocol) to the two VPCs
Let’s now log in to vm1 and test connectivity to vm2. As you can see in
Figure 3-15, the ping command will eventually time out because the two
VPCs where vm1 and vm2 reside are completely disjointed.
Figure 3-15 No connectivity exists between the two VMs.
Now, let’s peer the two VPCs as shown in Figures 3-16 and 3-17.
Vice versa, since the peering relation is symmetrical, we need to peer vpc2
to vpc1.
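A sketch of the two peering commands (the peering names are placeholders):

# From vpc1, request peering with vpc2.
gcloud compute networks peerings create peer-vpc1-to-vpc2 \
    --network=vpc1 \
    --peer-network=vpc2

# From vpc2, request peering with vpc1; the peering becomes ACTIVE
# only after both sides have been created.
gcloud compute networks peerings create peer-vpc2-to-vpc1 \
    --network=vpc2 \
    --peer-network=vpc1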
Once the peering has been established, vm1 can ping vm2 and vice
versa, as you can see in Figure 3-18.
A project is a container that you use to group and administer the Google
Cloud resources for your workload, for example, VMs, VPCs, principals,
storage buckets, GKE clusters, cloud SQL databases, and many others. The
resources can belong to any service type, for example, compute, network,
storage, databases, IAM, etc. In this context, a project is a construct
intended to enforce boundaries between resources.
In order to create a shared VPC, your GCP account must exist in the context
of an organization resource. Once an organization is available, you can
create a shared VPC in a project called the host project.
You then share this VPC with other projects in your organization called
service projects. Within a service project, you can create VMs (and other
compute resources) and connect them to some or all the subnets of the
Shared VPC you created in the host project. Since the VMs are created in
the service project, the billing account associated with the service project pays
for the VMs and the other compute resources connected to the shared
subnets.
This construct host-service project has the key benefit of letting you scale
your organization to thousands of cloud developers by centrally
administering your VPCs. This construct also furthers separation of
concerns because all developers in your organization will be focusing on
the development of their applications—within the boundaries of their own
GCP service project—while all network engineers will be focusing on the
administration of shared VPCs, within the boundaries of their GCP host
project.
In this section, you will learn how to provision a shared VPC with two
subnets. You will configure the first subnet subnet-frontend to be
accessed by user joseph@dariokart.com and the second subnet
subnet-backend to be accessed by user
samuele@dariokart.com.
Figures 3-24 and 3-25 illustrate the IAM allow policy setup for principals
joseph@dariokart.com and samuele@dariokart.com, respectively.
So far, we have assigned the necessary IAM roles to the three principals.
Next, we need to create the actual VPC and its two subnets (Figures 3-26 to
3-28).
The next step is to create the two service projects (Figure 3-30).
Figure 3-30 Creating the two service projects frontend-devs and backend-devs
Make sure each of the two newly created projects is linked to a billing
account. Remember that a project can only be linked to one billing account.
Also, remember that a billing account pays for a project, which owns
Google Cloud resources.
Figure 3-31 shows how to link the two newly created service projects to
a billing account. Notice how the project IDs (frontend-devs-7734,
backend-devs-7736) and not the project names (frontend-devs,
backend-devs) are required.
Also, the billing account ID has been redacted, given the sensitive nature
of this data.
In order to establish a shared VPC, host and service projects must have the
compute API enabled. Figure 3-32 shows you how to enable it.
Figure 3-32 Enabling the compute API to service and host projects
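A sketch of the commands, using the service project IDs from this example and a placeholder (HOST_PROJECT_ID) for the host project ID:

gcloud services enable compute.googleapis.com --project=frontend-devs-7734
gcloud services enable compute.googleapis.com --project=backend-devs-7736
gcloud services enable compute.googleapis.com --project=HOST_PROJECT_ID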
Let’s make sure the newly enabled host project is listed as such in our
organization (Figure 3-34).
Now that you have enabled your host project, you will need to attach the
two newly created service projects, frontend-devs sharing the
subnet-frontend subnet and backend-devs sharing the subnet-
backend subnet.
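A sketch of the attachment commands (HOST_PROJECT_ID is again a placeholder for the host project's ID):

gcloud compute shared-vpc associated-projects add frontend-devs-7734 \
    --host-project=HOST_PROJECT_ID
gcloud compute shared-vpc associated-projects add backend-devs-7736 \
    --host-project=HOST_PROJECT_ID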
The intent of this use case is to show you how to configure a shared VPC
with two subnets that are essentially mutually exclusive. As a result,
principals who have permissions to create compute resources in the
subnet-frontend subnet will not be able to create compute resources
in the subnet-backend subnet and vice versa.
As per the use case, we are going to configure two IAM allow policies for
the two subnets as follows:
First, we need to retrieve the current IAM allow policy for subnet-
frontend as displayed in Figure 3-37.
Therefore, we are going to add a new role binding that maps the principal
joseph@dariokart.com to the IAM role
roles/compute.networkUser. This role allows service owners to
create VMs in a subnet of a shared VPC as you will see shortly.
Last, let’s apply this IAM allow policy to our resource. This can be done
by using the gcloud beta compute networks subnets set-
iam-policy as illustrated in Figure 3-40.
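A sketch of this get, edit, set flow, assuming subnet-frontend lives in us-east1 of the host project (the region and HOST_PROJECT_ID are assumptions for illustration):

# Download the current policy, edit it to add the binding
# joseph@dariokart.com -> roles/compute.networkUser, then re-apply it.
gcloud beta compute networks subnets get-iam-policy subnet-frontend \
    --region=us-east1 --project=HOST_PROJECT_ID \
    --format=json > subnet-frontend-policy.json

gcloud beta compute networks subnets set-iam-policy subnet-frontend \
    subnet-frontend-policy.json \
    --region=us-east1 --project=HOST_PROJECT_ID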
For the sake of simplicity, we will first list the subnets each principal can
use, and then we will create a VM in each subnet. Finally, we will
demonstrate that the two VMs can effectively communicate with each other,
even though they are managed and billed separately, that is, they are owned
by two different (service) projects. The ability of the two VMs to
communicate is provided by design because they belong to different subnets
of the same VPC, and internal routing is provided by default.
Creating VMs
Next, let’s create two VMs: the first in subnet-frontend and the
second in subnet-backend. Figures 3-48 and 3-49 display the VM
creation.
Finally, let’s connect via the SSH protocol to each VM and verify the two
VMs can connect to each other.
Figure 3-51 shows the same process to determine the internal IP address
of vm2.
As you can see, the connectivity is successful. Let’s repeat the same test to
validate connectivity from vm2 to vm1.
As shown in Figure 3-53, even though there is 20% packet loss, the
test is still successful because the ping command was interrupted after just a
few seconds.
Deleting VMs
In order to avoid incurring unnecessary costs, if you no longer need the two
VMs we just created it is always a good idea to delete them. This will keep
your cloud cost under control and will reinforce the concept that
infrastructure in the cloud is ephemeral by nature.
Figures 3-54 and 3-55 display how to delete vm1 and vm2, respectively.
Exam tipNotice how gcloud asked in which zone the VMs are located.
Remember, as you learned in Chapter 2, VMs are zonal resources.
Each service project is linked to its own billing account, which pays for
the resources consumed by that service project. This feature furthers the
level of decoupling among resources.
It is not required that each service project shares the same billing account.
In fact, it is best practice that each service project be linked to its own,
separate billing account. This way—for example—costs incurred by
frontend developers are separated from costs incurred by backend
developers.
Projects are not the only way to administer a shared VPC. You can also
administer a Shared VPC using folders.
For example, if you want to isolate your billing pipeline VPC from your
web-app VPC, you can do so by creating two separate folders, each
containing a host project.
A separate shared VPC administrator for each host project can then set up
the respective Shared VPCs and associate service projects to each host
project.
1.
The all-apis bundle provides access to the same APIs as
private.googleapis.com.
2.
The vpc-sc bundle provides access to the same APIs as
restricted.googleapis.com.
The rationale for choosing between the two API bundles is based on the
level of security needed to protect your workloads. If your security
requirements mandate that you protect your workload from data
exfiltration, then your workload will need to consume the vpc-sc bundle.
In most of the remaining use cases, the all-apis bundle will suffice.
This section will teach you what you need to do to let compute resources in
your VPC (e.g., VMs, GKE clusters, serverless functions, etc.) consume the
Google APIs.
Private Google Access is a means to let VMs (or other compute resources)
in your VPC consume the public endpoint of Google APIs without
requiring an external IP address, that is, without traversing the Internet.
To enable PGA, make sure that the user updating the subnet has the
permission
compute.subnetworks.setPrivateIpGoogleAccess. In our
dariokart.com organization, user Gianni needs this permission, and the
role roles/compute.networkAdmin contains this permission. In
Figure 3-57, we update the IAM allow policy attached to the organization,
with a binding that maps the user Gianni to the
roles/compute.networkAdmin.
Next, as shown in Figure 3-58, we update the subnet that requires PGA
and validate the change.
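A sketch of the update and the validation, assuming subnet-frontend is in us-east1 (an assumption for illustration):

gcloud compute networks subnets update subnet-frontend \
    --region=us-east1 \
    --enable-private-ip-google-access

# Validate the change.
gcloud compute networks subnets describe subnet-frontend \
    --region=us-east1 \
    --format="get(privateIpGoogleAccess)"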
Private Service Connect is another way for VMs (or other compute
resources) running in your VPC to consume Google APIs.
The idea is to allow your VMs to consume Google APIs using only RFC
1918 connectivity.
1.
Avoid using the API public endpoint, for example,
https://storage.googleapis.com.
2.
Allow a direct, private link from an endpoint in your VPC to the
target endpoint of the API your VM needs to consume.
Exam tipThe key difference between PSC and PGA is that Private Google
Access still uses external IP addresses. It allows access to the external IP
addresses used by App Engine and other eligible APIs and services. PSC
lets you access Google APIs via internal IP addresses instead, always
keeping traffic in the Google Global Backbone.
Let’s configure a PSC endpoint in our shared VPC to provide VMs with
access to the all-apis bundle.
First and foremost, you need to make sure your network administrator
has the following roles:
roles/servicedirectory.editor
roles/dns.admin
To discover the exact name of the Service Directory API and the Cloud
DNS API, use the gcloud command in Figure 3-61.
Figure 3-61 Resolving Service Directory API and the Cloud DNS API names
Now, we are ready to enable the two required APIs in the project
(Figure 3-62).
Figure 3-62 Enabling the Service Directory API and the Cloud DNS API
Another prerequisite to enable PSC is that the subnet from which the VMs
will consume the all-apis bundle must have PGA enabled. We already
enabled PGA for subnet-frontend in the previous section.
Now, log in as the Shared VPC administrator, and follow the steps to
create a private IP address (Figure 3-63) and a PSC endpoint (Figure 3-64).
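As a sketch of these two steps (the address value, the resource names, and HOST_PROJECT_ID are illustrative assumptions):

# Reserve a global internal IP address for the PSC endpoint.
gcloud compute addresses create psc-all-apis-ip \
    --global \
    --purpose=PRIVATE_SERVICE_CONNECT \
    --network=your-app-shared-vpc \
    --addresses=10.255.255.254 \
    --project=HOST_PROJECT_ID

# Create the PSC endpoint (a global forwarding rule) that targets the
# all-apis bundle.
gcloud compute forwarding-rules create pscallapis \
    --global \
    --network=your-app-shared-vpc \
    --address=psc-all-apis-ip \
    --target-google-apis-bundle=all-apis \
    --project=HOST_PROJECT_ID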
Next, in Figure 3-69, we list the content of our bucket using the
gsutil command from vm2.
As you can see, the command returned the newly created file a.txt. This
confirms that the VM can consume the GCP storage API internally by using
a Private Service Connect endpoint.
VPCs are designed for extensibility. When you design the overall network
architecture for your workloads, you have to make a number of choices, one
of them being how many IP addresses are needed for each component of
your workload. Obviously, this number is an estimate, yet this figure has to
take into consideration the rate of scale for each architectural component of
your workload, for example, frontend web servers, backend servers,
database servers, and others.
The good news is that you can expand the subnets of your VPC if you
discover that the mask you used for your subnet IP ranges is too small.
However, there are a few caveats you need to be aware of in these
scenarios.
First, you can only expand the primary IP range of a subnet, and this
operation cannot be undone, that is, you cannot shrink the subnet primary
IP range once it’s been defined.
Second, the primary IP range of a subnet that is used exclusively for load
balancer proxies cannot be expanded. You will learn more about this
constraint in Chapter 5.
Last, if your VPC is auto-mode, there is a limit of /16 for the mask you are
allowed to use as the broadest prefix.
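A sketch of expanding a subnet's primary range (the subnet name, region, and new prefix length below are placeholders):

gcloud compute networks subnets expand-ip-range your-subnet \
    --region=us-east1 \
    --prefix-length=23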
Configuring Routing
Each VM has a controller that is kept informed of all applicable routes from
the VPC’s routing table. Each packet leaving a VM is delivered to the
appropriate next hop of an applicable route based on a routing order. When
you add or delete a route, the set of changes is propagated to the VM
controllers by using an eventually consistent design.
Static Routes
Another required flag is the next hop, which can be one, and one only,
of the following flags:
The relevant optional flags you need to know for the exam are
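For illustration, a custom static route might look like the following sketch (the name, destination range, tag, and priority are placeholders, and --next-hop-gateway is just one of the possible next-hop flags):

gcloud compute routes create internet-egress \
    --network=your-app-shared-vpc \
    --destination-range=0.0.0.0/0 \
    --next-hop-gateway=default-internet-gateway \
    --priority=1000 \
    --tags=server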
Dynamic Routes
Custom dynamic routes are managed by one or more cloud routers in your
VPC. You no longer have to create routes using the gcloud command
because a cloud router does the job for you.
Dedicated Interconnect
Partner Interconnect
HA VPN
Classic VPN with dynamic routing
You may be thinking that by using dynamic routing instead of static routing,
we just moved the problem to the cloud router. After all, we still need to
create a cloud router responsible for advertising newly defined routes to its
peer. Conversely, we need to create a cloud router responsible for learning
newly defined routes from its peer.
Put differently, whether you choose to create custom static routes or you
choose to create a cloud router that does the job for you, you still need to
create something manually, for example, using gcloud. So, what’s the
advantage of using dynamic routing vs. static routing?
In conclusion, static routes are great for small networks, which are not
often subject to change. For large enterprises, using static routes is simply
not a sustainable solution. That’s when a cloud router “shines.” Let’s see
how you create a cloud router (Figures 3-78 and 3-79).
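A minimal sketch of a Cloud Router creation (the name, region, and ASN below are placeholders; the ASN must be a private ASN, for example, one in the 64512 to 65534 range):

gcloud compute routers create your-cloud-router \
    --network=your-app-shared-vpc \
    --region=us-east1 \
    --asn=65001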
The optional flags you need to know for the exam are
Routing Order
A route is a rule that specifies how certain packets should be handled by the
VPC. Routes are associated with VMs by tag, and the set of routes for a
particular VM is called its routing table. For each packet leaving a VM, the
system searches that VM’s routing table for a single best matching route.
Routes match packets by destination IP address, preferring smaller or more
specific ranges over larger ones (see --destination-range). If there
is a tie, the system selects the route with the smallest priority value (see --
priority). If there is still a tie, it uses the layer 3 and 4 packet headers to
select just one of the remaining matching routes.
For dynamic routes, the next hop will be determined by the cloud router
route advertisement.
Packets that do not match any route in the sending virtual machine routing
table will be dropped.
The behavior of your cloud router depends on whether the VPC it’s
associated with has been created with regional (default) or global BGP
routing mode.
Let’s review this setting for our shared VPC (Figure 3-80).
Figure 3-80 BGP routing mode default assignment in your-app-shared-vpc
As highlighted, our shared VPC uses the default value REGIONAL. This
means that any cloud router associated with our shared VPC—remember,
routing is a network-wide configuration—advertises subnets and propagates
learned routes to all VMs in the same region where the router is configured.
If you want your cloud router to advertise subnets and propagate learned
routes to VMs in all subnets of your VPC—regardless of which region they
belong to—then set the --bgp-routing-mode flag to GLOBAL, as
indicated in Figure 3-81.
Figure 3-81 Updating BGP routing mode flag in your-app-shared-vpc
Let’s visually explain this concept with the diagram in Figure 3-82.
Figure 3-82 Route exchange between two VPCs configured with bgp-routing-mode set to REGIONAL
Behind the scenes, there are two Cloud VPN gateways, one in each VPC
network, which are responsible for enabling the secure IPsec channels.
What is relevant to this diagram are the two cloud routers, which are
regional resources—both in us-east1—and are responsible for
establishing a BGP session in which subnet and custom routes are being
exchanged.
Let’s focus on the two regional route tables now: one for region us-east1
and another for region us-central1. I intentionally chose to group
routes by region because it’s important that you relate the concept of a route
to a physical region. After all, packets of data are being exchanged across a
medium, which at the physical layer of the OSI model (layer 1) is
implemented by interconnecting physical components, such as network
interface controllers (NICs), Ethernet hubs, network switches, and many
others.
As you can see, the first row in the us-east1 regional table shows a
subnet route advertised by router-a to router-b.
In addition to the network, destination (prefix), and next hop, the cloud
routers include the priority for the advertised route.
You control the advertised priority by defining a base priority for the
prefixes. We will show you how to set up this value in the upcoming section
“Updating the Base Priority for Advertised Routes.” For the time being, we
are using the base priority default value of 100.
NoteThe region for both advertised subnet routes matches the region of the
cloud routers, that is, us-east1.
On the other hand, the us-central1 regional table shows no routes for
any of the two VPCs. This is because the BGP routing mode setting to
REGIONAL in both VPCs has caused the two cloud routers to only
exchange with each other’s routes related to their own region, that is, us-
east1. As a result, VMs in 192.168.9.0/27 cannot reach any VMs in
192.168.0.0/27 or 192.168.1.0/27. Likewise, VMs in
192.168.1.0/27 cannot reach any VMs in 192.168.8.0/27 or
192.168.9.0/27.
Now let’s update both VPCs by setting the BGP routing mode to
GLOBAL. With this simple change as illustrated in Figure 3-83, it’s a very
different story. Let’s see why.
Figure 3-83 Route exchange between two VPCs configured with bgp-routing-mode set to GLOBAL
First, the two us-east1 regional routes we reviewed before are still there
(rows 1 and 3), but this time router-a and router-b are no longer
limited to advertising routes from their own region. Instead, a VPC
configured to use bgp-routing-mode set to GLOBAL always tells its
cloud routers to advertise routes (system-generated, custom, peering) from
any region spanned by the VPC.
You’ll probably wonder why even bother using the default value
REGIONAL of bgp-routing-mode for a VPC if the number of routes
advertised is limited to a single region.
Here is the caveat, and this is why I wanted to reiterate the fact that a cloud
router is a regional resource to denote that it is always tied to a region.
Inter-region routes add latency and egress charges to traffic leaving the
region through the advertised route. As a result, an inter-region cost is
added to the base priority of the route.
You learned so far that a cloud router associated with a VPC configured
with BGP routing mode set to GLOBAL advertises routes in any regional
table the VPC spans across—not just the region where the cloud router
operates.
Unlike custom static routes, which are created by Google in all regional
tables with the same priority, custom dynamic routes automatically add into
consideration the inter-region cost when they are advertised by a cloud
router across regions.
The only required argument is the name of your cloud router, that is, NAME.
If you don’t specify a region using the --region flag, Google will try to
determine the region where the router you are inquiring about operates.
Assuming a cloud router in the specified region exists, the output of this
command includes the following three sections:
In the previous example, you learned the effect of toggling the --bgp-
routing-mode flag in a VPC. We mentioned that you control the value
of the base priority assigned during a BGP session between a cloud router
and its peer. Keep in mind that the value you choose applies to all routes
advertised during a BGP session.
You need to be mindful of how you choose a suitable value for route base
priority because this value has an impact on how dynamic routes are chosen
during a conflict, for example, when multiple routes with the same
destination are available.
Here are a few guidelines on what to consider when setting a value B for
advertised route base priority:
With this in mind, you can set the base priority for advertised routes
with the gcloud compute routers update-bgp-peer command
(Figures 3-86 and 3-87).
At a minimum, you need to specify the name of your cloud router NAME
and the name of its BGP peer PEER_NAME and substitute the highlighted
ADVERTISED_ROUTE_PRIORITY with the value B we just discussed.
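A sketch of the command shape (the router name, peer name, and region are placeholders):

gcloud compute routers update-bgp-peer your-cloud-router \
    --peer-name=your-bgp-peer \
    --region=us-east1 \
    --advertised-route-priority=ADVERTISED_ROUTE_PRIORITY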
First, let’s start from the example in Figure 3-83 and connect the two VPC
networks with another pair of HA VPN gateways (one gateway per VPC),
this time between the two subnets subnet-backend in the us-
central1 region.
Each HA VPN gateway has a cloud router associated with it. Figure 3-
88 illustrates the updated topology, where you can see router-c and
router-d located in the region us-central1 of your-app-
shared-vpc and your-app-connected-vpc, respectively.
Figure 3-88 Adding another pair of HA VPN gateways between us-central1 subnets
There are no more routes with priority 510 (lowest priority in the
regional routing tables).
Each subnet has exactly two routes with the same priority, that is,
305.
If multiple custom routes exist for the same destination and have the same
priority and an active (IKE (Internet Key Exchange) SA (Security
Association) established) next hop tunnel, Google Cloud uses equal-cost
multipath (ECMP) routing to distribute packets among the tunnels.
This balancing method is based on a hash, so all packets from the same
flow use the same tunnel as long as that tunnel is up.
To achieve this, you need to update the base route priority for advertised
routes that use the other HA VPN tunnels, that is, tunnel1-a and
tunnel1-b. Remember, higher numbers mean lower priorities. You can
update this value by using the gcloud compute routers update-
bgp-peer command as illustrated in Figure 3-87. In our use case, by
setting a value of 10200 (as shown in Figure 3-89), we are sure that any
other path leading to the same destination has a higher priority, including
the path that uses tunnel2-c—our preferred route.
Figure 3-90 illustrates the effect of updating the base priority for
advertised routes.
Keep in mind you don’t have to take the advertised base route priority of
the tunnel1-a route to this extreme (10200) in order to make it lower
priority than the route that uses tunnel2-c. In fact, any number greater
than 100 would have been sufficient to make the tunnel1-a route a less
preferable path than tunnel2-c. However, since you don’t have control
over the inter-region costs (as of the writing of this book, its range is 201 ≤
C ≤ 9999), it is always a good idea to use a large number.
Another example of a routing policy is the use of network (or instance) tags
to limit the VMs the route will be applied to. This type of routing policy is
specific to static routes, because with static routes you manually create the
route. As a result, you have direct access to the --tags flag as explained
in the “Static Routes” section.
Instead, with dynamic routes a cloud router manages the routes for you, and
the --tags flag is not available to limit the VMs that can use the
advertised dynamic routes. In fact, the only way to limit the VMs is by
using the --set-advertisement-ranges flag during creation or
update of a cloud router. In other words, the selection of VMs the dynamic
routes apply to is by CIDR block only, and not by instance tag.
To learn how a routing policy with tags works, let’s say you are a member
of the network administrators group for your company. You installed
Network Address Translation (NAT) software in a VM you created in a
subnet of your Google Cloud shared VPC.
As you can see, this time the gcloud CLI reported an error because it
expected an existing VM in order to add the server tag to it.
Instead, for new VMs we should have used the gcloud compute
instances create command with the --tags=server flag.
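For reference, both command shapes look roughly like this (the VM name and zone are placeholders; --can-ip-forward is included only because a NAT appliance typically needs IP forwarding, and it is an assumption, not part of the original commands):

# Add a network tag to an existing VM.
gcloud compute instances add-tags nat-vm \
    --zone=us-east1-b \
    --tags=server

# Or set the tag at creation time for a new VM.
gcloud compute instances create nat-vm \
    --zone=us-east1-b \
    --tags=server \
    --can-ip-forward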
Figure 3-94 NAT implementation with internal load balancer as a next hop
There are a few design constraints you need to know when using an ILB
as a next hop:
You learned in the beginning of this chapter the construct of VPC peering as
a way to extend the internal routing capabilities of a VPC to another VPC.
When you establish a peering relation between two VPCs, the subnet routes
of each VPC are automatically exchanged between the two. You don’t have
to worry about it—after all, the definition refers to extension of internal
routing, which means subnet routes.
But what happens to the custom routes you or a Cloud Router might have
created in each VPC?
To answer this question, let’s walk you through an example of a hub and
spoke topology (Figure 3-95).
Figure 3-95 Hub and spoke with internal load balancer as a next hop
The network vpc1 is a hub network, which exposes shared capabilities to a
number of consumers—or spokes.
The consumer networks vpc3 and vpc4 are peered to the hub and use
these capabilities to fulfill the business and technical requirements for their
workloads.
To allow for scalability, we created an ILB in the hub network, which will
distribute incoming TCP/UDP traffic to the interface nic0 of four
backends nva-be1, nva-be2, nva-be3, and nva-be4. Each of these
backends is a network virtual appliance, which allows IP forwarding and
uses another interface nic1 to route traffic to vpc2 following packet
inspection.
Likewise, since vpc1 and vpc4 are peered, vm3 and vm4 can also ping
vm5 and vice versa.
For packets egressing the VMs in the spokes to reach VMs (vm6) in vpc2,
you need to configure your peering relations in the hub network—vpc1-
vpc3 and vpc1-vpc4—so that custom routes from the hub can be
exported into the spokes.
The diagram shows the gcloud commands you need to know in order to
create peering relations with the ability to export or import custom routes.
In our example, the hub VPC has a custom static route goto-ilb, which
routes traffic to an internal TCP/UDP load balancer, configured to distribute
incoming traffic to four multi-NIC NVAs.
The top of the diagram also shows the creation of two hub peerings:
The exact same setup can be performed for all other spokes, for example,
vpc4, thereby allowing vm3 and vm4 to send packets to vm6.
Without explicitly configuring custom route export from the hub (or
provider) network and custom route import from the spoke (or consumer)
network, no packets sent from any VM in the spokes would have reached
VMs in vpc2.
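A sketch of one hub-spoke pair (vpc1 and vpc3), with placeholder peering names; the flags that matter are --export-custom-routes on the hub side and --import-custom-routes on the spoke side:

# Hub side: peer vpc1 with vpc3 and export the hub's custom routes.
gcloud compute networks peerings create vpc1-vpc3 \
    --network=vpc1 \
    --peer-network=vpc3 \
    --export-custom-routes

# Spoke side: peer vpc3 with vpc1 and import the hub's custom routes.
gcloud compute networks peerings create vpc3-vpc1 \
    --network=vpc3 \
    --peer-network=vpc1 \
    --import-custom-routes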
One of the ways to classify GKE clusters is based on how pods route traffic
with each other. A cluster that uses alias IPs is called a VPC-native cluster.
A cluster that uses Google Cloud routes is called a route-based cluster.
Put differently, with alias IP ranges you don’t have to create custom static
routes to let pods communicate between each other or to reach resources
outside of the cluster.
Provided firewall rules and network policies allow traffic (we’ll review
these topics in the upcoming sections), since the cluster is VPC-native each
pod can route traffic to and from any other pod in the cluster by leveraging
subnet-a’s alias IP range.
Exam tipThe inter-pod routes denoted in blue come free with VPC-native
clusters and don’t count against the project route quota.
Now that you have learned what VPC-native clusters are and why you should
use them, you may wonder how to create one. The “secret” to create a
VPC-native cluster is to leverage the --enable-ip-alias flag in the
gcloud container clusters create command, as described in
Figure 3-97.
Keep in mind that when you use the --enable-ip-alias flag to create
your VPC-native GKE cluster, you have the option to select your own
CIDR ranges for the pods and services, or you can let Google do the job for
you.
This approach gives you all the tools and capabilities to allow for expansion
of your DevOps teams while keeping the necessary level of isolation
between apps.
Whether your company’s DevOps teams are skilled to build apps running in
VMs, or in containers, the shared VPC network design will still work.
However, when the target artifacts are containerized apps instead of apps
running in VMs, there are a few caveats you need to be aware of.
First, if you intend to run your containerized apps in a shared VPC using
GKE, your GKE clusters must be VPC-native. This means when you create
your cluster—assuming you use the gcloud CLI—you must use the --
enable-ip-alias flag.
Exam tipOnce your route-based GKE cluster has been created, if you
change your mind and decide to make it VPC-native, you cannot update the
cluster. The --enable-ip-alias flag is only available at creation
time. As a result, before spinning up your clusters, make sure you have a
good understanding of the business and the technical requirements for your
apps. Do you need a route-based or a VPC-native cluster? Will the cluster
live in a standalone or a shared VPC network?
Second—as you will see in the upcoming example—a principal with the
network admin role in the host project must create the subnet where your
cluster will live as well as its secondary IP ranges for its pods and services.
As you learned in the “Shared VPC Deep Dive” section, the service admin
who will create the cluster must have subnet-level IAM permissions to
create the cluster in the subnet being shared.
To summarize, when you deploy a GKE cluster in a shared VPC, the cluster
must be VPC-native, its alias IP assignment must be user-managed, and a
special role needs to be granted in the host project to each service
project’s GKE service account.
In the upcoming sections, you will learn how to set up the necessary
compute infrastructure for each developer team (frontend and backend) in
the form of GKE clusters using our original shared VPC your-app-
shared-vpc.
First and foremost, you need to enable the container API in the host project
and in all the service projects whose shared subnets will be hosting the
intended clusters (Figure 3-98).
Figure 3-98 Enabling the container API to service and host projects
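As a rough sketch of what Figure 3-98 describes—using the project IDs from this chapter—the gcloud commands look like this:

gcloud services enable container.googleapis.com --project=vpc-host-nonprod
gcloud services enable container.googleapis.com --project=frontend-devs
gcloud services enable container.googleapis.com --project=backend-devs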
In order to get the GKE service account name, we need to find out what the
project number is (Figure 3-99).
You will see in the next section how the project number relates to the GKE
service account name.
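A quick way to retrieve the project number, and from it the GKE service account name, is a sketch like the following (the service account format shown is the typical pattern for the GKE service agent and is stated here as an assumption, not as the book's exact output):

gcloud projects describe frontend-devs --format="value(projectNumber)"
# The GKE service agent is then typically:
# service-<PROJECT_NUMBER>@container-engine-robot.iam.gserviceaccount.com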
Notice the syntax of the policy file in JSON format with the sections I
described in the previous note, including bindings, members, and
role (Figure 3-101).
Figure 3-101 Viewing subnet-frontend-policy.json IAM policy
Now we let the project built-in service accounts and the GKE service
account do the same (Figure 3-102).
Finally, let’s enforce the newly updated IAM policy to the subnet
subnet-backend.
Figure 3-107 Applying IAM policy to subnet-backend
Figure 3-108 Updating host project IAM policy with binding for subnet-frontend
Figure 3-109 Updating host project IAM policy with binding for subnet-backend
Before creating the clusters in our service projects, let’s make sure the IAM
permissions for our principals have been correctly set up. The principals we
will be using are the two developers joseph@dariokart.com and
samuele@dariokart.com, who have permissions to deploy compute
infrastructure (GKE clusters) and build their apps in their own (service)
projects, that is, frontend-devs and backend-devs, respectively.
Notice the command is similar to the one we used in the shared VPC deep
dive example at the beginning of this chapter, with the difference that the
second keyword in gcloud is container instead of compute.
As highlighted in blue, the IP ranges for the pods and services look good as
per our specification.
As highlighted in green, the IP ranges for the pods and services look good
as per our specification.
At this point, all the preliminary steps to deploy our clusters in our shared
VPC have been completed.
The command is quite long. Let’s find out why it failed.
As a result, the overall CIDR range for pods should have been at least
192.168.13.0/23 instead of the existing 192.168.13.0/24. This
miscalculation has caused the cluster creation to fail due to IP space
exhaustion in the pod range.
Now you know how to fix the error! As you correctly guessed, let’s try
again, but this time by forcing a limit on the maximum number of pods per
node. Instead of the default of 256 alias addresses reserved per node (a /24
range; to reduce address reuse, GKE actually schedules at most 110 pods per
node by default), let’s use a maximum of 10 pods per node, as shown in Figure 3-113.
Figure 3-113 Successful attempt to create a cluster in subnet-frontend
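A sketch of the kind of command Figure 3-113 describes—the zone and the secondary range names here are illustrative assumptions, not necessarily the exact values used in the book—might look like this:

gcloud container clusters create frontend-cluster \
  --project=frontend-devs \
  --zone=us-east1-b \
  --enable-ip-alias \
  --network=projects/vpc-host-nonprod/global/networks/your-app-shared-vpc \
  --subnetwork=projects/vpc-host-nonprod/regions/us-east1/subnetworks/subnet-frontend \
  --cluster-secondary-range-name=pod-cidr-frontend \
  --services-secondary-range-name=service-cidr-frontend \
  --default-max-pods-per-node=10 \
  --num-nodes=2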
The deployment will take a few minutes. Upon completion, make sure to
review the summary of the deployment, which indicates, among others, the
cluster name, the location, the master IP address (the control plane
endpoint), the worker node’s machine type, the number of worker nodes,
and the status.
Let’s make sure the worker nodes use the correct CIDR range
192.168.0.0/26 (Figure 3-114).
Figure 3-114 Listing GKE frontend-cluster worker nodes
Let’s make sure the worker nodes use the correct CIDR range
192.168.1.0/26 (Figure 3-116).
Figure 3-117 illustrates the resources we created so far and how they
relate to each other from network, security, and cost points of view.
Testing Connectivity
In this example, we will use the Secure Shell (SSH) protocol to connect to
one of the two frontend-cluster worker nodes, and we will perform
a few connectivity tests.
With our travel metaphor, there may be routes connecting location A with
location B, but if a checkpoint in the middle won’t allow traffic, you won’t
make it to location B.
In this example, the VPC your-app-shared-vpc was already
configured to allow ingress SSH, TCP, and ICMP (Internet Control
Message Protocol) traffic as illustrated in Figure 3-29.
Nevertheless, let’s list all firewall rules for the VPC and double-check
(Figure 3-118).
Notice that behind the scenes GKE—as a fully managed Kubernetes engine
—has already created ingress firewall rules to allow traffic into the VPC
using a predefined group of protocols and ports.
When I say “can use” I mean they can natively consume containerized apps
in subnet-backend. There is no intermediary between the two, which is
good because latency is reduced and risks of packet losses or even data
exfiltration are minimized.
This native connectivity is the product of Shared VPC and VPC-native
clusters.
This time, the ping failed. Why? It failed because we told
GKE to create a cluster with a maximum of ten pods per node and an initial
size of two nodes (see Figure 3-115). The available IP space we set up the
backend-cluster with was /24 (see pod-cidr-backend in Figure
3-111) for a total of 256 IP addresses.
By access we mean ingress and egress traffic directed to and from the pods
of your GKE cluster. By control we mean validation at the IP address,
protocol (TCP, UDP, SCTP), and port level (OSI layer 3 or 4).
Network policies are coded in the form of a file—this approach is also
called policy-as-code. This file tells the GKE API server what entities are
allowed to communicate with your pods.
Other pods
Namespaces
CIDR blocks
You specify which pods or namespaces you are allowing ingress or egress
traffic by using a selector. Allowed entities (pods, namespaces) are the ones
that match the expression in the selector. Selectors are not used to specify
CIDR blocks.
The best way to explain how network policies work is with an example.
Let’s say you want a containerized web server app labeled app=hello to
only receive incoming requests from another containerized app labeled
app=foo. This will be our first use case.
Conversely, you want the containerized app labeled app=foo to only send
traffic to the containerized web app labeled app=hello, and nothing else.
This will be our second use case.
The GKE sample apps provide great examples of how to use GKE. We will
use this repo to explain how network policies work.
The gcloud CLI, which is available with Google Cloud Shell, is all you
need to go through this example because it comes with the kubectl utility
(among many other developer runtimes), which is the official Kubernetes
command-line tool.
Figure 3-127 Cloning the GKE sample app repo from GitHub
First, we need to tell the GKE API server that the GKE cluster where the
two containerized apps will live will enforce network policies.
Enabling the container API is the first prerequisite to create and use a GKE
cluster. Without this action, you won’t be able to create your GKE cluster.
Note: If you worked through the cluster with Shared VPC deep dive example, the
container API has already been enabled for the project frontend-devs,
so you won’t need to perform this step. Also, in this example, we will use
the default VPC, and not your-app-shared-vpc. The default
VPC is an auto-mode network that has one subnet in each region. On the
other hand, our shared VPC is a custom-mode network with only two
subnets subnet-frontend in us-east1 and subnet-backend in
us-central1.
Next, let’s create our GKE cluster (Figure 3-129). Notice that a region or a
zone must be specified to denote respectively a regional or a zonal cluster.
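For example—cluster name, project, and zone taken from this exercise, with the exact invocation offered only as a sketch—a zonal cluster with network policy enforcement enabled (the first step mentioned earlier) could be created as follows:

gcloud container clusters create test \
  --project=frontend-devs \
  --zone=us-east1-c \
  --enable-network-policy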
Let’s run the containerized web app first (Figure 3-130). The web app
will expose the endpoint http://hello-web:8080.
The network policy in the first use case is intended to limit incoming
traffic to only requests originating from another containerized application
labeled app=foo. Let’s review how this rule is expressed in the form of
policy-as-code by viewing the hello-allow-from-foo.yaml file
(Figure 3-131).
Figure 3-131 Viewing cluster test ingress network policy
The network policy is a YAML file (Figure 3-132), and there are a few
things worth mentioning about how it is structured.
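A minimal sketch of such an ingress policy—assuming the app=hello and app=foo labels described in the text; the actual file in the sample repo may differ slightly—looks like this:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: hello-allow-from-foo
spec:
  policyTypes:
  - Ingress
  podSelector:
    matchLabels:
      app: hello
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: foo

The podSelector picks the pods the policy protects (app=hello), while the from clause lists the pods allowed to reach them (app=foo).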
Let’s now apply this ingress network policy to the cluster (Figure 3-
133).
To validate the ingress network policy, let’s run a temporary pod with a
containerized app labeled app=foo, and from the pod, let’s make a request
to the endpoint http://hello-web:8080. Let’s then repeat the request from a
second temporary pod that does not carry the app=foo label.
With the ingress network policy in effect, the former request succeeded,
whereas the latter timed out.
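A sketch of the two tests—pod names and the alpine image are illustrative assumptions—could be:

kubectl run test-1 --labels=app=foo --image=alpine --restart=Never --rm -i -t -- sh
# inside the pod:
wget -qO- --timeout=2 http://hello-web:8080     # succeeds

kubectl run test-2 --image=alpine --restart=Never --rm -i -t -- sh
# inside the pod:
wget -qO- --timeout=2 http://hello-web:8080     # times out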
Let’s view the manifest for this egress network policy (Figures 3-136
and 3-137).
The manifest declares the policy type as egress, and it applies to pods in
the test cluster whose label’s key app matches the value foo.
Notice the egress section denotes two targets, that is, pods whose label’s
key app matches the value hello and the set of protocol-ports pairs
{(TCP, 53), (UDP,53)}.
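Based on that description, a sketch of the egress manifest—with the DNS ports allowed so that the pod can still resolve service names—would look roughly like this:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: foo-allow-to-hello
spec:
  policyTypes:
  - Egress
  podSelector:
    matchLabels:
      app: foo
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: hello
  - ports:
    - protocol: TCP
      port: 53
    - protocol: UDP
      port: 53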
Let’s now apply this egress network policy to the cluster (Figure 3-138).
Figure 3-138 Enforcing cluster test egress network policy
As you can see in Figure 3-139, to validate the egress network policy all we
have to do is to run a temporary pod with a containerized app labeled
app=foo and from the pod make requests to the endpoints
http://hello-web:8080 and http://hello-web-2:8080.
With the egress network policy in effect, the only allowed outbound
connection from the containerized app labeled app=foo is the endpoint
http://hello-web:8080. This is because this endpoint is exposed by
a service that lives in a pod labeled app=hello, which is an allowed
connection as shown in the egress section of Figure 3-137.
Since there is no entry denoted by app=hello-2 in the egress section of
the manifest YAML file, the containerized app labeled app=foo is not
allowed to connect to the endpoint http://hello-web-2:8080.
As a good practice to avoid incurring charges, let’s now delete our test
GKE cluster, which is still up and running along with its three e2-medium
nodes all located in us-east1-c (Figure 3-140).
Upon deleting the cluster, the list is now empty (Figure 3-142).
Additional Guidelines
There are a couple of things to remember about cluster network policies,
which are important for the exam.
First, in a cluster network policy, you define only the allowed connections
between pods. There is no action allowed/denied like in firewall rules (you
will learn more about firewall rules in the following section). If you need to
enable connectivity between pods, you must explicitly define it in a
network policy.
Second, if you don’t specify any network policies in your GKE cluster
namespace, the default behavior is to allow any connection (ingress and
egress) among all pods in the same namespace.
In other words, all pods in your cluster that have the same namespace can
communicate with each other by default.
If you want to deny only ingress traffic to your pods, but allow egress
traffic, just remove the Egress item from the list in the policyTypes
node as shown in Figure 3-144.
Conversely, if you want to allow incoming traffic for all your pods, just
add the ingress node and indicate all pods {} as the first item in the list
as illustrated in Figure 3-145.
Figure 3-145 Allow all ingress network policy
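As a hedged illustration of these two patterns (the policy names are arbitrary), a deny-all-ingress policy and an allow-all-ingress policy can be sketched as:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-ingress
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - {}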
Before we start this section, let’s refresh our knowledge on how GKE
works.
1.
Control plane(s): A group of VMs in a google-owned-project, which
hosts the Kubernetes API server, the Kubernetes scheduler, the
Kubernetes key-value store (named etcd), the Google Cloud
controller manager, and the core controller manager. Zonal clusters
(e.g., --zone=us-central1-c) have a single control plane.
Regional clusters (e.g., --region=us-central1) have replicas
of the control plane in different zones of the same region where the
cluster lives.
2.
Worker nodes: Another group of VMs in your-project, where each
VM hosts the container runtime, an agent named kubelet—
responsible for ensuring containers are running healthy in a pod—and
a network proxy named kube-proxy, responsible for the interaction
with other worker nodes in the cluster and the interaction with the
control plane.
The control plane is the single pane of glass for your GKE cluster, and the
Kubernetes API server is the hub for all interactions with the cluster. The
control plane exposes a public endpoint by default and a private endpoint.
Both endpoints can be directly accessed via the HTTPS or gRPC (gRPC
Remote Procedure Call) protocols.
The VPC network where the control plane Google-managed VMs live is
peered with the VPC where your cluster worker nodes live. The worker
nodes and the control plane VMs interact with each other using the
Kubernetes API server.
In the previous example (Figure 3-117), each worker node had an internal
IP address denoted in green and an external IP address denoted in pink.
You learned that the internal IP address is used to route RFC 1918 traffic
between VMs in the same VPC network. It doesn’t matter if the two VMs
are located in the same subnet or in different subnets. The only two
constraints for the traffic to flow are that the subnets are part of the same
VPC network and the VPC firewall rules allow traffic to flow between the
two VMs.
This network design can be accomplished by creating your cluster with the
--enable-private-nodes flag.
Figure 3-146 illustrates how the control plane can be accessed from
authorized CIDR blocks.
Figure 3-146 Access to GKE control plane endpoints
Now, how do you tell GKE which CIDR block is authorized to access the
cluster control plane?
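One way—assuming the gcloud CLI, with illustrative names and CIDR blocks—is to combine private nodes with master authorized networks at creation time:

gcloud container clusters create private-cluster \
  --zone=us-central1-c \
  --enable-ip-alias \
  --enable-private-nodes \
  --master-ipv4-cidr=172.16.0.0/28 \
  --enable-master-authorized-networks \
  --master-authorized-networks=203.0.113.0/24

With a sketch like this, only clients whose source addresses fall in 203.0.113.0/24 would be able to reach the control plane endpoint.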
Going from the top-right box down, more access restrictions are added,
resulting in a reduction of the attack surface of your cluster’s control plane.
VPC scope: By default, firewall rules are applied to the whole VPC
network, not its partitions, that is, its subnets.
Network tag target: However, you can restrict the scope of a
firewall rule to a specific group of VMs in your VPC. This is where the
concept of a target comes into play. You can configure the firewall rule
to only target a set of VMs in your VPC by adding a network tag (also
referred to as instance tag) to a specific group of VMs and then by
applying the firewall rule to the VMs with that tag.
Service account target: You can also configure a firewall rule to
only target specific VMs by selecting their associated service account.
To do so, choose the specified service account, indicate whether the
service account is in the current project or another one under Service
account scope, and set the service account name in the Source/Target
service account field.
VM-to-VM traffic control: You can also use firewall rules to
control internal traffic between VMs by defining a set of permitted
source machines in the rule.
Figure 3-148 shows the global, distributed nature of two firewall rules
for our vpc-host-nonprod VPC. As you can see, the firewall
protection spans the whole perimeter of the VPC, which includes subnets in
two different regions.
Figure 3-148 Distributed firewall in a VPC
The first firewall rule allows only incoming traffic over the TCP protocol
and port 443 targeting the VMs denoted by the web-server network tag.
The second firewall rule denies incoming traffic over the TCP protocol and
port 5432 targeting the VMs denoted by the db-server network tag.
Note: The Source CIDR blocks in Figure 3-148 refer to Google Front Ends
(GFEs), which are located in the Google Edge Network and are meant to
protect your workload infrastructure from DDoS (Distributed Denial-of-
Service) attacks. You will learn more about GFEs in Chapter 5.
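A sketch of how these two rules could be created with gcloud—the rule names are arbitrary, and the source ranges shown are the commonly documented GFE ranges, included here as an assumption—is:

gcloud compute firewall-rules create allow-https-to-web \
  --network=vpc-host-nonprod --direction=INGRESS --action=ALLOW \
  --rules=tcp:443 --target-tags=web-server \
  --source-ranges=130.211.0.0/22,35.191.0.0/16

gcloud compute firewall-rules create deny-postgres-to-db \
  --network=vpc-host-nonprod --direction=INGRESS --action=DENY \
  --rules=tcp:5432 --target-tags=db-server \
  --source-ranges=130.211.0.0/22,35.191.0.0/16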
Target Network Tags and Service Accounts
You can use network tags or service accounts (one of the two, not both) to
selectively target the VMs in your VPC you want to apply firewall rules on.
Keep in mind that when you add a network tag to a VM, and subsequently
use the network tag as the target in a firewall rule, there is no additional
access-control check that happens by default. Nothing prevents someone from
attaching a network tag to an instance that could then anonymously expose
sensitive data (e.g., PII (Personally Identifiable Information), PHI (Protected
Health Information), PCI (Payment Card Industry) data). To prevent this security
risk, GCP has introduced the ability to create firewall rules that target
instances associated to service accounts.
Exam tip: Service accounts and network tags are mutually exclusive and
can’t be combined in the same firewall rule. However, they are often used
in complementary rules to reduce the attack surface of your workloads.
The target of a firewall rule indicates a group of VMs in your VPC network,
which are selected by network tags or by associated service accounts. The
definition of a target varies based on the rule direction, that is, ingress or
egress.
If the direction is ingress, the target of your firewall rule denotes a group of
destination VMs in your VPC, whose traffic from a specified source outside
of your VPC is allowed or denied. For this reason, ingress firewall rules
cannot use the destination parameter.
Conversely, if the direction is egress, the target of your firewall rule denotes
a group of source VMs in your VPC, whose traffic to a specified destination
outside of your VPC is allowed or denied. For this reason, egress firewall
rules cannot use the source parameter.
Similarly to routes, firewall rules are defined on a per-VPC basis. You don’t
associate a firewall rule to a single subnet or a single VM. As shown in
Figure 3-149, with the gcloud compute firewall-rules
create command, you specify the VPC the firewall rule is associated with
by setting the flag --network to the name of the VPC you want to
protect.
Figure 3-149 gcloud compute firewall-rules create syntax
Use the parameters as follows. More details about each parameter are
available in the SDK reference documentation.
Priority
A rule with a deny action overrides another with an allow action only if the
two rules have the same priority. Using relative priorities, it is possible to
build allow rules that override deny rules, and vice versa.
Example
The priority of the second rule determines whether TCP traffic on port
80 is allowed for the webserver network targets:
If the priority of the second rule > 1000, it will have a lower priority,
so the first rule denying all traffic will apply.
If the priority of the second rule = 1000, the two rules will have
identical priorities, so the first rule denying all traffic will apply.
If the priority of the second rule < 1000, it will have a higher priority,
thus allowing traffic on TCP 80 for the webserver targets. Absent
other rules, the first rule would still deny other types of traffic to the
webserver targets, and it would also deny all traffic, including TCP 80,
to instances without the webserver network tag.
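A sketch of the two rules in this example (the network and rule names are illustrative assumptions) could be:

gcloud compute firewall-rules create deny-all-ingress \
  --network=prod-vpc --direction=INGRESS --action=DENY \
  --rules=all --source-ranges=0.0.0.0/0 --priority=1000

gcloud compute firewall-rules create allow-http-webserver \
  --network=prod-vpc --direction=INGRESS --action=ALLOW \
  --rules=tcp:80 --target-tags=webserver \
  --source-ranges=0.0.0.0/0 --priority=900

With a priority of 900 (a lower value, hence higher precedence), the allow rule overrides the deny-all rule for TCP 80 traffic targeting the webserver instances.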
Specification: No protocol and port
Example: —
Explanation: If you do not specify a protocol, the firewall rule applies to all
protocols and their applicable ports.

Specification: Protocol
Example: tcp
Explanation: If you specify a protocol without any port information, the
firewall rule applies to that protocol and all of its applicable ports.

Specification: Protocol and single port
Example: tcp:80
Explanation: If you specify a protocol and a single port, the firewall rule
applies to just that port of the protocol.

Specification: Protocol and port range
Example: tcp:20-22
Explanation: If you specify a protocol and a port range, the firewall rule
applies to just the port range for the protocol.

Specification: Combinations
Example: icmp,tcp:80,tcp:443,udp:67-69
Explanation: You can specify various combinations of protocols and ports to
which the firewall rule applies. For more information, see creating firewall
rules.
Direction
The direction of a firewall rule can be either ingress or egress. The direction
is always defined from the perspective of your VPC.
The ingress direction describes traffic sent from a source to your VPC.
Ingress rules apply to packets for new sessions where the destination of
the packet is the target.
The egress direction describes traffic sent from your VPC to a
destination. Egress rules apply to packets for new sessions where the
source of the packet is the target.
If you omit a direction, GCP uses ingress as default.
Example
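As a hedged example of the direction parameter (names and ranges are illustrative), an egress rule blocking all outbound traffic from VMs tagged no-internet could look like this:

gcloud compute firewall-rules create deny-egress-to-internet \
  --network=prod-vpc --direction=EGRESS --action=DENY \
  --rules=all --destination-ranges=0.0.0.0/0 \
  --target-tags=no-internet --priority=900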
In the context of implementing VPCs, these are the important points you
need to know for the exam:
Firewall Rules Logging allows you to audit, verify, and analyze the
effects of your firewall rules. For example, you can determine if a
firewall rule designed to deny traffic is functioning as intended. Logging
is also useful if you need to determine how many connections are
affected by a given firewall rule.
You enable Firewall Rules Logging individually for each firewall rule
whose connections you need to log. Firewall Rules Logging is an option
for any firewall rule, regardless of the action (allow or deny) or direction
(ingress or egress) of the rule.
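For example—assuming an existing rule named allow-https-to-web—logging can be switched on with a sketch like:

gcloud compute firewall-rules update allow-https-to-web \
  --enable-logging \
  --logging-metadata=include-all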
When you enable logging for a firewall rule, Google Cloud Platform
(GCP) creates an entry called a connection record each time the rule
allows or denies traffic. You can export these connection records to
Cloud Logging, Cloud Pub/Sub, or BigQuery for analysis.
Each connection record contains the source and destination IP addresses,
the protocol and ports, date and time, and a reference to the firewall rule
that applied to the traffic.
Tables 3-2 and 3-3 summarize the Google Cloud firewall rule syntax.
Table 3-2 Ingress firewall rule description
Priority: Integer from 0 (highest) to 65535 (lowest), inclusive; default 1000.
Action: Either allow or deny.
Enforcement: Either enabled (default) or disabled.
Target (defines the destination): One of the following:
• All instances in the VPC network
• Instances by service account
• Instances by network tag
Source: One of the following:
• Range of addresses; default is any (0.0.0.0/0)
• Instances by service account
• Instances by network tag
Exam tip: For the exam, you will need to remember that destination ranges
are not valid parameters for ingress firewall rules. Likewise, source ranges
are not valid parameters for egress rules. A good way to remember this is
by memorizing the timezone acronyms IST and EDT, respectively, for
ingress rules and egress rules: in the former scenario (Ingress direction),
you use Source and Target parameters, whereas in the latter (Egress
direction), you use Destination and Target parameters only.
Exam Questions
Question 3.1 (Routing)
A.
Create a network tag with a value of backup for the new static route.
B.
Set a lower priority value for the new static route than the existing
static route.
C.
Set a higher priority value for the new static route than the existing
static route.
D.
Configure the same priority value for the new static route as the
existing static route.
Rationale
You create a VPC named Prod in custom mode with two subnets, as shown
in Figure 3-150. You want to make sure that (1) only app VM can access
the DB VM instance, (2) web VM can access app VM, and (3) users outside
the VPC can send HTTPS requests to web VM only. Which two firewall
rules should you create?
A.
Block all traffic from source tag “web.”
B.
Allow traffic from source tag “app” to port 80 only.
C.
Allow all traffic from source tag “app” to target tag “db.”
D.
Allow ingress traffic from 0.0.0.0/0 on port 80 and 443 for target tag
“web.”
E.
Allow ingress traffic using source filter = IP ranges where source IP
ranges = 10.10.10.0/24.
Rationale
You created two subnets named Test and Web in the same VPC network.
You enabled VPC Flow Logs for the Web subnet. You are trying to connect
instances in the Test subnet to the web servers running in the Web subnet,
but all of the connections are failing. You do not see any entries in the
Stackdriver logs. What should you do?
A.
Enable VPC Flow Logs for the Test subnet also.
B.
Make sure that there is a valid entry in the route table.
C. Add a firewall rule to allow traffic from the Test subnet to the Web
subnet.
D.
Create a subnet in another VPC, and move the web servers in the
new subnet.
Rationale
A is not correct because enabling the flow logs in subnet “Test” will still
not provide any data as the traffic is being blocked by the firewall rule.
B is not correct because subnets are part of the same VPC and do not
need routing configured. The traffic is being blocked by the firewall rule.
C is CORRECT because the traffic is being blocked by the firewall
rule. Once configured, the request will reach the VM and the flow
will be logged in Stackdriver.
D is not correct because the traffic is being blocked by the firewall rule
and not due to subnet being in the same VPC.
You want to allow access over ports 80 and 443 to servers with the tag
“webservers” from external addresses. Currently, there is a firewall rule
with priority of 1000 that denies all incoming traffic from an external
address on all ports and protocols. You want to allow the desired traffic
without deleting the existing rule. What should you do?
A.
Add an ingress rule that allows traffic over ports 80 and 443 from
any external address in the rules prior to the deny statement.
B.
Add an ingress rule that allows traffic over ports 80 and 443 from
any external address to the target network tag “webservers” with a
priority value of 500.
C.
Add an egress rule that allows traffic over ports 80 and 443 from any
external address in the rules prior to the deny statement.
D.
Add an egress rule that allows traffic over ports 80 and 443 from any
external address to the target network tag “webservers” with a
priority value of 1500.
Rationale
A is incorrect because the firewall denies traffic if both the permit and
deny have the same priority regardless of rule order.
B is CORRECT because the firewall will allow traffic to pass with
the proper allow ingress rule, whose priority value of 500 is lower than
the default value of 1000 and therefore takes precedence.
C is incorrect because the scenario described does not apply to egress
traffic. By design, the firewall is stateful: once a connection is allowed,
return traffic for that connection is allowed as well.
D is incorrect because the scenario described does not apply to egress
traffic. In addition, the priority value of 1500 is higher than the default,
meaning the rule would not be considered.
In this chapter, we will deep dive into VPC Service Controls, and you will
understand how this GCP service is a lot more than a product to prevent
data exfiltration. When effectively implemented, VPC Service Controls will
strengthen your enterprise security posture by introducing context-aware
safeguards into your workloads.
To get started, the concepts of an access policy and an access level will be
formally defined, unambiguously. These are the building blocks of a service
perimeter, which is the key GCP resource you use to protect services used
by projects in your organization.
You will then learn how to selectively pick and choose what services
consumed by your workloads need context-aware protection.
We will walk you through these constructs through a deep dive exercise on
service perimeters, which will help you visualize and understand how all
these pieces fit together.
You will then learn how to connect service perimeters by using bridges, and
you will get familiar with Cloud Logging to detect perimeter violations and
respond to them.
In the preceding scenario, your VPC firewall rule would not know that the
service account has been compromised, resulting in an allow action to the
desired resources in the VPC—let’s say a VM storing sensitive data. Once access
is allowed, VPC firewall rules are not expressive enough to determine
access based on the verb the service account is going to perform on the
resource. For example, if the malicious user wanted to copy sensitive data
stored in the disk of a VM, the firewall rule would have no way to prevent
this action.
VPC Service Controls and their “companions” VPC service perimeters are
a means to further validate the authenticity of a request based on the API
for the service being requested on a resource in the VPC.
Just like VPC firewall rules determine access to resources hosted in a VPC
based on IP ranges, ports, protocols, network tags, or service accounts, VPC
Service Controls determine access to resources based on the Google Cloud
API that is needed to consume the resource.
The “who,” “what,” and “when” aspects of an access request are all
captured by the components of the perimeter. The perimeter then will
determine whether the access is granted or denied.
The outcome will be based on the mode the perimeter operates under:
enforced or dry-run. In the former case, access will be simply denied or
granted. In the latter case, any access violation will not result in a deny
action. Instead, the violation will be tracked as an audit log.
Perimeters
So what are the components of a perimeter, and most importantly how does
a service perimeter differ from a network perimeter (i.e., VPC firewall
rules)?
The relevant, optional flags you need to know for the exam are
Access levels (--access-levels=[LEVEL, …]): It denotes a
comma-separated list of IDs for access levels (in the same policy) that an
intra-perimeter request must satisfy to be allowed.
Resources (--resources=[RESOURCE, …]): It’s a list of
projects you want to protect by including them in the perimeter and is
denoted as a comma-separated list of project numbers, in the form
projects/<projectnumber>.
Restricted services (--restricted-services=[SERVICE,
…]): It denotes a comma-separated list of Google API endpoints to
which the perimeter boundary does apply (e.g.,
storage.googleapis.com).
VPC allowed services (--vpc-allowed-services=
[SERVICE, …]): It requires the flag --enable-vpc-
accessible-services and denotes a comma-separated list of
Google API endpoints accessible from network endpoints within the
perimeter. In order to include all restricted services, use the keyword
RESTRICTED-SERVICES.
Ingress policies (--ingress-policies=YAML_FILE): It
denotes a path to a file containing a list of ingress policies. This file
contains a list of YAML-compliant objects representing ingress policies,
as described in the API reference.
Egress policies (--egress-policies=YAML_FILE): It
denotes a path to a file containing a list of egress policies. This file
contains a list of YAML-compliant objects representing egress policies,
as described in the API reference.
Perimeter type (--perimeter-type=PERIMETER_TYPE): It
must be either the keyword regular (default) or the keyword
bridge. You will learn about perimeter bridges in the upcoming “Perimeter
Bridges” section.
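Putting a few of these flags together, a hedged sketch of a perimeter creation command—the project number and policy ID are placeholders—might look like this:

gcloud access-context-manager perimeters create dariokart_perimeter \
  --title="dariokart_perimeter" \
  --perimeter-type=regular \
  --resources=projects/111111111111 \
  --restricted-services=storage.googleapis.com \
  --access-levels=dariokart_level \
  --policy=1234567890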
Access Levels
Whether the request originates from the Internet, from your corporate
network, or from network endpoints within the perimeter, an access level
performs the validation you specify and determines whether the access to
the requested resource is granted or denied.
When you create an access level, you need to decide whether you need a
basic access level or a custom access level. For most use cases, a basic level
of validation suffices, while a few require a higher degree of
sophistication.
When you use the gcloud command to create or update an access level,
both types (basic and custom) are expressed in the form of a YAML file,
whose path is assigned to the --basic-level-spec or the --
custom-level-spec flag, respectively. The two flags are mutually
exclusive.
A basic access level YAML spec file is a list of conditions built using
assignments to a combination of one or more of the five following
attributes:
The reference guide to the complete list of basic access level attributes can
be found at
https://cloud.google.com/access-context-
manager/docs/access-level-attributes#ip-
subnetworks
For more complex access patterns, use a custom access level. A custom
access level YAML spec file contains a list of Common Expression
Language (CEL) expressions formatted as a single key-value pair:
expression: CEL_EXPRESSION.
Similarly to basic access levels, the spec file lets you create expressions
based on attributes from the following four objects:
origin: Contains attributes related to the origin of the request, for
example, (origin.ip == "203.0.113.24" &&
origin.region_code in ["US", "IT"])
request.auth: Contains attributes related to authentication and
authorization aspects of the request, for example,
request.auth.principal ==
"accounts.google.com/1134924314572461055"
levels: Contains attributes related to dependencies on other access
levels, for example, levels.allow_corporate_ips where
allow_corporate_ips is another access level
device: Contains attributes related to devices the request originates
from, for example, device.is_corp_owned_device == true
In addition to LEVEL, that is, the fully qualified identifier for the level, and
the access policy POLICY (required only if you haven’t set a default access
policy), you must specify a title for your access level.
As you learned before, the level type flags are mutually exclusive. With
basic access level (as noted in Figure 4-2), you have to decide whether all
or at least one condition must be true for the validation to pass or fail. This
can be achieved by setting the --combine-function flag to the value
"and" (default) or the value "or".
In the next section, we will put these concepts to work by walking you
through a simple example of a perimeter and an access level. With a real
example, all these concepts will make sense, and you’ll be ready to design
perimeters and access levels in Google Cloud like a “pro.”
First, to create a perimeter and access levels, we need to enable the access
context manager API. The cloud resource manager API is also needed in
this exercise to update the metadata of some resource containers. Figure 4-3
shows you how to enable both APIs.
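With gcloud, enabling the two APIs boils down to something like the following, run in the project hosting the exercise:

gcloud services enable accesscontextmanager.googleapis.com
gcloud services enable cloudresourcemanager.googleapis.com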
Exam tip: Do not confuse an IAM allow policy with an access policy. Both
constructs use the term “access” after all—IAM stands for Identity and
Access Management. However, an IAM policy, also known as an allow
policy, is strictly related to what identities (or principals) are allowed to do
on a given resource, whether it be a VM, a Pub/Sub topic, a subnet, a
project, a folder, or an entire organization. Each of these resources (or
containers of resources) has an IAM policy attached to them. Think of it as
a sign that lists only the ones who are allowed to do something on the
resource. The “something” is the list of verbs—permissions—and is
expressed in the form of an IAM role, for example,
roles/networkUser or roles/securityAdmin, which is indeed
a set of permissions. Conversely, while access policies are also focused on
access, they take into consideration a lot more than just identity and role
bindings. Unlike IAM policies, access policies are applicable to resource
containers only, that is, projects, folders, and organizations (one only), and
they are used to enable conditional access to resources in the container
based on the device, request origin (e.g., source CIDR blocks), request
authentication/authorization, and dependencies with other access levels.
In our exercise, for the sake of simplicity we are going to create an access
policy, whose scope is the entire dariokart.com organization, as
displayed in Figure 4-4.
Exam tip: The only required flags are the access policy title (--title)
and its parent organization (--organization). You can also create an
access policy scoped to a specific folder or a specific project in your
organization. This can be achieved by setting the folder (or project) number
as value to the --scopes flag. For more details, visit
https://cloud.google.com/sdk/gcloud/reference/acce
ss-context-manager/policies/create#--scopes.
The name of the access policy is system generated. You only get to choose
its title. It’s also a good idea to set the policy as the default access policy for
our organization, as illustrated in Figure 4-5.
Figure 4-5 Setting the default organization access policy
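A rough sketch of these two steps—the organization ID and the returned policy number are placeholders—is:

gcloud access-context-manager policies create \
  --organization=123456789012 --title="dariokart_policy"

# Use the system-generated policy number returned by the previous command:
gcloud config set access_context_manager/policy 1234567890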
With our access policy in place, we can now create a basic access level
dariokart_level, which will be associated to the perimeter for the
service projects frontend-devs and backend-devs and their
associated host project vpc-host-nonprod.
To get started, we need first to create a YAML file (Figure 4-6), which
declaratively specifies the conditions that determine who is authorized to
access the service perimeter.
Since the access level is basic, the YAML file is a simple list of
conditions. The conditions apply to the attributes of any of these four
objects:
The syntax of the YAML file uses the Common Expression Language. See
https://github.com/google/cel-
spec/blob/master/doc/langdef.md for more details.
Our YAML file is very simple. We want to enforce a basic access level
stating that only user gianni@dariokart.com is authorized to perform
storage API actions.
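A minimal sketch of such a basic access level spec—assuming only the members attribute is needed—could be:

- members:
  - user:gianni@dariokart.com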
The second constraint, that is, preventing any user other than
gianni@dariokart.com from consuming
storage.googleapis.com, will be enforced when we create the
perimeter in the next section.
With the YAML file saved, we can now create our access level (Figure
4-8).
Figure 4-8 Creating dariokart_level
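Assuming the spec above was saved as access_level.yaml (the file name is an assumption), the creation command is along these lines:

gcloud access-context-manager levels create dariokart_level \
  --title="dariokart_level" \
  --basic-level-spec=access_level.yaml \
  --combine-function=and \
  --policy=1234567890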
Creating a Perimeter
Note: You should always use the least privilege principle when designing
the security architecture for your workloads. However, this exercise is
solely intended to explain how access levels and perimeters work together
to enforce access control over the Google Cloud Storage API, and for the
sake of simplicity, we haven’t strictly used the principle.
With all permissions in place, we can finally create the perimeter
dariokart_perimeter and associate it to our newly created access
level dariokart_level. Figure 4-10 shows you how to create the
perimeter.
As you can see, the response returned an HTTP status code 403, which
clearly explained the reason why the request failed, namely, the perimeter
blocked the request after checking the organization access policy.
To answer this question, we need to mention that not every Google API can
be protected by VPC Service Controls. However, the many Google APIs
supported—including storage.googleapis.com—are only
accessible with routes whose destination is the CIDR block
199.36.153.4/30 and whose next hop is the default-internet-
gateway.
This block is not advertised on the Internet; it can only be routed from
within the Google Global Backbone.
In other words, if you try to ping this CIDR block from a terminal in a
computer connected to your Internet Service Provider (ISP), you will get a
request timeout. However, if you try from a VM in a subnet of your VPC,
you will get a response.
Perimeters are not just about protecting your data from unauthorized
Google API access that originates outside your perimeter.
When you create or update a perimeter, you can also limit the Google APIs
that can be accessed using Private Google Access from network endpoints
within the perimeter.
The Google APIs supported by VPC service controls are exposed using
the domain name restricted.googleapis.com, which resolves to
the restricted VIP (Virtual IP address range) 199.36.153.4/30. These
four public IP addresses 199.36.153.4, 199.36.153.5,
199.36.153.6, 199.36.153.7 are not routable on the Internet, as
you can see from my attempt using my Internet Service Provider (Figure 4-
19).
So how do you limit access to the Google APIs exposed by the restricted
VIP from network endpoints within the perimeter?
The answer depends on whether you are creating a new perimeter or you
are updating an existing perimeter.
--enable-vpc-accessible-services
--vpc-allowed-services=[API_ENDPOINT,...]
Keep in mind that the perimeter boundary is only enforced on the list of
API endpoints assigned to the --restricted-services flag,
regardless of whether they are on the list assigned to the --vpc-
allowed-services flag.
The list of API endpoints assigned to the --vpc-allowed-services flag has a
default value of all services, that is, all services on the configured
restricted VIP are accessible using Private Google Access by default. If you
want to be more selective, provide a comma-delimited list of API endpoints.
Conversely, to lift the restriction on VPC accessible services, use the
following flags:
--no-enable-vpc-accessible-services
--clear-vpc-allowed-services
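For instance, restricting VPC-accessible services to the storage API only—the policy ID is a placeholder—can be sketched as:

gcloud access-context-manager perimeters update dariokart_perimeter \
  --enable-vpc-accessible-services \
  --vpc-allowed-services=storage.googleapis.com \
  --policy=1234567890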
Perimeter Bridges
You would think that the operation succeeded, right? However, as you
can see from the console in Figure 4-21—in this exceptional case, we use
the console because the console reveals more details (the perimeter type)
than gcloud—the perimeter type has not changed!
Figure 4-23 shows what we just learned and visualizes the key
characteristics of perimeter bridges.
Audit Logging
By default, VPC Service Controls write to Cloud Logging all requests that
are denied because of security policy violations. More information about
Cloud Logging—as it pertains to network operations and optimization—
will be provided in Chapter 8.
Dry-Run Mode
You can think of VPC service controls as a firewall that controls which
Google APIs the components of your workload are authorized to consume.
The VPC service perimeter construct was introduced to establish the
boundary around these accessible APIs.
In order to further verify the identity of each request, you may attach one or
more access levels to a VPC service perimeter.
When used in conjunction with other network controls (e.g., firewall rules
and other network services you will learn in the next chapter), VPC service
controls are an effective way to prevent data exfiltration.
However, you need to use caution when configuring VPC service controls
for your workload. A misconfiguration may be overly permissive by
allowing requests to access some Google APIs they shouldn’t be allowed to
consume. Conversely, a misconfiguration may also be too restrictive to the
point that even authorized requests are mistakenly denied access to one or
more accessible Google APIs.
You may wonder, why not test a VPC service control in a non-production
environment first, and then if it works and it passes all forms of testing,
“promote” this configuration to production?
This is a feasible approach, but it’s not practical because a VPC service
control configuration takes into consideration a large number of “moving
parts,” which include a combination of identity data, contextual data (e.g.,
geolocation, origination CIDR block, device type, device OS, etc.), and
infrastructure data.
Moreover, your non-production VPC service control configuration may
work in your non-production cloud environment, but it may fail in
production because, for example, your request didn’t originate from the
expected CIDR block using the expected principal and a trusted device.
Dry-Run Concepts
A VPC service perimeter dry-run configuration can be in one of these four
states:
1.
Inherited from enforced: By default, the dry-run configuration is
identical to the enforced configuration. This happens, for example, if
you create a perimeter in enforced mode from scratch, as we did in
the service perimeter deep dive before.
2.
Updated: The dry-run configuration is different from the enforced
configuration. This happens when you want to apply additional
restrictions to the perimeter, and you choose to test them in the dry-
run configuration instead of the enforced configuration. The
upcoming perimeter dry-run deep dive exercise will show you exactly
this approach.
3.
New: The perimeter has a dry-run configuration only. No enforced
configurations exist for the perimeter. The status persists as new until
an enforced configuration is created for the perimeter.
4.
Deleted: The perimeter dry-run configuration was deleted. The status
persists as deleted until a dry-run configuration is created.
Perimeter Dry-Run
In this exercise, you will learn how to use a dry-run mode perimeter
configuration to test your workload security posture.
First, we will configure subnet-frontend with private connectivity to
Google APIs and services.
Last, we will limit the access of Google APIs from network endpoints
within the perimeter to only the storage API. This change constrains even
more the actions an authorized principal can do—whether these actions
originated from inside or outside of the perimeter. Put differently, the only
action allowed will be consumption of the storage API from authorized
requestors within the perimeter.
You will test the perimeter dry-run configuration, and if this works as
expected, then you will enforce it.
Unlike routes and firewall rules which are scoped for the entire VPC
network, private connectivity is defined on a per-subnetwork basis.
First, you need to make sure the subnet is configured to use Private Google
Access.
This is achieved by using the --enable-private-ip-google-
access flag when you create or update a subnet. We already enabled
Private Google Access for subnet-frontend in the “Private Google
Access” section of Chapter 3 (Figure 3-58). Let’s check to make sure this
flag is still enabled (Figure 4-24).
As you can see in Figure 4-24, Private Google Access is enabled for
subnet-frontend but is disabled for subnet-backend.
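One way to double-check the flag (and enable it, if it were disabled) is a sketch like this:

gcloud compute networks subnets describe subnet-frontend \
  --project=vpc-host-nonprod --region=us-east1 \
  --format="value(privateIpGoogleAccess)"

# To enable it, if needed:
gcloud compute networks subnets update subnet-frontend \
  --project=vpc-host-nonprod --region=us-east1 \
  --enable-private-ip-google-access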
Second, we need to make sure there are routes whose next hop is the
default-internet-gateway and whose destination is the restricted VIP.
Figure 4-27 shows how to create the new default Internet route.
So, just to make sure I have the routes I need, in Figure 4-28 I listed all
the routes that use the default Internet gateway in our shared VPC.
The first route will allow egress traffic to the Internet, which will allow me
(among many things) to ssh to VMs with external IPs I created in any of
my subnets.
The second route will allow egress traffic to the restricted VIP.
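The route to the restricted VIP can be sketched as follows (the route name is arbitrary):

gcloud compute routes create restricted-vip-route \
  --project=vpc-host-nonprod \
  --network=your-app-shared-vpc \
  --destination-range=199.36.153.4/30 \
  --next-hop-gateway=default-internet-gateway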
Exam tip: In order for your workloads to privately consume Google APIs,
you need to allow egress traffic with protocol and port tcp and 443,
respectively, to reach the restricted VIP.
Third, with Private Google Access enabled and a route to the restricted VIP,
we need to make sure firewall rules won’t block any outbound tcp:443
traffic whose destination is the restricted VIP.
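An egress allow rule scoped to the restricted VIP—the rule name is illustrative—might look like this:

gcloud compute firewall-rules create allow-egress-restricted-vip \
  --project=vpc-host-nonprod \
  --network=your-app-shared-vpc \
  --direction=EGRESS --action=ALLOW \
  --rules=tcp:443 \
  --destination-ranges=199.36.153.4/30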
Fourth, we need to set up a DNS resource record set to resolve the fully
qualified domain name compute.googleapis.com.—yes, the trailing
dot is required!—to one of the four IPv4 addresses in the restricted VIP.
Last, in Figure 4-31 we need to add a record that lets the fully qualified
domain name compute.googleapis.com. resolve to one of the four
IPv4 addresses in the restricted VIP CIDR block 199.36.153.4/30.
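With Cloud DNS, these last two steps can be sketched roughly as follows (the zone name is an assumption):

gcloud dns managed-zones create googleapis-zone \
  --project=vpc-host-nonprod \
  --dns-name=googleapis.com. \
  --visibility=private \
  --networks=your-app-shared-vpc \
  --description="Private zone for restricted Google APIs"

gcloud dns record-sets create compute.googleapis.com. \
  --project=vpc-host-nonprod --zone=googleapis-zone \
  --type=A --ttl=300 \
  --rrdatas=199.36.153.4,199.36.153.5,199.36.153.6,199.36.153.7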
With these five steps successfully completed, you can rest assured that
tcp:443 egress traffic from subnet-frontend, whose destination is
the compute Google API, will reach its destination without using the
Internet.
Next, we need to update our access level to allow also the two principals
joseph@dariokart.com and samuele@dariokart.com to use
the restricted service, that is, the storage API.
All we need to do is to add them to the basic access level spec YAML
file (Figure 4-32), save the file, and update dariokart_level access
level (Figure 4-33).
Now we need to make sure the updated access level is reassigned to the
perimeter. This is accomplished by leveraging the --set-access-
levels flag in the gcloud access-context-manager
perimeters update command (Figure 4-34).
In this section, we will test the perimeter by showing that the principal
joseph@dariokart.com is allowed to create a GET request targeting
the compute Google API endpoint in order to list the VMs (instances) in its
project.
As a result, all the VPC accessible and allowed services are available to
consumers within the perimeter, as noted with {} in Figure 4-35.
Since all VPC accessible APIs can be consumed from requestors inside the
perimeter, we expect a request from joseph@dariokart.com
originated from a VM in subnet-frontend to succeed.
Note: For the request to hit the restricted VIP, we need to make sure the VM
has no external IP address.
Let’s try exactly this scenario.
In this test, we will create an HTTP GET request, whose target is the
compute Google API endpoint, with a URI constructed to invoke the
instances.list method.
See
https://cloud.google.com/compute/docs/reference/re
st/v1/instances/list for more details.
I copied the access token, and I pasted it as the value of the Bearer key in
the HTTP header section.
As expected, the request was successful and the Google compute API
returned a response listing the details of vm1—the only instance in the
project frontend-devs.
In Figure 4-41, you can see the detailed response in JSON format. The
access token was redacted.
Since we don’t want to enforce this new configuration just yet, we are going
to leverage the dry-run feature of service perimeters in order to thoroughly
test this configuration and to make sure it provides the expected level of
protection.
When you use a dry-run configuration for the first time, it is by default
inherited from the perimeter enforced configuration.
As a result, we want to first remove all VPC allowed services from the
inherited dry-run configuration—step 1.
The “+” sign denotes a row that is present in the perimeter dry-run
configuration, but is absent from the perimeter enforced configuration.
Conversely, the “-” sign denotes a row that is present in the perimeter
enforced configuration, but is absent from the perimeter dry-run
configuration.
Now that you learned the difference between dariokart_perimeter’s
enforced and dry-run configurations, we are ready to test the perimeter dry-
run configuration.
Since the dry-run configuration has not been enforced (yet!), the expected
behavior is that the exact GET request in Figure 4-41 will succeed, but a
perimeter violation will be appended to the audit logs.
Figure 4-46 shows the successful request to obtain the list of VMs in the
frontend-devs project initiated by the principal
joseph@dariokart.com we saw before.
Even though the request succeeded, the audit logs tell a different story.
Figure 4-47 illustrates how to retrieve from Cloud Logging the past ten
minutes (--freshness=600s) of logs that are reported by VPC Service
Controls (read
'protoPayload.metadata.@type:"type.googleapis.com/google.cloud.audit.VpcServiceControlAuditMetadata"').
Figure 4-47 Reading VPC Service Control audit logs in the past ten minutes
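Spelled out, the command is along these lines (the project ID is the one used in this exercise):

gcloud logging read \
  'protoPayload.metadata.@type:"type.googleapis.com/google.cloud.audit.VpcServiceControlAuditMetadata"' \
  --project=frontend-devs \
  --freshness=600s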
Notice the dryRun: true key-value pair (surrounded by the dark blue
rectangle) to denote the audit log was generated by a perimeter dry-run
configuration.
Now that we are satisfied with the level of protection provided by the
perimeter dry-run configuration, we are all set to enforce the configuration.
To do so, use the gcloud access-context-manager
perimeters dry-run enforce command, as illustrated in Figure 4-
49.
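In its simplest form—the policy ID is a placeholder—the command is:

gcloud access-context-manager perimeters dry-run enforce dariokart_perimeter \
  --policy=1234567890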
Let’s now test the perimeter with its newly enforced configuration, which
was derived from the validated dry-run one.
Figure 4-50 demonstrates that the same test we tried before (Figure 4-
46) this time fails with an HTTP status code 403
(“PERMISSION_DENIED”).
In this exercise, you learned about the powerful dry-run feature available to
GCP perimeter resources. When you design your service perimeters, it is
best practice to thoroughly test their configurations before enforcing them
to your production workloads.
Cleaning Up
Second, the ping command failed. This is because by design the restricted
VIP can only be reached by TCP traffic on port 443. Instead, the ping
command operates using the ICMP protocol.
Final Considerations
There are a few “gotchas” you need to be aware of for the exam. They
relate to scenarios that include the combination of VPC Service Controls
and VPCs connected in a number of ways. The main two common scenarios
are briefly explained in the upcoming sections.
When using Shared VPC, treat the whole Shared VPC (i.e., host and all
service projects) as one service perimeter. This way, when an operation
involves resources distributed between the host and service projects, you
don’t incur a VPC Service Controls violation.
A.
Add the host project containing the Shared VPC to the service
perimeter.
B.
Add the service project where the Compute Engine instances reside
to the service perimeter.
C.
Create a service perimeter between the service project where the
Compute Engine instances reside and the host project that contains
the Shared VPC.
D.
Create a perimeter bridge between the service project where the
Compute Engine instances reside and the perimeter that contains the
protected BigQuery datasets.
Rationale
You need to enable VPC Service Controls and allow changes to perimeters
in existing environments without preventing access to resources. Which
VPC Service Controls mode should you use?
A.
Cloud Run
B.
Native
C.
Enforced
D.
Dry-run
Rationale
A is not correct because Cloud Run has nothing to do with VPC Service
Controls.
B is not correct because the “native” perimeter mode does not exist.
C is not correct because the “enforced” mode achieves the opposite of
what the requirement is.
D is CORRECT because the “dry-run” mode is intended to test the
perimeter configuration and to monitor usage of services without
preventing access to resources. This is exactly what the requirement
is.
Enterprise applications are architected, designed, and built for elasticity, performance, security, cost-effectiveness,
and resilience. These are the five pillars of the well-architected framework.
Note: The well-architected framework is an organized and curated collection of best practices, which are intended
to help you architect your workloads by taking full advantage of the cloud. Even though each public cloud
provider has created its own framework, all these frameworks have in common the aforementioned five pillars, as
key tenets. If you want to learn more, I recommend starting from the Google Cloud Architecture framework:
https://cloud.google.com/architecture/framework.
In this chapter, you will learn how to choose the most appropriate combination of Google Cloud load balancing
services that will help you architect, design, and build your workloads to be elastic, performant, secure, cost-
effective, and resilient.
By elasticity we mean the ability of a workload to scale horizontally (scale in or scale out) based on the number of
incoming requests it receives. If the workload receives low traffic, then the workload should only consume a small
set of compute, network, and storage resources. As the number of incoming requests increases, the workload
should be able to gradually increase the number of compute, network, and storage resources to be able to serve the
increasing load while maintaining its SLOs.
Performance is typically measured in terms of requests per second (RPS) and request latency, that is, how many
requests per second the workload can process in order to meet its SLOs and what is the average time (usually
expressed in milliseconds—ms) for the workload to serve a request. More metrics may be added, but for the scope
of this chapter, we will be mainly focused on RPS and latency as SLIs for performance.
The focus areas of security are identity protection, network protection, data protection, and application protection.
Data protection includes encryption at rest, in transit, and in use.
Cost is another important pillar that has to be considered when designing the architecture for your workloads.
Remember, in the cloud you pay for what you use, but ultimately your workloads need a network to operate. You
will learn in this chapter that all the load balancing services available in Google Cloud come in different “flavors”
based on the network tier you choose, that is, premium tier or standard tier—we introduced the two network
service tiers in Chapter 2. The former is more expensive than the latter, but has the key benefit of leveraging the
highly reliable, highly optimized, low-latency Google Global Backbone network instead of the Internet to connect
the parts of your load balancing service. Some load balancing services are only available in one network tier
(premium) and not in the other (standard). For more information, visit
https://cloud.google.com/network-tiers#tab1.
Last, resilience indicates the ability of the workload to recover from failures, whether they be in its frontend, in its
backends, or any other component of its architecture.
Google Cloud offers a number of load balancing services that are meant to help you take full advantage of its
globally distributed, performant, and secure network infrastructure. This is the same infrastructure that powers
billion-user services you probably use every day, for example, Google Search, Gmail, Google Maps, YouTube,
Google Workspace, and others.
Figure 5-1 provides an overview of the Google Cloud load balancing services.
Note: Do not confuse the scope of a load balancer (global vs. regional) with its “client exposure,” that is, external
vs. internal. An external load balancer denotes a load balancer that accepts Internet traffic, whereas an internal
load balancer only accepts RFC 1918 traffic.
In addition to being one of the nine types, a load balancer may come in either of the two network tiers you learned in
Chapter 2—although not every type is available in both tiers.
To avoid confusion, from now on I will denote each load balancer type with a specific number as indicated in
Figure 5-1.
For the exam, you are required to know which load balancer type is best suited for a specific use case. You will
need to determine the most appropriate type of load balancer, as well as a combination of Google Cloud services
that meet the requirements in the question. These are typically requirements related to the pillars of the well-
architected framework, that is, operational efficiency, performance, resilience, security, and cost-effectiveness.
I have included in Figure 5-1 each type's compatibility with Cloud Armor, a key service that goes hand in hand with global external load balancers. You will learn about Cloud Armor in detail at the end of this chapter. For the time being,
think of Cloud Armor—as the name suggests—as a GCP network service intended to provide an extra layer of
protection for your workloads.
When an HTTPS load balancer combines “forces” with two other network services, that is, Cloud Armor and
Identity-Aware Proxy, your workload is also more secure because it is protected from Distributed Denial-of-
Service (DDoS) and other layer 7 attacks, for example, the Open Web Application Security Project (OWASP) top
ten vulnerabilities.
You will learn first the common components of a load balancer, regardless of its type. Once you have acquired a
good understanding of the parts that make up a load balancer, you will learn the differentiating capabilities each
load balancer has.
Backend services are the means a load balancer uses to send incoming traffic to compute resources responsible for
serving requests. The compute resources are also known as backends, while a backend service is the Google Cloud
resource that mediates between the load balancer frontend and the backends.
A backend service is configured by setting the protocol it uses to connect to the backends, along with distribution
and session settings, health checks, and timeouts. These settings determine how your load balancer behaves.
If your workload serves static content using the HTTP(S) protocol, then there is really no compute resource
needed. This is because the response is not the result of a computation performed by a system as an effect of the
incoming HTTP(S) request and its metadata (e.g., HTTP headers). Instead, for the same HTTP(S) requests,
identical HTML pages are returned in the HTTP(S) responses. As a result, the easiest way to manage static content
is to use a Cloud Storage bucket to store the static content and let the HTTP(S) load balancer return the HTML
page to the requestor.
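For instance, a backend bucket can be created and wired into a URL map with commands along these lines (the backend bucket and URL map names are illustrative, and the Cloud Storage bucket www-static-assets is assumed to already exist and contain your static HTML):
# Create a backend bucket that fronts an existing Cloud Storage bucket
gcloud compute backend-buckets create static-assets-backend \
    --gcs-bucket-name=www-static-assets \
    --enable-cdn

# Use the backend bucket as the default "backend" of a URL map
gcloud compute url-maps create static-web-map \
    --default-backend-bucket=static-assets-backend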
Conversely, if your workload serves dynamic content, then backends are required to compute the data from the
request and create a response to be sent back to the requestor.
A number of backend options are available, and you—as a network engineer—will need to make the right choice
based on the business, technical, and security requirements for your workload.
If the preceding requirements drive you toward an IaaS (Infrastructure as a Service) approach, then managed
instance groups (MIGs) are an excellent option for your workload backends.
On the other hand, if your workload’s architecture was designed to be cloud-native, then the good news is that you
can take full advantage of container-native or serverless services provided by Google Cloud (e.g., GKE, Cloud
Run, Cloud Functions, App Engine). These services are nicely integrated with your Google Cloud load balancer as
we will see in the next paragraph.
Google Cloud has generalized the “network endpoint group” construct (referred in the upcoming sections as NEG)
from container-native only to a number of different “flavors” in order to meet the nonfunctional requirements
presented by hybrid and multi-cloud topologies.
The exam will not require deep knowledge on each of the five different NEG types—the last five columns from
the right in Figure 5-2. As a result, in this chapter we will just focus on how to configure managed instance groups
and zonal network endpoint groups.
A managed instance group treats a group of identically configured VMs as one item. These VMs are all modeled
after an instance template, which uses a custom image (selected to keep the boot time relatively short), and do not
need an external IP address. This is because the load balancer backend services implement a mechanism to interact
with each VM’s internal IP address.
For proxy-based, external load balancers (1, 2, 3, 4, 5 in Figure 5-1), backends are connected to the load balancer
Google Front End (GFE) with a named-port.
The value of the backend service attribute --port-name (e.g., port1) must match a value in the --
named-ports list of key-value pairs for the instance group (e.g., port1:443,port2:80,port3:8080):
In this case, the backend service uses the value 443 as the port to use for communication with the instance group’s
VMs over the https protocol.
This is because port1 matches the key-value pair port1:443 in the --named-ports instance group list,
thereby resolving port1 to https:443:
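A hedged sketch of this pairing follows (the instance group and backend service names are illustrative):
# Define the named ports on the instance group
gcloud compute instance-groups set-named-ports my-instance-group \
    --named-ports=port1:443,port2:80,port3:8080 \
    --zone=us-central1-a

# Point the backend service at the named port; port1 resolves to 443
gcloud compute backend-services update my-backend-service \
    --port-name=port1 \
    --protocol=HTTPS \
    --global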
A network endpoint group (NEG) abstracts the backends by letting the load balancer communicate directly with a
group of backend endpoints or services.
Zonal NEGs are collections of either IP addresses or IP address/port pairs—see endpoint API scheme for zonal
NEGs in Figure 5-2—for Google Cloud resources within a single subnet.
NEGs are useful because they allow you to create logical groupings of IP addresses and ports representing
software services instead of entire VMs.
The following example creates an HTTP load balancing NEG and attaches four zonal network endpoints to the
NEG. The zonal network endpoints will act as backends to the load balancer. It assumes you already have a VPC
network network-a.
1.
Create a subnet with alias IP addresses:
3.
Create the NEG:
4.
Update NEG with endpoints:
5.
Create a health check:
8.
Create a URL map using the backend service backendservice1:
9.
Create the target http proxy using the url-map urlmap1:
10.
Create the forwarding rule, that is, the key component of a frontend load balancer's configuration, and attach it to the target HTTP proxy:
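A condensed sketch of this sequence follows; all names, CIDR ranges, and the zone are illustrative, and the endpoint VMs (vm1 in this sketch) are assumed to already exist with alias IP addresses taken from the subnet's secondary range.
# 1. Subnet with a secondary (alias) IP range in network-a
gcloud compute networks subnets create subnet-a \
    --network=network-a \
    --region=us-central1 \
    --range=10.10.0.0/24 \
    --secondary-range=container-range=192.168.10.0/24

# 3. Zonal NEG of type GCE_VM_IP_PORT
gcloud compute network-endpoint-groups create neg-a \
    --zone=us-central1-a \
    --network=network-a \
    --subnet=subnet-a \
    --network-endpoint-type=gce-vm-ip-port

# 4. Attach endpoints (VM name, alias IP address, and port for each endpoint)
gcloud compute network-endpoint-groups update neg-a \
    --zone=us-central1-a \
    --add-endpoint="instance=vm1,ip=192.168.10.2,port=80" \
    --add-endpoint="instance=vm1,ip=192.168.10.3,port=80"

# 5. Health check
gcloud compute health-checks create http hc-http-80 --port=80

# Backend service that uses the health check, with the NEG attached as its backend
gcloud compute backend-services create backendservice1 \
    --protocol=HTTP \
    --health-checks=hc-http-80 \
    --global

gcloud compute backend-services add-backend backendservice1 \
    --network-endpoint-group=neg-a \
    --network-endpoint-group-zone=us-central1-a \
    --balancing-mode=RATE \
    --max-rate-per-endpoint=100 \
    --global

# 8, 9, 10. URL map, target HTTP proxy, and global forwarding rule
gcloud compute url-maps create urlmap1 --default-service=backendservice1

gcloud compute target-http-proxies create targetproxy1 --url-map=urlmap1

gcloud compute forwarding-rules create fwrule1 \
    --global \
    --target-http-proxy=targetproxy1 \
    --ports=80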
When you configure a load balancer’s backend services, you are required to specify one or more health checks for
its backends.
As its name suggests, a health check is a Google Cloud resource, whose only job is to determine whether the
backend instances of the load balancer they are associated with are healthy. This determination is based on the
ability of the backend to respond to incoming traffic. But what type of traffic do the backends need to respond to in
order to be deemed as healthy? The answer depends on the load balancer type. For HTTPS load balancers, the
response must use the HTTPS protocol (layer 7); for TCP/UDP load balancers, the response must use the
TCP/UDP protocol (layer 4).
This is why when you create a health check, you need to specify a protocol, optionally a port, and a scope, as
shown in Figure 5-3.
At the time of writing this book, the protocol can be one of the following options: grpc, http, https, http2,
ssl, tcp. Legacy health checks only support http and https.
The scope indicates whether the backends are located in a single region or in multiple regions.
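For example, a global HTTP health check on port 80 could be created as follows (the name and timing values are illustrative; use --region instead of --global for a regional health check):
gcloud compute health-checks create http basic-http-check \
    --port=80 \
    --check-interval=10s \
    --timeout=5s \
    --healthy-threshold=2 \
    --unhealthy-threshold=3 \
    --global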
Keep in mind that health checks must be compatible with the type of load balancer and the backend types. For
example, some load balancers only support legacy health checks—that is, they only support http and https
protocols—while others support grpc, http, https, http2, ssl, and tcp.
In addition to compatibility, for the health check to work effectively, an ingress allow firewall rule that allows
traffic to reach your load balancer backends must be created in the VPC where the backends live.
Figure 5-4 summarizes the minimum required firewall rules for each load balancer.
The two TCP/UDP network load balancers (denoted by 8 and 9 in Figure 5-4) are the only nonproxy-based load balancers. Put differently, these load balancers preserve the original client IP address because the connection is never terminated by the load balancer; packets are delivered to one of the backends with the original client IP address intact. Due to this behavior, the source ranges
for their compatible health checks are different from the ones applicable to proxy-based load balancers (1–7 in
Figure 5-4).
Also, regional load balancers that use the open source Envoy proxy (load balancers 5, 6, 7 in Figure 5-4) require an
additional ingress allow firewall rule that accepts traffic from a specific subnet referred to as the proxy-only subnet.
These load balancers terminate incoming connections at the load balancer frontend. Traffic is then sent to the
backends from IP addresses located on the proxy-only subnet.
As a result, in the region where your Envoy-based load balancer operates, you must first create the proxy-only
subnet.
A key requirement is to set the --purpose flag to the value REGIONAL_MANAGED_PROXY, as shown in
the following code snippet, where we assume the VPC network lb-network already exists:
gcloud compute networks subnets create proxy-only-subnet \
--purpose=REGIONAL_MANAGED_PROXY \
--role=ACTIVE \
--region=us-central1 \
--network=lb-network \
--range=10.129.0.0/23
Exam tipThere can only be one active proxy-only subnet per region at a given time. All Envoy-based load balancers in that region share the same proxy-only subnet, and their backend services use it to reach and test the health of their backends. This is why it is recommended to use a CIDR block with a mask of /23 for the proxy-only subnet, so that enough proxy addresses are available.
Upon creation of the proxy-only subnet, you must add another firewall rule to the load balancer VPC—lb-
network in our preceding example. This firewall rule allows proxy-only subnet traffic flow in the subnet where
the backends live—the two subnets are different, even though they reside in the load balancer VPC. This means
adding one rule that allows TCP port 80, 443, and 8080 traffic from the range of the proxy-only subnet, that is,
10.129.0.0/23.
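A hedged sketch of such a rule follows (the rule name is illustrative; the network, ports, and source range come from the preceding paragraphs):
gcloud compute firewall-rules create fw-allow-proxy-only-subnet \
    --network=lb-network \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:80,tcp:443,tcp:8080 \
    --source-ranges=10.129.0.0/23 \
    --target-tags=allow-health-checks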
In the preceding example, the firewall rule targets all VMs that are associated with the network tag allow-
health-checks.
Configuring External HTTP(S) Load Balancers Including Backends and Backend Services with
Balancing Method, Session Affinity, and Capacity Scaling/Scaler
External HTTP(S) load balancers (load balancers 1, 2, 5 in Figure 5-1) are proxy-based, layer 7 load balancers,
which enable you to run and scale your workloads behind a single external IP address, that is, a virtual IP (VIP).
Proxy-based means that the end-to-end HTTP(S) connection between the client and the backend serving the
client is broken into two connections:
The first connection originates from the client and is terminated at the target-proxy component of the load
balancer.
The second connection starts at the target proxy and ends at the backend, which is responsible for serving
the client request.
Layer 7 means that these load balancers are “smart” enough to exhibit load balancing behavior based on the OSI
layer 7, application-specific characteristics. The OSI layer 7 is the application layer and is the closest layer to the
end-user experience. As a result, this type of load balancer can route traffic based on—for example—HTTP
headers, session affinity, specific URL patterns, and many other layer 7 features. None of these traffic routing and
distribution capabilities are available to layer 4 load balancers.
External HTTP(S) load balancers distribute HTTP and HTTPS traffic to backend services hosted on a number of
Google Cloud compute services (such as Compute Engine, Google Kubernetes Engine (GKE), Cloud Run, and
many others), as well as backend buckets hosted on Google Cloud Storage. External backends can also be
connected over the Internet or via hybrid connectivity.
Before delving into more details, it is important to emphasize the scope of the load balancer, that is, global vs.
regional.
A regional (external HTTP(S)) load balancer is intended to serve content from a specific region, whereas a global
(external HTTP(S)) load balancer is intended to serve content from multiple regions around the world. A typical
use case for a regional (external HTTP(S)) load balancer is compliance due to sovereignty laws requiring backends
to operate in a specific geolocation.
In contrast, global external HTTP(S) load balancers leverage a specific Google infrastructure, which is
distributed globally in the Google Edge Network and operates using Google’s global backbone network and
Google’s control plane. This infrastructure is called the Google Front End (GFE) and uses the CIDR blocks
130.211.0.0/22, 35.191.0.0/16.
Exam tipThere are a few IP ranges you need to remember for the exam. The GFE ones, that is,
130.211.0.0/22, 35.191.0.0/16, are definitely in the list to be remembered.
Modes of Operation
Global external HTTP(S) load balancer: This is the load balancer type 1 in Figure 5-1 and is implemented as
a managed service on Google Front Ends (GFEs). It uses the open source Envoy proxy to support advanced
traffic management capabilities such as traffic mirroring, weight-based traffic splitting, request/response-based
header transformations, and more.
Global external HTTP(S) load balancer (classic): This is the load balancer type 2 in Figure 5-1 and is global
in premium tier, but can be configured to be regional in standard tier. This load balancer is also implemented on
Google Front Ends (GFEs).
Regional external HTTP(S) load balancer: This is the load balancer type 5 in Figure 5-1 and is implemented
as a managed service on the open source Envoy proxy. It includes advanced traffic management capabilities
such as traffic mirroring, weight-based traffic splitting, request/response-based header transformations, and
more.
Architecture
The architecture of all external HTTP(S) load balancers (global and regional) shows common components, such as
forwarding rules, target proxies, URL maps, backend services, and backends.
Yet, there are a few differences among the three types (1, 2, 5 in Figure 5-1) we just described.
Instead of listing each common component of the architecture, explaining its intended purpose, and describing how it differs across the three types, I prefer to visualize them in a picture and go from there.
The top part of Figure 5-5 illustrates the architecture of the two global external HTTP(S) load balancers (“regular” and classic, respectively, types 1 and 2), whereas the bottom part shows the regional external HTTP(S) load balancer (type 5) architecture. Let's start from the client, located at the very left of the figure.
Figure 5-5 Architecture of an external HTTP(S) load balancer
Forwarding Rule
An external forwarding rule specifies an external IP address, port, and target HTTP(S) proxy. Clients use the IP
address and port to connect to the load balancer from the Internet.
Exam tipType 1, 2, 5 load balancers only support HTTP and HTTPS traffic on TCP ports 80 (or 8080) and TCP
port 443, respectively. If your clients require access to your workload backend services using different TCP (or
UDP) ports, consider using load balancer types 3, 4, 6, 7, 8, 9 instead (Figure 5-1). More information will be
provided in the upcoming sections.
Global forwarding rules support external IP addresses in IPv4 and IPv6 format, whereas regional forwarding rules
only support external IP addresses in IPv4 format.
As an effect of terminating the connection, the original client IP is no longer accessible in the second connection,
that is, the connection that starts at the target HTTP(S) proxy and ends in one of the backends. For these types of
load balancer, responses from the backend VMs go back to the target HTTP(S) proxy, instead of reaching directly
the client.
Upon terminating the connection, the target HTTP(S) proxy evaluates the request by using the URL map (and
other layer 7 attributes) to make traffic routing decisions.
For HTTPS requests, the target HTTP(S) proxy can also leverage SSL certificates to identify itself to clients and
SSL policies to enforce TLS compliance, for example, to accept only TLS handshakes if the TLS version is greater
than or equal to 1.2.
You attach the SSL certificate to the load balancer’s HTTP(S) target proxy either while creating the load balancer
or any time after. Any changes made to SSL certificates don’t alter or interrupt existing load balancer connections.
You can configure up to the maximum number of SSL certificates per target HTTP(S) proxy (there is a quota to
limit the number of certificates). Use multiple SSL certificates when your workload is serving content from
multiple domains using the same load balancer VIP address and port, and you want to use a different SSL
certificate for each domain.
When you specify more than one SSL certificate, the first certificate in the list of SSL certificates is considered
the primary SSL certificate associated with the target proxy, as illustrated in Figure 5-6.
Figure 5-6 Multiple SSL certificates feature of global HTTP(S) load balancer
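As a hedged sketch (the proxy, URL map, and certificate names are illustrative), attaching two certificates to a target HTTPS proxy could look like this, with the first certificate in the list acting as the primary one:
gcloud compute target-https-proxies create multi-domain-proxy \
    --url-map=web-map \
    --ssl-certificates=cert-primary,cert-secondary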
Self-managed SSL certificates are certificates that you obtain, provision, and renew yourself. This type can be any of the following: Domain Validation (DV), Organization Validation (OV), or Extended Validation (EV) certificates.
Google-managed SSL certificates are certificates that Google Cloud obtains and manages for your domains,
renewing them automatically. Google-managed certificates are Domain Validation (DV) certificates. They don’t
demonstrate the identity of an organization or individual associated with the certificate, and they don’t support
wildcard common names.
All global external HTTP(S) load balancers support Google-managed and self-managed SSL certificates, whereas
regional external HTTP(S) load balancers only support self-managed SSL certificates.
SSL Policies
From now on, the term SSL refers to both the SSL (Secure Sockets Layer) and TLS (Transport Layer Security)
protocols.
SSL policies define the set of TLS features that external HTTP(S) load balancers use when negotiating SSL with
clients.
For example, you can use an SSL policy to configure the minimum TLS version and features that every client
should comply with in order to send traffic to your external HTTP(S) load balancer.
Exam tipSSL policies affect only connections between clients and the target HTTP(S) proxy (Connection 1 in
Figure 5-5). SSL policies do not affect the connections between the target HTTP(S) proxy and the backends
(Connection 2).
To define an SSL policy, you specify a minimum TLS version and a profile. The profile selects a set of SSL
features to enable in the load balancer.
Three preconfigured Google-managed profiles let you specify the level of compatibility appropriate for your
application. The three preconfigured profiles are as follows:
COMPATIBLE: Allows the broadest set of clients, including clients that support only out-of-date SSL
features, to negotiate SSL with the load balancer
MODERN: Supports a wide set of SSL features, allowing modern clients to negotiate SSL
RESTRICTED: Supports a reduced set of SSL features, intended to meet stricter compliance requirements
The SSL policy also specifies the minimum version of the TLS protocol that clients can use to establish a
connection. A profile can also restrict the versions of TLS that the load balancer can negotiate. For example,
ciphers enabled in the RESTRICTED profile are only supported by TLS 1.2. Choosing the RESTRICTED profile
effectively requires clients to use TLS 1.2 regardless of the chosen minimum TLS version.
If you do not choose one of the three preconfigured profiles or create a custom SSL policy, your load balancer uses
the default SSL policy. The default SSL policy is equivalent to an SSL policy that uses the COMPATIBLE profile
with a minimum TLS version of TLS 1.0.
Use the gcloud compute target-https-proxies create or update commands to attach an SSL
policy (--ssl-policy) to your target HTTP(S) proxy.
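For example (the policy and proxy names are illustrative):
# Create an SSL policy that requires at least TLS 1.2 with the RESTRICTED profile
gcloud compute ssl-policies create min-tls12-policy \
    --profile=RESTRICTED \
    --min-tls-version=1.2

# Attach the policy to an existing target HTTPS proxy
gcloud compute target-https-proxies update my-https-proxy \
    --ssl-policy=min-tls12-policy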
Exam tipYou can attach an SSL policy to more than one target HTTP(S) proxy. However, you cannot configure
more than one SSL policy for a particular target proxy. Any changes made to SSL policies don’t alter or interrupt
existing load balancer connections.
URL Map
The target HTTP(S) proxy uses a URL map to decide where to route the new request (Connection 2 in Figure 5-5)
—remember the first request, which carried the original client IP, has been terminated.
Since the HTTP(S) load balancer operates at layer 7, it is fully capable of determining where to route the request
based on HTTP attributes (i.e., the request path, cookies, or headers). When I say “where to route the request,” I
really mean to which backend service or backend bucket the target HTTP(S) proxy should route the request to.
In fact, the target HTTP(S) proxy forwards client requests to specific backend services or backend buckets. The
URL map can also specify additional actions, such as sending redirects to clients.
Backend Service
A backend service distributes requests to healthy backends. Unlike the regional external HTTP(S) load balancer, the two global external HTTP(S) load balancers also support backend buckets. The key components of a backend service are described next.
Backends
Backends are the ultimate destination of your load balancer incoming traffic.
Upon receiving a packet, they perform computation on its payload, and they send a response back to the client.
As shown in Figure 5-5, the type of backend depends on the scope of the external HTTP(S) load balancer, as well
as its network tier.
In gcloud, you can add a backend to a backend service using the gcloud compute backend-services
add-backend command.
When you add an instance group or a NEG to a backend service, you specify a balancing mode, which defines
a method measuring backend load and a target capacity. External HTTP(S) load balancing supports two balancing
modes:
RATE, for instance groups or NEGs, is the target maximum number of requests (queries) per second (RPS,
QPS). The target maximum RPS/QPS can be exceeded if all backends are at or above capacity.
UTILIZATION is the backend utilization of VMs in an instance group.
How traffic is distributed among backends depends on the mode of the load balancer.
In this exercise, you will learn how to configure container-native load balancing using a global external HTTP(S)
load balancer.
As we mentioned in Chapter 2, when a load balancer—the target HTTP(S) proxy of the load balancer to be precise
—can use containers as backends (instead of VMs or other endpoints), this type of load balancing is called
container-native load balancing. Container-native load balancing allows load balancers to target Kubernetes pods
directly and to evenly distribute traffic to pods.
There are several ways to create the networking infrastructure required for the load balancer to operate in
container-native mode. You will learn two ways to accomplish this.
The first way is called container-native load balancing through ingress, and its main advantage is that all you have
to code is the GKE cluster that will host your containers, a deployment for your workload, a service to mediate
access to the pods hosting your containers, and a Kubernetes ingress resource to allow requests to be properly
distributed among your pods. The deployment, the service, and the Kubernetes ingress can all be coded
declaratively using YAML files, as we’ll see shortly.
Upon creating these four components, Google Cloud does the job for you by creating the global external HTTP(S)
load balancer (classic), along with all the components you learned so far. These are the backend service, the NEG,
the backends, the health checks, the firewall rules, the target HTTP(S) proxy, and the forwarding rule. Not bad,
right?
In the second way, we let Google Cloud create only the NEG for you. In addition to the GKE cluster, your workload deployment, and the Kubernetes service, you will have to create yourself all the load balancing infrastructure described earlier. This approach is called container-native load balancing through standalone zonal NEGs.
While this approach sounds more Infrastructure as a Service oriented, it will help you consolidate the
knowledge you need in order to master the configuration of a global external HTTP(S) load balancer that operates
in container-native load balancing mode. Let’s get started!
As shown in Figure 5-7, we start by creating a GKE VPC-native, zonal cluster, whose worker nodes are hosted in
the service project frontend-devs of our shared VPC.
Figures 5-8, 5-9, and 5-10 display the YAML manifests for the workload deployment, the service, and the
ingress Kubernetes resources, respectively.
The creation of a Kubernetes ingress had the effect of triggering a number of actions behind the scenes. These
included the creation of the following resources in the project frontend-devs:
1.
A global external HTTP(S) load balancer (classic).
2.
A target HTTP(S) proxy.
3.
A backend service in each zone—in our case, since the cluster is a single zone, we have only a backend
service in us-east1-d.
4.
A global health check attached to the backend service.
5.
A NEG in us-east1-d. The endpoints in the NEG and the endpoints of the Service are kept in sync.
Figure 5-14 displays the description of the newly created Kubernetes ingress resource.
Figure 5-15 Creating a firewall rule to allow health checks to access backends
In our example, when we tested the load balancer, we received a 502 status code “Bad Gateway” (Figure 5-16).
There are a number of reasons why the test returned a 502 error. For more information on how to troubleshoot 502
errors, see https://cloud.google.com/load-balancing/docs/https/troubleshooting-
ext-https-lbs.
This exercise was intended to show you the automatic creation of the load balancer and all its components from the
declarative configuration of the ingress manifest and the annotation in the Kubernetes service manifest.
If you are curious to learn more, I pulled the logs and I discovered that the target HTTP(S) proxy was unable to
pick the backends (Figure 5-17). Can you discover the root cause?
Last, let’s clean up the resources we just created in order to avoid incurring unexpected charges (Figure 5-18).
Notice how the --zone flag is provided because the nodes of the cluster are VMs, and VMs are zonal resources.
It's worth mentioning that by deleting the cluster, Google Cloud takes care of automatically deleting the load balancer and all its components except the NEG (this is a known issue) and the firewall rule we had to create in the shared VPC (this is because we deployed the cluster in a shared VPC).
To make it simpler, as we will be typing a number of gcloud commands, let’s set some environment variables
(Figure 5-19).
Next, let’s create another GKE VPC-native, zonal cluster, this time in the subnet-backend, which is also
hosted in your-app-shared-vpc and is located in the region us-central1 (Figure 5-20).
Unlike before, in this exercise we just create two Kubernetes resources (instead of three), that is, a deployment and
a service, which targets the deployment. There is no ingress Kubernetes resource created this time.
The manifest YAML files for the deployment and the service are displayed in Figures 5-21 and 5-22,
respectively.
Notice the difference between the two in the metadata.annotation section of the manifest (Figure 5-22,
lines 5–7).
During the creation of the service, the Kubernetes-GKE system integrator reads the annotation section, and—this
time—it doesn’t tell Google Cloud to create the load balancer and all its components anymore (line 7 in Figure 5-
22 is commented).
Let’s see what happens after creating the deployment and the service (Figure 5-23).
To validate what we just described, let’s list the NEGs in our project (Figure 5-24).
Indeed, here it is! Our newly created NEG uses a GCE_VM_IP_PORT endpoint type to indicate that incoming
HTTP(S) requests resolve to either the primary internal IP address of a Google Cloud VM’s NIC (one of the GKE
worker nodes, i.e., 192.168.1.0/26) or an alias IP address on a NIC, for example, pod IP addresses in our
VPC-native clusters neg-demo-cluster, that is, 192.168.15.0/24.
See Figure 3-11 as a reminder of the list of usable subnets for containers in our shared VPC setup in Chapter 3.
So, now that you have your NEG ready, what’s next? With container-native load balancing through standalone
zonal NEGs, all you have automatically created is your NEG, nothing else. To put this NEG to work—literally, so
the endpoints (pods) in the NEG can start serving HTTP(S) requests—you are responsible for creating all required
load balancing infrastructure to use the NEG.
I’ll quickly walk you through this process to get you familiarized with each component of an external global
HTTP(S) load balancer as well as the related gcloud commands.
First, we need an ingress firewall rule to allow health check probes and incoming HTTP(S) traffic to reach the
NEG. Incoming traffic originates at the start point of Connection 2 in Figure 5-5, which in our case is the GFE
because our external HTTP(S) load balancer is global. The target of the ingress firewall rule is the GKE worker
nodes, whose network tags are assigned at creation time by GKE, as shown in Figure 5-25.
Since our network setup uses a shared VPC, the principal samule@dariokart.com is not responsible for
managing the network.
Let’s set the CLUSTER_NAME and NETWORK_TAGS environment variables for a principal who has
permissions to create the firewall rule, for example, itsmedario@dariokart.com (Figure 5-26).
Figure 5-26 Setting environment variables for itsmedario@dariokart.com
We need to create the backend components, beginning from a health check, which will be attached to a new
backend service. Upon creating the backend service, we need to add a backend, which is our newly created NEG
in Figure 5-24.
Figure 5-28 describes the creation of the backend components of the HTTP(S) load balancer.
Last, we need to create the frontend components, that is, the URL map, which will be attached to a new target
HTTP(S) proxy, which will be attached to a global forwarding rule.
Figure 5-29 describes the creation of the aforementioned frontend components of the HTTP(S) load balancer.
Notice how the URL map uses the newly created backend service neg-demo-cluster-lb-backend to
bridge the load balancer frontend and backend resources.
This is also visually represented in the URL map boxes in Figure 5-5.
As you can see, container-native load balancing through standalone zonal NEGs requires extra work on your end.
However, you get more control and flexibility than using container-native load balancing through ingress because
you create your own load balancing infrastructure and you configure it to your liking.
Last, remember to delete the load balancer components and the cluster to avoid incurring unexpected charges
(Figure 5-30).
Figure 5-30 Deleting load balancer infrastructure and the GKE cluster
As you saw in the previous examples, a cloud load balancer is not a single Google Cloud resource. Instead, it’s a
group of resources, which need each other for the load balancer to work.
In Figure 5-5, we visualized all the resources, which are parts of an external HTTP(S) load balancer.
In this exercise, you will learn how to set up a global external HTTPS load balancer (classic)—load balancer 2 in
Figure 5-1—with all its underlying resources.
This time, we will be using a managed instance group (MIG) as the backend, and the setup will be hosted in the
service project backend-devs of our shared VPC.
Exam tipThere are multiple ways to set up an HTTPS load balancer in a shared VPC. One way is to use a service
project with ID backend-devs-7736 to host all the load balancer resources. Another way is to distribute the
frontend resources (i.e., the reserved external IP address, the forwarding rule, the target proxy, the SSL
certificate(s), the URL map) in a project (it can be the host or one of the service projects) and the backend
resources (i.e., the backend service, the health check, the backend) in another service project. This latter approach
is also called cross-project backend service referencing.
Figure 5-31 gives you an idea of what our setup will look like upon completion.
Notice the setup will be using the HTTPS protocol, which requires the creation of an SSL certificate in use by the
target HTTPS proxy.
For global external HTTPS load balancers, the SSL certificate is used for domain validation, and it can be a self-
managed or a Google-managed SSL certificate.
Exam tipRegional external HTTPS load balancers can only use self-managed SSL certificates.
First and foremost, let’s refresh our memory on who can do what in the service project backend-devs. This is
done by viewing the project’s IAM allow policy as illustrated in Figure 5-32.
As per Google recommendation, we will create a Google-managed SSL certificate (Figure 5-33).
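A hedged sketch of the command follows (the certificate name is illustrative, and the exact list of domains is an assumption based on the DNS records we create later in this exercise):
gcloud compute ssl-certificates create www-ssl-cert \
    --domains=dariokart.com,www.dariokart.com \
    --global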
Notice the --domains flag, which requires a comma-delimited list of domains. Google Cloud will validate each
domain in the list. For the load balancer to work, we will need to wait until both the managed.status and the
managed.domainStatus properties are ACTIVE. This process is lengthy and may take one or more hours,
depending on how quickly your domain provider can provide evidence that you own the domain. In my case, the
validation took about an hour because I bought my domain dariokart.com from Google Workspace.
Nevertheless, while we are waiting for the SSL certificate to become active, we can move forward with the
remaining steps.
Next, we need to create an instance template, which will be used by the managed instance group to create new
instances (VMs) as incoming traffic increases above the thresholds we will set up in the backend service.
Since our global external HTTPS load balancer is intended to serve HTTPS requests, we need to make sure our
instances come with an HTTP server preinstalled. As a result, our instance template will make sure Apache2 will
be installed on the VM upon startup.
Also, the startup script is configured to show the hostname of the VM, which will be used by the backend
service to serve the incoming HTTPS request, as illustrated in Figure 5-34.
NoteThe VMs will be using a network tag allow-health-check to be allowed to be health-checked from the CIDRs of the Google Front End (GFE). We will use this network tag when we create the firewall rule that allows health check traffic to reach the backends.
With the instance template ready, we can create our managed instance group (MIG). The MIG will start with two
n1-standard-1 size VMs in us-central1-a, as shown in Figure 5-35.
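A sketch of the MIG creation follows; the MIG name matches the backend VM names you will see later in this exercise, while the instance template name is an assumed placeholder for the template created in Figure 5-34:
gcloud compute instance-groups managed create lb-backend-example \
    --template=lb-backend-template \
    --size=2 \
    --zone=us-central1-a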
NoteWith a shared VPC setup, you need to make sure the zone (us-central1-a) you are using to host your
MIG’s VMs is part of the region (us-central1) where the instance template operates, which in turn needs to
match the region of the subnet where your backend service will run (us-central1).
Once the MIG is ready, we need to tell it which named port its VMs will be listening to in order to serve incoming
HTTPS traffic. Guess what named port we will use for HTTPS traffic? HTTPS, right ☺? No, it’s going to be
HTTP actually!
This is because our backends will be running in Google Cloud, and when backends run in Google Cloud, traffic
from the Google Front End (GFE) destined to the backends is automatically encrypted by Google. As a result,
there is no need to use an HTTPS named port.
Figure 5-36 illustrates how to set the list of key-value pairs for the desired MIG’s protocol and port. In our
case, the list contains only one key-value pair, that is, http:80.
Exam tipAn instance group can use multiple named ports, provided each named port uses a different name. As a
result, the value of the --named-ports flag is a comma-delimited list of named-port:port key:value
pairs.
With our managed instance group properly created and configured, we need to make sure health probes can access
the two backend VMs using the named port http (see previous command in Figure 5-36), which is mapped to the
port 80 in the MIG.
As a result, the ingress firewall rule (IST type, i.e., Ingress Source and Target) in Figure 5-37 must be created
in our shared VPC—as you learned in Chapter 3, firewall rules are defined on a per–VPC network basis—to allow
incoming HTTP traffic (tcp:80) originating from the GFE CIDR blocks (130.211.0.0/22,
35.191.0.0/16) to reach the two backend VMs, which are tagged with the network tag allow-health-
check.
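A hedged sketch of this rule follows (the rule name is illustrative; the shared VPC network name is the one we used earlier, and the ranges and network tag come from the preceding paragraph):
gcloud compute firewall-rules create fw-allow-health-check \
    --network=your-app-shared-vpc \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:80 \
    --source-ranges=130.211.0.0/22,35.191.0.0/16 \
    --target-tags=allow-health-check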
Next, we need to reserve a static external IP address with global scope. To make it simple, we use an IPv4
version (Figure 5-38).
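For example (the address name is illustrative):
gcloud compute addresses create lb-ipv4-1 \
    --ip-version=IPV4 \
    --global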
To make sure our backend VMs are continuously checked for health, we need to create a health check.
Exam tipAs previously mentioned, just because you are using HTTPS—like in this exercise—on your forwarding
rule (Connection 1 in Figure 5-5), you don’t have to use HTTPS in your backend service and backends
(Connection 2). Instead, you can use HTTP for your backend service and backends. This is because traffic
between Google Front Ends (GFEs) and your backends is automatically encrypted by Google for backends that
reside in a Google Cloud VPC network.
In Figure 5-39, we will create a health check to monitor the health status of our backend VMs. Then, we will use
the newly created health check to create a backend service. Finally, we will add our newly created managed
instance group (the MIG we created in Figure 5-35) to the backend service.
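A condensed sketch of those three steps follows; the health check and backend service names are illustrative, while the MIG name and the http named port come from the earlier steps:
gcloud compute health-checks create http http-basic-check --port=80

gcloud compute backend-services create web-backend-service \
    --protocol=HTTP \
    --port-name=http \
    --health-checks=http-basic-check \
    --global

gcloud compute backend-services add-backend web-backend-service \
    --instance-group=lb-backend-example \
    --instance-group-zone=us-central1-a \
    --global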
With the backend service ready, we can create the remaining frontend resources, that is, the URL map, which is
required along with our SSL certificate to create the target HTTPS proxy, and the globally scoped forwarding rule.
Figure 5-40 displays the creation of the aforementioned frontend GCP resources.
Figure 5-40 URL map, target HTTPS proxy, and forwarding rule creation
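A condensed sketch of these commands follows; the frontend resource names are illustrative, while the backend service, SSL certificate, and reserved address names carry over from the earlier sketches:
gcloud compute url-maps create web-map-https \
    --default-service=web-backend-service

gcloud compute target-https-proxies create https-lb-proxy \
    --url-map=web-map-https \
    --ssl-certificates=www-ssl-cert

gcloud compute forwarding-rules create https-content-rule \
    --global \
    --address=lb-ipv4-1 \
    --target-https-proxy=https-lb-proxy \
    --ports=443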
The last step in this setup is to add a DNS A record, which is required to point our domain dariokart.com to
our load balancer.
Since I bought my domain from Google Workspace, I will use the Google Workspace Admin Console and
Google Domains to create two DNS A records. Figure 5-41 shows the newly created DNS A records.
Figure 5-41 Adding DNS A records to resolve domain names to the VIP
When you save the changes, you get a notification that it may take some time to propagate the DNS changes over
the Internet, but in most cases, the propagation happens within an hour or less, depending on how fast your domain
registration service operates.
Now it's a matter of waiting for the SSL certificate to become active so that an SSL handshake can be established
between the clients and the target HTTPS proxy.
After about an hour, the SSL certificate became active, as you can see in Figure 5-42.
Figure 5-42 The SSL certificate’s managed domain status becomes active
Figure 5-43 confirms all tests “hitting” the domain with HTTPS were successful!
Figure 5-43 Testing the HTTPS load balancer from Cloud Shell
More specifically, the first HTTPS request was served by the VM lb-backend-example-t0vc, and the
second request was served by the VM lb-backend-example-1dbf.
To further validate that the HTTPS load balancer can serve traffic from the Internet, I tried
https://dariokart.com from my phone. The result was also successful (Figure 5-44).
Figure 5-44 Testing the HTTPS load balancer from a mobile device
Now it’s time to clean up—each resource we just created is billable—not to mention that the HTTPS load balancer
is exposed to the public Internet.
Figures 5-45 and 5-46 show you how to delete the load balancer’s resources.
After saving the changes, the two A records are gone (Figure 5-49).
You learned when to use and how to configure external HTTP(S) load balancers in different flavors (types 1, 2,
and 5).
These types of load balancer operate at layer 7 of the OSI model, resulting in advanced routing and session-level
capabilities, which are not available to load balancers that operate at lower layers of the OSI model.
For example, URL mapping is a feature only available to HTTP(S) load balancers, which makes sense because
the URL construct is specific to the HTTP protocol. Likewise, session affinity based on HTTP headers is another
feature unique to HTTP(S) load balancers.
What if your workload requires a solution to load-balance traffic other than HTTP(S) instead?
To answer this question, you need to find out where the clients of your workload are located and whether you want
the load balancer to remember the client IP address.
If your workload requires access from the Internet (i.e., if your load balancer forwarding rule is external), and its
compute backends need to be distributed in more than one region—for example, because of reliability
requirements—then Google Cloud offers the global external SSL proxy load balancer (type 3) and the global
external TCP proxy load balancer (type 4).
Figure 5-50 illustrates the architecture for these types of load balancer.
As noted in the Global Target (SSL or TCP) Proxy column in Figure 5-50, these global load balancers are offered
in the premium or in the standard tier. The difference is that with the premium tier, latency is significantly reduced
(compared to the standard tier) because ingress traffic enters the Google Global Backbone from the closest Point of
Presence (PoP) to the client—PoPs are geographically distributed around the globe. In contrast, with the standard tier, ingress traffic stays on the public Internet longer and enters the Google Global Backbone only in the GCP region where the global forwarding rule lives.
Moreover, because these load balancers use a target proxy as an intermediary, they “break” the connection from
the client in two connections. As a result, the second connection has no clue of the client IP address where the
request originated.
Exam tipEven though SSL proxy and TCP proxy load balancers don’t preserve the client IP address by default, there are workarounds on how to let them “remember” the client IP address. One of these workarounds is by configuring the target (SSL or TCP) proxy to prepend a PROXY protocol version 1 header to retain the original connection information, as illustrated in the following:
gcloud compute target-ssl-proxies update my-ssl-lb-target-proxy \
    --proxy-header=[NONE | PROXY_V1]
Both of these proxy-based load balancer types (3 and 4) share the following characteristics:
1.
They support IPv4 and IPv6 clients.
2.
They terminate the connection from the client in a Google Front End.
3.
The connection from the target proxy to the backends supports the TCP or the SSL (TLS) protocols.
4.
They support the following backends:
a.
Instance groups
b.
Zonal standalone NEGs
c.
Zonal hybrid NEGs
With an SSL (TLS) proxy load balancer, SSL sessions are terminated in one of the Google Front Ends (GFEs),
then the load balancer creates a new connection and sends traffic to your workload backends using SSL
(recommended) or TCP.
It is best practice to use end-to-end encryption for your SSL proxy deployment. To do this, you must configure
your backend service to accept traffic over SSL. This ensures that client traffic decrypted in a GFE is encrypted
again before being sent to the backends for processing.
End-to-end encryption requires you to provision certificates and keys on your backends so they can perform SSL
processing.
As you can see from the architecture in Figure 5-50, an external TCP proxy load balancer shares many of the SSL
proxy load balancer features.
However, since the only supported protocol from client requests is TCP, its target TCP proxies (located in the
worldwide GFEs) do not need to perform SSL authentication. As a result, neither SSL certificates nor SSL policies are
supported.
Exam tipExternal TCP proxy load balancers do not support end-to-end (client to backend) encryption. However,
connection 2 (proxy to backend) does support SSL if needed.
Network load balancers (types 8 and 9 in Figure 5-1) operate at layer 4 (the transport layer) of the OSI model. As a result, a network load balancer doesn't have the layer 7 (application layer) advanced routing and session-level capabilities of the HTTP(S) load balancers (types 1, 2, and 5 and their internal companion type 6), which is why there are no URL map boxes in the architecture represented in Figure 5-51.
A key differentiating feature of a Google Cloud network load balancer is that it preserves the source
client IP address by default.
This is because there are no proxies to terminate the client connection and start a new connection to direct
incoming traffic to the backends.
As you can see in Figure 5-51, there is no target proxy between the forwarding rule and the backend service, and
there is only one connection from the clients to the backends. For this reason, a network load balancer is also
referred to as a pass-through load balancer to indicate that the source client IP address is passed to the backends
intact.
A network load balancer comes in two flavors based on whether the IP address of the forwarding rule is external
(type 9) or internal (type 8). The former allows client IP addresses to be denoted in IPv6 format, but this feature
requires the premium network tier. The latter doesn’t allow clients in IPv6 format because the IP space is RFC
1918, and there is no concern about IP space saturation. For this reason, clients for type 8 network load balancers
are denoted with the Google Cloud Compute Engine icon (VM) to indicate that they are either VMs in Google
Cloud or VMs on-premises.
Exam tipNetwork load balancers are regionally scoped to indicate that backends can only be located in a single
region.
The way session affinity is handled is another differentiating factor that makes a network load balancer unique
when compared to others in Google Cloud.
While HTTP(S)-based load balancers leverage HTTP-specific constructs (e.g., cookies, headers, etc.), a network
load balancer—by virtue of being a layer 4 load balancer—can leverage any combination of source, destination,
port, and protocols to determine session affinity semantics.
The following are some of the common use cases when a network load balancer is a good fit:
Your workload requires load balancing of non-TCP traffic (e.g., UDP, ICMP, ESP) or of a TCP port that other load balancers do not support.
It is acceptable to have SSL traffic decrypted by your backends instead of by the load balancer. The network
load balancer cannot perform this task. When the backends decrypt SSL traffic, there is a greater CPU burden
on the backends.
It is acceptable to have SSL traffic decrypted by your backends using self-managed certificates. Google-managed SSL certificates are only available for global external HTTP(S) load balancers (types 1 and 2) and external SSL proxy load balancers (type 3).
Your workload is required to forward the original client packets unproxied.
Your workload is required to migrate an existing pass-through-based workload without changes.
Your workload requires advanced network DDoS protection, which necessitates the use of Google Cloud
Armor.
Examples
A simple example of a type 8 load balancer is a shopping cart application as illustrated in Figure 5-52. The load
balancer operates in us-central1, but its internal regional forwarding rule is configured with the --allow-global-access flag set to true.
As a result, clients in a different region (scenario “a”) of the same VPC can use the shopping cart application.
Also, clients in a VPC peered to the ILB’s VPC can use the application (scenarios “b” and “d”). Even on-premises
clients can use the shopping cart application (scenario “c”).
Figure 5-53 illustrates another example of a type 8 load balancer, which is used to distribute TCP/UDP traffic
between separate (RFC 1918) tiers of a three-tier web application.
Figure 5-53 Three-tier application with a regional internal network TCP/UDP load balancer
Implementation
Let’s see now how a regional internal network TCP/UDP load balancer works.
As illustrated in Figure 5-54, a type 8 load balancer is a software-defined load balancer, which leverages the Google Cloud software-defined network virtualization stack Andromeda. Rather than inserting a middle proxy in the data path, Andromeda programs the load balancing behavior directly into the network fabric, thereby eliminating the risk of choke points and helping clients select the optimal backend ready to serve requests.
Not every workload requires access from the Internet. That’s a suitable use case for an internal load balancer
(ILB), where all components (i.e., client, forwarding rule, proxy, URL map, backend services, and backends)
operate in an RFC 1918 IP address space.
You have already learned an example of an internal load balancer in the previous section, that is, the regional
internal TCP/UDP network load balancer (type 8). This load balancer is internal, but it operates at layer 4 of the
OSI model.
What if your workload instead requires a solution to load-balance internal HTTP(S) traffic without burdening the backends with SSL decryption?
You definitely need an HTTPS proxy solution, which terminates the connection from the clients and takes the
burden of decrypting the incoming packets.
The Google Cloud internal HTTP(S) load balancer (type 6) does just that!
The top part of Figure 5-55 illustrates the architecture of this type of load balancer.
This load balancer type is regional and proxy-based, operates at layer 7 in the OSI model, and enables you to run
and scale your services behind an internal IP address.
These services operate in the form of backends hosted on one of the supported backend types (e.g., managed instance groups or zonal NEGs).
By default, the Google Cloud internal HTTP(S) load balancer (type 6) is accessible only from IPv4 clients located in the same region as its internal regional forwarding rule.
If your workload requires access from clients located in a region other than the internal forwarding rule’s region,
then you must configure the load balancer internal forwarding rule to allow global access as indicated in Figure 5-
55.
Last, an internal HTTP(S) load balancer is a managed service based on the open source Envoy proxy. This enables
rich traffic control capabilities based on HTTP(S) parameters. After the load balancer has been configured, Google
Cloud automatically allocates Envoy proxies in your designated proxy-only subnet to meet your traffic needs.
Another “flavor” of Google Cloud internal, proxy-based load balancers is the Google Cloud internal TCP proxy
load balancer (type 7). See the bottom part of Figure 5-55.
This type of load balancer is essentially the internal version of the Google Cloud external TCP proxy load balancer
(type 4), with a couple of caveats you need to remember for the exam.
Exam tipThe type 7 load balancer’s target proxy is an Envoy proxy managed by Google Cloud. As a result, this
proxy is located in your designated proxy-only subnet instead of a GFE location.
Unlike the case of type 4 load balancers, Regional Private Service Connect NEGs are a valid backend option for
this type of load balancer. This makes sense because the load balancer is internal, and by operating in RFC 1918
IP address space, it has full access to Google APIs and services with a proper subnet configuration (gcloud
compute networks subnets update --enable-private-ip-google-access).
Whether your workload needs to serve requests from the Internet or from your corporate network (external or
internal access, respectively), using L7 or L4 OSI layers (i.e., application layer or transport layer, respectively),
requiring SSL termination in the load balancer itself or its backends, Google Cloud has your load balancing needs
covered.
As you learned so far, Google Cloud offers a wide range of load balancing services. Each service is available—in
most cases—in the premium network tier or standard network tier, resulting in a significant number of available load balancing options.
You, as a professional Google Cloud network engineer, need to decide the best option for your workload load
balancing requirements. This decision is based on the five pillars of the well-architected framework we introduced
at the beginning of this chapter, that is, elasticity, performance, security, cost-effectiveness, and resilience (or
reliability).
Figure 5-56 summarizes some of the decision criteria you need to consider in the decision process.
Figure 5-56 Google Cloud load balancer comparison
Additionally, the decision tree in Figure 5-57 is also provided to help you choose what load balancer best suits
your workload load balancing requirements.
There are many criteria that drive a decision on the best-suited load balancer. These can be grouped into categories that map to the five pillars of the well-architected framework.
The decision tree in Figure 5-57 is not exhaustive, but it highlights the right mix of criteria (in the features column) you need to consider for your load balancer, for example, backends and security.
Exam tipCloud Armor is supported for all external load balancer types but the regional external HTTP(S) load
balancer (type 5). Identity-Aware Proxy (IAP) is supported by all HTTP(S) load balancer types (types 1, 2, 5, and
6). SSL offload is supported by all external proxy-based load balancer types but the global external TCP proxy
load balancer (type 4).
Protocol Forwarding
Protocol forwarding is a Compute Engine feature that lets you create forwarding rule objects that can send packets
to a single target Compute Engine instance (VM) instead of a target proxy.
A target instance contains a single VM that receives and handles traffic from the corresponding forwarding rule.
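A minimal sketch of creating a target instance follows; the target instance name and zone are illustrative, and the VM your-vm is assumed to already exist:
gcloud compute target-instances create your-target-instance \
    --instance=your-vm \
    --zone=us-central1-a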
NoteThe preceding command assumes you already have a VM your-vm, and it creates a target instance Google
Cloud resource, which is different from the actual VM resource.
You then use the newly created target instance resource to create the forwarding rule with protocol forwarding:
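For example, a regional external forwarding rule that forwards ESP traffic (used by IPsec VPNs) to the target instance might look like this (the rule name and region are illustrative):
gcloud compute forwarding-rules create fr-esp-to-your-vm \
    --region=us-central1 \
    --ip-protocol=ESP \
    --target-instance=your-target-instance \
    --target-instance-zone=us-central1-a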
For external protocol forwarding and virtual private networks (VPNs), Google Cloud supports protocol forwarding
for the AH (Authentication Header), ESP (Encapsulating Security Payload), ICMP (Internet Control Message
Protocol), SCTP (Stream Control Transmission Protocol), TCP (Transmission Control Protocol), and UDP (User
Datagram Protocol) protocols.
For internal protocol forwarding, only TCP and UDP are supported.
Google Cloud Compute Engine offers autoscaling to automatically add or remove VM instances to or from a
managed instance group (MIG) based on increases or decreases in load. Autoscaling lets your apps gracefully
handle increases in traffic, and it reduces cost when the need for resources decreases. You can autoscale a MIG
based on its CPU utilization, Cloud Monitoring metrics, schedules, or load balancing serving capacity.
When you set up an autoscaler to scale based on load balancing serving capacity, the autoscaler watches the
serving capacity of an instance group and scales in or scales out when the VM instances are under or over
capacity, respectively.
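As a hedged sketch (the MIG name, zone, and targets are illustrative), an autoscaler driven by load balancing serving capacity could be configured as follows:
gcloud compute instance-groups managed set-autoscaling lb-backend-example \
    --zone=us-central1-a \
    --min-num-replicas=2 \
    --max-num-replicas=10 \
    --scale-based-on-load-balancing \
    --target-load-balancing-utilization=0.8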
Exam tipThe serving capacity of a load balancer is always defined in the load balancer’s backend service. When
you configure autoscaling for a MIG that serves requests from an HTTP(S) load balancer (types 1, 2, 5, and 6), the
serving capacity of your load balancer is based on either utilization or rate (requests per second, i.e., RPS, or
queries per second, i.e., QPS) as shown in Figure 5-5.
The values of the following fields in the backend services resource determine the backend’s behavior:
A balancing mode, which defines how the load balancer measures backend readiness for new requests or
connections.
A target capacity, which defines a target maximum number of connections, a target maximum rate, or target
maximum CPU utilization.
A capacity scaler, which adjusts overall available capacity without modifying the target capacity. Its value can
be either 0.0 (preventing any new connections) or a value between 0.1 (10%) and 1.0 (100% default).
These fields can be set using the gcloud compute backend-services add-backend command,
whose synopsis is displayed in Figure 5-58.
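For example (the names are illustrative), a MIG backend could be added with the UTILIZATION balancing mode, an 80% target, and a capacity scaler temporarily reduced to 50%:
gcloud compute backend-services add-backend web-backend-service \
    --instance-group=lb-backend-example \
    --instance-group-zone=us-central1-a \
    --balancing-mode=UTILIZATION \
    --max-utilization=0.8 \
    --capacity-scaler=0.5 \
    --global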
Each load balancer type supports different balancing modes, and the available balancing modes depend on the backend type associated with the backend service. The supported balancing modes are as follows:
CONNECTION: Determines how the load is spread based on the number of concurrent connections that the
backend can handle.
RATE: The target maximum number of requests (queries) per second (RPS, QPS). The target maximum
RPS/QPS can be exceeded if all backends are at or above capacity.
UTILIZATION: Determines how the load is spread based on the utilization of instances in an instance
group.
Figure 5-59 shows which balancing mode is supported for each of the nine load balancer types based on
backends.
You heard about Cloud Armor at the beginning of the chapter when we listed the load balancer types that support
advanced DDoS (Distributed Denial-of-Service) protection—since this is a hot topic for the exam, let’s repeat one
more time; these are all external load balancer types except the regional external HTTP(S), that is, types 1, 2, 3, 4,
and 9.
In this section, you will learn about Cloud Armor and how it can be used to better protect your workloads whether
they operate in Google Cloud, in a hybrid, or a multi-cloud environment.
Security Policies
Google Cloud Armor uses security policies to protect your application from common web attacks. This is achieved
by providing layer 7 filtering and by parsing incoming requests in a way to potentially block traffic before it
reaches your load balancer’s backend services or backend buckets.
Each security policy consists of a set of rules that filter traffic based on conditions such as an incoming request's IP address, IP range, region code, or request headers.
Google Cloud Armor security policies are available only for backend services of global external HTTP(S) load
balancers (type 1), global external HTTP(S) load balancers (classic) (type 2), global external SSL proxy load
balancers (type 3), or global external TCP proxy load balancers (type 4). The load balancer can be in a premium or
standard tier.
The backends associated with the backend service can be any of the following:
Instance groups
Zonal network endpoint groups (NEGs)
Serverless NEGs: One or more App Engine, Cloud Run, or Cloud Functions services
Internet NEGs for external backends
Buckets in Cloud Storage
Exam tip: When you use Google Cloud Armor to protect a hybrid or a multi-cloud deployment, the backends must be Internet NEGs. Google Cloud Armor also protects serverless NEGs when traffic is routed through a load balancer. To ensure that only traffic that has been routed through your load balancer reaches your serverless NEG, see Ingress controls.
Google Cloud Armor also provides advanced network DDoS protection for regional external TCP/UDP network
load balancers (type 9), protocol forwarding, and VMs with public IP addresses. For more information about
advanced DDoS protection, see Configure advanced network DDoS protection.
Adaptive Protection
Google Cloud Armor Adaptive Protection helps you protect your Google Cloud applications, websites, and
services against L7 DDoS attacks such as HTTP floods and other high-frequency layer 7 (application-level)
malicious activity. Adaptive Protection builds machine learning models that detect anomalous activity and suggest rules to mitigate it.
Full Adaptive Protection alerts are available only if you subscribe to Google Cloud Armor Managed Protection Plus. Otherwise, you receive only a basic alert, without an attack signature or the ability to deploy a suggested rule.
Google Cloud Armor comes with preconfigured WAF rules, which are complex web application firewall (WAF)
rules with many signatures that are compiled from open source industry standards.
Each signature corresponds to an attack detection rule in the ruleset. Incoming requests are evaluated against the
preconfigured WAF rules.
Each signature also has a sensitivity level, which ranges from zero (no rules are enabled by default) to four (all rules are enabled by default).
A lower sensitivity level indicates higher-confidence signatures, which are less likely to generate a false positive. A higher sensitivity level increases security but also increases the risk of generating a false positive.
When you select a sensitivity level for your WAF rule, you opt in to the signatures at sensitivity levels less than or equal to the selected level. In the following example, you tune a preconfigured WAF rule by selecting a sensitivity level of 1:
evaluatePreconfiguredWaf('sqli-v33-stable', {'sensitivity': 1})
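To put such an expression to work, you reference it in a security policy rule. The following is a sketch of my own, not taken from the exam guide; the rule priority (1000), the policy name placeholder, and the deny-403 action are illustrative choices:
gcloud compute security-policies rules create 1000 \
    --security-policy=YOUR_POLICY_NAME \
    --expression="evaluatePreconfiguredWaf('sqli-v33-stable', {'sensitivity': 1})" \
    --action=deny-403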
In addition to using the preconfigured WAF rules, you can also define prioritized rules with configurable match
conditions and actions in a security policy.
A rule takes effect, meaning that the configured action is applied, if the rule is the highest-priority rule whose conditions match the attributes of the incoming request. A rule's match condition can be either of the following:
A basic match condition, which contains lists of IP addresses or lists of IP address ranges (a mixed list of addresses and ranges is allowed)
An advanced match condition, which contains an expression with multiple subexpressions to match on a
variety of attributes of an incoming request
The custom rules language is used to write the expressions in advanced match conditions for security policy rules.
The Google Cloud Armor custom rules language is an extension of the Common Expression Language (CEL).
For example, the following expression uses the attribute origin.ip and the IP address range 9.9.9.0/24 with the operation inIpRange(). In this case, the expression returns true if origin.ip is within the 9.9.9.0/24 IP address range:
inIpRange(origin.ip, '9.9.9.0/24')
Once created, a security policy is a Google Cloud resource that can be attached to one (or more) backend
service(s) in order to enforce the rules expressed within the policy.
The following are the high-level steps for configuring Google Cloud Armor security policies to enable rules
that allow or deny traffic to global external HTTP(S) load balancers (type 1) or global external HTTP(S) load
balancers (classic) (type 2):
1. Create a Google Cloud Armor security policy.
2. Add rules to the security policy based on IP address lists, custom expressions, or preconfigured expression sets.
3. Attach the security policy to a backend service of the global external HTTP(S) load balancer or global external HTTP(S) load balancer (classic) for which you want to control access.
4. Update the security policy as needed.
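As a rough sketch of steps 1 through 3, reusing the games backend service and the mobile-clients-policy from the example that follows (the rule priority and the source range, taken from the documentation-reserved blocks used later in this section, are illustrative):
# 1. Create the security policy
gcloud compute security-policies create mobile-clients-policy \
    --description="Policy for the game service"
# 2. Add a rule that allows traffic from a given source range
gcloud compute security-policies rules create 1000 \
    --security-policy=mobile-clients-policy \
    --src-ip-ranges="192.0.2.0/24" \
    --action=allow
# 3. Attach the policy to the load balancer's backend service
gcloud compute backend-services update games \
    --security-policy=mobile-clients-policy \
    --global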
In the example displayed in Figure 5-60, you create two Google Cloud Armor security policies and apply them
to different backend services.
Figure 5-60 Example of two security policies applied to different backend services
In this example, two Google Cloud Armor security policies are used. You apply mobile-clients-policy to the game service, whose backend service is called games, and you apply internal-users-policy to the internal test service for the testing team, whose corresponding backend service is called test-network.
If the backend instances for a backend service are in multiple regions, the Google Cloud Armor security policy
associated with the service is applicable to instances in all regions. In the preceding example, the security policy
mobile-clients-policy is applicable to instances 1, 2, 3, and 4 in us-central1 and to instances 5 and 6
in us-east1.
Example
gcloud compute security-policies rules update 2147483647 \
--security-policy mobile-clients-policy \
--action "deny-404"
gcloud compute security-policies rules update 2147483647 \
--security-policy internal-users-policy \
--action "deny-502"
In the preceding commands, the first (and only) positional argument denotes the rule priority, which is an integer ranging from 0 (highest) to 2147483647 (lowest).
The two CIDR blocks 192.0.2.0/24 and 198.51.100.0/24 used in this example denote IP address ranges reserved for documentation and examples (RFC 5737).
Cloud CDN (Content Delivery Network) is another network service. It uses Google's global edge network to serve content closer to your users, which reduces latency and delivers better website and application browsing experiences.
Cloud CDN uses the concept of a cache, which stores data so that future requests for that data can be served faster;
the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere.
Since the cache concept is a tenet of the HTTP protocol design specification, and since the cache data store is located in one of the many Google Front Ends (GFEs) in Google's global edge network, Cloud CDN is naturally supported by the two global external HTTP(S) load balancers (types 1 and 2 in premium tier).
When Cloud CDN is enabled, the origin servers (backends) can be any of the following:
Instance groups
Zonal network endpoint groups (NEGs)
Serverless NEGs: One or more App Engine, Cloud Run, or Cloud Functions services
Internet NEGs for external backends
Buckets in Cloud Storage
A cache is a data store that uses infrastructure located in the Google Edge Network, as close as possible to the
users of your application.
The cached content is a copy of cacheable content that is stored on origin servers. You will learn what “cacheable”
means in the “Cacheable Responses” section. For the time being, assume that not all responses from the origin
servers can be stored in a Cloud CDN cache.
You can toggle the use of Cloud CDN by enabling or disabling Cloud CDN in the configuration of your
HTTP(S) load balancer’s backend service (serving dynamic content) or backend bucket (serving static content) as
shown in Figure 5-61 where the origin servers are a zonal network endpoint group.
The GFE determines whether a cached response to the user’s request exists in the cache, and if it does, it returns
the cached response to the user without any further action. This interaction is called cache hit, because Cloud CDN
was able to serve the request from the user by retrieving the cached response directly from the cache, thereby
avoiding an extra round-trip to the origin servers (backends), as well as the time spent regenerating the content. A
cache hit is displayed with a green arrow in Figure 5-61.
Conversely, if the GFE determines that a cached response does not exist in the cache—for example, when the
cache has no entries or when a request has been sent for the first time—the request is forwarded to the HTTP(S)
load balancer and eventually reaches the origin servers (backends) for processing. Upon completion, the computed
content is packaged in an HTTP(S) response, which is sent back to the cache (replenishing it) and then returned to the user. This interaction is called a cache miss, because the GFE failed to retrieve a response from the
cache and was forced to reach the origin servers in order to serve the request from the user. A cache miss is
displayed with a red arrow in Figure 5-61.
If the origin server’s response to this request is cacheable, Cloud CDN stores the response in the Cloud CDN
cache for future requests. Data transfer from a cache to a client is called cache egress. Data transfer to a cache is
called cache fill.
As shown in Figure 5-61, Cloud CDN can be enabled (--enable-cdn) or disabled (--no-enable-cdn) by configuring the global external HTTP(S) load balancer's backend service or backend bucket as follows (see the sketch after the list of optional flags):
Optional flags:
--no-cache-key-include-protocol
--no-cache-key-include-host
--no-cache-key-include-query-string
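Putting these pieces together, a minimal sketch of such a command might look like the following, where BACKEND_SERVICE_NAME is a placeholder for your own backend service:
gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --global \
    --enable-cdn \
    --no-cache-key-include-protocol \
    --no-cache-key-include-host \
    --no-cache-key-include-query-string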
Additionally, upon enabling Cloud CDN you can choose whether Cloud CDN should cache all content, only static content, or selectively pick and choose which content to cache based on a setting in the origin server. This can be achieved using the --cache-mode optional flag (see the sketch after the following list), whose value can be one of the following:
FORCE_CACHE_ALL, which caches all content, ignoring any private, no-store, or no-cache
directives in Cache-Control response headers.
CACHE_ALL_STATIC, which automatically caches static content, including common image formats,
media (video and audio), and web assets (JavaScript and CSS). Requests and responses that are marked as
uncacheable, as well as dynamic content (including HTML), aren’t cached.
USE_ORIGIN_HEADERS, which requires the origin to set valid caching headers to cache content.
Responses without these headers aren’t cached at Google’s edge and require a full trip to the origin on every
request, potentially impacting performance and increasing load on the origin servers.
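As a sketch, assuming a backend service identified by the placeholder BACKEND_SERVICE_NAME, the cache mode could be set as follows:
gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --global \
    --cache-mode=CACHE_ALL_STATIC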
Warning: Setting the cache mode to FORCE_CACHE_ALL may result in Cloud CDN caching private, per-user (Personally Identifiable Information, PII) content. You should only enable this cache mode on backends that are not serving private or dynamic content, such as storage buckets. To learn more, visit
https://csrc.nist.gov/glossary/term/PII.
Cacheable Responses
A cacheable response is an HTTP response that Cloud CDN can store and quickly retrieve, thus allowing for faster
load times resulting in lower latencies and better user experiences. Not all HTTP responses are cacheable. Cloud
CDN stores responses in cache if all the conditions listed in Figure 5-62 are true.
Exam tip: You don't need to memorize all the preceding criteria for the exam. However, the ones you should remember are the first two and the last: Cloud CDN must be enabled for a backend service or a backend bucket; only responses to GET requests may be cached; and there is a limit on cacheable content size.
For backend services, Cloud CDN defaults to using the complete request URI as the cache key. For example,
https://dariokart.com/images/supermario.jpg is the complete URI for a particular request for
the supermario.jpg object. This string is used as the default cache key. Only requests with this exact string
match. Requests for
http://dariokart.com/images/supermario.jpg
https://dariokart.com/images/supermario.jpg?user=user1
do not match.
For backend buckets, Cloud CDN defaults to using the URI without the protocol or host. By default, only query
parameters that are known to Cloud Storage are included as part of the cache key (e.g., “generation”).
Thus, for a given backend bucket, the following URIs resolve to the same cached object
/images/supermario.jpg:
http://dariokart.com/images/supermario.jpg
https://dariokart.com/images/supermario.jpg
https://dariokart.com/images/supermario.jpg?user=user1
http://dariokart.com/images/supermario.jpg?user=user1
https://dariokart.com/images/supermario.jpg?user=user2
https://media.dariokart.com/images/supermario.jpg
https://www.dariokart.com/images/supermario.jpg
Exam tip: You can change which parts of the URI are used in the cache key. While the filename and path must always be part of the key, you can include or omit any combination of protocol, host, or query string when customizing your cache key.
Customizing Cache Keys
You can override the default behavior of cache key definition for the backend service and backend bucket Google
Cloud resources. The latter doesn’t have flags to include (or exclude) protocol and host in a cache key, because
protocol and host do not influence how objects are referenced within a Cloud Storage bucket.
This can be achieved using some of the flags that control HTTP constructs such as the query string, HTTP headers,
and HTTP cookies as explained in the following sections.
Enabling Cloud CDN
First and foremost, you need to enable Cloud CDN on your HTTP(S) load balancer’s backend service or backend
bucket:
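A minimal sketch, where BACKEND_SERVICE_NAME and BACKEND_BUCKET_NAME are placeholders of my own:
# Backend service (dynamic content)
gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --enable-cdn \
    --global
# Backend bucket (static content)
gcloud compute backend-buckets update BACKEND_BUCKET_NAME \
    --enable-cdn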
As you learned earlier, by default backend services configured to use Cloud CDN include all components of the
request URI in cache keys. If you want to exclude the protocol, host, and query string, proceed as follows:
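A sketch, assuming a global backend service identified by the placeholder BACKEND_SERVICE_NAME:
gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --global \
    --no-cache-key-include-protocol \
    --no-cache-key-include-host \
    --no-cache-key-include-query-string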
The following instructions re-add the protocol, host, and query string to the cache key for an existing backend service that already has Cloud CDN enabled:
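A sketch, again using BACKEND_SERVICE_NAME as a placeholder:
gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --global \
    --cache-key-include-protocol \
    --cache-key-include-host \
    --cache-key-include-query-string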
These instructions set CDN cache keys to use an include or exclude list with query string parameters.
Use this command to set the strings user and time to be in the include list:
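A sketch, assuming BACKEND_SERVICE_NAME is your backend service and that user and time are the only query string parameters you want in the cache key:
gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --global \
    --cache-key-query-string-whitelist=user,time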
These instructions set Cloud CDN cache keys to use HTTP headers:
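To the best of my knowledge the relevant flag is --cache-key-include-http-headers; treat the flag name and the X-My-Header value as assumptions to verify against the gcloud reference:
gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --global \
    --cache-key-include-http-headers=X-My-Header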
These instructions set Cloud CDN cache keys to use HTTP cookies:
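Similarly, to my understanding the flag for named cookies is --cache-key-include-named-cookies; the flag name and the session_id cookie are assumptions for illustration only:
gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --global \
    --cache-key-include-named-cookies=session_id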
Cache Invalidation
After an object is cached, it remains in the cache until it expires or is evicted to make room for new content. You
can control the expiration time through the standard HTTP header Cache-Control (www.rfc-
editor.org/rfc/rfc9111#section-5.2).
Cache invalidation is the action of forcibly removing an object (a key-value pair) from the cache prior to its
normal expiration time.
Exam tip: The Cache-Control HTTP header field holds the directives (instructions) displayed in Figure 5-63, in both requests and responses, that control caching behavior. You don't need to know each of the sixteen Cache-Control directives, but it's important that you remember the two directives no-store and private. The former indicates not to store any content in any cache, whether it be a private cache (e.g., the local cache in your browser) or a shared cache (e.g., proxies, Cloud CDN, and other Content Delivery Network caches). The latter indicates to store content only in private caches.
Path Pattern
Each invalidation request requires a path pattern that identifies the exact object or set of objects that should be
invalidated. The path pattern can be either a specific path, such as /supermario.png, or an entire directory
structure, such as /pictures/*. The following rules apply to path patterns:
Exam tip: If you have URLs that contain a query string, for example, /images.php?image=supermario.png, you cannot selectively invalidate objects that differ only by the value of the query string. For example, if you have two images, /images.php?image=supermario.png and /images.php?image=luigi.png, you cannot invalidate only luigi.png. You have to invalidate all images served by images.php by using /images.php as the path pattern.
The next sections describe how to invalidate your Cloud CDN cached content.
For example, if a file located at /images/luigi.jpg has been cached and needs to be invalidated, you can use
several methods to invalidate it, depending on whether you want to affect only that file or a wider scope. In each
case, you can invalidate for all hostnames or for only one hostname.
--path "/images/luigi.jpg"
To invalidate a single file for a single host, add the --host flag as follows:
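A sketch, using the book's dariokart.com example domain and URL_MAP_NAME as a placeholder:
gcloud compute url-maps invalidate-cdn-cache URL_MAP_NAME \
    --path "/images/luigi.jpg" \
    --host dariokart.com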
By default, the Google Cloud CLI waits until the invalidation has completed. To perform the invalidation in the
background, append the --async flag to the command line.
To invalidate the whole directory for all hosts, use the command
To invalidate the whole directory for a single host, add the --host flag as follows:
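The following sketch covers both cases, again treating URL_MAP_NAME and dariokart.com as placeholders:
# Whole directory, all hosts
gcloud compute url-maps invalidate-cdn-cache URL_MAP_NAME \
    --path "/images/*"
# Whole directory, single host
gcloud compute url-maps invalidate-cdn-cache URL_MAP_NAME \
    --path "/images/*" \
    --host dariokart.com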
To perform the invalidation in the background, append the --async flag to the command line.
Invalidate Everything
To invalidate all directories for a single host, add the --host flag as follows:
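A sketch, with the same placeholders as before:
gcloud compute url-maps invalidate-cdn-cache URL_MAP_NAME \
    --path "/*" \
    --host dariokart.com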
Signed URLs
Signed URLs give time-limited resource access to anyone in possession of the URL, regardless of whether the user
has a Google Account.
A signed URL is a URL that provides limited permission and time to make a request. Signed URLs contain
authentication information in their query strings, allowing users without credentials to perform specific actions on
a resource. When you generate a signed URL, you specify a user or service account that must have sufficient
permission to make the request associated with the URL.
After you generate a signed URL, anyone who possesses it can use the signed URL to perform specified actions
(such as reading an object) within a specified period of time.
You enable support for Cloud CDN signed URLs and signed cookies by creating one or more keys on a Cloud
CDN–enabled backend service, backend bucket, or both.
For each backend service or backend bucket, you can create and delete keys as your security needs dictate. Each
backend can have up to three keys configured at a time. We suggest periodically rotating your keys by deleting the
oldest, adding a new key, and using the new key when signing URLs or cookies.
You can use the same key name in multiple backend services and backend buckets because each set of keys is
independent of the others. Key names can be up to 63 characters. To name your keys, use the characters A–Z, a–z,
0–9, _ (underscore), and - (hyphen).
When you create keys, be sure to keep them secure because anyone who has one of your keys can create signed
URLs or signed cookies that Cloud CDN accepts until the key is deleted from Cloud CDN. The keys are stored on
the computer where you generate the signed URLs or signed cookies. Cloud CDN also stores the keys to verify
request signatures.
To keep the keys secret, the key values aren’t included in responses to any API requests. If you lose a key, you
must create a new one.
Exam tipKeep the generated key file private, and do not expose it to users or store it directly in source code.
Consider using a secret storage mechanism such as Cloud Key Management Service to encrypt the key and
provide access to only trusted applications.
First, generate a strongly random key and store it in the key file with the following command:
head -c 16 /dev/urandom | base64 | tr +/ -_ > KEY_FILE_NAME
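Next, you attach the key to a Cloud CDN-enabled backend. A sketch, where the backend names are placeholders:
# Backend service
gcloud compute backend-services add-signed-url-key BACKEND_SERVICE_NAME \
    --key-name KEY_NAME \
    --key-file KEY_FILE_NAME
# Backend bucket
gcloud compute backend-buckets add-signed-url-key BACKEND_BUCKET_NAME \
    --key-name KEY_NAME \
    --key-file KEY_FILE_NAME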
To list the keys on a backend service or backend bucket, run one of the following commands:
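To the best of my knowledge there is no dedicated list-keys command; the configured key names appear in the cdnPolicy section of the describe output. A sketch:
gcloud compute backend-services describe BACKEND_SERVICE_NAME --global
gcloud compute backend-buckets describe BACKEND_BUCKET_NAME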
When URLs signed by a particular key should no longer be honored, run one of the following commands to
delete that key from the backend service or backend bucket. This will prevent users from consuming the URL that
was signed with KEY_NAME:
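A sketch, using the same placeholder names as before:
gcloud compute backend-services delete-signed-url-key BACKEND_SERVICE_NAME \
    --key-name KEY_NAME
gcloud compute backend-buckets delete-signed-url-key BACKEND_BUCKET_NAME \
    --key-name KEY_NAME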
Signing URLs
Use these instructions to create signed URLs by using the gcloud compute sign-url command as follows:
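A sketch, using the book's example URL; the one-hour expiration and the --validate flag are illustrative choices:
gcloud compute sign-url "https://dariokart.com/images/supermario.jpg" \
    --key-name KEY_NAME \
    --key-file KEY_FILE_NAME \
    --expires-in 1h \
    --validate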
This command reads and decodes the base64url encoded key value from KEY_FILE_NAME and then outputs a
signed URL that you can use for GET or HEAD requests for the given URL.
For example, specifying --expires-in 1h creates a signed URL that expires in one hour. For more information about time formats, visit
https://cloud.google.com/sdk/gcloud/reference/topic/datetimes.
Exam tip: The URL must be a valid URL that has a path component. For example, http://dariokart.com is invalid, but https://dariokart.com/ and https://dariokart.com/whatever are both valid URLs.
If the optional --validate flag is provided, this command sends a HEAD request with the resulting URL
and prints the HTTP response code.
If the signed URL is correct, the response code is the same as the result code sent by your backend.
If the response code isn’t the same, recheck KEY_NAME and the contents of the specified file, and make sure that
the value of TIME_UNTIL_EXPIRATION is at least several seconds.
If the --validate flag is not provided, the following are not verified:
The inputs
The generated URL
The generated signed URL
The URL returned from the Google Cloud CLI can be distributed according to your needs.
Note: We recommend signing only HTTPS URLs, because HTTPS provides a secure transport that prevents the signature component of the signed URL from being intercepted. Similarly, make sure that you distribute the signed URLs over secure transport protocols such as TLS/HTTPS.
Custom Origins
A custom origin is an Internet network endpoint group (NEG), that is, a backend that resides outside of Google
Cloud and is reachable across the Internet.
Similar to configuring Cloud CDN with your endpoints deployed in Google Cloud, you can use the network
endpoint group (NEG) API to add your server as a custom origin for Cloud CDN.
To specify the custom origin, use an Internet NEG. An Internet NEG has one of the endpoint types specified in
Figure 5-64.
The best practice is to create the Internet NEG with the INTERNET_FQDN_PORT endpoint type and an FQDN
(Fully Qualified Domain Name) value as an origin hostname value. This insulates the Cloud CDN configuration
from IP address changes in the origin infrastructure. Network endpoints that are defined by using FQDNs are
resolved through public DNS. Make sure that the configured FQDN is resolvable through Google Public DNS.
After you create the Internet NEG, the type cannot be changed between INTERNET_FQDN_PORT and
INTERNET_IP_PORT. You need to create a new Internet NEG and change your backend service to use the new
Internet NEG.
Figure 5-65 shows an Internet NEG used to deploy an external backend with HTTP(S) load balancing and
Cloud CDN.
Figure 5-65 An example of a custom origin in a hybrid topology
Best Practices
In this last section, you will learn a few load balancing best practices I want to share based on my experience and
my research with GCP.
Google doesn't charge for TLS, so you should take advantage of this "bonus" feature and keep your sensitive data encrypted in transit at all times.
Restrict Ingress Traffic with Cloud Armor and Identity-Aware Proxy (IAP)
Use Cloud Armor to secure at the edge by filtering ingress traffic, and enforce context-aware access controls for
your workloads with Identity-Aware Proxy (IAP).
Leverage the OSI layer 3–7 protection and the geolocation and WAF (web application firewall) defense
capabilities offered by Cloud Armor.
When Cloud Armor combines forces with IAP, you are significantly strengthening your workloads’ security
posture.
Identity-Aware Proxy is a Google Cloud service that accelerates you on your way to a Zero Trust Security Model.
If the content served by your backends is cacheable, enable Cloud CDN (Content Delivery Network). The
enablement of Cloud CDN is easy, and your users will be happy with a superior navigation experience.
You learned in the “Configuring External HTTP(S) Load Balancers” section that the HTTP/2 protocol is supported
by all four HTTP(S) load balancers, that is, types 1, 2, 5, and 6.
When compared to HTTP/1.1, the HTTP/2 protocol has the main advantage of supporting the QUIC (Quick UDP
Internet Connections) protocol and the gRPC high-performance Remote Procedure Call (RPC) framework.
All of this results in better performance, lower latency, and better user experiences.
Exam tip: QUIC is a transport layer protocol (layer 4) developed by Google, which is faster, more efficient, and more secure than earlier protocols such as TCP. For the exam, you need to know that QUIC is only supported by the global HTTP(S) load balancers, that is, types 1 and 2. For increased speed, QUIC uses the UDP transport protocol, which is faster than TCP but less reliable. It sends several streams of data at once to make up for any data that gets lost along the way, a technique known as multiplexing. For better security, everything sent over QUIC is automatically encrypted. Ordinarily, data has to be sent over HTTPS to be encrypted, but QUIC has TLS encryption built in by default. QUIC also results in lower latency because it requires fewer handshakes.
When traffic egresses the Google global backbone, you incur outbound data transfer costs. As a result, to architect
your workload for cost-effectiveness and performance, you should consider using the Google premium network
tier because this tier minimizes egress-related costs by letting your traffic stay in the Google global backbone as
long as possible.
However, you may use free tier services for non-mission-critical workloads. As a result, in order to reduce costs, at the expense of performance and resilience, you may choose to configure your load balancer to use the standard network tier, which uses the Internet more than the Google global backbone.
Nevertheless, Google Cloud gives you the option to choose between premium network tier and standard network
tier on a per–load balancer basis or a per-project basis (i.e., all your load balancers in your projects will default to
your chosen network tier).
Your workload backends can leverage metadata sent in the form of HTTP request headers (e.g., client geolocation,
cache-control properties, etc.) to make decisions.
Take advantage of the URL map advanced HTTP capabilities to route traffic to the proper backend services or
backend buckets.
Exam Questions
A. A zonal managed instance group
B. A regional managed instance group
C. An unmanaged instance group
D. A network endpoint group
Rationale
A is not correct because it would only allow the use of a single zone within a region.
B is CORRECT because it allows the application to be deployed in multiple zones within a region.
C is not correct because it does not allow for autoscaling.
D is not correct because a single NEG, as opposed to multiple NEGs, cannot distribute traffic across multiple subnets.
You have the Google Cloud load balancer backend configuration shown in Figure 5-66. You want to reduce your
instance group utilization by 20%. Which settings should you use?
A. Maximum CPU utilization: 60 and Maximum RPS: 80
B. Maximum CPU utilization: 80 and Capacity: 80
C. Maximum RPS: 80 and Capacity: 80
D. Maximum CPU: 60, Maximum RPS: 80, and Capacity: 80
Rationale
A is not correct because this reduces both the CPU utilization and requests per second, resulting in more than a
20% reduction.
B is CORRECT because you are changing the overall instance group utilization by 20%.
C is not correct because this reduces the requests per second by more than 20%.
D is not correct because this reduces both max CPU and RPS, resulting in a reduction greater than 20%.
Your company offers a popular gaming service. The service architecture is shown in Figure 5-67. Your instances
are deployed with private IP addresses, and external access is granted through a global load balancer. Your
application team wants to expose their test environment to select users outside your organization. You want to
integrate the test environment into your existing deployment to reduce management overhead and restrict access to
only select users. What should you do?
A. Create a new load balancer, and update VPC firewall rules to allow test clients.
B. Create a new load balancer, and update the VPC Service Controls perimeter to allow test clients.
C. Add the backend service to the existing load balancer, and modify the existing Cloud Armor policy.
D. Add the backend service to the existing load balancer, and add a new Cloud Armor policy and target test-network.
Rationale
A is not correct because the HTTPS load balancer acts as a proxy and doesn’t provide the correct client IP
address.
B is not correct because VPC Service Controls protects Google Managed Services.
C is not correct because this change would allow everyone to access the test service.
D is CORRECT because this provides integration and support for multiple backend services. Also, a
Cloud Armor Network Security Policy is attached to backend services in order to whitelist/blacklist
client CIDR blocks, thus allowing traffic to specific targets. In this case, a Cloud Armor Network
Security Policy would allow (whitelist) incoming requests originated by the selected testers’ IP ranges to
reach the test-network backend and deny them (blacklist) access to the game backends.
One of the secure web applications in your GCP project is currently only serving users in North America. All of
the application’s resources are currently hosted in a single GCP region. The application uses a large catalog of
graphical assets from a Cloud Storage bucket. You are notified that the application now needs to serve global
clients without adding any additional GCP regions or Compute Engine instances. What should you do?
A. Configure Cloud CDN.
B. Configure a TCP proxy.
C. Configure a network load balancer.
D. Configure dynamic routing for the subnet hosting the application.
Rationale
A is CORRECT because Cloud CDN will front (cache) static content from a Cloud Storage bucket and
move the graphical resources closest to the users.
B and C are not correct because Cloud CDN requires an HTTP(S) proxy.
D is not correct because dynamic routing will not help serve additional web clients.
You have implemented an HTTP(S) load balancer to balance requests across Compute Engine virtual machine
instances. During peak times, your backend instances cannot handle the number of requests per second (RPS),
which causes some requests to be dropped. Following Google-recommended practices, you want to efficiently
scale the instances to avoid this scenario in the future. What should you do?
A. Use unmanaged instance groups, and upgrade the instance machine type to use a higher-performing CPU.
B. Use unmanaged instance groups, and double the number of instances you need at off-peak times.
C. Use managed instance groups, and turn on autoscaling based on the average CPU utilization of your instances.
D. Use managed instance groups, turn on autoscaling for HTTP(S) load balancing usage (RPS), and set target load balancing usage as a percentage of the serving rate.
Rationale
A is not correct because the stated limitation is not the result of CPU utilization, and this method is inefficient.
B is not correct because doubling the number of instances is inefficient.
C is not correct because the stated limitation is on requests per second, not CPU utilization.
D is CORRECT because the autoscaling method leverages the load balancer and efficiently scales the
instances.
You learned in Chapter 5 how Google Cloud load balancing comprises an ecosystem of products and services.
Load balancing alone includes nine different types of load balancers, and most of them are available in the two
network service tiers, that is, premium and standard.
While load balancing focuses on the performance and reliability aspects of your workloads, there are other
important factors you need to consider when designing the network architecture of your workloads.
In this chapter, our focus will shift toward security. I already mentioned it once, but you should also have started to notice how security and networking are two sides of the same coin: there is no well-architected workload that is designed without addressing networking and security together.
In this chapter, you will learn how to configure three advanced network services, that is, Cloud DNS, Cloud NAT,
and Packet Mirroring policies.
These three advanced network services nicely supplement the capabilities offered by the GCP load balancers and, when properly used, will reinforce the security posture of your workloads.
DNS is a hierarchical distributed database that lets you store IP addresses and other data and look them up by
name. Cloud DNS lets you publish your zones and records in DNS without the burden of managing your own DNS
servers and software.
Cloud DNS offers both public zones and private managed DNS zones.
A public zone hosts DNS records that are visible to the Internet, whereas a private zone hosts DNS records that are visible only inside your organization, that is, from one or more VPC networks that you specify and, optionally, from your organization's data centers connected to those networks with VLAN attachments or IPsec tunnels.
Cloud DNS supports Identity and Access Management (IAM) permissions at the project level and individual DNS
zone level. This approach allows for separation of duties at the level that best suits your security requirements.
A managed zone is the container for all of your DNS records that share the same domain name, for example,
dariokart.com. Managed zones are automatically assigned a set of name servers when they are created to
handle responding to DNS queries for that zone. A managed zone has quotas for the number of resource records
that it can include.
To create a new managed zone, run the dns managed-zones create command with the --visibility
flag set to public:
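A sketch, using the book's dariokart.com example domain; the zone name and the description are placeholders of my own:
gcloud dns managed-zones create dariokart-public-zone \
    --dns-name="dariokart.com." \
    --description="Public zone for dariokart.com" \
    --visibility=public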
Note: Cloud DNS creates NS (Name Server) and SOA (Start of Authority) records for you automatically when you create the zone. Do not change the name of your zone's NS record, and do not change the list of name servers that Cloud DNS selects for your zone.
A private managed zone is a container of DNS records that is only visible from one or more VPC networks that
you specify.
To create a private zone, run the dns managed-zones create command with the --visibility flag
set to private:
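A sketch, where YOUR_VPC_NETWORK stands for the VPC network (or networks) that should be able to resolve the zone:
gcloud dns managed-zones create dariokart-private-zone \
    --dns-name="dariokart.com." \
    --description="Private zone for dariokart.com" \
    --visibility=private \
    --networks=YOUR_VPC_NETWORK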
A forwarding zone overrides normal DNS resolution of the specified zones. Instead, queries for the specified zones
are forwarded to the listed forwarding targets:
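A sketch, assuming an illustrative corp.dariokart.com subdomain and on-premises name servers in the documentation-reserved 203.0.113.0/24 range:
gcloud dns managed-zones create onprem-forwarding-zone \
    --dns-name="corp.dariokart.com." \
    --description="Forward corp.dariokart.com queries to on-premises name servers" \
    --visibility=private \
    --networks=YOUR_VPC_NETWORK \
    --forwarding-targets=203.0.113.5,203.0.113.6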
When two networks are peered, they do not automatically share DNS information. With DNS peering, you can
have one network (consumer network) forward DNS requests to another network (producer network). You can do
this by creating a peering zone in the consumer network that forwards matching DNS requests to the producer
network.
Exam tip: VPC network peering is not the same as DNS peering. VPC network peering allows VMs in multiple projects (even in different organizations) to reach each other, but it does not change name resolution. Resources in each VPC network still follow their own resolution order.
In contrast, through DNS peering, you can allow requests to be forwarded for specific zones to another VPC
network. This lets you forward requests to different Google Cloud environments, regardless of whether the VPC
networks are connected.
VPC network peering and DNS peering are also set up differently. For VPC network peering, both VPC networks
need to set up a peering relationship to the other VPC network. The peering is then automatically bidirectional.
DNS peering unidirectionally forwards DNS requests and does not require a bidirectional relationship between
VPC networks. A VPC network referred to as the DNS consumer network performs lookups for a Cloud DNS
peering zone in another VPC network, which is referred to as the DNS producer network. Users with the IAM
permission dns.networks.targetWithPeeringZone on the producer network’s project can establish
DNS peering between consumer and producer networks. To set up DNS peering from a consumer VPC network,
you require the DNS peer role for the producer VPC network’s host project. We will discuss DNS peering in detail
shortly, but if you can’t wait to see how this works, have a look at Figure 6-3.
Managing Records
Managing DNS records with the Cloud DNS API involves sending change requests to the API. This section describes how to make changes, consisting of additions and deletions to or from your resource record sets collection. It also describes how to send the desired changes to the API using the import, export, and transaction commands.
Before learning how to perform an operation on a DNS resource record, let’s review the list of resource record
types. Figure 6-1 displays the complete list.
Figure 6-1 Resource record types
You add or remove DNS records in a resource record set by creating and executing a transaction that specifies the
operations you want to perform. A transaction is a group of one or more record changes that should be propagated
altogether and atomically, that is, either all or nothing in the event the transaction fails. The entire transaction
either succeeds or fails, so your data is never left in an intermediate state.
You start a transaction using the gcloud dns record-sets transaction start command as
follows:
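A minimal sketch, where ZONE_NAME is a placeholder for your managed zone:
gcloud dns record-sets transaction start --zone=ZONE_NAME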
To add a record to a transaction, you use the transaction add command as follows:
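A sketch, assuming you are adding an A record for www.dariokart.com; the rrdata (taken from a documentation-reserved range), the TTL, and the names are illustrative:
gcloud dns record-sets transaction add "192.0.2.91" \
    --name="www.dariokart.com." \
    --ttl=300 \
    --type=A \
    --zone=ZONE_NAME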
To remove a record as part of a transaction, you use the remove command as follows:
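A sketch; note that the arguments must match the existing record exactly, and that the staged changes are only propagated when you execute the transaction:
gcloud dns record-sets transaction remove "192.0.2.91" \
    --name="www.dariokart.com." \
    --ttl=300 \
    --type=A \
    --zone=ZONE_NAME
# When all changes have been staged, propagate them atomically
gcloud dns record-sets transaction execute --zone=ZONE_NAME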
To replace an existing record, issue the remove command followed by the add command.
Note: You can also edit transaction.yaml in a text editor to manually specify additions, deletions, or corrections to DNS records. To view the contents of transaction.yaml, run
gcloud dns record-sets transaction describe
You can use the import and export commands to copy record sets into and out of a managed zone. The formats you can import from and export to are either BIND zone file format or YAML records format:
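A sketch of an import from a BIND-formatted zone file (the file name is illustrative):
gcloud dns record-sets import dariokart.com.zone \
    --zone=ZONE_NAME \
    --zone-file-format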
To export a record set, use the dns record-sets export command. To specify that the record sets are
exported into a BIND zone–formatted file, use the --zone-file-format flag. For example:
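A sketch, exporting the zone's record sets into a BIND-formatted file (again, the file name is illustrative):
gcloud dns record-sets export dariokart.com.zone \
    --zone=ZONE_NAME \
    --zone-file-format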
Exam tip: If you omit the --zone-file-format flag, the gcloud dns record-sets export command exports the record set into a YAML-formatted records file.
For example, the command gcloud dns record-sets export records.yaml --zone=ZONE_NAME exports the record sets of ZONE_NAME into a YAML-formatted records file.
To display the current DNS records for your zone, use the gcloud dns record-sets list command:
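A minimal sketch:
gcloud dns record-sets list --zone=ZONE_NAME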
The command outputs the resource record sets for the first 100 records (default). You can specify additional parameters, such as --limit to change the maximum number of records returned and --name to restrict the output to records with a given DNS name.
Cloud DNS supports the migration of an existing DNS domain from another DNS provider to Cloud DNS. This
procedure describes how to complete the necessary steps.
To create a zone, run the gcloud dns managed-zones create command you learned in the previous
section:
To export your zone file, see your provider's documentation. Cloud DNS supports the import of zone files in BIND
or YAML records format.
For example, in AWS Route 53, which does not support export, you can use the open source cli53 tool.
After you have exported the file from your DNS provider, you can use the gcloud dns record-sets
import command to import it into your newly created managed zone.
Remember that the addition of the flag --zone-file-format tells Google Cloud that the input record set is in
BIND format. As you already learned in the previous section, if you omit this flag Google Cloud expects the input
file to be in YAML format instead.
Warning: If your import file contains NS or SOA records for the apex of the zone, they will conflict with the preexisting Cloud DNS records. To use the preexisting Cloud DNS records (recommended), ensure that you remove the NS or SOA records from your import file. There are use cases for overriding this behavior, but they go beyond the scope of the exam.
To import the record sets correctly, you must first remove the NS and SOA records for the zone apex from the exported file.
To monitor and verify that the Cloud DNS name servers have picked up your changes, you can use the Linux
watch and dig commands.
First, look up your zone's Cloud DNS name servers using the gcloud dns managed-zones describe
command:
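A minimal sketch, which produces output similar to the following:
gcloud dns managed-zones describe ZONE_NAME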
nameServers:
- ns-cloud-a1.googledomains.com.
- ns-cloud-a2.googledomains.com.
- ns-cloud-a3.googledomains.com.
- ns-cloud-a4.googledomains.com.
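You can then verify that one of those name servers is serving your records, for example, with the Linux watch and dig commands; ZONE_NAME_SERVER is a placeholder, and dariokart.com is the book's example domain:
watch dig dariokart.com @ZONE_NAME_SERVER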
Replace ZONE_NAME_SERVER with one of the name servers returned when you ran the previous command.
Sign in to your registrar provider and change the authoritative name servers to point to the name servers that you
saw in step 1. At the same time, make a note of the time to live (TTL) that your registrar has set on the records.
That tells you how long you have to wait before the new name servers begin to be used.
To get the authoritative name servers for your domain on the Internet, run the following Linux commands:
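A sketch, again using the book's example domain:
dig +short NS dariokart.com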
If the output shows that all changes have propagated, you’re done. If not, you can check intermittently, or you
can automatically run the command every two seconds while you wait for the name servers to change. To do that,
run the following:
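A sketch; the -n 2 interval matches the two-second cadence mentioned previously:
watch -n 2 dig +short NS dariokart.com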
The Domain Name System Security Extensions (DNSSEC) is a feature of the Domain Name System (DNS) that
authenticates responses to domain name lookups. It does not provide privacy protections for those lookups, but
prevents attackers from manipulating or poisoning the responses to DNS requests.
There are three places where you must enable and configure DNSSEC for it to protect domains from spoofing
and poisoning attacks:
1. The DNS zone for your domain must serve special DNSSEC records for public keys (DNSKEY), signatures (RRSIG), and nonexistence (NSEC, or NSEC3 and NSEC3PARAM) to authenticate your zone's contents. Cloud DNS manages this automatically if you enable DNSSEC for a zone.
2. The top-level domain (TLD) registry (for example.com, this would be .com) must have a DS (Delegation Signer) record that authenticates a DNSKEY record in your zone. Do this by activating DNSSEC at your domain registrar.
3. For full DNSSEC protection, you must use a DNS resolver that validates signatures for DNSSEC-signed domains. You can enable validation for individual systems or your local caching resolvers if you administer your network's DNS services. You can also configure systems to use public resolvers that validate DNSSEC, notably Google Public DNS and Verisign Public DNS.
The second point limits the domain names where DNSSEC can work. Both the registrar and registry must support
DNSSEC for the TLD that you are using. If you cannot add a DS record through your domain registrar to activate
DNSSEC, enabling DNSSEC in Cloud DNS has no effect.
To check whether DNSSEC is supported for your domain, consult the following resources:
The DNSSEC documentation for both your domain registrar and TLD registry
The Google Cloud community tutorial's domain registrar–specific instructions
The ICANN (Internet Corporation for Assigned Names and Numbers) list of domain registrar DNSSEC
support to confirm DNSSEC support for your domain
If the TLD registry supports DNSSEC, but your registrar does not (or does not support it for that TLD), you might
be able to transfer your domains to a different registrar that does. After you have completed that process, you can
activate DNSSEC for the domain.
Each VPC network provides DNS name resolution services to the VMs that use it. When a VM uses its metadata
server 169.254.169.254 as its name server, Google Cloud searches for DNS records according to the name
resolution order.
By default, a VPC network's name resolution services—through its name resolution order—are only available to
that VPC network itself. You can create an inbound server policy in your VPC network to make these name
resolution services available to an on-premises network that is connected using Cloud VPN or Cloud Interconnect.
When you create an inbound server policy, Cloud DNS takes an internal IP address from the primary IP address
range of each subnet that your VPC network uses. For example, if you have a VPC network that contains two
subnets in the same region and a third subnet in a different region, a total of three IP addresses are reserved for
inbound forwarding. Cloud DNS uses these internal IP addresses as entry points for inbound DNS requests.
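A sketch of an inbound server policy, where the policy name, description, and network are placeholders of my own:
gcloud dns policies create inbound-forwarding-policy \
    --description="Expose Cloud DNS name resolution to on-premises clients" \
    --networks=YOUR_VPC_NETWORK \
    --enable-inbound-forwarding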
You can change the name resolution order by creating an outbound server policy that specifies a list of alternative
name servers. When you specify alternative name servers for a VPC network, those servers are the only name
servers that Google Cloud queries when handling DNS requests from VMs in your VPC network that are
configured to use their metadata servers (169.254.169.254).
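A sketch of an outbound server policy, assuming an alternative name server in the documentation-reserved 203.0.113.0/24 range:
gcloud dns policies create outbound-forwarding-policy \
    --description="Forward all DNS queries to alternative name servers" \
    --networks=YOUR_VPC_NETWORK \
    --alternative-name-servers=203.0.113.53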
Note: A DNS policy that enables outbound DNS forwarding disables resolution of Compute Engine internal DNS and Cloud DNS managed private zones. An outbound server policy is one of two methods for outbound DNS forwarding.
In a hybrid or a multi-cloud environment, DNS records for private (RFC 1918) resources often need to be resolved
across environments.
Traditionally, on-premises DNS records are manually administered by using an authoritative DNS server, such as
BIND in UNIX/Linux environments or Active Directory in Microsoft Windows environments. In contrast, Google
Cloud DNS records are administered by fully managed DNS services like Cloud DNS.
Either way, a strategy on how to forward private DNS requests between environments is needed to make sure that
services can be effectively and efficiently addressed from both on-premises environments and within Google
Cloud.
As a Google Cloud Professional Cloud Network Engineer, you need to understand your business and technical
requirements so that you can determine where an authoritative service for all domain resolution takes place.
Does it make sense to have an authoritative service for all domain resolution on-premises, in Google
Cloud, or both?
Let’s discuss these three approaches and learn when either one of the three is better suited than the other.
Approach 1: Keep Your Existing On-Premises DNS Server As Authoritative
The easiest way is to continue using your existing on-premises DNS server to authoritatively host all internal domain names. In that case, you can use an alternative name server to forward all requests from Google Cloud through outbound DNS forwarding.
Approach 2: Migrate to Cloud DNS As the Authoritative DNS Service
Another approach is to migrate to Cloud DNS as the authoritative service for all domain resolution. You can then use private zones and inbound DNS forwarding to migrate your existing on-premises name resolution to Cloud DNS.
Approach 3 (Recommended): Use a Hybrid Approach with Two Authoritative DNS Systems
Google Cloud recommends using a hybrid approach with two authoritative DNS systems. In this approach:
Authoritative DNS resolution for your private Google Cloud environment is done by Cloud DNS.
Authoritative DNS resolution for on-premises resources is hosted by existing DNS servers on-premises.
Split-horizon DNS can provide a mechanism for security and privacy management by logical or physical
separation of DNS resolution for internal network access (RFC 1918) and access from an unsecure, public network
(e.g., the Internet).
Cloud DNS can be used as the authoritative name server to resolve your domains on the Internet through public
DNS zones and use private DNS zones to perform internal DNS resolution for your private GCP networks.
One common use case for split-horizon DNS is when a server has both a private IP address on a local area network
(not reachable from most of the Internet) and a public address, that is, an address reachable across the Internet in
general.
By using split-horizon DNS, the same name can lead to either the private IP address or the public one, depending
on which client sends the query. This allows for critical local client machines to access a server directly through
the local network, without the need to pass through a router. Passing through fewer network devices has the
twofold benefit of reducing the network latency and freeing up limited router bandwidth for traffic that requires the
Internet, for example, access to external or cloud-resident resources.
DNS Peering
In large Google Cloud environments, Shared VPC is a very scalable network design that lets an organization
connect resources from multiple projects to a common Virtual Private Cloud (VPC) network, so that they can
communicate with each other securely and efficiently using internal IPs. Typically shared by many application
teams, a central team (or platform team) often manages the Shared VPC’s networking configuration, while
application teams use the network resources to create applications in their own service projects.
In some cases, application teams want to manage their own DNS records (e.g., to create new DNS records to
expose services, update existing records, etc.). There’s a solution to support fine-grained IAM policies using Cloud
DNS peering. In this section, we will explore how to use it to give your application teams autonomy over their
DNS records while ensuring that the central networking team maintains fine-grained control over the entire
environment.
Imagine that you, as an application team (service project) owner, want to be able to manage your own application
(service project) DNS records without impacting other teams or applications. DNS peering is a type of zone in Cloud DNS that allows you to send DNS requests for a specific subdomain (e.g., c.dariokart.com) to another Cloud DNS zone configured in another VPC.
The example in Figure 6-3 shows a developer in project-c who needs to resolve a hostname with
subdomain (suffix) p.dariokart.com. The project has its own private zone, which contains DNS records for
domain names with suffix c.dariokart.com.
With DNS peering, you can create a Cloud DNS private peering zone and configure it to perform DNS lookups in
a VPC network where the records for that zone’s namespace are available.
The VPC network where the DNS private peering zone performs lookups is called the DNS producer network, as
indicated in Figure 6-3. The project that owns the producer network is called the producer project, referred to as
project-p in the figure.
The VPC network where DNS queries originate is called the DNS consumer network. The project that owns the
consumer network is called the consumer project, referred to as project-c in the same figure.
Figure 6-3 shows you how to create a DNS peering zone with the gcloud CLI.
First, as indicated in step a, the service account associated to the consumer network (vpc-c) must be granted the
roles/dns.peer role in the producer project, that is, project-p.
Next, as indicated in step b, the same service account must be granted the roles/dns.admin role in the
consumer project, that is, project-c.
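Figure 6-3 shows the exact commands; a sketch of steps a and b, where SERVICE_ACCOUNT_EMAIL stands for the consumer network's service account:
# Step a: allow the consumer service account to target the producer network
gcloud projects add-iam-policy-binding project-p \
    --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
    --role="roles/dns.peer"
# Step b: allow the same service account to administer DNS in the consumer project
gcloud projects add-iam-policy-binding project-c \
    --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
    --role="roles/dns.admin"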
Finally, you create a new managed private peering zone by running the gcloud dns managed-zones create command, as indicated in Figure 6-3.
When the setup is completed, any DNS query to resolve a hostname with suffix p.dariokart.com, for
example, leaderboard.p.dariokart.com, is sent to the DNS private zone in the producer VPC, as shown
in step d.
Exam tip: You may wonder how a DNS private peering zone setup is any different from any other DNS private zone. After all, the gcloud dns managed-zones create command shows no indication that the new zone uses DNS peering. The answer is "hidden" in step a. By granting the roles/dns.peer IAM role to the consumer service account, we are basically giving this principal access to target networks with DNS peering zones. In fact, the only permission included in this IAM role is dns.networks.targetWithPeeringZone. Put differently, principals with the IAM permission dns.networks.targetWithPeeringZone on the producer network's project can establish DNS peering between consumer and producer networks.
Cloud DNS peering is not to be confused with VPC peering, and it doesn’t require you to configure any
communication between the source and destination VPC. All the DNS flows are managed directly in the Cloud
DNS backend: each VPC talks to Cloud DNS, and Cloud DNS can redirect the queries from one VPC to the other.
So, how does DNS peering allow application teams to manage their own DNS records?
The answer is by using DNS peering between a Shared VPC and other Cloud DNS private zones that are managed
by the application teams. Figure 6-4 illustrates this setup.
For each application team that needs to manage its own DNS records, you provide them with their own standalone VPC network and a dedicated Cloud DNS private zone for their subdomain.
You can then configure DNS peering for the specific DNS subdomain to their dedicated Cloud DNS zone; you just learned how to configure DNS peering with the steps (a-b-c-d) illustrated in Figure 6-3.
In the application team’s standalone VPC (dns-t1-vpc), they have Cloud DNS IAM permissions only on their
own Cloud DNS instance and can manage only their DNS records.
A central team, meanwhile, manages the DNS peering and decides which Cloud DNS instance is authoritative for
which subdomain, thus allowing application teams to only manage their own subdomain.
By default, all VMs that consume the Shared VPC use Cloud DNS in the Shared VPC as their local resolver.
This Cloud DNS instance answers for all DNS records in the Shared VPC by using DNS peering to the application
teams’ Cloud DNS instances or by forwarding to on-premises for on-premises records.
To summarize, shared-vpc acts as the DNS consumer network and dns-t1-vpc and dns-t2-vpc act as
the DNS producer networks.
Cloud DNS logging is disabled by default on each Google Cloud VPC network. By enabling monitoring of Cloud
DNS logs, you can increase visibility into the DNS names requested by the clients within your VPC network.
Cloud DNS logs can be monitored for anomalous domain names and evaluated against threat intelligence.
You should make sure that Cloud DNS logging is enabled for all your Virtual Private Cloud (VPC) networks using
DNS policies. Cloud DNS logging records queries that the name servers resolve for your Google Cloud VPC
networks, as well as queries from external entities directly to a public DNS zone. Recorded queries can come from
virtual machine (VM) instances, GKE containers running in the same VPC network, peering zones, or other
Google Cloud resources provisioned within your VPC.
To determine whether Cloud DNS logging is enabled for a VPC network, first determine the name of the DNS policy associated with your VPC:
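One way to do this is to list the DNS policies in the project and identify the one whose networks include your VPC; the --format flag here simply trims the output to policy names:
gcloud dns policies list --format="value(name)"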
The command output should return the name of the associated DNS policy:
your-shared-vpc-dns-policy
Then, run the gcloud dns policies describe command as follows:
gcloud dns policies describe your-shared-vpc-dns-policy \
    --format="value(enableLogging)"
The command output should return the status of the Cloud DNS logging feature (True for enabled, False for
disabled).
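If logging is disabled, you can turn it on with an update. A sketch, assuming the policy applies to a VPC network named your-shared-vpc; restating the networks with --networks is my assumption about how the update is typically invoked:
gcloud dns policies update your-shared-vpc-dns-policy \
    --enable-logging \
    --networks=your-shared-vpc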
You learned in the previous chapter that every VPC network has an “implied allow egress rule” firewall rule,
which permits outgoing connections (the other implied firewall rule blocks incoming connections). This firewall
rule alone is not enough for your VMs (or other compute resource instance types) to reach the Internet.
Wouldn't it be nice for your internal VMs to reach the Internet without requiring an external IP address?
That’s where Cloud NAT comes into play. Cloud NAT is a distributed, software-defined managed service, which
lets certain compute resources without external IP addresses create outbound connections to the Internet.
Architecture
Cloud NAT is not based on proxy VMs or network appliances. Rather, it configures the Andromeda software-
defined network that powers your VPC, so that it provides Source Network Address Translation (SNAT) for VMs
without external IP addresses. Cloud NAT also provides Destination Network Address Translation (DNAT) for
established inbound response packets. Figure 6-5 shows a comparison between traditional NAT proxies and
Google Cloud NAT.
Figure 6-5 Traditional NAT vs. Cloud NAT
With Cloud NAT, you achieve a number of benefits, when compared to a traditional NAT proxy. As you can see,
these benefits match the five pillars of the well-architected framework.
First and foremost, with Cloud NAT you achieve better security because your internal VMs (or other compute
resource instance types) are not directly exposed to potential security threats originating from the Internet, thereby
minimizing the attack surface of your workloads.
You also get higher availability because Cloud NAT is fully managed by Google Cloud. All you need to do is to
configure a NAT gateway on a Cloud Router, which provides the control plane for NAT, holding configuration
parameters that you specify.
Finally, you also achieve better performance and scalability because Cloud NAT can be configured to
automatically scale the number of NAT IP addresses that it uses, and it does not reduce the network bandwidth per
VM.
You can specify which subnets are allowed to use the Cloud NAT instance by selecting exactly one of these
flags:
--nat-all-subnet-ip-ranges, which allows all IP ranges of all subnets in the region, including
primary and secondary ranges, to use the Cloud NAT instance
--nat-custom-subnet-ip-ranges=SUBNETWORK[:RANGE_NAME],[…], which lets you specify a
list of the subnet’s primary and secondary IP ranges allowed to use the Cloud NAT instance
SUBNETWORK: Specifying a subnetwork name includes only the primary subnet range of the subnetwork.
SUBNETWORK:RANGE_NAME: Specifying a subnetwork and secondary range name includes only that
secondary range. It does not include the primary range of the subnet.
--nat-primary-subnet-ip-ranges, which allows only primary IP ranges of all subnets in the
region to use the Cloud NAT instance
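As a sketch (the gateway, router, and subnet names are illustrative), a Cloud NAT gateway limited to specific subnet ranges might be created like this:
gcloud compute routers nats create nat-config \
    --router=nat-router \
    --region=us-east1 \
    --auto-allocate-nat-external-ips \
    --nat-custom-subnet-ip-ranges=subnet-1,subnet-2:secondary-range-1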
When you create a Cloud NAT gateway, you can choose to have the gateway automatically allocate regional
external IP addresses. Alternatively, you can manually assign a fixed number of regional external IP addresses to
the gateway.
You can configure the number of source ports that each Cloud NAT gateway reserves to each VM for which it
should provide NAT services. You can also configure static port allocation, where the same number of ports is
reserved for each VM, or dynamic port allocation, where the number of reserved ports can vary between the
minimum and maximum limits that you specify.
For example, in Figure 6-5 the VM with internal (RFC 1918) IP address IP3 always gets ports in the range 32,000–32,063;
the VM with IP4 always gets ports in the range 32,101–32,164; and the VM with IP5 always gets ports in the
range 32,300–32,363.
The VMs for which NAT should be provided are determined by the subnet IP address ranges that the gateway is
configured to serve.
Exam tipEach NAT IP address on a Cloud NAT gateway offers 64,512 TCP source ports and 64,512 UDP source
ports. TCP and UDP each support 65,536 ports per IP address, but Cloud NAT doesn’t use the first 1024 well-
known (privileged) ports.
Static Port Allocation
When you configure static port allocation, you specify a minimum number of ports per VM instance.
Because all VMs are allocated the same number of ports, static port allocation works best if all VMs have similar
Internet usage. If some VMs use more ports than others, the ports in the Cloud NAT gateway might be underused.
If Internet usage varies, consider configuring dynamic port allocation.
When you configure dynamic port allocation, you specify a minimum number of ports per VM instance and a
maximum number of ports per VM instance.
The NAT gateway automatically monitors each VM's port usage and “elastically” modifies the number of ports
allocated to each VM based on demand. You don't need to monitor the port usage or adjust the NAT gateway
configuration.
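For illustration (the gateway and router names are hypothetical), static and dynamic port allocation are typically configured with the following flags:
# Static port allocation: every VM gets the same number of ports.
gcloud compute routers nats update nat-config \
    --router=nat-router \
    --region=us-east1 \
    --min-ports-per-vm=128

# Dynamic port allocation: ports scale between the minimum and maximum per VM.
gcloud compute routers nats update nat-config \
    --router=nat-router \
    --region=us-east1 \
    --enable-dynamic-port-allocation \
    --min-ports-per-vm=64 \
    --max-ports-per-vm=1024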
Customizing Timeouts
Cloud NAT uses predefined timeout settings based on the connection type.
A connection is a unique 5-tuple: the NAT source IP address and source port combined with the destination IP
address, destination port, and protocol.
Use the gcloud compute routers nats create command to create a NAT gateway with custom
timeout settings:
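A sketch of such a command, with illustrative names and timeout values:
gcloud compute routers nats create nat-config \
    --router=nat-router \
    --region=us-east1 \
    --nat-all-subnet-ip-ranges \
    --auto-allocate-nat-external-ips \
    --udp-idle-timeout=60s \
    --icmp-idle-timeout=60s \
    --tcp-established-idle-timeout=1200s \
    --tcp-transitory-idle-timeout=30s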
Cloud NAT logging allows you to log NAT connections and errors.
When you enable Cloud NAT logging, a single log entry can be generated when a connection that uses NAT is
created (a translation event) and when a packet is dropped because no NAT port was available (an error event).
You can choose to log both kinds of events or only one or the other.
All logs are sent to Cloud Logging.
NoteDropped packets are logged only if they are egress (outbound) TCP and UDP packets. No dropped incoming
packets are logged. For example, if an inbound response to an outbound request is dropped for any reason, no
error is logged.
Enabling Cloud NAT Logging
To enable logging for an existing Cloud NAT instance, including address translation events and errors, use the --
enable-logging flag as follows:
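A sketch, assuming an existing gateway named nat-config on the router nat-router:
gcloud compute routers nats update nat-config \
    --router=nat-router \
    --region=us-east1 \
    --enable-logging \
    --log-filter=ALL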
To view NAT logs in JSON format and limit the output to ten entries:
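One way to do this (a sketch; the filter shown is one way to select NAT flow logs) is with gcloud logging read:
gcloud logging read 'resource.type="nat_gateway"' \
    --limit=10 \
    --format=json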
{
insertId: "1the8juf6vab1t"
jsonPayload: {
connection: {
src_ip: "10.0.0.1"
src_port: 45047
nat_ip: "203.0.113.17"
nat_port: 34889
dest_ip: "198.51.100.142"
dest_port: 80
protocol: "tcp"
}
allocation_status: "OK"
gateway_identifiers: {
gateway_name: "my-nat-1"
router_name: "my-router-1"
region: "europe-west1"
}
endpoint: {
project_id: "service-project-1"
vm_name: "vm-1"
region: "europe-west1"
zone: "europe-west1-b"
}
vpc: {
project_id: "host-project"
vpc_name: "network-1"
subnetwork_name: "subnetwork-1"
}
destination: {
geo_location: {
continent: "Europe"
country: "France"
region: "Nouvelle-Aquitaine"
city: "Bordeaux"
}
}
}
logName: "projects/host-project/logs/compute.googleapis.com%2Fnat_flows"
receiveTimestamp: "2018-06-28T10:46:08.123456789Z"
resource: {
labels: {
region: "europe-west1-d"
project_id: "host-project"
router_id: "987654321123456"
gateway_name: "my-nat-1"
}
type: "nat_gateway"
}
labels: {
nat.googleapis.com/instance_name: "vm-1"
nat.googleapis.com/instance_zone: "europe-west1-b"
nat.googleapis.com/nat_ip: "203.0.113.17"
nat.googleapis.com/network_name: "network-1"
nat.googleapis.com/router_name: "my-router-1"
nat.googleapis.com/subnetwork_name: "subnetwork-1"
}
timestamp: "2018-06-28T10:46:00.602240572Z"
}
Monitoring
Cloud NAT exposes key metrics to Cloud Monitoring that give you insights into your fleet's usage of NAT
gateways.
Metrics are sent automatically to Cloud Monitoring. There, you can create custom dashboards, set up alerts, and
query the metrics.
The following are the required Identity and Access Management (IAM) roles:
For Shared VPC users with VMs and NAT gateways defined in different projects, access to the VM level
metrics requires the roles/monitoring.viewer IAM role for the project of each VM.
For the NAT gateway resource, access to the gateway metrics requires the
roles/monitoring.viewer IAM role for the project that contains the gateway.
Cloud NAT provides a set of predefined dashboards that display activity across your gateway:
Open connections
Egress data processed by NAT (rate)
Ingress data processed by NAT (rate)
Port usage
NAT allocation errors
Dropped sent packet rate
Dropped received packet rate
You can also create custom dashboards and metrics-based alerting policies.
You also learned in the “Creating a Cloud NAT Instance” section how network administrators can create Cloud
NAT configurations and specify which subnets can use the gateway. By default, there are no limits to what subnets
the administrator creates or which of them can use a Cloud NAT configuration.
Configuring Network Packet Inspection
Network packet inspection is an advanced network monitoring capability that clones the traffic of specified VMs
in your VPC network and forwards it for examination.
Network packet inspection uses a technique called Packet Mirroring to capture all traffic and packet data,
including payloads and headers. The capture can be configured for both egress and ingress traffic, only ingress
traffic, or only egress traffic.
The mirroring happens on the virtual machine (VM) instances, not on the network. Consequently, Packet
Mirroring consumes additional bandwidth on the VMs.
Packet Mirroring is useful when you need to monitor and analyze your security status. Unlike VPC Flow Logs,
Packet Mirroring exports all traffic, not only the traffic between sampling periods. For example, you can use
security software that analyzes mirrored traffic to detect all threats or anomalies. Additionally, you can inspect the
full traffic flow to detect application performance issues.
To configure Packet Mirroring, you create and enable a packet mirroring policy that specifies the mirrored sources
and the collector destination of the traffic you need to monitor:
Mirrored sources are the VMs whose packets (ingress, egress, or both) need to be inspected. These can be
selected by specifying a source type, that is, any combination of the following: subnets, network tags, or VM
names.
Collector destination is an instance group which is configured as an internal TCP/UDP network load balancer
(type 8) backend. VMs in the instance group are referred to as collector instances.
An internal load balancer for Packet Mirroring is similar to other type 8 load balancers, except that the forwarding
rule must be configured for Packet Mirroring using the --is-mirroring-collector flag. Any nonmirrored
traffic that is sent to the load balancer is dropped.
Exam tipYou need to know a few constraints on packet mirroring policies. For a given packet mirroring policy:
(1) All mirrored sources must be in the same project, VPC network, and Google Cloud region. (2) Collector
instances must be in the same region as the mirrored sources’ region. (3) Only a single collector destination can
be used.
As you can see, there are a number of preliminary steps you need to complete in order to create a packet mirroring
policy. These include
1. Permissions: For Shared VPC topologies, you must have the compute.packetMirroringUser role
in the project where the collector instances are created and the compute.packetMirroringAdmin role in
the project where the mirrored instances are created.
2.
Collector instances: You must create an instance group, which will act as the destination of your mirrored
traffic.
3.
Internal TCP/UDP network load balancer: You must create a type 8 load balancer, configured to use the
collector instances as backends.
4.
Firewall rules: Mirrored traffic must be allowed to go from the mirrored source instances to the collector
instances, which are the backends of the internal TCP/UDP network load balancer.
Upon completion of the four preliminary steps, you can create a packet mirroring policy using the command
gcloud compute packet-mirrorings create as explained in the following:
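A minimal sketch (the policy, network, subnet, and forwarding rule names are illustrative):
gcloud compute packet-mirrorings create my-mirroring-policy \
    --region=us-central1 \
    --network=my-vpc \
    --collector-ilb=my-collector-forwarding-rule \
    --mirrored-subnets=subnet-mirrored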
In the next section, you will learn some of the most relevant reference topologies you can use for your workloads’
network packet inspection requirements.
Packet mirroring can be configured in a number of ways, based on where you want the mirrored sources and
collector instances to be.
This is the simplest configuration: the mirrored sources and the collector instances are colocated in the same VPC
network, so a single project owns both. Figure 6-7 illustrates this topology.
Figure 6-7 Packet mirroring policy with source and collector in the same VPC
In Figure 6-7, the packet mirroring policy is configured to mirror subnet-mirrored and send mirrored traffic
to the internal TCP/UDP network load balancer configured in subnet-collector. Google Cloud mirrors the
traffic on existing and future VMs in the subnet-mirrored. This includes all traffic to and from the Internet,
on-premises hosts, and Google services.
In this reference topology (Figure 6-8), two packet mirroring policies are required because mirrored sources exist
in different VPC networks:
1.
policy-1 collects packets from VMs in subnet-mirrored-1 in the VPC1 network, owned by
Project1, and sends them to a collector internal load balancer in the subnet-collector subnet.
2.
policy-2 collects packets from VMs in subnet-mirrored-2 in the VPC2 network, owned by
Project2, and sends them to the collector internal load balancer in the subnet-collector subnet in
VPC1.
Figure 6-8 Packet mirroring policy with source and collector in peered VPCs
The two VPCs VPC1 and VPC2 are peered. All resources are located in the same region us-central1, which
complies with the second constraint you just learned in the previous exam tip.
The packet mirroring policy policy-1 is similar to the policy in Figure 6-7 in that policy-1 is configured to
collect traffic from subnet-mirrored-1 and send it to the forwarding rule of the internal load balancer in
subnet-collector—mirrored sources and collector instances are all in the same VPC.
However, this is not the case for policy-2 because this policy is configured with mirrored sources (all VMs in
subnet-mirrored-2) and collector instances (the backend VMs of the load balancer) in different VPC
networks.
As a result, policy-2 can be created by the owners of Project1 or the owners of Project2 under one
of the following conditions:
The owners of Project1 must have the compute.packetMirroringAdmin role on the network,
subnet, or instances to mirror in Project2.
The owners of Project2 must have the compute.packetMirroringUser role in Project1.
Collector Instances Located in Shared VPC Service Project
In the scenario illustrated in Figure 6-9, the collector instances (the internal TCP/UDP network load balancer’s
backend VMs) are in a service project that uses subnet-collector in the host project.
NoteThe collector instances are in a service project, which means they are billed to the billing account associated
to the service project, even though they consume subnet-collector in the host project. This is how Shared
VPC works.
In this reference topology, the packet mirroring policy has also been created in the service project and is
configured to mirror ingress and egress traffic for all VMs that have a network interface in subnet-mirrored.
Exam tipIn this topology, service or host project users can create the packet mirroring policy. To do so, users must
have the compute.packetMirroringUser role in the service project where the collector destination is
located. Users must also have the compute.packetMirroringAdmin role on the mirrored sources.
In the scenario illustrated in Figure 6-10, we moved the collector instances to the host project.
Figure 6-10 Packet mirroring policy with collector in the host project
This reference topology is a perfect use case of a Shared VPC for what it was intended to do. You learned in
Chapter 2 that the idea of a Shared VPC is all about separation of duties, by letting developers manage their own
workloads in their own service project without worrying about network setups, and network engineers manage the
network infrastructure in the host project. Packet inspection is a network concern. As a result, it makes sense to let
network engineers own the collector instances (i.e., the backend VMs along with all the internal TCP/UDP
network load balancer resources) and the packet mirroring policies in the host project.
Exam tipIn this topology, service or host project users can create the packet mirroring policy. To do so, users in
the service project must have the compute.packetMirroringUser role in the host project. This is because
the collector instances are created in the host project. Alternatively, users in the host project require the
compute.packetMirroringAdmin role for mirrored sources in the service projects.
Exam tipIf you need to mirror more than one network interface (NIC) of a multi-NIC VM, you must create one
packet mirroring policy for each NIC. This is because each NIC connects to a unique VPC network.
Capturing Relevant Traffic Using Packet Mirroring Source and Traffic Filters
You learned in the “Configuring Packet Mirroring” section how to create a packet mirroring policy with the
gcloud compute packet-mirrorings create command.
To limit the amount of packets that need to be inspected, it is always a good practice to leverage the filtering
flags, which we describe again for your convenience:
--filter-cidr-ranges: One or more IP CIDR ranges to mirror. You can provide multiple ranges in a
comma-separated list.
--filter-protocols: One or more IP protocols to mirror. Valid values are tcp, udp, icmp, esp,
ah, ipip, sctp, or an IANA protocol number. You can provide multiple protocols in a comma-separated list. If
the --filter-protocols flag is omitted, all protocols are mirrored.
--filter-direction: The direction of the traffic to mirror: ingress, egress, or both (the default).
A proper use of the filters will also save you money in egress cost.
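For example (the names, tag, and ranges are illustrative), a policy that mirrors only ingress TCP and UDP traffic from RFC 1918 sources might look like this:
gcloud compute packet-mirrorings create my-filtered-policy \
    --region=us-central1 \
    --network=my-vpc \
    --collector-ilb=my-collector-forwarding-rule \
    --mirrored-tags=web \
    --filter-cidr-ranges=10.0.0.0/8 \
    --filter-protocols=tcp,udp \
    --filter-direction=ingress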
Routing and Inspecting Inter-VPC Traffic Using Multi-NIC VMs (e.g., Next-Generation
Firewall Appliances)
A common use case is to inspect bidirectional traffic between two VPC networks by leveraging a group of network
virtual appliances, that is, multi-NIC VMs.
Exam tipA network interface card (NIC) can be connected to one, and only one, VPC network.
In this use case, the multi-NIC VMs are configured as backend instances in a managed instance group. These
multi-NIC VMs can be commercial solutions from third parties or solutions that you build yourself.
The managed instance group (MIG) is added to the backend service of an internal TCP/UDP network load
balancer (referenced by the regional internal forwarding rule ilb-a). See Figure 6-11.
Since each backend VM has two NICs, and since each NIC maps one to one to exactly one VPC, the same group
of VMs can be used by the backend service of another internal TCP/UDP network load balancer (referenced in
Figure 6-11 by the regional internal forwarding rule ilb-b).
In the VPC network called vpc-a, the internal network TCP/UDP load balancer referenced by the regional
internal forwarding rule ilb-a distributes traffic to the nic0 network interface of each VM in the backend MIG.
Likewise, in the VPC network called vpc-b, the second internal network TCP/UDP load balancer referenced
by the regional internal forwarding rule ilb-b distributes traffic to a different network interface, nic1.
As you can see, this is another way to let two VPCs exchange traffic with each other. This is achieved by
leveraging two custom static routes, whose next hop is the forwarding rule of their internal TCP/UDP network
load balancer and whose destination is the CIDR block of the subnet in the other VPC.
Exam tipThe multi-NIC VMs must be allowed to send and receive packets with nonmatching destination or
source IP addresses. This can be accomplished by using the --can-ip-forward flag in the gcloud
compute instances create command:
https://cloud.google.com/sdk/gcloud/reference/compute/instances/create#--can-ip-forward.
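For example (the VM, subnet, and zone names are illustrative):
gcloud compute instances create my-appliance-vm \
    --zone=us-central1-a \
    --can-ip-forward \
    --network-interface=subnet=subnet-a \
    --network-interface=subnet=subnet-b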
Configuring an Internal Load Balancer As a Next Hop for Highly Available Multi-NIC VM
Routing
You learned in the “Network Load Balancers” section in Chapter 5 that an internal TCP/UDP network load
balancer (type 8) is a regional, software-defined, pass-through (layer 4) load balancer that enables you to run and
scale your services behind an internal (RFC 1918) IP address.
With a group of multi-NIC VMs identically configured like in an instance group, you can create an internal
TCP/UDP network load balancer in each VPC a VM’s NIC is attached to—as mentioned in the exam tip, a NIC
can be attached to one, and one only, VPC. Then, you can create in each VPC a custom static route with your load
balancer as the next hop and the CIDR of another VPC as destination.
For example, if all your backend VMs have two NICs each—say nic0 and nic1—you can create two internal
TCP/UDP network load balancers, one in the first VPC (attached to nic0) and the other in the second VPC
(attached to nic1).
This configuration supports two main use cases:
1. Use case 1: To implement a highly available NAT, as explained in Chapter 3 (see Figure 3-94).
2. Use case 2: To integrate third-party appliances in a highly available, scaled-out manner. These can act as
gateways or as firewalls with advanced packet inspection capabilities (e.g., Intrusion Prevention Systems—
IPS), which can help you improve your workloads’ security posture.
In this section, we’ll walk you through the high-level process to configure two internal TCP/UDP network load
balancers to address use case 2.
This can be achieved by building a solution that sends bidirectional traffic through two load balancers that use
the same group of multi-NIC VMs as backends, as illustrated in Figure 6-12. Let’s get started.
This reference topology uses two custom-mode VPC networks named vpc-a and vpc-b, each with one subnet.
Each backend VM has two network interfaces, one attached to each VPC network (nic0 attached to VPC vpc-a,
nic1 attached to VPC vpc-b).
The subnets, subnet-a and subnet-b, use the 10.13.1.0/24 and 10.15.1.0/24 primary IP address
ranges, respectively, and they both reside in the us-central1 region.
Exam tipIn this reference topology, the subnets attached to each NIC must all share the same region, for example,
us-central1, because VMs are zonal resources.
fw-allow-vpc-a-from-both: An ingress rule, applicable to all targets in the vpc-a network. This rule
allows traffic from sources in both the 10.13.1.0/24 and 10.15.1.0/24 IP address ranges. These two
ranges cover the primary internal IP addresses of VMs in both subnets.
fw-allow-vpc-b-from-both: An ingress rule, applicable to all targets in the vpc-b network. This rule
allows traffic from sources in both the 10.13.1.0/24 and 10.15.1.0/24 IP address ranges. These two
ranges cover the primary internal IP addresses of VMs in both subnets.
fw-allow-vpc-a-ssh: An ingress rule applied to the VM instances in the vpc-a VPC network. This rule
allows incoming SSH connectivity on TCP port 22 from any address.
fw-allow-vpc-b-ssh: An ingress rule applied to the VM instances in the vpc-b VPC network. This rule
allows incoming SSH connectivity on TCP port 22 from any address.
fw-allow-vpc-a-health-check: An ingress rule for the backend VMs that are being load balanced.
This rule allows traffic from the Google Cloud health checking ranges (130.211.0.0/22 and 35.191.0.0/16).
fw-allow-vpc-b-health-check: An ingress rule for the backend VMs that are being load balanced.
This rule allows traffic from the Google Cloud health checking ranges (130.211.0.0/22 and 35.191.0.0/16).
This reference topology uses an instance template, which is a resource necessary to create a managed instance
group in us-central1. The instance template uses the iptables software as a third-party virtual appliance,
which enables the multi-NIC configuration.
In this order:
1. Create a startup script that will install the iptables software on any backend VM. This startup script, saved to
a local file, will be passed to the gcloud compute instance-templates create command using the
--metadata-from-file flag in the next step:
#!/bin/bash
# Enable IP forwarding:
echo 1 > /proc/sys/net/ipv4/ip_forward
echo "net.ipv4.ip_forward=1" > /etc/sysctl.d/20-iptables.conf
# Read VM network configuration:
md_vm="http://metadata.google.internal/computeMetadata/v1/instance/"
md_net="$md_vm/network-interfaces"
nic0_gw="$(curl $md_net/0/gateway -H "Metadata-Flavor:Google" )"
nic0_mask="$(curl $md_net/0/subnetmask -H "Metadata-Flavor:Google")"
nic0_addr="$(curl $md_net/0/ip -H "Metadata-Flavor:Google")"
nic0_id="$(ip addr show | grep $nic0_addr | awk '{print $NF}')"
nic1_gw="$(curl $md_net/1/gateway -H "Metadata-Flavor:Google")"
nic1_mask="$(curl $md_net/1/subnetmask -H "Metadata-Flavor:Google")"
nic1_addr="$(curl $md_net/1/ip -H "Metadata-Flavor:Google")"
nic1_id="$(ip addr show | grep $nic1_addr | awk '{print $NF}')"
# Source based policy routing for nic1
echo "100 rt-nic1" >> /etc/iproute2/rt_tables
sudo ip rule add pri 32000 from $nic1_gw/$nic1_mask table rt-nic1
sleep 1
sudo ip route add 35.191.0.0/16 via $nic1_gw dev $nic1_id table rt-nic1
sudo ip route add 130.211.0.0/22 via $nic1_gw dev $nic1_id table rt-nic1
# Use a web server to pass the health check for this example.
# You should use a more complete test in production.
sudo apt-get update
sudo apt-get install apache2 -y
sudo a2ensite default-ssl
sudo a2enmod ssl
echo "Example web page to pass health check" | \
tee /var/www/html/index.html
sudo systemctl restart apache2
2. Create a common instance template named third-party-template-multinic, which will be used to create
new VMs in both the vpc-a and vpc-b VPC networks when an autoscaling event is triggered:
Exam tip The --can-ip-forward flag is required for the instance template creation. This setting lets each
backend VM forward packets with any source IP in the vpc-a and vpc-b VPCs, not just the ones whose source
matches one of the VM’s NICs.
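A sketch of this step; the machine type, image, and the startup script file name (config.sh) are illustrative:
gcloud compute instance-templates create third-party-template-multinic \
    --region=us-central1 \
    --machine-type=e2-medium \
    --image-family=debian-12 \
    --image-project=debian-cloud \
    --can-ip-forward \
    --metadata-from-file=startup-script=config.sh \
    --network-interface=subnet=subnet-a \
    --network-interface=subnet=subnet-b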
3. Create a common managed instance group named third-party-instance-group that will also be used by
two backend services, one in the vpc-a and the other one in the vpc-b VPC networks:
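A sketch, assuming a fixed size of three appliances:
gcloud compute instance-groups managed create third-party-instance-group \
    --region=us-central1 \
    --template=third-party-template-multinic \
    --size=3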
You learned the components of a regional internal TCP/UDP network load balancer (type 8) in Chapter 5
(Figure 5-51).
In this reference topology, there are two type 8 load balancers, one in the vpc-a VPC and another one in the
vpc-b VPC.
1.
Create a new HTTP health check named hc-http-80 to test TCP connectivity to the VMs on port 80:
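A sketch of this step:
gcloud compute health-checks create http hc-http-80 \
    --region=us-central1 \
    --port=80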
2.
Use the previously created health check to create two internal backend services in the us-central1
region: one named backend-service-a in the vpc-a VPC, and the other one named backend-
service-b in the vpc-b VPC (see Figure 6-12).
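A sketch of the two backend services (the flags shown are illustrative):
gcloud compute backend-services create backend-service-a \
    --load-balancing-scheme=INTERNAL \
    --protocol=TCP \
    --network=vpc-a \
    --region=us-central1 \
    --health-checks=hc-http-80 \
    --health-checks-region=us-central1

gcloud compute backend-services create backend-service-b \
    --load-balancing-scheme=INTERNAL \
    --protocol=TCP \
    --network=vpc-b \
    --region=us-central1 \
    --health-checks=hc-http-80 \
    --health-checks-region=us-central1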
3. Add to each of the two backend services the managed instance group you created earlier (third-
party-instance-group), which contains the third-party virtual appliances, as a backend:
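A sketch of this step:
gcloud compute backend-services add-backend backend-service-a \
    --region=us-central1 \
    --instance-group=third-party-instance-group \
    --instance-group-region=us-central1

gcloud compute backend-services add-backend backend-service-b \
    --region=us-central1 \
    --instance-group=third-party-instance-group \
    --instance-group-region=us-central1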
4.
Create two regional, internal forwarding rules: one associated with the subnet-a and the other one
associated with the subnet-b. Connect each forwarding rule to its respective backend service, that is,
backend-service-a and backend-service-b:
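A sketch of the two forwarding rules (the flags shown are illustrative):
gcloud compute forwarding-rules create ilb-a \
    --load-balancing-scheme=INTERNAL \
    --ip-protocol=TCP \
    --ports=ALL \
    --network=vpc-a \
    --subnet=subnet-a \
    --region=us-central1 \
    --backend-service=backend-service-a \
    --backend-service-region=us-central1

gcloud compute forwarding-rules create ilb-b \
    --load-balancing-scheme=INTERNAL \
    --ip-protocol=TCP \
    --ports=ALL \
    --network=vpc-b \
    --subnet=subnet-b \
    --region=us-central1 \
    --backend-service=backend-service-b \
    --backend-service-region=us-central1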
Creating the Custom Static Routes That Define the Load Balancers As the Next Hops
This is the key configuration step that enables routing between the two VPCs, resulting in a fully integrated,
highly available solution with the third-party multi-NIC appliances:
NoteWith the optional --tags flag, one or more network tags can be added to the route to indicate that the route
applies only to the VMs with the specified tag. Omitting this flag tells Google Cloud that the custom static route
applies to all VMs in the specified VPC network, whose value is set using the --network flag. In this example,
the route applies to all VMs in each VPC. Remember, routes (just like firewall rules) are global resources that are
defined at the VPC level.
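A sketch of the two routes (the route names are illustrative; the destination ranges are the peer subnets defined earlier):
gcloud compute routes create route-to-vpc-b \
    --network=vpc-a \
    --destination-range=10.15.1.0/24 \
    --next-hop-ilb=ilb-a \
    --next-hop-ilb-region=us-central1

gcloud compute routes create route-to-vpc-a \
    --network=vpc-b \
    --destination-range=10.13.1.0/24 \
    --next-hop-ilb=ilb-b \
    --next-hop-ilb-region=us-central1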
This last step concluded the actual configuration of this reference topology. Let’s validate the setup.
Let’s now create a VM with the IP address 10.13.1.70 in the subnet-a (10.13.1.0/24). The creation of
the VM installs the Apache web server, which will serve incoming traffic on the TCP port 80:
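A sketch (the zone and startup script are illustrative):
gcloud compute instances create vm-a \
    --zone=us-central1-a \
    --network-interface=subnet=subnet-a,private-network-ip=10.13.1.70 \
    --metadata=startup-script='#!/bin/bash
      apt-get update && apt-get install -y apache2
      echo "vm-a" > /var/www/html/index.html'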
Similarly, let’s create another VM with the IP address 10.15.1.70 in the subnet-b (10.15.1.0/24). The
creation of the VM installs the Apache web server, which will serve incoming traffic on the TCP port 80:
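A sketch (the zone and startup script are illustrative):
gcloud compute instances create vm-b \
    --zone=us-central1-a \
    --network-interface=subnet=subnet-b,private-network-ip=10.15.1.70 \
    --metadata=startup-script='#!/bin/bash
      apt-get update && apt-get install -y apache2
      echo "vm-b" > /var/www/html/index.html'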
Before testing, let’s make sure both internal TCP/UDP network load balancers are healthy:
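One way to check (a sketch):
gcloud compute backend-services get-health backend-service-a \
    --region=us-central1

gcloud compute backend-services get-health backend-service-b \
    --region=us-central1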
You should see a message that confirms both load balancers are in a healthy status.
Connect to vm-a using the SSH protocol and try to “curl” to vm-b:
Connect to vm-b using the SSH protocol and try to “curl” to vm-a:
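For example (the zone is illustrative):
gcloud compute ssh vm-a --zone=us-central1-a
curl http://10.15.1.70

gcloud compute ssh vm-b --zone=us-central1-a
curl http://10.13.1.70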
Exam Questions
You are migrating to Cloud DNS and want to import your BIND zone file. Which command should you run?
A.
gcloud dns record-sets import ZONE_FILE --zone MANAGED_ZONE
B.
gcloud dns record-sets import ZONE_FILE --replace-origin-ns --zone
MANAGED_ZONE
C.
gcloud dns record-sets import ZONE_FILE --zone-file-format --zone
MANAGED_ZONE
D.
gcloud dns record-sets import ZONE_FILE --delete-all-existing --zone
MANAGED_ZONE
Rationale
A is not correct because the default behavior of the command is to expect ZONE_FILE in YAML format.
B is not correct because the --replace-origin-ns flag indicates that NS records for the origin of a
zone should be imported if defined, which is not what the question asked.
C is CORRECT because the --zone-file-format flag indicates that the input records file is in BIND
zone format. If omitted, the ZONE_FILE is expected in YAML format.
D is not correct because the --delete-all-existing flag indicates that all existing record sets should be
deleted before importing the record sets in the records file, which is not what the question asked.
You decide to set up Cloud NAT. After completing the configuration, you find that one of your instances is not
using the Cloud NAT for outbound NAT. What is the most likely cause of this behavior?
A.
The instance has been configured with multiple interfaces.
B. An external IP address has been configured on the instance.
C.
You have created static routes that use RFC 1918 ranges.
D.
The instance is accessible by a load balancer external IP address.
Rationale
A is not correct because the fact that the instance uses multiple NICs is not related to its inability to use
Cloud NAT for outbound NAT.
B is CORRECT because the existence of an external IP address on an interface always takes precedence
and always performs one-to-one NAT, without using Cloud NAT.
C is not correct because the custom static routes don’t use the default Internet gateway as the next hop.
D is not correct because the question asked to select the cause of the inability of the instance to use the Cloud
NAT for outbound traffic. However, this answer describes a scenario for inbound traffic.
Your organization uses a hub and spoke architecture with critical Compute Engine instances in your Virtual Private
Clouds (VPCs). You are responsible for the design of Cloud DNS in Google Cloud. You need to be able to resolve
Cloud DNS private zones from your on-premises data center and enable on-premises name resolution from your
hub and spoke VPC design. What should you do?
A.
Configure a private DNS zone in the hub VPC, and configure DNS forwarding to the on-premises server.
Then configure DNS peering from the spoke VPCs to the hub VPC.
B.
Configure a DNS policy in the hub VPC to allow inbound query forwarding from the spoke VPCs. Then
configure the spoke VPCs with a private zone, and set up DNS peering to the hub VPC.
C.
Configure a DNS policy in the spoke VPCs, and configure the on-premises DNS as an alternate DNS
server. Then configure the hub VPC with a private zone, and set up DNS peering to each of the spoke
VPCs.
D.
Configure a DNS policy in the hub VPC, and configure the on-premises DNS as an alternate DNS server.
Then configure the spoke VPCs with a private zone, and set up DNS peering to the hub VPC.
Rationale
A is not correct because this option does not allow GCP hostnames to be resolved from on-premises, which is one
of the two requirements.
B is CORRECT because both requirements are met. See
https://cloud.google.com/dns/docs/best-practices#hybrid-architecture-using-hub-vpc-network-connected-to-spoke-vpc-networks.
C and D are not correct because you don’t need to configure the on-premises DNS as an alternate DNS server to
meet the requirements.
It’s no secret that large enterprises are moving away from their corporate
data centers and investing in cloud adoption programs.
That’s why Google Cloud has developed a number of offerings that let your
company’s data centers (or your local development environment) connect to
Google Cloud in a variety of different ways, ranging from solutions that
prioritize performance, reliability, and reduced latencies to others that
prioritize cost savings and easy setups.
In this chapter, you will learn what these connectivity offerings are, how to
configure them, and most importantly how to choose which one(s) best suits
the requirements for your workload.
Cloud Interconnect connections are called circuits and deliver internal (RFC
1918) IP address communication, that is, internal IP addresses are directly
accessible from both networks.
Exam tipCloud Interconnect circuits do not traverse the Internet and, by
default, do not encrypt data in transit.
Cloud Interconnect comes in two “flavors”: Dedicated Interconnect and Partner Interconnect.
Prerequisites
How It Works
https://cloud.google.com/network-connectivity/docs/interconnect/concepts/dedicated-overview
VLAN Attachments
Billing for VLAN attachments starts when you create them and stops when
you delete them.
1. Ordering a Dedicated Interconnect connection: Submit an order,
specifying the details of your Interconnect connection. Google then
emails you an order confirmation. After your resources have been
allocated, you receive another email with your LOA-CFAs (Letter of
Authorization and Connecting Facility Assignment).
2. Retrieving LOA-CFAs: Send the LOA-CFAs to your vendor. They
provision the connections between the Google peering edge and your
on-premises network. Google automatically starts testing the light
levels on each allocated port after 24 hours.
3. Testing the connection: Google sends you automated emails with
configuration information for two different tests. First, Google sends
an IP address configuration to test light levels on every circuit in an
Interconnect connection. After those tests pass, Google sends the final
IP address configuration to test the IP connectivity of each circuit.
The process is illustrated in Figure 7-2, with emphasis on the gcloud CLI
commands you need to know for the exam.
Figure 7-2 Process to configure a Dedicated Interconnect connection
NoteThe link type that you select when you create an Interconnect
connection cannot be changed later. For example, if you select a 10 Gbps
link type and need a 100 Gbps link type later, you must create a new
Interconnect connection with the higher capacity.
To create a Dedicated Interconnect connection, use the gcloud compute
interconnects create command as follows:
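A sketch of what this command might look like (the interconnect name, location, and contact details are illustrative):
gcloud compute interconnects create my-interconnect \
    --customer-name="Example Corp" \
    --interconnect-type=DEDICATED \
    --link-type=LINK_TYPE_ETHERNET_10G_LR \
    --location=INTERCONNECT_LOCATION \
    --requested-link-count=1 \
    --noc-contact-email=noc@example.com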
All the automated emails are sent to the NOC (Network Operations Center)
technical contact and the email address of the Google Account used when
ordering the Interconnect connection. You can also get your LOA-CFAs by
using the Google Cloud console.
You can use the Interconnect connection only after your connections have
been provisioned and tested for light levels and IP connectivity.
Retrieving LOA-CFAs
After you order a Dedicated Interconnect connection, Google sends you and
the NOC (technical contact) an email with your Letter of Authorization and
Connecting Facility Assignment (LOA-CFA) (one PDF file per connection).
You must send these LOA-CFAs to your vendor so that they can install your
connections. If you don’t, your connections won’t get connected.
If you can’t find the LOA-CFAs in your email, retrieve them from the
Google Cloud console (in the Cloud Interconnect page, select Physical
connections). This is one of the very few operations that require the use of
the console. You can also respond to your order confirmation email for
additional assistance.
Google polls its edge device every 24 hours, checking for a light on the port
to your on-premises router. Receiving light indicates that your connection
has been established. After detecting this light, Google sends you an email
containing an IP address that Google uses to ping your on-premises router to
test the circuit.
You must configure the interface of your on-premises router with the correct
link-local IP address and configure LACP (Link Aggregation Control
Protocol) on that interface. Even though there is only one circuit in your
Interconnect connection, you must still use LACP.
Apply the test IP address that Google has sent you to the interface of your
on-premises router that connects to Google. For testing, you must configure
this interface in access mode with no VLAN tagging.
After a successful test, Google sends you an email notifying you that your
connection is ready to use.
If a test fails, Google automatically retests the connection once a day for a
week.
After all tests have passed, your Interconnect connection can carry traffic,
and Google starts billing it. However, your connection isn’t associated with
any Google Virtual Private Cloud (VPC) networks. The next step will show
you how to attach a VPC network to your Dedicated Interconnect
connection.
Creating a VLAN Attachment
VLAN attachments are a way to tell your Cloud Router which VPC network
is allowed to connect to your on-premises networks.
There are a few checks you need to complete before creating a VLAN
attachment in your Dedicated Interconnect connection.
First, your Dedicated Interconnect connection must have passed all tests and
must be ready to use. From a cost standpoint, billing for VLAN attachments
starts when you create them and stops when you delete them.
Second, you must have the following IAM permissions (or one of the roles listed after them):
compute.interconnectAttachments.create
compute.interconnectAttachments.get
compute.routers.create
compute.routers.get
compute.routers.update
roles/owner
roles/editor
roles/compute.networkAdmin
Third, you must have an existing Cloud Router in the VPC network and
region that you want to reach from your on-premises network—Cloud
Router is a regional resource. If you don’t have an existing Cloud Router,
you must create one.
Exam tipThe Cloud Router can use any private autonomous system number
(64512–65534 or 4200000000–4294967294) or the Google public ASN, that
is, 16550.
After all three checks are completed, you can create a VLAN attachment
using the following command:
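A sketch (the names and region are illustrative; the flags map to the options discussed next):
gcloud compute interconnects attachments dedicated create my-attachment \
    --interconnect=my-interconnect \
    --router=my-router \
    --region=REGION \
    --candidate-subnets=169.254.180.80/29 \
    --vlan=1000 \
    --bandwidth=400m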
If you don’t specify a region, you may be prompted to enter one. Again, this
is because a Cloud Router is a regional resource, and a VLAN attachment is
always associated to a Cloud Router.
Exam tipThe BGP IP CIDR blocks that you specify as values of the --
candidate-subnets flag must be unique among all Cloud Routers in
all regions of a VPC network.
The optional --bandwidth flag denotes the maximum provisioned
capacity of the VLAN attachment. In the example, its value is set to 400
Mbps. As of the time writing this book (April 2023), you can only choose
from the following discrete list: 50m (50 Mbps), 100m, 200m, 300m, 400m,
500m, 1g, 2g, 5g, 10g (default), 20g, 50g (50 Gbps). This ability to “tweak”
the capacity of a VLAN attachment is possible because the LACP protocol is
required, as we observed in the previous section.
The optional --vlan flag denotes the VLAN ID for this attachment and
must be an integer in the range 2–4094. You cannot specify a VLAN ID that
is already in use on the Interconnect connection. If your VLAN ID is in use,
you are asked to choose another one. If you don’t enter a VLAN ID, an
unused, random VLAN ID is automatically selected for the VLAN
attachment.
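After the attachment is created, Google Cloud allocates the link-local addresses for the BGP session. One way to retrieve them (a sketch; the region is a placeholder) is to describe the attachment:
gcloud compute interconnects attachments describe my-attachment \
    --region=REGION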
You will need these values to configure your Cloud Router and your on-
premises router. Here is an example of an output:
cloudRouterIpAddress: 169.254.180.81/29
creationTimestamp: '2022-03-13T10:31:40.829-07:00
customerRouterIpAddress: 169.254.180.82/29
id: '7'
interconnect: https://www.googleapis.com/compute/v
kind: compute#interconnectAttachment
name: my-attachment
operationalStatus: ACTIVE
privateInterconnectInfo:
tag8021q: 1000
region: https://www.googleapis.com/compute/v1/proj
router: https://www.googleapis.com/compute/v1/proj
Associate your newly created VLAN attachment to your Cloud Router
by adding an interface that connects to it. The interface IP address is
automatically configured using your attachment’s
cloudRouterIpAddress:
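A sketch (the router, attachment, and interface names are illustrative):
gcloud compute routers add-interface my-router \
    --region=REGION \
    --interface-name=my-attachment-if \
    --interconnect-attachment=my-attachment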
Last, associate a BGP peer to your Cloud Router by adding the customer
router to the newly added interface:
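A sketch (the peer name and ASN are illustrative):
gcloud compute routers add-bgp-peer my-router \
    --region=REGION \
    --interface=my-attachment-if \
    --peer-name=my-onprem-peer \
    --peer-asn=65001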
The first three flags, that is, --interface, --peer-asn, and --peer-
name, are mandatory. This makes sense because you are adding a peer
router to your Cloud Router’s interface, which is associated to your VLAN
attachment. As a result, you must provide at a minimum the interface name,
your peer ASN, and your peer name.
For the peer ASN, use the same number that you used to configure your on-
premises router. The peer IP address is automatically configured using your
attachment’s customerRouterIpAddress.
Exam tipBy default, any VPC network can use Cloud Interconnect. To
control which VPC networks can use Cloud Interconnect, you can set an
organization policy.
After you create a VLAN attachment, you need to configure your on-
premises router to establish a BGP session with your Cloud Router. To
configure your on-premises router, use the VLAN ID, interface IP address,
and peering IP address provided by the VLAN attachment. You can
optionally configure your BGP sessions to use MD5 authentication. If you
added MD5 authentication to the BGP session on Cloud Router, you must
use the same authentication key when you configure BGP on your on-
premises router.
The setup varies based on what topology you want to use:
Figure 7-3 illustrates the simplest setup at a logical level, using a layer 3
topology.
Figure 7-3 Reference Dedicated Interconnect layer 3 topology. Portions of this page are reproduced under the CC-BY license
There are scenarios where your data center is in a location that can’t
physically reach a Dedicated Interconnect colocation facility. You still want
to use the benefits of private, broadband connectivity, but you are limited by
the geography of your company’s data centers. That’s where Partner
Interconnect comes into play.
Prerequisites
1.
Supported service provider: You must select a supported service
provider to establish connectivity between their network and your on-
premises network. The list of supported service providers is available at
https://cloud.google.com/network-
connectivity/docs/interconnect/concepts/service-
providers#by-location.
2.
Cloud Router: You must have a Cloud Router in the region where your
selected service provider operates.
How It Works
You select a service provider from the previous list and establish
connectivity.
Next, you create a VLAN attachment in your Google Cloud project, but this
time you specify that your VLAN attachment is for a Partner Interconnect
connection. This action generates a unique pairing key that you use to
request a connection from your service provider. You also need to provide
other information such as the connection location and capacity.
https://cloud.google.com/network-connectivity/docs/interconnect/concepts/partner-overview
VLAN Attachments
1.
Establishing connectivity with a supported service provider: Select
a Google Cloud supported service provider from the list of supported
service providers, and establish connectivity between your on-
premises network(s) and the supported service provider.
2. Creating a VLAN attachment: Create a VLAN attachment for a
Partner Interconnect connection, which results in a pairing key that
you share with your service provider as previously discussed.
3.
Ordering a connection to Google Cloud from your service
provider: Go to your service provider portal and order a connection to
Google Cloud by submitting the pairing key and other connection
details, that is, the connection capacity and location. Wait until your
service provider configures your connection; they must check that they
can serve your requested capacity. After their configuration is
complete, you receive an email notification from Google.
4.
Activating your connection: After the service provider configures
your connection, you must activate it. Activating the connection and
checking its activation status enables you to verify that you established
connectivity with the expected service provider.
5.
Configuring on-premises routers: For layer 2 (data link layer)
connections, you must establish a BGP session between the VPC
network’s Cloud Router in your region and your on-premises router.
For layer 3 connections (network layer), the service provider
establishes a BGP session with the VPC network’s Cloud Router in
your region. This configuration is automated and doesn’t require any
action from you.
The process is illustrated in Figure 7-5, with emphasis on the gcloud CLI
commands you need to know for the exam.
Figure 7-5 Process to configure a Partner Interconnect connection
This is the first “link” in the connection chain. Your company may
already have connectivity with a supported service provider, in which case you
can move to the next step. Otherwise, you must establish connectivity with
your selected partner. This process may take a few weeks.
For Partner Interconnect, the only item in the checklist you must complete—
in addition to the necessary permissions and the connectivity to a selected
service provider—is to have an existing Cloud Router in the VPC network
and region that you want to reach from your on-premises network. If you
don’t have an existing Cloud Router, you must create one.
Exam tipUnlike Dedicated Interconnect, Partner Interconnect requires that
your Cloud Router uses the Google public ASN, that is, 16550
(www.whatismyip.com/asn/16550/).
It is best practice to utilize multiple VLAN attachments into your VPC
network to maximize throughput and increase cost savings. For each BGP
session, Google Cloud recommends using the same MED values to let the
traffic use equal-cost multipath (ECMP) routing over all the configured
VLAN attachments.
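A sketch of creating a Partner Interconnect VLAN attachment (the names and availability domain are illustrative); describing the new attachment returns output similar to the following:
gcloud compute interconnects attachments partner create my-attachment \
    --region=REGION \
    --router=my-router \
    --edge-availability-domain=availability-domain-1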
adminEnabled: false
edgeAvailabilityDomain: AVAILABILITY_DOMAIN_1
creationTimestamp: '2017-12-01T08:29:09.886-08:00
id: '7976913826166357434'
kind: compute#interconnectAttachment
labelFingerprint: 42WmSpB8rSM=
name: my-attachment
pairingKey: 7e51371e-72a3-40b5-b844-2e3efefaee59/
region: https://www.googleapis.com/compute/v1/proj
router: https://www.googleapis.com/compute/v1/proj
selfLink: https://www.googleapis.com/compute/v1/p
state: PENDING_PARTNER
type: PARTNER
Remember not to share the pairing key with anyone you don’t trust. The
pairing key is sensitive data.
Upon creating a VLAN attachment, you can contact the service provider of
your choice serving your region and provide the following information:
The pairing key resulting from the creation of the VLAN attachment
The VLAN attachment region
The VLAN attachment connection capacity
Since you want to minimize the latency between your Google Cloud
workloads and resources on-premises, the best practice is to choose the
region of a location that is close to your data center.
The capacity can range from 50 Mbps to 50 Gbps. Choose the capacity that
best suits your workloads’ performance and reliability requirements based on
your service provider’s offerings. Obviously, the higher the capacity, the
higher the price.
Upon ordering the connection, your service provider will start configuring
your VLAN attachment. When the configuration is complete, you get a
confirmation email from Google, and the state of your attachment changes
from PENDING_PARTNER to PENDING_CUSTOMER.
For layer 2 (data link) connections, you must add the ASN of your on-
premises router to your Cloud Router. Here is how you do it with the gcloud
CLI.
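First, describe the Cloud Router associated with the attachment to find the automatically created BGP peer (the router name comes from this example; the region is a placeholder):
gcloud compute routers describe partner \
    --region=REGION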
bgp:
advertiseMode: DEFAULT
asn: 16550
bgpPeers:
- interfaceName: auto-ia-if-my-attachment-c2c53a7
ipAddress: 169.254.67.201
managementType: MANAGED_BY_ATTACHMENT
name: auto-ia-bgp-my-attachment-c2c53a710bd6c2e
peerIpAddress: 169.254.67.202
creationTimestamp: '2018-01-25T07:14:43.068-08:00
description: 'test'
id: '4370996577373014668'
interfaces:
- ipRange: 169.254.67.201/29
linkedInterconnectAttachment: https://www.googl
managementType: MANAGED_BY_ATTACHMENT
name: auto-ia-if-my-attachment-c2c53a710bd6c2e
kind: compute#router
name: partner
network: https://www.googleapis.com/compute/v1/pr
region: https://www.googleapis.com/compute/v1/proj
selfLink: https://www.googleapis.com/compute/v1/p
The BGP peer name you need is the value of the name field under bgpPeers
(auto-ia-bgp-my-attachment-c2c53a710bd6c2e in this example).
Next, use this name and the on-premises router ASN to update your
Cloud Router as follows:
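A sketch (the on-premises ASN is illustrative; the region is a placeholder):
gcloud compute routers update-bgp-peer partner \
    --region=REGION \
    --peer-name=auto-ia-bgp-my-attachment-c2c53a710bd6c2e \
    --peer-asn=65001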
Figure 7-6 illustrates the simplest setup at a logical level, using a layer 3
topology.
Figure 7-6 Reference Partner Interconnect layer 3 topology. Portions of this page are reproduced under the CC-BY license and
shared from https://cloud.google.com/network-connectivity/docs/interconnect/how-to/partner/configuring-onprem-routers
There are scenarios where you want to connect your VPC networks in
Google Cloud to your on-premises data center—or your local area network
(LAN)—with a limited budget in mind. In other use cases, you may
determine that your hybrid workloads don’t need the bandwidth offered by
the two Interconnect products you just learned about, which leverage circuits
ranging from 10 Gbps to 100 Gbps. In some situations, you may even realize
that your workloads have higher tolerance for latency.
For all these use cases, Google Cloud offers Cloud VPN as an alternative or a
complementary option to the Interconnect family of products.
In the former scenario, you have a budget in mind, and you don’t want to
absorb the cost incurred by using an Interconnect product.
In the latter scenario, you want to use Cloud VPN as a way to supplement
the connectivity offered by Dedicated Interconnect or Partner Interconnect in
order to increase the resilience of your hybrid connectivity.
Cloud VPN comes in two flavors: HA VPN and Classic VPN. The former is the recommended choice from Google, due to its higher
reliability (99.99% SLA) and its adaptability to topology changes.
HA VPN is a type of Cloud VPN that always utilizes at least two IPsec
tunnels.
Two tunnels are required to provide high availability: in the event one
becomes unresponsive, you have the other available to carry traffic.
These two IPsec tunnels connect your VPC network to another network,
which can be on-premises, in Google Cloud, or even in another Cloud, for
example, AWS.
Exam tipThe two IPsec tunnels must originate from the same region.
Put differently, with HA VPN you cannot have a tunnel originating from an
HA VPN gateway with a network interface in us-east1 and another
tunnel originating from another network interface (associated to the same
HA VPN gateway) in us-central1.
When you delete the HA VPN gateway, Google Cloud releases the IP
addresses for reuse.
https://cloud.google.com/network-connectivity/docs/vpn/concepts/topologies
Figure 7-7 shows the simplest HA VPN topology, with one HA VPN
gateway equipped with two network interfaces—each associated with its
own regional external IP address.
The HA VPN gateway connects to one peer on-premises router, which has
one external IP address (i.e., one network card).
The HA VPN gateway uses two tunnels, which are connected to the single
external IP address on the peer router.
In the upcoming sections, we will walk you through the process to create an
HA VPN gateway with the configuration described in Figure 7-7.
The first step is to create the actual HA VPN gateway Google Cloud
resource:
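A sketch, using the names that appear in the output that follows:
gcloud compute vpn-gateways create ha-vpn-gw-a \
    --network=network-a \
    --region=us-central1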
An example output is
Created [https://www.googleapis.com/compute/v1/pr
NAME INTERFACE0 INTERFACE1 NETWO
ha-vpn-gw-a 203.0.113.16 203.0.113.23 netwo
As expected, the output shows the two interfaces, each associated to its own
regional (us-central1), external IPv4 address.
Since your peer gateway operates on-premises, you cannot create this exact
resource from Google Cloud. What Google Cloud allows you to do is to
create a resource referred to as external-vpn-gateway, which
provides information to Google Cloud about your peer VPN gateway. This
resource is essentially the representation of your physical (or software-
defined) peer gateway on-premises. The following command creates the
representation in Google Cloud of your peer, on-premises gateway, equipped
with one network interface as per our configuration in Figure 7-7:
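A sketch (PEER_GW_IP_0 is the public IPv4 address of your on-premises gateway):
gcloud compute external-vpn-gateways create peer-gw \
    --interfaces=0=PEER_GW_IP_0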
An example output is
Created [https://www.googleapis.com/compute/v1/pr
NAME INTERFACE0
peer-gw PEER_GW_IP_0
Next, you must create a cloud router for the region where your HA VPN
gateway operates, that is, us-central1:
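A sketch (GOOGLE_ASN is a placeholder, explained next):
gcloud compute routers create router-a \
    --region=us-central1 \
    --network=network-a \
    --asn=GOOGLE_ASN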
Substitute GOOGLE_ASN with the private ASN (64512–65534 or 4200000000–4294967294) that this Cloud Router will use for all of its BGP sessions.
Exam tipThe Google ASN is used for all BGP sessions on the same Cloud
Router, and it cannot be changed later.
An example output is
Created [https://www.googleapis.com/compute/v1/pr
NAME REGION NETWORK
router-a us-central1 network-a
When creating IPsec tunnels, specify the peer side of the IPsec tunnels as the
external VPN gateway that you created earlier.
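A sketch of creating the two tunnels (the shared secret is a placeholder):
gcloud compute vpn-tunnels create tunnel-a-to-on-prem-if-0 \
    --region=us-central1 \
    --vpn-gateway=ha-vpn-gw-a \
    --peer-external-gateway=peer-gw \
    --peer-external-gateway-interface=0 \
    --interface=0 \
    --ike-version=2 \
    --shared-secret=SHARED_SECRET \
    --router=router-a

gcloud compute vpn-tunnels create tunnel-a-to-on-prem-if-1 \
    --region=us-central1 \
    --vpn-gateway=ha-vpn-gw-a \
    --peer-external-gateway=peer-gw \
    --peer-external-gateway-interface=0 \
    --interface=1 \
    --ike-version=2 \
    --shared-secret=SHARED_SECRET \
    --router=router-a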
An example output is
Created [https://www.googleapis.com/compute/v1/pr
NAME REGION GATEWAY
tunnel-a-to-on-prem-if-0 us-central1 ha-vpn-gw
Created [https://www.googleapis.com/compute/v1/pr
NAME REGION GATEWAY
tunnel-a-to-on-prem-if-1 us-central1 ha-vpn-gw
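Next, add one Cloud Router interface per tunnel; a sketch using the placeholders described in the list that follows:
gcloud compute routers add-interface router-a \
    --region=us-central1 \
    --interface-name=ROUTER_INTERFACE_NAME_0 \
    --vpn-tunnel=TUNNEL_NAME_0 \
    --ip-address=IP_ADDRESS_0 \
    --mask-length=MASK_LENGTH

gcloud compute routers add-interface router-a \
    --region=us-central1 \
    --interface-name=ROUTER_INTERFACE_NAME_1 \
    --vpn-tunnel=TUNNEL_NAME_1 \
    --ip-address=IP_ADDRESS_1 \
    --mask-length=MASK_LENGTH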
ROUTER_INTERFACE_NAME_0 and
ROUTER_INTERFACE_NAME_1: A name for the Cloud Router BGP
interface; it can be helpful to use names related to the tunnel names
configured previously.
IP_ADDRESS_0 and IP_ADDRESS_1 (manual configuration): The
BGP IP address for the HA VPN gateway interface that you configure;
each tunnel uses a different gateway interface.
MASK_LENGTH: 30; each BGP session on the same Cloud Router
must use a unique /30 CIDR from the 169.254.0.0/16 block.
TUNNEL_NAME_0 and TUNNEL_NAME_1: The tunnel associated
with the HA VPN gateway interface that you configured.
AUTHENTICATION_KEY (optional): The secret key to use for MD5
authentication.
Substitute PEER_NAME_0 with a name for the peer VPN interface, and
substitute PEER_ASN with the ASN configured for your peer VPN gateway.
Substitute PEER_NAME_1 with a name for the peer VPN interface, and
substitute PEER_ASN with the ASN configured for your peer VPN gateway.
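A sketch of adding the two BGP peers (the peer BGP IP addresses shown match the example output that follows), and of describing the Cloud Router to verify the configuration:
gcloud compute routers add-bgp-peer router-a \
    --region=us-central1 \
    --peer-name=PEER_NAME_0 \
    --interface=ROUTER_INTERFACE_NAME_0 \
    --peer-ip-address=169.254.0.2 \
    --peer-asn=PEER_ASN

gcloud compute routers add-bgp-peer router-a \
    --region=us-central1 \
    --peer-name=PEER_NAME_1 \
    --interface=ROUTER_INTERFACE_NAME_1 \
    --peer-ip-address=169.254.1.2 \
    --peer-asn=PEER_ASN

gcloud compute routers describe router-a \
    --region=us-central1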
bgp:
advertiseMode: DEFAULT
asn: 65001
bgpPeers:
- interfaceName: if-tunnel-a-to-on-prem-if-0
ipAddress: 169.254.0.1
name: bgp-peer-tunnel-a-to-on-prem-if-0
peerAsn: 65002
peerIpAddress: 169.254.0.2
- interfaceName: if-tunnel-a-to-on-prem-if-1
ipAddress: 169.254.1.1
name: bgp-peer-tunnel-a-to-on-prem-if-1
peerAsn: 65004
peerIpAddress: 169.254.1.2
creationTimestamp: '2018-10-18T11:58:41.704-07:00
id: '4726715617198303502'
interfaces:
- ipRange: 169.254.0.1/30
linkedVpnTunnel: https://www.googleapis.com/com
name: if-tunnel-a-to-on-prem-if-0
- ipRange: 169.254.1.1/30
linkedVpnTunnel: https://www.googleapis.com/com
name: if-tunnel-a-to-on-prem-if-1
kind: compute#router
name: router-a
network: https://www.googleapis.com/compute/v1/
region: https://www.googleapis.com/compute/v1/p
selfLink: https://www.googleapis.com/compute/v1/
This completes the setup from Google Cloud. The final step is to configure
your peer VPN gateway on-premises. You will need assistance from your on-
premises network administrator to properly configure it and fully validate the
IPsec tunnels and their fault tolerance.
Dynamic routing (BGP) is still available for Classic VPN, but only for IPsec
tunnels that connect to third-party VPN gateway software running on Google
Cloud VM instances.
Table 7-2 shows a comparison between the two Cloud VPN types.
Table 7-2 HA VPN and Classic VPN comparison
Feature: SLA
HA VPN: Provides a 99.99% SLA when configured with two interfaces and two external IP addresses
Classic VPN: Provides a 99.9% SLA

Feature: Creation of external IP addresses and forwarding rules
HA VPN: External IP addresses are created from a pool; no forwarding rules are required
Classic VPN: External IP addresses and forwarding rules must be created

Feature: Two tunnels from one Cloud VPN gateway to the same peer gateway
HA VPN: Supported
Classic VPN: Not supported

Feature: API resources
HA VPN: Known as the vpn-gateway resource
Classic VPN: Known as the target-vpn-gateway resource

Feature: IPv6 traffic
HA VPN: Supported (dual-stack IPv4 and IPv6 configuration)
Classic VPN: Not supported
• Design requires primary and backup VPN
• Dynamic routing protocol support (e.g., BGP)
• Need to access multiple subnets or networks at the remote site, across the VPN
Policy-Based Routing
Route-Based Routing
You learned about Cloud Router in the “Dynamic Routes” section in Chapter
3.
Every time your workload requires dynamic routing capabilities, you need a
Cloud Router.
A Cloud Router also serves as the control plane for Cloud NAT. Cloud
Router provides BGP services for the following Google Cloud products:
Dedicated Interconnect
Partner Interconnect
HA VPN
Classic VPN with dynamic routing only
Router appliance (part of Network Connectivity Center)
As you learned in the previous section, you can also use Cloud Router to
connect two VPC networks in Google Cloud. In this scenario, you connect
the VPC networks by using two HA VPNs and two Cloud Routers, one HA
VPN and its associated Cloud Router on each network.
Exam tipDirect peering and carrier peering do not use Cloud Routers.
Border Gateway Protocol (BGP) Attributes (e.g., ASN, Route
Priority/MED, Link-Local Addresses)
You are not required to be an expert in BGP in order to pass the exam.
Cloud Router advertises VPC subnet routes and custom prefixes to its BGP
peers.
Unless you configure custom route advertisements, Cloud Router only
advertises VPC subnet routes. Custom route advertisements also allow you
to configure a Cloud Router to omit advertising VPC subnet routes.
The dynamic routing mode also controls how each Cloud Router applies
learned prefixes as custom dynamic routes in a VPC network.
The following sections describe the key attributes you need to know for the
exam.
If you are using Dedicated Interconnect, you are required to use either a
private ASN not already in use or the Google public ASN (i.e., 16550). A
private ASN ranges from 64512 to 65534 or from 4200000000 to
4294967294 inclusive.
If you are using Partner Interconnect, you must use the Google public ASN
to configure your Cloud Router.
Your on-premises ASN can be public or private.
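The base advertised route priority (MED) is configured per BGP peer. A sketch of how this might be set (ROUTER_NAME, BGP_PEER_NAME, REGION, and B are placeholders):
gcloud compute routers update-bgp-peer ROUTER_NAME \
    --region=REGION \
    --peer-name=BGP_PEER_NAME \
    --advertised-route-priority=B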
The preceding command updates the base priority for routes advertised by
ROUTER_NAME to its BGP peer BGP_PEER_NAME with the integer number
B.
Notice in Figure 7-9 how the two Cloud Routers router-a and router-
b advertise prefixes of subnet routes in their own region only, that is, us-
east1.
Figure 7-10 shows the effect of updating them with global BGP routing
mode.
Figure 7-10 MED for VPCs configured with global BGP routing mode
This time, the two cloud routers router-a and router-b advertise
prefixes of subnet routes in all regions of their respective VPC.
As a result, both route tables are populated with prefixes of subnet routes.
Exam tipThe inter-region cost is an integer number between 201 and 9999,
inclusive. It is defined by Google and is specific to the exact combination of
the two regions, that is, the region of the subnet whose prefix is being
advertised (e.g., us-east1) and the region of the BGP peer router (e.g.,
us-central1). This number may vary over time based on factors such as
network performance, latency, distance, and available bandwidth between
regions.
When your BGP peer routers receive the advertised prefixes and their
priorities, they create routes that are used to send packets to your VPC
network.
BGP sessions for the following network connectivity products use link-local IPv4 addresses in the 169.254.0.0/16 range as BGP peering IP addresses: Dedicated Interconnect, Partner Interconnect, HA VPN, and Classic VPN with dynamic routing. Router appliances, by contrast, use internal IPv4 addresses of Google Cloud VMs as BGP IP addresses.
IPv6 Support
Cloud Router can exchange IPv6 prefixes, but only over BGP IPv4 sessions.
Exam tipCloud Router does not support BGP IPv6 sessions natively.
For Cloud Router to be able to exchange IPv6 prefixes, the subnets must be
configured to operate in dual stack mode (by using the flag --stack-
type=IPV4_IPV6 in the gcloud compute networks subnets
create/update commands), and you must enable IPv6 prefix exchange
in an existing BGP IPv4 session by toggling the --enable-ipv6 flag in
the gcloud compute routers update-bgp-peer command.
External IPv6 subnet ranges are not advertised automatically, but you can
advertise them manually by using custom route advertisements.
You can enable IPv6 prefix exchange in BGP sessions that are created for
HA VPN tunnels.
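As a sketch of the two steps just described, assuming a hypothetical subnet my-subnet, router my-router, and BGP peer my-peer in us-east1 (the --ipv6-access-type flag is an assumption needed when switching an existing subnet to dual stack with external IPv6):

# 1. Switch the subnet to dual stack so it has IPv6 prefixes to exchange.
gcloud compute networks subnets update my-subnet \
    --region=us-east1 \
    --stack-type=IPV4_IPV6 \
    --ipv6-access-type=EXTERNAL

# 2. Enable IPv6 prefix exchange on the existing BGP IPv4 session.
gcloud compute routers update-bgp-peer my-router \
    --region=us-east1 \
    --peer-name=my-peer \
    --enable-ipv6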
You learned in Chapter 2 that the VPC network is the only required argument
when you create a Cloud Router.
With custom route advertisement, you control which prefixes and which
subnet routes a Cloud Router can advertise to its BGP peers.
This control can be defined on a Cloud Router basis (gcloud compute
routers create/update) or on a BGP-peer basis (gcloud
compute routers update-bgp-peer) and can be achieved by
setting the flag --advertisement-mode to CUSTOM when you use the
verbs create, update, or update-bgp-peer on a Cloud Router.
When you choose custom advertisement mode, you specify two things (a sketch follows this list):
1.
The list of prefixes you want your Cloud Router to advertise
2.
Whether you want all the subnet routes in your Cloud Router's VPC to be advertised
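A minimal sketch of custom route advertisement on a Cloud Router basis, assuming a hypothetical router my-router in us-east1 (the advertised range is illustrative):

# Advertise all subnet routes plus one custom prefix to every BGP peer of this router.
gcloud compute routers update my-router \
    --region=us-east1 \
    --advertisement-mode=CUSTOM \
    --set-advertisement-groups=ALL_SUBNETS \
    --set-advertisement-ranges=172.16.10.0/24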
Resilience
It is best practice to enable Bidirectional Forwarding Detection (BFD) on
your Cloud Routers and on your peer BGP routers (on-premises or in other
clouds) if they support this feature.
Enabling BFD on both sides will make your network more resilient.
You can enable BFD on your Cloud Router by setting the --bfd-
session-initialization-mode flag to ACTIVE in the gcloud
compute routers add-bgp-peer/update-bgp-peer
commands, as shown in the following code snippet:
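A representative form with placeholder names follows; the BFD timer and multiplier values are illustrative assumptions, not prescriptive settings:

# Enable BFD on an existing BGP peer; ACTIVE means the Cloud Router initiates BFD sessions.
gcloud compute routers update-bgp-peer ROUTER_NAME \
    --region=REGION \
    --peer-name=BGP_PEER_NAME \
    --bfd-session-initialization-mode=ACTIVE \
    --bfd-min-transmit-interval=1000 \
    --bfd-min-receive-interval=1000 \
    --bfd-multiplier=5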
Exam tipYou don’t need to know all the BFD settings to pass the exam.
What you need to know instead is that enabling BFD is one way to achieve
resilience, and the way you do it is by setting the
BFD_SESSION_INITIALIZATION_MODE to ACTIVE. To learn how to
configure BFD in detail, use the Google Cloud documentation:
https://cloud.google.com/network-
connectivity/docs/router/concepts/bfd#bfd-settings.
Reliability
For high reliability, set up redundant routers and BGP peers, even if your on-
premises device supports graceful restart. In the event of nontransient
failures, you are protected even if one path fails.
Last, to ensure that you do not exceed Cloud Router limits, use Cloud
Monitoring to create alerting policies. For example, you can use the metrics
for learned routes to create alerting policies for the limits for learned routes.
High Availability
If graceful restart is not supported or enabled on your device, configure two
on-premises BGP devices with one tunnel each to provide redundancy. If you
don’t configure two separate on-premises devices, Cloud VPN tunnel traffic
can be disrupted in the event of a Cloud Router or an on-premises BGP
device failure.
Security
Enable MD5 authentication on your BGP peers, if they support this feature.
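As a sketch, assuming your peer supports it, MD5 authentication can be configured on an existing BGP session with placeholder names like the following:

# Protect the BGP session with a shared MD5 secret (the key value is a placeholder).
gcloud compute routers update-bgp-peer ROUTER_NAME \
    --region=REGION \
    --peer-name=BGP_PEER_NAME \
    --md5-authentication-key=SECRET_KEY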
Exam Questions
You need to give each member of your network operations team least-
privilege access to create, modify, and delete Cloud Interconnect VLAN
attachments.
A.
Assign each user the editor role.
B.
Assign each user the compute.networkAdmin role.
C.
Give each user the following permissions only:
compute.interconnectAttachments.create,
compute.interconnectAttachments.get.
D.
Give each user the following permissions only:
compute.interconnectAttachments.create,
compute.interconnectAttachments.get,
compute.routers.create,
compute.routers.get,
compute.routers.update.
Rationale
A is incorrect because the editor role is too permissive. The editor role
contains permissions to create and delete resources for most Google
Cloud services.
B is CORRECT because it contains the minimum set of permissions
to create, modify, and delete Cloud Interconnect VLAN attachments.
You learned this in the “VLAN Attachments” section. The keyword
to consider in this question is the ability to delete VLAN attachments,
whose permission is included in the compute.networkAdmin role,
but is not included in the permissions in answer D.
C is incorrect because it doesn’t include the permission to delete VLAN
attachments.
D is incorrect because it doesn’t include the permission to delete VLAN
attachments.
A.
The default Internet gateway
B.
The IP address of the Cloud VPN gateway
C.
The name and region of the Cloud VPN tunnel
D.
The IP address of the instance on the remote side of the VPN tunnel
Rationale
You are in the early stages of planning a migration to Google Cloud. You
want to test the functionality of your hybrid cloud design before you start to
implement it in production. The design includes services running on a
Compute Engine virtual machine instance that need to communicate to on-
premises servers using private IP addresses. The on-premises servers have
connectivity to the Internet, but you have not yet established any Cloud
Interconnect connections. You want to choose the lowest cost method of
enabling connectivity between your instance and on-premises servers and
complete the test in 24 hours.
A.
Cloud VPN
B.
50 Mbps Partner VLAN attachment
C.
Dedicated Interconnect with a single VLAN attachment
D.
Dedicated Interconnect, but don’t provision any VLAN attachments
Rationale
A.
Log in to your partner’s portal and request the VLAN attachment
there.
B.
Ask your Interconnect partner to provision a physical connection to
Google.
C.
Create a Partner Interconnect–type VLAN attachment in the Google
Cloud console and retrieve the pairing key.
D.
Run gcloud compute interconnects attachments partner update <attachment> --region <region> --admin-enabled.
Rationale
A.
VPC network in all projects
B. VPC network in the IT project
C.
VPC network in the host project
D.
VPC network in the sales, marketing, and IT projects
Rationale
This is the last chapter of our study. You’ve come a long way from the
beginning of this book, where you learned the tenets of “well architecting”
a Google Cloud network. You then learned in Chapter 3 how to implement
Virtual Private Cloud (VPC) networks. The concept of a VPC as a logical
routing domain is the basis of every network architecture in the cloud. As a
natural progression, in Chapters 5 and 6 you learned how to leverage the
wide spectrum of network services, which uniquely differentiate Google
Cloud from other public cloud service providers. Last, you learned in
Chapter 7 how to implement hybrid topologies—which are prevalent in any
sector—along with all considerations related to resilience, fault tolerance,
security, and cost.
Now that you have all your network infrastructure set up and running,
what’s next?
Well, you (and your team of Google Cloud professional network engineers)
are in charge of maintaining this infrastructure to make sure it operates in
accordance with the SLOs (Service Level Objectives) for your workloads.
In this chapter, you will learn how to use the products and services offered
by Google Cloud to assist you in this compelling task.
In this section, you will learn about Cloud Logging. You will understand
what it does, when to use it, and how to leverage its built-in features to
collect and explore logs, specifically for networking components.
Cloud Logging
Cloud Logging includes built-in storage for logs called log buckets, a user
interface called the Logs Explorer, and an API to manage logs
programmatically (see Figure 8-3). Cloud Logging lets you read and write
log entries, query your logs with advanced filtering capabilities, and control
how and where you want to forward your logs for further analysis or for
compliance.
Exam tipBy locking a log bucket, you are preventing any updates on the
bucket. This includes the log bucket’s retention policy. As a result, you
can't delete the bucket until every log in the bucket has fulfilled the bucket's
retention period. Also, locking a log bucket is irreversible.
You can also route, or forward, log entries to the following destinations, which can be in the same Google Cloud project or in a different Google Cloud project: Cloud Logging log buckets, Cloud Storage buckets, BigQuery datasets, and Pub/Sub topics.
Log Types
Platform logs: These are logs written by the Google Cloud services you
use in your project. These logs can help you debug and troubleshoot
issues and help you better understand the Google Cloud services you’re
using. For example, VPC Flow Logs record a sample of network flows
sent from and received by VMs.
Component logs: These are similar to platform logs, but they are
generated by Google-provided software components that run on your
systems. For example, GKE provides software components that users
can run on their own VM or in their own data center. Logs are generated
from the user’s GKE instances and sent to a user’s cloud project. GKE
uses the logs or their metadata to provide user support.
Security logs: These logs help you answer “who did what, where, and
when” and are comprised of
Cloud Audit Logs, which provide information about administrative
activities and accesses within your Google Cloud resources. Enabling
audit logs helps your security, auditing, and compliance entities
monitor Google Cloud data and systems for possible vulnerabilities or
external data misuse, for example, data exfiltration.
Access Transparency logs, which provide you with logs of actions
taken by Google staff when accessing your Google Cloud content.
Access Transparency logs can help you track compliance with your
legal and regulatory requirements for your organization.
User-written logs: These are logs written by custom applications and
services. Typically, these logs are written to Cloud Logging by using one
of the following methods:
Ops agent or the Logging agent (based on the fluentd open source
data collector)
Cloud Logging API
Cloud Logging client libraries, for example, the gcloud CLI
Multi-cloud logs and hybrid cloud logs: These refer to logs from other
cloud service providers like Microsoft Azure, or AWS, and also logs
from your on-premises infrastructure.
Let’s see how you can read logs from your project.
In this example, I’ll create a Cloud Router (Figure 8-4), and I’ll show
you how to read logs from the _Default bucket (Figure 8-5).
Figure 8-5 Cloud Audit log entry (request) for router creation operation
As you can see, a Cloud Audit log was captured, which included detailed
information on who (gianni@dariokart.com) performed the action
(type.googleapis.com/compute.routers.insert) and when,
along with a wealth of useful metadata.
In the fourth rectangle, you can see the resulting Cloud Router resource
name.
The second page of the log entry (Figure 8-6) shows the response.
Figure 8-8 shows one of the two log entries (the last) for the delete
operation.
Figure 8-8 Cloud Audit log entry (request) for router deletion operation
Note in Figure 8-8 the use of the flag --freshness=t10m to retrieve the
latest log entries within the past ten minutes.
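A representative query of this kind, assuming a hypothetical project ID and an illustrative filter (not the exact one used in the figures), looks like the following:

# Read the latest Cloud Router audit log entries from the past ten minutes.
gcloud logging read \
    'resource.type="gce_router" AND protoPayload.methodName:"compute.routers"' \
    --project=PROJECT_ID \
    --freshness=10m \
    --limit=5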
Before moving to the next section, and for the sake of completeness, I want
to quickly show you an alternative—equally expressive—approach to
reading logs by using the provided user interface, referred to as the Logs
Explorer.
NoteMy preference, and general recommendation for the exam, is that you
get very familiar with the gcloud CLI, instead of the tools offered by the
console, like Logs Explorer. This is because the exam is focused on gcloud,
and there’s a reason behind it. User interfaces available in the console
change frequently, whereas the gcloud CLI and the Google Cloud REST
APIs don’t change as frequently as user interfaces. Also, gcloud code is a
natural path into Infrastructure as Code (e.g., Terraform), which is strongly
encouraged. Nevertheless, there are a few compelling use cases where the
console is required because the gcloud CLI doesn’t offer the expected
functionality.
With that being said, in Figure 8-9 I included a screenshot of my Log
Explorer session during my troubleshooting of the global external HTTP(S)
load balancer (classic) with NEGs you learned in Chapter 5.
Cloud Monitoring
Metrics are grouped by cloud services (e.g., GCP services, AWS services),
by agents, and by third-party applications. For a comprehensive list of
metrics, visit
https://cloud.google.com/monitoring/api/metrics.
Cloud Monitoring also allows you to create custom metrics based on your
workloads’ unique business and technical requirements.
Using the Metrics Explorer and the Monitoring Query Language (MQL), you can analyze your workload's metrics on the fly, discovering correlations, trends, and abnormal behavior.
You can leverage these insights to build an overall view of health and
performance of your workloads’ code and infrastructure, making it easy to
spot anomalies using Google Cloud visualization products.
This is all great information, but you cannot just sit and watch, right?
You need a more proactive approach for your SRE team, so that when an
abnormal behavior (or an incident) is detected, you can promptly respond to
it.
This is when Cloud Monitoring alerts come into play. With alerts, you can
create policies on performance metrics, uptime, and Service Level
Objectives (SLOs). These policies will ensure your SRE team and network
engineers are promptly notified and are ready to respond when your
workloads don’t perform as expected.
Figure 8-10 summarizes the Cloud Monitoring capabilities.
In the next section, we’ll use a simple example to show you how Cloud
Monitoring can notify your SRE team and help them respond to incidents.
As per the exam objectives for this chapter, we will choose a Google Cloud network resource (for the sake of simplicity, a route), set up a custom metric specific to this resource, and monitor it with an alerting policy.
Figure 8-11 A JSON file defining the filter for a custom metric
Finally, we will test the alerting policy by performing an action that triggers
an alert and notifies the selected recipients in the channel configured for the
alerting policy.
First, let’s create a custom metric that measures the number of times a route
was changed (created or deleted) in our shared VPC.
You can create a custom metric using the gcloud logging metrics
create command, which requires you to define your metric during the
creation process.
You can pass the definition of your custom metric by using the --log-
filter flag or the --config-from-file flag. The former accepts the
filter inline; the latter expects the path to a JSON or a YAML file such as
the one in Figure 8-11, which includes the metric definition. To learn how
this JSON or YAML file needs to be structured, visit
https://cloud.google.com/logging/docs/reference/v2
/rest/v2/projects.metrics#resource:-logmetric
Figure 8-11 shows our JSON file. Notice how the filter property is
expressed to tell the policy to raise an alert for resources whose type is a
gce_route and whenever a route is deleted or inserted:
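The JSON definition itself is shown in the figure. As a sketch, an equivalent metric can also be created inline with the --log-filter flag; the metric name, description, and exact filter below are illustrative assumptions:

# Create a log-based counter metric for route insert/delete operations.
gcloud logging metrics create route-change-count \
    --description="Counts route insert and delete operations" \
    --log-filter='resource.type="gce_route" AND (protoPayload.methodName:"routes.insert" OR protoPayload.methodName:"routes.delete")'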
Next, you need a notification channel and an alerting policy, which together define:
1.
To whom a notification should be sent
2.
When to raise an alert for this metric
Note that the gcloud command in Figure 8-13 is in the alpha testing phase.
NoteAlpha is a limited availability test before releases are cleared for more
widespread use. Google’s focus with alpha testing is to verify functionality
and gather feedback from a limited set of customers. Typically, alpha
participation is by invitation and subject to pre-general availability terms.
Alpha releases may not contain all features, no SLAs are provided, and
there are no technical support obligations. However, alphas are generally
suitable for use in test environments. Alpha precedes beta, which precedes
GA (general availability).
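For reference, a representative form of creating an alerting policy from a file with the alpha command follows; the file name is a placeholder, and the file must contain the policy definition (conditions, notification channels, and so on):

# Create an alerting policy from a JSON (or YAML) definition.
gcloud alpha monitoring policies create \
    --policy-from-file=alerting-policy.json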
The second item is addressed by defining a triggering condition when you
create the alerting policy.
Click the vertical ellipsis icon as indicated in Figure 8-14 with the
arrow, and select “Create alert from metric,” as indicated in Figure 8-15.
Figure 8-15 Creating an alert from metric
Last, scroll down, assign a name to the alerting policy, and click “Create
Policy,” as indicated in Figure 8-18.
In Figure 8-19, you can see our newly created alerting policy’s details.
Figure 8-19 A description of an alerting policy
Second, our policy has one condition, which is based on our custom metric.
The condition also uses a threshold, which tells you any count greater than
zero should trigger an alert.
This is formalized in the filter, the trigger, and the comparison
properties of the conditionThreshold object.
Now that you understand how to create a monitoring alerting policy and its
components, we can finally test it by deliberately creating an event that will
trigger an alert.
Our custom metric measures the number of times a route has changed
(created or deleted) in our shared VPC. The easiest way to test this policy is
by deleting an existing route. Let’s try!
In Figure 8-20, I first made sure that the route I wanted to delete existed in
our shared VPC. Upon confirming its existence, I used the gcloud
compute routes delete command to delete the custom static route
goto-restricted-apis.
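A representative form of these two steps is shown below; the --quiet flag (an assumption here) simply skips the confirmation prompt:

# Confirm the custom static route exists, then delete it to trigger the alert.
gcloud compute routes list --filter="name=goto-restricted-apis"
gcloud compute routes delete goto-restricted-apis --quiet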
Shortly after, I used the Cloud Monitoring dashboard to verify whether the
alerting policy detected a triggering event.
As you can see in Figure 8-21, the event was captured (as pointed to by
the arrow). The rectangle shows the condition threshold, as defined in our
alerting policy.
As a result, an incident was added (Figure 8-22), and an email alert was
sent to the recipient(s) of the notification channel, as shown in Figure 8-23.
Before moving to the next section, here’s a quick note about costs.
The good news is that there are no costs associated with using alerting
policies. For more information, visit
https://cloud.google.com/monitoring/alerts#limits.
This concludes our section about logging and monitoring network
components using the Cloud Operations suite. In the next section, you will
learn how to address security-related network operations.
A unique feature of Google Cloud firewall rules is that you can configure
the source and target of your firewall rules with a service account. This
alone is a powerful capability, because you can better segment sources and
targets of your network using an identity, instead of a network construct like
a CIDR block or a network tag.
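As a minimal sketch, assuming hypothetical service accounts and network names, an identity-based rule looks like the following:

# Allow only VMs running as web-sa to reach VMs running as db-sa on TCP 5432.
gcloud compute firewall-rules create allow-web-to-db \
    --network=my-vpc \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:5432 \
    --source-service-accounts=web-sa@my-project.iam.gserviceaccount.com \
    --target-service-accounts=db-sa@my-project.iam.gserviceaccount.com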
Network firewall policies let you group firewall rules so that you can
update them all at once, effectively controlled by Identity and Access
Management (IAM) roles. These policies contain rules that can explicitly
deny or allow connections, as do Virtual Private Cloud (VPC) firewall
rules.
Firewall policies come in three types:
Hierarchical
Global
Regional
Figure 8-24 Network firewall policies overview. Portions of this page are reproduced under the CC-BY license and shared by
Google: https://cloud.google.com/firewall
Diagnosing and Resolving IAM Issues (e.g., Shared VPC,
Security/Network Admin)
Identity and Access Management (IAM) lets you create and manage
permissions for Google Cloud resources. IAM unifies access control for
Google Cloud services into a single pane of glass and presents a consistent
set of operations.
Policy Troubleshooter
Policy Troubleshooter requires three inputs: a principal (A), a permission (B), and a resource (C). In this example:
A = gianni@dariokart.com
B = compute.subnetworks.delete
C = subnet-backend
Since I want to find out about allow and deny resource policies for
subnet-backend—not just the IAM (allow) policies, I am going to use
the Policy Troubleshooter from the console.
From the “IAM & Admin” menu in the left bar, select “Policy
Troubleshooter” and fill out the form as follows.
Figure 8-26 shows the selection of the permission for the same
principal.
Figure 8-26 Selecting the permission in Policy Troubleshooter form for gianni
If you remember the way our shared VPC was set up in Chapter 3, this
makes sense because principal gianni@dariokart.com was granted
the compute.networkAdmin role at the organization level (Figure 3-
57), which includes the permission to delete subnets.
This outcome makes sense as well, because as shown in Figure 8-32 the
principal samuele@dariokart.com has the editor role in the
backend-devs project and the compute.networkUser role in
subnet-backend. The latter role doesn’t include the permission to
delete subnets as shown in Figure 8-33.
How It Works
Connection draining uses a timeout setting on the load balancer’s backend
service, whose duration must be from 0 to 3600 seconds inclusive.
For the specified duration of the timeout, existing requests to the removed
VM or endpoint are given time to complete. The load balancer does not
send new TCP connections to the removed VM. After the timeout duration
is reached, all remaining connections to the VM are closed.
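As a sketch, the connection draining timeout is set on the backend service; the name and the 300-second value below are illustrative, and regional backend services would use --region instead of --global:

# Give existing connections up to five minutes to complete before removal.
gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --global \
    --connection-draining-timeout=300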
VPC Flow Logs collects a sample of network flows sent from and received
by VMs, including VMs used as GKE nodes. These samples can be used for
offline analysis including network monitoring, forensics, real-time security
analysis, and cost optimization.
VPC Flow Logs can be viewed in Cloud Logging and can be routed to any
supported destination sink.
How It Works
VPC Flow Logs collects samples of each VM’s TCP, UDP, ICMP, ESP
(Encapsulating Security Payload), and GRE (Generic Routing
Encapsulation) protocol flows.
When a flow sample is collected, VPC Flow Logs generates a log for the
flow. Each flow record is structured in accordance with a specific
definition.
If you want to sample flow logs on a multi-NIC VM, you must enable VPC
Flow Logs for any subnets attached to a NIC in the VM.
VPC Flow Logs are enabled at the subnet level, just like Private Google
Access.
Exam tipWhen enabled, VPC Flow Logs collects flow samples for all the
VMs in the subnet. You cannot pick and choose which VM should have
flow logs collected—it’s all or nothing.
You enable VPC Flow Logs using the --enable-flow-logs flag when
you create or update a subnet with the gcloud CLI. The following code
snippets show how to enable VPC Flow Logs at subnet creation and update
time, respectively:
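A representative form of the two snippets, with placeholder names and an illustrative CIDR range:

# Enable VPC Flow Logs when the subnet is created.
gcloud compute networks subnets create SUBNET_NAME \
    --network=NETWORK_NAME \
    --region=REGION \
    --range=10.0.1.0/24 \
    --enable-flow-logs

# Enable VPC Flow Logs on an existing subnet.
gcloud compute networks subnets update SUBNET_NAME \
    --region=REGION \
    --enable-flow-logs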
Cost Considerations
As you can see in Figure 8-34, the two arrows ingest flow log samples into
Cloud Logging at a frequency specified by the
AGGREGATION_INTERVAL variable.
You can also control the amount of sampling (from zero to one inclusive)
with the SAMPLE_RATE variable, but if you are not careful, there may be
a large volume of data collected for each VM in your subnet. These may
result in significant charges.
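As a sketch, both settings can be tuned with optional flags when you enable flow logs; the interval and sampling rate below are illustrative values, not recommendations:

# Aggregate flow samples every 10 minutes and sample 10% of the flows.
gcloud compute networks subnets update SUBNET_NAME \
    --region=REGION \
    --enable-flow-logs \
    --logging-aggregation-interval=interval-10-min \
    --logging-flow-sampling=0.1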
Luckily, the console provides a view of the estimated logs generated per
day based on the assumption that the AGGREGATION_INTERVAL is the
default value of five seconds. The estimate is also based on data collected
over the previous seven days. You can use this estimated volume to have an
idea on how much enabling VPC Flow Logging would cost you.
You learned that VPC Flow Logs are collected and initially stored in Cloud
Logging. There are a number of ways to view logs in Cloud Logging as
shown in Figure 8-1. What really matters for the purpose of the exam is to
understand what logging query filter to use based on your use case. Let’s
see a few.
To view flow logs for all subnets in your project (that have VPC Flow
Logs enabled), use this logging query:
resource.type="gce_subnetwork"
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Fvpc_flows"
To view flow logs for a specific subnet in your project (that have VPC
Flow Logs enabled), use this logging query:
resource.type="gce_subnetwork"
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Fvpc_flows"
resource.labels.subnetwork_name="SUBNET_NAME"
To view flow logs for a specific VM in your project, use this logging query:
resource.type="gce_subnetwork"
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Fvpc_flows"
jsonPayload.src_instance.vm_name="VM_NAME"
To view flow logs for a specific CIDR block, use this logging query:
resource.type="gce_subnetwork"
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Fvpc_flows"
ip_in_net(jsonPayload.connection.dest_ip, CIDR_BLOCK)
To view flow logs for a specific GKE cluster, use this logging query:
resource.type="k8s_cluster"
logName="projects/PROJECT_ID/logs/vpc_flows"
resource.labels.cluster_name="CLUSTER_NAME"
To view flow logs for only egress traffic from a subnet, use this logging
query:
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Fvpc_flows"
jsonPayload.reporter="SRC" AND
jsonPayload.src_vpc.subnetwork_name="SUBNET_NAME"
(jsonPayload.dest_vpc.subnetwork_name!="SUBNET_NAME" OR NOT jsonPayload.dest_vpc.subnetwork_name:*)
To view flow logs for all egress traffic from a VPC network, use this
logging query:
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Fvpc_flows"
jsonPayload.reporter="SRC" AND
jsonPayload.src_vpc.vpc_name="VPC_NAME" AND
(jsonPayload.dest_vpc.vpc_name!="VPC_NAME" OR NOT jsonPayload.dest_vpc.vpc_name:*)
To view flow logs for specific ports and protocols, use this logging
query:
resource.type="gce_subnetwork"
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Fvpc_flows"
jsonPayload.connection.dest_port=PORT
jsonPayload.connection.protocol=PROTOCOL
This concludes the section about VPC Flow Logs. In the next section, you
will learn when to enable firewall logs and firewall insights and which
option better suits your network operations needs.
In the first part of this section, you will learn how to configure firewall rules
to log inbound and outbound events.
In the second part of this section, you will learn what to do to utilize
firewall rules effectively and efficiently.
You enable Firewall Rules Logging individually for each firewall rule whose connections you need to log. Firewall Rules Logging is an option for any firewall rule, except the two implied rules, as noted in the following exam tip.
Exam tipYou cannot enable Firewall Rules Logging for the implied deny
ingress and implied allow egress rules. For more details about the implied
rules, visit
https://cloud.google.com/vpc/docs/firewalls#defaul
t_firewall_rules.
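As a minimal sketch, logging is toggled per rule; the rule name is a placeholder, and the metadata flag (an assumption here) controls whether extra metadata fields are included in each record:

# Turn on logging for an existing firewall rule.
gcloud compute firewall-rules update RULE_NAME \
    --enable-logging \
    --logging-metadata=include-all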
In Figure 8-35, you can see how each of the two firewall rules associated to
the VPC has its own log stream, which ingests connection records to Cloud
Logging.
Figure 8-35 Firewall Rules Logging overview
In contrast to VPC Flow Logs, firewall rule logs are not sampled. Instead,
connection records—whether the connections are allowed or denied—are
continuously collected and sent to Cloud Logging.
As shown in Figure 8-35, each connection record includes the source and
destination IP addresses, the protocol and ports, date and time, and a
reference to the firewall rule that applied to the traffic.
The figure also reminds you—as you learned in Chapters 2 and 3—that
firewall rules are defined at the VPC level because they are global
resources. They also operate as distributed, software-defined firewalls. As a result, they don't become choke points the way traditional firewalls do.
Exam tipFirewall rule logs are created in the project that hosts the network
containing the VM instances and firewall rules. With Shared VPC, VM
instances are created and billed in service projects, but they use a Shared
VPC network located in the host project. As a result, firewall rule logs are
stored in the host project.
Firewall rule logs are initially stored in Cloud Logging. Here are some
guidelines on how to filter the data in Cloud Logging to select the firewall
rule logs that best suit your network operation needs.
To view all firewall logs in your project, use this logging query:
resource.type="gce_subnetwork"
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Ffirewall"
To view firewall logs specific to a given subnet, use this logging query:
resource.type="gce_subnetwork"
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Ffirewall"
resource.labels.subnetwork_name="SUBNET_NAME"
To view firewall logs specific to a given VM, use this logging query:
resource.type="gce_subnetwork"
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Ffirewall"
jsonPayload.instance.vm_name="INSTANCE_ID"
To view firewall logs for connections from a specific country, use this
logging query:
resource.type="gce_subnetwork"
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Ffirewall"
jsonPayload.remote_location.country=COUNTRY
where the variable COUNTRY denotes the ISO 3166-1 alpha-3 code of the country whose connections you are inquiring about.
Firewall Insights
For example, by using Firewall Insights you learn which firewall rules are overly permissive, and you can leverage the generated recommendations to make them stricter.
Firewall Insights provides the following insight types:
Shadowed firewall rule insights, which are derived from data about
how you have configured your firewall rules. A shadowed rule shares
attributes—such as IP address ranges—with other rules of higher or
equal priority.
Overly permissive rule insights, including each of the following:
Allow rules with no hits
Allow rules with unused attributes
Allow rules with overly permissive IP addresses or port ranges
Deny rule insights with no hits during the observation period.
There are two steps to view IPsec tunnel status. First, identify the tunnel
name and region, and then use the describe command option to view tunnel
details.
To identify the name and region of the VPN tunnel whose status you
need to check, use the gcloud compute vpn-gateways
describe command as follows:
The output will return a message that shows the status of the investigated
IPsec tunnel.
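As one possible sketch of these two steps, with placeholder names (listing the tunnels is an alternative way to identify the name and region):

# Identify the tunnel name and region.
gcloud compute vpn-tunnels list

# View the status of the tunnel of interest.
gcloud compute vpn-tunnels describe TUNNEL_NAME \
    --region=REGION \
    --format="flattened(status,detailedStatus)"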
The IP addresses that you can use for a BGP session depend on which
network connectivity product you use. For details, visit
https://cloud.google.com/network-
connectivity/docs/router/concepts/overview#bgp-ips.
“Invalid value for field resource.bgp.asn: ######. Local ASN conflicts with
peer ASN specified by a router in the same region and network.”
The last section of the book is about performance and latency. You have
architected the network of your workloads for high availability, resilience,
security, and cost-effectiveness—these are four of the five pillars of the
well-architected framework, which has been the main theme of our journey.
Now, you need to make sure that your workloads operate and maintain the
performance specified by the SLOs.
Google Cloud has developed and released Cloud Trace to help you measure
latency. Cloud Trace is a distributed tracing system for Google Cloud,
which helps you understand how long it takes your application to handle
incoming requests from users or other applications and how long it takes to
complete operations like RPC calls performed when handling the requests.
In the next section, you will learn how to use Cloud Trace to measure
latency and determine bottlenecks in your workloads.
When a user request arrives, we create a trace that will describe how our
workload responds.
In Figure 8-37, you can find an example. The parent (also referred to as
root) span describes the latency observed by the end user and is drawn in
the top part of the figure. Each of the child spans describes how a particular
service in a distributed system was called and responded with latency data
captured for each. Child spans are shown under the root span.
Figure 8-37 Traces and spans
All Cloud Run, Cloud Functions, and App Engine standard applications are
automatically traced, and libraries are available to trace applications
running elsewhere after minimal setup.
To view your application’s traces, you need to use the Google Cloud
console.
From the product dashboard, go to the product menu and select Trace.
The trace list shows all the traces that were captured by Cloud Trace
over the selected time interval, as shown in Figure 8-38.
Figure 8-38 Trace list. Portions of this page are reproduced under the CC-BY license and shared by Google:
https://cloud.google.com/trace/docs/finding-traces#viewing_recent_traces
You can filter by label to select only the traces for a given span (e.g.,
RootSpan: Recv as shown in Figure 8-39).
Figure 8-39 Filtering traces by label. Portions of this page are reproduced under the CC-BY license and shared by Google:
https://cloud.google.com/trace/docs/finding-traces#filter_traces
You can use this filter to drill down a relevant trace and find latency
data in the “waterfall chart” (Figure 8-40).
Figure 8-40 Waterfall chart for a trace. Portions of this page are reproduced under the CC-BY license and shared by Google:
https://cloud.google.com/trace/docs/viewing-details#timeline
On the top, you can see the root span, and all the child spans are below it.
You can see the latency for each span in the chart to quickly determine the
source of the latency in our application overall.
In this section, you learned how to use Cloud Trace to measure and analyze
both the overall latency of user requests and the way services interact. This
will help you find the primary contributor to latency issues.
Exam Questions
Which two products should you incorporate into the solution? (Choose
two.)
A.
VPC Flow Logs
B.
Firewall logs
C.
Cloud Audit logs
D.
Cloud Trace
E.
Compute Engine instance system logs
Rationale
You have created a firewall with rules that only allow traffic over HTTP,
HTTPS, and SSH ports. While testing, you specifically try to reach the
server over multiple ports and protocols; however, you do not see any
denied connections in the firewall logs. You want to resolve the issue. What
should you do?
A.
Enable logging on the default Deny Any Firewall Rule.
B.
Enable logging on the VM instances that receive traffic.
C.
Create a logging sink forwarding all firewall logs with no filters.
D.
Create an explicit Deny Any rule and enable logging on the new rule.
Rationale
A is incorrect because you cannot enable Firewall Rules Logging for the
implied deny ingress and implied allow egress rules.
B is incorrect because enabling logging on the VMs won’t tell you
anything about allowed or denied connections, which is the requirement.
C is incorrect because a logging sink won’t change the current scenario.
D is CORRECT because to capture denied connections, you need to
create a new firewall rule with --action=deny and --
direction=ingress and enable logging on it with the flag --
enable-logging.
You are trying to update firewall rules in a shared VPC for which you have
been assigned only network admin permissions. You cannot modify the
firewall rules.
A.
Security admin privileges from the Shared VPC admin.
B.
Service project admin privileges from the Shared VPC admin.
C.
Shared VPC admin privileges from the organization admin.
D.
Organization admin privileges from the organization admin.
Rationale
Your company has a security team that manages firewalls and SSL
certificates. It also has a networking team that manages the networking
resources. The networking team needs to be able to read firewall rules, but
should not be able to create, modify, or delete them.
A.
Assign members of the networking team the
compute.networkUser role.
B.
Assign members of the networking team the
compute.networkAdmin role.
C.
Assign members of the networking team a custom role with only the
compute.networks.* and the compute.firewalls.list
permissions.
D.
Assign members of the networking team the
compute.networkViewer role, and add the
compute.networks.use permission.
Rationale
A.
One of the VPN sessions is configured incorrectly.
B.
A firewall is blocking the traffic across the second VPN connection.
C.
You do not have a load balancer to load-balance the network traffic.
D.
BGP sessions are not established between both on-premises routers
and the Cloud Router.
Rationale