Google Cloud Platform (GCP) Security Best Practices

Google Cloud Platform (GCP) is a cloud service which provides customers the ability to create and manage virtual machines and Kubernetes clusters, run applications and store data.

This blog post walks through a few of GCP's features and gives security recommendations and advice on how to configure your GCP environments.

If you think something is missing here, or if you have further questions about cloud infrastructure security, please contact us. This article was written by Assured GCP specialists Patrik Aldenvik and Benjamin Svensson.

Structure

Organizations are the top structural level in GCP. They define domains within which all other resources reside. Every single resource belongs to one project and a project is an isolated part of the organization which has its own set of permissions, virtual machines, storage buckets and so on.

To ease administration it is possible to sort projects into folders, where a folder is just a node in the GCP resource hierarchy. A folder could for example be used to manage projects that belong to a specific department.

An important aspect of the resource hierarchy is that all permissions are inherited and the less restrictive permission policy prevails (GCP IAM - Policy Hierarchy). This means that permissions set on an organizational level are passed down to the projects under that organization and projects pass permissions down to resources. For example, a permission added to a specific resource (e.g. a VM) is overridden if there's a less restrictive overlapping permission higher up the hierarchy.
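The inheritance rule can be illustrated with a small Python sketch. It is a simplified model and not the Cloud IAM API: the node names, members and role grants are hypothetical, and it only models permissions that are granted, which are combined as a union from the resource up to the organization.

```python
# Simplified model of permission inheritance (not the Cloud IAM API).
# All node names, members and role grants below are hypothetical.
HIERARCHY = {
    "org-example": {"parent": None, "bindings": {
        "user:alice@example.com": {"roles/compute.instanceAdmin.v1"}}},
    "folder-finance": {"parent": "org-example", "bindings": {}},
    "project-billing": {"parent": "folder-finance", "bindings": {
        "user:bob@example.com": {"roles/compute.viewer"}}},
    "vm-reports": {"parent": "project-billing", "bindings": {
        "user:alice@example.com": {"roles/compute.viewer"}}},
}


def effective_roles(node, member):
    """Return the union of roles granted to member on node and all ancestors."""
    roles = set()
    while node is not None:
        entry = HIERARCHY[node]
        roles |= entry["bindings"].get(member, set())
        node = entry["parent"]  # walk up towards the organization
    return roles
```

In this model, granting Alice only the viewer role on the VM does not restrict her: `effective_roles("vm-reports", "user:alice@example.com")` still contains the instance admin role inherited from the organization.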

IAM - Identity and Access Management

Google has separated the management of users and the assignment of privileges to a user or group. By using Cloud Identity or G Suite it is possible to administer user accounts and distribute them into suitable groups. These groups and users are then assigned privileges inside GCP using Cloud Identity and Access Management (Cloud IAM).

It is possible to assign privileges in GCP to a Gmail account but it is recommended to use accounts that are easier to manage, such as Cloud Identity or G Suite accounts. If these services are used, take advantage of the possibility to sort the users into groups, as this reduces the administrative effort.

Always apply the principle of least privilege: only assign required privileges to resources.

Cloud IAM has predefined different sets of privileges, called roles (GCP IAM - Understanding Roles). The predefined roles provide fine-grained privileges for all services and most use cases. These roles are well suited for assignment to the previously mentioned groups, but remember to always apply the principle of least privilege: only assign required privileges to resources. If the predefined roles are too coarse it is possible to create custom roles that only include the privileges needed for the specific task. GCP also includes legacy "Primitive roles" that are very coarse and not recommended for usage.

As always when discussing authentication, it is recommended to use multi-factor authentication for all users. For administrators on an internet-exposed platform, it is a definite must!

Use multi-factor authentication for all users. For administrators on an internet-exposed platform, it is a definite must!

VPC - Virtual Private Cloud

In most cloud platforms there is the concept of a Virtual Private Cloud (VPC) that could be compared to an on-premise network. Within a VPC you create and manage typical network resources such as subnets, firewall rules, routes and more. By default, when you create a new project in GCP it generates a default network and associated firewall rules. These defaults are predictable and permissive, and introduce unnecessary risk to your environment. Hence it is recommended that you create your own VPC and firewall rules instead of using the default generated ones.

Implicitly, if no other firewall rule matches, GCP will deny all ingress (incoming) traffic towards the VPC and allow all egress (outgoing) traffic from the VPC. A security mechanism that is sometimes forgotten is filtering egress traffic. It is recommended to only allow egress traffic that is needed for the environment to function. The same is true for ingress traffic of course but since the implicit rule is "deny any" it becomes much more intuitive.

GCP implements priority to decide which firewall rule matches the traffic. Priority is an integer between 0 and 65535, where 0 is seen as the highest priority. When traffic is evaluated against the firewall rule set, the first match with the highest priority is taken. If rules share the same priority but have different actions, the "deny" action supersedes "allow". If rules share the same priority and the same action the matching rule is undetermined but the traffic is handled in a consistent way.
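The evaluation order described above can be sketched in Python. This is a simplified model under stated assumptions, not GCP's implementation: the `Rule` type, the rule names and the address ranges are hypothetical, and a rule is reduced to one source range and one port.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int      # 0 is the highest priority, 65535 the lowest
    action: str        # "allow" or "deny"
    source_range: str  # CIDR notation
    port: int


# Hypothetical ingress rule set.
RULES = [
    Rule("allow-ssh-office", 1000, "allow", "203.0.113.0/24", 22),
    Rule("deny-ssh-all", 2000, "deny", "0.0.0.0/0", 22),
    Rule("allow-web", 1000, "allow", "0.0.0.0/0", 443),
    Rule("deny-web-blocklist", 1000, "deny", "198.51.100.0/24", 443),
]


def evaluate_ingress(rules, src_ip, port):
    """Return "allow" or "deny" for an incoming connection."""
    matches = [r for r in rules
               if r.port == port and ip_address(src_ip) in ip_network(r.source_range)]
    if not matches:
        return "deny"  # implied rule: deny all ingress
    best = min(r.priority for r in matches)  # lowest number wins
    actions = {r.action for r in matches if r.priority == best}
    # On equal priority, "deny" supersedes "allow".
    return "deny" if "deny" in actions else "allow"
```

With this rule set, SSH from the office range is allowed because the allow rule has the higher priority (1000 versus 2000), while HTTPS from the blocklisted range is denied because the two matching rules share a priority and deny takes precedence.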

In hybrid environments (on-premise and cloud) it is a good idea to reduce the attack surface and keep communication using public IPs to a minimum; use a VPN or direct access connection between the two sites. Google's Cloud VPN and Cloud Interconnect are the related services for this. The same theory goes for communication between different VPCs; instead of exposing the VMs externally, use a shared VPC or VPC peering to enable internal communication across VPCs.

Use a shared VPC or VPC peering to enable internal communication across VPCs

By default, all VMs in GCP are assigned a public IP address and are therefore accessible directly from the internet if there are firewall rules that allow it (such as the default ones). This is a risk that could be easily mitigated by disabling public IPs in a project (GCP - Disabling external IP access for VM instances). The downside of disabling public IPs is that the VMs are unable to communicate with the internet; depending on the environment, an external connection could be needed. If external communication is needed, either allow the specific instances to be assigned a public IP or use an internet gateway, which in GCP is called Cloud NAT (GCP - Using Cloud NAT).

A recommended alternative to manage VMs in the cloud is to deny management access (SSH/RDP) directly from the internet and instead use a VPN solution or a bastion host. With such a setup you as a client land in a management network that is allowed to interact with the management interfaces of the VMs. More information about connecting to VMs is found at: GCP - Connecting to instances without external IP addresses

Deny management access (SSH/RDP) directly from the internet and instead use a VPN solution or a bastion host

GCP is more or less a set of APIs that a user communicates with. To reduce the attack surface it is recommended to disable or restrict access to APIs which are not in use. Restricting access could be done through IAM policies and VPC Service Controls (GCP - Overview of VPC Service Controls).

Compute Engine

Compute Engine is the GCP service that manages VMs. By default, a VM is given a couple of features that are less secure. One example is "Project-wide SSH keys", which gives administrators the ability to provision public SSH keys to all instances in a project. It is recommended to disable project-wide SSH keys, as the feature applies a coarse security model that gives the holder of an added SSH key privileged access to all Linux instances.

Disable project-wide SSH keys°

°Update 2019-12-20: As a replacement to "Project-wide SSH keys" it is recommended to use OS login and to use it with two-factor authentication - Thanks to Max Illfelder from Google for pointing that out for us.

Another insecure feature is the default service account that each VM obtains at creation, unless configured differently. The default service account gives read-only access to all Cloud Storage buckets, and their contents, that reside in the same project as the VM instance. The implication of this is that if no extra configuration is made, an attacker who gains access to a VM will also be able to access the information stored in the Cloud Storage buckets.

Cloud Storage

Cloud Storage is the GCP service that allows customers to save data in the cloud. Similar services from other providers are AWS S3 or Azure Storage. All data uploaded to Cloud Storage is stored in a "bucket".

A bucket attribute of importance is the name: a bucket name has to be unique across the whole platform. This means that Customer A and Customer B cannot both name their bucket "mybucket" but instead need to name them uniquely, such as "mybucket-customer-a" and "mybucket-customer-b". As buckets are globally accessible it is possible to enumerate them, especially if a customer uses a predictable name such as "mybucket-customer-a". It is recommended to append random characters to the bucket name and not include the company name in it. An example is "prod-logs-b7b12b36511ac3462d12e62164dfff4e". This will make it harder for an attacker to locate buckets in a targeted attack.

Append random characters to bucket names and do not include the company name in them.
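A name of the same shape as the example above could be generated with a few lines of Python. This is a minimal sketch: the function name `random_bucket_name` and the prefix are hypothetical, and the length check assumes the 63-character limit that applies to bucket names without dots.

```python
import secrets


def random_bucket_name(prefix):
    """Append 32 random hex characters to a descriptive, company-neutral prefix."""
    # 16 random bytes from the OS CSPRNG, rendered as 32 hex characters.
    name = f"{prefix}-{secrets.token_hex(16)}"
    if len(name) > 63:
        raise ValueError("bucket name exceeds 63 characters")
    return name
```

Calling `random_bucket_name("prod-logs")` returns a different, hard-to-guess name on every call, so the generated name has to be stored in configuration rather than derived again later.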

A common vulnerability regarding buckets is misconfigured permissions. An example is the misunderstanding of the group "allAuthenticatedUsers". This is a special identifier for any user or service account that has access to the Google platform. It does not identify user or service accounts that belong to a specific organization. Again, you want to implement the principle of least privilege when it comes to bucket permissions. For example, if a service account is only supposed to be used for logging, give it write access to the specific bucket or folder and nothing more.
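A quick check for this kind of misconfiguration can be sketched in Python. The sketch assumes the bucket's IAM policy has already been exported as JSON in the usual `bindings` / `role` / `members` layout; the function name, the sample policy and the service account address are hypothetical.

```python
# Special identifiers that are not limited to your own organization.
PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}


def public_bindings(policy):
    """Return (role, member) pairs that grant access outside the organization."""
    findings = []
    for binding in policy.get("bindings", []):
        for member in binding.get("members", []):
            if member in PUBLIC_MEMBERS:
                findings.append((binding["role"], member))
    return findings


# Hypothetical exported bucket policy.
SAMPLE_POLICY = {
    "bindings": [
        {"role": "roles/storage.objectViewer",
         "members": ["allAuthenticatedUsers"]},
        {"role": "roles/storage.objectCreator",
         "members": ["serviceAccount:logger@example-project.iam.gserviceaccount.com"]},
    ]
}
```

Here `public_bindings(SAMPLE_POLICY)` flags the object viewer binding, since it lets any authenticated Google account read the bucket's objects.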

There are two main access controls which apply to Cloud Storage: IAM permissions and ACLs. IAM permissions are usually the simplest choice, but if you want more granular control on an object level, ACLs are the way to go. It is recommended to take your time and understand which policy permissions and which ACL rules will fit your use cases.

Prefer IAM permissions and use ACLs only where object-level control is needed

More information on this topic can be found at: GCP - Security, ACLs, and access control.

Logging

In the cloud you don't have any physical firewalls or switches and therefore there are no firewall or switch logs. A substitute can be the VPC flow log, which is similar to NetFlow and includes fields such as source and destination IP, source and destination port, protocol etc. This is an excellent log for detecting port scans or anomalous behavior within your VPC network and is recommended to be enabled at full (100%) sampling rate. Yes, this could create a large amount of data but it's likely worth the extra storage to gain full visibility into your environment. If storage becomes an issue an alternative is to enable full logging on specific subnets of the network.

Enable the VPC flow log at full sampling rate to detect port scans or anomalous behavior.
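As an illustration of what such a log enables, the following Python sketch flags sources that touch many distinct ports on one destination. It assumes the flow records have already been exported and flattened into dictionaries with `src_ip`, `dest_ip` and `dest_port` keys; the function name and the threshold of 20 ports are hypothetical choices and not GCP defaults.

```python
from collections import defaultdict


def suspected_port_scanners(records, min_ports=20):
    """Return source IPs that contacted at least min_ports ports on one host."""
    ports_seen = defaultdict(set)
    for rec in records:
        # Group by (source, destination) and collect distinct destination ports.
        ports_seen[(rec["src_ip"], rec["dest_ip"])].add(rec["dest_port"])
    scanners = {src for (src, _dest), ports in ports_seen.items()
                if len(ports) >= min_ports}
    return sorted(scanners)
```

With a sampling rate below 100% some of the probed ports never reach the log, which is one reason full sampling makes this kind of detection more reliable.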

The control plane is where you manage your environment, for example through the GCP console when you configure firewall rules or IAM. This part of your environment also needs to be supervised. GCP provides Cloud Audit Logs to give visibility into this dimension. Cloud Audit Logs consist of several sources, two of which are:

Admin activity log, which records operations that modify (Create, Delete, Update) the configuration or metadata of a resource. An example is an administrator that modifies an IAM policy or creates a VM instance. It's enabled by default.
Data access log, which records operations that read (Get, List) information about a resource. An example is a service account that reads an IAM policy or accesses an object in a bucket. It's disabled by default.

The "Data access" log is disabled by default since it generates a lot of entries. Depending on your environment and other logging, "Data access" logs might not be needed. It's recommended to activate logging in your GCP environment to gain visibility.

Activate logging in your GCP environment to gain visibility

Metadata API

It's important not to forget about the Metadata API when you talk about GCP environments. Well, what is the Metadata API? It's the way for the cloud platform to provision specific settings and share information with the VM instances in your cloud environment. The most common use case is when a VM instance has been assigned a service account (an API token) that has been given the permissions to talk to internal services. All a VM has to do to gain access to the service account is to make an HTTP request to the IP address 169.254.169.254 or, in GCP's case, the hostname "metadata.google.internal". The v1 endpoint requires the "Metadata-Flavor: Google" request header, as seen below:

Request:

$ curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token

Response:

{
  "access_token":"ya29.Iq4BogdwlAqqoDvT4hR9ZDVVQOKA[...]",
  "expires_in":3234,
  "token_type":"Bearer"
}

This Bearer token inside the returned JSON object can then be used for Google APIs and identifies as a service account. Why is this important? Because if an attacker is able to get hold of a service account, they are able to issue requests towards the Google APIs, impersonating the service account, and could potentially do serious damage.
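To show how little is needed to reuse such a token, here is a minimal Python sketch that turns the JSON response into the HTTP header a client would send to a Google API. The function name `authorization_header` and the token value are hypothetical; no request is made.

```python
import json


def authorization_header(token_response_body):
    """Build the Authorization header from the metadata server's JSON response."""
    token = json.loads(token_response_body)
    # Results in e.g. "Bearer ya29..." and is all an API call needs to
    # act as the service account until the token expires.
    return {"Authorization": f"{token['token_type']} {token['access_token']}"}


# Placeholder response with a made-up token value.
SAMPLE_RESPONSE = '{"access_token":"ya29.EXAMPLE","expires_in":3234,"token_type":"Bearer"}'
```

Anyone who can read the response, for example through a server-side request forgery flaw in an application on the VM, can attach this header to their own requests from anywhere.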

To mitigate this, it is recommended to remove service accounts if they are not needed. If they are needed, apply the principle of least privilege to reduce the impact of a compromise. It is also recommended to restrict access to the Metadata API. This could be done through metadata concealment in Google Kubernetes Engine (GKE) and by disabling legacy endpoints of the Metadata API. Note that metadata concealment will be deprecated in the future and should be replaced with Workload Identity.

Remove service accounts if not needed. Apply the principle of least privilege.

Conclusion

As the cloud is a rather recent phenomenon, best practices are still under development and are continuously changing. As a result, the default options are not always secure. Hopefully this article has provided insight into some areas that should be considered when using Google Cloud Platform. To summarize, the following points should be taken into consideration:

Remember that the less restrictive IAM policy prevails.
Use fine-grained permissions and apply them to groups.
Remove the insecure defaults for VPCs.
Apply the principle of least privilege to Compute Engine service accounts.
Add restrictive permissions for Cloud Storage.
Enable logging.
Restrict access to the Metadata API.
When configuration is done, verify all the above!

As a further note, Google has published a lot of best practices regarding GCP, of which some have already been linked to in this article. One blog post that relates to this article is: Don't get pwned: practicing the principle of least privilege.

If there are questions or if you are in need of an assessment of your infrastructure do not hesitate to reach out to us. We at Assured are able to help you to verify your configuration through penetration testing, assessments and advisory.