Amazon Elastic Kubernetes Service (EKS) security is important because
it helps users protect their data, applications, and infrastructure running on AWS. Kubernetes is a
widely used tool for managing containerized applications, but it also introduces security
challenges that must be addressed. Securing EKS clusters involves applying best practices to the surrounding AWS
services, worker nodes, pods, and the EKS control plane. This article discusses best practices for
maintaining a strong EKS cluster security posture.
Question | Answer |
---|---|
How can EKS worker nodes be secured? | Securing worker nodes involves implementing operating system security best practices such as removing unused packages, restricting network access, implementing tools like SELinux, and encrypting attached volumes. |
How can pods be secured in an EKS cluster? | Pods can be secured with best practices such as minimizing access to service accounts, limiting access to the host with Kyverno policies, and controlling traffic with network policies. |
How can container images be secured? | Container images can be secured by implementing image scanning with tools like Trivy, keeping images up to date, making sure that base images come from trusted registries, and ensuring that registries are secured. |
Why is observability important for securing EKS clusters? | Observability tools enable users to transparently analyze cluster behavior, providing insight into security breaches and improving incident response times. |
How can AWS infrastructure be secured for EKS? | EKS clusters rely on the security posture of the surrounding AWS resources. Best practices for securing these resources include restricting IAM access, limiting network connectivity, and enabling audit logging with CloudTrail. |
Following best practices related to worker node security is an essential aspect of protecting EKS
clusters. Worker nodes are authorized to access many components in a Kubernetes cluster as part of ordinary operations (such as running
pods), so a compromised node will have serious security implications for the whole cluster.
Worker nodes also have full access to all pods running on that node, which provides access to the pod’s
application code, log data, Kubernetes Secrets, and mounted
volumes. This sensitive information is at risk when a worker node is compromised.
There are many best practices that users follow to secure worker nodes in EKS clusters, and each is
important for maintaining a strong security posture.
Configuring Linux security modules enables users to set up fine-grained control over pod permissions.
SELinux and AppArmor support restricting
access to kernel capabilities, host network configuration, the filesystem, and host devices. Enabling
these restrictions provides an additional layer of cluster security by reducing the attack surface available to
compromised pods. Kubernetes supports both SELinux and AppArmor natively, and EKS worker nodes can use
whichever of these modules their operating system provides.
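The following Python sketch roughly illustrates how these modules are requested per pod. It inspects a pod manifest (assumed to be already parsed into a dict) and lists containers that have no explicit SELinux or AppArmor settings in their `securityContext`. It assumes the `seLinuxOptions` and `appArmorProfile` fields. The latter is the field-based AppArmor setting in recent Kubernetes versions; older annotation-based settings are not checked. The function names are hypothetical. A missing setting does not necessarily mean a container is unconfined, because the container runtime may apply a default profile. The sketch only reports missing explicit configuration.

```python
# Sketch: report containers without explicit SELinux or AppArmor settings.
# Assumes the pod manifest is a Python dict (e.g. parsed from YAML/JSON).
# Only securityContext.seLinuxOptions and securityContext.appArmorProfile are
# checked; annotation-based AppArmor settings are ignored.
# Function names are hypothetical.
from typing import Any, Dict, List


def _effective(field: str, pod_ctx: Dict[str, Any], container_ctx: Dict[str, Any]) -> Any:
    # A container-level setting takes precedence over the pod-level one.
    value = container_ctx.get(field)
    return value if value is not None else pod_ctx.get(field)


def containers_without_lsm(pod: Dict[str, Any]) -> List[str]:
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext") or {}
    missing = []
    for container in (spec.get("containers") or []) + (spec.get("initContainers") or []):
        container_ctx = container.get("securityContext") or {}
        apparmor = _effective("appArmorProfile", pod_ctx, container_ctx) or {}
        selinux = _effective("seLinuxOptions", pod_ctx, container_ctx) or {}
        # An AppArmor profile of type "Unconfined" explicitly disables confinement.
        apparmor_set = bool(apparmor) and apparmor.get("type") != "Unconfined"
        if not (apparmor_set or selinux):
            missing.append(container["name"])
    return missing


if __name__ == "__main__":
    example_pod = {
        "spec": {
            "containers": [
                {"name": "app", "securityContext": {"appArmorProfile": {"type": "RuntimeDefault"}}},
                {"name": "sidecar"},
            ]
        }
    }
    print(containers_without_lsm(example_pod))  # ['sidecar']
```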
Hosts are often compromised by unexpected programs that may have gone unnoticed by system administrators.
Each additional package installed on a host provides a potential attack vector, and administrators must
be careful when evaluating which packages are required for their EKS cluster nodes. Removing unnecessary
programs will reduce the administrative overhead of patching and upgrades while also improving the
cluster’s security posture.
The EKS Optimized
AMI provided by AWS for EKS worker nodes already ships a minimal operating system containing only the
packages required for nodes to operate. Users deploying hosts with other AMIs should
investigate their installed packages and running processes to verify that only essential operating
system components are deployed.
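Below is a minimal sketch of that verification. It assumes the installed package names have already been collected from the node, for example with `rpm -qa --queryformat '%{NAME}\n'` on Amazon Linux or `dpkg-query -W -f='${Package}\n'` on Ubuntu. The baseline set is a hypothetical allowlist that a team would derive from an approved image. The sketch simply reports anything installed that is not on it.

```python
# Sketch: compare a node's installed packages against a known-good baseline.
# The package list is assumed to come from a command such as
#   rpm -qa --queryformat '%{NAME}\n'   (Amazon Linux)
#   dpkg-query -W -f='${Package}\n'     (Ubuntu)
# The baseline is hypothetical and would be built from an approved image.
from typing import List, Set


def parse_package_list(text: str) -> Set[str]:
    # One package name per line; blank lines are ignored.
    return {line.strip() for line in text.splitlines() if line.strip()}


def unexpected_packages(installed: Set[str], baseline: Set[str]) -> List[str]:
    # Anything installed but absent from the baseline deserves a closer look.
    return sorted(installed - baseline)


if __name__ == "__main__":
    node_packages = parse_package_list("bash\nkernel\nnmap\n")
    print(unexpected_packages(node_packages, {"bash", "kernel"}))  # ['nmap']
```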
The table below shows how the number of installed packages differs across AMIs. Selecting AMIs with
minimal package counts is recommended to reduce the attack surface for worker nodes.
AMI name | Ubuntu 22.04 | Amazon Linux 2 | EKS Optimized 1.25 |
---|---|---|---|
Installed packages | 598 | 453 | 316 |
Each open port for a worker node host is a potential attack vector. Hosts open to the Internet will be
consistently attacked by bot scanners searching for insecure hosts to compromise. This attack vector can
be mitigated by removing unnecessary packages that open additional ports and verifying that only necessary ports are
open.
Evaluating the worker node host’s network exposure is required to ensure that hosts
aren’t vulnerable to malicious entities.
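The following sketch shows one way to verify which TCP ports actually accept connections, using only Python's standard `socket` module. The `expected` set (for example, the kubelet's port 10250) is a hypothetical allowlist. The result reflects reachability from wherever the script runs, so any security groups between the scanner and the node affect it. Run it only against hosts you own and are authorized to test.

```python
# Sketch: find TCP ports that accept connections and are not expected.
# Reachability is measured from the machine running this script.
# The `expected` allowlist is hypothetical.
import socket
from typing import Iterable, List, Set


def open_tcp_ports(host: str, ports: Iterable[int], timeout: float = 0.5) -> List[int]:
    """Return the ports on `host` that accept a TCP connection."""
    found = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)
        except OSError:
            # Refused, timed out, or unreachable: not reported as open.
            pass
    return found


def unexpected_open_ports(host: str, ports: Iterable[int], expected: Set[int]) -> List[int]:
    # Open ports that are not on the allowlist are candidates for closing.
    return [port for port in open_tcp_ports(host, ports) if port not in expected]
```

For example, `unexpected_open_ports("10.0.1.23", range(1, 1025), expected={22})` would list any low-numbered open ports other than SSH on a (hypothetical) node address.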
AWS supports volume
encryption for the worker node’s root volume and any additional attached volumes. This provides
an extra layer of security and is often needed to meet compliance requirements. For volumes that
Kubernetes provisions, EKS supports these encryption features through the Amazon EBS
CSI driver, which allows users to manage EBS volumes via native Kubernetes resources
(persistent volumes and storage classes).
The example below displays an EBS volume with encryption enabled. Users can verify the encryption status
of their volumes via the AWS web console or with the AWS CLI.
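The sketch below illustrates both sides of this under stated assumptions. The first part is a StorageClass for the EBS CSI driver (provisioner `ebs.csi.aws.com`) with its `encrypted` parameter set. The second is a check over the dict returned by boto3's `ec2.describe_volumes()` that lists volumes whose `Encrypted` flag is false. The StorageClass name and the optional KMS key are placeholders. Pagination of the API response is not handled.

```python
# Sketch: (1) build an encrypted StorageClass for the EBS CSI driver and
# (2) list unencrypted volumes in a describe_volumes-style response.
# Names and the KMS key ARN are placeholders; pagination is not handled.
import json
from typing import Any, Dict, List, Optional


def encrypted_storage_class(name: str, kms_key_id: Optional[str] = None) -> Dict[str, Any]:
    # StorageClass parameter values are strings, hence "true" rather than True.
    parameters = {"type": "gp3", "encrypted": "true"}
    if kms_key_id:
        parameters["kmsKeyId"] = kms_key_id
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "ebs.csi.aws.com",
        "parameters": parameters,
        "volumeBindingMode": "WaitForFirstConsumer",
    }


def unencrypted_volume_ids(describe_volumes_response: Dict[str, Any]) -> List[str]:
    # Each entry in "Volumes" carries a boolean "Encrypted" flag.
    return [
        volume["VolumeId"]
        for volume in describe_volumes_response.get("Volumes", [])
        if not volume.get("Encrypted", False)
    ]


if __name__ == "__main__":
    print(json.dumps(encrypted_storage_class("ebs-encrypted"), indent=2))
```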
In addition to blocking network access as mentioned above, disabling SSH is another way to improve
worker node network security. SSH is a common first point of attack when hosts are exposed to the Internet,
and hosts can be compromised when users rely on weak passwords instead of key-based authentication or
when they run outdated SSH packages with known vulnerabilities.
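The code below sketches how SSH hardening could be checked. It reads `sshd_config` text and flags password authentication and root login. It assumes two OpenSSH behaviours: the first value for a keyword wins, and `PasswordAuthentication` defaults to `yes`. It also stops at the first `Match` block and does not follow `Include` directives. A more reliable input is the output of `sshd -T` on the host, which prints the effective configuration in the same keyword/value form. Function names are hypothetical.

```python
# Sketch: flag risky settings in sshd_config text.
# Assumes OpenSSH semantics: first value per keyword wins, and
# PasswordAuthentication defaults to "yes". Match blocks and Include
# directives are not handled. Function names are hypothetical.
import re
from typing import Dict, List

# sshd_config accepts "Keyword value" as well as "Keyword=value".
_SPLIT = re.compile(r"\s*=\s*|\s+")


def parse_sshd_config(text: str) -> Dict[str, str]:
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = _SPLIT.split(line, maxsplit=1)
        keyword = parts[0].lower()
        if keyword == "match":
            break  # conditional blocks are out of scope for this sketch
        value = parts[1].strip().lower() if len(parts) > 1 else ""
        settings.setdefault(keyword, value)  # keep only the first value
    return settings


def ssh_findings(text: str) -> List[str]:
    settings = parse_sshd_config(text)
    findings = []
    if settings.get("passwordauthentication", "yes") != "no":
        findings.append("password authentication is enabled")
    if settings.get("permitrootlogin") == "yes":
        findings.append("root login is permitted")
    return findings
```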
If host access is necessary, users on AWS can connect through AWS Systems
Manager (SSM) Session Manager, which authenticates users via IAM to connect to EC2 instances (worker nodes)
with more security than SSH. SSM doesn’t require opening inbound ports on the host, and session start and end
events are logged via CloudTrail.
Connecting to hosts via SSM instead of SSH is the recommended approach for improving the security of EKS
worker nodes.
Regardless of how carefully a user configures the operating system, installed software always requires
updates to patch security vulnerabilities. The speed at which users can evaluate CVEs and deploy patches to their worker nodes will significantly
impact their clusters’ security posture. Users should test how much time is required for their pipelines
to roll out new worker node upgrades, as this will give insight into how quickly the nodes can be
patched in response to a new CVE and whether this response time meets user expectations.
Users with complicated deployment processes (e.g., with manual testing and human approvals) may encounter
significant delays when attempting to quickly patch new vulnerabilities. A highly automated pipeline
will be able to roll out patches more quickly in response to new CVEs, thus improving the security
posture of the environment.
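To measure the rollout time described above, a team could record when a patch rollout started and when each replacement node became ready. It could then compare the total against a target. The sketch below assumes those timestamps are already available, for example from pipeline logs. The one-hour target in the usage note is a hypothetical value, not a recommendation.

```python
# Sketch: measure how long a worker node patch rollout took and compare it
# with a team-defined target. Timestamps are assumed to come from pipeline
# logs or node metadata; function names are hypothetical.
from datetime import datetime, timedelta
from typing import Iterable


def rollout_duration(started: datetime, node_ready_times: Iterable[datetime]) -> timedelta:
    """Time from the start of the rollout until the last patched node was ready."""
    times = list(node_ready_times)
    if not times:
        raise ValueError("no nodes were patched")
    return max(times) - started


def meets_target(started: datetime, node_ready_times: Iterable[datetime], target: timedelta) -> bool:
    return rollout_duration(started, node_ready_times) <= target
```

For example, `meets_target(start, ready_times, timedelta(hours=1))` answers whether the whole node group was patched within an hour of the rollout starting.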
Worker nodes produce many useful log files by default that are relevant to maintaining security. The
operating system generates logs related to system calls, kernel events, SSH connection
activity, changes to installed packages, and system startup. The kubelet agent
installed on all Kubernetes/EKS worker nodes also produces log output.
The screenshot below is a snippet from a worker node’s audit log. It provides insight into incoming SSH
sessions, showing what time the session started and what operating system user was selected. This data
is useful for identifying unwanted SSH access to EKS worker nodes.
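The sketch below shows how SSH session openings might be extracted from auditd logs (commonly `/var/log/audit/audit.log`) so they can be reviewed or exported. It assumes the standard auditd text format. In that format, `sshd` records `USER_START` events with `op=PAM:session_open` and an `acct` field. The sample record in the code is made up for illustration. Real records may contain additional or hex-encoded fields.

```python
# Sketch: extract SSH session openings from auditd text logs.
# Assumes the auditd format (type=USER_START ... op=PAM:session_open
# acct="..." exe="/usr/sbin/sshd" addr=...). The sample line is made up.
import re
from datetime import datetime, timezone
from typing import Dict, Iterable, List

_AUDIT_TS = re.compile(r"msg=audit\((\d+(?:\.\d+)?):\d+\)")
# key="quoted value" or key=bare_value (a trailing quote is not part of the value)
_FIELD = re.compile(r"""(\w+)=(?:"([^"]*)"|([^\s']+))""")


def ssh_sessions(lines: Iterable[str]) -> List[Dict[str, str]]:
    sessions = []
    for line in lines:
        if not line.startswith("type=USER_START"):
            continue
        fields = {key: quoted or bare for key, quoted, bare in _FIELD.findall(line)}
        if fields.get("op") != "PAM:session_open" or not fields.get("exe", "").endswith("/sshd"):
            continue
        match = _AUDIT_TS.search(line)
        started = (
            datetime.fromtimestamp(float(match.group(1)), tz=timezone.utc).isoformat()
            if match else ""
        )
        sessions.append({"time": started, "user": fields.get("acct", ""), "addr": fields.get("addr", "")})
    return sessions


if __name__ == "__main__":
    sample = (
        "type=USER_START msg=audit(1700000000.000:456): pid=1234 uid=0 auid=1000 ses=5 "
        "msg='op=PAM:session_open grantors=pam_unix acct=\"ec2-user\" exe=\"/usr/sbin/sshd\" "
        "hostname=203.0.113.10 addr=203.0.113.10 terminal=ssh res=success'"
    )
    print(ssh_sessions([sample]))
```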
Exporting and storing this log data is helpful for auditing node behavior and performing security-related
investigations. Ensuring that log data is available following an incident is crucial for identifying how
a breach has occurred and how it can be prevented in the future.
Since worker nodes have access to many components of an EKS cluster and contain sensitive information, it
is essential to follow node security best practices to maintain a strong security posture. Implementing
these practices can reduce the attack surface of an EKS cluster and mitigate security-related incidents.
Pod security is vital for protecting both the applications running in the EKS cluster and the underlying
infrastructure. A compromised pod can expose the application code base,
logs, and downstream resources such as databases. Such a pod may also be able to access other cluster resources
like Secrets and ConfigMaps, increasing the
impact of a security breach.
Users should follow security best practices like the ones described below to protect their EKS clusters
from compromised pods.
Service
accounts are a Kubernetes feature that gives pods an identity for accessing the Kubernetes API
server. The feature is useful for extending the cluster’s functionality but also comes with
security risks.
Giving pods access to service accounts with excessive permissions increases the blast radius of a
compromised pod. The best practice is to provide service account credentials only to pods that strictly
require them. Service accounts should also hold only the minimum permissions the pod needs,
with unnecessary access removed. The permissions granted to a service account are controlled via
Kubernetes RBAC resources such as Roles and RoleBindings.
The example below shows how a service account is created and mounted into a pod. It’s important to
carefully evaluate a pod’s properties to ensure that only appropriate service accounts are mounted (and
only when required). Mounting service accounts with excessive permissions is a security risk.
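The following Python code sketches the pattern described above. It builds manifests for a dedicated service account, a Role limited to reading ConfigMaps, a RoleBinding, and a pod that uses the account. It also builds a pod that opts out of token mounting with `automountServiceAccountToken: false`. All names, the namespace, the image, and the chosen permissions are hypothetical. The output is JSON, which `kubectl apply -f` accepts as well as YAML.

```python
# Sketch: least-privilege service account manifests as Python dicts.
# All names, namespaces, images, and permissions are hypothetical.
import json
from typing import Any, Dict, List


def least_privilege_manifests(namespace: str, app: str, image: str) -> List[Dict[str, Any]]:
    sa_name = f"{app}-reader"
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": sa_name, "namespace": namespace},
    }
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": sa_name, "namespace": namespace},
        # Read-only access to ConfigMaps in this namespace and nothing else.
        "rules": [{"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get", "list"]}],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": sa_name, "namespace": namespace},
        "subjects": [{"kind": "ServiceAccount", "name": sa_name, "namespace": namespace}],
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "Role", "name": sa_name},
    }
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": app, "namespace": namespace},
        "spec": {"serviceAccountName": sa_name, "containers": [{"name": app, "image": image}]},
    }
    return [service_account, role, binding, pod]


def pod_without_api_access(namespace: str, name: str, image: str) -> Dict[str, Any]:
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The pod never calls the API server, so no token is mounted.
            "automountServiceAccountToken": False,
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    items = least_privilege_manifests("demo", "web", "registry.example.com/web:1.0")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```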
Kyverno can enforce security policies and best practices to help
secure EKS clusters. Kyverno is a Kubernetes-native policy engine whose policies are written as YAML resources, allowing
users to define rules such as ones enforcing specific container images or trusted registries, setting
resource limits, denying privileged host access, and disabling automatic service account token mounting. By using Kyverno
to enforce security policies, you can reduce the risk of security breaches in your EKS cluster and
ensure that your applications run with secure configurations.
Any field of a Kubernetes resource can be validated and, if necessary, blocked by Kyverno, giving users
automated guardrails that protect their clusters.
The following Kyverno policy will check for the use of privileged mode in all pods and warn the user when
this setting is in use. Privileged mode for pods allows full access to the underlying worker node, so
restricting its use is recommended for a stronger security posture.
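Kyverno policies are themselves YAML resources, so the following is not a Kyverno policy. It is a Python sketch of the same check: it looks for `securityContext.privileged: true` in regular, init, and ephemeral containers. Such a check could, for example, run against manifests in a CI pipeline before they reach the cluster. In Kyverno, reporting violations without blocking them corresponds to the `Audit` validation failure action, as opposed to `Enforce`. The function name is hypothetical.

```python
# Sketch: detect privileged containers in a pod manifest (a Python dict).
# Mirrors the idea of a "disallow privileged containers" policy; this is not
# Kyverno itself. The function name is hypothetical.
from typing import Any, Dict, List

CONTAINER_LISTS = ("containers", "initContainers", "ephemeralContainers")


def privileged_containers(pod: Dict[str, Any]) -> List[str]:
    spec = pod.get("spec", {})
    flagged = []
    for key in CONTAINER_LISTS:
        for container in spec.get(key) or []:
            context = container.get("securityContext") or {}
            # Privileged mode gives the container full access to the host.
            if context.get("privileged") is True:
                flagged.append(f"{key}/{container['name']}")
    return flagged
```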
A key element of improving cluster security is ensuring that pods cannot compromise their underlying
worker nodes. A breached worker node will have significant access to other pods and cluster resources,
and pods are a common entry point.
Protecting worker nodes from compromised pods involves ensuring that pods aren’t granted any unnecessary
access to the host. Pod specifications can request settings that grant access to the
host kernel, attached volumes, network interfaces, running processes, and the root filesystem (for example,
privileged mode, hostPath volumes, and the hostNetwork and hostPID options). Giving
pods broad permissions like these leaves the cluster open to significant risks. Pods should be granted
the minimum host access required to perform their functions. Tools like Kyverno allow
enforcing rules such as blocking privileged access for pods.
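As an illustrative sketch, this Python function lists the host-level access a pod specification requests. It covers host namespaces (`hostNetwork`, `hostPID`, `hostIPC`), `hostPath` volumes, and added Linux capabilities. The function name is hypothetical. An admission policy, for example in Kyverno, remains the place to enforce such rules inside the cluster.

```python
# Sketch: list host-level access requested by a pod manifest (a Python dict).
# The function name is hypothetical.
from typing import Any, Dict, List

HOST_NAMESPACE_FLAGS = ("hostNetwork", "hostPID", "hostIPC")


def host_access_findings(pod: Dict[str, Any]) -> List[str]:
    spec = pod.get("spec", {})
    # Sharing host namespaces exposes host networking, processes, or IPC.
    findings = [f"{flag} is enabled" for flag in HOST_NAMESPACE_FLAGS if spec.get(flag) is True]
    for volume in spec.get("volumes") or []:
        if "hostPath" in volume:
            path = volume["hostPath"].get("path", "?")
            findings.append(f"volume {volume['name']} mounts host path {path}")
    for container in (spec.get("containers") or []) + (spec.get("initContainers") or []):
        context = container.get("securityContext") or {}
        added = (context.get("capabilities") or {}).get("add") or []
        if added:
            findings.append(f"container {container['name']} adds capabilities {sorted(added)}")
    return findings
```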
Network policies
are a Kubernetes feature allowing users to control traffic flow within a cluster. By default, all pods
can communicate with other pods as well as services, nodes, and the Internet. Implementing network policies
gives users fine-grained control over which communication is allowed within a cluster and can prevent
malicious traffic from impacting a cluster. Network policies are only enforced when the cluster’s network
plugin supports them, so EKS users need a compatible plugin (such as the Amazon VPC CNI with network policy
support enabled, or Calico). The following network policy will disable all ingress/egress
network access for pods matching a particular label.
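The following minimal sketch builds such a policy as a Python dict and prints it as JSON for `kubectl apply -f`. It selects pods by a label; the `quarantine: "true"` label and the names used are hypothetical. It lists both `Ingress` and `Egress` under `policyTypes` without any allow rules, which is how a NetworkPolicy denies all traffic to and from the selected pods.

```python
# Sketch: a NetworkPolicy that isolates pods matching a label.
# Policy name, namespace, and label are hypothetical.
import json
from typing import Any, Dict


def deny_all_policy(name: str, namespace: str, labels: Dict[str, str]) -> Dict[str, Any]:
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": dict(labels)},
            # Both policy types with no ingress/egress rules: all traffic to
            # and from the selected pods is denied.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    policy = deny_all_policy("isolate-quarantined", "default", {"quarantine": "true"})
    print(json.dumps(policy, indent=2))
```

Because egress is denied too, the selected pods also lose DNS resolution. Policies meant for running workloads usually add explicit allow rules rather than isolating pods completely.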
Ensuring that pods are configured safely is important to securing an EKS cluster. The level of access
provided to pods should be carefully evaluated and enforced with relevant tools to maintain cluster
security. A combination of tools like network policies and Kyverno security policies enables users to
control a pod’s capabilities and reduce the cluster’s attack surface.
Ensuring that container images are
secured in EKS clusters is an essential aspect of maintaining cluster security. Container images include
application code and related dependencies and are deployed as pods in EKS clusters. Validating these
container images’ security helps mitigate attack vectors related to compromised pods.
A Kubernetes pod can consist of multiple containers. Each container runs from a container
image, which typically consists of multiple layers. Users should select a base image that provides the functionality required
for their applications; additional layers may install extra dependencies. Base images should be
carefully scanned to ensure that no severe vulnerabilities exist. If vulnerabilities are discovered, users should
update their base images or switch to more suitable alternatives.
The following security best practices will help improve container image security.
A developer deploying an application as a container will typically use another container image as a base
layer rather than building an image completely from scratch. Container base images such as Ubuntu or
Nginx allow developers to quickly get their application code running in an environment where many
utilities are already installed.
However, care must be taken to ensure that only trusted base images are implemented. A compromised
container image results in vulnerable software running on the EKS worker nodes, so it can be useful to
validate what container images are selected. Using images from trusted registries and implementing image
scanning are good first steps. Users managing private registries (such as Amazon ECR) should also secure
access to the registry.
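One simple form of that validation is an allowlist of registries. The sketch below extracts the registry host from an image reference and reports images from registries outside the allowlist. It assumes Docker's naming convention: a first path component containing a `.` or `:`, or equal to `localhost`, is a registry host, and anything else defaults to Docker Hub. The ECR account ID and registry names are placeholders.

```python
# Sketch: flag container images that do not come from trusted registries.
# Follows Docker's reference convention for locating the registry host.
# Registry names and account IDs are placeholders.
from typing import Iterable, List, Set


def registry_of(image: str) -> str:
    first, sep, _ = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    # No explicit registry host: the image resolves to Docker Hub.
    return "docker.io"


def untrusted_images(images: Iterable[str], trusted: Set[str]) -> List[str]:
    return [image for image in images if registry_of(image) not in trusted]
```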
Many tools are available to automatically scan the contents of all container images deployed to an EKS
cluster. These tools will inspect the container image packages and cross-reference their versions with
published CVEs to alert users when a vulnerable image is used. This allows users to mitigate
vulnerabilities by upgrading their containers when alerts are raised.
Trivy is an example of an open-source tool providing
image scanning functionality. The screenshot below demonstrates how Trivy can analyze a container image
passed as a command-line argument. The example shows an analysis of the Ubuntu 22.04 image, containing
24 vulnerabilities, with descriptions and severity levels provided from Trivy’s databases of CVEs.
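Trivy can also write its findings as JSON (for example `trivy image --format json --output report.json ubuntu:22.04`), which makes it possible to gate a pipeline on the results. The sketch below assumes a report layout with a top-level `Results` list whose entries carry a `Vulnerabilities` list with a `Severity` field. It counts findings per severity and fails when anything at or above a chosen threshold is present. Trivy's own `--severity` and `--exit-code` flags offer a similar gate without custom code.

```python
# Sketch: summarize a Trivy JSON report and decide whether to fail a build.
# Assumes the layout {"Results": [{"Vulnerabilities": [{"Severity": ...}]}]};
# "Vulnerabilities" may be missing or null when nothing was found.
import json
from collections import Counter
from typing import Any, Dict

_RANK = {name: rank for rank, name in enumerate(["UNKNOWN", "LOW", "MEDIUM", "HIGH", "CRITICAL"])}


def severity_counts(report: Dict[str, Any]) -> Counter:
    counts: Counter = Counter()
    for result in report.get("Results") or []:
        for vulnerability in result.get("Vulnerabilities") or []:
            counts[vulnerability.get("Severity", "UNKNOWN")] += 1
    return counts


def fails_gate(report: Dict[str, Any], threshold: str = "HIGH") -> bool:
    minimum = _RANK[threshold]
    return any(_RANK.get(severity, 0) >= minimum for severity in severity_counts(report))


if __name__ == "__main__":
    with open("report.json") as handle:  # hypothetical report path
        trivy_report = json.load(handle)
    print(dict(severity_counts(trivy_report)), fails_gate(trivy_report))
```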
Frequently upgrading container images is often required in EKS clusters. New container image versions
with new package versions and dependencies are released regularly to offer new features, bug fixes, and,
most importantly, patches for security vulnerabilities. Ensuring that container images are regularly
upgraded is crucial for integrating the latest security patches.
Securing container images in a Kubernetes cluster is a critical aspect of maintaining the overall
security of the application stack. By implementing the best practices mentioned above, you can minimize
the risk of a security breach and ensure that your applications run in a secure and stable environment.
Observability is a
critical aspect of EKS security because it provides users with insight into the behavior of their
clusters and components, allowing them to quickly detect and respond to security incidents.
Detecting potential security breaches can be done with observability tooling that provides information
related to resource changes, spikes in traffic, unexpected log data, unusual network traffic,
unauthorized access requests, and other anomalies in the cluster’s behavior. Tooling can automatically
and immediately notify users of relevant events that require attention, allowing users to respond
quickly to critical incidents like security breaches.
The following CloudWatch logs query will identify any IAM roles/users accessing the “kubernetes-admin”
user. This user has highly privileged, unrestricted access to an EKS cluster, so access should be
carefully monitored.
The query result indicates that an IAM user successfully obtained access to the privileged
“kubernetes-admin” user and provides timestamp data. Such a log entry enables users to audit access to
the cluster and take action if necessary.
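The query itself runs in CloudWatch. As an offline illustration of the same filter, the Python sketch below scans authenticator log lines for granted access mapped to `kubernetes-admin`. The lines are exported from the cluster's `authenticator` log streams, which requires authenticator control plane logging to be enabled. The sketch assumes the lines use a `key=value` format with `msg`, `arn`, and `username` fields. The sample line in the test is made up, and the field names should be checked against real log output.

```python
# Sketch: find authenticator log entries that granted access as kubernetes-admin.
# Assumes key=value (logfmt-style) lines with msg, arn, and username fields;
# verify field names against real logs. Function names are hypothetical.
import re
from typing import Dict, Iterable, List

# key="quoted value" (with backslash escapes) or key=bare_value
_PAIR = re.compile(r'(\w+)=(?:"((?:[^"\\]|\\.)*)"|(\S+))')


def parse_logfmt(line: str) -> Dict[str, str]:
    return {key: quoted or bare for key, quoted, bare in _PAIR.findall(line)}


def admin_access(lines: Iterable[str], username: str = "kubernetes-admin") -> List[Dict[str, str]]:
    hits = []
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("username") == username and fields.get("msg") == "access granted":
            hits.append({"time": fields.get("time", ""), "arn": fields.get("arn", "")})
    return hits
```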
Observability tooling can provide users with extensive data related to an anomalous event, preventing
investigation delays for the user when the incident is time-sensitive. The data provided can include
attributes like a timestamp, source information, and what resources are affected, which may assist with
root cause analysis. Accessing information related to the incident investigation as quickly as possible
is essential for users to respond to the incident with actions such as isolating affected resources,
rolling back problematic changes, and escalating to relevant teams. Additional information related to
auditing log data for incident analysis can be found here.
Regulations and standards such as HIPAA and PCI DSS require organizations to log and retain data related to
security events, which also makes a detailed root cause analysis possible in case of a breach. Complying with
regulatory requirements is easier when observability tooling is in place to provide any data or reports required for review.
Setting up a high-quality observability configuration allows users to identify security incidents better
and accelerate post-incident analysis. A combination of logs, metrics, and traces enables users to
analyze cluster behavior transparently to better manage the security posture.
Observability is critical to securing EKS clusters because it allows operators to detect and respond to
security incidents quickly, gain insights into the cluster’s security posture, and meet compliance
requirements. By implementing observability tools and practices, organizations can improve the security
of their EKS clusters and reduce the risk of successful attacks.
Creating an EKS cluster involves deploying many AWS resources, which need to be secured to protect the
cluster. Users should follow various AWS best practices to maintain the security of their EKS clusters.
IAM roles and policies
should provide only the minimum permissions required for users to perform their tasks. Excessive
permissions become a vulnerability if an IAM role is compromised, because an attacker gains everything the
role is allowed to do. Users should take care to reduce IAM access wherever possible.
Additionally, IAM roles should be used instead of IAM users. Creating an IAM
user involves long-lived static access keys, which can be reused maliciously if leaked. IAM roles provide
additional security by issuing temporary credentials that expire and are rotated automatically.
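As a rough illustration of least-privilege review, the sketch below scans an IAM policy document, using the standard JSON structure with `Effect`, `Action`, `NotAction`, and `Resource`. It flags `Allow` statements whose actions are `*`, service-wide (such as `iam:*`), or expressed with `NotAction`. It deliberately does not flag `Resource: "*"` on its own, since some read-only actions require it. This is a simple heuristic with a hypothetical function name; IAM Access Analyzer's policy validation is a more thorough option.

```python
# Sketch: flag overly broad Allow statements in an IAM policy document.
# Heuristic only; the function name is hypothetical.
from typing import Any, Dict, List, Union


def _as_list(value: Union[str, List[str], None]) -> List[str]:
    # IAM accepts either a single string or a list for Action/Resource.
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def overly_broad_statements(policy: Dict[str, Any]) -> List[int]:
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be an object
        statements = [statements]
    flagged = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        if "NotAction" in statement:
            # Allow + NotAction grants everything except the listed actions.
            flagged.append(index)
            continue
        actions = _as_list(statement.get("Action"))
        if any(action == "*" or action.endswith(":*") for action in actions):
            flagged.append(index)
    return flagged
```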
Security groups
and network ACLs
act as firewalls for resources such as EC2 instances in a VPC: security groups apply to individual network
interfaces, while network ACLs apply to entire subnets. The inbound rules defined in EKS worker
node security groups need to be carefully evaluated to ensure that unnecessary ports aren’t opened and
that access from the public Internet is restricted.
Worker nodes should also be placed in private subnets so that traffic from the Internet cannot route to the
EC2 instances. NAT
gateways can be used to give instances outbound Internet access if necessary.
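The sketch below shows one way to review those inbound rules. It assumes the dict shape returned by boto3's `ec2.describe_security_groups()` (`SecurityGroups`, `IpPermissions`, `IpRanges`, `Ipv6Ranges`). It reports rules open to `0.0.0.0/0` or `::/0`, except single ports the caller explicitly allows. The `allowed_ports` parameter is a hypothetical convenience, and pagination is not handled.

```python
# Sketch: list security group rules that accept traffic from anywhere.
# Assumes the describe_security_groups response shape; pagination is not
# handled. The function name and allowed_ports parameter are hypothetical.
from typing import AbstractSet, Any, Dict, List, Tuple

OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def publicly_open_rules(
    response: Dict[str, Any], allowed_ports: AbstractSet[int] = frozenset()
) -> List[Tuple[str, str]]:
    findings = []
    for group in response.get("SecurityGroups", []):
        for permission in group.get("IpPermissions", []):
            cidrs = {r.get("CidrIp") for r in permission.get("IpRanges", [])}
            cidrs |= {r.get("CidrIpv6") for r in permission.get("Ipv6Ranges", [])}
            if not cidrs & OPEN_CIDRS:
                continue
            if permission.get("IpProtocol") == "-1":
                port_range = "all"  # "-1" means every protocol and port
            else:
                from_port, to_port = permission.get("FromPort"), permission.get("ToPort")
                if from_port == to_port and from_port in allowed_ports:
                    continue
                port_range = str(from_port) if from_port == to_port else f"{from_port}-{to_port}"
            findings.append((group["GroupId"], port_range))
    return findings
```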
EKS provides a feature to control access to the cluster’s API server
endpoint. Users can block access to the cluster from the public Internet to ensure that only
users within the VPC can connect to the API server. This improves security by mitigating unwanted public
access.
Access to the API server endpoint can be configured either at cluster creation or later through a cluster update. The general
recommendation is to select “private” so that the API server is only reachable from within the
VPC. Users who require public access should restrict it to specific CIDR blocks.
This allows trusted IP address ranges (such as an organization’s office network) to connect
while blocking access from all other IPs.
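Below is a minimal sketch that checks these settings. It assumes the `cluster` dict returned by boto3's `eks.describe_cluster(name=...)`, whose `resourcesVpcConfig` contains `endpointPublicAccess`, `endpointPrivateAccess`, and `publicAccessCidrs`. The finding messages and the function name are hypothetical.

```python
# Sketch: review EKS API server endpoint exposure.
# Input is assumed to be describe_cluster(...)["cluster"]; names are hypothetical.
from typing import Any, Dict, List


def endpoint_findings(cluster: Dict[str, Any]) -> List[str]:
    vpc_config = cluster.get("resourcesVpcConfig", {})
    findings = []
    if vpc_config.get("endpointPublicAccess", True):
        # When public access is on and no CIDRs are set, any IP may connect.
        cidrs = vpc_config.get("publicAccessCidrs", ["0.0.0.0/0"])
        if "0.0.0.0/0" in cidrs:
            findings.append("public endpoint is reachable from any IP address")
        if not vpc_config.get("endpointPrivateAccess", False):
            findings.append("private access is disabled, so in-VPC API traffic uses the public endpoint")
    return findings
```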
The screenshot below shows API server connectivity options available to users when creating/updating a
cluster.
If users are required to connect to the cluster from outside of the VPC, the best practice is to create a
VPN connection and route traffic through the VPC to access AWS resources.
CloudTrail logs allow users to audit events occurring
within the entire AWS account. The logs provide insight into changes being applied and resources being
accessed, allowing users to investigate potentially malicious actions. CloudTrail
can also deliver events to CloudWatch Logs, where metric filters, CloudWatch alarms, and SNS can send
notifications for particular events.
For example, users may want to be notified when a restricted S3 bucket has its policy modified.
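The sketch below illustrates the matching logic behind such a notification. It filters CloudTrail records (the `Records` list found in CloudTrail log files) for `PutBucketPolicy` and `DeleteBucketPolicy` events on a set of watched buckets. In practice this matching would typically live in a CloudWatch Logs metric filter or an EventBridge rule. The bucket names and function name here are hypothetical.

```python
# Sketch: detect bucket policy changes on watched S3 buckets in CloudTrail records.
# Bucket names and the function name are hypothetical.
from typing import Any, Dict, List, Set

POLICY_EVENTS = {"PutBucketPolicy", "DeleteBucketPolicy"}


def bucket_policy_changes(trail_file: Dict[str, Any], watched_buckets: Set[str]) -> List[Dict[str, Any]]:
    alerts = []
    for record in trail_file.get("Records", []):
        if record.get("eventSource") != "s3.amazonaws.com" or record.get("eventName") not in POLICY_EVENTS:
            continue
        bucket = (record.get("requestParameters") or {}).get("bucketName")
        if bucket in watched_buckets:
            alerts.append({
                "time": record.get("eventTime"),
                "event": record["eventName"],
                "bucket": bucket,
                # Who made the change, if the identity is present.
                "actor": (record.get("userIdentity") or {}).get("arn"),
            })
    return alerts
```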
Securing AWS resources with best practices is essential to ensuring that EKS clusters are safe for use in
production environments. Native Kubernetes security features are helpful but do not protect the
underlying AWS infrastructure from compromise. Implementing AWS services and security features is
required to protect EKS clusters.
There are many best practices available for improving the security posture of EKS clusters. Users can
maintain a strong security baseline by implementing practices related to securing AWS infrastructure,
protecting the EKS control plane endpoint, maintaining host security for their worker nodes, and
ensuring that pods are configured correctly. A combination of these practices helps users protect their
applications, infrastructure, and data from compromise.