Amazon Elastic Kubernetes Service (EKS) security matters because it protects the data, applications, and infrastructure that users run on AWS. Kubernetes is a widely used tool for managing containerized applications, but it introduces additional security challenges that must be addressed. Securing EKS clusters involves applying best practices across the surrounding AWS services, the worker nodes, the pods, and the EKS control plane. This article discusses best practices for maintaining a strong EKS cluster security posture.
| Question | Answer |
|---|---|
| How can EKS worker nodes be secured? | Securing worker nodes involves implementing operating system security best practices such as removing unused packages, restricting network access, implementing tools like SELinux, and encrypting attached volumes. |
| How can pods be secured in an EKS cluster? | Pods can be secured with best practices such as minimizing access to service accounts, limiting access to the host with Kyverno policies, and controlling traffic with network policies. |
| How can container images be secured? | Container images can be secured by implementing image scanning with tools like Trivy, keeping images up to date, making sure that base images come from trusted registries, and ensuring that registries are secured. |
| Why is observability important for securing EKS clusters? | Observability tools enable users to transparently analyze cluster behavior, providing insight into security breaches and improving incident response times. |
| How can AWS infrastructure be secured for EKS? | EKS clusters rely on the security posture of the surrounding AWS resources. Best practices for securing these resources include restricting IAM access, limiting network connectivity, and enabling audit logging with CloudTrail. |
Following best practices related to worker node security is an essential aspect of protecting EKS clusters. Worker nodes are authorized to access many components in a Kubernetes cluster as part of ordinary operations (such as running pods), so a compromised node will have serious security implications for the whole cluster.
Worker nodes also have full access to every pod running on them, which includes each pod's application code, log data, Kubernetes Secrets, and mounted volumes. All of this sensitive information is at risk when a worker node is compromised.
There are many best practices that users follow to secure worker nodes in EKS clusters, and each is important for maintaining a strong security posture.
Configuring Linux Security Modules such as SELinux and AppArmor enables users to set up fine-grained controls over what pods are permitted to do. These modules can restrict access to kernel capabilities, host network configuration, the filesystem, and host devices. Enabling these restrictions adds a layer of cluster security by reducing the attack surface available to a compromised pod. Kubernetes supports both SELinux and AppArmor natively; whether each one is available on EKS depends on the operating system running on the worker nodes.
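As a minimal sketch of how these controls are requested, the Python snippet below builds a Pod manifest (as JSON, which kubectl accepts) that sets an SELinux level, selects the container runtime's default AppArmor profile, and drops all Linux capabilities. It assumes Kubernetes 1.30 or later (where appArmorProfile is a securityContext field) and a node OS that actually enforces the relevant module; the pod name, image, and SELinux level are hypothetical.

```python
import json


def hardened_pod(name: str, image: str, selinux_level: str = "s0:c123,c456") -> dict:
    """Return a Pod manifest that opts into SELinux and AppArmor confinement.

    Assumes Kubernetes 1.30+ (securityContext.appArmorProfile) and worker
    nodes whose OS enforces SELinux and/or AppArmor.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "securityContext": {
                # Hypothetical SELinux MCS level applied to every container in the pod.
                "seLinuxOptions": {"level": selinux_level},
                # Use the container runtime's default AppArmor profile.
                "appArmorProfile": {"type": "RuntimeDefault"},
            },
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "securityContext": {
                        # Remove all kernel capabilities and block privilege escalation.
                        "capabilities": {"drop": ["ALL"]},
                        "allowPrivilegeEscalation": False,
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Example: python pod.py | kubectl apply -f -
    print(json.dumps(hardened_pod("web", "nginx:1.25"), indent=2))
```

These fields only request confinement; it is worth confirming on a running workload (for example, with ps -eZ on an SELinux host) that the expected labels and profiles are applied.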
Hosts are often compromised by unexpected programs that may have gone unnoticed by system administrators. Each additional package installed on a host provides a potential attack vector, and administrators must be careful when evaluating which packages are required for their EKS cluster nodes. Removing unnecessary programs will reduce the administrative overhead of patching and upgrades while also improving the cluster’s security posture.
The EKS Optimized AMI provided by AWS for EKS worker nodes already implements a minimal operating system with the bare minimum packages required for nodes to operate. Users deploying hosts with other AMIs should investigate their installed packages and running processes to verify that only the essential operating system components are deployed.
The table below shows how the number of installed packages differs across AMIs. Selecting AMIs with minimal package counts is recommended to reduce the attack surface for worker nodes.
| AMI name | Ubuntu 22.04 | Amazon Linux 2 | EKS Optimized 1.25 |
|---|---|---|---|
Each open port on a worker node host is a potential attack vector. Hosts exposed to the Internet are continuously probed by automated scanners searching for insecure hosts to compromise. This attack vector can be mitigated by removing unnecessary packages that open additional ports and by verifying that only the necessary ports are open.
Evaluating the worker node host's network exposure is necessary to ensure that hosts aren't reachable by malicious actors.
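One way to start this evaluation is to list the sockets a node is listening on (ss -tln shows the same information interactively). The sketch below parses /proc/net/tcp on a Linux worker node, IPv4 only. It reports what is listening on the host, not what the security groups actually let through, so it is one input to the review rather than a complete exposure check.

```python
import os
import socket
import struct

TCP_LISTEN = "0A"  # socket state code for LISTEN in /proc/net/tcp


def listening_ports(proc_net_tcp: str) -> list[tuple[str, int]]:
    """Return sorted (address, port) pairs in LISTEN state from /proc/net/tcp text."""
    results = set()
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != TCP_LISTEN:
            continue
        addr_hex, port_hex = fields[1].split(":")
        # The kernel prints the raw IPv4 address as a host-byte-order integer,
        # so packing it back in native order restores the original bytes.
        addr = socket.inet_ntoa(struct.pack("=I", int(addr_hex, 16)))
        results.add((addr, int(port_hex, 16)))
    return sorted(results)


if __name__ == "__main__":
    path = "/proc/net/tcp"
    if os.path.exists(path):
        with open(path) as f:
            for addr, port in listening_ports(f.read()):
                scope = "all interfaces" if addr == "0.0.0.0" else addr
                print(f"port {port} listening on {scope}")
```

Anything bound to 0.0.0.0 that the node does not need is a candidate for removal or for a tighter bind address.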
AWS supports EBS encryption for the worker node's root volume and for any additional attached volumes. Encryption provides an extra layer of security and is often required for meeting compliance requirements. For volumes provisioned through Kubernetes, EKS supports encryption via the Amazon EBS CSI driver, which allows users to manage EBS volumes with native Kubernetes resources (persistent volumes and storage classes).
The example below displays an EBS volume with encryption enabled. Users can verify the encryption status of their volumes via the AWS web console or with the AWS CLI.
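A minimal sketch, assuming the Amazon EBS CSI driver is installed in the cluster, of a StorageClass that asks the driver to create encrypted gp3 volumes. The class name and KMS key ARN are hypothetical; when kmsKeyId is omitted, EBS uses the account's default key for EBS encryption.

```python
import json
from typing import Optional


def encrypted_storage_class(name: str = "ebs-encrypted",
                            kms_key_id: Optional[str] = None) -> dict:
    """Return a StorageClass for the EBS CSI driver that requests encrypted volumes."""
    # StorageClass parameters are strings, so "true" is quoted.
    parameters = {"type": "gp3", "encrypted": "true"}
    if kms_key_id:
        # Hypothetical customer managed key; omit to use the default EBS key.
        parameters["kmsKeyId"] = kms_key_id
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "ebs.csi.aws.com",
        "parameters": parameters,
        # Create the volume in the availability zone where the pod is scheduled.
        "volumeBindingMode": "WaitForFirstConsumer",
    }


if __name__ == "__main__":
    print(json.dumps(encrypted_storage_class(), indent=2))
```

Persistent volume claims that reference this class then receive encrypted volumes, and an existing volume's status can be checked with `aws ec2 describe-volumes --volume-ids <volume-id> --query "Volumes[].Encrypted"`.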
In addition to blocking network access as described above, disabling SSH is another way to improve worker node network security. SSH is a common first point of attack when hosts are exposed to the Internet, and hosts can be compromised when users rely on weak passwords instead of key-based authentication or run outdated SSH packages with known vulnerabilities.
If host access is necessary, users on AWS can use AWS Systems Manager (SSM) Session Manager, which authenticates users via IAM to connect to EC2 instances (worker nodes) more securely than SSH. Session Manager doesn't require opening inbound ports on the host, and every session is logged via CloudTrail. Connecting to hosts via SSM instead of SSH is the recommended approach for improving the security of EKS worker nodes.
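Where SSH cannot be disabled entirely, its configuration can at least be checked for password and root logins. The sketch below is a simplified reader of sshd_config text: it honors sshd's rule that the first value read for a keyword wins and falls back to upstream OpenSSH defaults, but it stops at the first Match block and ignores Include directives, so it is a quick check rather than a full parser.

```python
import os

# Upstream OpenSSH defaults, used when a directive is absent.
DEFAULTS = {"passwordauthentication": "yes", "permitrootlogin": "prohibit-password"}


def audit_sshd_config(text: str) -> list[str]:
    """Flag sshd_config settings that allow password or root logins."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        if keyword == "match":
            break  # conditional blocks are out of scope for this simple check
        if len(parts) == 2:
            # sshd uses the first value it reads for each keyword.
            settings.setdefault(keyword, parts[1].strip().lower())
    effective = {**DEFAULTS, **settings}
    findings = []
    if effective["passwordauthentication"] != "no":
        findings.append("PasswordAuthentication is not 'no'; use key-based authentication")
    if effective["permitrootlogin"] != "no":
        findings.append(f"PermitRootLogin is '{effective['permitrootlogin']}'; prefer 'no'")
    return findings


if __name__ == "__main__":
    path = "/etc/ssh/sshd_config"
    if os.access(path, os.R_OK):
        with open(path) as f:
            for finding in audit_sshd_config(f.read()):
                print(finding)
```

With Session Manager in place instead, a shell on a node is opened with `aws ssm start-session --target <instance-id>`, and no inbound SSH port is needed.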
Regardless of how carefully a user configures the operating system, installed software always requires updates to patch security vulnerabilities. The speed at which users can evaluate CVEs and deploy patches to their worker nodes will significantly impact their clusters’ security posture. Users should test how much time is required for their pipelines to roll out new worker node upgrades, as this will give insight into how quickly the nodes can be patched in response to a new CVE and whether this response time meets user expectations.
Users with complicated deployment processes (e.g., with manual testing and human approvals) may encounter significant delays when attempting to quickly patch new vulnerabilities. A highly automated pipeline will be able to roll out patches more quickly in response to new CVEs, thus improving the security posture of the environment.
Worker nodes produce many log files by default that are relevant to security. The operating system generates logs related to system calls, kernel events, SSH connection activity, changes to installed packages, and system startup. The kubelet agent installed on every EKS worker node also produces log output.
The screenshot below is a snippet from a worker node’s audit log. It provides insight into incoming SSH sessions, showing what time the session started and what operating system user was selected. This data is useful for identifying unwanted SSH access to EKS worker nodes.
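The Linux audit daemon records SSH logins as USER_LOGIN events. The sketch below extracts the time, account, source address, and result from those records in audit.log text. Field names vary between distributions (some records carry a numeric id, others an acct name), so the parser reads key=value pairs generically; treat it as an illustration under that assumption rather than a complete audit log parser.

```python
import re
from datetime import datetime, timezone

# Header of an audit record, e.g. "type=USER_LOGIN msg=audit(1680000000.123:456):"
RECORD_HEADER = re.compile(r"type=(?P<type>\S+) msg=audit\((?P<ts>\d+\.\d+):\d+\):")
# key=value pairs, where values may be double-quoted.
KEY_VALUE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def ssh_logins(audit_log: str) -> list[dict]:
    """Return SSH login events (USER_LOGIN records emitted by sshd)."""
    events = []
    for line in audit_log.splitlines():
        header = RECORD_HEADER.match(line)
        if not header or header.group("type") != "USER_LOGIN":
            continue
        # Strip surrounding quotes, including the single quotes around msg='...'.
        fields = {k: v.strip("'\"") for k, v in KEY_VALUE.findall(line[header.end():])}
        if not fields.get("exe", "").endswith("sshd"):
            continue
        events.append({
            "time": datetime.fromtimestamp(float(header.group("ts")), tz=timezone.utc),
            "account": fields.get("acct") or fields.get("id"),
            "address": fields.get("addr"),
            "result": fields.get("res"),
        })
    return events


if __name__ == "__main__":
    import os
    path = "/var/log/audit/audit.log"  # default auditd location; usually requires root
    if os.access(path, os.R_OK):
        with open(path, errors="replace") as f:
            for event in ssh_logins(f.read()):
                print(event["time"].isoformat(), event["account"], event["address"], event["result"])
```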
Exporting and storing this log data is helpful for auditing node behavior and performing security-related investigations. Ensuring that log data is available following an incident is crucial for identifying how a breach has occurred and how it can be prevented in the future.
Since worker nodes have access to many components of an EKS cluster and contain sensitive information, it is essential to follow node security best practices to maintain a strong security posture. Implementing these practices can reduce the attack surface of an EKS cluster and mitigate security-related incidents.
Pod security is vital for protecting applications running in the EKS cluster as well as the infrastructure. A compromised pod will result in security breaches related to the application code base, logs, and downstream resources such as databases. Such a pod can also access other cluster resources like Secrets and ConfigMaps, increasing the impact of a security breach.
Users should follow security best practices like the ones described below to protect their EKS clusters from compromised pods.
Service accounts are a Kubernetes feature for granting pods permissions to access the Kubernetes API Server. The feature is useful for extending the cluster’s functionality but also comes with security risks.
Giving pods access to service accounts with excessive permissions increases the blast radius of a compromised pod. The best practice is to mount service account tokens only into pods that strictly require access to the API server; pods that don't can set automountServiceAccountToken to false. Service accounts should also hold only the minimum permissions the pod requires. The permissions granted to a service account are controlled via Kubernetes RBAC resources such as Roles and RoleBindings.
The example below shows how a service account is created and mounted into a pod. It’s important to carefully evaluate a pod’s properties to ensure that only appropriate service accounts are mounted (and only when required). Mounting service accounts with excessive permissions is a security risk.
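A minimal sketch of that pattern with hypothetical names (namespace demo, service account config-reader, image example.com/config-reader:1.0): the service account disables token automounting by default, a Role grants read-only access to ConfigMaps in a single namespace, and only the pod that needs API access opts back in. The objects are emitted as a JSON v1 List, which kubectl apply accepts.

```python
import json


def least_privilege_bundle(namespace: str, app: str, image: str) -> dict:
    """Return a ServiceAccount, Role, RoleBinding, and Pod scoped to one namespace."""
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": app, "namespace": namespace},
        # Pods using this account get no API token unless they opt in.
        "automountServiceAccountToken": False,
    }
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": f"{app}-read-configmaps", "namespace": namespace},
        # Read-only access to ConfigMaps in this namespace, nothing else.
        "rules": [{"apiGroups": [""], "resources": ["configmaps"],
                   "verbs": ["get", "list", "watch"]}],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{app}-read-configmaps", "namespace": namespace},
        "subjects": [{"kind": "ServiceAccount", "name": app, "namespace": namespace}],
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "Role",
                    "name": role["metadata"]["name"]},
    }
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": app, "namespace": namespace},
        "spec": {
            "serviceAccountName": app,
            # This pod calls the API server, so it explicitly mounts the token.
            "automountServiceAccountToken": True,
            "containers": [{"name": app, "image": image}],
        },
    }
    return {"apiVersion": "v1", "kind": "List",
            "items": [service_account, role, binding, pod]}


if __name__ == "__main__":
    # Hypothetical names; pipe the output to `kubectl apply -f -`.
    print(json.dumps(least_privilege_bundle("demo", "config-reader",
                                            "example.com/config-reader:1.0"), indent=2))
```

Once applied, `kubectl auth can-i list secrets --as=system:serviceaccount:demo:config-reader -n demo` should answer no, assuming no other bindings grant that access.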
Kyverno can enforce security policies and best practices to secure EKS clusters. Kyverno is a Kubernetes-native policy engine whose policies are written as YAML resources; it allows users to define rules such as requiring specific container images or trusted registries, setting resource limits, denying privileged host access, and restricting service account access. By using Kyverno to enforce security policies, you can reduce the risk of security breaches in your EKS cluster and ensure that your applications run with secure configurations.
Any aspect of a Kubernetes object’s schema can be validated and blocked by Kyverno, allowing users to have automated guardrails protecting their clusters.
The following Kyverno policy will check for the use of privileged mode in all pods and warn the user when this setting is in use. Privileged mode for pods allows full access to the underlying worker node, so restricting its use is recommended for a stronger security posture.
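A sketch of such a policy, modeled on the pattern used by Kyverno's published "disallow privileged containers" sample and written as a Python dict that serializes to JSON (kubectl applies JSON as well as YAML). With validationFailureAction set to Audit, violations are recorded in policy reports without blocking the pod; Enforce rejects it. The location of this setting has changed in recent Kyverno releases, so it is worth checking against the Kyverno version in use.

```python
import json


def disallow_privileged_policy(action: str = "Audit") -> dict:
    """Kyverno ClusterPolicy flagging privileged containers ("Audit" reports, "Enforce" blocks)."""
    if action not in ("Audit", "Enforce"):
        raise ValueError("action must be 'Audit' or 'Enforce'")
    # "=(field)" is Kyverno's equality anchor: if the field is present, it must match.
    not_privileged = [{"=(securityContext)": {"=(privileged)": "false"}}]
    return {
        "apiVersion": "kyverno.io/v1",
        "kind": "ClusterPolicy",
        "metadata": {"name": "disallow-privileged-containers"},
        "spec": {
            "validationFailureAction": action,
            "background": True,  # also evaluate pods that already exist
            "rules": [{
                "name": "privileged-containers",
                "match": {"any": [{"resources": {"kinds": ["Pod"]}}]},
                "validate": {
                    "message": "Privileged mode is not allowed; set "
                               "securityContext.privileged to false or leave it unset.",
                    "pattern": {"spec": {
                        "=(ephemeralContainers)": not_privileged,
                        "=(initContainers)": not_privileged,
                        "containers": not_privileged,
                    }},
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(disallow_privileged_policy(), indent=2))
```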
A key element of improving cluster security is ensuring that pods cannot compromise their underlying worker nodes. A breached worker node will have significant access to other pods and cluster resources, and pods are a common entry point.
Protecting worker nodes from compromised pods involves ensuring that pods aren't granted any unnecessary access to the host. For example, pod specifications can request access to the host kernel, attached volumes, network interfaces, running processes, and the root filesystem. Granting pods this kind of broad access leaves the cluster open to significant risks. Pods should be granted only the minimum access to the host required to perform their functions, and tools like Kyverno can enforce rules such as blocking privileged access for pods.
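The same fields can also be checked before a manifest ever reaches the cluster, for example in a CI step. The sketch below inspects a Pod manifest (already parsed into a dict) for common forms of host access: host namespaces, hostPath volumes, privileged containers, added capabilities, and hostPort bindings. It is a simplified local check, not a replacement for admission-time enforcement such as Kyverno.

```python
HOST_NAMESPACE_FIELDS = ("hostNetwork", "hostPID", "hostIPC")
CONTAINER_LISTS = ("initContainers", "containers", "ephemeralContainers")


def host_access_findings(pod: dict) -> list[str]:
    """Return human-readable findings for Pod spec fields that grant host access."""
    spec = pod.get("spec", {})
    # Sharing the host's network, process, or IPC namespace.
    findings = [f"spec.{field} is true" for field in HOST_NAMESPACE_FIELDS if spec.get(field)]
    # Mounting directories from the node's filesystem.
    for volume in spec.get("volumes", []):
        if "hostPath" in volume:
            findings.append(
                f"volume {volume.get('name')} mounts hostPath {volume['hostPath'].get('path')}")
    for kind in CONTAINER_LISTS:
        for container in spec.get(kind, []):
            name = container.get("name")
            ctx = container.get("securityContext") or {}
            if ctx.get("privileged"):
                findings.append(f"{kind}/{name} is privileged")
            added = (ctx.get("capabilities") or {}).get("add") or []
            if added:
                findings.append(f"{kind}/{name} adds capabilities {added}")
            for port in container.get("ports", []):
                if port.get("hostPort"):
                    findings.append(f"{kind}/{name} binds hostPort {port['hostPort']}")
    return findings
```

A non-empty result could fail the pipeline step, leaving Kyverno as the enforcement point inside the cluster.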
Network policies are a Kubernetes feature that allows users to control traffic flow within a cluster. By default, all pods can communicate with other pods as well as with services, nodes, and the Internet. Implementing network policies gives users fine-grained control over which communication is allowed within a cluster and can prevent malicious traffic from reaching workloads. Network policies are only enforced when the cluster's network plugin supports them, such as the Amazon VPC CNI with network policy support enabled, or Calico. The following network policy will disable all ingress/egress network access for pods matching a particular label.
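A minimal sketch of such a policy, generated as JSON from Python; the policy name, namespace, and app=quarantined label are hypothetical. Because both Ingress and Egress are listed in policyTypes with no allow rules, all traffic to and from the selected pods is denied, including DNS lookups.

```python
import json


def deny_all_network_policy(name: str, namespace: str, labels: dict) -> dict:
    """NetworkPolicy that blocks all ingress and egress for pods matching `labels`."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": dict(labels)},
            # Declaring both policy types without any "ingress"/"egress" rules
            # means no traffic is allowed in either direction.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    policy = deny_all_network_policy("deny-all-quarantined", "demo", {"app": "quarantined"})
    print(json.dumps(policy, indent=2))
```

Network policies that select the same pods are additive, so narrowly scoped allow rules can be layered on top of this baseline later.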
Ensuring that pods are configured safely is important to securing an EKS cluster. The level of access provided to pods should be carefully evaluated and enforced with relevant tools to maintain cluster security. A combination of tools like network policies and Kyverno security policies enables users to control a pod’s capabilities and reduce the cluster’s attack surface.
Ensuring that container images are secured in EKS clusters is an essential aspect of maintaining cluster security. Container images include application code and related dependencies and are deployed as pods in EKS clusters. Validating these container images’ security helps mitigate attack vectors related to compromised pods.
A Kubernetes pod can consist of multiple containers, and each container runs from an image that is typically composed of multiple layers. Users should select a base image that provides the functionality their applications require; additional layers may install extra dependencies. Base images should be scanned carefully to ensure that no severe vulnerabilities exist, and if any are discovered, users should update their base images or switch to more suitable alternatives.
The following security best practices will help improve container image security.
A developer deploying an application as a container will typically use another container image as a base layer rather than building an image completely from scratch. Container base images such as Ubuntu or Nginx allow developers to quickly get their application code running in an environment where many utilities are already installed.
However, care must be taken to ensure that only trusted base images are used. A compromised container image results in vulnerable software running on the EKS worker nodes, so it is worth validating which container images are deployed. Using images from trusted registries and implementing image scanning are good first steps. Users managing private registries (such as Amazon ECR) should also restrict access to the registry.
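As a simplified illustration of registry validation, the sketch below extracts the registry host from each image reference in a Pod spec and reports images whose registry is not on an allowlist. It follows the common convention that an unqualified reference such as nginx:1.25 resolves to Docker Hub; the ECR registry in the allowlist is hypothetical. In a cluster, a Kyverno policy would enforce the same rule at admission time.

```python
CONTAINER_LISTS = ("initContainers", "containers", "ephemeralContainers")


def image_registry(image: str) -> str:
    """Return the registry host of an image reference."""
    first, sep, _ = image.partition("/")
    # The first path component is a registry host only if it looks like one.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"  # unqualified names default to Docker Hub


def untrusted_images(pod: dict, allowed_registries: set[str]) -> list[str]:
    """List images in a Pod spec whose registry is not in `allowed_registries`."""
    spec = pod.get("spec", {})
    images = [c["image"] for kind in CONTAINER_LISTS for c in spec.get(kind, [])]
    return [image for image in images if image_registry(image) not in allowed_registries]


if __name__ == "__main__":
    allowed = {"111122223333.dkr.ecr.us-east-1.amazonaws.com"}  # hypothetical private ECR registry
    pod = {"spec": {"containers": [{"name": "web", "image": "nginx:1.25"}]}}
    print(untrusted_images(pod, allowed))  # ['nginx:1.25']
```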
Many tools are available to automatically scan the contents of all container images deployed to an EKS cluster. These tools will inspect the container image packages and cross-reference their versions with published CVEs to alert users when a vulnerable image is used. This allows users to mitigate vulnerabilities by upgrading their containers when alerts are raised.
Trivy is an example of an open-source tool providing image scanning functionality. The screenshot below demonstrates how Trivy can analyze a container image passed as a command-line argument. The example shows an analysis of the Ubuntu 22.04 image, containing 24 vulnerabilities, with descriptions and severity levels provided from Trivy’s databases of CVEs.
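Scans like this can be automated. The sketch below assumes the Trivy CLI is installed and on PATH: it runs trivy image with JSON output and summarizes the Results[].Vulnerabilities[] entries of the report, listing findings at or above a chosen severity. The HIGH threshold is an arbitrary example.

```python
import json
import shutil
import subprocess
import sys
from collections import Counter

SEVERITIES = ["UNKNOWN", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def trivy_report(image: str) -> dict:
    """Run Trivy against an image and return its JSON report."""
    completed = subprocess.run(
        ["trivy", "image", "--quiet", "--format", "json", image],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)


def summarize(report: dict, min_severity: str = "HIGH"):
    """Count vulnerabilities per severity and list those at or above `min_severity`."""
    threshold = SEVERITIES.index(min_severity)
    counts = Counter()
    findings = []
    for result in report.get("Results", []):
        # "Vulnerabilities" is absent or null for targets with no findings.
        for vuln in result.get("Vulnerabilities") or []:
            severity = vuln.get("Severity", "UNKNOWN")
            counts[severity] += 1
            if severity in SEVERITIES and SEVERITIES.index(severity) >= threshold:
                findings.append((vuln["VulnerabilityID"], vuln["PkgName"],
                                 vuln.get("FixedVersion") or "no fix available"))
    return counts, findings


if __name__ == "__main__" and shutil.which("trivy"):
    image = sys.argv[1] if len(sys.argv) > 1 else "ubuntu:22.04"
    counts, findings = summarize(trivy_report(image))
    print(dict(counts))
    for vuln_id, package, fixed in findings:
        print(f"{vuln_id}\t{package}\t{fixed}")
```

For a simple pipeline gate, Trivy's own --severity and --exit-code flags can fail the job directly; a wrapper like this is mainly useful for custom reporting.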
Container images in EKS clusters need to be upgraded frequently. New image versions are released regularly with updated packages and dependencies that offer new features, bug fixes, and, most importantly, patches for security vulnerabilities. Regularly upgrading container images is crucial for picking up the latest security patches.
Securing container images in a Kubernetes cluster is a critical aspect of maintaining the overall security of the application stack. By implementing the best practices mentioned above, you can minimize the risk of a security breach and ensure that your applications run in a secure and stable environment.
Observability is a critical aspect of EKS security because it provides users with insight into the behavior of their clusters and components, allowing them to quickly detect and respond to security incidents.
Detecting potential security breaches can be done with observability tooling that provides information related to resource changes, spikes in traffic, unexpected log data, unusual network traffic, unauthorized access requests, and other anomalies in the cluster’s behavior. Tooling can automatically and immediately notify users of relevant events that require attention, allowing users to respond quickly to critical incidents like security breaches.
The following CloudWatch logs query will identify any IAM roles/users accessing the “kubernetes-admin” user. This user has highly privileged, unrestricted access to an EKS cluster, so access should be carefully monitored.
The query result indicates that an IAM user successfully obtained access to the privileged “kubernetes-admin” user and provides timestamp data. Such a log entry enables users to audit access to the cluster and take action if necessary.
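A sketch of the same check, assuming the cluster's authenticator control plane logs are enabled and delivered to the /aws/eks/<cluster-name>/cluster log group. ADMIN_QUERY is a CloudWatch Logs Insights query that filters authenticator log streams for the kubernetes-admin username, and the parser handles exported authenticator lines, assuming they use key=value fields such as msg="access granted", arn, and username; the exact layout may differ between authenticator versions.

```python
import re

# CloudWatch Logs Insights query over the cluster's control plane log group.
ADMIN_QUERY = """fields @timestamp, @message
| filter @logStream like /authenticator/
| filter @message like "username=kubernetes-admin"
| sort @timestamp desc
| limit 100"""

# key=value pairs; values may be double-quoted and contain escaped quotes.
LOGFMT_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line: str) -> dict:
    """Split a key=value log line into a dict, removing surrounding quotes from values."""
    return {key: value[1:-1] if value.startswith('"') else value
            for key, value in LOGFMT_PAIR.findall(line)}


def admin_access(lines, admin_user: str = "kubernetes-admin") -> list:
    """Return (time, IAM ARN) pairs for successful mappings to `admin_user`."""
    hits = []
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("username") == admin_user and fields.get("msg") == "access granted":
            hits.append((fields.get("time"), fields.get("arn")))
    return hits
```

The query can be run from the CloudWatch console or with `aws logs start-query --log-group-name /aws/eks/<cluster-name>/cluster --start-time <epoch-seconds> --end-time <epoch-seconds> --query-string "..."`.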
Observability tooling can provide users with extensive data about an anomalous event, reducing investigation delays when an incident is time-sensitive. The data provided can include attributes such as a timestamp, source information, and the affected resources, which may assist with root cause analysis. Accessing this information as quickly as possible is essential for responding to an incident with actions such as isolating affected resources, rolling back problematic changes, and escalating to relevant teams.
Regulations such as HIPAA and PCI DSS require users to log data related to security events in order to provide a detailed root cause analysis in case of a breach. Complying with regulatory requirements is easier when observability tooling is in place to provide any data or reports required for review.
Setting up a high-quality observability configuration allows users to identify security incidents better and accelerate post-incident analysis. A combination of logs, metrics, and traces enables users to analyze cluster behavior transparently to better manage the security posture.
Observability is critical to securing EKS clusters because it allows operators to detect and respond to security incidents quickly, gain insights into the cluster’s security posture, and meet compliance requirements. By implementing observability tools and practices, organizations can improve the security of their EKS clusters and reduce the risk of successful attacks.
Creating an EKS cluster involves deploying many AWS resources, which need to be secured to protect the cluster. Users should follow various AWS best practices to maintain the security of their EKS clusters.
IAM roles and policies should grant only the minimum permissions required for users to perform their tasks. Excessive permissions become a vulnerability if an IAM identity is compromised and represent a severe attack vector for compromised AWS accounts, so users should reduce IAM access wherever possible. Additionally, IAM roles should be preferred over IAM users. IAM users authenticate with long-lived access keys, which can be reused maliciously if leaked, whereas IAM roles issue temporary credentials through AWS STS that expire automatically.
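To make least privilege concrete, the sketch below builds an identity policy that allows only eks:DescribeCluster (the call that aws eks update-kubeconfig makes) on a single cluster, plus a coarse check that flags Allow statements with wildcard actions or resources. The account ID, region, and cluster name are placeholders. The check ignores NotAction, conditions, and partial wildcards such as eks:Describe*, so it is a rough screen rather than full policy analysis, and this IAM permission grants no Kubernetes RBAC rights by itself.

```python
import json


def describe_cluster_policy(account_id: str, region: str, cluster: str) -> dict:
    """IAM policy allowing only eks:DescribeCluster on a single cluster."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DescribeSingleCluster",
            "Effect": "Allow",
            "Action": ["eks:DescribeCluster"],
            "Resource": f"arn:aws:eks:{region}:{account_id}:cluster/{cluster}",
        }],
    }


def _as_list(value) -> list:
    """IAM allows a single string or a list for Action and Resource."""
    return [value] if isinstance(value, str) else list(value or [])


def wildcard_findings(policy: dict) -> list[str]:
    """Flag Allow statements that use wildcard actions or resources."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given without a list
        statements = [statements]
    findings = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        if any(a == "*" or a.endswith(":*") for a in _as_list(statement.get("Action"))):
            findings.append(f"statement {index}: wildcard action")
        if "*" in _as_list(statement.get("Resource")):
            findings.append(f"statement {index}: wildcard resource")
    return findings


if __name__ == "__main__":
    print(json.dumps(describe_cluster_policy("111122223333", "us-east-1", "demo"), indent=2))
```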
Security groups act as virtual firewalls for resources such as EC2 instances in a VPC, and network ACLs filter traffic at the subnet level. The inbound rules defined in EKS worker node security groups need to be carefully evaluated to ensure that unnecessary ports aren't opened and that access from the public Internet is restricted.
Worker nodes should also be placed in private subnets so that inbound Internet traffic cannot be routed to the EC2 instances. NAT gateways can be used to allow these instances to initiate outbound connections to the Internet if necessary.
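One way to review those inbound rules is to export them with aws ec2 describe-security-groups and scan the JSON for rules open to 0.0.0.0/0 or ::/0. The sketch below does that for the IpPermissions structure the CLI returns. It only looks at CIDR ranges and ignores rules that reference other security groups or prefix lists.

```python
import json
import sys

WORLD = {"0.0.0.0/0", "::/0"}


def open_to_internet(described: dict) -> list[str]:
    """Report inbound rules that allow traffic from any IPv4 or IPv6 address."""
    findings = []
    for group in described.get("SecurityGroups", []):
        for rule in group.get("IpPermissions", []):
            sources = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
            sources += [r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])]
            if not WORLD.intersection(sources):
                continue
            if rule.get("IpProtocol") == "-1":  # "-1" means all protocols and ports
                ports = "all traffic"
            else:
                ports = f"{rule.get('IpProtocol')} {rule.get('FromPort')}-{rule.get('ToPort')}"
            findings.append(f"{group['GroupId']}: {ports} open to the Internet")
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: aws ec2 describe-security-groups --group-ids <sg-id> > sgs.json
    #        python check_sgs.py sgs.json
    with open(sys.argv[1]) as f:
        for finding in open_to_internet(json.load(f)):
            print(finding)
```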
EKS provides a feature to control access to the cluster’s API server endpoint. Users can block access to the cluster from the public Internet to ensure that only users within the VPC can connect to the API server. This improves security by mitigating unwanted public access.
Access to the API server endpoint can be configured during cluster creation or updated afterward. The general recommendation is to select “private” so that the API server is only reachable from within the VPC. Users who require public access should restrict it to specific CIDR blocks, which allowlists known IP address ranges (such as an organization's office network) while blocking access from all other IPs.
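As a sketch of how these settings map to the AWS CLI, the helper below assembles an aws eks update-cluster-config call that keeps the private endpoint enabled and either disables public access or restricts it to validated CIDR blocks. It passes --resources-vpc-config as JSON, which the AWS CLI accepts for structure parameters. The cluster name and the documentation-range CIDR 203.0.113.0/24 are placeholders, and the example only prints the command so it can be reviewed before being run.

```python
import ipaddress
import json
import shlex


def endpoint_access_args(cluster: str, public_cidrs=()) -> list[str]:
    """Build `aws eks update-cluster-config` arguments for API endpoint access.

    With no CIDRs, the public endpoint is disabled (private only); otherwise
    public access is limited to the given ranges.
    """
    # strict=True rejects malformed ranges and ranges with host bits set.
    cidrs = [str(ipaddress.ip_network(c, strict=True)) for c in public_cidrs]
    if "0.0.0.0/0" in cidrs:
        raise ValueError("0.0.0.0/0 would expose the endpoint to the whole Internet")
    config = {"endpointPrivateAccess": True, "endpointPublicAccess": bool(cidrs)}
    if cidrs:
        config["publicAccessCidrs"] = cidrs
    return ["aws", "eks", "update-cluster-config", "--name", cluster,
            "--resources-vpc-config", json.dumps(config)]


if __name__ == "__main__":
    # Hypothetical cluster name and documentation-range CIDR; review before running.
    print(shlex.join(endpoint_access_args("demo", ["203.0.113.0/24"])))
```

Disabling the public endpoint means kubectl and other API clients must run from inside the VPC or a network connected to it.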
The screenshot below shows API server connectivity options available to users when creating/updating a cluster.
If users are required to connect to the cluster from outside of the VPC, the best practice is to create a VPN connection and route traffic through the VPC to access AWS resources.
CloudTrail logs allow users to audit events occurring across the entire AWS account. The logs provide insight into changes being applied and resources being accessed, allowing users to investigate potentially malicious actions. When CloudTrail logs are delivered to CloudWatch Logs, metric filters, CloudWatch alarms, and SNS can be combined to send notifications about particular events. For example, users may want to be notified when a restricted S3 bucket has its policy modified.
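A sketch of the S3 example, assuming the trail delivers to a CloudWatch Logs log group. The first function returns a metric filter pattern matching PutBucketPolicy calls on one bucket, which could be attached with aws logs put-metric-filter and paired with a CloudWatch alarm that publishes to an SNS topic. The second applies the same conditions to a downloaded CloudTrail log file, which is useful for testing the filter or for after-the-fact investigation. The bucket name is a placeholder.

```python
def bucket_policy_filter_pattern(bucket: str) -> str:
    """CloudWatch Logs metric filter pattern matching PutBucketPolicy on one bucket."""
    return ('{ ($.eventSource = "s3.amazonaws.com") && ($.eventName = "PutBucketPolicy") '
            f'&& ($.requestParameters.bucketName = "{bucket}") }}')


def bucket_policy_changes(trail_log: dict, bucket: str) -> list[dict]:
    """Return records from a CloudTrail log file ({"Records": [...]}) matching the same filter."""
    return [
        record for record in trail_log.get("Records", [])
        if record.get("eventSource") == "s3.amazonaws.com"
        and record.get("eventName") == "PutBucketPolicy"
        # requestParameters can be null for some events.
        and (record.get("requestParameters") or {}).get("bucketName") == bucket
    ]


if __name__ == "__main__":
    print(bucket_policy_filter_pattern("my-restricted-bucket"))
```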
Securing AWS resources with best practices is essential to ensuring that EKS clusters are safe for use in production environments. Native Kubernetes security features are helpful but do not protect the underlying AWS infrastructure from compromise. Implementing AWS services and security features is required to protect EKS clusters.
There are many best practices available for improving the security posture of EKS clusters. Users can maintain a strong security baseline by implementing practices related to securing AWS infrastructure, protecting the EKS control plane endpoint, maintaining host security for their worker nodes, and ensuring that pods are configured correctly. Combining these practices helps users protect their applications, infrastructure, and data from compromise.