Organizations and teams often need multi-tenant, heterogeneous Kubernetes clusters to meet users’ application needs. They may also need to address certain special constraints on the Kubernetes cluster; for example, some pods may require special hardware, colocation with other specific pods, or isolation from others. There are many options for placing those application containers into different, separate node groups, one of which is through the use of taints and tolerations. In this article, we describe taints and tolerations and then use an example to illustrate how to use them to place pods on specific worker nodes while avoiding the nodes where you don’t want pods to get scheduled.
Taints and tolerations are a mechanism that allows you to ensure that pods are not placed on inappropriate nodes. Taints are added to nodes, while tolerations are defined in the pod specification. When you taint a node, it will repel all the pods except those that have a toleration for that taint. A node can have one or many taints associated with it.
For example, most Kubernetes distributions automatically taint the master nodes so that only the pods that manage the control plane are scheduled onto them, and not any data plane pods deployed by users. This ensures that the master nodes are dedicated to running control plane pods.
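A taint can produce three possible effects:
NoSchedule: The scheduler will not place new pods on the node unless they tolerate the taint. Pods already running on the node are not affected.
PreferNoSchedule: The scheduler will try to avoid placing pods that don’t tolerate the taint on the node, but this is not guaranteed.
NoExecute: New pods that don’t tolerate the taint are not scheduled on the node, and pods already running on it that don’t tolerate the taint are evicted.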
If you need to dedicate a group of worker nodes for a set of users, you can add a taint to those nodes, such as by using this command:
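A minimal sketch of such a command, assuming a worker node named nodeName and a user group named groupName (both placeholders to replace with your own values). The dedicated key is chosen to match the label used in the next step:

```shell
# Placeholder values (assumptions): substitute a real node name and group name.
NODE_NAME="nodeName"
GROUP_NAME="groupName"
DEDICATED_TAINT="dedicated=${GROUP_NAME}:NoSchedule"

# Taint the node so that only pods tolerating dedicated=<group> are scheduled on it.
kubectl taint nodes "$NODE_NAME" "$DEDICATED_TAINT"
```

Running the same command with a hyphen appended to the taint (dedicated=groupName:NoSchedule-) removes the taint again.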
Then add tolerations for the taint to that user group’s pods so they can run on those nodes. To further ensure that pods only get scheduled on that set of tainted nodes, you can also add a label to those nodes, e.g., dedicated=groupName. Then use nodeSelector in the deployment/pod spec, which ensures that pods from the user group are bound to the node group and don’t run anywhere else.
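The combination of toleration, node label, and nodeSelector might look like the following sketch. The node name, pod name, file name, and nginx image are illustrative assumptions, not values from the original setup:

```shell
# Placeholder value (assumption): substitute a real node name.
NODE_NAME="nodeName"

# Write a pod manifest that tolerates the dedicated taint and
# selects nodes carrying the matching dedicated=groupName label.
cat > dedicated-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: dedicated-example   # hypothetical pod name
spec:
  containers:
  - name: app
    image: nginx
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "groupName"
    effect: "NoSchedule"
  nodeSelector:
    dedicated: groupName
EOF

# Label the tainted node, then create the pod.
kubectl label nodes "$NODE_NAME" dedicated=groupName
kubectl apply -f dedicated-pod.yaml
```

The toleration lets the pod onto the tainted node, while the nodeSelector keeps it off every node that lacks the label; neither setting alone gives both guarantees.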
If there are worker nodes with special hardware, you need to make sure that normal pods that don’t need the special hardware don’t run on those worker nodes. Do this by adding a taint to those nodes as follows:
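As a sketch, assuming a node named nodeName and an illustrative taint key of special with the value true (any key and value you choose consistently will work):

```shell
# Placeholder values (assumptions): node name, taint key, and taint value are illustrative.
NODE_NAME="nodeName"
HARDWARE_TAINT="special=true:NoSchedule"

# Keep pods that do not tolerate the taint off the special-hardware node.
kubectl taint nodes "$NODE_NAME" "$HARDWARE_TAINT"
```

If you only want to discourage, rather than forbid, ordinary pods on these nodes, the PreferNoSchedule effect can be used in place of NoSchedule.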
Later on, the pods requiring special hardware can be run on those worker nodes by adding tolerations for the above taint.
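A toleration for such a taint could be sketched as follows, assuming the taint key is special (an illustrative name) and using the Exists operator, which matches the key regardless of its value. The pod name, file name, and image are hypothetical:

```shell
# Write a pod manifest whose toleration matches any taint with the key "special"
# and the NoSchedule effect, whatever the taint's value is.
cat > special-hardware-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: special-hardware-example   # hypothetical pod name
spec:
  containers:
  - name: app
    image: nginx
  tolerations:
  - key: "special"
    operator: "Exists"
    effect: "NoSchedule"
EOF

kubectl apply -f special-hardware-pod.yaml
```

With operator Exists no value field is given; with operator Equal the toleration's value must match the taint's value exactly.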
A taint with the NoExecute effect will evict a running pod from the node if the pod has no toleration for the taint. The Kubernetes node controller will automatically add this kind of taint to a node in some scenarios so that pods can be evicted and the node is “drained” (has all of its pods evicted). For example, suppose a network outage causes a node to be unreachable from the controller. In this scenario, it would be best to move all of the pods off the node so that they can get rescheduled to other nodes. The node controller takes this action automatically to avoid the need for manual intervention.
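A pod can also limit how long it stays on a node after a NoExecute taint appears by setting tolerationSeconds in its toleration. The sketch below assumes a hypothetical pod name and an illustrative value of 300 seconds; node.kubernetes.io/unreachable is the taint the node controller adds when a node cannot be reached:

```shell
# Write a pod manifest that tolerates an unreachable node for a bounded time.
cat > noexecute-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: noexecute-example   # hypothetical pod name
spec:
  containers:
  - name: app
    image: nginx
  tolerations:
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300   # illustrative: evict 300 seconds after the taint is added
EOF

kubectl apply -f noexecute-pod.yaml
```

A NoExecute toleration without tolerationSeconds lets the pod stay on the node indefinitely, while a pod with no matching toleration is evicted as soon as the taint is added.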
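The following are built-in taints:
node.kubernetes.io/not-ready: The node is not ready.
node.kubernetes.io/unreachable: The node is unreachable from the node controller.
node.kubernetes.io/memory-pressure: The node has memory pressure.
node.kubernetes.io/disk-pressure: The node has disk pressure.
node.kubernetes.io/pid-pressure: The node has PID pressure.
node.kubernetes.io/network-unavailable: The node’s network is unavailable.
node.kubernetes.io/unschedulable: The node is unschedulable.
node.cloudprovider.kubernetes.io/uninitialized: The node has not yet been initialized by an external cloud provider.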
We will now present a scenario to help you better understand taints and tolerations. Let’s start with a Kubernetes cluster that has worker nodes categorized into different groups, such as front-end nodes and back-end nodes. Let’s assume that we need to deploy the front-end application pods so that they are placed only on front-end nodes and not on back-end nodes. We also must ensure that new pods are not scheduled onto master nodes because those nodes run control plane components such as etcd.
Let’s start by getting the list of nodes to see what is already tainted by the Kubernetes default installation. Here we are on a cluster created by the Rancher RKE tool.
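One way to list the nodes together with their taints, as a sketch (the column names NAME and TAINTS are arbitrary headings chosen here):

```shell
# Column spec for kubectl's custom-columns output: node name plus its taints.
TAINT_COLUMNS="NAME:.metadata.name,TAINTS:.spec.taints"

# List the nodes and their roles, then show each node's taints.
kubectl get nodes
kubectl get nodes -o custom-columns="$TAINT_COLUMNS"
```

Running kubectl describe node with a node name also shows a Taints field for that node.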
From the output above, we can see that the master nodes are already tainted by the Kubernetes installation, so no user pods land on them unless the user intentionally adds tolerations for those taints. The output also shows a worker node that has no taints. We will now taint the worker so that only front-end pods can land on it. We can do this by using the kubectl taint command.
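A sketch of that command, using the key, value, and effect described in this walkthrough and a placeholder for the worker node’s name (an assumption; use the name reported by kubectl get nodes):

```shell
# Placeholder value (assumption): substitute the real worker node name.
WORKER_NODE="worker-node-name"
FRONTEND_TAINT="app=frontend:NoSchedule"

# Taint the worker so that only pods tolerating app=frontend are scheduled on it.
kubectl taint nodes "$WORKER_NODE" "$FRONTEND_TAINT"
```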
The above taint has the key app, the value frontend, and the effect NoSchedule, which means that no pod will be placed on this node unless the pod has defined a toleration for the taint. We will see what the toleration looks like in later steps.
Let’s try to deploy an app on the cluster without any toleration configured in the app deployment specification.
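The steps could be sketched as follows. The namespace name is a hypothetical choice, and kubectl create deployment is used here because in current kubectl releases kubectl run creates a single pod rather than a deployment:

```shell
# Hypothetical namespace name (assumption): pick any name you like.
NAMESPACE="frontend-demo"
DEPLOYMENT="nginx"

# Create the namespace and an Nginx deployment with no tolerations.
kubectl create namespace "$NAMESPACE"
kubectl create deployment "$DEPLOYMENT" --image=nginx --namespace "$NAMESPACE"

# Inspect the pod status and the scheduling events.
kubectl get pods --namespace "$NAMESPACE"
kubectl get events --namespace "$NAMESPACE"
```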
We created a namespace and deployed Nginx using the kubectl run command, but looking at the pod status and cluster events, we see that the pod can’t be scheduled because there are no appropriate worker nodes. The three master nodes have taints that the pod doesn’t tolerate, and the one worker node has a taint that the pod doesn’t tolerate either. To successfully place the pod on the worker node, we need to edit the deployment and add a toleration for the taint we configured earlier on the node.
Let’s see what the current deployment YAML looks like.
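A sketch of the command, assuming the deployment is called nginx and lives in a hypothetical namespace named frontend-demo:

```shell
# Hypothetical names (assumptions): adjust to your namespace and deployment.
NAMESPACE="frontend-demo"
DEPLOYMENT="nginx"

# Print the full deployment manifest as stored in the cluster.
kubectl get deployment "$DEPLOYMENT" --namespace "$NAMESPACE" -o yaml
```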
From the output above, we can see that there is no toleration added in the pod spec. Let’s edit and add one.
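The toleration can be added interactively with kubectl edit, or non-interactively with a patch, as in this sketch. The namespace, deployment name, and patch file name are assumptions; the toleration itself mirrors the app=frontend:NoSchedule taint placed on the worker node:

```shell
# Hypothetical names (assumptions): adjust to your namespace and deployment.
NAMESPACE="frontend-demo"
DEPLOYMENT="nginx"

# Write a patch that adds a toleration to the deployment's pod template.
cat > toleration-patch.yaml <<'EOF'
spec:
  template:
    spec:
      tolerations:
      - key: "app"
        operator: "Equal"
        value: "frontend"
        effect: "NoSchedule"
EOF

# Apply the patch; the deployment then rolls out pods carrying the toleration.
kubectl patch deployment "$DEPLOYMENT" --namespace "$NAMESPACE" --patch "$(cat toleration-patch.yaml)"
```

The key, value, and effect in the toleration must all match the taint for the Equal operator to apply.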
Notice the tolerations section of the pod spec: We have added a toleration for the taint so that the pod can be scheduled on the worker node.
Now let’s get the pod’s status and events.
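For instance, under the same assumed namespace name as before (a hypothetical value):

```shell
# Hypothetical namespace name (assumption).
NAMESPACE="frontend-demo"
SORT_KEY=".metadata.creationTimestamp"

# The wide output includes the node each pod was scheduled on.
kubectl get pods --namespace "$NAMESPACE" -o wide

# List events oldest first to follow the scheduling decisions.
kubectl get events --namespace "$NAMESPACE" --sort-by="$SORT_KEY"
```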
The pod has now been allowed to run on the tainted node. If there are other worker nodes in the cluster that are not tainted, this pod can also land on those nodes. To make sure that this pod lands only on the nodes that are dedicated to front-end pods, in addition to the taint and toleration, we need to label the front-end nodes (e.g., app=frontend) and then use nodeSelector in the pod deployment spec so that the pod is only scheduled on front-end nodes.
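Labeling the node and adding the selector could be sketched like this, with the worker node name, namespace, deployment name, and patch file name all being assumptions:

```shell
# Hypothetical names (assumptions): adjust to your cluster.
WORKER_NODE="worker-node-name"
NAMESPACE="frontend-demo"
DEPLOYMENT="nginx"

# Write a patch that restricts the deployment's pods to nodes labeled app=frontend.
cat > nodeselector-patch.yaml <<'EOF'
spec:
  template:
    spec:
      nodeSelector:
        app: frontend
EOF

# Label the front-end node, then add the nodeSelector to the deployment.
kubectl label nodes "$WORKER_NODE" app=frontend
kubectl patch deployment "$DEPLOYMENT" --namespace "$NAMESPACE" --patch "$(cat nodeselector-patch.yaml)"
```

The taint keeps other pods off the front-end nodes, and the nodeSelector keeps the front-end pods off all other nodes.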
Taints and tolerations provide advanced pod scheduling in which tainted nodes control which pods can be scheduled on them. They are often simpler to manage than other custom scheduling methods such as affinity rules. Known use cases include nodes with special hardware, nodes dedicated to a group of users, and taint-based pod eviction.