Introduction

Welcome to this Kubernetes tutorial. It is intended for people who want to learn how to install and configure a Kubernetes cluster, and for those who want to understand how Kubernetes works under the hood.

This tutorial is based on the official Kubernetes documentation. The documentation is very good, so I recommend reading it if you want to learn more about Kubernetes or if you need help with a specific topic.

This course is divided into the following sections:

  • Prerequisites
  • Understanding Kubernetes Components
  • Infrastructure provisioning
  • Kubernetes installation with kubeadm
  • Understanding what we have done
  • Playing with the scheduler
  • Upgrading cluster version
  • Create a config for a developer
  • Bonus

Prerequisites

This course was written to be taught in a school, so it uses Scaleway as the cloud provider. However, you can follow this tutorial without Scaleway: you will just need to provision your own infrastructure.

Understanding Kubernetes Components

Before starting any work, let's review how Kubernetes is designed and what components it is made of.

Kubernetes Architecture

source

Kubernetes is a distributed system composed of multiple components. The following diagram shows the architecture of a Kubernetes cluster.

Kubernetes Architecture

The Kubernetes cluster consists of two types of resources:

  • The control plane coordinates the cluster
  • The worker nodes are the machines that run applications

The cluster we will build in this course has one control plane node and two worker nodes. The control plane manages the worker nodes and the pods running on them through the Kubernetes API.

Kubernetes Control Plane

As we said earlier, the control plane is responsible for managing the worker nodes and the pods running on them. The control plane consists of the following components:

  • kube-apiserver
  • etcd
  • kube-scheduler
  • kube-controller-manager

We will go through each of these components in the following sections.

kube-apiserver

The kube-apiserver is the component that exposes the Kubernetes API. It is the front end of the Kubernetes control plane: every read or update of cluster data goes through it, and it is the only component that talks to the etcd cluster. The API server is responsible for:

  • Exposing the Kubernetes API
  • Authenticating requests
  • Authorizing requests
  • Validating objects and persisting the cluster state in etcd

etcd

etcd is a distributed key-value store that provides a reliable way to store data across a cluster of machines. It is used by Kubernetes as the backing store for all cluster data: the API server reads from and writes to etcd to persist the state of the cluster.
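Later in this course, once the cluster is installed, you can peek at what the API server stores in etcd. This is a hedged sketch: it assumes a kubeadm-style setup where the etcd client certificates live under /etc/kubernetes/pki, and that the etcdctl binary is available on the control plane node (it is not installed by default).

# List a few of the keys the API server has written to etcd (read-only)
sudo ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
  --key=/etc/kubernetes/pki/apiserver-etcd-client.key \
  get /registry/namespaces --prefix --keys-only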

kube-scheduler

The kube-scheduler is responsible for scheduling pods on nodes. It watches for newly created pods that have no node assigned and, for each of them, finds the best node to run it on, taking into account the resources available on the nodes and any relevant constraints. Kubernetes ships with a default scheduler (default-scheduler), but you can write your own scheduler and use it instead. The default scheduler looks at the following factors when scheduling a pod:

  • Node labels and node affinity/anti-affinity
  • Inter-pod affinity and anti-affinity
  • Taints and tolerations
  • Pod resource requests versus the resources available on each node
  • Node conditions
  • Pod priority

kube-controller-manager

The kube-controller-manager is a daemon that embeds the core control loops shipped with Kubernetes. In Kubernetes, a control loop is a non-terminating loop that regulates the state of the cluster. Logically, each controller is a separate control loop, but to reduce complexity they are all compiled into a single binary and run in a single process. The following controllers are included in the kube-controller-manager:

  • Node Controller: Responsible for noticing and responding when nodes go down.
  • Replication Controller: Responsible for maintaining the correct number of pods for every replication controller object in the system.
  • Endpoints Controller: Populates the Endpoints object (that is, joins Services & Pods).
  • Service Account & Token Controllers: Create default accounts and API access tokens for new namespaces.

Kubernetes Worker Nodes

The worker nodes are the machines that run your applications. Each worker node runs the following components:

  • kubelet
  • kube-proxy
  • a container runtime (containerd in this course)

kubelet

The kubelet is the primary "node agent" that runs on each node. The kubelet takes a set of PodSpecs that are provided through various mechanisms and ensures that the containers described in the PodSpecs are running and healthy. The kubelet doesn't manage containers which were not created by Kubernetes.

kube-proxy

The kube-proxy is responsible for network proxying. The kube-proxy maintains network rules on the nodes. These network rules allow network communication to your Pods from network sessions inside or outside your cluster. The kube-proxy uses the operating system packet filtering layer if there is one available. Otherwise, kube-proxy forwards the traffic itself. The kube-proxy is responsible for:

  • Load balancing connections across the different Pods
  • Maintaining network rules on the nodes
  • Forwarding traffic to the right Pod
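To make this more concrete, here is a hedged way to look at the rules kube-proxy manages on a node once your cluster is running. It assumes kube-proxy runs in its iptables mode (a common default); in IPVS mode you would inspect the rules with ipvsadm instead.

# Show the NAT chain kube-proxy creates for Services (iptables mode)
sudo iptables -t nat -L KUBE-SERVICES -n | head -n 20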

Infrastructure provisioning

To deploy our Kubernetes cluster, we need some VMs. In this course, we will provision them on Scaleway.

At the root of this repository, you will find a terraform folder. It contains the Terraform source files that will provision our infrastructure.

TL;DR

To provision the infrastructure, you will need a Scaleway account and the Scaleway CLI installed and set up.

With the Scaleway CLI set up, you can then run the following commands from the root of the repository:

    cd terraform
    terraform init
    terraform plan
    terraform apply -var="project_name=<your project name>" -var="project_id=<your project id>"

You must provide the Scaleway project ID and a custom project name.

Terraform will ask you to confirm the creation of the infrastructure: type yes and wait for the infrastructure to be created.

To connect to the instances, follow the instructions in the last section of this page.

Step by step

// TODO

Connection to the instances

To connect to the instances, we will use the public gateway, which is configured as an SSH bastion.

To get the IPs of the instances and the public gateway, you can query the Terraform outputs:

    terraform output

To connect with ssh to any instance, you can use the following command:

ssh -J bastion@<public_gateway_ip>:61000 root@<instance_ip>
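To avoid typing the jump-host option every time, you can declare the bastion in your SSH client configuration. A minimal sketch, assuming the same bastion user and port as above; the host aliases and IP placeholders are examples you must adapt:

# ~/.ssh/config
Host k8s-bastion
    HostName <public_gateway_ip>
    User bastion
    Port 61000

# Repeat this block for each instance, using the IPs from terraform output
Host controlplane
    HostName <instance_ip>
    User root
    ProxyJump k8s-bastion

You can then simply run ssh controlplane.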

Kubernetes installation with kubeadm

In this section, we will install a Kubernetes cluster using kubeadm.

The following steps must be run on every node in the cluster: first follow them all on the control plane node, then repeat them on each worker node.

So start by SSHing into the control plane node and follow the steps.

Prepare the node

Before installing the Kubernetes components, we need to prepare the node by enabling the required kernel network modules and sysctl settings.

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

# Apply sysctl params without reboot
sudo sysctl --system

source

CRI (Container Runtime Interface)

Kubernetes needs a container runtime to run containers. Kubernetes defines an interface called the Container Runtime Interface (CRI), which allows Kubernetes to use a wide variety of container runtimes without the need to recompile Kubernetes.

Kubernetes supports multiple CRI implementations. In this course we will use containerd, which is the most widely used container runtime with Kubernetes.

Installing containerd

To install containerd, we need to add the Docker repository to the package manager.

# Install required packages for https repository
sudo apt-get update && sudo apt-get install -y apt-transport-https ca-certificates curl
# Add Docker’s official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
# Add Docker repository
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
# Update package manager index
sudo apt-get update

Now we can install containerd.

sudo apt-get install -y containerd.io

Now we need to configure containerd to use the systemd cgroup driver.

sudo mkdir -p /etc/containerd
sudo containerd config default | sudo tee /etc/containerd/config.toml

Find the section [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options] in the /etc/containerd/config.toml file and change the SystemdCgroup value to true.
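If you prefer to do this non-interactively, the following one-liner should work with the freshly generated default config, which contains a single SystemdCgroup = false entry (check the file afterwards to be sure):

sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
# Verify the change
grep -n 'SystemdCgroup' /etc/containerd/config.toml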

Finally, you can restart the containerd service.

sudo systemctl restart containerd

source

Test containerd installation

To test the containerd installation, run the following commands.

sudo ctr images pull docker.io/library/hello-world:latest
sudo ctr run --rm docker.io/library/hello-world:latest hello-world
sudo ctr images rm docker.io/library/hello-world:latest

If you see the following output, then the installation is successful.

Hello from Docker!
This message shows that your installation appears to be working correctly.

Installing kubeadm, kubelet and kubectl

To install kubeadm, kubelet and kubectl we will use Ubuntu's package manager (APT).

# Install required packages for https repository
sudo apt-get update && sudo apt-get install -y apt-transport-https ca-certificates curl gpg
# Add Kubernetes' official GPG key (the keyrings directory may need to be created first)
sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.32/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# Add Kubernetes repository
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
# Update package manager index
sudo apt-get update
# Install kubeadm, kubelet and kubectl with the exact same version or else components could be incompatible
sudo apt-get install -y kubelet=1.32.6-1.1 kubeadm=1.32.6-1.1 kubectl=1.32.6-1.1
# Hold the version of the packages
sudo apt-mark hold kubelet kubeadm kubectl

The last line is very important because we don't want the Kubernetes components to be updated automatically by the package manager when running apt-get upgrade.

Setting up control plane node

To initialize the control plane node with kubeadm, simply SSH into the node and run the following command:

sudo kubeadm init

Wait for it to finish.

What does kubeadm init do?

The kubeadm init command does a lot of things:

  • Runs preflight checks to validate the system
  • Generates the certificates and keys for the cluster
  • Generates a kubeconfig file for each control plane component
  • Creates static pod manifests for the control plane components
  • Generates a bootstrap token that the worker nodes will use to join the cluster
  • Generates a kubeconfig file for the admin user

Setup kubeconfig

To be able to use kubectl, we need to set up a kubeconfig file. kubeadm created one for us at /etc/kubernetes/admin.conf; it lets us interact with our Kubernetes cluster (currently a single node) as the admin user.

Run the following commands to copy the configuration file to the default location and change the ownership of the file to the current user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
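Alternatively, if you are working as root, you can simply point kubectl at the admin kubeconfig with an environment variable (this only lasts for the current shell session):

export KUBECONFIG=/etc/kubernetes/admin.conf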

Check that control plane is up

We now have a control plane node up and running. To check that everything is working, you can run the following command:

kubectl get nodes

You should see something like this:

NAME           STATUS     ROLES           AGE   VERSION
controlplane   NotReady   control-plane   10m   v1.32.6

The control plane node is still in the NotReady state. This is normal; we will fix it in the next section.

Install CNI plugin

The control plane node is not ready because no pod network has been installed yet, so pods cannot communicate with each other.

If you run the following command:

kubectl get pods --all-namespaces

You will see something like this:

NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   coredns-74ff55c5b-2q2q2                    0/1     Pending   0          10m
kube-system   coredns-74ff55c5b-4q4q4                    0/1     Pending   0          10m
kube-system   etcd-controlplane                          1/1     Running   0          10m
kube-system   kube-apiserver-controlplane                1/1     Running   0          10m
kube-system   kube-controller-manager-controlplane       1/1     Running   0          10m
kube-system   kube-proxy-2q2q2                           1/1     Running   0          10m
kube-system   kube-scheduler-controlplane                1/1     Running   0          10m

You can see that the coredns pods are not ready and are in the Pending state. This is because the pods are waiting for a network to be available.

To fix this, we need to install a Container Network Interface (CNI) plugin. A CNI plugin is a network plugin that will allow pods to communicate with each other.

We will use Calico in this course.

kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/calico.yaml

Wait for the Calico pods to be ready:

kubectl -n kube-system wait pod -l k8s-app=calico-kube-controllers --for=condition=Ready --timeout=-1s
kubectl get pods -l k8s-app=calico-kube-controllers -n kube-system

Check that control plane node is ready

To check that everything is working you can run the following command:

kubectl get nodes

You should see something like this:

NAME           STATUS   ROLES    AGE   VERSION
controlplane   Ready    control-plane   10m   v1.32.6

The control plane node should be in the Ready state now.

We can also check that the coredns pod is now running:

kubectl get pods --all-namespaces

You should see something like this:

NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   coredns-74ff55c5b-2q2q2                    1/1     Running   0          10m
kube-system   etcd-controlplane                          1/1     Running   0          10m
kube-system   kube-apiserver-controlplane                1/1     Running   0          10m
kube-system   kube-controller-manager-controlplane       1/1     Running   0          10m
kube-system   kube-proxy-2q2q2                           1/1     Running   0          10m
kube-system   kube-scheduler-controlplane                1/1     Running   0          10m

Join worker nodes

In the next section, we will make our worker nodes join the cluster.

Setting up worker node

On the control plane node, create a token to join the worker node to the cluster.

Run this command on the control plane node.

kubeadm token create --print-join-command

Copy the output of the command and run it (as root or with sudo) on the worker node.

sudo kubeadm join ...

Check that worker node is up

To check that the worker node is up and running you can run the following command on the control plane node:

kubectl get nodes

You should see something like this:

NAME           STATUS   ROLES           AGE   VERSION
controlplane   Ready    control-plane   10m   v1.32.6
workernode     Ready    <none>          10m   v1.32.6

We can add a label to give the node the worker role:

kubectl label node <node-name> node-role.kubernetes.io/worker=worker

What about the CNI?

For the installation of the control plane node, we needed to install a CNI plugin for the node to become ready, but for the worker node we didn't. Can you explain why?

Installation conclusion

You should now have a working Kubernetes cluster with one control plane node and two worker nodes (repeat the join steps on the second worker node). You can start playing with it and deploy your first application.

In the next section we will explain what we have done during the installation.

Understanding what we have done

So far, we have installed a Kubernetes cluster. We have also installed some tools to manage our cluster. In this section, we will try to understand what we have done.

You should have a Kubernetes cluster with 3 nodes. One node is the control plane node and the other two are worker nodes.

By running kubectl get nodes you should see something like this:

NAME           STATUS   ROLES           AGE   VERSION
controlplane   Ready    control-plane   10m   v1.32.6
workernode     Ready    worker          10m   v1.32.6
workernode2    Ready    worker          10m   v1.32.6

What we have done

Let's review what we have done in the installation section.

Packages

We installed a bunch of tools. We installed kubeadm, kubectl, kubelet and containerd.

You should already know kubectl, the command line tool used to interact with Kubernetes; kubeadm is a tool to bootstrap and manage the lifecycle of a Kubernetes cluster.

Now let's talk about kubelet and containerd. We already mentioned kubelet in the previous section. It is a service that runs on each node of the cluster and is responsible for running containers. To run containers, it uses a container runtime. In our case, we use containerd.

We will explain more about kubelet and containerd in the create a deployment section.

Kubeadm init

After installing all the packages on the control plane node, we ran kubeadm init and ended up with a Kubernetes cluster with a single node in the NotReady state. That was quite easy, right?

Let's explain a bit what kubeadm init does:

  • It creates all needed certificates and keys that will be used to secure the cluster. (We will talk about that in the certificates section)
  • It creates all needed configuration files for the control plane components. Just like the /etc/kubernetes/admin.conf file we used to connect to the cluster with kubectl, each component needs its own kubeconfig file to interact with the cluster.
  • It creates a static pod manifest for the control plane components. We will talk about static pods in the static pods section.

Those are the principal steps of kubeadm init. If you want more details, you can check the kubeadm init documentation.

We already explained about the CNI plugin in the installation section, so we will not talk about it here. If you want more details, you can check the CNI documentation.

Kubeadm join

After running kubeadm init on the control plane node, we ended up with a Kubernetes cluster with only one node. We need to add the other two nodes to the cluster.

To make a node join the cluster, we need to run kubeadm join with a token. The token is a secret used to authenticate the node to the cluster. It is created when we run kubeadm init on the control plane node, but we can create new tokens at any time, as we did in the installation section.

Unlike kubeadm init, kubeadm join does much less work since the control plane is already up and running: it performs the TLS bootstrap to obtain a client certificate for the node, writes the kubelet configuration file and starts the kubelet service.
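You can list the bootstrap tokens that exist on the control plane node; the tokens that are still valid are listed (by default they expire after 24 hours):

sudo kubeadm token list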

Static pods

We just installed our Kubernetes cluster, and we can already see some pods running in the kube-system namespace. But where did these pods come from? How did they get created? How did they get scheduled on the nodes?

These pods are called static pods. Static pods are pods that are managed directly by the kubelet daemon.

What are static pods?

As we said earlier, static pods are pods that are managed directly by the kubelet daemon. The kubelet watches a specific directory on the host file system (checked every 20 seconds by default). If a manifest file is created in this directory, the kubelet creates a pod based on it; if the file is deleted, the kubelet deletes the pod.

The default static pod directory is /etc/kubernetes/manifests.
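You can confirm which directory your kubelet watches by looking at its configuration file. On a kubeadm-provisioned node it is usually /var/lib/kubelet/config.yaml; the path may differ on other setups.

sudo grep staticPodPath /var/lib/kubelet/config.yaml
# staticPodPath: /etc/kubernetes/manifests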

Run the following command to see the content of the static pod directory:

sudo ls /etc/kubernetes/manifests

This command should return the following output:

etcd.yaml
kube-apiserver.yaml
kube-controller-manager.yaml
kube-scheduler.yaml

You can check that these files match the pods that are running in the kube-system namespace:

kubectl get pods --namespace=kube-system

You can also have a look at one of these files:

sudo cat /etc/kubernetes/manifests/kube-apiserver.yaml

Playing with static pods

Let's play a bit with static pods to understand how they work.

Let's try to delete the kube-apiserver pod using only kubectl. To do that, you will need to identify the name of the pod:

kubectl delete pod <kube-apiserver-pod-name> --namespace=kube-system

But if you check the pods again, you will see that the kube-apiserver pod is still running:

kubectl get pods --namespace=kube-system

Why? Because kubelet is still watching the static pod directory and will recreate the pod if it is deleted.

Let's try to delete the kube-apiserver pod file:

# We only move the file to another location to be able to restore it later, what's important is that the file is deleted from the static pod directory
sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml ~/kube-apiserver.yaml

Now the kube-apiserver is gone. How can we test that? Try to run any kubectl command and you will end up with an error, since kubectl can't contact the API server.

kubectl get pods --namespace=kube-system

Let's restore the file:

sudo mv ~/kube-apiserver.yaml /etc/kubernetes/manifests/kube-apiserver.yaml

And now, the kube-apiserver pod is back (it can take a few minutes to come back):

kubectl get pods --namespace=kube-system

We can also create a new pod file in the static pod directory:

sudo tee /etc/kubernetes/manifests/nginx.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx-test
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
EOF

And now, the nginx pod is running:

kubectl get pods --namespace=default

Let's clean up:

sudo rm /etc/kubernetes/manifests/nginx.yaml

Conclusion

In this article, we saw how static pods work and where the manifests of the Kubernetes control plane components are located. In practice, static pods are rarely used for anything other than running these components, but you may have to modify their manifests.

Certificates

As we said previously, kubeadm init generates a set of certificates. All these certificates are stored in /etc/kubernetes/pki, the directory Kubernetes uses to store certificates and keys. But what are these certificates for?

PKI (Public Key Infrastructure)

Kubernetes uses a PKI to secure the communication between components. A PKI is a set of cryptographic tools used to generate, store and distribute certificates. The Kubernetes PKI is composed of 3 main elements:

  • Certificate Authority (CA) certificates
  • Component certificates
  • User certificates

Certificate Authority (CA) certificates

The first certificate generated is the CA certificate. CA stands for Certificate Authority. It is the root certificate used to sign all the other certificates, which means we can validate any certificate by checking that it has been signed by the CA.

This certificate is stored in /etc/kubernetes/pki/ca.crt and the private key in /etc/kubernetes/pki/ca.key. The private key is used to sign the other certificates.

Components certificates

There is a certificate and private key pair (or a kubeconfig file embedding them) for each component that needs to authenticate to the API server:

  • kube-apiserver
  • kube-controller-manager
  • kube-scheduler
  • kubelet

In a kubeadm cluster, kube-proxy authenticates with a ServiceAccount token instead of a dedicated certificate.
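You can list the certificates and keys kubeadm generated. The exact file names can vary slightly between versions, but you should recognize the CA, the API server certificates, the service account key pair and an etcd subdirectory:

sudo ls /etc/kubernetes/pki
sudo ls /etc/kubernetes/pki/etcd
# Inspect the CA certificate (subject, issuer and validity dates)
sudo openssl x509 -in /etc/kubernetes/pki/ca.crt -noout -subject -issuer -dates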

Etcd certificates

Etcd has its own PKI. The list of certificates is:

  • etcd-ca
  • etcd-server
  • etcd-peer
  • etcd-healthcheck-client

Enable authentication for kubelet (advanced)

The kubelet is the agent that runs on each node. It is responsible for starting and stopping containers, and it also exposes its own HTTPS API (on port 10250). Depending on how the kubelet is configured, this API may accept anonymous requests, which means anyone who can reach the node could access it. Your task is to check how the kubelet API is secured on your nodes and to enable certificate authentication for it.
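As a hedged starting point for this exercise, you can probe the kubelet API directly from a node. Depending on how the kubelet is configured (kubeadm normally disables anonymous access), the request below either returns the list of pods or is rejected with a 401/403, which already tells you whether anonymous authentication is enabled:

# The kubelet API listens on port 10250; -k skips TLS verification for this quick test
curl -sk https://localhost:10250/pods | head -c 300; echo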

Certificate management with kubeadm

source

Certificates expire after a given time, called the certificate lifetime. The default lifetime is 1 year for the component certificates and 10 years for the CA certificate. Certificate renewal is done using kubeadm.

You can check the expiration date of the certificates using the following command:

kubeadm certs check-expiration

You can renew all the certificates using the following command:

kubeadm certs renew all

You can renew a specific certificate using the following command:

kubeadm certs renew <certificate-name>

Example for the kube-apiserver certificate:

kubeadm certs renew kube-apiserver

After renewing the certificates, you need to restart the components that use them. To do so, you can move the static pod manifests out of the manifests directory and back, as we saw in the previous chapter; a sketch is shown below.
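A minimal sketch of that restart trick, assuming the default manifests directory and the default 20-second scan interval of the kubelet:

# Temporarily move the manifest out of the watched directory, then put it back
sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
sleep 30
sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/
# Repeat for kube-controller-manager.yaml, kube-scheduler.yaml and etcd.yaml if needed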

Conclusion

Certificates are an important part of Kubernetes: they are what allows the cluster components to communicate securely, even across untrusted networks. In the next chapter we will see in detail what happens when we create a deployment.

Creating our first deployment

Now that we have a working cluster, we can deploy our first application. But you have probably already deployed something on Kubernetes, right? So to make it more interesting, I will explain what happens behind the scenes when you run kubectl create deployment nginx --image=nginx --replicas=3.

source

Client side

So let's get started. The first thing that happens when you run kubectl create deployment nginx --image=nginx --replicas=3 is that the client generates the Deployment object and performs client-side validation. It checks that the object is well formed and contains the required fields; if it doesn't, kubectl returns an error without ever contacting the API server.

Then the client builds an HTTP request for the API server that contains the object. To do so, it reads the kubeconfig file to get the API server URL and the credentials used to authenticate to the API server.

Finally, the client will send the request to the API server.

API server

When the API server receives the request, it first authenticates the client using its credentials. Then it checks whether the request is authorized by evaluating the RBAC rules linked to the user. If the request is authorized, the API server validates the object: it checks that it is a valid Kubernetes object and contains the required fields, and returns an error if it doesn't.

As a last line of defense, the API server runs the object through the admission controllers. We won't go into detail about admission controllers, but they are a set of plugins that can mutate or reject objects. For example, the Pod Security admission controller checks that a Pod complies with the Pod Security Standards configured for its namespace.

At this point the request has been fully verified and the API server will store the object in the etcd database.

Controller manager

Now that the object is stored in the etcd database, the next step is to create the corresponding Kubernetes objects. A Deployment manages ReplicaSets and a ReplicaSet manages Pods. So the controller manager will create the ReplicaSet and the Pods, using Kubernetes' built-in controllers.

A controller is a loop that will check if the desired state stored in etcd is the same as the current state of the cluster. If it's not, it will try to reconcile the two states. For example, if you have a Deployment with 3 replicas and you delete one of the Pods, the controller will notice that the desired state is 3 Pods and the current state is 2 Pods and it will create a new Pod to reach the desired state.

So after a Deployment is stored in etcd, the Deployment controller detects the create event. It then sees that there are no ReplicaSets associated with the Deployment and creates one. The same mechanism applies for the ReplicaSet controller, which creates the Pods. If you want to know more about the controllers, you can read the official documentation and this article about Kubernetes controllers.

Scheduler

After all the controllers have done their job, we have a Deployment, a ReplicaSet and a set of Pods stored in etcd but nothing is running on the nodes. To be more precise, the Pods are in the Pending state. The reason is that the Pods are not scheduled on any node.

The scheduler is responsible for scheduling the Pods on the nodes. It will check the Pod's requirements and try to find a node that can run the Pod. If it finds a node, it will update the Pod's spec.nodeName field and the Pod will be scheduled on the node.

For example, if you have a Pod that requires 1 CPU and 1GB of RAM, the scheduler will first filter the nodes with a series of predicates to assure they match the requirements of the Pod. Then it will rank the nodes with a series of priorities to find the best node. For example, it will try to find a node with the lowest number of Pods to reduce the risk of resource contention.

Again, if you want to know more about the scheduler, you can read the official documentation.

Kubelet

At this point, the Pods are scheduled on nodes with the spec.nodeName field set but they are still not running. And that's the job of kubelet.

As you know, kubelet is the agent that runs on each node. It is responsible for translating the abstract Pod definition into running containers. To achieve that, it queries the API server for the Pods whose spec.nodeName field corresponds to the name of the node it is running on. It then detects changes by comparing the result with its local cache and, if there is a change, reconciles the two states.

For our example, the kubelet will see that the Pod is scheduled on the node and it will try to run it. To do so, it will use the container runtime to run the container. If the container is running, the Pod will be in the Running state.
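You can reproduce the filtering the kubelet does with a field selector. This is only an illustration of the query; the kubelet itself uses a watch on the API server rather than polling:

# List the pods bound to a given node, across all namespaces
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name>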

Conclusion

Well, that's it. This is the whole process that happens when you run kubectl create deployment nginx --image=nginx --replicas=3. Of course we can always go deeper and explain what happens in each step but if you have understood this article, you should have a good understanding of how Kubernetes works.

Playing with the scheduler

In the last section we saw that the scheduler assigns the nodeName field in the pod spec so that the kubelet on that node runs the pod. In this section we will play with the scheduler to see how it works.

Manual scheduling

We can manually schedule a pod by setting the nodeName field in the pod spec. But first, to be sure that the scheduler won't do the work for us, we will remove it from the cluster:

sudo mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp

You can check that the scheduler is not running anymore:

kubectl get pods -n kube-system

Now let's try to create a pod, we will first create the manifest to be able to modify it later:

kubectl run nginx --image=nginx --dry-run=client -o yaml > ~/nginx.yaml

And then create it:

kubectl apply -f ~/nginx.yaml

The pod is created but is stuck in Pending state:

kubectl get pods

We can see that the pod is waiting for a node to be assigned:

NAME    READY   STATUS    RESTARTS   AGE
nginx   0/1     Pending   0          42s

Now let's deploy a second pod. Edit the manifest file: change the pod name to nginx2 and add a nodeName field whose value is the name of one of your worker nodes (see the sketch below). Then apply it:

kubectl apply -f ~/nginx.yaml
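For reference, the edited manifest could look roughly like this. It is a sketch that assumes your worker node is named workernode; adjust the value to one of your actual node names:

apiVersion: v1
kind: Pod
metadata:
  name: nginx2
spec:
  # Setting nodeName bypasses the scheduler: the kubelet on this node runs the pod directly
  nodeName: workernode
  containers:
  - name: nginx
    image: nginx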

The second pod runs without a problem while the first one is still in the Pending state:

kubectl get pods
NAME     READY   STATUS    RESTARTS   AGE
nginx    0/1     Pending   0          14m
nginx2   1/1     Running   0          3m51s

Let's restore the scheduler:

sudo mv /tmp/kube-scheduler.yaml /etc/kubernetes/manifests

And now the first pod will be scheduled:

kubectl get pods
NAME     READY   STATUS    RESTARTS   AGE
nginx    1/1     Running   0          15m
nginx2   1/1     Running   0          4m52s

We just proved that scheduling a pod is not magic: it only means setting the nodeName field in the pod spec to assign the pod to a node. Of course the scheduler does not choose a node randomly; it has logic to pick the best node for the pod, and that's why it is an important component of the cluster.

Cleanup:

kubectl delete pod nginx nginx2

Node selector

Another way to schedule a pod on a specific node is to use a node selector. A node selector is a set of key/value pairs defined in the pod spec; the scheduler only schedules the pod on nodes whose labels match it.

Let's see how it works. First we will add a label to a node:

kubectl label node k8s-node-1 node-type=worker

Now let's create a pod with a node selector:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    node-type: worker
EOF

The pod is successfully created and running:

kubectl get pods -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP           NODE
nginx     1/1     Running   0          2m    1.1.1.1      k8s-node-1

Cleanup:

kubectl delete pod nginx

Affinity and anti-affinity

Affinity and anti-affinity are a more advanced way to influence where a pod is scheduled. They allow you to specify more complex rules to match nodes, and to mark each rule as either required or preferred: a required rule must be satisfied for the pod to be scheduled, while a preferred rule is best-effort, so the pod can still be scheduled on a node that does not match it.

You can also constrain scheduling based on labels on other pods running on the node rather than on labels on the node itself. This is called inter-pod affinity and anti-affinity.
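This course has no hands-on example for affinity, so here is a hedged sketch that reuses the node-type=worker label from the node selector section: a required node affinity rule plus a preferred one on a zone label that your nodes may or may not have. Adjust the labels to what actually exists in your cluster.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: nginx-affinity
spec:
  affinity:
    nodeAffinity:
      # Hard requirement: only nodes labelled node-type=worker are eligible
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-type
            operator: In
            values:
            - worker
      # Soft preference: among eligible nodes, prefer a given zone if the label exists
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - fr-par-1
  containers:
  - name: nginx
    image: nginx
EOF

You can delete it afterwards with kubectl delete pod nginx-affinity.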

Taints and tolerations

Another important scheduling feature is the ability to keep pods away from specific nodes. This is done using taints and tolerations. A taint is a key/value/effect triple applied to a node; a toleration is a matching entry in a pod's spec. If a node has a taint, the scheduler will not schedule any pod on it unless the pod has a toleration for that taint.

Let's see how it works. By default, we can't schedule a pod on the control plane node because it has a taint:

kubectl describe node <control-plane-node-name> | grep Taints

Let's try to create a pod that won't run on worker nodes but only on the control plane node:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Equal
    value: ""
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
EOF

The pod is successfully created and running:

kubectl get pods
NAME      READY   STATUS    RESTARTS   AGE
nginx     1/1     Running   0          2m

Cleanup:

kubectl delete pod nginx

If we try to run the pod on the control plane node without the toleration, the pod will be stuck in Pending state:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
EOF
kubectl get pods
NAME      READY   STATUS    RESTARTS   AGE
nginx     0/1     Pending   0          2m

Cleanup:

kubectl delete pod nginx

Pod topology spread constraints

Pod topology spread constraints allow you to constrain the distribution of pods across your cluster among failure-domains such as regions, zones, nodes, and other user-defined topology domains. This can be used to achieve high availability as well as efficient resource utilization.

To test this feature, create a deployment with 4 replicas that behaves like a DaemonSet: at most one pod per node, and since we only have 3 nodes, the last pod should stay in the Pending state. A possible starting point is sketched below.
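Here is a possible starting point, a sketch rather than the full solution: it spreads the replicas over the kubernetes.io/hostname topology key. Keep in mind that with maxSkew: 1 the scheduler is still allowed to put a second replica on a node once every schedulable node already has one (and the control plane node is tainted), so to get the strict one-pod-per-node behaviour described above you may need to combine this with a required pod anti-affinity on the same topology key.

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-spread
spec:
  replicas: 4
  selector:
    matchLabels:
      app: nginx-spread
  template:
    metadata:
      labels:
        app: nginx-spread
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: nginx-spread
      containers:
      - name: nginx
        image: nginx
EOF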

Upgrading cluster version

I deliberately made you install Kubernetes 1.32.6 because I wanted to show you how to upgrade a cluster. In this section, we will upgrade our cluster to Kubernetes 1.33.2.

Upgrade control plane

To upgrade the control plane, we will use the kubeadm upgrade command. It upgrades the control plane components; the kubelet is upgraded separately afterwards.

We will first upgrade kubeadm itself.

We need to switch the Kubernetes package repository to the next minor version. To do this, we will edit the Kubernetes APT repository file.

sudo nano /etc/apt/sources.list.d/kubernetes.list

You should see a single line with the URL that contains your current Kubernetes minor version. For example, if you're using v1.32, you should see this:

deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /

Change the version in the URL to the next available minor release, for example:

deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.33/deb/ /

Save the file and exit your text editor.

sudo apt-mark unhold kubeadm && \
sudo apt update && sudo apt remove -y kubeadm && sudo apt install -y kubeadm=1.33.2-1.1 && \
sudo apt-mark hold kubeadm

Now we can check if the upgrade is available.

sudo kubeadm upgrade plan

If you see the following output then you can upgrade your cluster.

[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/health] Checking API Server health: Healthy
[upgrade/health] Checking Node health: All Nodes are healthy
[upgrade/health] Checking Static Pod manifests exists on disk: All manifests exist on disk
[upgrade/health] Checking if control plane is Static Pod-hosted or Self-Hosted: Static Pod-hosted
[upgrade/health] Checking Static Pod manifests directory is empty: The directory is not empty
[upgrade/config] The configuration was checked to be correct:
[upgrade/config]      COMPONENT                 CURRENT        AVAILABLE
[upgrade/config]      API Server                v1.32.6        v1.33.2
[upgrade/config]      Controller Manager        v1.32.6        v1.33.2
[upgrade/config]      Scheduler                 v1.32.6        v1.33.2
[upgrade/config]      Kube Proxy                v1.32.6        v1.33.2
[upgrade/config]      CoreDNS                   1.8.0          1.8.0
[upgrade/config]      Etcd                      3.4.13-0       3.4.13-0
[upgrade/versions] Cluster version: v1.32.6
[upgrade/versions] kubeadm version: v1.33.2
[upgrade/versions] Latest stable version: v1.33.2
[upgrade/versions] Latest version in the v1.32 series: v1.32.6
[upgrade/versions] Latest experimental version: v1.34.0-alpha.0

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       AVAILABLE
Kubelet     1 x v1.32.6   v1.33.2

Now we can upgrade the cluster.

sudo kubeadm upgrade apply v1.33.2

If the upgrade is successful, you should see the following output.

[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.33.x". Enjoy!

[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.

Before upgrading kubelet, we need to drain the control plane node.

kubectl drain <controlplane-node-name> --ignore-daemonsets

You can now upgrade kubelet

sudo apt-mark unhold kubelet && \
sudo apt update && sudo apt remove -y kubelet && sudo apt install -y kubelet=1.33.2-1.1 && \
sudo apt-mark hold kubelet

Restart kubelet

sudo systemctl daemon-reload
sudo systemctl restart kubelet

Uncordon the control plane node

kubectl uncordon <controlplane-node-name>

Upgrade worker nodes

To upgrade the worker nodes, the steps are similar to those for the control plane.

First upgrade kubeadm

Again, we first need to switch the Kubernetes package repository to the next minor version. To do this, we will edit the Kubernetes APT repository file.

sudo nano /etc/apt/sources.list.d/kubernetes.list

You should see a single line with the URL that contains your current Kubernetes minor version. For example, if you're using v1.32, you should see this:

deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /

Change the version in the URL to the next available minor release, for example:

deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.33/deb/ /

Save the file and exit your text editor.

sudo apt-mark unhold kubeadm && \
sudo apt update && sudo apt remove -y kubeadm && sudo apt install -y kubeadm=1.33.2-1.1 && \
sudo apt-mark hold kubeadm

Then upgrade the node

sudo kubeadm upgrade node

If the upgrade is successful, you should see the following output.

[upgrade/successful] SUCCESS! Your node was upgraded to "v1.33.x". Enjoy!

Before upgrading kubelet, we need to drain the worker node, so go back to the control plane node and run:

kubectl drain <worker-node-name> --ignore-daemonsets

You can now upgrade kubelet

sudo apt-mark unhold kubelet && \
sudo apt update && sudo apt remove -y kubelet && sudo apt install -y kubelet=1.33.2-1.1 && \
sudo apt-mark hold kubelet

Restart kubelet

sudo systemctl daemon-reload
sudo systemctl restart kubelet

Uncordon the worker node

kubectl uncordon <worker-node-name>

Check that cluster is up

To check that the cluster is up and running you can run the following command on the control plane node:

kubectl get nodes

You should see something like this:

NAME            STATUS   ROLES           AGE   VERSION
controlplane    Ready    control-plane   10m   v1.33.2
workernode      Ready    worker          10m   v1.33.2
workernode2     Ready    worker          10m   v1.33.2

Create a config for a developer

As a Kubernetes cluster administrator, you will need to create config files for the developers of your company. This will allow them to access the cluster and deploy their applications.

The first step is to create a certificate for the developer. This certificate will be used to authenticate the developer to the cluster.

The certificate will be signed by the cluster CA. As we saw previously, the cluster CA is the certificate authority used to sign the certificates of the cluster components, and it is created when the cluster is created.

To create a certificate for a developer, you will need to create a certificate signing request (CSR), which will then be signed by the cluster CA. The CSR contains the name of the developer (its Common Name), which Kubernetes will use as the username for authentication and RBAC.

Once you have created the certificate, you will need to create a kubeconfig file for the developer. The kubeconfig file contains the developer's certificate and the address of the cluster; the developer will use it to access the cluster.

Here is an example of a kubeconfig file for a developer named employee:

apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: CA_LOCATION/ca.crt
    server: https://KUBERNETES_ADDRESS:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: employee
  name: employee
current-context: employee
users:
- name: employee
  user:
    client-certificate: DEVELOPER_CERTIFICATE_LOCATION/employee.crt
    client-key: DEVELOPER_CERTIFICATE_LOCATION/employee.key

In the client-certificate and client-key fields, you will need to replace DEVELOPER_CERTIFICATE_LOCATION with the location of the developer's certificate and key. Alternatively, you can embed the files directly by using the client-certificate-data and client-key-data fields with the base64-encoded content of employee.crt and employee.key.

The same goes for the certificate-authority field: replace CA_LOCATION with the location of the cluster CA certificate, or embed it using the certificate-authority-data field with the base64-encoded content of ca.crt.

In the server field, you will need to replace KUBERNETES_ADDRESS with the address of the cluster.

Once you have created the kubeconfig file, give it to the developer, who will be able to use it to access the cluster.

Concrete example

Let's say that you are the administrator of a cluster named kubernetes. You need to create a certificate for a developer named employee, build a kubeconfig file for them, and then hand that file over.

To create the certificate, SSH into the control plane node and run the following command:

openssl genrsa -out employee.key 2048

This command creates the developer's private key, which will be used to sign their certificate signing request.

Then run the following command:

openssl req -new -key employee.key -out employee.csr -subj "/CN=employee"

This command creates the developer's certificate signing request. The -subj "/CN=employee" option sets the Common Name, which Kubernetes will use as the username for authentication.

Then run the following command:

openssl x509 -req -in employee.csr -CA /etc/kubernetes/pki/ca.crt -CAkey /etc/kubernetes/pki/ca.key -CAcreateserial -out employee.crt -days 500

This command creates the developer's certificate, signed by the cluster CA and valid for 500 days.

Then run the following command:

kubectl config set-cluster kubernetes --certificate-authority=/etc/kubernetes/pki/ca.crt --embed-certs=true --server=https://KUBERNETES_ADDRESS:6443 --kubeconfig=/tmp/employee.kubeconfig

This command creates a cluster entry in the kubeconfig file. It contains the address of the cluster and, thanks to --embed-certs, the CA certificate used to verify it.

Then run the following command:

kubectl config set-credentials employee --client-certificate=employee.crt --client-key=employee.key --embed-certs=true --kubeconfig=/tmp/employee.kubeconfig

This command creates a user entry in the kubeconfig file containing the developer's certificate and key, which will be used to authenticate the developer to the cluster.

Then run the following command:

kubectl config set-context employee --cluster=kubernetes --user=employee --kubeconfig=/tmp/employee.kubeconfig

This command creates a context entry in the kubeconfig file that ties the cluster entry and the user entry together.

Then run the following command:

kubectl config use-context employee --kubeconfig=/tmp/employee.kubeconfig

This command sets the current context of the kubeconfig file to the developer's context.

Finally, run the following command:

kubectl config view --flatten --minify --kubeconfig=/tmp/employee.kubeconfig > ~/employee.kubeconfig

This command flattens the kubeconfig file into a single self-contained file, with the certificates embedded, that can easily be handed over.

Then give the employee.kubeconfig file to the developer, who will be able to use it to access the cluster.

Test the kubeconfig file

To test the kubeconfig file, use it with the kubectl command:

kubectl get pods --kubeconfig=employee.kubeconfig

This command should return the following error:

Error from server (Forbidden): pods is forbidden: User "employee" cannot list resource "pods" in API group "" in the namespace "default"

This error means that the developer is not authorized to list pods in the default namespace. This is normal: we have not given the developer any permissions yet. We will do that in the next section.

But notice that we successfully authenticated against the API server, which means the kubeconfig file is working.

Add permissions to the developer

The developer can now access the cluster but cannot do anything yet, because they have no permissions.

To grant some permissions, we will create a Role and then bind it to the developer with a RoleBinding.

To create the role, run the following command:

kubectl create role developer --verb=get,list,watch --resource=pods --namespace=default

This command creates a role named developer that allows getting, listing and watching pods in the default namespace.

To create the role binding, run the following command:

kubectl create rolebinding developer --role=developer --user=employee --namespace=default

This command creates a role binding named developer that binds the developer role to the user employee in the default namespace.

Test the permissions

To test the permissions, run the same command as before with the developer's kubeconfig:

kubectl get pods --kubeconfig=employee.kubeconfig

This command should return the following output:

No resources found in default namespace.

This output means that the developer is now authorized to list pods in the default namespace (there are simply no pods in it yet).
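You can also check the permissions without creating any resources, using kubectl auth can-i with the developer's kubeconfig:

kubectl auth can-i list pods --kubeconfig=employee.kubeconfig
# yes
kubectl auth can-i create deployments --kubeconfig=employee.kubeconfig
# no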

Conclusion

In this article, we have seen how to create a kubeconfig file for a developer and how to give them some permissions.

That was the last main article. In the next section I will give you some bonus exercises you can do if you want to go further.

Bonus

Well, this is the end of this tutorial. I hope you enjoyed it and learned something. If you want to learn more about Kubernetes, I recommend reading the official documentation: it is very good and you will find everything you need to know there.

For people who want to go further, I have a few suggestions:

  • Kubernetes the hard way
  • Use kubespray to install Kubernetes
  • Try to set up a LoadBalancer with a cloud provider other than Scaleway
  • Deploy a highly available cluster with 5 nodes instead of 3 (3 control plane nodes and 2 workers)