Migrating HA Kubernetes Cluster from Rocky Linux 8 to Rocky Linux 9

Kubernetes homelab migration to the latest version of Rocky Linux.

The Upgrade Plan

We are going to upgrade our Kubernetes homelab nodes from Rocky 8 to Rocky 9.

We have a cluster of six nodes: three control planes and three worker nodes, all of which are KVM guests running Rocky 8.

We will upgrade the control plane nodes first, one at a time using Packer images and Ansible playbooks, and then upgrade the worker nodes, also one at a time, using the same approach.

This is a lengthy but zero-downtime process, and it does not require rebuilding the Kubernetes cluster from scratch. Note that we will not be upgrading the Kubernetes version itself.

Software versions before the upgrade:

  1. Rocky 8
  2. Containerd 1.6
  3. Kubernetes 1.26
  4. Calico 3.25
  5. Istio 1.17

Software versions after the upgrade:

  1. Rocky 9
  2. Containerd 1.6
  3. Kubernetes 1.26
  4. Calico 3.25
  5. Istio 1.17

SELinux is set to enforcing mode.
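
A quick way to confirm this on any of the nodes (run locally or over SSH):

$ getenforce
Enforcing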

Configuration Files

For the Packer setup, see the kubernetes-homelab GitHub repository (https://github.com/lisenet/kubernetes-homelab).

For the Ansible playbooks, see the homelab-ansible GitHub repository (https://github.com/lisenet/homelab-ansible).

Cluster Information

$ kubectl get nodes -o wide
NAME    STATUS   ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION              CONTAINER-RUNTIME
srv31   Ready    control-plane   347d   v1.26.4   10.11.1.31    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   containerd://1.6.20
srv32   Ready    control-plane   347d   v1.26.4   10.11.1.32    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   containerd://1.6.20
srv33   Ready    control-plane   477d   v1.26.4   10.11.1.33    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   containerd://1.6.20
srv34   Ready    <none>          477d   v1.26.4   10.11.1.34    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   containerd://1.6.20
srv35   Ready    <none>          347d   v1.26.4   10.11.1.35    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   containerd://1.6.20
srv36   Ready    <none>          477d   v1.26.4   10.11.1.36    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   containerd://1.6.20

Build a Rocky 9 KVM Image with Packer

First of all, we need to build a Rocky 9 KVM image using Packer.

$ git clone https://github.com/lisenet/kubernetes-homelab.git
$ cd ./kubernetes-homelab/packer
$ PACKER_LOG=1 packer build ./rocky9.json
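
Before copying the image anywhere, it is worth sanity-checking the resulting artifact. A minimal check, assuming Packer writes its output under the artifacts directory referenced later in this guide:

$ qemu-img info ./artifacts/qemu/rocky9/rocky9.qcow2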

Upgrade the First Control Plane Node

We will start with srv31.

Drain and Delete Control Plane from Kubernetes Cluster

Drain and delete the control plane from the cluster:

$ kubectl drain srv31 --ignore-daemonsets
$ kubectl delete node srv31

Make sure the node is no longer in the Kubernetes cluster:

$ kubectl get nodes
NAME    STATUS   ROLES           AGE    VERSION
srv32   Ready    control-plane   347d   v1.26.4
srv33   Ready    control-plane   477d   v1.26.4
srv34   Ready    <none>          477d   v1.26.4
srv35   Ready    <none>          347d   v1.26.4
srv36   Ready    <none>          477d   v1.26.4

The cluster will remain operational as long as the other two control planes stay online, since etcd still has quorum with two of its three members.
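
If in doubt, a quick check that the API is still being served through the load balancer while srv31 is down (readyz is a standard kube-apiserver endpoint):

$ kubectl get --raw='/readyz'
ok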

Delete Control Plane from Etcd Cluster

Etcd will have a record of all three control plane nodes. We therefore have to delete the control plane node from the Etcd cluster too.

$ kubectl get pods -n kube-system -l component=etcd -o wide
NAME         READY   STATUS    RESTARTS     AGE   IP           NODE    NOMINATED NODE   READINESS GATES
etcd-srv32   1/1     Running   4 (2d ago)   20d   10.11.1.32   srv32   <none>           <none>
etcd-srv33   1/1     Running   4 (2d ago)   20d   10.11.1.33   srv33   <none>           <none>

Query the cluster for the Etcd members:

$ kubectl exec etcd-srv32 \
  -n kube-system -- etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key \
  member list
c36952e9f5bf4f49, started, srv33, https://10.11.1.33:2380, https://10.11.1.33:2379, false
c44657d8f6e7dea5, started, srv31, https://10.11.1.31:2380, https://10.11.1.31:2379, false
e279a8288f4be237, started, srv32, https://10.11.1.32:2380, https://10.11.1.32:2379, false

Delete the member for control plane srv31:

$ kubectl exec etcd-srv32 \
  -n kube-system -- etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key \
  member remove c44657d8f6e7dea5
Member c44657d8f6e7dea5 removed from cluster 53e3f96426ba03f3
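
Optionally, confirm that the two remaining members are healthy before rebuilding srv31, using the same certificates as above:

$ kubectl exec etcd-srv32 \
  -n kube-system -- etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key \
  endpoint health --cluster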

Delete Control Plane KVM Guest

SSH into the hypervisor where the control plane server is running, and stop the VM:

$ ssh [email protected] "virsh destroy srv31-master"
Domain 'srv31-master' destroyed

Delete the current KVM snapshot (it’s the one from the previous Kubernetes upgrade):

$ ssh [email protected] "virsh snapshot-delete srv31-master --current"

Delete the control plane server image, including its storage:

$ ssh [email protected] "virsh undefine srv31-master --remove-all-storage"
Domain srv31-master has been undefined
Volume 'vda'(/var/lib/libvirt/images/srv31.qcow2) removed.

Create a Rocky Linux Control Plane KVM Guest

Copy the Rocky 9 image that was built with Packer to the hypervisor hosting srv31:

$ scp ./packer/artifacts/qemu/rocky9/rocky9.qcow2 [email protected]:/var/lib/libvirt/images/srv31.qcow2

Provision a new srv31 control plane KVM guest:

$ virt-install \
  --connect qemu+ssh://[email protected]/system \
  --name srv31-master \
  --network bridge=br0,model=virtio,mac=C0:FF:EE:D0:5E:31 \
  --disk path=/var/lib/libvirt/images/srv31.qcow2,size=32 \
  --import \
  --ram 4096 \
  --vcpus 2 \
  --os-type linux \
  --os-variant centos8 \
  --sound none \
  --rng /dev/urandom \
  --virt-type kvm \
  --wait 0
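
Because of --wait 0, virt-install returns immediately, so give the guest a moment to boot before continuing. A minimal wait loop, assuming the DHCP server hands out the usual 10.11.1.31 address based on the MAC set above:

$ until ping -c 1 -W 2 10.11.1.31 > /dev/null 2>&1; do echo "Waiting for srv31..."; sleep 10; done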

Once the server is up, set up passwordless root authentication and run the Ansible playbook to configure the Kubernetes homelab environment:

$ git clone https://github.com/lisenet/homelab-ansible.git
$ cd ./homelab-ansible
$ ssh-copy-id -f -i ./roles/hl.users/files/id_rsa_root.pub [email protected]
$ ansible-playbook ./playbooks/configure-k8s-hosts.yml
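
To avoid touching the nodes that are still in service, the play can be restricted to the rebuilt host, assuming it is defined as srv31 in the Ansible inventory (adjust the host name to match your inventory):

$ ansible-playbook ./playbooks/configure-k8s-hosts.yml --limit srv31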

Prepare Kubernetes Cluster for Control Plane Node to Join

SSH into a working control plane node, srv32, and re-upload certificates:

$ ssh [email protected] "kubeadm init phase upload-certs --upload-certs"
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
d6c4506ef4f1150686b05599fe7019b3adcf914eaaba3e602a3e0d8f8efd0a78

Print the join command on the same control plane node:

$ ssh [email protected] "kubeadm token create --print-join-command"
kubeadm join kubelb.hl.test:6443 --token fkfjv6.hp756ohdx6bv2hll --discovery-token-ca-cert-hash sha256:e98d5740c0ff6d5fd567cba755e27ea57fcc06fd694436a90ad632813351aae1
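
Note that bootstrap tokens expire after 24 hours by default, so generate the join command shortly before using it. Existing tokens can be inspected on the same control plane node with:

$ kubeadm token list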

SSH into the newly created control plane srv31 and join the Kubernetes cluster:

$ ssh [email protected] \
  "kubeadm join kubelb.hl.test:6443 --token fkfjv6.hp756ohdx6bv2hll \
  --discovery-token-ca-cert-hash sha256:e98d5740c0ff6d5fd567cba755e27ea57fcc06fd694436a90ad632813351aae1  \
  --control-plane \
  --certificate-key d6c4506ef4f1150686b05599fe7019b3adcf914eaaba3e602a3e0d8f8efd0a78"

Restart kubelet on srv31:

$ ssh [email protected] "systemctl restart kubelet"

Check cluster status:

$ kubectl get nodes -o wide
NAME    STATUS   ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION                 CONTAINER-RUNTIME
srv31   Ready    control-plane   11m    v1.26.4   10.11.1.31    <none>        Rocky Linux 9.2 (Blue Onyx)        5.14.0-284.18.1.el9_2.x86_64   containerd://1.6.20
srv32   Ready    control-plane   347d   v1.26.4   10.11.1.32    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20
srv33   Ready    control-plane   477d   v1.26.4   10.11.1.33    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20
srv34   Ready    <none>          477d   v1.26.4   10.11.1.34    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20
srv35   Ready    <none>          348d   v1.26.4   10.11.1.35    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20
srv36   Ready    <none>          477d   v1.26.4   10.11.1.36    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20

We have our very first control plane running on Rocky 9.

Repeat the process for the other two control planes, srv32 and srv33.

Do not proceed further until all control planes have been upgraded:

$ kubectl get nodes -o wide
NAME    STATUS   ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION                 CONTAINER-RUNTIME
srv31   Ready    control-plane   89m    v1.26.4   10.11.1.31    <none>        Rocky Linux 9.2 (Blue Onyx)        5.14.0-284.18.1.el9_2.x86_64   containerd://1.6.20
srv32   Ready    control-plane   32m    v1.26.4   10.11.1.32    <none>        Rocky Linux 9.2 (Blue Onyx)        5.14.0-284.18.1.el9_2.x86_64   containerd://1.6.20
srv33   Ready    control-plane   52s    v1.26.4   10.11.1.33    <none>        Rocky Linux 9.2 (Blue Onyx)        5.14.0-284.18.1.el9_2.x86_64   containerd://1.6.20
srv34   Ready    <none>          477d   v1.26.4   10.11.1.34    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20
srv35   Ready    <none>          348d   v1.26.4   10.11.1.35    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20
srv36   Ready    <none>          477d   v1.26.4   10.11.1.36    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20
$ kubectl -n kube-system get pods -o wide
NAME                                       READY   STATUS    RESTARTS   AGE     IP               NODE    NOMINATED NODE   READINESS GATES
calico-kube-controllers-57b57c56f-l9ps5    1/1     Running   0          24m     192.168.134.2    srv31   <none>           <none>
calico-node-4x79f                          1/1     Running   0          164m    10.11.1.31       srv31   <none>           <none>
calico-node-54c25                          1/1     Running   0          29m     10.11.1.32       srv32   <none>           <none>
calico-node-7fmzb                          1/1     Running   1          9d      10.11.1.36       srv36   <none>           <none>
calico-node-hvh28                          1/1     Running   0          4m39s   10.11.1.33       srv33   <none>           <none>
calico-node-p5vkt                          1/1     Running   1          9d      10.11.1.35       srv35   <none>           <none>
calico-node-stfm6                          1/1     Running   1          9d      10.11.1.34       srv34   <none>           <none>
coredns-787d4945fb-9dq4q                   1/1     Running   0          110m    192.168.134.1    srv31   <none>           <none>
coredns-787d4945fb-k67rx                   1/1     Running   0          24m     192.168.134.3    srv31   <none>           <none>
etcd-srv31                                 1/1     Running   0          157m    10.11.1.31       srv31   <none>           <none>
etcd-srv32                                 1/1     Running   0          26m     10.11.1.32       srv32   <none>           <none>
etcd-srv33                                 1/1     Running   0          4m36s   10.11.1.33       srv33   <none>           <none>
kube-apiserver-srv31                       1/1     Running   6          164m    10.11.1.31       srv31   <none>           <none>
kube-apiserver-srv32                       1/1     Running   4          29m     10.11.1.32       srv32   <none>           <none>
kube-apiserver-srv33                       1/1     Running   0          4m38s   10.11.1.33       srv33   <none>           <none>
kube-controller-manager-srv31              1/1     Running   0          164m    10.11.1.31       srv31   <none>           <none>
kube-controller-manager-srv32              1/1     Running   0          29m     10.11.1.32       srv32   <none>           <none>
kube-controller-manager-srv33              1/1     Running   0          4m38s   10.11.1.33       srv33   <none>           <none>
kube-proxy-5d25q                           1/1     Running   0          4m39s   10.11.1.33       srv33   <none>           <none>
kube-proxy-bpbrc                           1/1     Running   0          29m     10.11.1.32       srv32   <none>           <none>
kube-proxy-ltssd                           1/1     Running   1          9d      10.11.1.36       srv36   <none>           <none>
kube-proxy-rqmk6                           1/1     Running   0          164m    10.11.1.31       srv31   <none>           <none>
kube-proxy-z9wg2                           1/1     Running   2          9d      10.11.1.35       srv35   <none>           <none>
kube-proxy-zkj8c                           1/1     Running   1          9d      10.11.1.34       srv34   <none>           <none>
kube-scheduler-srv31                       1/1     Running   0          164m    10.11.1.31       srv31   <none>           <none>
kube-scheduler-srv32                       1/1     Running   0          29m     10.11.1.32       srv32   <none>           <none>
kube-scheduler-srv33                       1/1     Running   0          4m38s   10.11.1.33       srv33   <none>           <none>
metrics-server-77dff74649-lkhll            1/1     Running   0          146m    192.168.135.194  srv34   <none>           <none>
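
It is also worth confirming that Etcd is back to three started members, using the same etcdctl invocation as before:

$ kubectl exec etcd-srv32 \
  -n kube-system -- etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key \
  member list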

Upgrade Worker Nodes

We will start with srv34.

Drain and Delete Worker Node from Kubernetes Cluster

$ kubectl drain srv34 --delete-emptydir-data --ignore-daemonsets
$ kubectl delete node srv34

Make sure the node is no longer in the Kubernetes cluster:

$ kubectl get nodes
NAME    STATUS   ROLES           AGE   VERSION
srv31   Ready    control-plane   89m   v1.26.4
srv32   Ready    control-plane   32m   v1.26.4
srv33   Ready    control-plane   52s   v1.26.4
srv35   Ready    <none>          348d  v1.26.4
srv36   Ready    <none>          477d  v1.26.4

Stop the server:

$ ssh [email protected] "virsh destroy srv34-node"
Domain srv34-node destroyed

Delete the current snapshot:

$ ssh [email protected] "virsh snapshot-delete srv34-node --current"

Delete the server, including its storage:

$ ssh [email protected] "virsh undefine srv34-node --remove-all-storage"
Domain srv34-node has been undefined
Volume 'vda'(/var/lib/libvirt/images/srv34.qcow2) removed.

Create a Rocky Linux Worker Node KVM Guest

Copy the Rocky 9 image that was built with Packer to the hypervisor hosting srv34:

$ scp ./packer/artifacts/qemu/rocky9/rocky9.qcow2 [email protected]:/var/lib/libvirt/images/srv34.qcow2

Provision a new srv34 worker node KVM guest:

$ virt-install \
  --connect qemu+ssh://[email protected]/system \
  --name srv34-node \
  --network bridge=br0,model=virtio,mac=C0:FF:EE:D0:5E:34 \
  --disk path=/var/lib/libvirt/images/srv34.qcow2,size=32 \
  --import \
  --ram 8192 \
  --vcpus 4 \
  --os-type linux \
  --os-variant centos8 \
  --sound none \
  --rng /dev/urandom \
  --virt-type kvm \
  --wait 0

Once the server is up, set up passwordless root authentication and run the Ansible playbook to configure the Kubernetes homelab environment:

$ cd ./homelab-ansible
$ ssh-copy-id -f -i ./roles/hl.users/files/id_rsa_root.pub [email protected]
$ ansible-playbook ./playbooks/configure-k8s-hosts.yml

SSH into the newly created worker node srv34 and join the Kubernetes cluster:

$ ssh [email protected] \
  "kubeadm join kubelb.hl.test:6443 --token fkfjv6.hp756ohdx6bv2hll \
  --discovery-token-ca-cert-hash sha256:e98d5740c0ff6d5fd567cba755e27ea57fcc06fd694436a90ad632813351aae1 "

Restart kubelet on srv34:

$ ssh [email protected] "systemctl restart kubelet"

Check cluster status:

$ kubectl get nodes -o wide
NAME    STATUS   ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION                 CONTAINER-RUNTIME
srv31   Ready    control-plane   109m   v1.26.4   10.11.1.31    <none>        Rocky Linux 9.2 (Blue Onyx)        5.14.0-284.18.1.el9_2.x86_64   containerd://1.6.20
srv32   Ready    control-plane   52m    v1.26.4   10.11.1.32    <none>        Rocky Linux 9.2 (Blue Onyx)        5.14.0-284.18.1.el9_2.x86_64   containerd://1.6.20
srv33   Ready    control-plane   21m    v1.26.4   10.11.1.33    <none>        Rocky Linux 9.2 (Blue Onyx)        5.14.0-284.18.1.el9_2.x86_64   containerd://1.6.20
srv34   Ready    <none>          38s    v1.26.4   10.11.1.34    <none>        Rocky Linux 9.2 (Blue Onyx)        5.14.0-284.18.1.el9_2.x86_64   containerd://1.6.20
srv35   Ready    <none>          348d   v1.26.4   10.11.1.35    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20
srv36   Ready    <none>          477d   v1.26.4   10.11.1.36    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20
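
Optionally, check which pods have already been scheduled onto the rebuilt worker node:

$ kubectl get pods -A -o wide --field-selector spec.nodeName=srv34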

Repeat the process for the other two worker nodes, srv35 and srv36.

The end result should be all nodes running Rocky 9:

$ kubectl get nodes -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion,OS-IMAGE:.status.nodeInfo.osImage,KERNEL:.status.nodeInfo.kernelVersion
NAME    VERSION   OS-IMAGE                      KERNEL
srv31   v1.26.4   Rocky Linux 9.2 (Blue Onyx)   5.14.0-284.18.1.el9_2.x86_64
srv32   v1.26.4   Rocky Linux 9.2 (Blue Onyx)   5.14.0-284.18.1.el9_2.x86_64
srv33   v1.26.4   Rocky Linux 9.2 (Blue Onyx)   5.14.0-284.18.1.el9_2.x86_64
srv34   v1.26.4   Rocky Linux 9.2 (Blue Onyx)   5.14.0-284.18.1.el9_2.x86_64
srv35   v1.26.4   Rocky Linux 9.2 (Blue Onyx)   5.14.0-284.18.1.el9_2.x86_64
srv36   v1.26.4   Rocky Linux 9.2 (Blue Onyx)   5.14.0-284.18.1.el9_2.x86_64
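
As a final sanity check, make sure no pods are left in a non-running state after the migration; a healthy cluster should return no resources here:

$ kubectl get pods -A --field-selector status.phase!=Running,status.phase!=Succeeded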
