Upgrading Oracle Kubernetes Engine
In a previous post I explained how Oracle Cloud's generous free tier allows you to run a 4-node Kubernetes cluster free of charge. However, there are one or two "gotchas" that can lead to unexpected costs, one of which arises when upgrading the cluster's nodes. Before you can upgrade the nodes, though, you first need to upgrade the cluster control plane.
Upgrading the control plane
Usually the first warning you get that an upgrade is available is when Oracle send out an email announcing that a new Kubernetes version is now available within OKE. When you log in to the OKE interface and examine your cluster, you will see the following:
Note that there is a warning triangle next to the current version (v1.26.2), which indicates that an update is available. Click on your cluster, and at the top of the page you will see that the "New Kubernetes version available" button is active:
Click on this button and you will begin the cluster upgrade process. Review the information and the warnings, and notice that there is a drop down at the bottom with the available Kubernetes versions to which you can upgrade your cluster (v1.27.2 in my case):
When you are ready to proceed, click the "Upgrade" button at the bottom. The upgrade process takes a few minutes, but when it is complete you will see that your Kubernetes version has been updated to the selected version and that the warning triangle has been replaced by a green check mark.
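If you would rather confirm the upgrade from the command line, the following is a minimal sketch that reads the API server version from `kubectl get --raw /version` and compares it with the version you selected. It assumes your kubeconfig already points at the OKE cluster; the function names are hypothetical, and the check tolerates a `-` or `+` suffix in case your provider appends one to the version string.

```shell
#!/usr/bin/env bash
# Minimal sketch: confirm the OKE control plane reports the expected version.
# Assumes kubectl is already configured for the cluster; function names are hypothetical.

# Print the API server's gitVersion (e.g. v1.27.2) from the /version endpoint.
control_plane_version() {
  kubectl get --raw /version |
    sed -n 's/.*"gitVersion": *"\([^"]*\)".*/\1/p'
}

# Succeed if the control plane is on the expected version (allowing a -/+ suffix).
check_control_plane() {
  local expected="$1" actual
  actual="$(control_plane_version)"
  case "$actual" in
    "$expected" | "$expected"[-+]*)
      echo "control plane is on $actual"
      ;;
    *)
      echo "control plane is on $actual, expected $expected" >&2
      return 1
      ;;
  esac
}
```

For example, `check_control_plane v1.27.2` should succeed once the upgrade has finished.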
Upgrading the OKE node pool
Upgrading the control plane does not automatically upgrade your node pool or the nodes within it. You must upgrade the node pool after the control plane before you can upgrade the nodes themselves.
To do so, within your cluster configuration go to "Node pools" and you should see your pool with your four worker nodes. You will also see that the Kubernetes version is out of date, and that you have the same warning triangle to inform you that an upgrade is available:
Click on your node pool ("pool1" in my case) and from the buttons at the top, click "Edit":
Within the edit dialog you have the option to change the Kubernetes version of the node pool. The "Version" drop-down should include the new Kubernetes version to which you just upgraded your control plane (v1.27.2 in my case):
Set the version to match your control plane (again v1.27.2 in my case). When you select the version, the "shape" of your nodes will also be updated so that the OS image is upgraded to the latest version of Oracle Linux.
Save the changes. It takes around 10 minutes for Oracle to upgrade your node pool to the selected version, but when it's completed you will see that the warning triangle on your node pool has again been replaced by a green check mark.
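The same node pool change can also be scripted. This is a minimal sketch, assuming the OCI CLI is installed and configured and that you have copied the node pool's OCID from the console; the function names are hypothetical, and it is worth checking `oci ce node-pool update --help` for your CLI version. Note that it only passes the Kubernetes version, so unlike the console dialog it may not switch your nodes to a newer Oracle Linux image.

```shell
#!/usr/bin/env bash
# Minimal sketch: read and update a node pool's Kubernetes version via the OCI CLI.
# Assumes a configured OCI CLI; the node pool OCID is supplied by you (placeholder).

# Print the Kubernetes version currently set on the node pool.
node_pool_version() {
  oci ce node-pool get --node-pool-id "$1" \
    --query 'data."kubernetes-version"' --raw-output
}

# Set the node pool's Kubernetes version (this should match the control plane).
# Only the version is passed; no node image details are supplied by this call.
upgrade_node_pool() {
  local node_pool_id="$1" version="$2"
  oci ce node-pool update \
    --node-pool-id "$node_pool_id" \
    --kubernetes-version "$version"
}
```

For example, `upgrade_node_pool "$NODE_POOL_ID" v1.27.2` requests the change, and `node_pool_version "$NODE_POOL_ID"` lets you check the value afterwards.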
Upgrading Nodes
Upgrading your control plane and node pool is pretty straightforward. Upgrading the nodes themselves, however, can be tricky, and it's at this point that you might incur some cost if you are unlucky or not careful enough.
Within your node pool you should see the list of nodes that are currently online within your cluster. Next to each you should see the now-familiar warning triangle, telling you that the Kubernetes version on your nodes is now out-of-date:
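You can get the same picture from kubectl. This minimal sketch lists the nodes whose kubelet version differs from a target version, reading `.status.nodeInfo.kubeletVersion` through `custom-columns`; the function name is hypothetical, and the target (for example `v1.27.2`) is whatever you chose for your control plane.

```shell
#!/usr/bin/env bash
# Minimal sketch: print the names of nodes whose kubelet is not on the target version.
# In OKE the node name is the node's private IP address (e.g. 10.0.10.226).
outdated_nodes() {
  local target="$1"
  kubectl get nodes --no-headers \
    -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion |
    awk -v target="$target" '$2 != target { print $1 }'
}
```

`outdated_nodes v1.27.2` prints one node name per line; an empty result means every node has been upgraded.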
The process of upgrading them is simple: you just delete the old node, and OKE will automatically provision a new node to replace it. Thanks to your node pool settings, this new node will be on the updated version of Kubernetes.
However, simply deleting the nodes might cause problems for your workloads if they are not set up to handle node failure. A more controlled approach is to cordon and drain each node before deleting it, so that your workloads are moved to the other nodes gracefully. OKE does cordon and drain nodes itself before deleting them, but depending on how important your workloads are, you might prefer to control the process manually. To do so, run the following kubectl commands, using the name of the node you plan to delete (in OKE this is the node's private IP address):
kubectl cordon 10.0.10.226
kubectl drain 10.0.10.226 --delete-emptydir-data --ignore-daemonsets --force
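If you want to wrap those steps up, here is a minimal sketch with two hypothetical helpers: one lists the pods scheduled on a node (via the `spec.nodeName` field selector) so you can see what will be evicted, and one cordons and drains the node with a timeout. It uses `--delete-emptydir-data`, the current name of the deprecated `--delete-local-data` flag, and it deliberately omits `--force`, so the drain stops rather than deleting pods that no controller would recreate. Add `--force` back if you accept that.

```shell
#!/usr/bin/env bash
# Minimal sketch: inspect and drain one node before deleting it in the console.
# The node name is its private IP in OKE (e.g. 10.0.10.226); helper names are hypothetical.

# Show the pods currently scheduled on the node, so you know what will be evicted.
pods_on_node() {
  kubectl get pods --all-namespaces -o wide \
    --field-selector "spec.nodeName=$1"
}

# Cordon, then drain the node, giving up after 10 minutes rather than hanging.
# (drain also cordons on its own; the explicit cordon mirrors the manual steps.)
drain_node() {
  local node="$1"
  kubectl cordon "$node" &&
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=10m
}
```

For example, run `pods_on_node 10.0.10.226` to review the workloads, then `drain_node 10.0.10.226` before deleting the node in the UI.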
Then, when your node is cordoned and drained, delete it using the drop down in the UI:
Delete the node and confirm. Your node will be deleted and OKE will begin creating a new one:
After 5-10 minutes the new replacement node will be online and ready for workloads, and the deleted node will completely disappear:
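Rather than watching the console, you can wait for the replacement from the command line. This minimal sketch counts nodes whose status is exactly `Ready` (a cordoned node shows `Ready,SchedulingDisabled` and is not counted) and polls every 30 seconds until the expected count is reached or a timeout expires. The default of four nodes matches the cluster in this post; the function names and the 15-minute default timeout are assumptions.

```shell
#!/usr/bin/env bash
# Minimal sketch: wait until the cluster has the expected number of Ready nodes.
# Defaults (4 nodes, 900 s timeout) are assumptions based on this post's setup.

# Count nodes whose STATUS column is exactly "Ready" (cordoned nodes are excluded).
ready_node_count() {
  kubectl get nodes --no-headers | awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Poll every 30 seconds until enough nodes are Ready, or give up after the timeout.
wait_for_ready_nodes() {
  local expected="${1:-4}" timeout="${2:-900}" waited=0
  until [ "$(ready_node_count)" -ge "$expected" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $expected Ready nodes" >&2
      return 1
    fi
    sleep 30
    waited=$((waited + 30))
  done
  echo "$(ready_node_count) nodes Ready"
}
```

After deleting a node, `wait_for_ready_nodes 4` returns once the replacement has joined, at which point you can move on to the next node.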
Simply repeat the cordon-drain-delete process for each of the remaining three nodes, one at a time. Eventually, all of your new nodes should be up and running on the new version.
That is the process completed, but where does the problem lie? How is it possible to incur cost?
The Problem
The issue seems to arise when the boot volumes of some of your VMs are not correctly marked as "Always Free", despite the total amount of disk being within the 200GB free tier limit. I think this may occur because, at certain points during the upgrade process, you will be running five nodes, as OKE surges an additional node to replace the one you are deleting. Somehow the boot volume of the newer node never gets re-evaluated and marked as an "Always Free" resource.
This issue is evident if, after completing the entire upgrade process, you go to Storage > Block Volumes > Boot Volumes, and see something like this:
Note how only three of the boot volumes are marked as "Always Free", whilst the fourth is chargeable.
The Fix
Thankfully the fix is relatively straightforward. You simply need to find the machine that the rogue, chargeable boot volume relates to, and delete it from your Kubernetes cluster in the same way that we did during the upgrade process.
In my case, the rogue volume corresponded to the machine 10.0.10.10 in my cluster. Just cordon, drain and delete this machine and it will be recreated automatically by OKE. The new node's boot volume should then rightfully be marked as "Always Free" and you will continue to enjoy a fully free service.
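To tie the rogue boot volume back to a Kubernetes node, you can match the instance it is attached to against each node's `spec.providerID`. This is a minimal sketch assuming OKE records the compute instance OCID in `providerID` (the match is a substring check, so any prefix on that value is tolerated); the function name is hypothetical, and the instance OCID is the one you would copy from the boot volume's attached instance in the console.

```shell
#!/usr/bin/env bash
# Minimal sketch: map a compute instance OCID to its Kubernetes node name.
# Assumes each node's spec.providerID contains the instance OCID (unverified assumption).
node_for_instance() {
  local instance_ocid="$1"
  kubectl get nodes --no-headers \
    -o custom-columns=NAME:.metadata.name,PROVIDER:.spec.providerID |
    awk -v id="$instance_ocid" 'index($2, id) { print $1 }'
}
```

The printed node name (for example 10.0.10.10) is what you then pass to `kubectl cordon` and `kubectl drain`.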
Block storage is billed in 24-hour increments, so if you catch this quickly (i.e. on the same day that you carry out the upgrade), you should not be charged.
Having completed this process you should now see all of your boot volumes correctly marked as "Always Free":
Conclusion
Hopefully this post has given you everything you need to properly upgrade your OKE cluster to the newest available version of Kubernetes. It's quite a slow process and not without its pitfalls, but I believe it's still a great resource to have for labbing and learning and I remain a huge fan of OCI's free tier.