Getting started with Kubernetes on Ubuntu
Post edit note: This is my excruciating journey migrating into the world of Kubernetes from a docker-compose based world. The end goal is a setup where I can use traefik with both docker compose and Kubernetes services. If you just want to get it done, I suggest you skip to the end where I've provided a working implementation. Otherwise, you can enjoy my raw notes and learnings in full.
I've reached an infrastructure maturity point where I'm beginning to run into challenges with managing multiple containers. I'm currently running my applications with docker compose, which is okay for my use case since I'm serving small internal tools to a user community of fewer than 100, and the applications are not mission critical.
I'm also running into limitations on how quickly I can develop, deploy, and update applications. The current deployment process is very manual: I hand-write a docker-compose file with my services, and every time I want to make an update, I have to build the docker image, push it to my image repository, and then re-run docker compose. I want to be able to run my development workflow on my desktop and then push the code to a git repository, which should automatically build and deploy the new container (CI/CD). I have zero previous experience with this kind of workflow, but after a lot of research and conversations with people who have worked with such workflows, I have decided I need to migrate from docker compose to Kubernetes. Of course, I have some unique challenges in my environment: it's an air-gapped server which is going to be the only node, serving as both master and worker.
I have set up Kubernetes on the Ubuntu server using the microk8s snap package.
I was able to get the dashboard running using the instructions at https://microk8s.io/docs/addon-dashboard
Some of the things I'm struggling with at this point are:
- How do I set up a custom SSL certificate for the UI dashboard?
- How do I set up traefik as the ingress controller?
Perhaps just setting this up and exposing the dashboard service via it will kill two birds with one stone.
I have set up the traefik dashboard to work with Kubernetes using the scripts available at https://docs.traefik.io/v2.0/user-guides/crd-acme/
I saved the yamls in different files and used kubectl apply -f (unknown). I also removed the Let's Encrypt stuff.
- Configure SSL certificates for this setup of traefik
- Configure IngressRoutes to point to a service running within the cluster
Note: I had to docker-compose down the existing traefik2 instance to release the ports it held. However, 443 was still not bindable, possibly because some other container was using it after traefik shut down. Edit: forwarding port 443 requires root account privileges to work.
Configuring TLS Certificates
Setting up certificates required understanding a couple of new Kubernetes concepts:
- ConfigMap - an object that can store configuration independent of the pod
- Secret - an object to store secrets such as TLS certificates, keys, auth tokens, etc.
In the end, I only needed the Secret to apply TLS to the connection. The Kubernetes documentation about secrets was pretty clear on how they work. I essentially renamed my certificate and associated private key to tls.crt and tls.key and read them as files to create a single secret like so:
kubectl create secret generic sub-domain-cert --from-file=./tls.crt --from-file=./tls.key
Subsequently, I saved the secret manifest so I can apply it easily in the future, like so:
kubectl get secret root-domain-cert -o yaml > root-domain-cert_secret.yml
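For reference, the saved manifest looks roughly like this (the base64 data is elided below). Note that kubectl create secret generic produces a secret of type Opaque; there is also a dedicated kubectl create secret tls --cert=tls.crt --key=tls.key variant that produces type kubernetes.io/tls, which TLS-aware consumers generally expect.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: root-domain-cert
  namespace: default
type: Opaque   # would be kubernetes.io/tls if created with 'create secret tls'
data:
  tls.crt: <base64-encoded certificate>
  tls.key: <base64-encoded private key>
```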
Eventually, in the ingress route manifest, I added the tls option at the end, referencing the correct certificate:
```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: ingressroutetls
  namespace: default
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`my.domain.com`)
      kind: Rule
      services:
        - name: whoami
          port: 80
  tls:
    secretName: root-domain-cert
```
Migrating existing docker compose services
With TLS and ingress route configuration all set up, I'm now looking at how to migrate the existing docker compose files. Luckily, there's a tool called kompose that can convert docker compose files into the appropriate Kubernetes manifests. It might not be perfect but it's a start.
Most of my docker-compose files have some dependency on volumes, and the kompose conversions created some PersistentVolumeClaim files. However, this is not a topic I'm familiar with in Kubernetes. Let's head to the documentation.
For a PersistentVolumeClaim example, this is a good walkthrough.
I was not able to run ghost using this, as there were errors relating to PersistentVolume. Will have to revisit.
Image repository
The next day, when I inspected kubectl get pods, some of the pods were in ImagePullBackOff. I discovered that these images were being pulled directly from Docker Hub; in the absence of internet connectivity, the pods were recreated at some point and the image pulls failed. This is resolved by:
- pushing local image copies using microk8s.ctr
- referencing the image name correctly in the deployment manifest
- setting imagePullPolicy: Never in the deployment manifest
I resolved the situation for an existing container using these steps. I had to modify the service and ingress manifests accordingly, but it worked.
Reference: https://microk8s.io/docs/registry-images
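Putting the last two steps together, the relevant part of a deployment manifest might look like this (myapp:latest is a placeholder name, not one of my actual services):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:latest     # must match the name imported via microk8s.ctr
          imagePullPolicy: Never  # never attempt a pull from Docker Hub
```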
Continuing with the migration
The two things I need to become comfortable with are:
- Persistent Volumes in k8s
- Inter container networking and communication in k8s
Persistent Volume
Creating a deployment without a persistent volume claim made the ghost blog run; however, I'm fairly confident this volume will be recreated if the pod recreates, so it doesn't solve my problem.
When I add a persistentVolumeClaim, I get pod has unbound immediate PersistentVolumeClaims.
Using the instructions in the blog post above, I was able to create a PersistentVolume of type hostPath, which is a local directory. The steps involved are:
- Create Persistent Volume of type hostPath
- Create PersistentVolumeClaim (PVC)
- Mount PVC to deployment pod spec
- Bind PVC to container volume
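The steps above can be sketched roughly like this; the names, path, and size are placeholders, not my actual values:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ghost-pv
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /data/ghost        # local directory on the node
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ghost-pvc
spec:
  storageClassName: manual   # must match the PV so the claim binds to it
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

In the deployment's pod spec, the PVC is then mounted into the container via a volumes entry with persistentVolumeClaim.claimName: ghost-pvc and a matching volumeMounts entry on the container.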
I still haven't tested whether the volume data will persist. Based on the documentation, it should.
Possibly good reference resource: https://igy.cx/posts/setup-microk8s-rbac-storage/
Inter-container networking and communication
Getting traffic into your Kubernetes Cluster
So far, I had been manually using kubectl port-forward to route traffic from my host into the k8s cluster. I wanted to find out how to set up the service so that I don't have to do that.
I googled extensively for a few hours and eventually figured it out. This is what I learned:
- Kubernetes is made to be used with common cloud providers. Generally, when you want to expose a service externally, you set the service type to LoadBalancer. k8s has code that will automatically provision a load balancer on the cloud service. This is the most common way people deploy their apps on the internet.
- However, for on-prem there is very little support. There is something called MetalLB, a load balancer that can be deployed on-prem.
- There is also a simpler solution using externalIPs. This is the one I ended up using. The spec looks something like this:
```yaml
spec:
  ports:
    - protocol: TCP
      name: web
      port: 8000
    - protocol: TCP
      name: admin
      port: 8080
    - protocol: TCP
      name: websecure
      port: 443
      targetPort: 4443
  selector:
    app: traefik
  externalIPs:
    - <my-ip-address>
```
Using Traefik with both docker and Kubernetes
Unfortunately, porting all the docker-compose applications to k8s will take time I don't currently have, so this will need to happen later. For now, I want a setup where I can run docker and k8s applications at the same time. Traefik works with multiple providers, so I thought this should be pretty straightforward. I enabled the k8s provider in my main traefik container that's running on docker-compose. This required only a mild setup: you specify the k8s API endpoint, a security token, and the k8s certificate that traefik can use to validate the k8s API server. It was just basic configuration in the traefik.toml file to add a new provider.
Getting kubernetes certificate and configuring it for external traefik instance
The Kubernetes API uses TLS for encryption. In order for traefik to talk to Kubernetes and fetch routes and services, it needs the TLS certificate to validate the k8s API server's identity. The path to the certificate file on the Kubernetes host is described at https://microk8s.io/docs/ports. From there, you need to mount the certificate into your traefik2 instance as a volume and configure traefik.yaml to read it. The configurations are as follows:
```yaml
version: '3'
services:
  traefik:
    container_name: traefik
    image: traefik:v2.1
    ports:
      - "80:80"
      - "8080:8080"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - '$PWD/traefik.yaml:/traefik.yaml'
      - './ca.crt:/ca.crt' # This is the k8s certificate
```
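For reference, the provider section of the static configuration (traefik.yaml) mounted above might look roughly like this. The endpoint host and token are placeholders for your own values; microk8s serves its API on port 16443 by default:

```yaml
providers:
  docker: {}                      # keep the existing docker provider
  kubernetesCRD:
    endpoint: "https://<k8s-host>:16443"
    token: "<service-account-token>"
    certAuthFilePath: "/ca.crt"   # the mounted k8s certificate
```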
Traefik provides two ways to connect: KubernetesIngress and KubernetesCRD. The Ingress setup seemed quick: just create an Ingress resource on k8s and it shows up as a service and route in the traefik2 dashboard. However, when I went to the URL, I always got a 404. I attempted to debug this for a few hours but couldn't figure out why the k8s service was not being served even though traefik could see it. I then attempted to switch to the KubernetesCRD setup. However, I ran into an error.
After hours of research, trying different things...
Nuggets of understanding begin to emerge. Initially, KubernetesCRD was not working because TraefikService was not found. Combing through the documentation (and different versions of it), I found that my installation did not have the CustomResourceDefinition for TraefikService. Applying this configuration via kubectl was quick, and then my traefik configuration at least started working. With the IngressRoutes defined, the traefik instance outside k8s was able to read the configuration and create routers and services for it dynamically. However, when I actually browsed to the URL I was getting a 404, while the external traefik logs showed that the websecure entryPoint was not defined. I checked and rechecked the deployment and service configurations of the ingress controller multiple times. I deleted the entire configuration with kubectl delete -f ingress-controller.yaml and recreated it from scratch (i.e. copy/paste) from the source on the traefik website. All to no avail. At this point I wanted to inspect what was going on in the traefik controller inside the k8s cluster. kubectl logs <traefik-pod-name> told me there were some issues reading k8s secrets. The web (i.e. HTTP) entry point was working and serving a whoami instance perfectly; however, its TLS cousin was not. I could see the secrets were defined and the service account used by traefik-ingress-controller had the correct permissions, but there was an error reading the secrets. On top of that, the websecure entry point was still not defined.
Yet I can see from the traefik-ingress-controller logs that the websecure entry point is created.
A stroke of insight occurs. The websecure entryPoint not defined error is only being emitted by the external traefik instance, and that instance does not have a websecure entryPoint; it has one named secure. I swiftly change the CRD configuration to make the entry point secure and voilà! It works! Finally!
But where is it reading the TLS certificate from? Certainly, traefik-ingress-controller cannot find the secret. I experiment with two different versions of the CRD configuration with TLS options.
Both versions work! So I assume that only the external traefik instance is being used for TLS configuration. That's fine for my purposes.
I ran some more experiments to see what the minimum configuration required for traefik-ingress-controller is. Apparently, version 2 of the IngressRoute does not work; there is no option named tls: true. Silly me for thinking that could be the case, when I know computers are very, very specific about what they will allow or not allow. However, my further experiments ended up breaking the setup, so I did the entire setup again. Turns out, in my yaml for the IngressRoutes, I had messed up the indentation for the tls entry and it became a sub-entry of match instead of spec, which fucked things up.
I'm going to try the minimum configuration again and come up with a clean setup. But I have had enough of this for today.
Ok, so after a little lunch break, I did the clean config and success! Very pleasant experience. Took me just a minute or so and it's good to go. Time to refine the config.
And a minute later... SUCCESS, ladies and gentlemen! As I suspected, there's no need for a k8s service for the traefik-ingress-controller to work. Next, I want to see how much tighter I can make this config. I don't like the entry points being passed as command line options to the traefik-ingress-controller; since we're not accessing them from the outside, I don't see any need for them.
As I suspected, there wasn't any need for those configs. Actually, do we even need that traefik-ingress-controller deployment in k8s at all? Let's find out!
Haha. Funny. It still works. So the traefik-ingress-controller inside k8s was completely unnecessary. That must mean the serviceAccount is also not required. Tackling that next. And yes, that hunch was right.
Btw, if you're wondering how I'm doing these experiments, it's basically cycling through the following commands and commenting out configuration in traefik-ingress-controller.yaml.
```shell
# Get rid of previous configuration
kubectl delete -f traefik-ingress-controller.yml
# Check that I get a 404 to ensure the service went down
curl https://blog.domain.net
# Make changes to the configuration and reapply
kubectl apply -f traefik-ingress-controller.yml
# Create kubernetes CRD IngressRoutes
kubectl apply -f kubernetesIngressRoutes.yml
# Check the service is back online; this usually takes a few seconds
curl https://blog.domain.net
```
Some other experiments
So I've added some new routes and it seems to work fine. I also attempted to remove route.services.port from the IngressRoute definition to see if traefik would auto-configure the port, but sadly this doesn't seem to be the case. Maybe there's another way to do that which I haven't discovered yet. I also tested configuring it with the wrong port and, as expected, it doesn't work. Something silly like that could take time to debug because you pressed one wrong key.
The final configuration
If you want to configure a traefik 2 instance outside of the k8s cluster to connect to services inside the cluster, you only need to do the following.
First, create and apply the following yaml file using kubectl apply -f traefik-ingress.yml:
```yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ingressroutes.traefik.containo.us
spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: IngressRoute
    plural: ingressroutes
    singular: ingressroute
  scope: Namespaced
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ingressroutetcps.traefik.containo.us
spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: IngressRouteTCP
    plural: ingressroutetcps
    singular: ingressroutetcp
  scope: Namespaced
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: middlewares.traefik.containo.us
spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: Middleware
    plural: middlewares
    singular: middleware
  scope: Namespaced
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: tlsoptions.traefik.containo.us
spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: TLSOption
    plural: tlsoptions
    singular: tlsoption
  scope: Namespaced
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: traefikservices.traefik.containo.us
spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: TraefikService
    plural: traefikservices
    singular: traefikservice
  scope: Namespaced
```
Second, create a new file where all your routes will be configured. Note that I'm passing the tls.passthrough option just to enable HTTPS. I have no idea what passthrough: false actually does; the only reason I set it is that we need to pass something to the tls parameter to enable TLS.
```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: blog2
  namespace: default
spec:
  entryPoints:
    - secure
  routes:
    - match: Host(`blog.domain.net`)
      kind: Rule
      services:
        - name: blog
          port: 2368
  tls:
    passthrough: false
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: tabula
  namespace: default
spec:
  entryPoints:
    - secure
  routes:
    - match: Host(`tab.domain.net`)
      kind: Rule
      services:
        - name: tabula
          port: 8080
  tls:
    passthrough: false
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: kubernetes
  namespace: default
spec:
  entryPoints:
    - secure
  routes:
    - match: Host(`kub.domain.net`)
      kind: Rule
      services:
        - name: kubernetes
          port: 443
  tls:
    passthrough: false
```
The above demonstrates three different subdomain routes to services within the cluster.
Conclusion
This was quite a ride. I started as a complete k8s noob. I initially wanted to migrate all my docker compose services to k8s, but soon figured out that would take longer than I was willing to spend. Then the idea struck: traefik works with multiple providers (usually container orchestration systems). Hence, I decided to keep my existing traefik instance running as a docker compose service on my bare metal host and use it for both docker compose services and k8s services. Initially I thought we needed to run a traefik ingress controller within k8s as well as one outside for this to work. Subsequently I discovered that this is not the case; we only need to set up traefik 2's Kubernetes IngressRoutes in the k8s cluster, and the external traefik instance can use them to route traffic.
This journey took me about 7 working days, so hopefully it will help some people save that time. I'm hoping to write a cleaner version of this setup.
Resources:
- https://www.digitalocean.com/community/tutorials/how-to-migrate-a-docker-compose-workflow-to-kubernetes
- https://kubernetes.io/docs/concepts/services-networking/service/#external-ips
- https://medium.com/@JockDaRock/kubernetes-metal-lb-for-on-prem-baremetal-cluster-in-10-minutes-c2eaeb3fe813
- https://medium.com/@JockDaRock/metalloadbalancer-kubernetes-on-prem-baremetal-loadbalancing-101455c3ed48
- https://kubernetes.github.io/ingress-nginx/deploy/baremetal/
- https://itnext.io/routing-external-traffic-into-your-kubernetes-services-part-2-7d1289178671
- https://medium.com/@maniankara/kubernetes-tcp-load-balancer-service-on-premise-non-cloud-f85c9fd8f43c
- https://blog.nobugware.com/post/2019/advanced-traefik-2-0-with-kubernetes/
- https://blog.nobugware.com/post/2019/traefik-2-0-with-kubernetes/
- https://medium.com/kubernetes-tutorials/deploying-traefik-as-ingress-controller-for-your-kubernetes-cluster-b03a0672ae0c
- https://www.alibabacloud.com/blog/how-to-configure-traefik-for-routing-applications-in-kubernetes_594720