Getting started with Kubernetes on Ubuntu

Post-edit note: This is my excruciating journey migrating from a docker-compose based world into the world of Kubernetes. The end goal is a setup where I can use traefik with both docker compose and kubernetes services. If you just want to get it done, I suggest you skip to the end where I've provided a working implementation. Otherwise, you can enjoy my raw notes and learnings in full.

I've reached an infrastructure maturity point where I'm beginning to run into challenges with managing multiple containers. I'm currently running my applications with docker compose (it's okay for my use case since I'm serving small internal tools to a user community of fewer than 100 people, and the applications are not mission critical).

I'm also running into limitations on how quickly I can develop and deploy applications and push updates. The current deployment process is very manual: I create a docker-compose file with my services by hand. Every time I want to make an update, I have to build the docker image, push it to my image repository, and then re-run docker compose. I want to be able to run my development workflow on my desktop, and then push the code to a git repository which should automatically build and deploy the new container (CI/CD). I have zero previous experience with this kind of workflow, but after a lot of research and conversations with people who have worked with such workflows, I have decided I need to migrate from docker compose to Kubernetes. Of course, I have some unique challenges in my environment: it's an air-gapped server which is going to be the only node and serve as both master and worker.

I have set up Kubernetes on the Ubuntu server using the MicroK8s snap package.

I was able to get the dashboard running using the instructions at https://microk8s.io/docs/addon-dashboard

Some of the things I'm struggling with at this point are:

  • How do I set up a custom SSL certificate for the UI dashboard?
  • How do I set up traefik as the ingress controller?

Perhaps just setting this up and exposing the dashboard service via it will kill two birds with one stone.

I have set up the traefik dashboard to work with Kubernetes using the scripts available at https://docs.traefik.io/v2.0/user-guides/crd-acme/

I saved the YAMLs in different files and applied them with kubectl apply -f (unknown). I also removed the Let's Encrypt parts.

  • Configure SSL certificates for this setup of traefik
  • Configure IngressRoutes to point to a service running within the cluster

Note: I had to docker-compose down the existing traefik2 to release the ports it held. However, 443 was still not bindable, possibly because some other container grabbed it after traefik shut down. Edit: forwarding port 443 requires root privileges.

Configuring TLS Certificates

Setting up certificates required understanding some new Kubernetes concepts:

  • ConfigMap - an object that stores configuration independently of the pod
  • Secret - an object that stores sensitive data such as TLS certificates, keys, auth tokens, etc.

In the end, I only needed a Secret to apply TLS to the connection. The Kubernetes documentation about secrets was pretty clear on how they work. I essentially renamed my certificate and its associated private key to tls.crt and tls.key and read them as files to create a single secret like so:

kubectl create secret generic sub-domain-cert --from-file=./tls.crt --from-file=./tls.key

Subsequently, I saved the secret manifest so I can apply it easily in the future:

kubectl get secret root-domain-cert -o yaml > root-domain-cert_secret.yml
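For reference, a secret created this way is saved out roughly like the sketch below (the base64 data values are placeholders, not real content):

```
apiVersion: v1
kind: Secret
metadata:
  name: root-domain-cert
  namespace: default
type: Opaque                              # "generic" secrets get the Opaque type
data:
  tls.crt: <base64-encoded certificate>   # placeholder
  tls.key: <base64-encoded private key>   # placeholder
```

Alternatively, kubectl create secret tls with --cert and --key produces a secret of type kubernetes.io/tls with the same tls.crt/tls.key keys, which should work just as well for referencing from an IngressRoute.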

Eventually, in the IngressRoute manifest, I added the tls option at the end, referencing the correct certificate:

    apiVersion: traefik.containo.us/v1alpha1
    kind: IngressRoute
    metadata:
      name: ingressroutetls
      namespace: default
    spec:
      entryPoints:
        - websecure
      routes:
      - match: Host(`my.domain.com`)
        kind: Rule
        services:
        - name: whoami
          port: 80
      tls:
        secretName: root-domain-cert

Migrating existing docker compose services

With TLS and ingress routes configuration all set up, now I'm looking at how to migrate existing docker compose files. Luckily, there's something called kompose that can convert docker compose files to appropriate Kubernetes manifests. It might not be perfect but it's a start.

Most of my docker-compose files depend on volumes. The kompose conversion created some PersistentVolumeClaim files. However, this is not a topic I'm familiar with on Kubernetes. Let's head to the documentation.

For a PersistentVolumeClaim example, this is a good walkthrough.

I was not able to run ghost using this as there were errors relating to PersistentVolume. Will have to revisit.

Image repository

The next day when I inspected kubectl get pods, some of the pods were in ImagePullBackOff. I discovered that these images were being pulled directly from Docker Hub; in the absence of internet connectivity, when the pods were recreated at some point, the image pulls failed. This is resolved by:

  • pushing local image copies using microk8s.ctr
  • referencing the image name correctly in the deployment manifest
  • setting imagePullPolicy: Never in the deployment manifest

I resolved the situation for an existing container using these steps; I had to modify the service and ingress manifests accordingly, but it worked.

Reference: https://microk8s.io/docs/registry-images
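Putting the three fixes together, the relevant part of a deployment manifest looks something like this sketch (myapp and myapp:latest are hypothetical names; the image is first imported into MicroK8s' containerd store, e.g. with docker save myapp:latest > myapp.tar followed by microk8s ctr image import myapp.tar):

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:latest       # must match the name the image was imported under
        imagePullPolicy: Never    # never try to reach a registry (air-gapped)
```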

Continuing with the migration

The two things I need to become comfortable with are:

  • Persistence Volumes in k8s
  • Inter container networking and communication in k8s

Persistent Volume

Creating a deployment without a PersistentVolumeClaim made the ghost blog run. However, I'm fairly confident this volume will be recreated if the pod recreates, so it doesn't solve my problem.

When I add a persistentVolumeClaim, I get pod has unbound immediate PersistentVolumeClaims.

The difference between the two is as follows:

      volumes:
      - name: content
        persistentVolumeClaim:
          claimName: blog-content
This gives an error pod has unbound immediate PersistentVolumeClaims

vs.

      volumes:
      - name: content
This one works
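For what it's worth, the explicit form of that second, ephemeral variant would be an emptyDir volume; I believe a bare volume entry like the one above amounts to the same thing, and either way the data is wiped when the pod is recreated:

```
      volumes:
      - name: content
        emptyDir: {}   # ephemeral scratch space; lost when the pod is recreated
```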

Reference: https://www.bogotobogo.com/DevOps/Docker/Docker_Kubernetes_PersistentVolumes_PersistentVolumeClaims.php

Using the instructions in the blog post above, I was able to create a PersistentVolume of type hostPath, which is a local directory. The steps involved are:

  1. Create Persistent Volume of type hostPath
  2. Create PersistentVolumeClaim (PVC)
  3. Mount PVC to deployment pod spec
  4. Bind PVC to container volume

I still haven't tested whether the volume data will persist. Based on the documentation, it should.
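A minimal sketch of steps 1 and 2, assuming a hypothetical local directory /data/blog-content on the node and a claim named blog-content (matching the claimName used earlier):

```
apiVersion: v1
kind: PersistentVolume
metadata:
  name: blog-content-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /data/blog-content   # hypothetical directory on the host
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: blog-content
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

With no storageClassName set on either object, the claim should bind to this volume as long as the capacity and access modes match (assuming no default StorageClass intercepts the claim), which resolves the unbound immediate PersistentVolumeClaims error above.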

Possibly good reference resource: https://igy.cx/posts/setup-microk8s-rbac-storage/

Inter-container networking and communication

Getting traffic into your Kubernetes Cluster

So far, I had been manually using kubectl port-forward to route traffic from my host into the k8s cluster. I wanted to find out how to set up the service so that I don't have to do that.

I googled extensively for the last few hours to figure this out, to no avail. I eventually figured it out, and this is what I learned:

  • Kubernetes is made to be used with the common cloud providers. Generally, when you want to expose a service externally, you set the service type to LoadBalancer. k8s has code that automatically provisions a load balancer on the cloud service. This is the most common way people deploy their apps on the internet.
  • However, for on-prem there is very little support. There is something called MetalLB, a load balancer that can be deployed on-prem.
  • There is also a simpler solution using externalIPs. This is the one I ended up using. The spec looks something like this:
spec:
  ports:
    - protocol: TCP
      name: web
      port: 8000
    - protocol: TCP
      name: admin
      port: 8080
    - protocol: TCP
      name: websecure
      port: 443
      targetPort: 4443
  selector:
    app: traefik
  externalIPs:
    - <my-ip-address>

Using Traefik with both docker and Kubernetes

Unfortunately, porting all the docker-compose applications to k8s will take some time to set up. I don't currently have that kind of time, so it will have to happen later. For now, I want a setup where I can run both docker and k8s applications at the same time. Traefik works with multiple providers, so I thought this should be pretty straightforward. I enabled the k8s provider in my main traefik container that's running on docker-compose. It required just a mild setup where you specify the k8s API endpoint, the security token, and the k8s certificate that traefik can use to validate the k8s API server. This was just basic configuration in the traefik.toml file to add a new provider.

Getting the Kubernetes certificate and configuring it for the external traefik instance

The Kubernetes API uses TLS for encryption. In order for traefik to talk to Kubernetes and get routes and services, it needs the TLS certificate to validate the k8s API server's identity. The path to the certificate file on the Kubernetes host is described at https://microk8s.io/docs/ports. From there, you need to mount the certificate into your traefik2 instance as a volume and configure traefik.yaml to read it. The configurations are as follows:

version: '3'

services:
  traefik:
    container_name: traefik
    image: traefik:v2.1
    ports:
      - "80:80"
      - "8080:8080"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - '$PWD/traefik.yaml:/traefik.yaml'
      - './ca.crt:/ca.crt' # This is the k8s certificate
docker-compose.yml for the external traefik instance

providers:
  kubernetesCRD:
    endpoint: https://<address-of-kubernetes-master-node-with-api-server>:16443
    token: <long-token-string>
    certAuthFilePath: /ca.crt
    throttleDuration: 10s
    namespaces:
      - default
      - kube-system
traefik.yaml "providers" section configuration related to Kubernetes

Traefik provides two ways to connect: KubernetesIngress and KubernetesCRD. The Ingress setup seemed quick: just create an Ingress resource on k8s and it shows up as a service and route in the traefik2 dashboard. However, when I hit the URL, I always got a 404. I attempted to debug this for a few hours but couldn't figure out why the k8s service was not being served even though traefik was seeing it. I then attempted to switch to the KubernetesCRD setup. However, I ran into an error.

After hours of research, trying different things...

Nuggets of understanding begin to emerge. Initially, kubernetesCRD was not working because TraefikService was not found. Combing through the documentation (and different versions of it), I found that my installation did not have the CustomResourceDefinition for TraefikService. Applying this configuration via kubectl was quick, and then my traefik configuration at least started working. With the IngressRoutes defined, the traefik instance outside k8s was able to read the configuration and create routers and services for it dynamically.

However, when I actually browsed to the URL, I was getting a 404. Meanwhile, the external traefik logs showed that the websecure entryPoint was not defined. I checked and rechecked the deployment and service configurations of the ingress controller multiple times. I deleted the entire configuration (kubectl delete -f ingress-controller.yaml) and recreated it from scratch (i.e. copy/paste) from the source on the traefik website. All to no avail.

At this point I wanted to inspect what was going on in the traefik controller inside the k8s cluster. kubectl logs <traefik-pod-name> told me that there were some issues with reading k8s secrets. The web (i.e. http) entry point was working and serving a whoami instance perfectly. However, its TLS cousin was not. I could see the secrets were defined, and the service account used by traefik-ingress-controller had the correct permissions set up, but there was an error reading the secrets. On top of that, the websecure entry point was still not defined.

I can see from traefik-ingress-controller logs that the websecure entry point is created.

time="2020-02-11T17:07:00Z" level=debug msg="Start TCP Server" entryPointName=web
time="2020-02-11T17:07:00Z" level=debug msg="Start TCP Server" entryPointName=traefik
time="2020-02-11T17:07:00Z" level=debug msg="Start TCP Server" entryPointName=websecure
Note: the parameter --log.level=debug was passed to traefik-ingress-controller
time="2020-02-11T17:07:01Z" level=error msg="Error configuring TLS: secret default/sub-domain-cert does not exist" providerName=kubernetescrd ingress=ingressroutetls namespace=default
TLS configuration error from traefik-ingress-controller

A stroke of insight occurs. The "websecure entryPoint not defined" error is only being emitted by the external traefik instance. That instance does not have a websecure entryPoint; it has a secure entryPoint. I swiftly change the CRD configuration to make the entry point secure and voilà! It works! Finally!

But where is it reading the TLS certificate from? Certainly, traefik-ingress-controller cannot find the secret. I experiment with two different versions of CRD configurations with TLS options.

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: ingressroutetls
  namespace: default
spec:
  entryPoints:
    - secure
  routes:
  - match: Host(`test.domain.net`) && PathPrefix(`/tls`)
    kind: Rule
    services:
    - name: whoami
      port: 80
  tls:
    secretName: sub-domain-cert
    passthrough: false
IngressRoute version 1
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: ingressroutetls
  namespace: default
spec:
  entryPoints:
    - secure
  routes:
  - match: Host(`test.domain.net`) && PathPrefix(`/tls`)
    kind: Rule
    services:
    - name: whoami
      port: 80
  tls: true
IngressRoute version 2 - DOES NOT WORK

At first, both versions appeared to work! So, I assumed that only the external traefik instance is being used for the TLS configuration. That's fine for my purposes.

I ran some more experiments to see what the minimum configuration required for traefik-ingress-controller is. It turns out version 2 of the IngressRoute does not actually work; there is no option named tls: true. Silly me for thinking that could be the case, when I know computers are very, very specific about what they will allow. However, my further experiments ended up breaking the setup, so I did the entire setup again. Turns out, in my YAML for IngressRoutes, I had messed up the indentation of the tls entry and it became a sub-entry of match instead of spec, which broke everything.

I'm going to try the minimum configuration again and come up with a clean setup. But I have had enough of this for today.


Ok, so after a little lunch break, I did the clean config and success! Very pleasant experience. It took me just a minute or so and it's good to go. Time to refine the config.

And a minute later... SUCCESS, ladies and gentlemen! As I suspected, there's no need for a k8s service for the traefik-ingress-controller to work. Next, I want to see how much tighter I can make this config. I don't like the entry points being passed as command line options to the traefik-ingress-controller. Since we're not accessing them from the outside, I don't see any need for them.

As I suspected, there wasn't any need for those configs. Actually, do we even need that traefik-ingress-controller deployment in k8s at all? Let's find out!

Haha. Funny. It still works. So the traefik-ingress-controller inside k8s was completely unnecessary. This must mean the serviceAccount is also not required. Tackling that next. And yes, that hunch was right.

By the way, if you're wondering how I'm doing these experiments, it's basically cycling through the following commands and commenting out configuration in traefik-ingress-controller.yml.

#Get rid of previous configuration
kubectl delete -f traefik-ingress-controller.yml

#Check that I get a 404 message to ensure the service went down
curl https://blog.domain.net

#Make changes to the configuration and reapply
kubectl apply -f traefik-ingress-controller.yml

#Create kubernetes CRD IngressRoutes
kubectl apply -f kubernetesIngressRoutes.yml

#Check the service is back online, this usually takes a few seconds
curl https://blog.domain.net

Some other experiments

So I've added some new routes, and it seems to work fine. I also attempted to remove routes.services.port from the IngressRoute definition to see if traefik would auto-configure the port. Sadly, this doesn't seem to be the case; maybe there's another way to do that which I haven't discovered yet. I also tested configuring the wrong port, and it fails, as expected. Something silly like that could take time to debug because you pressed one wrong key.

The final configuration

If you want to configure a traefik 2 instance outside of the k8s cluster to connect to services inside the cluster, you only need to do the following.

First, create the following YAML file and apply it using kubectl apply -f traefik-ingress.yml

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ingressroutes.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: IngressRoute
    plural: ingressroutes
    singular: ingressroute
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ingressroutetcps.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: IngressRouteTCP
    plural: ingressroutetcps
    singular: ingressroutetcp
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: middlewares.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: Middleware
    plural: middlewares
    singular: middleware
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: tlsoptions.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: TLSOption
    plural: tlsoptions
    singular: tlsoption
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: traefikservices.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: TraefikService
    plural: traefikservices
    singular: traefikservice
  scope: Namespaced

Second, create a new file where all your routes are configured. Note that I'm passing the tls.passthrough option just to enable HTTPS. I have no idea what passthrough: false does; the only reason I set it is that we need to pass something to the tls parameter to enable TLS.

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: blog2
  namespace: default
spec:
  entryPoints:
    - secure
  routes:
  - match: Host(`blog.domain.net`)
    kind: Rule
    services:
    - name: blog
      port: 2368
  tls:
    passthrough: false
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: tabula
  namespace: default
spec:
  entryPoints:
    - secure
  routes:
  - match: Host(`tab.domain.net`)
    kind: Rule
    services:
    - name: tabula
      port: 8080
  tls:
    passthrough: false
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: kubernetes
  namespace: default
spec:
  entryPoints:
    - secure
  routes:
  - match: Host(`kub.domain.net`)
    kind: Rule
    services:
    - name: kubernetes
      port: 443
  tls:
    passthrough: false

The above demonstrates three different subdomain routes to services within the cluster.

Conclusion

This was quite a ride. I started as a complete noob at k8s. I initially wanted to migrate all my docker compose services to k8s, but soon figured out that would take longer than I am willing to spend. Then the idea struck: traefik works with multiple providers (usually container orchestration systems). Hence, I decided to use my existing traefik instance, running as a docker compose service on my bare metal host, for both docker compose services and k8s services. Initially I thought we needed to run a traefik ingress controller within k8s as well as one outside for it to work. Subsequently, I discovered that this is not the case; we only need to set up traefik 2's Kubernetes IngressRoutes in the k8s cluster, and these can be used by the external traefik instance to route traffic.

This journey took me about 7 working days, so hopefully it will help someone save that time. I'm hopefully going to write a cleaner version of this setup.
