Getting TimescaleDB running on Kubernetes for internal use

Introduction

As a fun use of the data ShopMon generates, I wrote a quick program that loads that data into a PostgreSQL database so I can look at some interesting metrics: usage by hour, by area, and so on. While on a run I was listening to a podcast where the topic of time series databases came up, and I realized I had a perfect use case, since the sensor data ShopMon creates is time-specific and tied to the location of the sensor. Hey, let's learn about time series databases!

Enter TimescaleDB

The first thing I did to get the lay of the land was read the Wikipedia article on time series databases, which I found surprisingly thin. The discussion page, however, turned out to have a lot more content and, surprisingly, more information, as the authors and editors were arguing back and forth about what constituted a valid time series database. In the course of reading some of the more heated comments, I learned about TimescaleDB, and the first thing I saw was that it's a time series database built on top of PostgreSQL. Well, I love PostgreSQL and thought I'd give this a try; I have no practical experience with any other time series database, so why not start with this one?

Hey, I built a Kubernetes Cluster!

I have recently come into possession of three retired Dell PowerEdge servers: two 2950s and an R900 (which, at 4U, is a beast to handle by oneself). I decided to dedicate these machines to running Kubernetes pretty much exclusively, so with ESXi installed as the hypervisor and a bunch of VMs running Linux with Docker, I installed Kubernetes (specifically the Rancher distribution). After deploying a couple of test pods, I figured I would get my TimescaleDB database running on it. Oh, and there's even a GitHub repository for getting it running on Kubernetes!

Disclaimer

I'm using this Kubernetes cluster for learning and experimentation so I apologize in advance if there's something that screams "not-good-practice"; I'm still trying to figure out what those are (and what the best practices are).

Installation

The first thing I did was clone the repo and have a look around. I opted to use the "timescaledb-single" chart, since I figured I'd keep things as simple as possible initially.

Preparing the Chart

Apparently Helm charts have become the de facto standard way to install software on Kubernetes, and admittedly there's a lot to like in terms of standardizing deployments. The TimescaleDB chart has an interesting "pre-chart" step that creates certificates and users for the database and puts them in the right spot for Helm to pick up.

Kustomize

Within the timescaledb-single directory, I entered:

./generate_kustomization.sh ztdb

where ztdb is the name of the Kubernetes workload (why ztdb? Zesty Time Database).

No LoadBalancer

One thing I'm kinda meh about is how Kubernetes seems really geared towards being used in cloud environments (which, coming from Google, isn't exactly a huge surprise), but in my case I have, essentially, my own private cloud where I don't want to make anything public.
So when a Load Balancer is specified, deployments essentially hang forever waiting for the cloud provider to assign a public IP that will never come. So, in values.yaml I went to the "loadBalancer" section and set enabled to false. According to the comment, this is cool to do, and the primary node will be exposed as a Headless Service. Gonna admit right here that I get the idea of a headless service, but what that means for accessing the database from external tools was something to figure out later.
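
If you'd rather not edit values.yaml by hand, my understanding is the same setting can be passed with --set when you get to the helm install step below (the flag name here is my guess based on the values.yaml section, so double-check it against the chart):

# Equivalent of setting enabled: false under loadBalancer in values.yaml
helm install ztdb . --set loadBalancer.enabled=false

# Afterwards, check what kind of services the chart actually created
kubectl get svc | grep ztdb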

Persistent Volumes and Volume Claims

TL;DR: the Kubernetes TimescaleDB deployment needs six persistent volumes available for its volume claims to bind to.

I set up a separate VM with a large disk to act as an NFS server (all the physical machines use RAID-10 and I installed a number of the VMs on an EqualLogic iSCSI-based SAN I found, also in RAID-10).

On my NFS server I created six directories and exported them so I could set them up as Persistent Volumes (this is where I used the Rancher interface to do all the heavy lifting for me). The TimescaleDB deployment creates the volume claims all by itself, so I just provided the volumes and that was it.
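
For what it's worth, the NFS side of this is nothing fancy; here's roughly what I ran on the NFS server (the /srv/nfs path and the 10.150.0.0/16 export range are placeholders for my setup, so adjust to taste):

# Create six directories, one per persistent volume
mkdir -p /srv/nfs/ztdb-pv-{0..5}

# Export them to the LAN, then reload the export table
for d in /srv/nfs/ztdb-pv-{0..5}; do
  echo "$d 10.150.0.0/16(rw,sync,no_subtree_check,no_root_squash)" >> /etc/exports
done
exportfs -ra

# Verify the exports are visible
showmount -e localhost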

Installation

Okay, so now to do the actual installation. In the timescaledb-single directory I ran:

helm install ztdb . --set service.type=NodePort

The ztdb must match the name of the workload used in the Kustomize step above.

The chart will print out a bunch of information about getting the PGPASSWORD_POSTGRES, PGPASSWORD_ADMIN, and MASTERPOD values. You'll definitely want these later on, so once the deployment is finished (it took about ten minutes on my cluster), run the commands and save those values.
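
I won't repeat the chart's output verbatim, but the commands it prints look roughly like the following; the secret name, key, and labels here are from memory and may not match your release, so copy the exact commands from your own output:

# Superuser password (secret name and key are assumptions)
PGPASSWORD_POSTGRES=$(kubectl get secret ztdb-credentials \
  -o jsonpath='{.data.PATRONI_SUPERUSER_PASSWORD}' | base64 --decode)

# The pod currently acting as the primary ("master")
MASTERPOD=$(kubectl get pod -o name -l release=ztdb,role=master)
echo "$MASTERPOD"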

Now for the sketchy part

This is where it gets confusing: I was able to connect to the database via the headless service (all documented by the stuff printed to the screen during the deployment), but I need to connect to the database from my laptop or another machine via a "public" (i.e., non-Kubernetes-pod) IP address. Everything is LAN-based (in my case, a 10.150 network), and if I can't connect to the server via DBVisualizer or some other tool, then there isn't much point to any of this 🙃.

Get the IP of the node running the master

Remember during the installation when the chart told you how to get the $MASTERPOD? In my case, it prints pod/ztdb-timescaledb-0. Back in the Rancher interface, under Workloads, I have "ztdb-timescaledb" which, when clicked, shows the three pods, one of which is "ztdb-timescaledb-0". Clicking on that pod brings up info about it, and one of the things we learn is which host it's on and, more importantly, the host's IP address (in my case, 10.150.9.107).
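
If you'd rather skip the clicking, kubectl will give you the same answer (assuming the pod name from the chart's output):

# Which node is the master pod running on?
kubectl get pod ztdb-timescaledb-0 -o wide

# Or pull just the host's IP address directly
kubectl get pod ztdb-timescaledb-0 -o jsonpath='{.status.hostIP}'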

Fiddling with Services

Rancher, interestingly, has two interfaces: the "Cluster Manager" interface, which I guess is meant to be more informational and dashboard-y, and the "Cluster Explorer" interface, which is where you can really get your hands dirty (and presumably do a lot of damage). In the Cluster Explorer interface, select Services and find the service with the same name as the deployment, in my case ztdb (there will also be, for me, ztdb-config and ztdb-replica, but we don't need to fiddle with those).

Selecting "ztdb", the details of the service are shown, including a link to "Listener IPs". There is the Cluster IP and nothing else. Well, we can't connect to the Cluster IP so we're going to add an External IP.
Clicking on the Three Dots™ button in the upper right and selecting "Edit as Form", the first thing you'll see is a warning that the service is managed by the Helm app and that changes will be overwritten the next time the app is changed. Okay, that's fair, duly noted. Click on "Listener IPs" and click Add under External IPs, this is where we add the IP address of the node running the master (note: it may be possible to try use other Node IPs, I haven't tried that one yet).

Click Save and now we have an external IP that can access the database. Ah, but we're not quite done!
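
As far as I can tell, that form edit is just setting spec.externalIPs on the service, so the same change should be possible straight from kubectl, with the same caveat that Helm may overwrite it later (the IP below is mine):

# Add the node's LAN IP as an external IP on the ztdb service
kubectl patch service ztdb -p '{"spec":{"externalIPs":["10.150.9.107"]}}'

# Confirm it shows up under EXTERNAL-IP
kubectl get service ztdb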

Setting up DBVisualizer et al

So now we should be able to connect to the database using database tools. First things first, besides the IP address: what are the username and password? The password for the postgres user is the PGPASSWORD_POSTGRES value mentioned during the installation. It's going to be a random string, so you might as well copy and paste it into the password field. The port is the standard PostgreSQL port of 5432, and if you attempt to connect, you'll see that it won't work because the server refuses non-SSL connections.

Setting up the SSL connection

SSL is enabled by default in the values.yaml file as part of the database configuration. You can also see it in the connection parameters, where hostnossl is set to reject. All of this can be changed beforehand, but since we have a database that only wants to talk over SSL, we'll need to use the SSL certificate.

Where is the certificate? Back in the chart directory, there is a subdirectory called kustomize, which contains another subdirectory named after the deployment (again, for me, ztdb). In that directory you'll find the generated tls.crt file, and that is the SSL cert you'll need for all connections to the database.
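
If you want to double-check that you've grabbed the right file before wiring it into a client, openssl will describe it for you:

# Show the subject and validity window of the generated certificate
openssl x509 -in kustomize/ztdb/tls.crt -noout -subject -dates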

Setting up the connection to use SSL varies from tool to tool; in my case I want to use DBVisualizer, so under the connection properties there is a "Driver Properties" entry that lists all the various driver properties that typically don't need to be set in regular ol' configurations. In our case, sslmode needs to be set to verify-ca, and sslrootcert needs the path and filename of a local copy of the certificate (the file doesn't have to be named tls.crt).

With these two properties applied, the tool should now be able to properly connect to the database and you're good to go!
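
The same two settings work from the command line as well; here's a quick sanity check with psql, using my node IP and assuming PGPASSWORD_POSTGRES is set as described during the installation:

# Connect as postgres over SSL, verifying the server certificate
PGPASSWORD="$PGPASSWORD_POSTGRES" psql "host=10.150.9.107 port=5432 user=postgres dbname=postgres sslmode=verify-ca sslrootcert=kustomize/ztdb/tls.crt"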