Infrastructure-as-Code for Personal Projects

A recent topic of discussion on Lobste.rs and Hacker News has been the use of Kubernetes, the container orchestration tool, to wrangle personal projects and websites. The entry that sparked this discussion was an article by Caleb Doxsey, which pitched Kubernetes as an affordable platform for personal projects.

Kubernetes: The Surprisingly Affordable Platform for Personal Projects

Caleb Doxsey | doxsey.net

(Of course, the Lobste.rs thread has some interesting discussion from varying viewpoints.)

Following that, a counter-point-laden response from Carlos Rodriguez appeared, with some direct-quote rebuttals and some discussion around project size, priorities and the general question of “is it worth it?”.

Kubernetes for personal projects? No thanks!

Carlos Rodriguez | carlosrdrz.es

In this article, I am going to approach the topic from the perspective of someone who runs a fair amount of “side projects” and focus heavily on automation rather than any specific application stack.

First of all, I should clarify what a “side project” is, because it’s an overloaded term that people use to describe a wide array of things. To me, a side project is a semi-important project, and in this context that project would be a website or a web service of some sort (a chat bot or an API resource are other examples). By “semi-important” I mean that it likely brings value to people other than yourself. Maybe it’s a Discord/Slack bot that performs a simple task; maybe it’s a simple aggregator that finds things and serves them from an API endpoint. It may also be ramen-profitable, or a prospect for future profit. However, it’s still a side project, built in your free time.

If a service brings value to the world, it should probably be taken care of, whether that’s bug fixes, updates or simply keeping the service available. These concerns, along with the pressure of working on the service in spare time, introduce the need for priorities to orchestrate everything.

I believe that someone that is paying 5$ a month to run a side project shouldn’t be concerned about infra (yet). I would recommend you focus on what you are building.

Carlos Rodriguez

I think this concept cannot be emphasised enough. It is, conceptually, the same argument as the role tools play in creative media. The tools are merely a platform for telling a story, a means to an end. The tools are important, yes, but what matters is the content they facilitate. Good tools make your job easier; great tools get out of your way completely so you can focus on what you’re good at. In this case, the web services, websites, chat bots and whatever else requires infrastructure are the “content”, and Kubernetes, Docker and Linux are merely the tools.

With that said, I interpreted both articles as missing a greater point regarding infrastructure as a whole and that is how automation is the golden key hidden beneath buzzwords like Kubernetes, Ansible, Docker, etc. Kubernetes is probably too large for a couple of websites but that doesn’t mean your only option is storing everything in /var/www and using hastily written shell scripts to manage everything. There are a multitude of different tools available for doing this, some old and some new.

What I think readers should take away from both these articles is that projects worth taking care of should be automated, so the maker(s) can focus on making rather than managing. Infrastructure as code is the real golden key hidden beneath all of this.

GitOps – Operations by Pull Request

Alexis Richardson | Weaveworks

GitOps, a term coined by Alexis Richardson of Weaveworks, is an engineering methodology that delegates menial tasks such as deployment, upgrades, downgrades and everything in between to automated processes driven entirely by configuration changes. And, as you might have guessed, that configuration is stored in a Git repository, so all changes to configuration go via commits. This gives you history, versioning, rollbacks and more, all for free.
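As a concrete sketch of the idea, a GitOps-style repository might contain nothing more than declarative manifests. A hypothetical Kubernetes Deployment for a small site could look like this (the names, image and tag below are illustrative, not from either article):

```yaml
# deploy/site.yaml — committed to Git; an operator (e.g. Flux or Argo CD)
# watches the repository and applies whatever is merged.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-site                 # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-site
  template:
    metadata:
      labels:
        app: my-site
    spec:
      containers:
        - name: web
          image: registry.example.com/my-site:v1.2.3  # bumped via pull request
          ports:
            - containerPort: 8080
```

Deploying a new version is then just a pull request that bumps the image tag, and rolling back is a `git revert` of that commit.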

Are your servers PETS or CATTLE?

Simon Sharwood | The Register

Similar to the automation of applications themselves, the automation of the servers those applications run on is equally, if not more, important. The Cattle vs Pets concept deals with the ephemerality of the underlying hardware (or virtual hardware) on which services run. Kubernetes does a great job of providing an abstraction layer on top of this, while tools such as Terraform can orchestrate the actual nodes. Despite being a little out of scope for this article, it deserves a mention as it’s very relevant.

With all these automation tools gaining traction and providing really simple methods of deployment, it may seem like a great idea to spin up a Kubernetes cluster for a personal project. Now, if that project is very likely to grow and require more servers, then that might be a good call to make. However, if you’ve decided that you may not need a cluster for your project, there are plenty of tools that can satisfy the need for automation without resorting to hacky shell scripts and cron jobs. This strikes me as a “when you have a hammer, everything is a nail” situation, and I went through the exact same excitement when I first learnt Go, Docker and Kubernetes within the same few months.

A quick side note regarding clusters: you can get a ton of value out of a single server while also saving a lot of money. It’s out of scope for this article, but you do pay a price for managed hosting of VMs (also known as “the cloud”), both in monetary cost and in resources.

Now, the article by Rodriguez covers some of this quite well. There is complexity involved in these tools, and you might want to prioritise other things before adding that complexity to your projects. However, I do think some of the points swing too far in the other direction. rsync and ssh (among others) are the de facto “moving things around” tools in the Linux world, and they do that job well. However, such methods are inherently human-targeted. Sure, you can automate them with shell scripts, but as Doxsey points out, those scripts grow in complexity over time, and I would add that they are not portable, because they operate imperatively rather than declaratively.

So what’s this whole imperative versus declarative thing anyway? This is where GitOps, Cattle, Docker and all these tools and concepts collide in perfect operational symbiosis. In the context of general computing, “imperative” refers to a process of describing how a computer should perform a task by manipulating its state. In the other corner, “declarative” refers to effectively asking a computer for an end result without explicitly describing how it should arrive at that result. This is the fundamental difference between writing a PowerShell or Bash script and writing a Kubernetes manifest, Dockerfile, docker-compose.yml, etc.
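To make the contrast concrete, here is a minimal declarative sketch: a hypothetical docker-compose.yml (service names and versions are illustrative) that asks for an end state rather than listing the commands needed to reach it:

```yaml
# docker-compose.yml — declares *what* should be running;
# `docker compose up -d` works out *how* to get there.
services:
  web:
    image: nginx:1.25
    ports:
      - "80:80"
    depends_on:
      - redis
  redis:
    image: redis:7
```

Running it twice is safe: Compose compares the declared state with what is actually running and only changes the difference, something an imperative script has to handle by hand with checks and conditionals.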

That difference has more significance than just your development workflow and opinions about tooling. The difference between the sustainability and maintainability of these methods is, I’d argue, far more important than the tools themselves. A Bash or PowerShell script is authored by you, the person in control of the server(s) and the applications. This means you have to decide on, and maintain, the standards, patterns and underlying philosophy behind the method. This gives you an enormous amount of control over the process, but it can also be a burden. A declarative solution does a lot of that work for you and provides a canvas and a set of colours to play with. The advantage of this approach is that the method is standardised across the range of services you deploy. 

At the risk of stepping into anecdotal territory, I’ve always found configuring Linux applications a complete P.I.T.A. Let’s look at Redis, Postgres and Nginx as examples. Redis is relatively simple: it only cares about a single configuration file, which uses a proprietary yet simple format that’s documented here, with an example here. Now, this file usually lives somewhere in /etc. The /etc directory is a bit of a nightmare, as stated in the Linux System Administrators Guide:

The /etc maintains a lot of files. Some of them are described below. For others, you should determine which program they belong to and read the manual page for that program. Many networking configuration files are in /etc as well, and are described in the Networking Administrators’ Guide.

Linux System Administrators Guide | 3.3 The /etc directory

The first sentence is already an alarm bell. It’s basically a dumping ground for things you’re not quite sure where to chuck. And a naming convention? Forget about it; responsibility is delegated to the software distributor. On top of this, when invoking Redis you may also specify a custom location for the configuration file, so if you happen to be working on a machine administered by someone who, for some reason, configures this, you have no way — outside of find or asking the administrator — of finding the file.

Postgres also keeps its configuration file(s) in /etc (except when it doesn’t). Documentation for the configuration format is here; it uses a subtly different format to Redis: instead of key value it uses key = value. Of course, Postgres also allows configuration files to be stored in a custom location, which can be revealed by running SHOW config_file; at the Postgres prompt.
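The formats really do differ only subtly. Side by side (parameter values are illustrative):

```
# redis.conf — bare "key value" pairs
maxmemory 100mb
appendonly yes

# postgresql.conf — "key = value" pairs
max_connections = 100
shared_buffers = 128MB
```

Two near-identical flat formats, yet different enough that tooling and muscle memory for one don’t transfer to the other.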

Finally, Nginx, which in its documentation states:

By default, the configuration file is named nginx.conf and placed in the directory /usr/local/nginx/conf, /etc/nginx, or /usr/local/etc/nginx.

Nginx | Beginner’s Guide

Hold on a second: we’re given three possible locations with no reason why, and it’s up to the user to figure out which one is true. Is this some sort of puzzle game? Snarky comments aside, I’m sure there’s a logical reason for this that ultimately comes down to the subtle differences between various UNIX operating systems (which is, in itself, a user-experience nightmare).

The Nginx configuration format is significantly different to those of Redis and Postgres, which only differ slightly from each other. Nginx requires hierarchical data in its configuration, which is fair enough given the use cases Nginx is designed for. But, again, it’s one more configuration format whose possible locations, syntax and subtleties you have to learn.
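For comparison, a minimal nginx.conf fragment (the domain and upstream port are illustrative) shows the hierarchical, brace-delimited style:

```nginx
# nginx.conf — nested blocks rather than flat key/value pairs
events {}

http {
    server {
        listen 80;
        server_name example.com;

        location / {
            proxy_pass http://127.0.0.1:8080;  # hypothetical backend
        }
    }
}
```

Directives only mean something in the context of the block they sit in, which is a different mental model from scanning a flat list of keys.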

To be absolutely clear, I’m not saying any of these problems are completely solved by using the aforementioned declarative tooling. These are problems that are very deeply rooted in both the applications and the general philosophies that surround them (notably, UNIX). It also seems like a problem that hasn’t really been completely solved by any operating system, ever.

However, one of the benefits of deploying a stack of software via tools such as Kubernetes and Docker Compose is that the majority of the most widely used configuration parameters are exposed in a standardised way: environment variables. In most situations, these will get you 80% of the way. To reiterate the original target of “side projects”, I would argue that these services most likely don’t rely on carefully tweaked databases and reverse proxies. Redis, Postgres, Nginx and many more industry-standard components ship with sane defaults and are ready for most jobs out of the box. So when the stack is laid out in a Kubernetes manifest or a docker-compose.yml file, all you really need to care about are hostnames, ports and credentials. Everything else can be worried about at a later date, when it starts to matter.
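As a sketch of that 80% case, a hypothetical docker-compose.yml for a small service wires up exactly those three things — hostnames, ports and credentials — through environment variables, and leaves everything else at its defaults (the app image and credentials here are made up; the POSTGRES_* variables are the official postgres image’s standard ones):

```yaml
services:
  app:
    image: registry.example.com/my-app:latest   # hypothetical image
    environment:
      # service names double as hostnames on the Compose network
      DATABASE_URL: postgres://app:s3cret@db:5432/app
      REDIS_URL: redis://cache:6379
    ports:
      - "8080:8080"
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: s3cret   # use a secrets mechanism beyond a toy setup
      POSTGRES_DB: app
  cache:
    image: redis:7                # stock image, default redis.conf
```

No redis.conf, no postgresql.conf, no hunting through /etc — the defaults do the rest until the day tuning actually matters.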

The fundamental point behind all of this still stands: the tools should get out of your way so you can create. Every bit of time, effort and expense spent thinking about infrastructure is time, effort and expense not spent creating services that provide value to people.
