Use Self-Hosted GitHub Actions Runners

GitHub released self-hosted Actions runners some time ago. The gist is this: instead of using GitHub-provided machines to run your Actions workflows, you use a computer you control. This can be hugely powerful for enterprises and at-home nerds alike. I’m going to share two use cases that I came up with today.

Downloading and setting up the runner is fast and easy. You can add runners to a single repo or to a whole organization. Unfortunately, there’s currently no way to register a self-hosted runner for a user account as a whole; users can only add them to individual repos.

DISCLAIMER: Security on self-hosted runners is up to you. Every workflow you execute on your machine has all the access your user does unless you isolate the runner properly. Proceed at your own risk!

Use case 1: Full-text Instapaper archive

Instapaper has been around for over 10 years now, and I have been a user for almost all of that time. I have thousands of saved articles with highlights and notes that I don’t want to lose. I developed a Go program to pull down my saved articles and generate a Jekyll site. Each article is a post with its content from the Instapaper Full-text API endpoint, and each article has 2 data files: one for metadata like “did I star this?” and one with a list of my highlights. A script builds the Jekyll site, then syncs it to a home server where it is served with nginx. Since the home server is running Tailscale, I can easily go to https://hostname/instapaper-archive to view every article I have ever saved.

All this works well, but it requires that I remember to run this script on my laptop when I have read some articles on Instapaper, and it is closely tied to state on my laptop. Logs are not persisted anywhere. It has all the problems you’d expect with a personal program you’re running by hand. So, how can I automate this? A self-hosted GitHub Actions runner, of course!

GitHub Actions allows you to run workflows on a schedule. Using cron syntax, you can schedule a workflow to run as often as every 5 minutes. The beauty of a self-hosted runner is that I control the environment: I control the network, I control the disk, and I control the workflow. (There’s a way to fix the network bit on GitHub-hosted runners by using tailscale/github-action to set up an ephemeral node, but why eat into my Actions minutes when I can use an idle computer in my home instead?)

I set up the workflow to run the archiver, run jekyll, then rsync the resulting site to my home server for later browsing. Since the archiver works incrementally, I set up a directory outside the job’s working directory that it can write to. This makes it easier to maintain state between runs.

The key pieces of this workflow are:

on:
  schedule:
  # Run every day at 5:30 and 17:30 UTC.
  - cron: '30 5,17 * * *'

and

runs-on: self-hosted

If you have a number of self-hosted Actions runners on your repo, you can tag them with whatever labels you like and target just a subset. A job only runs on a runner that has every label listed:

# Finds runners tagged with both "self-hosted" and "mytag"
runs-on: [self-hosted, mytag]
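
Putting those pieces together, a complete workflow might look roughly like the sketch below. This is not my exact workflow: the workflow name, the archiver command and its flags, the state directory, and the rsync destination are all placeholders I made up for illustration. The state directory lives outside the workspace because actions/checkout cleans the workspace by default.

```yaml
# Hypothetical sketch; every name, path, and flag marked below is an assumption.
name: instapaper-archive

on:
  schedule:
    # Run every day at 5:30 and 17:30 UTC.
    - cron: '30 5,17 * * *'

jobs:
  archive:
    runs-on: self-hosted
    env:
      # Placeholder directory outside the workspace, so state survives between runs.
      STATE_DIR: /var/lib/instapaper-archive
    steps:
      - uses: actions/checkout@v4
      # Placeholder invocation; the real archiver's name and flags are not shown in this post.
      - name: Fetch new articles
        run: go run ./cmd/archiver -state-dir "$STATE_DIR" -out ./site
      - name: Build the Jekyll site
        run: bundle exec jekyll build --source ./site --destination ./_site
      # Placeholder host and path for the home server.
      - name: Sync to the home server
        run: rsync -az --delete ./_site/ homeserver:/var/www/instapaper-archive/
```

Because the job runs on a machine I control, the logs for every run show up in the Actions tab while the state stays on local disk.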

It’s nice to have this kind of scheduled work automated for me running on a computer I trust. But is there anything more critical you can use self-hosted runners for?

Use case 2: Scheduled Puppet runs

As a self-professed nerd working in the server space for a few years, I maintain a Puppet repo to configure my personal servers. It’s worked well for me for over 5 years now. I have a cron job on each host that runs puppet apply. This seems to work, but there’s no transparency and it wastes CPU by running every 5 minutes. If I ran a self-hosted runner on each host instead, I could run Puppet on push to my main branch, instead of waiting for the cron job to run, and I could see the puppet run’s logs much more easily.

I gave this a try on my home server first (since it’s easier to recover a host you have physical access to). I configured the runner as root (eek!) and installed a systemd service with ./svc.sh install && ./svc.sh start. I wrote a workflow to pull the latest changes and run puppet apply in my checkout of the repo on each host, and it worked like a charm.

Since I have multiple hosts, I tagged each runner with its hostname so I could target a job to each host. Each host’s puppet apply is thus independent, so if there’s a problem on one host, it doesn’t affect the functioning of Puppet on the others.

In order to run on push, I added:

on:
  push:
    branches: [main]

and I created a job for that host:

jobs:
  homeserver:
    runs-on: [self-hosted, homeserver]
    …
    steps: …

I pushed this up and it all worked! Changes to the Puppet repo were applied directly to the host by the self-hosted runner. Next, I added a self-hosted runner to my cloud VM and specified a second job in the YAML:

jobs:
  homeserver:
    runs-on: [self-hosted, homeserver]
    …
    steps: …
  cloudvm:
    runs-on: [self-hosted, cloudvm]
    …
    steps: …

This worked like a charm. Now I can stop wasting CPU with my cron job and I can easily see the result of any change I make in the GitHub UI.
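
If the list of hosts grows, one job per host gets repetitive. An alternative I have not described above is a build matrix over the runner labels. The sketch below assumes each runner carries its hostname as a label, and the checkout location and manifest path are hypothetical. Setting fail-fast to false keeps the hosts independent, so a failure on one does not cancel the others.

```yaml
# Hypothetical sketch; the checkout location and manifest path are assumptions.
on:
  push:
    branches: [main]

jobs:
  puppet:
    strategy:
      # Keep hosts independent: a failure on one host does not cancel the rest.
      fail-fast: false
      matrix:
        host: [homeserver, cloudvm]
    # Each matrix entry targets the runner labeled with that hostname.
    runs-on: [self-hosted, "${{ matrix.host }}"]
    steps:
      - name: Pull the latest changes
        run: git -C /etc/puppet/code pull --ff-only
      - name: Apply the configuration
        run: puppet apply /etc/puppet/code/manifests/site.pp
```

Adding a new host is then a matter of registering a runner with the right label and appending its name to the matrix list.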

Conclusion

Self-hosted runners are pretty magical. You can test your code with the computer under your desk! You can apply system configuration changes with a simple Git push! You can run all sorts of code whenever you want! So far I’m really impressed with how these runners “just work” out of the box. What are you using self-hosted runners for?

Published on February 3, 2022.