It’s 3am and your ISP has quietly dropped your WAN connection. Not a hard link-down, nothing that dramatic. The cable modem still has sync, the LEDs are green, DHCP is up. But somewhere between your house and the wider internet, a route has gone stale or a BRAS has crashed or someone at the telco has pushed a config change and not tested it properly. Your firewall has a default route pointing into a void, your DNS queries are timing out, and you’re fast asleep with no idea any of this is happening.
In Cyprus, this happens more often than you’d think. I don’t mean that as a dig at Cypriot ISPs specifically, although I do have opinions. I mean that if you run your own infrastructure, even “just” a home router, you need to plan for things going wrong when you’re not watching. Because you’re never watching at 3am. Nobody is.
This post is about all the operational plumbing that turns a hand-configured OpenBSD box into something that actually runs itself: the install script that deploys everything in the right order, the state machine that detects WAN failures and recovers, the cron jobs that keep packages patched and blocklists fresh, and the NetFlow setup that lets me see what my network is actually doing without running a full monitoring stack.
This is Part 5 of the series. Part 1 covered the hardware. Part 2 built the pf firewall rules. Part 3 set up encrypted DNS with dnscrypt-proxy. Part 4 added various hardening. If you’re jumping in here, the short version is: I’m running OpenBSD 7.8 on a PC Engines APU3D2 board with three Intel NICs, em0 on WAN and em1/em2 serving wired and WiFi LANs.
The install script: ordering matters more than you think
The full deployment is a single install.sh script, about 363 lines. I could have used Ansible. I could have used some fancy configuration management tool. But for a single box with a specific purpose, a well-structured shell script is simpler, more auditable, and doesn’t require installing an Ansible ecosystem on my firewall just to copy some config files around.
The script runs in three phases, and the ordering is not arbitrary. Get it wrong and you’ll lock yourself out mid-install, which I did exactly once during testing. Once was enough.
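In outline, the ordering looks like this. A sketch of the shape only, not the actual install.sh — the real commands appear in the sections below:

```shell
#!/bin/sh
# Skeleton of the three-phase install ordering (illustrative sketch,
# not the author's install.sh). With 'set -eu', any failure aborts
# before the riskier later phases run.
set -eu

phase1_network_dependent() {
    # Needs working (ISP-provided) DNS: packages, blocklists, trust anchor
    echo "phase 1: network-dependent work"
}

phase2_local_config() {
    # No network needed: config files, service enablement, mtree baselines
    echo "phase 2: local configuration"
}

phase3_network_changes() {
    # Last, because a mistake here cuts off remote access
    echo "phase 3: network changes (validated, activated on reboot)"
}

phase1_network_dependent
phase2_local_config
phase3_network_changes
echo "install complete: reboot to activate"
```

The point of the skeleton is the failure behaviour: if anything in phase 1 or 2 fails, the script dies before the network configuration is touched, and the box stays reachable.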
Phase 1: Network-dependent work (while DNS still works)
When you first boot a fresh OpenBSD install, the system is using whatever DNS the ISP’s DHCP server handed out. It’s not your carefully hardened setup yet, it’s just the default. That’s actually useful, because it means network access works right now, and some things need network access.
Phase 1 does everything that requires a working internet connection:
# Firmware updates (WiFi drivers, etc.)
fw_update
# Install all required packages
pkg_add python%3 dnscrypt-proxy sshguard softflowd nfdump
# Download IP reputation blocklists
ftp -o /etc/pf.d/spamhaus-drop.txt \
https://www.spamhaus.org/drop/drop.txt
ftp -o /etc/pf.d/spamhaus-edrop.txt \
https://www.spamhaus.org/drop/edrop.txt
ftp -o /etc/pf.d/emerging-threats.txt \
https://rules.emergingthreats.net/fwrules/emerging-Block-IPs.txt
# Initialize DNSSEC trust anchor
unbound-anchor -a /var/dnscrypt-proxy/trust-anchor
The key insight: do all of this BEFORE you touch the network configuration. The moment you change resolv.conf to point at your local dnscrypt-proxy instance, you need that instance to be installed and running. If you change DNS config first and the package install fails, you’re stuck with no DNS and no way to fix it remotely.
Phase 2: Local configuration (no network dependency)
This is the bulk of the script. It copies config files into place, hardens services, and sets up all the supporting infrastructure:
# Deploy all configuration files
install -o root -g wheel -m 0600 pf.conf /etc/pf.conf
install -o root -g wheel -m 0644 dhcpd.conf /etc/dhcpd.conf
install -o root -g wheel -m 0644 ntpd.conf /etc/ntpd.conf
install -o root -g wheel -m 0644 ifstated.conf /etc/ifstated.conf
# ... and the rest
# SSH hardening
install -o root -g wheel -m 0600 sshd_config /etc/ssh/sshd_config
# Enable services
rcctl enable dhcpd
rcctl enable softflowd
rcctl enable dnscrypt_proxy
rcctl enable ifstated
# Set up cron jobs
python3 setup-cron.py
# Generate mtree security baselines
mtree -c -p /etc > /etc/mtree/etc.spec
mtree -c -p /var > /etc/mtree/var.spec
The mtree baselines deserve a quick mention. OpenBSD’s mtree records a detailed fingerprint of a directory tree: checksums, permissions, ownership, the lot. By generating a fresh baseline after a known-good install, I can later detect any unexpected changes to configuration files. It’s not a full intrusion detection system, but it catches drift, and drift on a firewall is the kind of thing you want to know about.
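Generating the baselines is only half the job; something has to compare against them later. A daily verification entry might look like this — a sketch, with the schedule and the logger message being my assumptions, not the author’s actual crontab:

```
# Hypothetical daily drift check: mtree prints any file that differs
# from the baseline (cron mails the output to root) and exits
# non-zero on a mismatch, which logger turns into a syslog warning.
15 4 * * * /usr/bin/mtree -f /etc/mtree/etc.spec -p /etc || \
    /usr/bin/logger -t mtree 'WARNING: drift detected under /etc'
```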
Phase 3: Network changes (last, because you only get one shot)
This is the phase where, if something goes wrong, you’re driving to wherever the box is with a serial cable. Everything that changes how the box talks to the network goes here, and it goes LAST:
# Deploy interface configurations
install -o root -g wheel -m 0640 hostname.em0 /etc/hostname.em0
install -o root -g wheel -m 0640 hostname.em1 /etc/hostname.em1
install -o root -g wheel -m 0640 hostname.em2 /etc/hostname.em2
# Deploy DNS configuration
install -o root -g wheel -m 0644 resolv.conf /etc/resolv.conf
# Lock resolv.conf immutable
chflags schg /etc/resolv.conf
# Deploy DHCP client configuration
install -o root -g wheel -m 0644 dhcpleased.conf /etc/dhcpleased.conf
# Deploy sysctl settings (IP forwarding, etc.)
install -o root -g wheel -m 0644 sysctl.conf /etc/sysctl.conf
# Validate pf.conf syntax, but do NOT load it
pfctl -nf /etc/pf.conf
That chflags schg on resolv.conf is worth explaining. The schg flag sets the system immutable bit, which means even root can’t modify the file without first clearing the flag. This prevents dhcpleased or any other DHCP client from overwriting your DNS settings when it renews a lease. Your firewall’s DNS should point at your local dnscrypt-proxy instance, always, regardless of what the ISP’s DHCP server suggests.
And notice the last line: pfctl -nf validates the pf configuration without loading it. The -n flag means “parse and check, but don’t apply.” If there’s a syntax error, the script fails here and you can fix it before rebooting. If it passes, the validated ruleset gets loaded cleanly on the next boot. I don’t hot-load pf rules during install because the interface addresses might not match the new configuration yet. A clean reboot with everything in place is much safer than trying to do a live cutover.
Network changes go last. pf validated but not loaded. Clean activation on reboot. This ordering is the difference between a smooth deployment and a 45-minute debugging session sitting on the floor with a one-metre serial cable.
WAN monitoring: the three-state machine
Here’s a problem that took me a while to properly think through. OpenBSD’s dhcpleased does a decent job of detecting when a network link goes down, the physical carrier drops and the kernel notices immediately via the route socket. But “link up, internet dead” is a completely different failure mode, and it’s the common one.
Your cable modem has sync. DHCP gives you an address. The link LED is green. But packets to the wider internet just vanish. Maybe there’s a routing problem upstream. Maybe the ISP’s CGNAT gateway has crashed. Maybe DNS works but HTTP doesn’t. dhcpleased won’t detect any of this because from its perspective, everything is fine. You have a lease. The link is up. Job done.
I use ifstated for this, and it’s one of those beautiful OpenBSD tools that does exactly one thing and does it well [1]. It’s a daemon that monitors interface states and runs commands when those states change. You define states, tests, and transitions, and it handles the rest.
My ifstated.conf defines a three-state machine:
# /etc/ifstated.conf
init-state connected
# Test definitions
# Instant link detection via kernel route socket
em0_link = "em0.link.up"
# Dual-target ping: succeed if EITHER target responds
inet_ok = '( ping -q -c 1 -w 3 1.1.1.1 > /dev/null 2>&1 || \
ping -q -c 1 -w 3 9.9.9.9 > /dev/null 2>&1 )'
state connected {
if ! $em0_link
set-state link_down
if ! $inet_ok every 60
set-state disconnected
}
state disconnected {
init {
run "logger -t ifstated 'WAN: link up but internet unreachable'"
run "dhcpleasectl em0"
}
if $inet_ok every 30
set-state connected
if ! $em0_link
set-state link_down
}
state link_down {
init {
run "logger -t ifstated 'WAN: physical link down on em0'"
}
if $em0_link
set-state disconnected
}
Three states, each handling a different failure mode:
connected is the happy path. The link is up and pings succeed. It checks internet connectivity every 60 seconds, which is frequent enough to catch outages quickly but not so frequent that it generates noticeable traffic.
disconnected means the physical link is up (cable modem has sync, DHCP lease is valid) but pings to the internet fail. This is the sneaky failure mode. The init block runs dhcpleasectl em0, which sends a gentle DHCP re-request. It doesn’t bounce the interface, it doesn’t tear down the link, it just asks for a fresh lease. Often this is enough to kick-start a stale connection. Once in this state, it polls every 30 seconds so recovery is detected quickly.
link_down means the physical carrier is gone. em0.link.up is a kernel-level test that uses the route socket, so it’s instant, no polling delay. When the link comes back, we transition to disconnected first (not directly to connected) so that we verify internet actually works before declaring victory.
The dual-target ping is a small but important detail. I ping both 1.1.1.1 (Cloudflare) and 9.9.9.9 (Quad9), and the test passes if EITHER responds. This avoids false positives when one provider has a problem. If both are simultaneously unreachable, well, something is genuinely wrong.
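The same dual-target check is handy outside ifstated too, for ad-hoc debugging or other cron jobs. A standalone sketch — the PING_CMD override is my addition, purely so the OR-logic can be exercised without network access:

```shell
#!/bin/sh
# Standalone version of the dual-target reachability test (sketch,
# not part of the original setup). PING_CMD is overridable so the
# logic is testable offline.
PING_CMD="${PING_CMD:-ping -q -c 1 -w 3}"

reachable() {
    $PING_CMD "$1" > /dev/null 2>&1
}

internet_ok() {
    # Succeed if EITHER anycast resolver answers
    reachable 1.1.1.1 || reachable 9.9.9.9
}

if internet_ok; then
    echo "internet: up"
else
    echo "internet: down"
fi
```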
There’s a subtle bug in ifstated that bit me during testing: if a test command is still running when ifstated tries to evaluate it again, ifstated kills the running test and treats the result as success. This means if your polling interval is shorter than the time your test takes to complete, you’ll get false “everything is fine” results. My ping tests use -w 3 (3-second timeout) and I poll every 60 seconds. Plenty of margin. But if you set the polling interval to, say, 5 seconds with a 3-second timeout, you’ll eventually hit the race condition when the network is slow and life will get confusing.
One last thing about WAN monitoring: my pf NAT rules use (egress:0) as the translation address, not a hardcoded IP. This means if DHCP hands me a new WAN address after recovery, pf automatically picks it up. No rule reload needed, no script to detect the change. It just works.
Automated patching: keeping things current without losing sleep
I have a complicated relationship with automated patching on production infrastructure. On servers at work, I want staged rollouts, test environments, change windows, the whole process. On my home firewall, I want something different: I want it patched, I want it current, and I want to not think about it.
OpenBSD makes this surprisingly easy because the project takes backwards compatibility seriously and because syspatch is designed for exactly this use case [2]. Binary patches are small, tested, and targeted. They fix security issues and serious bugs. They don’t reorganise your config files or change default behaviour.
All the cron jobs are generated by a setup-cron.py script that writes the crontab entries. Here’s what runs (entries are wrapped here for readability; each one is a single line in the actual crontab):
Weekly system updates (Sunday 03:00)
0 3 * * 0 /usr/sbin/fw_update && \
/usr/sbin/syspatch && \
/usr/sbin/pkg_add -u && \
/usr/bin/mtree -c -p /etc > /etc/mtree/etc.spec && \
/usr/bin/mtree -c -p /var > /etc/mtree/var.spec && \
if ls /bsd.syspatch* > /dev/null 2>&1; then \
/sbin/reboot; \
fi
This does four things in sequence:
- fw_update updates firmware blobs (WiFi drivers, mostly, though this box doesn’t use them — WiFi comes from a commodity AP in bridge mode).
- syspatch applies binary security patches.
- pkg_add -u updates installed packages.
- mtree baselines regenerate after the updates, so the new state becomes the known-good state.
The reboot logic is the clever bit. After patching, the script checks whether /bsd.syspatch* exists. These files are created when syspatch patches the kernel. If the kernel was updated, we reboot, because a kernel patch isn’t active until the next boot. If only userland packages were updated, no reboot needed, the box stays up.
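The decision logic is small enough to exercise in isolation. Here it is wrapped in a function and parameterised on the root directory, so it can be dry-run against a scratch directory instead of a real OpenBSD root (a sketch, not the cron entry itself):

```shell
#!/bin/sh
# Reboot-decision logic from the weekly update job (sketch).
# syspatch leaves /bsd.syspatch* behind when it patched the kernel;
# the root directory is a parameter so the check is testable anywhere.
need_reboot() {
    ls "${1:-/}"/bsd.syspatch* > /dev/null 2>&1
}

if need_reboot /; then
    echo "kernel patched: reboot required"
else
    echo "userland only: staying up"
fi
```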
I run this at 3am on Sunday because that’s when I’m least likely to notice a brief network interruption, and most likely to be awake by late morning to check that everything came back cleanly. In practice, I’ve had exactly zero failed updates in six months of running this. OpenBSD’s patch process is remarkably reliable.
Daily IP reputation updates (02:00)
0 2 * * * /usr/bin/ftp -o /etc/pf.d/spamhaus-drop.txt \
https://www.spamhaus.org/drop/drop.txt && \
/usr/bin/ftp -o /etc/pf.d/spamhaus-edrop.txt \
https://www.spamhaus.org/drop/edrop.txt && \
/usr/bin/ftp -o /etc/pf.d/emerging-threats.txt \
https://rules.emergingthreats.net/fwrules/emerging-Block-IPs.txt && \
/bin/cat /etc/pf.d/spamhaus-drop.txt \
/etc/pf.d/spamhaus-edrop.txt \
/etc/pf.d/emerging-threats.txt > /etc/pf.d/blocklist-all.txt && \
/sbin/pfctl -t blocklist -T replace -f /etc/pf.d/blocklist-all.txt
This pulls fresh copies of the Spamhaus DROP and EDROP lists [4] plus the Emerging Threats blocklist [5], then hot-reloads them into the pf blocklist table. The table replace is atomic, there’s no window where the table is empty. Connections from blocked networks get dropped silently at the firewall, never reaching any LAN device.
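For context, the pf.conf counterpart is a persistent table seeded from the same files, plus a quick drop on the WAN side. Part 2 has the real rules; this is just a sketch of the shape:

```
# Sketch of the pf.conf side (see Part 2 for the actual ruleset):
table <blocklist> persist file "/etc/pf.d/spamhaus-drop.txt" \
    file "/etc/pf.d/spamhaus-edrop.txt" \
    file "/etc/pf.d/emerging-threats.txt"
block in quick on egress from <blocklist>
```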
Daily DNS blocklist update (03:30)
30 3 * * * /usr/bin/ftp -o /var/dnscrypt-proxy/blocklist.txt \
https://cdn.jsdelivr.net/gh/hagezi/dns-blocklists@latest/wildcard/pro-onlydomains.txt && \
/usr/bin/pkill -HUP dnscrypt-proxy
This refreshes the Hagezi Pro DNS blocklist [6] and sends SIGHUP to dnscrypt-proxy to reload it. No restart, no DNS downtime, just a config reload. The blocklist covers ads, trackers, telemetry, phishing, and various other categories of domains that have no business being resolved on my network.
Thermal safety (every 5 minutes)
*/5 * * * * /usr/local/bin/thermal-check.sh
The APU3D2 is passively cooled and sits in a metal case. In a Larnaca summer, ambient temperature in my office can easily hit 35°C, and the board gets warm. The thermal check reads the CPU temperature sensor and logs a warning if it’s above a threshold. I haven’t needed it to trigger an emergency shutdown yet, but it’s there. Because hardware that’s running 24/7 in a Mediterranean climate needs someone paying attention to thermals, even if that someone is a cron job.
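thermal-check.sh itself isn’t shown in this post, but the idea is simple enough to sketch. On the APU’s AMD CPU the temperature typically appears as sysctl hw.sensors.km0.temp0 — the sensor name, the 70° threshold, and the script shape below are all my assumptions, not the author’s script. The sensor line is injectable so the parsing can be tested without the hardware:

```shell
#!/bin/sh
# Sketch of a thermal check (not the author's thermal-check.sh).
# Assumes the sensor reads like: hw.sensors.km0.temp0=61.50 degC
# Pass a sensor line as $1 to test without the real hardware.
threshold=70
line="${1:-$(sysctl hw.sensors.km0.temp0 2>/dev/null || true)}"

check_temp() {
    t="${1#*=}"     # strip the sysctl name  -> "61.50 degC"
    t="${t%%.*}"    # integer degrees        -> "61"
    case "$t" in
        ''|*[!0-9]*) echo "sensor unreadable"; return ;;
    esac
    if [ "$t" -ge "$threshold" ]; then
        echo "WARNING: CPU at ${t}C (threshold ${threshold}C)"
    else
        echo "OK: CPU at ${t}C"
    fi
}

check_temp "$line"
```

In the real cron job the echo would be a logger call (and, if you wanted the emergency shutdown, a halt above a second, higher threshold).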
DHCP: two pools, two trust levels
The DHCP configuration is straightforward but reflects a deliberate network design decision. I run two LAN segments with different trust levels:
# Wired LAN (em1) - 10.20.10.0/24
subnet 10.20.10.0 netmask 255.255.255.0 {
range 10.20.10.100 10.20.10.199;
option routers 10.20.10.1;
option domain-name-servers 10.20.10.1;
option ntp-servers 10.20.10.1;
default-lease-time 86400;
max-lease-time 86400;
}
# WiFi LAN (em2) - 10.20.20.0/24
subnet 10.20.20.0 netmask 255.255.255.0 {
range 10.20.20.100 10.20.20.199;
option routers 10.20.20.1;
option domain-name-servers 10.20.20.1;
option ntp-servers 10.20.20.1;
default-lease-time 14400;
max-lease-time 14400;
}
Wired devices get 24-hour leases. These are my workstation, my NAS, my printer, things that are physically connected and relatively stable. WiFi devices get 4-hour leases. Phones, tablets, visitors’ laptops, things that come and go. The shorter lease time means the address pool turns over faster, and devices that have left the network release their addresses sooner.
Both subnets point DNS and NTP at the local gateway address. All DNS goes through dnscrypt-proxy. All time comes from the local ntpd. No device on my LAN talks to external DNS or NTP servers directly, the pf rules prevent it even if they try.
NTP: time is infrastructure
Speaking of NTP, the configuration is minimal:
# /etc/ntpd.conf
servers pool.ntp.org
listen on 10.20.10.1
listen on 10.20.20.1
The firewall syncs its own clock from pool.ntp.org, then serves time to both LAN segments. Accurate time matters more than people think on a firewall. Log timestamps need to be correct for forensics. Certificate validation depends on correct time. Cron jobs need to run when you expect them to. And if you’re correlating NetFlow data with logs from other systems, time skew turns analysis into guesswork.
DHCP client hardening: what your ISP doesn’t need to know
The WAN-facing side of the DHCP equation is where things get interesting from a privacy perspective. By default, dhcpleased sends your hostname to the ISP’s DHCP server and accepts whatever DNS servers the ISP pushes. Both of these are information leaks I’d rather avoid.
My dhcpleased.conf [7] is short:
# /etc/dhcpleased.conf
interface em0 {
send no host name
ignore dns
}
send no host name does what it says. The ISP’s DHCP server doesn’t need to know what I’ve called my firewall. It’s a small thing, but every piece of identifying information you don’t send is a piece that can’t be logged, correlated, or leaked.
ignore dns tells dhcpleased to accept the IP address and default route from DHCP but reject the DNS server options. My resolv.conf always points at 127.0.0.1 (dnscrypt-proxy), and the chflags schg from the install script ensures nothing overwrites it.
Combined with lladdr random on em0 (which randomises the MAC address at each boot), my ISP sees a different MAC address each time the box restarts and no hostname. They know an OpenBSD box is on their network (the DHCP client identifier gives that away), but they don’t get the easy fingerprinting data that most consumer routers happily broadcast.
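Put together, the WAN interface file from Phase 3 is tiny. The series never shows /etc/hostname.em0 verbatim, so treat this as a reconstruction from the options described above:

```
# /etc/hostname.em0 (reconstruction, not the author's exact file)
lladdr random
inet autoconf
up
```

The `lladdr random` line comes first so the randomised MAC is in place before `inet autoconf` kicks off dhcpleased.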
Is this paranoid? Maybe. But I run a home network that handles my work traffic, my family’s personal data, and my IoT devices. A bit of paranoia at the network edge isn’t paranoia, it’s basic operational hygiene.
Putting it all together
When I step back and look at what’s actually running on this little fanless box under my desk, I’m genuinely impressed by what OpenBSD gives you out of the box. The install script is 363 lines. The cron jobs handle patching and blocklist updates. ifstated watches the WAN. softflowd records traffic flows. DHCP and NTP serve the LAN. And all of it is observable, auditable, and repairable over a serial console if everything goes sideways.
The total CPU overhead of all these operational services is negligible. The APU3D2 idles at about 8% CPU with all of this running, which leaves plenty of headroom for pf to do its actual job of shuffling packets.
Part 6 will wrap up the series with performance tuning, monitoring dashboards, and six months of operational lessons. Including the time I accidentally locked myself out by loading pf rules before the interfaces were configured, which is how I learned the “validate but don’t load” lesson the hard way.
What I keep coming back to is this: the operational complexity of running your own firewall is vastly overestimated by people who haven’t tried it, and slightly underestimated by people who have. It’s not hard. But it does require thinking about failure modes at 3am when you’re not there to fix them. Automation isn’t a nice-to-have on infrastructure, it’s the whole point.