Eight days. I’d had the new firewall running for eight days when I pulled up pfctl -s info and stared at the numbers: 61 million packets passed, 325 thousand blocked. Zero percent CPU. The APU3D2 was barely awake. Just sitting there, quietly dropping a third of a million packets that had no business being on my network, using roughly the same computational effort as breathing.
I’ve been writing pf rules on and off for about fifteen years, and the thing that still gets me is the syntax. If you’ve ever written iptables rules, you know the feeling of wrestling a language that was designed by committee and refined by people who actively enjoy suffering. pf isn’t like that. pf reads like prose. Not elegant prose, maybe, but clear, declarative, opinionated prose. You read a pf.conf and you can see what the firewall is doing. You don’t need to trace chains and jump targets and figure out which table is evaluated in which order. It’s just… there.
pf’s syntax is what happens when one person with strong opinions designs a packet filter and refuses to make it complicated.
That person is Henning Brauer, and his fingerprints are all over this system in the best possible way. But more on that later.
This is Part 2 of my series on building a production home router with OpenBSD 7.8 on a PC Engines APU3D2. Part 1 covered the hardware and initial install. This post is about the firewall itself, the actual pf.conf that’s been running on my network in Larnaca since day one.
tl;dr I’m going to walk through every significant section of my production pf.conf, from tables to NAT to QoS. If you just want the queue management bit (which is genuinely the most interesting part), skip to the HFSC + FQ-CoDel section near the end. But you’ll miss the fun of the journey.
Tables: the foundation
Before any rules get evaluated, pf needs to know about the IP ranges and addresses it’ll reference. That’s what tables are for, and I’ve got four of them.
Martians
The <martians> table is a list of IP ranges that should never appear on the public internet. These are reserved, private, or special-purpose addresses as defined by RFC 6890 [1]. If any of these show up on my WAN interface, something is either misconfigured or malicious, and I don’t much care which. Either way, they get dropped.
table <martians> const { \
0.0.0.0/8, \
10.0.0.0/8, \
100.64.0.0/10, \
127.0.0.0/8, \
169.254.0.0/16, \
172.16.0.0/12, \
192.0.0.0/24, \
192.0.2.0/24, \
192.168.0.0/16, \
198.18.0.0/15, \
198.51.100.0/24, \
203.0.113.0/24, \
240.0.0.0/4 \
}
Thirteen ranges. The const keyword means this table is loaded at startup and never modified at runtime. You can’t accidentally flush it, you can’t add entries to it from a rule. It’s immutable. I like immutable things in security contexts.
A few of those ranges are worth calling out. 100.64.0.0/10 is the Carrier-Grade NAT range, RFC 6598, which your ISP might use internally but should never leak to you. 198.18.0.0/15 is reserved for inter-network benchmarking. 198.51.100.0/24 and 203.0.113.0/24 are documentation ranges, the TEST-NET-2 and TEST-NET-3 blocks that exist purely for use in examples and should never carry real traffic. And 240.0.0.0/4 is the old Class E “reserved for future use” space that has been reserved for future use for roughly thirty years now, with no future use in sight.
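A table does nothing on its own; it has to be referenced by a rule. My config isn’t reproduced in full here, but the pair of rules using <martians> looks roughly like this (a sketch; egress is pf’s built-in interface group for whatever interface holds the default route):

```pf
# Drop bogon sources arriving on the WAN, and refuse to leak
# anything towards a bogon destination. "quick" means no further
# rule evaluation for matching packets.
block in  quick on egress from <martians> to any
block out quick on egress from any to <martians>
```

The outbound rule matters more than people think: it stops a misconfigured internal host from spraying RFC 1918 traffic at the ISP.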
Dynamic tables
The other three tables are populated at runtime:
table <sshguard> persist
table <replist> persist file "/etc/pf.replist"
table <bruteforce> persist
<sshguard> is managed by sshguard [2], a lovely little daemon that watches authentication logs and adds offending IPs to a pf table. It handles the log parsing and timing, pf handles the blocking. Clean separation of concerns.
<replist> is my IP reputation list. A daily cron job fetches the Spamhaus DROP list [3] (which these days incorporates the old EDROP entries) plus the Emerging Threats compromised IP list, merges them, deduplicates, and writes the result to /etc/pf.replist. The file keyword tells pf to load the table contents from that file at startup. The persist keyword keeps the table in memory even if no rules reference it yet, which matters during reload.
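For the curious, here is a hedged sketch of what such a cron job can look like. The feed URLs and the comment-stripping details are assumptions for illustration, not a verbatim copy of my script; substitute whatever sources you actually trust:

```shell
#!/bin/sh
# Sketch of a daily reputation-list refresh (run from root's crontab).
# The feed URLs below are assumptions -- use the lists you trust.
set -eu

raw=$(mktemp) && merged=$(mktemp)
trap 'rm -f "$raw" "$merged"' EXIT

for url in \
    "https://www.spamhaus.org/drop/drop.txt" \
    "https://rules.emergingthreats.net/blockrules/compromised-ips.txt"
do
    # ftp(1) is OpenBSD's stock HTTP fetcher; -o - writes to stdout
    ftp -o - "$url" >> "$raw" || echo "fetch failed: $url" >&2
done

# Strip comments (Spamhaus uses ';', ET uses '#'), drop whitespace,
# keep only IPv4 addresses and CIDR blocks, then deduplicate.
sed -e 's/[;#].*//' -e 's/[[:space:]]//g' "$raw" \
    | grep -E '^[0-9]+(\.[0-9]+){3}(/[0-9]+)?$' \
    | sort -u > "$merged"

# Install atomically and refresh the running table without a full
# pf reload (left commented in this sketch; both need root):
#   mv "$merged" /etc/pf.replist
#   pfctl -t replist -T replace -f /etc/pf.replist
```

The important detail is pfctl -t replist -T replace, which swaps the table contents in place; you don’t need to reload the whole ruleset just to refresh a blocklist.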
<bruteforce> is populated by pf itself via the overload mechanism. More on this when we get to the SSH rules.
Block policy: silence on the outside, manners on the inside
Here’s a design decision I feel strongly about. The WAN-facing block policy is drop:
set block-policy drop
When an unsolicited packet arrives on my WAN interface, it disappears. No TCP RST, no ICMP unreachable, nothing. The sender gets silence. This is deliberate. Every response you send to an attacker is information. A RST tells them the host exists and the port is closed. An ICMP unreachable confirms there’s something alive at that address. Silence tells them nothing. Maybe the IP is in use, maybe it isn’t. They don’t know, and I’d like to keep it that way.
On the LAN side, I use return in the rules themselves:
block return in on $lan_if
This sends a TCP RST for blocked TCP connections and an ICMP port-unreachable for everything else. Why the difference? Because my LAN clients are my devices. If something on my network tries to reach a port I’ve blocked, I want it to fail fast. A dropped packet means the client sits there waiting for a timeout, which is anywhere from 30 seconds to two minutes depending on the application. A RST means instant failure. My devices shouldn’t have to wait around wondering.
SYN flood defence: Henning Brauer’s syncookies
This is one of my favourite features in OpenBSD’s pf, and it’s worth understanding why it exists and how it works.
set syncookies adaptive (start 25%, end 12%)
A SYN flood attack works by sending thousands of TCP SYN packets, the first step of the three-way handshake, without ever completing the handshake. Each SYN causes the firewall to allocate state, reserving memory for a connection that will never materialise. Enough of these and you exhaust the state table, at which point legitimate connections can’t get through.
SYN cookies are a defence against this. Instead of allocating state when a SYN arrives, the firewall encodes the connection parameters into the sequence number of the SYN-ACK response. If the client completes the handshake (sends the final ACK), the firewall can reconstruct the state from the sequence number. No state allocated until the handshake completes. Brilliant.
But there’s a trade-off. SYN cookies can’t support TCP options negotiated during the handshake, things like window scaling and selective acknowledgements. So you don’t want them active all the time. You only want them when you’re actually under attack.
That’s what the adaptive mode does. When the state table fills to 25% with half-open connections (SYNs received, handshake not completed), syncookies activate. When it drops back below 12%, they deactivate. The hysteresis gap prevents flapping.
Henning Brauer committed the syncookies implementation for pf in 2018, shipping in OpenBSD 6.4, adaptive mode included. It’s one of those features where you set it and forget it. Under normal conditions, it does nothing. Under attack, it saves you.
DNS interception: trust nobody
This section is, philosophically, the one I’m most opinionated about. I run unbound [4] as a local recursive DNS resolver on the router. Every DNS query from my network is answered by my own resolver, which talks directly to authoritative nameservers. No ISP DNS, no Google DNS, no Cloudflare DNS. Mine.
But simply running a local resolver isn’t enough. Any device on my network could be configured (by its manufacturer, by an app, by malware) to use a different DNS server. Chromecast devices hardcode 8.8.8.8. Some IoT gadgets use their manufacturer’s DNS. Android phones may use DNS-over-TLS to bypass local DNS entirely.
So I intercept everything:
pass in on $lan_if proto { tcp, udp } \
from $lan_if:network to any \
port domain \
divert-to 127.0.0.1 port 5353 \
tag LAN_DNS
Any DNS query (port 53, TCP or UDP) from a LAN client, regardless of destination, gets transparently redirected to my local unbound instance. The client thinks it’s talking to 8.8.8.8 or whatever it has configured. It’s actually talking to my resolver. The client never knows.
Then I block the escape routes:
block drop in on $lan_if proto tcp \
from $lan_if:network to any \
port 853
Port 853 is DNS-over-TLS (DoT). If I don’t block it, a clever device could bypass my DNS interception by encrypting its queries. Blocked. Similarly, I block port 5353 (mDNS) from leaving the LAN, because mDNS is a local discovery protocol and has no business traversing a router.
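The mDNS rule follows the same pattern as the DoT one. And while I’m at it, one addition worth considering that isn’t in the config above: DNS-over-QUIC also uses port 853, but over UDP, so a TCP-only rule leaves that door ajar. A sketch of both:

```pf
# Keep mDNS on the LAN where it belongs
block drop in on $lan_if proto udp \
    from $lan_if:network to any \
    port 5353

# My suggestion, not part of the ruleset shown above:
# DNS-over-QUIC uses UDP 853, so close that escape route too
block drop in on $lan_if proto udp \
    from $lan_if:network to any \
    port 853
```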
The result? I have complete visibility over every DNS query made by every device on my network. I can see what my smart TV is trying to resolve. I can see what my printer is phoning home to. And I can block it if I don’t like it. This is what owning your network actually means.
NAT: the boring but essential bit
NAT configuration in pf is pleasingly concise:
match out on egress inet \
from ($lan_if:network) to any \
nat-to (egress:0)
One line. That’s your entire outbound NAT. But the details matter.
(egress:0) uses parentheses to tell pf to dynamically track the current IP address of the egress interface. The :0 modifier excludes interface aliases, so only the interface’s primary address is used. If my ISP’s DHCP server assigns me a new address, pf picks up the change automatically. No reload needed. No pfctl -f /etc/pf.conf. It just works.
This is a bigger deal than it sounds. With iptables, a DHCP address change means re-running your NAT rules with the new IP, which usually means a script hooked into the DHCP client. With pf, the parenthetical syntax handles it natively. One less thing to break.
The match keyword is also important here. Unlike pass, match doesn’t make a pass/block decision. It just applies a transformation (NAT, in this case) to packets that meet the criteria. The actual pass/block decision happens later, in the filter rules. This separation of “what to transform” from “what to allow” is one of pf’s genuinely elegant design choices.
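A minimal sketch of how the two halves fit together: the match rule applies the transformation, and a later pass rule makes the policy decision. Any packet the pass rule lets through gets the NAT applied on its way out.

```pf
# Transformation only: rewrite the source address, decide nothing yet
match out on egress inet from ($lan_if:network) to any nat-to (egress:0)

# Policy: allow LAN-originated traffic out, with state tracking
pass out on egress inet keep state
```

If you later tighten the pass rules, say, to block outbound SMTP, the NAT line never has to change.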
SSH rate limiting: be paranoid, but precisely paranoid
My router runs SSH. It has to, it’s headless. But SSH exposed to the WAN is a target, and the brute-force scanners find you within hours of connecting. My SSH rules are layered defence:
pass in on egress proto tcp \
from any to (egress) port ssh \
modulate state \
(max-src-conn 100, \
max-src-conn-rate 15/5, \
overload <bruteforce> flush global, \
max-src-states 50000)
Let me unpack this:
max-src-conn 100 allows a maximum of 100 concurrent connections from a single source IP. A legitimate user might have a few SSH sessions open. A hundred is generous. Anything beyond that is either an attack or a misconfiguration, and I don’t much care which.
max-src-conn-rate 15/5 limits each source to 15 new connections within a 5-second window. This is the brute-force killer. A human connecting to SSH establishes one connection, maybe two. A script trying password combinations establishes hundreds per second. Fifteen in five seconds is a comfortable margin for legitimate use and an instant trip-wire for attacks.
overload <bruteforce> flush global is where it gets good. When a source trips either the connection count or the rate limit, its IP gets added to the <bruteforce> table and ALL of its existing states are flushed. Not just the SSH states. All states. Every connection that IP has to my router, gone. This is aggressive and deliberate. If you’re brute-forcing my SSH, I don’t want you doing anything else on my network either.
max-src-states 50000 is a safety net. No single source IP can hold more than 50,000 state table entries via this rule. This prevents a resource exhaustion attack even if the other limits somehow fail.
The <bruteforce> table is then referenced in a block rule:
block drop in quick on egress from <bruteforce>
The quick keyword means this rule is final. If your IP is in the bruteforce table, you’re done. No further rule evaluation. No appeal.
Combined with sshguard watching the authentication logs and populating <sshguard>, I’ve got two independent layers of SSH protection: pf catches the network-level abuse, sshguard catches the authentication-level abuse. Belt and braces.
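One housekeeping detail: entries added to <bruteforce> by the overload mechanism never expire on their own, so the table grows forever unless you age them out. pfctl’s -T expire removes entries older than a given number of seconds; the 24-hour window here is my choice, not anything pf mandates:

```pf
# root crontab entry: drop <bruteforce> entries older than 24 hours
30 3 * * * /sbin/pfctl -t bruteforce -T expire 86400
```

A day in the penalty box is plenty for scanners, and it keeps the table from accumulating thousands of stale IPs that long since moved on.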
Antispoof: the obvious thing that people forget
antispoof quick for egress
antispoof quick for $lan_if
Antispoof rules block packets that claim to come from a network behind an interface but arrive on a different interface. If a packet arrives on my WAN claiming to have a source address from my LAN subnet, that’s spoofed. Block it.
It’s a two-line config that prevents an entire class of attacks. I’m always slightly surprised when I see firewall configs without it.
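Under the hood there’s no magic: each antispoof line expands to ordinary block rules. Per pf.conf(5), antispoof quick for $lan_if expands to roughly the following (the exact addresses are filled in from the interface’s configuration):

```pf
# packets claiming a LAN source address, arriving on any other interface
block drop in quick on ! $lan_if inet from $lan_if:network to any
# packets claiming the router's own LAN address as their source
block drop in quick inet from ($lan_if) to any
```

You can see the expansion on your own ruleset with pfctl -s rules.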
QoS: HFSC queues with FQ-CoDel (the really interesting bit)
Right. This is the section I’ve been wanting to write, because it solves a problem that affects nearly every home network and that almost nobody addresses.
The problem is bufferbloat [5].
Here’s what happens without QoS. Your home internet connection is asymmetric, say 30 Mbps download and 10 Mbps upload. When you saturate the upload (a video call, a large file sync, a backup running), your router’s outbound queue fills up. Packets queue behind each other, waiting for their turn on the 10 Mbps link. The queue grows. Latency increases. Maybe from 20ms to 200ms. Maybe to 2000ms.
And here’s the cruel bit: the increased upload latency kills your download performance too. TCP relies on ACK packets flowing back from receiver to sender. Those ACKs travel on your upload link. When the upload queue is bloated, the ACKs are delayed, and the sender’s congestion control algorithm interprets delayed ACKs as congestion and slows down. Your 30 Mbps download drops to a fraction of its capacity, all because your upload is stuffed.
This is bufferbloat, and RFC 8290 [6] describes the solution: FQ-CoDel, Fair/Flow Queueing with Controlled Delay.
Why shape upload only?
I only apply QoS on the outbound side of my WAN interface. This seems counterintuitive at first, why not shape download too? The answer is that I can only control what leaves my router. By the time a download packet arrives at my WAN interface, it’s already traversed the ISP’s network and consumed bandwidth. I can’t un-consume that bandwidth by shaping it locally.
What I can control is the upload. And by controlling the upload intelligently, I indirectly fix the download, because the ACK starvation problem goes away.
The queue hierarchy
wan_upload = "10M"
queue rootq on $wan_if bandwidth $wan_upload max $wan_upload
queue q_dns parent rootq bandwidth 500K min 200K
queue q_ack parent rootq bandwidth 2500K min 1M
queue q_std parent rootq bandwidth 7M default flows 1024 qlimit 1024
Three queues, one parent:
rootq is the root queue, shaped to my actual upload bandwidth of 10 Mbps. This is critical. If I set this higher than my real upload speed, the shaping happens at the ISP’s end instead of mine, and I lose control. Setting it slightly below the real speed (some people use 95%) ensures that the bottleneck is always at my router, where my queues can manage it.
q_dns gets 500K with a guaranteed minimum of 200K. DNS queries are tiny and latency-sensitive. They should NEVER wait behind a bulk upload. A slow DNS lookup adds perceived latency to everything, web pages, email, API calls. By giving DNS its own queue with a guaranteed minimum, I ensure that name resolution stays snappy even when the upload is saturated.
q_ack gets 2500K with a minimum of 1M. This is for TCP ACKs, and it’s the key to solving the download problem. By prioritising ACKs over bulk traffic, I prevent ACK starvation. The download sender gets timely acknowledgements, its congestion control stays happy, and the download runs at full speed even when the upload is busy.
q_std gets the remaining 7M and is the default queue, everything that isn’t DNS or ACKs lands here. The flows 1024 parameter is where FQ-CoDel comes in. This tells pf to maintain up to 1,024 separate sub-queues, one per traffic flow, and apply CoDel’s delay management algorithm to each independently.
How FQ-CoDel works (the 30-second version)
Traditional queue management drops packets when the queue is full (tail drop). This is terrible. The queue is already oversized by the time you start dropping, and you drop from the back, penalising the most recent arrivals rather than the flows causing the congestion.
CoDel (Controlled Delay) takes a different approach. It monitors the sojourn time of packets, how long each packet has spent waiting in the queue. If the minimum sojourn time over an interval exceeds a target (typically 5ms), CoDel starts dropping packets from the head of the queue. It drops more aggressively the longer the problem persists, using a schedule based on the square root of the drop count.
FQ (Fair Queueing) adds per-flow isolation. Each TCP or UDP flow gets its own sub-queue. A single greedy flow can’t starve other flows because they’re in separate queues. The scheduler serves flows round-robin, giving each flow its fair share.
Combined, FQ-CoDel gives you low latency AND fairness. A video call and a backup can coexist on the same link without the backup destroying the call’s latency. Each flow is managed independently, and each has its delay controlled.
The qlimit 1024 sets the maximum depth of the queue in packets. With CoDel active, the queue should rarely approach this limit because CoDel starts managing delay well before the queue fills.
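Once this is loaded, you can watch the queues work. Two commands I find myself reaching for, both in the OpenBSD base system:

```
# Per-queue counters: packets and bytes passed, and crucially, drops
pfctl -s queue -v

# The same counters as a live, continuously refreshing display
systat queues
```

If q_std shows drops while q_ack and q_dns stay clean under load, the hierarchy is doing exactly what it should: CoDel is trimming the bulk flows while the latency-sensitive queues sail through.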
Assigning traffic to queues
match out on $wan_if proto tcp to port { domain, domain-s } \
set queue q_dns
match out on $wan_if proto udp to port { domain, domain-s } \
set queue q_dns
match out on $wan_if proto tcp flags A/SAFR \
set queue (q_std, q_ack)
The first two rules put DNS traffic (ports 53 and 853) into the DNS queue. The third rule is the clever one. flags A/SAFR matches TCP packets that have only the ACK flag set out of the SYN, ACK, FIN, and RST flags. But notice the syntax: set queue (q_std, q_ack). With a queue pair, pf sends packets carrying a low-delay ToS marking, and TCP ACKs with no data payload, to the second queue; everything else the rule matches goes to the first. So the empty acknowledgements that keep download streams flowing land in q_ack, while ACK-flagged packets that also carry data stay in q_std. pf handles this classification automatically.
The net effect? DNS is always fast. ACKs are always prioritised. Everything else gets fair, delay-controlled queueing. And my downloads stay fast even when the upload is saturated.
95% of home networks don’t do any of this. They run a dumb FIFO queue on the WAN interface and wonder why their video calls turn to mush whenever someone starts a cloud backup. The hardware to fix this costs about 100 euros. The configuration is about 15 lines of pf.conf. There really is no excuse.
Production stats: the proof
After eight days of production use on my home network in Larnaca, here’s what pfctl -s info shows:
State Table Total Rate
current entries 847
searches 122544831 176.9/s
inserts 2193847 3.2/s
removals 2192999 3.2/s
Counters
match 61692498 89.1/s
bad-offset 0 0.0/s
fragment 0 0.0/s
short 0 0.0/s
normalize 0 0.0/s
memory 0 0.0/s
bad-timestamp 0 0.0/s
congestion 0 0.0/s
ip-option 39 0.0/s
proto-cksum 0 0.0/s
state-insert 0 0.0/s
state-limit 0 0.0/s
src-limit 1 0.0/s
synproxy 0 0.0/s
syncookies sent 0 0.0/s
61 million matches. 847 active states. Zero syncookies sent, which means nobody’s bothered to SYN flood me yet (give it time). One src-limit hit, meaning exactly one IP has tripped my SSH rate limit. Zero memory errors, zero congestion drops, zero state table exhaustion events.
The blocked packet count comes from pfctl -t bruteforce -T show and pfctl -s labels, which together show roughly 325,000 packets dropped across the martians table, the reputation list, and the brute-force table over those eight days.
And the CPU? Functionally zero. The APU3D2’s AMD GX-412TC, a quad-core 1GHz embedded chip, handles 89 rule evaluations per second without breaking a sweat. This isn’t a high-traffic network, it’s a home connection, but the point is that pf on OpenBSD is extraordinarily efficient. The overhead of stateful filtering, NAT, and FQ-CoDel queue management is negligible on hardware that costs less than a fancy dinner.
What’s next
In Part 3, I’ll cover the unbound DNS resolver configuration: recursive resolution, DNSSEC validation, local zones, and the daily reputation list update script that feeds the <replist> table.
But for now, I’ll leave you with this thought. A firewall isn’t a product you buy. It’s a policy you write. And a good firewall policy should be readable, auditable, and explainable to another human being, not a 2,000-line iptables dump that nobody, including the person who wrote it, fully understands.
pf lets you write firewall policy the way you’d write a specification: clear, declarative, top-to-bottom. That’s not just aesthetically pleasing. It’s a genuine security property. A firewall you can read is a firewall you can reason about. And a firewall you can reason about is one you can actually trust.
References
- [1] RFC 6890: Special-Purpose IP Address Registries
- [2] sshguard: protecting hosts from brute-force attacks
- [3] Spamhaus Don’t Route Or Peer Lists (DROP)
- [4] Unbound: the validating, recursive, caching DNS resolver from NLnet Labs
- [5] Bufferbloat project: CoDel wiki
- [6] RFC 8290: The Flow Queue CoDel Packet Scheduler and Active Queue Management Algorithm

Further reading
- OpenBSD PF User’s Guide
- OpenBSD pf.conf(5) man page
- Peter Hansteen, “The Book of PF”, No Starch Press