Josh's Toolbox

Wormhole: An AI Journey in Designing a Nix Module for a Network Ingress Tunnel

I was recently designing an ingress tunnel Nix configuration to let a cloud machine access my home server, and out of laziness I decided to use Claude to quickly understand some WireGuard options. Giving in to the drug, I ended up working with it to design the whole module. What I share below is what many have shared before: my experience with AI, its successes, its failings, and my thoughts on the long term.

I’ve been spending some time recently working on my home server. It runs NixOS and is currently only exposed on my internal network. It runs Pi-hole, has a git server, and lets us watch videos through Jellyfin. But I’m quickly reaching the limits of its utility without it being available when I’m not home. I want to run calendar software and some sort of file hosting, but those are largely more useful when out and about.

I could host these services and access them without a domain name, memorizing my Internet Protocol (IP) address instead, but that has its problems. For starters, IP addresses are not fun to memorize. For another, while my IP address is semi-static, whenever we lose service, which is all too common, I get a new IP address, and if I’m not at home, I don’t know that it’s cycled, much less what it is. But probably the worst part is securing communication with my services: without a domain name, it’s much harder to set up certificates to establish secure communication.

So, I need to use a domain name to expose my services, and that requires publishing a Domain Name System (DNS) A record with my IP address. And that means making my address public, which isn’t desirable. Sure, the IPv4 address space is small enough that it’s basically all getting hammered all the time, but once you associate a domain with an IP address, you’re advertising to the world that you have interesting stuff to attack. Creating DNS records alone doesn’t usually give you away, but once you create the aforementioned certificates, certificate transparency logs let everyone know your URL exists.

And yes, there are ways to work around this: generate a wildcard certificate, publish records, and expose your services only on subdomains. DNS servers don’t usually make it possible to reveal a list of the records they serve. But I am paranoid. And with an IP address that is generally static, once it is exposed, I can’t easily rotate it. So, instead, what I want is to run a machine on a separate IP address that I can destroy and rotate with ease. And that server is the ingress server to my network. (Well, one of the networks in my house; I’m only letting this reach a subset of my devices.) Instead of records and traffic using my IP address, requests are made to it, and it routes traffic between my server and requesters. To make things even more interesting, I want to encrypt traffic between my server and this tunnel server using WireGuard. While protocols like Hypertext Transfer Protocol Secure (HTTPS) encrypt your traffic, there are often parts, usually small, that are still unencrypted. Transport Layer Security (TLS), the security layer that turns HTTP into HTTPS, for example, does not encrypt the Server Name Indication (SNI). In addition, many protocols don’t hide what the underlying protocol is (e.g., Hypertext Transfer Protocol (HTTP)). For no profit and little gain, I want to hide why these machines are talking to each other, and that’s where configuring a WireGuard tunnel comes in.

That is where our journey starts. At first, I was just talking to Claude (Sonnet 4.6) to understand some WireGuard Nix options, but decided to explore deeper and deeper until Claude was writing the whole thing. As we walk through every iteration of the journey, I’m gonna keep a scoreboard to track who I felt was most important to each iteration: Claude or me. Is this scientific? No. But, whatever. And I’ll give Claude a well-deserved point because it is the one writing all this code.

Josh: 0 | Claude: 1

Iteration 1: Initial Draft

My initial plan had been to run an Nginx proxy on the instance and naively route all traffic to my home server using stream directives. These would receive all Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) traffic and then send it on to my home server. Then, once my server responded to the Nginx server, the responses would be sent back to the correct clients.
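For context, that naive plan would’ve looked something like this in NixOS terms. This is a sketch of the idea, not code from the module; the addresses and ports are placeholders:

```nix
{
  services.nginx = {
    enable = true;
    # Stream (layer 4) proxying: forward raw TCP/UDP rather than
    # terminating HTTP. Nginx accepts the client connection and opens
    # its own separate connection to the home server.
    streamConfig = ''
      server {
        listen 443;
        proxy_pass 10.100.0.2:443;  # placeholder home-server address
      }
      server {
        listen 51820 udp;
        proxy_pass 10.100.0.2:51820;
      }
    '';
  };
}
```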

This is actually a pretty common setup for web servers: it lets you host multiple different web services on the same domain. It also lets your various web services just worry about generating responses while Nginx handles things like TLS. Nginx can terminate the TLS connection and speak just simple HTTP to your service.

But, Claude had a different idea when I proposed this: what if we did all the routing in the Linux kernel using nftables? (Well, it initially recommended iptables, but nftables is the more modern replacement.) Keeping all the traffic in the kernel would make it more performant. Nginx is a user-mode application, so we end up crossing the user/kernel boundary, reducing performance. But, perhaps worse, Nginx actually terminates the connection and does direct negotiation with clients, which is its own overhead. The kernel can just modify the packets in place and then send them on their way.
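The kernel-side equivalent is destination NAT on the way in and masquerading on the way out. Roughly, and with illustrative table, chain, interface names, and addresses (not the module’s actual ruleset):

```nix
{
  networking.nftables.enable = true;
  networking.nftables.ruleset = ''
    table ip wormhole {
      chain prerouting {
        type nat hook prerouting priority dstnat;
        # Rewrite inbound traffic's destination to the home server's
        # tunnel-internal address.
        tcp dport { 80, 443 } dnat to 10.100.0.2
      }
      chain postrouting {
        type nat hook postrouting priority srcnat;
        # Rewrite the source so replies flow back through the tunnel server.
        oifname "wh-tunnel" masquerade
      }
    }
  '';
}
```

Because this is pure packet rewriting, the tunnel server never terminates a connection; conntrack remembers each mapping so reply packets get un-rewritten on the way out.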

Point Claude.

Josh: 0 | Claude: 2

Iteration 2: Review

I was pretty deep into my context window now. The same session that generated the code was the session where I had had long conversations about design trade-offs, the impact of the limitations of underlying network technologies, and replacing iptables calls with nftables ones. To avoid context rot permeating the design, I took the code from iteration 1 over to a new session and asked it: “What do you think this Nix module does, and what errors do you see in it?” (Is that comma important? ¯\_(ツ)_/¯)

When your eyes start to bleed, skip to “That was a lot.”

Before we get into the proposed corrections, a quick aside to talk about a limitation of this design that Nginx doesn’t have to handle. Because Nginx terminates the traffic, talking to our server is simply a separate network connection. But when we rewrite packets, we need to take into account both network connections: the one between the client and us, and the one between us and the target server. Individual TCP and UDP packets are limited by the configuration of the underlying point-to-point connection protocols on the Internet (e.g., messages between two directly connected machines). For Ethernet, this usually limits packets to 1500 bytes. This wouldn’t really be much of an issue if we were just sending packets on to the target server unmodified. But we’re sending these messages over a VPN, and VPNs encapsulate messages, which adds overhead. With WireGuard, the overhead is up to 80 bytes (60 bytes when the tunnel runs over IPv4, 80 over IPv6), and so the maximum transmission unit (MTU) is conventionally configured to be 1420 bytes. We need to tell clients this. For TCP, this is straightforward. During connection negotiation, we can tell the client the maximum size of the packets they can send through the maximum segment size (MSS) field. This is basically the MTU minus the IP and TCP headers. It’s smaller for IPv6 because IPv6 headers are larger, and we need to take that into account for both connections, the client’s and the VPN’s. UDP doesn’t have this negotiation, but if a packet is not allowed to be fragmented (as indicated in its headers), when the kernel tries to send it over the VPN, it will fail and send an Internet Control Message Protocol (ICMP) packet back to the client telling them about the fragmentation needs. It’s then up to the caller to figure out what to do. TCP would do something similar, but because TCP is a negotiated connection, we’ll still get the data through. UDP is fire-and-forget, and so client software needs to make its own choices; the network protocol won’t automatically ensure that we get that information.
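In nftables terms, telling TCP clients about the smaller MTU is MSS clamping on traffic headed into the tunnel. A sketch of how that might look, assuming the 1420-byte tunnel MTU; the interface and table names are my own placeholders:

```nix
{
  networking.nftables.enable = true;
  networking.nftables.ruleset = ''
    table inet wormhole-mss {
      chain forward {
        type filter hook forward priority mangle;
        # 1380 = 1420 (tunnel MTU) - 20 (IPv4 header) - 20 (TCP header)
        oifname "wh-tunnel" meta nfproto ipv4 tcp flags syn tcp option maxseg size set 1380
        # 1360 = 1420 - 40 (IPv6 header) - 20 (TCP header)
        oifname "wh-tunnel" meta nfproto ipv6 tcp flags syn tcp option maxseg size set 1360
      }
    }
  '';
}
```

Clamping only touches SYN packets, where the MSS option lives, so the rest of the flow passes through untouched.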

That was a lot. But the point is, I identified this limitation in the initial session and Claude did not. So point me:

Josh: 1 | Claude: 2

Anyway, back to my new session. It identified issues in calculating the MSS, some limitations on network interface naming in Linux, and a few hard-coded values. I let it fix them and so Claude, too, gets another point:

Josh: 1 | Claude: 3

Iteration 3: A Simple Rename

I had already decided that wormhole was going to be the name of this tool, and many of the named configurations of subsystems used wormhole as a prefix to disambiguate them. But network interface names are limited to 15 characters on Linux (16 bytes including the trailing NUL), and that really reduced the namespace for trailing identifiers if one were to configure multiple wormhole services. Claude was the one who raised it, but I was the one who identified it as an issue and instructed Claude to use wh- as a prefix instead. So this one is my point:

Josh: 2 | Claude: 3
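Speaking of interface names: a module can enforce that limit at evaluation time rather than letting the kernel reject the name later. A minimal sketch, assuming a hypothetical `services.wormhole.tunnels` attribute set (not the module’s actual option layout):

```nix
{ config, lib, ... }:
let
  ifaceName = name: "wh-${name}";  # the "wh-" prefix leaves 12 characters free
  tunnels = config.services.wormhole.tunnels or { };  # hypothetical option
in
{
  assertions = lib.mapAttrsToList (name: _: {
    # Linux's IFNAMSIZ is 16 bytes: 15 usable characters plus a NUL.
    assertion = builtins.stringLength (ifaceName name) <= 15;
    message = "wormhole: interface '${ifaceName name}' exceeds the 15-character limit";
  }) tunnels;
}
```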

Iteration 4: An API Change

At this point, looking at the code was making my eyes bleed. I had originally designed an API I liked, but it ended up requiring all sorts of string mangling to make work. So, I asked Claude to take its own stab at it, really following Nix best practices this time. I felt it did a pretty stellar job in the rewrite, and it opened doors to improve type checking. So, a well-deserved point to Claude.
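To give a flavor of what “Nix best practices” means here: typed options built from submodules, rather than strings to mangle. The option names below are illustrative, not the module’s real API:

```nix
{ lib, ... }:
{
  options.services.wormhole.tunnels = lib.mkOption {
    description = "Ingress tunnels to configure.";
    default = { };
    # An attribute set of submodules gives each tunnel its own typed options,
    # so bad values fail at evaluation time rather than at runtime.
    type = lib.types.attrsOf (lib.types.submodule {
      options = {
        targetAddress = lib.mkOption {
          type = lib.types.str;
          description = "WireGuard-internal address of the home server.";
          example = "10.100.0.2";
        };
        ports = lib.mkOption {
          type = lib.types.listOf lib.types.port;
          description = "TCP ports to forward to the target.";
          example = [ 80 443 ];
        };
      };
    });
  };
}
```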

Josh: 2 | Claude: 4

Iteration 5: Using WireGuard Magic

One of the neat things about WireGuard is that both sides don’t actually need to know that the other exists to form a connection. Instead, as long as one end is set up to maintain a persistent connection, it will reach out and establish it. In my case, with a potentially changing IP address for my home server, this is a huge boon. For configuring the tunnel service, this means that we aren’t guaranteed to know at configuration time whether the connection between the tunnel and my home server will be over IPv4 or IPv6. So, to get my point for this iteration, I instructed Claude to use reasonable defaults and expose configuration options to allow specifying a network type even if you didn’t know the target IP address in question.
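That “reach out and maintain” behavior is WireGuard’s persistent keepalive, configured on the home server’s side. A sketch using the stock NixOS WireGuard options; the addresses, paths, and endpoint are placeholders:

```nix
{
  networking.wireguard.interfaces.wh-home = {
    ips = [ "10.100.0.2/24" ];  # placeholder tunnel-internal address
    privateKeyFile = "/var/lib/wormhole/private.key";  # placeholder path
    peers = [{
      publicKey = "TUNNEL_SERVER_PUBLIC_KEY";  # placeholder
      # Only this side needs to know an endpoint; the tunnel server
      # learns our address from the packets we send it.
      endpoint = "tunnel.example.com:51820";
      allowedIPs = [ "10.100.0.1/32" ];
      # Send a keepalive every 25 seconds so the tunnel server can always
      # reach us, even though it never initiates the connection itself.
      persistentKeepalive = 25;
    }];
  };
}
```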

Josh: 3 | Claude: 4

Iteration 6: Comments

Given that I didn’t write this module, I decided that I needed Claude to write more comments. So, for its hard work:

Josh: 3 | Claude: 5

Iteration 7: A Hole in the Strategy

Back in iteration 5, when I had requested that we expose IP version configuration, I also had us raise an error when the WireGuard IP was configured and disagreed with the configured version. But I hadn’t considered the case of chained tunneling, and so here I reduced the erroring space to take that into account. I wouldn’t give myself a point, but Claude didn’t raise this either. Also, I’m the scorekeeper, so take that, Claude:

Josh: 4 | Claude: 5

Iteration 8: Code Review Part 2: Electric Boogaloo

At this point, it was time to get a second set of virtual eyes on this thing, so I took the same approach I took with the initial pass and asked a fresh Claude session for insight. But instead of letting that session correct the code, I shared its feedback with the session that was making all these changes, so as not to lose the context of the design choices. For their tag-team work, the Claudes earned another point:

Josh: 4 | Claude: 6

Iteration 9: To MTU or Not to MTU

At this point, I was again thinking about packet sizes. I realized that if there was a smaller than expected MTU between the tunnel server and my home server, the client would never find out about it. An ICMP packet would be generated to tell the tunnel to reduce packet sizes, but it had no means by which to send that back to the client. Or does it?

I’m pretty sure that conntrack, the kernel’s network connection tracking mechanism, will successfully send the ICMP packet to the client. ICMP is used to report connection errors and diagnose connectivity between machines. While ICMP is a separate protocol at the same level as TCP and UDP and has a different structure, ICMP error packets include as part of their payload the original IP header and at least the first eight bytes of the transport layer header, which is where TCP and UDP specify the source and destination ports. And that is enough to associate the error with the connection. But, at this point, this didn’t click in my brain, and so I asked Claude to provide more knobs to manually reduce packet sizes.

As a result of my failure to truly investigate this, I award no points. And I slap myself with a yellow card.

Josh 🟨: 4 | Claude: 6

Iteration 10: Hole Punching or Connectionful UDP

While UDP traffic is connectionless, that doesn’t mean both ends don’t end up sending traffic to each other. In fact, WireGuard works over UDP. When routers see TCP packets, they can track the lifetime of the connection and ensure traffic to a given port from a given IP continues to go to the right machine. This is not the case for UDP. But they do still do something similar: they simply keep that mapping alive for a brief period of time, on Linux by default for 30 seconds. I mentioned earlier that only one of the two systems needs to know how to reach the other one, and it’s UDP hole punching that allows this. When the system in the know reaches out to the other one, the other is able to talk back by reusing the port that the smarter system reached out from.
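That 30-second window is a conntrack kernel parameter, so in NixOS, adjusting it boils down to sysctl settings like these (the values shown are examples, not recommendations):

```nix
{
  boot.kernel.sysctl = {
    # How long a UDP mapping survives after a packet with no reply
    # (the ~30-second window mentioned above).
    "net.netfilter.nf_conntrack_udp_timeout" = 30;
    # Once traffic has flowed in both directions, conntrack treats the
    # pair as a stream and applies this longer timeout instead.
    "net.netfilter.nf_conntrack_udp_timeout_stream" = 120;
  };
}
```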

I asked Claude to provide a knob to adjust that window to compensate for services on either side of the connection. My idea, my point:

Josh 🟨: 5 | Claude: 6

Iteration 11: What If I Have a Lot of Friends?

There is one major limitation that you run into when designing a low-overhead ingress server. I briefly mentioned ports earlier, but let’s talk about them in a bit more detail. A connection between two machines is defined by four pieces of information: the IPs of the two machines and the ports each is using. Ports are just a number, a 16-bit number to be precise. This lets there be up to 65535 connections between one machine and a service tied to a single port (like a web server) on another. In practice, it’s usually around 28000, as access to some of those ports is restricted by default, but we’ve put ourselves in an even more limited situation. In our case, no matter who connects to us, the packet gets translated into a network where everyone shares an IP address. So now, we can only handle around 28000 connections per target port, total. One way to partially overcome this limitation is to provision more IPs on the WireGuard network, so I asked Claude to provide exactly that functionality. For my world-breaking geniusness, I get one whole point:

Josh 🟨: 6 | Claude: 6
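For the curious, provisioning extra tunnel-side addresses might look like this; the addresses, path, and interface name are placeholders:

```nix
{
  networking.wireguard.interfaces.wh-tunnel = {
    # Each extra address multiplies the ~28000 usable source ports, since
    # a NAT mapping is keyed on (source IP, source port) toward the target.
    ips = [
      "10.100.0.1/24"
      "10.100.0.11/24"
      "10.100.0.12/24"
    ];
    privateKeyFile = "/var/lib/wormhole/private.key";  # placeholder path
  };
  # The NAT side can then draw from the whole range, e.g. in nftables:
  #   snat ip to 10.100.0.1-10.100.0.12
}
```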

Iteration 12: Do You Know What You Are Doing Claude?

While making the previous change, Claude made an unrelated change that actually would’ve broken things. And with that maneuver, Claude hands me the lead.

Josh 🟨: 7 | Claude: 6

Iteration 13: Do I Know What I Am Doing?

I realized that, like the UDP knobs, we’d want similar knobs to turn for TCP. When a UDP connection is opened, conntrack by default tracks that connection for 30 seconds. When a TCP connection is opened, conntrack tracks it until it closes, or, if it never sees that, for 5 DAYS. Want to make the home server unreachable through the tunnel? Simply open a bunch of TCP connections without closing them. There are some applications that might need this, but having a knob to reduce it can make attacks harder.
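The knob in question is another conntrack kernel parameter; turning it down in NixOS looks like this (the two-hour value is just an illustration):

```nix
{
  boot.kernel.sysctl = {
    # The kernel default for established TCP connections is 5 days
    # (432000 seconds). Lowering it bounds how long an idle, or
    # deliberately abandoned, connection can occupy a conntrack slot.
    "net.netfilter.nf_conntrack_tcp_timeout_established" = 7200;
  };
}
```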

But these knobs we’re turning are system-wide knobs. They are kernel parameters, and changing them silently behind the scenes is going to get real confusing and surprise system owners when you turn tunnels off and on. So, instead, I directed Claude to document these knobs and stop exposing my own weird interface for them. As a result, I’m afraid I will have to remove a point from myself for such a silly oversight. What a short-lived lead:

Josh 🟨: 6 | Claude: 7

Iteration 14: Are We There Yet?

I promise we’re almost out of iterations. This iteration was a simple request from me to further expand the module’s documentation on kernel parameter configuration.

Josh 🟨: 6 | Claude: 8

Iteration 15: Connecting to the WireGuard Network

While only one end of a WireGuard connection needs to know the IP address of the other, both need to be configured with the other’s public WireGuard key. My eventual plan was to expose this through a simple web server (a whole other can of worms that I hope to talk about in the future), and the configuration for that server would need to know where to find the public key our module creates, so I instructed Claude to expose that in the Nix module’s configuration.
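One way to expose that is a read-only option whose value other modules can reference. A sketch; the option name and key location are illustrative:

```nix
{ lib, ... }:
{
  options.services.wormhole.publicKeyFile = lib.mkOption {
    type = lib.types.path;
    readOnly = true;
    # Hypothetical location: wherever the module generates its keypair.
    default = "/var/lib/wormhole/public.key";
    description = ''
      Path to the WireGuard public key this module generates. Other
      modules (e.g., a credential-serving web server) can reference
      config.services.wormhole.publicKeyFile instead of hard-coding it.
    '';
  };
}
```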

It’s nice to move my points in a more positive direction again.

Josh 🟨: 7 | Claude: 8

Iteration 16 & 17: So You’re Out of Connections

In iteration 11, we provided the means to reduce the chance of connection exhaustion, but that didn’t make it impossible. So what happens when connections are exhausted? The kernel quietly drops them into the great big bit bucket in the sky. So, I asked Claude to add some documentation to help system maintainers detect if this is happening using the conntrack utility.

After that, I realized we had another knob to turn to help with connection exhaustion. Earlier, I mentioned that we usually have fewer ports to work with than are nominally available. So after the previous documentation update, I instructed Claude to write some docs on that knob too.
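In the spirit of that documentation, the detection side might look like installing conntrack-tools and watching the counters (the commands live in the comments; the table size is an example, not a recommendation):

```nix
{ pkgs, ... }:
{
  # conntrack-tools provides the `conntrack` CLI for inspecting the table:
  #   conntrack -C    # current number of tracked connections
  #   conntrack -S    # per-CPU stats, including drops when the table is full
  #   sysctl net.netfilter.nf_conntrack_max    # table capacity
  environment.systemPackages = [ pkgs.conntrack-tools ];

  # Optionally raise the table size if exhaustion is observed.
  boot.kernel.sysctl."net.netfilter.nf_conntrack_max" = 262144;
}
```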

Josh 🟨: 8 | Claude: 8

Where Does That Leave Us?

And that brings us to the final version. I have not reviewed it, or even tested it. I figure that should take me a few hours, but at the time of writing this blog post, I haven’t invested them. DO NOT USE IT. (Besides, you don’t even know my super special and awesome way I plan to expose WireGuard credentials.)

But this post isn’t about the tool we created, at least not solely. It’s about my most recent experience with Claude Code.

I should note that Sonnet 4.6 may not have been the best model for this, so if you believe that to be the case, you may disagree with some of this feedback.

This isn’t the first project I’ve done with Claude handling the bulk of the work. Late last year, I wrote a simple Python server that translated events from a third party into Prometheus metrics that I could query with Grafana. I don’t exactly recall which model I was using, but I do remember using their best coding model at the time. I was a bit disappointed with its architectural chops: the initial design was untestable and narrowly tailored to the task at hand; adjusting it to a better architecture required a bit of teeth pulling.

Here, I still saw echoes of that, but I must say it was much better at designing the module’s interface. I was very surprised at the cleanliness of the interface and the implementation. That said, I would’ve loved to see some more pushback on my design choices, like when I proposed exposing system-wide knobs as a part of the module.

And it’s still very prone to errors. While I know there’s a lot of work going into making these models more and more accurate, we can’t forget that they are statistical approximations of conversation, not knowledge, and the very conversation you’re having can affect the accuracy of the model. In iteration 9, it was the one who proposed that ICMP packets could not properly be routed, and it took some sleep on my part to realize that that would be a rather notable failing of the nftables feature (masquerade) we were using.

I truly didn’t expect our scores to be tied when I tallied them up writing this post, but in truth the values and their equivalence mean nothing. The non-zeroness of either really tells the story. I was able to move much more rapidly with Claude, and it raised ideas I hadn’t considered. But, as mentioned, it is still prone to errors; it still has blind spots. And it needed my expertise to avoid both.

Recently, when using AI, I’ve felt like I can really feel my skills atrophy. But in this instance, I was surprised to feel like I was learning. I think the key here is that I was an active participant, whereas historically I’ve been using Claude to do things I could do myself but just didn’t feel like doing.

And I can’t say that I’ll stop that use case. AI is addictive. It’s a quick fix, and like many quick fixes, there are long-term trade-offs. But I think that in an actively engaged partnership, with enough expertise in the driver’s seat, AI can be an effective tool for learning and growing.

As for the efficacy of this project overall, I still have to test it. I still don’t trust the output of AI tools enough to throw caution to the wind. And maybe I’ll discover I’m too optimistic, or maybe too pessimistic.