Every network you have ever used — every Wi-Fi connection, every webpage loaded, every message sent — depended on a routing protocol you never saw. Somewhere beneath the surface, packets were being examined, paths were being calculated, and forwarding decisions were being made millions of times per second. In the commercial internet, you never think about routing because someone else has already solved the problem for you. Massive ISPs run fleets of routers with carefully tuned configurations, teams of network engineers on call, and routing tables containing nearly a million entries.
In alternative networks, you do not have that luxury. You are the ISP. You are the network engineer. And the routing decisions your network makes — or fails to make — determine whether your mesh delivers packets reliably across a neighborhood or drops them into a void.
Chapter 3 introduced mesh routing at a high level: proactive versus reactive, Layer 2 versus Layer 3, and brief overviews of BATMAN-adv, Babel, and OLSR. That was the appetizer. This chapter is the main course. We are going deep into the routing protocols that power real alternative networks — not just the mesh protocols, but also the encrypted overlay networks that route through hostile infrastructure, the traditional protocols adapted for community ISPs, and the modern VPN mesh tools that sometimes solve the problem better than any of them.
By the end of this chapter, you will understand how each protocol makes routing decisions, when each is the right choice, and how to configure and tune them for your specific deployment. You will also build a working routing simulator in Python to see these algorithms come alive.
Routing is the invisible intelligence of your network. It is time to make it visible.
In a conventional home network, routing is trivially simple. Your laptop sends everything to your router. Your router sends everything to your ISP. The ISP’s router consults a global routing table and forwards the packet toward its destination. The path is essentially predetermined by the physical infrastructure: cables, switches, and a hierarchy of increasingly powerful routers.
In an alternative network, almost none of these assumptions hold:
The topology is dynamic. Nodes join and leave. Wireless links fluctuate. A node’s neighbor at noon may be unreachable at midnight because someone parked a truck in the signal path. A routing protocol must continuously discover the current topology and adapt.
Links are asymmetric and unreliable. A wireless link between two nodes may work perfectly in one direction and poorly in the other. Rain, foliage, interference from microwave ovens, and a thousand other factors cause link quality to vary minute by minute. The routing protocol must measure link quality and prefer reliable paths over unreliable ones.
Resources are constrained. Community mesh nodes are typically consumer routers with limited CPU, memory, and bandwidth. The routing protocol itself consumes resources — every control message it sends is bandwidth that cannot carry user data. A protocol that works beautifully in simulation may overwhelm real hardware with its overhead.
There is no central authority. No one is manually configuring routes. No one is monitoring the global topology and making adjustments. The routing protocol must be entirely autonomous — discovering, computing, and maintaining routes without human intervention.
Multiple gateways may exist. In a community network, several nodes may provide internet access through different ISPs. The routing protocol must discover these gateways, compare their quality, and distribute traffic among them. When a gateway goes down, traffic must shift to alternatives within seconds.
Security cannot be assumed. In a public mesh, anyone can join. A malicious node could announce false routes, blackhole traffic, or perform man-in-the-middle attacks. Some protocols address this with cryptography; others rely on the assumption of a cooperative community.
These challenges make routing in alternative networks fundamentally harder than routing in conventional networks. The good news is that decades of research and deployment experience have produced protocols specifically designed for these conditions.
Before diving into specific protocols, it is essential to understand routing metrics — the numbers that protocols use to compare paths and choose the best one. The choice of metric profoundly affects how a network behaves.
Hop count is the simplest metric: the best path is the one with the fewest intermediate nodes. Hop count is used by basic distance-vector protocols like RIP. It is easy to compute but terrible for wireless networks, because it ignores link quality entirely. A two-hop path through excellent links is far superior to a one-hop path through a marginal link, but hop count would choose the worse path.
ETX (Expected Transmission Count) measures the expected number of transmissions needed to successfully deliver a packet across a link, accounting for packet loss in both directions. A perfect link has an ETX of 1.0. A link with 50% packet loss in each direction has an ETX of 4.0 (on average, four transmissions are needed before a packet is both delivered and acknowledged). ETX is used by Babel, OLSR with ETX plugins, and many other mesh protocols. It is significantly better than hop count for wireless networks because it naturally penalizes unreliable links.
Throughput measures the actual data rate achievable on a link, accounting for both the nominal bitrate and the packet loss rate. BATMAN V uses throughput as its primary metric. Throughput captures more information than ETX because it accounts for the fact that a 54 Mbps link with 10% loss is still better than a 6 Mbps link with no loss — something ETX would get wrong.
RTT (Round-Trip Time) measures latency. Some Babel configurations use RTT as a metric, which is useful when latency-sensitive applications (like voice calls) are a priority.
Composite metrics combine multiple factors. EIGRP, for example, uses a weighted combination of bandwidth, delay, reliability, and load. Some community network protocols incorporate price (as in Althea) or energy cost (relevant for battery-powered nodes).
The key insight is that no metric is universally best. The right metric depends on your traffic patterns, your hardware, and what you are optimizing for. Understanding metrics is understanding the soul of a routing protocol.
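To make the contrast concrete, here is a minimal Python sketch showing how hop count and ETX can disagree about the best path. The topology and link values are invented for illustration:

```python
# Toy comparison of hop-count vs ETX path selection.
# Links map (node, node) -> per-link ETX; values are illustrative.
links = {
    ("A", "C"): 3.3,          # one marginal direct link
    ("A", "B"): 1.1,          # two good links via relay B
    ("B", "C"): 1.1,
}

paths = [["A", "C"], ["A", "B", "C"]]

def hop_count(path):
    return len(path) - 1

def path_etx(path):
    # The ETX of a path is the sum of the per-link ETX values.
    return sum(links[(a, b)] for a, b in zip(path, path[1:]))

best_by_hops = min(paths, key=hop_count)
best_by_etx = min(paths, key=path_etx)

print(best_by_hops)  # ['A', 'C'] -- hop count picks the marginal link
print(best_by_etx)   # ['A', 'B', 'C'] -- ETX picks the reliable relay
```

The two-hop path costs 2.2 expected transmissions against 3.3 for the direct link, so any ETX-based protocol will route through B even though it looks "longer" to hop count.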
Chapter 3 introduced BATMAN-adv (Better Approach To Mobile Ad-hoc Networking — Advanced) as the community mesh workhorse. Here we go deeper — into the protocol mechanics, the critical differences between protocol versions, and the configuration knobs that determine whether your mesh flies or crawls.
BATMAN-adv is a Layer 2 mesh protocol implemented as a Linux kernel module. It makes an entire mesh network appear as a single Ethernet broadcast domain. From the perspective of devices connected to the mesh, every other device is “on the same LAN” — DHCP works, mDNS works, ARP works, everything that expects a flat network works without modification. This simplicity is BATMAN-adv’s greatest strength and, at scale, its greatest limitation.
The original BATMAN IV protocol relies on a single message type: the Originator Message (OGM). Every node periodically broadcasts an OGM identifying itself. When a node receives an OGM from a non-neighbor (relayed through one or more intermediate nodes), it updates its routing table: the neighbor from which the OGM arrived is a potential next-hop to reach the originator.
The clever part is how BATMAN IV selects the best next-hop. Rather than relying on a single measurement, it uses a sliding window to track how many OGMs from each originator arrive through each neighbor over a recent time period. The neighbor delivering the most OGMs wins. This passive measurement naturally reflects link quality — a lossy link drops OGMs, so fewer arrive, and that neighbor scores lower.
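The sliding-window idea can be sketched in a few lines of Python. The window size of 64 matches BATMAN IV's default; the neighbor names and loss pattern are invented for illustration:

```python
from collections import deque

# Sketch of BATMAN IV-style next-hop scoring: count how many of the
# most recent OGMs from an originator arrived via each neighbor.
WINDOW = 64  # BATMAN IV's default sliding-window size

class OgmWindow:
    def __init__(self):
        self.seen = {}  # neighbor -> deque of recently received OGM seqnos

    def record(self, neighbor, seqno):
        q = self.seen.setdefault(neighbor, deque(maxlen=WINDOW))
        q.append(seqno)

    def best_next_hop(self):
        # The neighbor that delivered the most recent OGMs wins.
        return max(self.seen, key=lambda n: len(self.seen[n]))

w = OgmWindow()
for s in range(100):
    w.record("neighbor-good", s)       # delivers every OGM
    if s % 3 == 0:
        w.record("neighbor-lossy", s)  # drops two thirds of them

print(w.best_next_hop())  # neighbor-good
```

The lossy link never has to be measured directly: dropped OGMs simply never land in its window, so its score falls on its own.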
BATMAN IV’s OGM approach is elegant in its simplicity, but it has a significant blind spot: it cannot distinguish between link quality and link throughput. A 6 Mbps link with perfect reliability scores the same as a 300 Mbps link with perfect reliability, because both deliver the same number of OGMs. This is a serious problem in networks with mixed-speed links — which is to say, almost every real-world deployment.
BATMAN V (the current version) fundamentally redesigns the protocol by separating neighbor discovery from route dissemination:
ELP (Echo Location Protocol) handles neighbor discovery and link quality measurement. Each node sends ELP probe packets to its neighbors at regular intervals. By measuring the rate of successful probe delivery and the throughput achievable on each link, ELP produces accurate per-link throughput estimates. ELP probes are local — they are not forwarded beyond direct neighbors, so they add minimal overhead.
OGMv2 handles global route dissemination. Like the original OGMs, OGMv2 messages propagate through the mesh announcing each originator’s presence. But unlike the original OGMs, the routing decision is based on the throughput metric gathered by ELP, not on passive OGM counting. Each OGMv2 message carries a throughput value that is updated at each hop to reflect the minimum throughput along the path.
The result is dramatically better routing in heterogeneous networks. BATMAN V will correctly prefer a path through two fast links over a single slow link, something BATMAN IV simply could not do.
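A toy sketch of the bottleneck comparison, with illustrative link speeds, shows why BATMAN V gets this case right:

```python
# BATMAN V-style path comparison: a path's throughput metric is the
# minimum (bottleneck) of its per-link throughput estimates, and the
# path with the highest bottleneck wins. Values in Mbit/s, illustrative.
def path_throughput(link_speeds):
    return min(link_speeds)

one_slow_link = [6]          # direct 6 Mbit/s link
two_fast_links = [300, 300]  # two hops at 300 Mbit/s each

best = max([one_slow_link, two_fast_links], key=path_throughput)
print(best)  # [300, 300] -- the two-hop fast path wins
```

Under BATMAN IV both paths would look identical as long as both delivered OGMs reliably; the throughput metric makes the 50x difference visible.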
BATMAN-adv’s Layer 2 operation depends on translation tables — distributed mappings between client MAC addresses and the mesh nodes they are connected to. When a laptop connects to mesh node A, node A advertises the laptop’s MAC address in a local translation table entry. This entry propagates through the mesh, so every node learns: “to reach MAC address XX:XX:XX:XX:XX:XX, send to mesh node A.”
There are two types: the local translation table, which lists clients attached directly to this node, and the global translation table, which lists clients attached to every other node in the mesh.
When a node receives a frame destined for a MAC address in its global translation table, it encapsulates the frame in a BATMAN-adv header and unicasts it to the mesh node hosting that client. The receiving mesh node decapsulates the frame and delivers it to the local client.
This is how the “flat LAN” illusion is maintained — but it comes at a cost. Every client device on the entire mesh has a global translation table entry on every mesh node. In a mesh with 200 nodes and 5 clients per node, every node maintains 1,000 global TT entries. This does not scale indefinitely, which is one reason BATMAN-adv is best suited for networks under a few hundred nodes.
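The forwarding decision itself is a dictionary lookup. A minimal sketch, using invented MAC addresses and node names:

```python
# Sketch of the global translation-table forwarding decision: every
# mesh node keeps a mapping from client MAC to the mesh node hosting
# that client, and unicasts encapsulated frames accordingly.
global_tt = {
    "aa:bb:cc:00:00:01": "node-A",
    "aa:bb:cc:00:00:02": "node-B",
}

def forward(dest_mac, local_node="node-C"):
    owner = global_tt.get(dest_mac)
    if owner is None:
        return "flood"                # unknown client: fall back to flooding
    if owner == local_node:
        return "deliver-locally"      # client is attached to this node
    return f"unicast-to:{owner}"      # encapsulate and unicast to the host node

print(forward("aa:bb:cc:00:00:01"))  # unicast-to:node-A
```

The scaling cost in the paragraph above falls directly out of this structure: `global_tt` has one entry per client on the entire mesh, replicated on every node.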
In community networks, multiple nodes typically provide internet access. BATMAN-adv includes a gateway selection mechanism where internet-connected nodes announce themselves as gateways, and client-facing nodes select the best gateway based on advertised bandwidth and path quality.
Gateways are classified by their upload and download bandwidth, and the selection algorithm balances throughput to the gateway with the quality of the mesh path to reach it. You can configure gateway mode on each node:
# On an internet-connected node — announce as gateway
batctl gw_mode server 100MBit/20MBit
# On a client-facing node — select best gateway
batctl gw_mode client 20   # sel_class: threshold for switching
The gw_sel_class parameter controls how eagerly a node switches gateways. A higher value means the node will only switch if the new gateway is significantly better, preventing flapping between gateways of similar quality.
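The hysteresis behavior is worth seeing in isolation. The sketch below is an abstraction of the idea; the scores and threshold are illustrative and do not reproduce batctl's exact TQ arithmetic:

```python
# Hysteresis sketch for gateway selection: only switch when the
# candidate beats the current gateway by more than a threshold,
# analogous in spirit to gw_sel_class. Scores are illustrative.
THRESHOLD = 20

def should_switch(current_score, candidate_score, threshold=THRESHOLD):
    return candidate_score > current_score + threshold

print(should_switch(200, 210))  # False -- too close, avoid flapping
print(should_switch(200, 230))  # True  -- clearly better gateway
```

Without the threshold, two gateways of near-identical quality would cause constant switching as link measurements jitter, breaking established connections each time.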
BATMAN-adv configuration is primarily done through batctl, the userspace control tool. Key tuning parameters:
# Set the OGM interval (milliseconds, default 1000)
batctl orig_interval 1000
# Enable bridge loop avoidance (critical when nodes
# are also connected via Ethernet)
batctl bridge_loop_avoidance 1
# Enable distributed ARP table (reduces broadcast)
batctl distributed_arp_table 1
# AP isolation off (set to 1 to stop clients talking directly)
batctl ap_isolation 0
# Check mesh status and originators table
batctl originators
batctl translation_global
batctl gateway_list
For most community deployments, the defaults work well. The most common tuning adjustment is reducing the OGM interval in highly dynamic environments (where nodes move frequently) or increasing it in stable deployments (to reduce overhead).
BATMAN-adv is the right choice when:
BATMAN-adv is the wrong choice when:
Most people who have studied networking have a dim view of distance-vector routing. They learned about RIP (Routing Information Protocol), its painfully slow convergence, its count-to-infinity problem, and its replacement by link-state protocols like OSPF. Distance-vector, the textbooks imply, is a historical curiosity.
Babel (RFC 8966) proves them wrong. Designed by Juliusz Chroboczek at the Université de Paris, Babel is a distance-vector protocol that solves the classic problems of the approach while remaining beautifully simple and remarkably efficient. It has become the routing protocol of choice for a growing number of community networks, and understanding it deeply is essential for anyone building alternative network infrastructure.
Babel operates on a simple principle: each node announces the destinations it can reach and the cost (metric) of reaching them. Neighbors receive these announcements, add the cost of the local link, and re-announce the destinations with the updated total cost. Through iterative propagation, every node converges on the lowest-cost path to every destination.
But Babel adds two critical mechanisms that elevate it above naive distance-vector:
Feasibility conditions prevent routing loops. Babel tracks the feasibility distance for each destination — the metric of the best route it has ever announced. A node will only accept a new route to a destination if the neighbor’s announced metric is strictly lower than the feasibility distance. This guarantees that routes always converge and loops never form, even temporarily. The concept is borrowed from EIGRP’s diffusing update algorithm, but Babel implements it more simply.
Sequence numbers handle stale information. Each route announcement carries a sequence number set by the originator. When a destination becomes unreachable, the originator increments its sequence number. Nodes still advertising the old sequence number will eventually be overridden by announcements with the new sequence number, ensuring that stale routes are purged.
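A simplified Python sketch of the feasibility check ties both mechanisms together. It omits RFC 8966's full sequence-number arithmetic and route-table machinery, keeping only the acceptance rule:

```python
# Minimal sketch of Babel's feasibility condition: accept an advertised
# route only if (a) its seqno is strictly newer, or (b) the seqno is
# equal and the advertised metric is strictly below the feasibility
# distance we have ever announced for that destination.
class FeasibilityTable:
    def __init__(self):
        self.fd = {}  # destination -> (seqno, metric) we last announced

    def announce(self, dest, seqno, metric):
        self.fd[dest] = (seqno, metric)

    def is_feasible(self, dest, seqno, metric):
        if dest not in self.fd:
            return True  # never announced: anything is feasible
        our_seqno, our_metric = self.fd[dest]
        if seqno > our_seqno:
            return True  # newer information overrides stale state
        if seqno == our_seqno:
            return metric < our_metric  # strictly better: loops impossible
        return False     # older seqno: stale, reject

t = FeasibilityTable()
t.announce("10.42.1.0/24", seqno=5, metric=300)
print(t.is_feasible("10.42.1.0/24", 5, 250))  # True: strictly lower metric
print(t.is_feasible("10.42.1.0/24", 5, 300))  # False: would risk a loop
print(t.is_feasible("10.42.1.0/24", 6, 900))  # True: newer seqno
```

The strict inequality is the whole trick: a route that merely ties your own best-ever announcement could be your own route reflected back, so it is rejected.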
Babel supports pluggable metrics, but the most commonly used are:
ETX (Expected Transmission Count): Babel nodes periodically send Hello messages and track the rate of Hello reception from each neighbor. If node A receives 80% of node B’s Hellos, and node B receives 90% of node A’s Hellos, the link ETX is $\frac{1}{0.8 \times 0.9} \approx 1.39$. The path metric is the sum of link ETXs along the path.
RTT (Round-Trip Time): Babel can timestamp Hello messages and compute round-trip delay. This is useful for latency-sensitive applications and for detecting congested links.
Channel-aware metrics: In multi-channel networks, Babel can assign different costs to different radio channels, preferring less congested frequencies.
One of Babel’s most practical strengths is native dual-stack operation. A single Babel instance routes both IPv4 and IPv6 simultaneously, carrying both address families in the same protocol messages. This is not an afterthought — it is a core design feature.
For community networks, this is invaluable. You can assign IPv6 addresses from a ULA (Unique Local Address) range for internal communication while also routing IPv4 for internet access, all with a single routing protocol instance.
The Babel reference implementation is babeld, a lightweight userspace daemon. Configuration is straightforward:
# Install on Debian/Ubuntu
sudo apt install babeld
# Basic configuration file: /etc/babeld.conf
redistribute metric 128
interface wlan0 type wireless
interface eth0 type wired
# Run babeld
babeld -c /etc/babeld.conf wlan0 eth0
The interface directive tells babeld which interfaces to use and their type (wireless interfaces use different timing parameters than wired ones). The redistribute directive controls which routes from the kernel routing table are announced into the Babel mesh.
For more advanced configurations:
# /etc/babeld.conf — community network setup
# Announce a local prefix
redistribute local ip 10.42.0.0/24 metric 128
redistribute local ip fd00::/48 metric 128
# Wireless mesh interface with tuned parameters
interface wlan-mesh type wireless hello-interval 4 update-interval 16
# Wired backbone link
interface eth0 type wired hello-interval 20 update-interval 60
# Filter: don't accept default routes from untrusted peers
in ip 0.0.0.0/0 le 0 deny
in ip ::/0 le 0 deny
The hello-interval and update-interval parameters control how frequently Babel sends Hello messages and route updates. Shorter intervals mean faster convergence but higher overhead. For stable wired links, longer intervals are appropriate; for volatile wireless links, shorter intervals help the protocol track changes.
Babel excels in several scenarios:
OLSR (Optimized Link State Routing Protocol) was one of the first routing protocols specifically designed for mobile ad-hoc networks. Defined in RFC 3626, OLSR is a proactive link-state protocol — every node maintains a complete picture of the network topology and independently computes shortest paths to all destinations.
OLSR’s defining innovation is the Multi-Point Relay (MPR) mechanism, which dramatically reduces the overhead of flooding link-state information through the network.
In a naive link-state protocol, every node rebroadcasts every topology control message it receives. In a dense network, this creates a broadcast storm — exponentially redundant copies of the same information consuming all available bandwidth.
OLSR solves this with a clever selection process. Each node examines its two-hop neighborhood (neighbors, and their neighbors) and selects a minimal subset of its one-hop neighbors as MPRs such that every two-hop neighbor is reachable through at least one MPR. Only MPR nodes rebroadcast topology control messages. Non-MPR nodes receive these messages (and use them to update their topology database) but do not retransmit them.
The MPR optimization typically reduces flooding overhead by 60-90% compared to naive flooding, making OLSR viable for networks of moderate size. However, the MPR selection process itself requires each node to know its complete two-hop neighborhood, which means HELLO messages must include lists of all one-hop neighbors.
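The greedy heuristic commonly used for MPR selection can be sketched as follows. The topology is invented, and real OLSR adds willingness values and tie-breaking rules on top of this core:

```python
# Greedy MPR selection sketch: repeatedly pick the one-hop neighbor
# that covers the most still-uncovered two-hop neighbors, until all
# two-hop neighbors are reachable through at least one MPR.
one_hop = {
    "B": {"E", "F"},   # neighbor -> the two-hop nodes it reaches
    "C": {"F", "G"},
    "D": {"G"},
}
two_hop = {"E", "F", "G"}

def select_mprs(one_hop, two_hop):
    uncovered, mprs = set(two_hop), set()
    while uncovered:
        # Choose the neighbor covering the most uncovered nodes.
        best = max(one_hop, key=lambda n: len(one_hop[n] & uncovered))
        mprs.add(best)
        uncovered -= one_hop[best]
    return mprs

print(sorted(select_mprs(one_hop, two_hop)))  # ['B', 'C'] covers E, F, G
```

Here B and C together cover all three two-hop neighbors, so D never needs to rebroadcast topology control messages at all; that pruning is where the flooding savings come from.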
OLSR uses two primary message types:
HELLO messages are sent periodically to one-hop neighbors. Each HELLO contains a list of the sender’s known neighbors and their link status (symmetric, asymmetric, or MPR). HELLOs are never forwarded — they serve only to build the local neighbor table and two-hop topology.
TC (Topology Control) messages contain the link-state information that MPR nodes flood through the network. Each TC message lists the node’s MPR selectors — the neighbors that have chosen this node as an MPR. By collecting TC messages from all MPR nodes, every node can construct a topology graph and compute shortest paths.
OLSRv2 (RFC 7181) is a complete redesign of OLSR that addresses several limitations of the original. Key improvements include support for link metrics beyond simple hop count, a modular design that separates neighborhood discovery into its own protocol (NHDP, RFC 6130), and a compact, extensible message format shared with other MANET protocols (RFC 5444).
The OLSRd daemon is the original implementation of OLSRv1. It was widely deployed in early community mesh networks, particularly in Freifunk, and supported numerous plugins including ETX link quality, DNS name resolution, and dynamic gateway selection. OLSRd is mature and well-understood but is no longer actively developed.
OLSRd2 (also called OONF, the OLSRv2 implementation from the Olsr.org Network Framework) implements the OLSRv2/NHDP standards. It is more modular, more efficient, and better suited to modern deployments. However, adoption has been slower than expected, partly because many community networks migrated to Babel or BATMAN-adv rather than upgrading from OLSRd to OLSRd2.
On OpenWrt, OLSRd can be installed and configured via UCI:
opkg update && opkg install olsrd olsrd-plugins
# /etc/config/olsrd
config olsrd
option IpVersion '4'
option LinkQualityLevel '2'
option LinkQualityAlgorithm 'etx_ff'
config Interface
list interface 'mesh'
option Mode 'mesh'
config LoadPlugin
option library 'olsrd_txtinfo'
option accept '0.0.0.0'
OLSR (particularly OLSRd with ETX) remains a reasonable choice for legacy deployments that are already running it. For new deployments, Babel generally offers better convergence, simpler configuration, and native dual-stack support. OLSR’s main advantage is its large body of operational experience and the extensive documentation from early community mesh deployments.
CJDNS is not just a routing protocol — it is an entirely new networking layer built on three radical premises:
Every connection is encrypted end-to-end. There is no unencrypted traffic on a CJDNS network. Not even routing protocol messages. Every packet is authenticated and encrypted using CryptoAuth, a protocol based on Curve25519 key exchange and Salsa20/Poly1305 authenticated encryption.
Addresses are derived from cryptographic keys. Your CJDNS IPv6 address is the first 128 bits of the double SHA-512 of your public key; keypairs are generated until the resulting address begins with fc (placing it in the IPv6 ULA range). This means your address is your identity — no one can spoof it without possessing your private key, and no authority needs to allocate it.
Routing is based on a Kademlia-style distributed hash table (DHT). Rather than flooding topology information (like OLSR) or propagating distance vectors (like Babel), CJDNS routes packets through a DHT where each node knows only about its immediate peers and a logarithmic subset of the network. This provides $O(\log n)$ routing in a network of $n$ nodes.
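The address derivation in the second premise is easy to reproduce. This sketch feeds in a throwaway byte string rather than a real Curve25519 key, so the result will almost certainly not begin with fc — real cjdns keeps grinding keypairs until it does:

```python
import hashlib

# Sketch of CJDNS address derivation: the address is the first 16
# bytes (128 bits) of SHA-512(SHA-512(public_key)), formatted as IPv6.
def cjdns_address(public_key: bytes) -> str:
    digest = hashlib.sha512(hashlib.sha512(public_key).digest()).digest()
    addr16 = digest[:16]  # truncate to 128 bits
    groups = [addr16[i:i + 2].hex() for i in range(0, 16, 2)]
    return ":".join(groups)

addr = cjdns_address(b"example-public-key")  # not a real key
print(addr, "valid" if addr.startswith("fc") else "(keep grinding keys)")
```

Because the derivation is deterministic, any node can verify that a peer's address matches its public key before trusting a single packet from it.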
CJDNS arranges the network as a tree — specifically, a spanning tree rooted at an arbitrary node. Each node has an address on this tree (a label) that encodes the path from the root. Packets are source-routed using these labels: the sender encodes the full path in the packet header, and each intermediate node reads its portion of the label to determine the next hop.
When a node does not know the label for a destination, it queries the DHT. The DHT lookup returns a path label, which the sender then embeds in the packet header. Subsequent packets to the same destination reuse the cached label until the path breaks, at which point a new DHT lookup is performed.
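A stripped-down sketch of label consumption makes the mechanism visible. Real cjdns packs variable-width hop encodings into a 64-bit label; a plain Python list of port numbers keeps the idea clear:

```python
# Sketch of label-based source routing: the sender encodes the full
# hop sequence, and each forwarding node consumes its own portion.
def forward(label):
    """Consume one hop from the front of the label.

    Returns (next_port, remaining_label) for the next node in the path.
    """
    next_port, rest = label[0], label[1:]
    return next_port, rest

label = [3, 1, 4]          # path produced by a DHT lookup (illustrative)
hops = []
while label:
    port, label = forward(label)
    hops.append(port)      # each node forwards out this local port

print(hops)  # [3, 1, 4] -- the packet traversed the encoded path
```

Note that no intermediate node ever consults a routing table for the destination; it only reads its own slice of the label, which is why per-node state stays so small.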
This design has unusual properties: intermediate nodes keep almost no per-destination routing state, the sender controls the complete path each packet takes, and a broken link anywhere along a cached label silently invalidates it until a fresh DHT lookup succeeds.
Hyperboria was the largest CJDNS deployment — a global overlay network where volunteers peered with each other over existing internet connections (and in some cases, direct wireless links). At its peak, Hyperboria had hundreds of active nodes across multiple continents.
Hyperboria demonstrated both the promise and the challenges of CJDNS. The encryption and cryptographic addressing worked flawlessly — it was genuinely impossible to spoof addresses or eavesdrop on traffic. But the DHT-based routing suffered from high latency in sparsely connected regions, and the source-routing labels became unwieldy as path lengths grew. The network was also entirely dependent on volunteer effort, and as volunteer interest waned, the network contracted.
CJDNS compiles from source on most Unix-like systems:
git clone https://github.com/cjdelisle/cjdns.git
cd cjdns && ./do
./cjdroute --genconf | sudo tee /etc/cjdroute.conf
sudo ./cjdroute < /etc/cjdroute.conf
The generated configuration includes your keypair and a randomly generated IPv6 address. To connect to other nodes, you exchange peering credentials — each peer entry includes the other node’s public key, password, and connection details:
"connectTo": {
"1.2.3.4:54321": {
"login": "peer-login",
"password": "shared-secret-here",
"publicKey": "abc123...xyz.k"
}
}
CJDNS can also auto-discover peers on the local network using Ethernet frames or multicast, making it useful for local mesh deployments as well as internet overlays.
Strengths: unbreakable encryption, self-assigning addresses, no central authority, privacy-preserving routing, works as both LAN mesh and internet overlay.
Limitations: complex codebase (written in C with its own event loop), sparse documentation, declining community, high latency in large or sparse networks, no native support for exit nodes or internet access (it is a pure overlay network).
Yggdrasil is the spiritual successor to CJDNS — or perhaps more accurately, what CJDNS would look like if redesigned from scratch with the benefit of hindsight. Created by Arceliar and neilalexander, Yggdrasil shares CJDNS’s commitment to cryptographic addressing and end-to-end encryption but takes a fundamentally different approach to routing.
Yggdrasil builds a globally-agreed spanning tree across the entire network. Each node’s position in the tree is determined by its distance (in hops) from the tree root, which is the node with the highest public key. Every node gets a tree coordinate — a sequence of port numbers describing the path from the root to that node.
Routing uses two mechanisms:
Tree routing follows the spanning tree. A packet ascends from the sender toward the root until it reaches the nearest common ancestor with the destination, then descends toward the destination. This always works but may not be optimal — the spanning tree path can be much longer than the shortest path through the network.
Source routing via DHT supplements tree routing. Yggdrasil maintains a Kademlia-style DHT that maps cryptographic addresses to tree coordinates. Once a node knows the destination’s tree coordinates, it can compute a greedy route through the tree that is shorter than the full up-then-down path. In practice, Yggdrasil’s source routing delivers near-optimal paths in most network topologies.
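The up-then-down tree path length is easy to compute from two coordinates: strip the shared prefix (the path down to the nearest common ancestor), then count the remaining up-hops and down-hops. The port sequences here are invented:

```python
# Sketch of spanning-tree distance between two Yggdrasil-style tree
# coordinates (sequences of port numbers from the root).
def tree_distance(a, b):
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1  # length of the shared path from the root
    # Ascend from a to the common ancestor, then descend to b.
    return (len(a) - common) + (len(b) - common)

node_a = (1, 4, 2)     # root -> port 1 -> port 4 -> port 2
node_b = (1, 4, 7, 3)  # shares the prefix (1, 4)
print(tree_distance(node_a, node_b))  # 1 up-hop + 2 down-hops = 3
```

This is exactly the distance that pure tree routing would pay; the DHT-supplied coordinates let a sender compute it for candidate next hops and greedily pick whichever neighbor shrinks it fastest.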
Like CJDNS, Yggdrasil derives addresses from public keys. Each node has an IPv6 address in 200::/8 (within the overall 200::/7 Yggdrasil range) computed from its public key. The address is self-assigned, globally unique, and cryptographically bound to the node’s identity.
Unlike CJDNS, Yggdrasil also derives a routable /64 subnet for each node (within 300::/8) that the node can use to route traffic for local clients. This means a Yggdrasil node can act as a router for an entire LAN, not just as an endpoint.
| Feature | CJDNS | Yggdrasil |
|---|---|---|
| Language | C | Go |
| Address range | fc00::/8 | 200::/7 |
| Routing | DHT + source routing | Spanning tree + DHT |
| Auto-peering | LAN only | LAN multicast, configurable |
| Active development | Minimal | Active |
| Documentation | Sparse | Good |
| Performance | Moderate | Good |
| Ease of setup | Complex | Simple |
| Subnet delegation | No | Yes (300::/8) |
| Community | Declining | Growing |
For new deployments, Yggdrasil is the clear choice over CJDNS. It is simpler to install, easier to configure, better documented, and more actively maintained. The Go codebase is more accessible for contributions, and the community is engaged and growing.
Yggdrasil is available in package repositories for most distributions:
# Debian/Ubuntu
sudo apt install yggdrasil
# Generate configuration
sudo yggdrasil -genconf | sudo tee /etc/yggdrasil.conf
# Start the service
sudo systemctl enable --now yggdrasil
The generated configuration includes your keypair and a tun interface. To peer with other nodes over the internet, add their addresses to the Peers section:
{
"Peers": [
"tcp://1.2.3.4:54321",
"tls://[2001:db8::1]:54321"
],
"Listen": [
"tcp://0.0.0.0:54321"
],
"MulticastInterfaces": [
{ "Regex": "wlan.*", "Beacon": true, "Listen": true }
]
}
The MulticastInterfaces setting enables automatic LAN peering — any Yggdrasil nodes on the same wireless network will discover each other and peer automatically. This makes Yggdrasil excellent for local mesh deployments: install it on a few devices, connect them to the same Wi-Fi, and they will form an encrypted mesh with no manual peering configuration.
Check your Yggdrasil address and peers:
sudo yggdrasilctl getSelf
sudo yggdrasilctl getPeers
sudo yggdrasilctl getDHT
Mesh protocols like BATMAN-adv and Babel are designed for networks where every node is a peer, the topology changes frequently, and configuration should be minimal. But not every alternative network fits this model. Some community networks grow large enough — or structured enough — that traditional routing protocols become the right tool.
When does this happen?
When a community network has a backbone. Many mature community networks evolve from a flat mesh into a hierarchical design: a high-speed backbone connecting major sites (rooftops with point-to-point links, community centers, fiber connections) with mesh or Wi-Fi access at the edges. The backbone is stable, carefully engineered, and managed by experienced volunteers. It behaves more like a traditional network than a mesh.
When a community network peers with the internet. If your community network operates as a Wireless ISP or peers at an Internet Exchange Point (IXP), you need to speak BGP (Border Gateway Protocol) — the protocol that holds the global internet together. There is no alternative; BGP is the language of inter-domain routing.
When a community network needs traffic engineering. Mesh protocols find a path, usually the best one by their metric. But they offer limited control over which path traffic takes. OSPF and BGP provide tools for traffic engineering: route weights, path attributes, route filtering, and policy-based routing that let network operators shape traffic flows deliberately.
OSPF (Open Shortest Path First) is an interior gateway protocol (IGP) that uses link-state routing to compute shortest paths within an autonomous system. OSPF is the workhorse of enterprise and ISP internal routing, and it works beautifully for the backbone of a well-structured community network.
OSPF divides the network into areas. Each area maintains its own link-state database, and routers at area boundaries (ABRs — Area Border Routers) summarize routes between areas. This hierarchical design limits the size of each router’s link-state database and reduces flooding overhead.
For a community network backbone, a simple OSPF design might place all backbone routers in a single area 0, with each edge site either attached directly to area 0 or grouped into a stub area behind its nearest backbone router.
If your community network obtains its own Autonomous System Number (ASN) and IP address space — which is entirely possible and increasingly common — you can peer with other networks at IXPs and exchange routes via BGP.
Several community networks operate as full-fledged ASNs: Guifi.net in Catalonia, Freifunk Rheinland in Germany, and NYC Mesh in New York all hold their own AS numbers and exchange routes with other networks at internet exchange points.
Obtaining an ASN requires joining a Regional Internet Registry (RIR) — RIPE NCC in Europe, ARIN in North America, APNIC in Asia-Pacific. Community networks can typically qualify for membership with a letter of support from existing members. ASN allocation is inexpensive (often free or nominal cost through sponsoring LIRs — Local Internet Registries).
BIRD is the routing daemon of choice for community networks running BGP and OSPF. It is lightweight, powerful, and widely deployed at IXPs worldwide.
A minimal BIRD configuration for a community network with OSPF internally and BGP to an upstream peer:
# /etc/bird/bird.conf
router id 10.42.0.1;
protocol kernel {
ipv4 { export all; };
scan time 20;
}
protocol device { scan time 10; }
protocol ospf v2 {
ipv4 { import all; export all; };
area 0 {
interface "eth0" { cost 10; };
interface "wg0" { cost 100; };
};
}
protocol bgp upstream {
local as 65042;
neighbor 192.0.2.1 as 64500;
ipv4 {
import filter {
if net = 0.0.0.0/0 then accept;
reject;
};
export where source = RTS_OSPF;
};
}
This configuration runs OSPF on the internal network and BGP to an upstream provider, importing a default route from the upstream and exporting OSPF-learned routes. In practice, BGP configurations grow considerably more complex with route filtering, communities, local preference, and multi-exit discriminator (MED) attributes, but the core structure remains.
If you want to learn BGP without the risk of affecting real internet routing, DN42 (Decentralized Network 42) is an invaluable resource. DN42 is a decentralized VPN network that simulates the real internet’s routing architecture. Participants operate their own ASNs (from a private range), advertise IP prefixes (from private address space), and peer with each other via VPN tunnels.
DN42 uses the same protocols, the same tools (BIRD, FRRouting), and the same operational practices as the real internet. It is the closest thing to a full-scale internet routing lab that you can build at home, and it is an active community of network enthusiasts who are happy to help newcomers.
All the protocols we have discussed so far assume some degree of control over the physical or link-layer network. BATMAN-adv needs mesh interfaces. Babel needs routable IP connectivity. OSPF needs adjacent routers. But what if you do not control the underlying network? What if your nodes are scattered across the internet, behind NATs, firewalls, and ISPs that give you a single IPv4 address and nothing more?
This is where overlay networks shine. An overlay network creates a virtual network on top of an existing one — typically the internet. Nodes connect to each other through encrypted tunnels, and a virtual network topology is maintained regardless of the underlying physical topology. From the perspective of applications, the overlay is the network.
Tinc is one of the oldest mesh VPN projects, dating back to 1998. It creates a virtual private network where every node can communicate with every other node directly, without relying on a central VPN server. Tinc automatically builds a full mesh of encrypted tunnels, handles NAT traversal, and routes traffic through intermediate nodes when direct connections are not possible.
Tinc’s design is elegant: each node has a public/private key pair, and the key exchange is handled automatically. Nodes discover each other through a graph of configured connections — you configure node A to connect to node B, and node B to connect to node C, and Tinc will automatically discover that A can reach C through B (and will establish a direct connection between A and C if possible).
# Install and configure a tinc node
sudo apt install tinc
sudo mkdir -p /etc/tinc/mynet/hosts
# Generate keys (tinc 1.1 syntax; on tinc 1.0 use: tincd -n mynet -K4096)
sudo tinc -n mynet generate-keys 4096
# Node configuration: /etc/tinc/mynet/tinc.conf
Name = node1
AddressFamily = ipv4
Interface = tun0
ConnectTo = node2
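A node also needs a `tinc-up` script to configure the virtual interface, and a host file describing itself that is shared with peers. A minimal sketch, with illustrative addresses:

```
#!/bin/sh
# /etc/tinc/mynet/tinc-up -- run when the tunnel comes up;
# tinc sets $INTERFACE to the tunnel device (addresses illustrative)
ip link set "$INTERFACE" up
ip addr add 10.42.1.1/24 dev "$INTERFACE"

# /etc/tinc/mynet/hosts/node1 -- copied to every peer;
# generate-keys appends this node's public key to this file
Address = node1.example.net
Subnet = 10.42.1.1/32
```

Each node's host file is distributed to the other nodes (by any means — git repositories are common), which is how tinc learns the graph of reachable peers.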
Tinc is stable and well-tested, but its development has been slow. Tinc 1.1 (with improved performance and a simpler protocol) has been in development for years. For new deployments, WireGuard-based solutions generally offer better performance.
WireGuard is a modern VPN protocol that has taken the networking world by storm. Its kernel-level implementation delivers dramatically better performance than userspace VPNs, and its cryptographic design is minimalist and auditable. But WireGuard itself is a point-to-point tunnel, not a mesh — it does not discover peers, build topologies, or route traffic between non-adjacent nodes.
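For reference, a single WireGuard tunnel is just a static pair of peers. A minimal point-to-point configuration sketch, where keys and addresses are placeholders:

```ini
# /etc/wireguard/wg0.conf -- one side of a point-to-point tunnel
# (keys and addresses are illustrative placeholders)
[Interface]
PrivateKey = <node-private-key>
Address = 10.42.0.1/32
ListenPort = 51820

[Peer]
PublicKey = <peer-public-key>
Endpoint = peer.example.net:51820
AllowedIPs = 10.42.0.2/32
PersistentKeepalive = 25
```

A mesh of N nodes needs N-1 such `[Peer]` entries on every node — exactly the bookkeeping the tools below automate.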
Several projects build mesh functionality on top of WireGuard:
wg-meshconf generates WireGuard configurations for fully-meshed networks. You describe your nodes and their endpoints, and wg-meshconf produces a configuration file for each node with all the necessary peer entries. It is a configuration generator, not a dynamic protocol — when nodes change, you regenerate and redistribute configurations.
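To see why a generator is all that is needed for a static full mesh, here is a hedged sketch of the core idea in Python — not wg-meshconf's actual code, and all names, keys, and endpoints are illustrative placeholders:

```python
def mesh_peers(nodes):
    """Emit the [Peer] sections each node needs for a full mesh.

    nodes maps name -> (public_key, endpoint, tunnel_ip).
    All names, keys, and endpoints here are illustrative.
    """
    configs = {}
    for name in nodes:
        stanzas = []
        for peer, (pubkey, endpoint, ip) in nodes.items():
            if peer == name:
                continue  # a node is not its own peer
            stanzas.append(
                f"[Peer]\n# {peer}\nPublicKey = {pubkey}\n"
                f"Endpoint = {endpoint}\nAllowedIPs = {ip}/32\n"
            )
        configs[name] = "\n".join(stanzas)
    return configs

configs = mesh_peers({
    "node1": ("PUB1", "198.51.100.1:51820", "10.42.0.1"),
    "node2": ("PUB2", "198.51.100.2:51820", "10.42.0.2"),
    "node3": ("PUB3", "198.51.100.3:51820", "10.42.0.3"),
})
print(configs["node1"])
```

Each node's output is appended to its own `[Interface]` section. The quadratic growth of peer entries is also why static generation stops scaling past a few dozen nodes and dynamic coordinators become attractive.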
Netmaker is a more dynamic solution. It provides a management server that coordinates WireGuard mesh networks, handling key distribution, peer discovery, and NAT traversal. Nodes run a lightweight agent that maintains the WireGuard configuration. Netmaker can be self-hosted and supports both full mesh and hub-and-spoke topologies.
Innernet (by Tonari) focuses on private overlay networks for organizations, using WireGuard tunnels coordinated by a central server. It emphasizes simplicity and security over feature richness.
ZeroTier takes a fundamentally different approach from traditional VPNs. Rather than creating tunnels between specific pairs of nodes, ZeroTier creates virtual Ethernet networks — Layer 2 overlay networks where every member appears to be on the same LAN, regardless of their physical location.
ZeroTier’s architecture includes globally distributed root servers (the "planet") that bootstrap peer discovery, optional user-operated roots ("moons"), network controllers that manage membership and configuration, and direct peer-to-peer encrypted links that carry the actual traffic whenever NAT traversal succeeds.
ZeroTier supports both the company-hosted controller (free for up to 25 devices per network) and self-hosted controllers. For alternative networks, self-hosting is essential to maintain sovereignty.
ZeroTier excels at creating geographically distributed LANs. A community network with nodes in different cities can put them all on the same ZeroTier network, and they will appear to be on the same Ethernet segment — enabling DHCP, mDNS, and other LAN protocols to work across the internet.
Tailscale is built on WireGuard and provides a managed mesh VPN with exceptional ease of use. Tailscale handles peer discovery, NAT traversal, key rotation, and access control through a coordination server. The client is open-source; the coordination server is proprietary.
Headscale is an open-source, self-hosted implementation of the Tailscale coordination server. It provides all of Tailscale’s mesh VPN functionality without depending on Tailscale’s infrastructure. For alternative networks, Headscale is the way to get Tailscale-quality mesh VPN with full sovereignty.
# Install Headscale (coordination server)
wget https://github.com/juanfont/headscale/releases/\
download/v0.23.0/headscale_0.23.0_linux_amd64.deb
sudo dpkg -i headscale_*.deb
# Create a user and pre-auth key
headscale users create community
headscale preauthkeys create --user community
# On each node — register with Headscale
tailscale up --login-server https://hs.example.com \
--authkey <pre-auth-key>
Once connected, nodes can reach each other directly via WireGuard tunnels, with Headscale coordinating the mesh but never carrying the traffic. Tailscale/Headscale also provides MagicDNS (automatic DNS for all nodes), subnet routing (advertising local LANs into the mesh), and exit nodes (routing internet traffic through specific nodes).
Overlay networks are the right choice when:

- Your nodes are scattered across the internet, behind NATs and firewalls you do not control.
- You need end-to-end encryption over infrastructure you do not trust.
- You want LAN-like or routed private connectivity between geographically distant sites.

Overlay networks are the wrong choice when:

- Independence from the underlying internet is the point — in a disaster or shutdown scenario, an overlay fails along with the infrastructure beneath it.
- You control the physical links and need maximum performance; encapsulation and encryption add overhead and latency.
With so many protocols available, choosing the right one can feel overwhelming. The decision depends on several factors: the size of your network, the type of connectivity between nodes, whether you need encryption, and how much complexity you are willing to manage.
Here is a structured approach:
Step 1: What is your physical connectivity? If your nodes share wireless links you control, you need a mesh protocol (BATMAN-adv, Babel, OLSR). If nodes can only reach each other over the internet, you need an overlay (WireGuard mesh, Yggdrasil, ZeroTier, Tailscale). If you have both, plan on layering.

Step 2: How large is your network? Up to a couple hundred nodes, BATMAN-adv is comfortable; beyond that, Babel scales further; large structured backbones call for OSPF, and internet peering for BGP.

Step 3: Do you need encryption? The classic mesh protocols provide none. If every packet must be encrypted on the wire, choose Yggdrasil, CJDNS, or a WireGuard-based overlay — or run an encryption layer on top of the mesh.

Step 4: What IP version do you need? CJDNS and Yggdrasil are IPv6-only; if you must carry native IPv4, that rules them out (or forces translation). The other protocols in the comparison table support both.
| Protocol | Layer | Type | Encryption | IPv4 | IPv6 | Scalability | Complexity | Best Use Case |
|---|---|---|---|---|---|---|---|---|
| BATMAN-adv | L2 | Mesh | None | ✓ | ✓ | ~200 nodes | Low | Small community mesh |
| Babel | L3 | Mesh | None | ✓ | ✓ | ~500+ nodes | Medium | Multi-gateway mesh |
| OLSR/OLSRv2 | L3 | Mesh | None | ✓ | ✓ | ~200 nodes | Medium | Legacy mesh networks |
| CJDNS | L3 overlay | DHT | Full | ✗ | ✓ | Large (theory) | High | Encrypted overlay |
| Yggdrasil | L3 overlay | Tree+DHT | Full | ✗ | ✓ | Large | Medium | Encrypted mesh/overlay |
| OSPF | L3 | IGP | None* | ✓ | ✓ | Large | High | Structured backbone |
| BGP | L3 | EGP | None* | ✓ | ✓ | Internet-scale | Very High | Internet peering |
| Tinc | L2/L3 | VPN mesh | Full | ✓ | ✓ | ~100 nodes | Medium | Small overlay mesh |
| WireGuard mesh | L3 | VPN mesh | Full | ✓ | ✓ | ~1000 nodes | Medium | Overlay with performance |
| ZeroTier | L2 | SDN overlay | Full | ✓ | ✓ | Large | Low | Distributed virtual LAN |
| Tailscale | L3 | VPN mesh | Full | ✓ | ✓ | Large | Very Low | Managed overlay mesh |
*OSPF and BGP can use IPsec or run over encrypted tunnels for security.
Neighborhood mesh for a dozen houses: BATMAN-adv on OpenWrt routers. Simple, robust, well-documented. Add WireGuard tunnels to a VPS if you need internet access through the mesh.
Rural community network spanning a valley: Babel on OpenWrt routers with directional antennas. Babel’s multi-gateway support handles multiple internet uplinks gracefully, and its scalability handles growth.
City-wide community ISP: Babel or OSPF for the backbone, BATMAN-adv for local access segments, BGP for internet peering. This hybrid architecture mirrors how mature networks like Guifi.net and Freifunk operate.
Privacy-focused encrypted network: Yggdrasil for an all-encrypted mesh. Nodes auto-discover on LAN and can peer over the internet. All traffic is encrypted, addresses are self-assigned, and no central authority is needed.
Connecting remote sites over the internet: Tailscale with Headscale, or WireGuard mesh with Netmaker. These provide encrypted connectivity with minimal configuration and excellent NAT traversal.
Disaster preparedness network: BATMAN-adv for wireless mesh (works with no internet), with Meshtastic/LoRa for long-range text fallback. Pre-configure everything so it works when you power it on during an emergency.
Learning and experimentation: DN42 for BGP/OSPF practice, Yggdrasil for encrypted overlay experimentation, BATMAN-adv for local mesh testing. All three can be run on virtual machines or containers for zero-cost experimentation.
Reading about routing algorithms is one thing. Watching them converge — seeing distance vectors propagate, routes update, and the network settle into a stable state — is something else entirely. Let us build a simple distance-vector routing simulator in Python that demonstrates the core concepts we have been discussing.
This simulator models a network of nodes connected by weighted links. Each node maintains a distance table (distance to every destination via each neighbor), and iteratively exchanges distance vectors with its neighbors until convergence. This is the same fundamental algorithm that powers Babel, RIP, and every other distance-vector protocol.
class Node:
    """A node in a distance-vector routing network."""

    def __init__(self, name):
        self.name = name
        self.neighbors = {}          # name -> (Node, link_cost)
        self.dist = {name: 0}        # dest -> best_cost
        self.nexthop = {name: name}  # dest -> next_hop

    def add_link(self, neighbor, cost):
        self.neighbors[neighbor.name] = (neighbor, cost)

    def get_distance_vector(self):
        return dict(self.dist)
Each node starts knowing only how to reach itself (cost 0). The add_link method registers a neighbor with its link cost, and get_distance_vector returns the node’s current view of the network.
The update step is where the Bellman-Ford magic happens. Each node examines the distance vectors received from its neighbors, and for each destination, computes the cost of reaching it through each neighbor. If a shorter path is found, the routing table is updated:
    def update(self):
        """Update routes from neighbor distance vectors.
        Returns True if any route changed."""
        changed = False
        for nb_name, (nb_node, link_cost) in self.neighbors.items():
            for dest, dest_cost in nb_node.get_distance_vector().items():
                new_cost = link_cost + dest_cost
                if dest not in self.dist or new_cost < self.dist[dest]:
                    self.dist[dest] = new_cost
                    self.nexthop[dest] = nb_name
                    changed = True
        return changed
Now we build a network and run the simulation:
def simulate():
    # Create nodes
    nodes = {n: Node(n) for n in "ABCDEF"}

    # Define topology: (node1, node2, cost)
    links = [
        ("A","B",1), ("A","C",4), ("B","C",2),
        ("B","D",7), ("C","D",3), ("C","E",5),
        ("D","F",1), ("E","F",2),
    ]
    for n1, n2, cost in links:
        nodes[n1].add_link(nodes[n2], cost)
        nodes[n2].add_link(nodes[n1], cost)

    # Run iterations until convergence
    for i in range(1, 20):
        changed = False
        for node in nodes.values():
            if node.update():
                changed = True
        print(f"Round {i}: {'updated' if changed else 'converged'}")
        if not changed:
            break

    # Print final routing tables
    for name in sorted(nodes):
        n = nodes[name]
        print(f"\n{name}'s routing table:")
        for dest in sorted(n.dist):
            print(f" -> {dest}: cost={n.dist[dest]}, "
                  f"via {n.nexthop[dest]}")

simulate()
Running this simulation produces output like:
Round 1: updated
Round 2: updated
Round 3: updated
Round 4: converged
A's routing table:
-> A: cost=0, via A
-> B: cost=1, via B
-> C: cost=3, via B
-> D: cost=6, via B
-> E: cost=8, via B
-> F: cost=7, via B
Notice that node A reaches node C through B (cost 1 + 2 = 3) rather than directly (cost 4), because the two-hop path is cheaper. This is exactly the kind of non-obvious routing decision that makes distance-vector protocols valuable — and it mirrors how Babel chooses paths in a real mesh network based on link quality metrics.
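For contrast, a link-state protocol such as OSPF reaches the same answer by a different route: every node learns the full topology and runs Dijkstra's algorithm locally. A minimal sketch over the same illustrative topology:

```python
import heapq

def dijkstra(links, source):
    """Shortest-path costs from source over an undirected weighted graph."""
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nb, cost in graph[node]:
            nd = d + cost
            if nd < dist.get(nb, float("inf")):
                dist[nb] = nd
                heapq.heappush(heap, (nd, nb))
    return dist

# Same topology as the distance-vector simulator
links = [
    ("A","B",1), ("A","C",4), ("B","C",2),
    ("B","D",7), ("C","D",3), ("C","E",5),
    ("D","F",1), ("E","F",2),
]
costs = dijkstra(links, "A")
# Costs agree with the converged distance-vector tables:
# A=0, B=1, C=3, D=6, E=8, F=7
```

Both algorithms find the same shortest paths; the trade-off is in how they get there — Dijkstra needs the whole topology flooded to every node, while distance-vector needs only neighbor exchanges.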
The real power of routing protocols is their ability to adapt when things break. Let us extend the simulator to handle link failures:
def simulate_failure():
    nodes = {n: Node(n) for n in "ABCDEF"}
    links = [
        ("A","B",1), ("A","C",4), ("B","C",2),
        ("B","D",7), ("C","D",3), ("C","E",5),
        ("D","F",1), ("E","F",2),
    ]
    for n1, n2, cost in links:
        nodes[n1].add_link(nodes[n2], cost)
        nodes[n2].add_link(nodes[n1], cost)

    def converge():
        for _ in range(20):
            # Force every node to update each round (a bare any()
            # would short-circuit after the first change).
            if not any([n.update() for n in nodes.values()]):
                break

    def trace(src, dst):
        hop, path = src, [src]
        while hop != dst:
            hop = nodes[hop].nexthop[dst]
            path.append(hop)
        return " -> ".join(path)

    converge()
    print(f"Before failure: {trace('A','F')} "
          f"(cost {nodes['A'].dist['F']})")

    # Fail the D-F link. The toy update rule only ever lowers costs,
    # so after a topology change we rebuild routing state from
    # scratch and let the network reconverge.
    del nodes["D"].neighbors["F"]
    del nodes["F"].neighbors["D"]
    for n in nodes.values():
        n.dist = {n.name: 0}
        n.nexthop = {n.name: n.name}
    converge()
    print(f"After failure:  {trace('A','F')} "
          f"(cost {nodes['A'].dist['F']})")
This traces the path from A to F through the network, showing each hop. When we remove a link (simulating a node going offline or a wireless link degrading), the protocol reconverges and finds an alternative path — demonstrating the self-healing property that makes mesh routing protocols essential for alternative networks.
This toy simulator captures the essence of distance-vector routing: local information exchange leading to global convergence. Real protocols like Babel add feasibility conditions (to prevent loops), sequence numbers (to handle stale routes), and sophisticated metrics (to measure link quality). But the core algorithm — each node shares what it knows, and every node independently computes the best paths — is exactly what you see in this simulator.
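As a taste of what such a feasibility condition looks like, here is a simplified sketch of Babel's rule (RFC 8966): an update is accepted only if it is strictly newer than, or strictly better than, the feasibility distance already recorded. Real Babel compares sequence numbers modulo 2**16; this sketch ignores wraparound.

```python
def is_feasible(update_seqno, update_metric, fd_seqno, fd_metric):
    """Simplified Babel feasibility check.

    (fd_seqno, fd_metric) is the feasibility distance recorded for a
    destination. An update is feasible if it is strictly newer, or no
    worse at the same sequence number. Infeasible updates are ignored,
    which is what prevents routing loops.
    """
    return (update_seqno > fd_seqno or
            (update_seqno == fd_seqno and update_metric < fd_metric))
```

When no feasible route remains after a failure, Babel requests a new sequence number from the destination rather than accepting a potentially looping route — a refinement the toy simulator sidesteps by resetting state.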
The key insights:

- No node ever sees the whole topology; each one exchanges information only with its direct neighbors.
- Convergence takes a number of rounds roughly proportional to the network diameter, because information propagates one hop per exchange.
- The best path is often not the most direct one — cheap multi-hop routes beat expensive direct links.
- Topology changes must propagate the same way, which is why real protocols invest so heavily in fast, loop-free reconvergence.
In practice, most successful alternative networks do not use a single routing protocol — they layer multiple protocols, each handling a different scope or function. A mature community network might look like this:
Local mesh layer: BATMAN-adv on OpenWrt routers provides Layer 2 mesh within each neighborhood or building cluster. Clients connect via Wi-Fi and get DHCP addresses from a local server. This layer is entirely autonomous — it works even if the wider network is down.
Inter-site routing layer: Babel or OSPF runs on backbone nodes (rooftop installations with point-to-point links) connecting the mesh clusters. This layer aggregates the local meshes into a single routed network. Each BATMAN-adv segment appears as a subnet to the Babel/OSPF layer.
Internet peering layer: BGP (via BIRD) handles peering with upstream ISPs or at an Internet Exchange Point. Default routes from BGP are redistributed into the interior routing protocol, so every node in the mesh can reach the internet.
Encrypted overlay layer: Yggdrasil or WireGuard/Headscale connects nodes that are not physically adjacent — volunteer nodes in other cities, cloud servers providing services, or remote members who contribute resources but are not physically present. This layer provides encrypted connectivity independent of the physical mesh.
Application layer: Services like DNS, Matrix (chat), IPFS (file sharing), and web hosting run on nodes within the network, reachable via all layers.
This architecture is not theoretical — it describes, in simplified form, how networks like Freifunk, Guifi.net, and NYC Mesh actually operate. Each layer handles what it does best, and the interfaces between layers are clean and well-defined.
Deploying a routing protocol is not a one-time event. Like any infrastructure, it requires ongoing attention:
Deploy and test in a small environment first. Set up three nodes on a table, verify convergence, break links intentionally, and observe recovery. Do not deploy on rooftops until you have tested on tabletops.
Monitor continuously. Every protocol provides status commands (batctl originators, birdc show route, yggdrasilctl getPeers). Set up automated monitoring that alerts you when routes change, paths lengthen unexpectedly, or nodes become unreachable.
Tune parameters gradually. Do not change ten settings at once. Adjust one parameter (OGM interval, Hello interval, metric weights), observe the effect for at least a day, and then decide whether to keep the change.
Document everything. When you figure out why the mesh was routing traffic through a three-hop path instead of a one-hop path, write it down. Future you — or the next person maintaining the network — will thank you.
Plan for growth. The protocol that works perfectly for 10 nodes may struggle at 100. Know the scaling limits of your chosen protocol and have a migration plan. BATMAN-adv meshes that outgrow their protocol can be segmented and interconnected with Babel. OLSR networks can migrate to Babel. Babel networks can add OSPF/BGP backbone layers.
The routing landscape continues to evolve. Several trends are worth watching:
Yggdrasil’s maturation. As Yggdrasil approaches version 1.0, it is becoming a serious contender for encrypted mesh routing in community networks. Its combination of automatic topology discovery, end-to-end encryption, and reasonable performance addresses many needs that previously required multiple protocols.
WireGuard everywhere. WireGuard’s inclusion in the Linux kernel and its availability on every major platform means that encrypted tunnels are becoming a commodity. Tools like Headscale, Netmaker, and Nebula are making WireGuard mesh networks as easy to deploy as centralized VPNs.
Babel’s continued refinement. The Babel protocol continues to improve, with work on better metrics, faster convergence, and integration with segment routing. Its RFC standardization (RFC 8966) gives it legitimacy that helps community networks when dealing with regulators and institutions.
Intent-based networking. Research into protocols that route based on application intent (latency for voice, throughput for file transfers, reliability for control systems) could dramatically improve the quality of service on alternative networks, which currently treat all traffic identically.
The fundamental challenge of routing — finding good paths through dynamic networks with limited information — will never be fully “solved.” But the tools available today are remarkably capable, and they continue to improve. The alternative network builder’s challenge is not finding a protocol that works — it is choosing among many excellent options.
What matters most is not which protocol you choose, but that you understand why you chose it. A network built with intention — where every layer, every protocol, and every configuration parameter reflects a conscious decision — is a network that will serve its community well.