Device `ipv4.routes`: route on host has no next hop

(Incus 6.23 here).

Let me explain my issue by means of an example. I have a container named “cndo” sitting on the managed bridge incusbr0. The host has address 10.11.12.1/24 on incusbr0, and the container’s eth0 has picked up address 10.11.12.58.

Now I want to static-route a subnet of IPs to that container (because the container itself acts as a router and has an additional subnet inside it). So I add an ipv4.routes attribute to its NIC:

incus config device override cndo eth0 ipv4.routes=192.168.100.0/24

What I find is that a route has been created on the host, but with no next-hop:

nsrc@brian-kit:~$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
...
192.168.100.0   0.0.0.0         255.255.255.0   U         0 0          0 incusbr0

Therefore, traffic sent to 192.168.100.x isn’t forwarded to the container. Instead, the host sends ARP broadcasts on incusbr0 as if the subnet were directly connected.

To demonstrate: if I ping 192.168.100.123 on the host and run tcpdump in another window, I see:

nsrc@brian-kit:~$ sudo tcpdump -i incusbr0 -nn arp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on incusbr0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:25:13.901589 ARP, Request who-has 192.168.100.123 tell 10.11.12.1, length 28
20:25:14.920588 ARP, Request who-has 192.168.100.123 tell 10.11.12.1, length 28
20:25:15.944550 ARP, Request who-has 192.168.100.123 tell 10.11.12.1, length 28

What I was expecting was that the route added by incus on the host would have 10.11.12.58 (the container’s IP address) as the next-hop.

# Expected:
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.100.0   10.11.12.58     255.255.255.0   UG        0 0          0 incusbr0

The documentation says of the bridged NIC ipv4.routes option:

Comma-delimited list of IPv4 static routes to add on host to NIC [my emphasis]

In order to send traffic for this subnet “to [this particular] NIC”, the route needs a next-hop of the NIC’s IP address.

Depending on how this is intended to work, I’d like to raise either a feature request (for routing a subnet to a container) or a bug report.

The thing is, I can’t think of any way in which the current behaviour could ever be used. It’s not even useful as a way to send traffic to a container on incusbr0 which has configured a 192.168.100.x address on its eth0 interface, because the container has no way to respond: the host itself has no IP on 192.168.100. (Note that the ARP broadcasts shown by tcpdump above have a source IP which is not within the 192.168.100 subnet!)

So to me, it seems like a bug. But I thought I’d raise it here for discussion first.

Thanks,

Brian.

P.S. There is a separate ipv4.routes option that you can apply to the bridge itself, as opposed to the container NIC. As far as I can see, this has the same problem: with no next-hop, and no IP address on the host belonging to the subnet, there doesn’t seem to be any way in which it can be useful as it stands.

In this case there is no implicit next-hop (whereas there is when a route is attached to a container NIC), so I’d expect the next-hop to be specified as part of the setting, but that doesn’t seem possible:

nsrc@brian-kit:~$ incus network set incusbr0 ipv4.routes '192.168.101/24 10.11.12.58'
Error: Invalid value for network "incusbr0" option "ipv4.routes": Item "192.168.101/24 10.11.12.58": invalid CIDR address: 192.168.101/24 10.11.12.58
nsrc@brian-kit:~$ incus network set incusbr0 ipv4.routes '192.168.101/24 via 10.11.12.58'
Error: Invalid value for network "incusbr0" option "ipv4.routes": Item "192.168.101/24 via 10.11.12.58": invalid CIDR address: 192.168.101/24 via 10.11.12.58

That would be a documentation bug; the behavior you’re seeing is the expected one.

Basically, Incus can’t reliably know what IP address the instance will have, or even its MAC address: outside of things like a static DHCP lease combined with the security features (mac_filtering, ipv4_filtering, ipv6_filtering), the container can pretty freely alter all of those.

You can obviously query the actual information, which is what we show in incus list, but that’s retrieved on demand by Incus and isn’t data that’s otherwise kept up to date in the background.

(Note that OVN is likely different in that regard: with OVN, IP addresses are assigned to specific logical switch ports, static routes are then tied to the logical switch port too, and so a next-hop can be derived that way. That’s all OVN-internal though; with OVN, the host’s routing table is never touched.)

That is true. However, if a NIC has an ipv4.address configured, it does seem reasonable to use that as the next-hop, as it’s the address incus expects a well-behaved container to use. (And if the container chooses a different address, then hard luck: it won’t receive the traffic.)

Can you describe a scenario where the current behaviour is useful to deploy? I can’t think of one. I am just wondering why the ipv4.routes feature is there, either on NIC or on bridge.

(If you were able to set, say, ipv4.address.secondary = 192.168.100.1/24 on the bridge, then I could see how it might be used; as it is, the feature only seems to allow one-way communication from the host to the container network.)

Anyway, back to what I’m trying to do. What I want is a route on the host, 192.168.100.0/24 via 10.11.12.58, added to the host’s routing table when incusbr0 is brought up. I can’t do this in netplan, because it’s incus that creates the bridge, and I can’t see any hook script which incus calls when the bridge comes up. I’d rather not run a dynamic routing protocol or OVN just for this simple requirement. Is there an option I’ve missed?

Here’s an outline feature proposal (described from the point of view of v4, but same applies to v6)

  • ipv4.routes takes an optional extra gateway parameter separated by a space, e.g. 192.168.0.0/24 1.2.3.4,172.16.0.0/24 5.6.7.8
  • If not specified, the default is 0.0.0.0, which makes it fully backwards compatible
  • (Optional extra, for convenience) the special value * for the gateway means “the ipv4.address configured on this NIC (if any)”

I’ve had a quick look at the code. type Route (internal/server/ip/route.go) already has a Via net.IP member, so AFAICS it just needs plumbing into func networkNICRouteAdd (internal/server/device/device_utils_network.go) and func bridge.setup (internal/server/network/driver_bridge.go).

I found an ugly workaround: networkd-dispatcher notices when the incusbr0 bridge has been created.

==> /etc/networkd-dispatcher/routable.d/50-cndo <==
#!/bin/sh
[ "$IFACE" = "incusbr0" ] && ip route add 192.168.100.0/24 via 10.11.12.58

stgraber@castiana:~$ incus launch images:debian/13 d13
Launching d13
stgraber@castiana:~$ incus config device override d13 eth0 ipv4.routes=1.2.3.4/32
Device eth0 overridden for d13
stgraber@castiana:~$ incus exec d13 -- ip -4 a add dev eth0 1.2.3.4/32
stgraber@castiana:~$ ping 1.2.3.4
PING 1.2.3.4 (1.2.3.4) 56(84) bytes of data.
64 bytes from 1.2.3.4: icmp_seq=1 ttl=64 time=0.072 ms
64 bytes from 1.2.3.4: icmp_seq=2 ttl=64 time=0.068 ms
^C
--- 1.2.3.4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1062ms
rtt min/avg/max/mdev = 0.068/0.070/0.072/0.002 ms
stgraber@castiana:~$ 

In your case, enabling proxy_arp should get the container to respond to ARP requests for those addresses and then forward the traffic onwards, without needing a next-hop set.
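For reference, a sketch of what that could look like inside the container, assuming its bridge-facing interface is eth0 (the interface name and file path here are assumptions, adjust to taste):

```
# Inside the container, e.g. /etc/sysctl.d/99-router.conf
net.ipv4.conf.eth0.proxy_arp = 1
net.ipv4.ip_forward = 1
```

With proxy_arp enabled, the container answers the host’s who-has queries for 192.168.100.x itself, so the next-hop-less route on incusbr0 becomes usable as-is.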

I don’t love the configurable next-hop, as validation may get fun: we’d want to make sure that the next-hop is on the bridge, but then that means bridge subnet changes would need to go and re-validate all the ipv4.routes entries, … It gets messy and slow pretty quickly.

But having the next-hop be set to the relevant ipv4.address or ipv6.address if present, that should be pretty safe and easy to do. Can you file a feature request for that?

Done: NIC ipv[46].routes: use ipv[46].address as next-hop · Issue #3156 · lxc/incus · GitHub