Veth router mode - why no src in the static route?

sloshdots · November 19, 2021, 6:56pm

I run lxc on a Gentoo host and created a container (Alpine Linux) with the following network setup:

lxc.net.0.type = veth
lxc.net.0.veth.mode = router
lxc.net.0.link = vl291
lxc.net.0.veth.pair = vl291get
lxc.net.0.name = eth0
lxc.net.0.flags = up
lxc.net.0.hwaddr = 02:b6:aa:9b:87:29
lxc.net.0.ipv4.address = 192.168.191.4/32
lxc.net.0.ipv4.gateway = auto
lxc.net.0.l2proxy = 1

The host is connected to two networks – ip -br a s up (after container start):

lo               UNKNOWN        127.0.0.1/8 ::1/128 
vl191            UP             xxx.yyy.zzz.3/24 fe80::6e2b:59ff:feb0:2534/64 
vl291            UP             192.168.191.3/24 fe80::6e2b:59ff:feb0:2535/64 
vl291get@if2     UP             fe80::fc91:d2ff:fed6:1400/64

(xxx.yyy.zzz.3 is a public IP address.) As expected, LXC creates a static route to the container's IP address, as shown by ip r:

default via xxx.yyy.zzz.1 dev vl191 metric 2 
xxx.yyy.zzz.0/24 dev vl191 proto kernel scope link src xxx.yyy.zzz.3 
192.168.191.0/24 dev vl291 proto kernel scope link src 192.168.191.3 
192.168.191.4 dev vl291get scope link

However, the static route is missing a src to set the source address – and with my setup ip route get 192.168.191.4 (from host) outputs:

192.168.191.4 dev vl291get src xxx.yyy.zzz.3 uid 0
    cache

So when the host communicates with the container it will use the IP address of the other interface (i.e. not the IP address of the interface set by lxc.net.0.link).
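
A quick way to check this from a script is to pull the selected source address out of the ip route get output. The following is only a sketch, assuming a POSIX shell with awk available; route_src is a hypothetical helper name, not part of iproute2 or LXC.

```shell
#!/bin/sh
# route_src: hypothetical helper. Reads `ip route get` output on stdin
# and prints the address that follows the first "src" keyword.
route_src() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}
```

For the output above, ip route get 192.168.191.4 | route_src would print xxx.yyy.zzz.3.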

In my case this makes the connection between host and container non-functional – because I also use policy-based routing with the following rules (ip rule):

0:	from all lookup local
32764:	from 192.168.191.3 lookup vl291 realms vl291/vl291
32765:	from xxx.yyy.zzz.3 lookup vl191 realms vl191/vl191
32766:	from all lookup main
32767:	from all lookup default
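
To see which source-specific rule a given address matches, the rule list above can be filtered on its from selector. This is a rough sketch only: rule_table_for is a hypothetical helper, and it deliberately ignores from all rules, prefixes, and every other selector the kernel evaluates.

```shell
#!/bin/sh
# rule_table_for: hypothetical helper. $1 is a source address; stdin is
# `ip rule` output. Prints the table of the first rule whose "from"
# selector is exactly that address (no prefix or "from all" matching).
rule_table_for() {
    awk -v src="$1" '$2 == "from" && $3 == src && $4 == "lookup" { print $5; exit }'
}
```

With the rules above, ip rule | rule_table_for xxx.yyy.zzz.3 would print vl191.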

So, the result is that packets from the host to the container will enter into the wrong table, vl191 (instead of vl291). To fix this I ran:

ip route change 192.168.191.4 dev vl291get scope link src 192.168.191.3

and now:

$ ip r g 192.168.191.4
192.168.191.4 dev vl291get src 192.168.191.3 uid 0 
    cache

To be complete, I also had to add a static route in table vl291 in order to get my setup working (similar to the solution in #7152). Before this change table vl291 looked like this:

$ ip route show table vl291
default via 192.168.191.1 dev vl291 metric 3 
192.168.191.0/24 dev vl291 scope link src 192.168.191.3 metric 3

i.e., all packets would go to interface vl291 (including those to 192.168.191.4) – which can also be verified by running:

$ ip r g 192.168.191.4 from 192.168.191.3
192.168.191.4 from 192.168.191.3 dev vl291 table vl291 realms vl291/vl291 uid 0 
    cache

To fix this I added the following static route in table vl291:

ip route add table vl291 192.168.191.4 dev vl291get src 192.168.191.3

In order to permanently solve this issue I run the above two ip-route commands in lxc.hook.start-host. There are probably several other, perhaps better, ways to solve it… – but my main question is whether the lack of a src address in the static route (added by LXC) is intentional. I would rather expect it to be set to the IP address of the interface defined by lxc.net.0.link.
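
A minimal sketch of such a start-host hook, assuming the addresses, interface and table names from this post are simply hard-coded. fix_container_routes and the IP_CMD override are hypothetical names introduced here for illustration; setting IP_CMD="echo ip" prints the commands instead of running them.

```shell
#!/bin/sh
# Values taken from the setup described above.
CT_IP=192.168.191.4      # container address
VETH=vl291get            # host-side veth (lxc.net.0.veth.pair)
SRC_IP=192.168.191.3     # address of the parent interface vl291
TABLE=vl291              # policy-routing table for that source address

# Assumption: IP_CMD can be overridden, e.g. IP_CMD="echo ip" for a dry run.
IP_CMD=${IP_CMD:-ip}

fix_container_routes() {
    # Pin the source address on the route LXC created in the main table.
    $IP_CMD route change "$CT_IP" dev "$VETH" scope link src "$SRC_IP" || return 1
    # Add the matching host route to the policy-routing table.
    $IP_CMD route add table "$TABLE" "$CT_IP" dev "$VETH" src "$SRC_IP"
}
```

Saved as an executable script with a final line that calls fix_container_routes, this could be referenced from the container config through lxc.hook.start-host.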

tomp · November 22, 2021, 9:23am

It’s not intentional per se, but always adding the source address of the linked parent interface wouldn’t necessarily be right either.

However, what may be more appropriate is to set the source address of the inbound route to the same address as is used for the container’s gateway. This way when the host communicates with the container it will appear to be coming from the gateway (which it technically is).

In this case you have lxc.net.0.ipv4.gateway = auto, which uses the linked parent interface’s IP as the gateway, so the source address of the inbound route would then be set to that same IP.

In LXD we always add an IP address to the host-side interface of the veth pair (for IPv4 it’s the link-local 169.254.0.1) and use this as the gateway, so host-originated traffic to the container always uses that IP as its source address. So this issue doesn’t come up AFAIK.

It also has the benefit that the IP address of the host’s external interface doesn’t matter.

In LXC I use a start up hook like this to get a similar behaviour:

/usr/share/lxc/hooks/lxc-router-up

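#!/bin/sh
# LXC_NET_PEER is the host-side veth name; bail out if it is missing.
if [ -z "${LXC_NET_PEER}" ]
then
        echo "LXC_NET_PEER not set"
        exit 1
fi

# Disable IPv6 autoconfiguration, router advertisements and duplicate
# address detection on the host-side veth.
sysctl net.ipv6.conf."${LXC_NET_PEER}".autoconf=0
sysctl net.ipv6.conf."${LXC_NET_PEER}".accept_dad=0
sysctl net.ipv6.conf."${LXC_NET_PEER}".accept_ra=0
sysctl net.ipv6.conf."${LXC_NET_PEER}".dad_transmits=0
# Do not auto-generate a link-local address.
sysctl net.ipv6.conf."${LXC_NET_PEER}".addr_gen_mode=1
# Drop any existing link-scope addresses, then add the fixed gateway
# addresses that the container config points at.
ip a flush local dev "${LXC_NET_PEER}" scope link
ip a add fe80::1/64 dev "${LXC_NET_PEER}"
ip a add 169.254.0.1 dev "${LXC_NET_PEER}"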

And a container NIC config like this:

lxc.net.0.type = veth
lxc.net.0.veth.mode = router
lxc.net.0.l2proxy = 1
lxc.net.0.link = eth1
lxc.net.0.flags = up
lxc.net.0.name = eth0
lxc.net.0.script.up = /usr/share/lxc/hooks/lxc-router-up
lxc.net.0.ipv4.gateway = 169.254.0.1
lxc.net.0.ipv6.gateway = fe80::1
lxc.net.0.ipv4.address = 192.168.1.15/32 0.0.0.0
lxc.net.0.ipv6.address = 2a02:nnnn:nnnn:1::15/128

You could open an issue over at https://github.com/lxc/lxc/issues to request that we set the route’s source address based on the gateway setting.

sloshdots · November 26, 2021, 11:33am

Thanks for your answer and the example – it helped me understand parts of the LXD example on this. I see how it could be useful to give the parent interface an IP address in some situations (as in your example). Since I am still experimenting with how to set this up, it would be interesting to hear if you see any obvious disadvantages of not doing that (as I did above)?

I opened #4037 about adding a source address to the static route.

tomp · November 26, 2021, 11:37am

Well, in your example the gateway inside the container maps to one of the host’s IPs, so it should be fine.
However, I found when adding the routed NIC type to LXD that the guest OS would periodically send ICMPv6 packets to the gateway IP for liveness detection. If only proxy NDP was used, without the gateway IP actually responding, this caused periodic packet loss until NDP resolution occurred again.