"routed" nictype not forwarding DHCP requests / cannot use dynamic address for containers using routed nictype

Darcache · January 5, 2021, 12:26pm

Hello,

Thanks to the feedback from this community (thanks @Tomp), I tried to use the “routed” nictype to solve the following use case.

A baremetal server has 2 network interfaces:

eth0 with a dynamic IP address provided by the company hosting the server. This enables containers to access the internet through lxdbr0. Containers receive a 192.168.xx.xx IP address on their own eth0 through LXD's internal dnsmasq (I think). This is pretty standard and works correctly.
tap0 is a peervpn managed interface providing access to an overlay network to which other containers are connected, both locally and remotely (hosted on other baremetal servers). A DHCP server is active on the overlay network to provide a dynamic IP address to any computer/container connected to it.

Each container managed by LXD on this server therefore has 2 interfaces:

eth0 connected to the LXD bridge, receiving a dynamic IP address. This works flawlessly.
eth1 connected to the overlay network, receiving a dynamic IP address from the DHCP server on the overlay network. It is important to note that the DHCP server is located on another baremetal server.

The previous setup was based on a bridge interface on top of tap0, which provided all containers seamless access to the overlay network. With the deprecation of IP aliasing and bridge-utils, this solution does not work for recent Debian/Ubuntu versions.
Based on previous feedback (Unable to create Slack/Nebula bridge to use as overlay network for LXD containers) we tried skipping the creation of a bridge interface on top of tap0 (peervpn) and adding a “routed” nictype NIC to containers to give eth1 access to tap0 through the following command:

lxc config device add mycontainer eth1 nic nictype=routed parent=tap0

This does not work: eth1 within the container does not receive an IP address from the overlay network DHCP server, nor can it ping the local tap0 IP address. The DHCP server does not “see” the DHCP request from the container.
Even manually setting the eth1 IP address in the container after starting it won’t help “see” the overlay network.

However, it works when we specify an IP address while adding the eth1 to the container AND manually set the route to the overlay network:

lxc config device add mycontainer eth1 nic nictype=routed parent=tap0 ipv4.address=$overlay_network_static_ip
(in the container) route add -net etc…

In this case the container can see the overlay network, resolve internal hostnames, ping the tap0 IP address, and ping any IP accessible on the VPN. Unfortunately, if we manually change the container IP address after boot, or if we try to claim a dynamic IP address, access to the VPN is lost, as before. The IP address within the container must be the same as the one declared when adding the NIC.

The problem is that we cannot use static IP addresses, so the solution of manually setting each container IP address is not sustainable.

Is it possible through “routed” nictype to send/receive/forward DHCP requests, and if not which nictype would be able to do so?

As mentioned earlier the current setup, based on deprecated Debian/Ubuntu versions, relied on the creation of a br0 bridge to allow all containers to connect to tap0 seamlessly, effectively hiding the fact that tap0 is an overlay VPN shared across multiple servers.

Below is the setup, hopefully the format will be readable (edit: no it’s not, linked to image instead).

Thanks for your help,
Best (and happy new year)

tomp · January 5, 2021, 1:45pm

What you describe is how the routed NIC type is designed to work, namely:

Isolates containers into their own layer 2 domain (so broadcast packets like DHCP are not sent to other instances).
Prevents IP spoofing by only allowing the externally specified IP to be routed to the instance NIC.

I’m confused why you can’t connect the tap device to another bridge and then perform DHCP on it inside each container (with the proviso that the overlay DHCP server isn’t providing a default route that would cause issues with the one being provided by the DHCP server on lxdbr0).

In fact LXD has the ability to add external interfaces to an LXD managed bridge, e.g.:

lxc network create lxdbr1 \
    bridge.external_interfaces=tap0  \
    ipv4.address=none ipv4.dhcp=false \
    ipv6.address=none ipv6.dhcp=false

This would create a new bridge without an IP address or DHCP service, and then connect the tap0 interface to it, so that DHCP requests from the containers would be sent onto the tap0 interface.
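One way to confirm that the bridge really picked up the tap device is to look for the port entry the kernel exposes under the bridge's `brif` directory in sysfs. The sketch below assumes the interface names used in this thread (`tap0`, `lxdbr1`); the `is_bridge_member` helper and the `SYS_NET` override are made up for illustration and are not part of LXD.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if interface $1 is a port of bridge $2.
# The kernel lists each bridge port as /sys/class/net/<bridge>/brif/<port>.
# SYS_NET is an illustrative override so the check can be pointed elsewhere.
is_bridge_member() {
    [ -e "${SYS_NET:-/sys/class/net}/$2/brif/$1" ]
}

# Report whether tap0 is attached to lxdbr1 (names taken from the thread).
if is_bridge_member tap0 lxdbr1; then
    echo "tap0 is attached to lxdbr1"
else
    echo "tap0 is NOT attached to lxdbr1"
fi
```

Running `ip link show master lxdbr1` (iproute2) should give the same information interactively.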

I’ve not tried it, but you could also try using the macvlan NIC type connected to the tap0 interface, e.g.

lxc config device add <instance> eth0 nic nictype=macvlan parent=tap0

This would connect your containers to the tap0 interface at layer 2 without an intermediate bridge.
The possible drawback to this approach (assuming it works) is that macvlan NICs prevent communication with the host, whereas using an intermediate bridge you could replace ipv4.address=none with ipv4.address=<static IP on overlay network> and the containers could also contact the host if needed.

Darcache · January 5, 2021, 5:52pm

Hello Tom, savior of the day, again

tomp:

In fact LXD has the ability to add external interfaces to an LXD managed bridge, e.g.:
lxc network create lxdbr1 \
    bridge.external_interfaces=tap0  \
    ipv4.address=none ipv4.dhcp=false \
    ipv6.address=none ipv6.dhcp=false

Indeed that did the trick.
We dropped the routed NIC and created the bridge via LXD (the tap0 VPN interface had to be deactivated first), using your command above, and then added the eth1 to the container via the following command:

lxc config device add mycontainer eth1 nic name=eth1 nictype=bridged parent=lxdbr1
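To check whether eth1 actually obtained a lease from the overlay DHCP server, one option is to read the address from inside the container. This is a minimal sketch, assuming iproute2 is available in the container; the `first_ipv4` helper is hypothetical and only parses the one-line format printed by `ip -4 -o addr show`.

```shell
#!/bin/sh
# Hypothetical helper: print the first IPv4 address (CIDR form) found in
# `ip -4 -o addr show` output read from stdin; exit non-zero if there is none.
first_ipv4() {
    awk '$3 == "inet" { print $4; found = 1; exit } END { exit !found }'
}

# Usage (container and NIC names from the thread):
#   lxc exec mycontainer -- ip -4 -o addr show dev eth1 | first_ipv4
```

A non-zero exit status means no IPv4 address was found on the interface, which would suggest the DHCP request never reached the overlay server.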

I will check whether LXD needs to be started before or after the VPN at (baremetal) boot time so that everything works.

Many thanks!

tomp · January 5, 2021, 5:54pm

Glad it worked!
I would expect the VPN would need to be started first so that LXD can add the tap interface to the bridge when it starts up.

Otherwise (or as well as) you’d need to get peervpn to run a hook to add it to the lxdbr1 bridge when it starts up.
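Such a hook could be as small as the sketch below. It assumes the bridge and tap names used in this thread and relies only on iproute2's `ip link set ... master ...`; how PeerVPN would invoke it is not covered here, so that part is left open. The `attach_tap` function and the `APPLY` switch are illustrative names, not part of LXD or PeerVPN.

```shell
#!/bin/sh
# Hypothetical hook: attach a tap interface to an existing bridge.
# $1 = tap interface, $2 = bridge.
# Prints each command; runs it only when APPLY=1 (illustrative switch),
# so the default behaviour is a harmless dry run.
attach_tap() {
    tap=$1
    bridge=$2
    for cmd in "ip link set dev $tap master $bridge" "ip link set dev $tap up"; do
        echo "$cmd"
        if [ "${APPLY:-0}" = 1 ]; then
            $cmd || return 1
        fi
    done
}

# Usage:
#   attach_tap tap0 lxdbr1            # dry run, prints the two commands
#   APPLY=1 attach_tap tap0 lxdbr1    # as root, actually attaches tap0
```

The dry-run default is a deliberate choice: attaching the wrong interface to a bridge can cut off connectivity, so it seems safer to review the commands first.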

Darcache · January 5, 2021, 6:19pm

On the contrary, LXD won’t bind the bridge interface lxdbr1 to tap0 if tap0 is already configured.
So it is LXD first, then VPN mount.

tomp · January 5, 2021, 6:19pm

What error do you see? Perhaps it’s because it has an IP?

Darcache · January 5, 2021, 6:27pm

The error message is “Only unconfigured network interfaces can be bridged”.
Indeed PeerVPN requires that an IP address be allocated to the VPN interface, so as soon as it is activated/up, the interface is unclaimable by LXD.

tomp · January 5, 2021, 6:42pm

Right, that makes sense. Do you remove the IP in order to add the tap interface to the bridge?

Ideally you would get peervpn to not add the IP to the tap and instead use that IP as the IP of the lxdbr1 interface.
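On the LXD side, moving the address could look roughly like the plan printed below. This is only a sketch: the address is a made-up documentation-range placeholder, the `move_ip_plan` helper is hypothetical, and whether PeerVPN can be run without assigning an address to its interface is not settled in this thread.

```shell
#!/bin/sh
# Hypothetical planner: print (do not run) the commands that would move an
# overlay address from the tap device to the LXD bridge.
# $1 = tap interface, $2 = LXD bridge, $3 = address in CIDR form (placeholder).
move_ip_plan() {
    echo "ip addr flush dev $1"
    echo "lxc network set $2 ipv4.address $3"
}

# Example with a made-up address; substitute the real overlay IP and prefix.
move_ip_plan tap0 lxdbr1 192.0.2.10/24
```

Flushing the address drops the host's own overlay connectivity until the bridge takes over, so this is best tried from a console rather than over the VPN.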

Darcache · January 5, 2021, 6:53pm

Good points.
In the previous setup (manually created bridge with bridge-utils) the IP address allocated to tap0 became unreachable as soon as the bridge was mounted, which is not an issue since the bridge is only useful as a gateway to the VPN.
The behavior is similar using “your” command above: although the IP still appears allocated to tap0, it is unreachable from the VPN.

At this stage “it works” and this is good enough for me, thanks again.