LXD: Network interfaces get renamed, container restart fails

Hello,

I’ve had this problem as long as I can remember. Now using Ubuntu 22.04 Beta and LXD 5.

I pass all my physical NICs through to an OpenWrt container. This works fine the first time I boot the host. But when I try to restart the OpenWrt container with “lxc restart”, some of the parent NICs get renamed to something seemingly random like phys****** and the container fails to start because the parent NIC does not exist.

The phys****** adapter does have the correct MAC and has a property “altname” which does have the real interface name.
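
A minimal sketch for listing such interfaces, assuming the JSON produced by `ip -j link show` uses the keys `ifname`, `address` and `altnames`, as recent iproute2 versions do. The sample data, including the interface name and MAC address, is hypothetical. The function only reports candidates and does not rename anything.

```python
import json


def stranded_interfaces(ip_json, prefix="phys"):
    """Map each interface whose name starts with `prefix` to its MAC and altnames."""
    stranded = {}
    for link in json.loads(ip_json):
        name = link.get("ifname", "")
        altnames = link.get("altnames", [])
        # Only report interfaces that still carry an alternative name.
        if name.startswith(prefix) and altnames:
            stranded[name] = {"mac": link.get("address"), "altnames": altnames}
    return stranded


# Hypothetical sample shaped like `ip -j link show` output.
SAMPLE = json.dumps([
    {"ifindex": 1, "ifname": "lo", "address": "00:00:00:00:00:00"},
    {"ifindex": 2, "ifname": "physAbCdEf", "address": "00:11:22:33:44:55",
     "altnames": ["enp4s0f0"]},
])

if __name__ == "__main__":
    for name, info in stranded_interfaces(SAMPLE).items():
        print(name, info["mac"], info["altnames"])
```

On a live host the input could come from `subprocess.run(["ip", "-j", "link", "show"], capture_output=True, text=True, check=True).stdout` instead of the sample.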

Would it be possible to pass through the NIC by MAC instead of by interface name? Or is there anything else I could do to stop this behavior? I tried disabling Predictable Network Interface Names, but then the NICs get renamed to phys****** during host boot and the container won’t start even once.

Is it always the same parent NICs that get renamed?
Are there any conflicting interfaces on the parent when the container gets restarted?

Please can you show the output of lxc config show <instance> --expanded along with the output of ip a before and after the container has been started and then restarted.

This isn’t possible at this time; we should focus on fixing the bug that causes them to be renamed.

I think it’s mostly the same one, but it’s part of a 4-port Ethernet card, so I’m not sure it matters. I don’t think there are any conflicts: when everything is working as it should, the host has only one interface visible, br0, and everything physical goes to the container.

I will get back to you with logs when I get a chance; it will take some doing because of my config.

I should add that this does not happen every time, but I’ve started to just reboot the whole host when needed.

So I had a look at the liblxc source code and found this:

I wonder if the NIC is clashing with another NIC inside the container and then not being renamed, so that when it’s moved back LXD doesn’t recognise it and can’t rename it back to its original name.

Got it to fail on the very first restart. Here is the info you asked for:

Lost 2 interfaces this time.

This may well have something to do with my OpenWrt container; I use the same PC as my server and router.

As a side note, how does the new feature “Startup with degraded networking” work? Shouldn’t it start the container even if the NICs are missing?

No, it’s for allowing LXD to start without starting all its managed networks, not for allowing an instance to start without all its devices.

Can you reproduce this by running lxc stop <instance> and then show the output of lxc info <instance> --show-log?

It just shows:

Name: router
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2021/11/03 23:19 EET
Last Used: 2022/04/08 12:35 EEST

Log:

lxc router 20220408101124.376 WARN     network - network.c:lxc_delete_network_priv:3617 - Failed to rename interface with index 2 from "eth3" to its initial name "enp4s0f0"
lxc router 20220408101124.379 WARN     network - network.c:lxc_delete_network_priv:3617 - Failed to rename interface with index 3 from "eth4" to its initial name "enp4s0f1"
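
The two warnings above can be turned into a list of affected interfaces with a small parser. This is only a sketch: the regular expression is written against the exact wording of the two log lines quoted here, and other liblxc versions may phrase the message differently.

```python
import re

# Pattern written against the warning text quoted in the log above.
RENAME_WARNING = re.compile(
    r'Failed to rename interface with index (?P<index>\d+) '
    r'from "(?P<current>[^"]+)" to its initial name "(?P<initial>[^"]+)"'
)


def failed_renames(log_text):
    """Return one dict per failed rename found in the log text."""
    return [
        {
            "index": int(m.group("index")),
            "current": m.group("current"),
            "initial": m.group("initial"),
        }
        for m in RENAME_WARNING.finditer(log_text)
    ]


SAMPLE_LOG = '''lxc router 20220408101124.376 WARN     network - network.c:lxc_delete_network_priv:3617 - Failed to rename interface with index 2 from "eth3" to its initial name "enp4s0f0"
lxc router 20220408101124.379 WARN     network - network.c:lxc_delete_network_priv:3617 - Failed to rename interface with index 3 from "eth4" to its initial name "enp4s0f1"
'''

if __name__ == "__main__":
    for entry in failed_renames(SAMPLE_LOG):
        print(entry["index"], entry["current"], "->", entry["initial"])
```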


OK, well, that’s something: we can see it’s liblxc having trouble renaming the interface.

@brauner do you have any idea why lxc_netdev_rename_by_index would fail renaming an interface back to the host side name when the container stops?

It can happen if there’s a network device on the host with the same name. Other than that it’s not obvious what would cause it.

When the container is stopped, LXC will move the network device back to the host. In order to do that it will use a “transient” name which it has used during interface creation. It’s basically a low-effort way to avoid name collisions on the host when moving a network device back that usually has a high-collision-probability name such as “eth0” in the container.

In the final step it is renamed from the transient name to its original name on the host. Since the rename step fails after the device has been moved back, it is somewhat likely that this is a naming collision, i.e. its original host name has been taken by another device.
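
The two steps described above can be illustrated with a toy model. This is not liblxc code: namespaces are plain dicts mapping interface names to indexes, the transient names are made up, and the device that occupies “enp4s0f1” on the host is a hypothetical stand-in for whatever causes the collision.

```python
def return_to_host(container, host, name, transient, original):
    """Move a device back to the host and try to restore its original name.

    Returns the name the device ends up with on the host.
    """
    # Step 1: move the device to the host under its transient name.
    host[transient] = container.pop(name)
    # Step 2: rename to the original name, which fails if that name is taken.
    if original in host:
        return transient  # the device is left with its transient name
    host[original] = host.pop(transient)
    return original


if __name__ == "__main__":
    host = {"br0": 5}
    container = {"eth3": 2, "eth4": 3}

    # No collision: the original name is restored.
    print(return_to_host(container, host, "eth3", "physAAAAAA", "enp4s0f0"))

    # Hypothetical collision: another device already holds the name.
    host["enp4s0f1"] = 9
    print(return_to_host(container, host, "eth4", "physBBBBBB", "enp4s0f1"))
    print(sorted(host))
```

In the second call the device keeps its transient name, which matches the symptom of a phys****** interface being left behind on the host.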

Perhaps something on the host is renaming an earlier NIC to the same name as a later NIC that has yet to be removed from the container, and that is causing the conflict.

Does it only happen if you have multiple NICs in your container?

I guess I can test later, but I will always have multiple NICs in the container; it is a router/switch after all.

That collision theory seems probable. If I disable Predictable Network Interface Names, the host NICs stay as eth0 etc. instead of enp*, and then the container won’t start at all.

Perhaps I could try renaming the container NICs to eth01 etc. to avoid collisions.

I’m not saying that is a problem, but it may indicate what is happening; perhaps something on the host is restoring them to the same name.

Ok, I did test this by removing all but one physical NIC from the container and adding them back one by one. The problems start when adding the third one of four.

I wonder if something on your host machine is renaming the NICs as they are added back to the host, causing the conflict.

No idea, I have Ubuntu Server with minimal extra packages. Predictable Network Interface Names does this on boot of course, but like I said before, if I disable that, the LXC container won’t start even once and I have those phys*** NICs listed after it has tried.

I’m using only systemd-networkd with only br0 configured, if that makes any difference. And I compile LXD from source; I don’t have snap installed.