I think it’s mostly the same one, but it’s part of a 4-port Ethernet card, so I’m not sure it matters. I don’t think there should be any conflicts: when everything is working as it should, the host has only one interface visible, br0, and everything physical goes to the container.
I will get back to you with logs when I get a chance. It will take some doing because of my config.
I should add that this does not happen every time, but I’ve started to just reboot the whole host when needed.
So I had a look at the liblxc source code and found this:
I wonder if the NIC is clashing with another NIC inside the container and then not being renamed, so that when it’s moved back LXD doesn’t recognise it and can’t rename it back to its original name.
Name: router
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2021/11/03 23:19 EET
Last Used: 2022/04/08 12:35 EEST
Log:
lxc router 20220408101124.376 WARN network - network.c:lxc_delete_network_priv:3617 - Failed to rename interface with index 2 from "eth3" to its initial name "enp4s0f0"
lxc router 20220408101124.379 WARN network - network.c:lxc_delete_network_priv:3617 - Failed to rename interface with index 3 from "eth4" to its initial name "enp4s0f1"
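For anyone collecting these warnings from several containers, here is a minimal Python sketch that pulls the interface index and the two names out of lines like the ones above. The regular expression is based only on the two WARN lines quoted in this post, so treat the pattern as an assumption; other liblxc versions may word the message differently. The function name `failed_renames` is made up for this example.

```python
import re

# Pattern derived only from the two WARN lines quoted in this thread.
RENAME_FAIL = re.compile(
    r'Failed to rename interface with index (?P<index>\d+) '
    r'from "(?P<current>[^"]+)" to its initial name "(?P<initial>[^"]+)"'
)


def failed_renames(log_text):
    """Return (ifindex, current_name, initial_name) for each failed rename."""
    return [
        (int(m.group("index")), m.group("current"), m.group("initial"))
        for m in RENAME_FAIL.finditer(log_text)
    ]


SAMPLE_LOG = '''\
lxc router 20220408101124.376 WARN network - network.c:lxc_delete_network_priv:3617 - Failed to rename interface with index 2 from "eth3" to its initial name "enp4s0f0"
lxc router 20220408101124.379 WARN network - network.c:lxc_delete_network_priv:3617 - Failed to rename interface with index 3 from "eth4" to its initial name "enp4s0f1"
'''

for index, current, initial in failed_renames(SAMPLE_LOG):
    print(f"ifindex {index}: stuck as {current}, expected {initial}")
```

The resulting list of "stuck" names can then be compared against what the host currently shows for its interfaces.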
When the container is stopped, LXC will move the network device back to the host. In order to do that it will use a “transient” name which it has used during interface creation. It’s basically a low-effort way to avoid name collisions on the host when moving back a network device that usually has a high-collision-probability name such as “eth0” in the container.
In the final step it is renamed from the transient name to its original name on the host. Since the rename step fails after the device has been moved back, it is somewhat likely that this is a naming collision, i.e. its original name on the host has been taken by another device.
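To make the two steps concrete, here is a toy model in Python. It is not liblxc code and it does not touch real interfaces; it only treats the host as a set of names and assumes, as on Linux, that interface names within one network namespace must be unique. The function `move_back` and the names `tmpA` and `tmpB` are hypothetical and exist only for illustration.

```python
def move_back(host_names, transient, original):
    """Toy model of moving a device back to the host (not liblxc code).

    host_names: set of interface names currently present on the host.
    Returns the name the device ends up with on the host.
    """
    # Step 1: the device arrives on the host under its transient name,
    # which is chosen to be unlikely to clash with anything.
    if transient in host_names:
        raise ValueError(f"transient name {transient!r} already in use")
    host_names.add(transient)

    # Step 2: rename to the original host name. If another device
    # already holds that name, the rename fails and the device keeps
    # the transient name.
    if original in host_names:
        return transient
    host_names.remove(transient)
    host_names.add(original)
    return original


# The original name is free, so the rename succeeds.
free_host = {"lo", "br0"}
print(move_back(free_host, "tmpA", "enp4s0f0"))

# The original name is already taken, so the device is left
# under its transient name.
busy_host = {"lo", "br0", "enp4s0f1"}
print(move_back(busy_host, "tmpB", "enp4s0f1"))
```

In the second case the device is back on the host but under the wrong name, which would match the symptom of a failed rename after a successful move.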
I guess I can test later, but I will always have multiple NICs in the container; it is a router/switch after all.
That collision theory seems probable. If I disable Predictable Network Interface Names, the host NICs stay as eth0 etc. instead of getting the enp-style names, and then the container won’t start at all.
Perhaps I could try renaming the container NICs to eth01 etc. to avoid the collision.
Ok, I did test this by removing all but one physical NIC from the container and adding them back one by one. The problems start when adding the third one of four.
No idea. I have Ubuntu Server with minimal extra packages. Predictable Network Interface Names does this on boot of course, but like I said before, if I disable that, the LXC container won’t start even once, and I have those phys*** NICs listed after it has tried.
I’m using only systemd-networkd with only br0 configured, if that makes any difference. And I compile LXD from source; I don’t have snap installed.
I did find evidence of interface name changes on my failing LXD 5.0 box.
I have just 1 NIC that I use (also a WiFi, but it was never configured in Netplan).
I got this problem (and then some) after a host reboot for maintenance. Now I’m getting a bunch of different errors on Ubuntu 20.04 with LXD 5.0 and containers can’t start.
I am using snapd with latest/stable; this was the first orderly restart after an (unexpected) upgrade of LXD to 5.0 by snapd (I shouldn’t have used latest/stable, 4.0 worked fine before).
Anyway, as far as one of the issues - the network interface name change - is concerned, I see this in syslog:
Apr 15 06:42:45 server kernel: [ 43.465960] device vethe34f7be2 entered promiscuous mode
Apr 15 06:42:45 server zed: eid=15 class=history_event pool_guid=0x7FA2A2DA6C8B7235
Apr 15 06:42:45 server zed: eid=16 class=history_event pool_guid=0x7FA2A2DA6C8B7235
Apr 15 06:42:45 server zed: eid=17 class=history_event pool_guid=0x7FA2A2DA6C8B7235
Apr 15 06:42:45 server kernel: [ 43.582430] audit: type=1400 audit(1650004965.651:61): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd-mycont </var/snap/lxd/common/lxd>" pid=14896 comm="apparmor_parser"
Apr 15 06:42:45 server kernel: [ 43.660602] physi2Ry4Y: renamed from vetha32d6312
Apr 15 06:42:45 server kernel: [ 43.684939] eth0: renamed from physi2Ry4Y
Apr 15 06:42:45 server systemd-networkd[1579]: vethe34f7be2: Gained carrier
Apr 15 06:42:45 server kernel: [ 43.709518] lxdbr0: port 2(vethe34f7be2) entered blocking state
Apr 15 06:42:45 server kernel: [ 43.709520] lxdbr0: port 2(vethe34f7be2) entered forwarding state
Apr 15 06:42:48 server systemd[1]: systemd-hostnamed.service: Succeeded.
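To follow the renames without reading the whole syslog by eye, a short Python sketch like the one below can chain the kernel’s “renamed from” messages together. The pattern is taken only from the two kernel lines quoted above, so it is an assumption that other kernels log in the same shape; `rename_chains` is a name made up for this example.

```python
import re

# Matches kernel lines of the form "[ 43.66] NEW: renamed from OLD".
RENAMED = re.compile(r"\] (?P<new>\S+): renamed from (?P<old>\S+)")


def rename_chains(lines):
    """Link 'NEW: renamed from OLD' messages, in log order.

    Returns a list of chains; each chain lists the names one device
    went through, from first to last.
    """
    chains = []
    for line in lines:
        m = RENAMED.search(line)
        if not m:
            continue
        old, new = m.group("old"), m.group("new")
        for chain in chains:
            if chain[-1] == old:
                chain.append(new)  # continue an existing chain
                break
        else:
            chains.append([old, new])  # first rename seen for this device
    return chains


SAMPLE = [
    "Apr 15 06:42:45 server kernel: [ 43.660602] physi2Ry4Y: renamed from vetha32d6312",
    "Apr 15 06:42:45 server kernel: [ 43.684939] eth0: renamed from physi2Ry4Y",
    "Apr 15 06:42:45 server systemd-networkd[1579]: vethe34f7be2: Gained carrier",
]

for chain in rename_chains(SAMPLE):
    print(" -> ".join(chain))
```

For the excerpt above this gives a single chain: the veth end is first given a phys-prefixed name and then becomes eth0.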
I looked around and it seems this could be related to systemd (the current upstream version is 250, while the version in Ubuntu 20.04 is 245, and at that time they still didn’t publish change logs, so it’s hard to figure out what changed in 246 or 247, for example). I also looked at udev rules but didn’t find anything unusual; the same goes for networkd-dispatcher (running it in verbose mode revealed nothing new).
I also tried configuring systemd-networkd to start after udev. No difference.
[Unit]
After=systemd-udev-settle.service
My last try was to change the default bridge to br0 and bring br0 up via Netplan rather than leave it to LXD, but that doesn’t help.
Sorry for replying to an old topic. I had the same trouble as the poster of this topic.
And today I am very lucky. I found this thread.
@Pieter I followed your advice and my container rebooted successfully.
I am a beginner in both Linux and LXD (and English), so I didn’t even know how to look for tips to solve the problem. Unfortunately, I don’t even really understand what the settings I made this time mean.
Regardless, I am so full of gratitude to you and all the people on this thread that I immediately created an account on this forum.