Error: Failed start validation for device "enp3s0f0": Instance DNS name "net17-nicole-munoz-marketing" already used on network

net13 # lxc start net13-template-focal 
Error: Failed start validation for device "eno1": Instance DNS name "net13-template-focal" already used on network
Try `lxc info --show-log net13-template-focal` for more info

net13 # lxc config show net13-template-focal --expanded
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Ubuntu focal amd64 (20200402_07:42)
  image.os: Ubuntu
  image.release: focal
  image.serial: "20200402_07:42"
  image.type: squashfs
  volatile.base_image: 47e9e45537ddf24cc2c5c13c00c3c4dbf36ec188b2598b560b714bc266f79834
  volatile.eno1.hwaddr: 00:16:3e:ce:b0:46
  volatile.eno1.name: eth1
  volatile.eth0.hwaddr: 00:16:3e:8b:c7:89
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: STOPPED
  volatile.uuid: e0006a6a-83fa-4dbb-a2f1-0f5651d2818d
devices:
  eno1:
    nictype: bridged
    parent: lxdbr0
    type: nic
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
  tools:
    path: /david-favor
    source: /david-favor
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

So the issue is you have two NICs, eno1 and eth0, connected to the same parent bridge, lxdbr0.

This can cause DNS name conflicts in dnsmasq, so LXD 5.7 added a start-time check for this scenario.
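
A quick way to spot affected instances (assuming your managed bridge is called lxdbr0) is:

lxc network show lxdbr0

Any instance that appears twice in the used_by list has two NICs attached to that network.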

What is the reason you have 2 NICs connected to the same bridge?

It appears you’ve correctly expressed the exact nature of the bug…

net17 # ip link | head
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp3s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether d4:5d:64:3f:ff:24 brd ff:ff:ff:ff:ff:ff
3: enp3s0f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether d4:5d:64:3f:ff:25 brd ff:ff:ff:ff:ff:ff
4: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:16:3e:7f:38:43 brd ff:ff:ff:ff:ff:ff
6: veth23aef49a@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master lxdbr0 state UP mode DEFAULT group default qlen 1000
    link/ether da:19:20:44:85:0f brd ff:ff:ff:ff:ff:ff link-netnsid 0

So there are no eno* or eth* interfaces + never have been on this machine.

Only the above interfaces exist, so this appears to be 5.7 mistakenly “guessing” at what base interface names “should” be, rather than looking them up.

No, in your instance config you have shown two NIC devices both connected to the same lxdbr0:

devices:
  eno1:
    nictype: bridged
    parent: lxdbr0
    type: nic
  eth0:
    name: eth0
    network: lxdbr0
    type: nic

The devices are named eno1 and eth0.

The instance name will be set up in dnsmasq’s DNS pointing to the NIC’s DHCP IP address.
However, if you connect multiple NICs to the same parent bridge (lxdbr0 in this case) then there is the possibility that both NICs will have DHCP run on them, and this would result in multiple IPs for the same DNS name, causing unpredictability.
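
As a quick check, you can query the bridge’s dnsmasq directly (a sketch, assuming the default dns.domain of lxd; <bridge-address> is the bridge’s IPv4 address from lxc network show):

dig @<bridge-address> <instance-name>.lxd

A name resolving to more than one address would confirm the conflict.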

If you don’t know the reason why you have two NICs on your container, then I suggest removing one (probably eno1, as eth0 is more conventional) and seeing if that solves the issue.

Regardless of what LXD did to create this situation, I’m only interested in a solution.

The current solution is to revert to 5.6, which fixes all problems.

If you can provide exact commands to attempt fixing this, I currently have 100s of containers in this state where I can try your fix. For example… it sounds like the fix to try is some sort of lxc config command.

Provide a command to try. I’ll try it, then update this thread with results.

Indeed. That is what I am trying to get to. But first I need to understand why you have 2 NICs connected to the same bridge. Without understanding that, I cannot suggest a way forward.

On the related thread Usecase for multiple interfaces in a single bridge we had a productive discussion around their use case and were able to come up with a solution.

We need to do the same thing here.

No clue why. This is something LXD has done internally.

This machine has never had an “eth0” or “eno1”, so unsure how to proceed.

I still have some machines in this state, so if you can provide me with commands to kill off bad interfaces, let me know + I’ll run the command on one of the… still broken machines… then report back on what occurs…

No, this isn’t correct. LXD never adds a NIC called eno1 automatically.
But it’s possible this was added by yourself in the past and it was never actively used, nor did it cause problems until the LXD validation change.

The eth0 NIC is part of the default profile that LXD generates during initialization.

There’s no way for me to know whether your containers are configured to use eno1 or eth0 for their connectivity, but if I were a betting man I would say that, as eth0 is the default NIC, the manually added (and apparently forgotten about) eno1 NIC is the better candidate for removal.

So to remove this from the container use:

lxc config device remove <instance> eno1

If this fails saying the device doesn’t exist, then it’s likely part of a profile.
You can check this by doing lxc config show <instance>: if the device doesn’t show without the --expanded flag, then you know it’s coming from the profile.
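
For example:

lxc config show <instance>
lxc config show <instance> --expanded

If eno1 only appears in the --expanded output, it is inherited from a profile rather than set on the instance itself.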

To remove it from the profile you can use:

lxc profile device remove <profile> eno1

Keep in mind this will remove it from all instances using that profile.
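
You can see which instances that would affect beforehand: the used_by section of

lxc profile show <profile>

lists every instance using that profile.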

Fails…

net14 # lxc config device remove net14-ian-farr-infoanywhere-spare-v2 eno1
Error: Device from profile(s) cannot be removed from individual instance. Override device or modify profile instead

Provide the command to determine what profile is being used for a device or container; unsure, as I’ve never worked with profiles.

This also fails…

net14 # lxc profile eno1 show
Error: unknown command "eno1" for "lxc profile"

I did already.

To remove it from the profile you can use:

lxc profile device remove <profile> eno1

Keep in mind this will remove it from all instances using that profile.

What I’m looking for is the value for <profile>; for example, a command to show all profiles.

To see the profiles you’re using for your instance do lxc config show <instance> and see the profiles section.
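
You can also list all profiles on the server with:

lxc profile list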

net14 # lxc config show net14-ian-farr-infoanywhere-spare-v2 
architecture: x86_64
config:
  boot.autostart: "1"
  image.architecture: amd64
  image.description: Ubuntu focal amd64 (20200402_07:42)
  image.os: Ubuntu
  image.release: focal
  image.serial: "20200402_07:42"
  image.type: squashfs
  volatile.base_image: 47e9e45537ddf24cc2c5c13c00c3c4dbf36ec188b2598b560b714bc266f79834
  volatile.eno1.hwaddr: 00:16:3e:ea:7e:62
  volatile.eno1.name: eth1
  volatile.eth0.hwaddr: 00:16:3e:39:af:7c
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: STOPPED
  volatile.last_state.ready: "false"
  volatile.uuid: 897aea5f-514b-472e-a0fc-00d358e378c9
devices:
  tools:
    path: /david-favor
    source: /david-favor
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

Works… This fixed the problem…

lxc profile device remove default eno1

Thanks!

I don’t quite get why this behaviour is enforced. I would argue that having multiple network interfaces connected to the same network (for example, for redundancy in a physical system) is a perfectly valid use case.

I would also argue that the unpredictability should be the user’s problem and not result in the container being unable to start; maybe print a warning, but don’t fail.

If you are connecting an instance to multiple physical interfaces then this wouldn’t be using a managed LXD network and wouldn’t trigger the error.

Equally, if you have turned off managed DNS on a managed network, then the check also doesn’t fire.
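
For example, on a managed bridge you can disable DNS registration entirely (assuming you don’t rely on resolving instance names via the bridge):

lxc network set lxdbr0 dns.mode none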

I thought you could use a p2p NIC, but it doesn’t have a parent setting.
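
For reference, a p2p NIC is just a veth pair with no bridge attached, so there is indeed no parent to set. A device entry would look something like this (eth1 is just an example device name):

devices:
  eth1:
    nictype: p2p
    type: nic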

What is the scenario you are encountering?

Was there any backwards-compatibility testing done??? I did a system update and suddenly none of my containers would start. lxdbr0 was set up by lxd init years ago. eth0 was part of the default profile. Suddenly I’m getting errors.

Error: Failed start validation for device “eth0”: Instance DNS name “mimir” already used on network
Try lxc info --show-log mimir for more info

~$ lxc config show mimir
architecture: x86_64
config:
  boot.autostart.delay: "10"
  boot.autostart.priority: "99"
  image.architecture: amd64
  image.description: ubuntu 18.04 LTS amd64 (release) (20181029)
  image.label: release
  image.os: ubuntu
  image.release: bionic
  image.serial: "20181029"
  image.version: "18.04"
  limits.cpu: "2"
  volatile.base_image: 30b9f587eb6fb50566f4183240933496d7b787f719aafb4b58e6a341495a38ad
  volatile.cloud-init.instance-id: 48aa28be-d718-4733-820c-1cc615dc1608
  volatile.eth0.hwaddr: 00:16:3e:7d:86:c4
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: STOPPED
  volatile.last_state.ready: "false"
  volatile.uuid: 0e1228c2-5e4c-4fc2-a3f6-231892ab72ec
devices:
  lxdbr0:
    nictype: bridged
    parent: lxdbr0
    type: nic
  shared-storage:
    path: /storage
    source: /storagepool/storage/lxc_shared
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
~$ lxc profile show default
config: {}
description: Default LXD profile
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: lxd
    type: disk
name: default
used_by:
- /1.0/instances/tech-nuage
- /1.0/instances/collaboraonline
- /1.0/instances/nextcloud
- /1.0/instances/gitlab
- /1.0/instances/airstack-docker
- /1.0/instances/mimir
- /1.0/instances/webproxy
- /1.0/instances/vulcan
- /1.0/instances/odin
~$ lxc network show lxdbr0
config:
  ipv4.address: 10.66.146.1/24
  ipv4.nat: "true"
  ipv6.address: none
  ipv6.nat: "false"
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/instances/airstack-docker
- /1.0/instances/collaboraonline
- /1.0/instances/collaboraonline
- /1.0/instances/gitlab
- /1.0/instances/gitlab
- /1.0/instances/mimir
- /1.0/instances/mimir
- /1.0/instances/nextcloud
- /1.0/instances/nextcloud
- /1.0/instances/odin
- /1.0/instances/tech-nuage
- /1.0/instances/tech-nuage
- /1.0/instances/vulcan
- /1.0/instances/webproxy
- /1.0/profiles/default
managed: true
status: Created
locations:
- none

Doing the suggested

lxc profile device remove default eno1

does nothing because I don’t have an eno1; changing it to eth0 does allow containers to start, but… without any networking, so that is pointless.

LXD/LXC has been working rock solid for years, and was set-it-and-forget-it as far as networking was concerned. Someone broke something in the 5.9 version.

~$ lxc --version
5.9

In your case it looks like you have added (at some point) an additional unused NIC called lxdbr0 to your container. This, combined with the eth0 NIC from the profile, has triggered the new validation check that was recently added to LXD (to detect this sort of scenario), as both NICs are connected to the lxdbr0 network.

Try doing this to remove the extra NIC from your instance so it just uses the eth0 NIC from the profile:

lxc config device remove mimir lxdbr0
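
If many instances are in the same state, a rough sketch like this could sweep them all (it assumes the stray device is named lxdbr0 on each instance, and silently skips instances that don’t have it):

# iterate over all instance names and try to drop the stray device
for c in $(lxc list -c n --format csv); do
  lxc config device remove "$c" lxdbr0 2>/dev/null
done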

It would be great if you could also solve my problem, which is similar. I am just creating a container on a new system. I already have other devices running with the same configuration. LXC version is 5.11.
When I want to assign the profile to the container, I get the error:
Failed add validation for device “lan”: Instance DNS name “router2” already used on network
router2 is the instance which I created; “lan” is the bridge interface.

architecture: aarch64
config:
  image.architecture: arm64
  image.description: Openwrt 22.03 arm64 (20230304_12:00)
  image.os: Openwrt
  image.release: "22.03"
  image.serial: "20230304_12:00"
  image.type: squashfs
  image.variant: default
  volatile.apply_template: create
  volatile.base_image: 327ad513c4ab595bbcf1520de758b310c58c669bb9e0438282505b1613b3dfd8
  volatile.cloud-init.instance-id: 7b8833c5-032a-488e-b172-65c7ed4a0b01
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.lan.hwaddr: 00:16:3e:29:1d:bb
  volatile.last_state.idmap: '[]'
  volatile.uuid: 91715417-6d31-4e5a-b7f0-ba8360cfaf62
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default

I do not see anything abnormal here.
The profile is:

lxc profile show profile2
config: {}
description: ""
devices:
  lan:
    name: lan
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
  wan:
    name: wan
    nictype: physical
    parent: eth0
    type: nic
name: profile2
used_by: []

I tried to delete router2 and create router3: same error, with “router3”.

Tell me if you need other information.

From what was said before, I concluded that the problem was eth0.
lxc profile device remove default eth0
did the trick.
But I think there are two problems:

  1. The message is totally confusing. It says:
    Failed add validation for device “lan”: Instance DNS name “router2” already used on network
    This text means: “the name router2 is already used”, which is false. This is really misleading. The message also suggests that the source of the problem is the device “lan”, which is false. If the system finds a collision between two “eth0” devices, the message should say there is a collision between devices named “eth0”. Why mention names which have nothing to do with the problem?
  2. The collision was created by LXD, which chose “eth0” as the name for its default profile’s NIC when there was an already existing “eth0” port. If name collisions must be avoided, it is very strange to choose as default one of the most common port names, without checking that it will not create problems, leaving the user perplexed and spending hours trying to understand what is happening. The default name should be something like “lxeth0”, or whatever you want, but not “eth0”.

Thanks for the explanations in this thread; I would never have found the problem without them. But I wonder how I made an installation last month, with a similar device and an existing “eth0” port, and it did not create problems. This is strange.