Error: Failed start validation for device "enp3s0f0": Instance DNS name "net17-nicole-munoz-marketing" already used on network

net14 # lxc config show net14-ian-farr-infoanywhere-spare-v2 
architecture: x86_64
config:
  boot.autostart: "1"
  image.architecture: amd64
  image.description: Ubuntu focal amd64 (20200402_07:42)
  image.os: Ubuntu
  image.release: focal
  image.serial: "20200402_07:42"
  image.type: squashfs
  volatile.base_image: 47e9e45537ddf24cc2c5c13c00c3c4dbf36ec188b2598b560b714bc266f79834
  volatile.eno1.hwaddr: 00:16:3e:ea:7e:62
  volatile.eno1.name: eth1
  volatile.eth0.hwaddr: 00:16:3e:39:af:7c
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: STOPPED
  volatile.last_state.ready: "false"
  volatile.uuid: 897aea5f-514b-472e-a0fc-00d358e378c9
devices:
  tools:
    path: /david-favor
    source: /david-favor
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

Works… This fixed the problem…

lxc profile device remove default eno1
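For anyone else hitting this: the duplicate NIC usually comes from a profile device plus a local instance device attached to the same managed network, and the expanded view makes the overlap visible (instance and profile names here are the ones from this thread; adjust for your own setup):

```shell
# The effective config merges profile devices with the instance's
# local devices, so two NICs on the same network become obvious.
lxc config show net14-ian-farr-infoanywhere-spare-v2 --expanded

# Confirm which profile defines the extra NIC before removing it.
lxc profile show default
```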

Thanks!


I don’t quite get why this behaviour is enforced. I would argue that having multiple network interfaces connected to the same network (for example for redundancy in a physical system) is a perfectly valid usecase.

I would also argue that the unpredictability should be the user’s problem and not result in the container unable to start, maybe printing a warning but not fail.


If you are connecting an instance to multiple physical interfaces then this wouldn’t be using a managed LXD network and wouldn’t trigger the error.

Equally, if you have turned off managed DNS on a managed network, then the check also doesn't fire, see:
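If you intentionally want multiple NICs on the same managed bridge, one route (assuming your LXD version supports the bridge network option `dns.mode`) is to disable managed DNS so the name check no longer applies:

```shell
# Turn off managed DNS records on the bridge; without DNS
# registration the duplicate-name check is not enforced.
lxc network set lxdbr0 dns.mode=none

# Verify the setting took effect.
lxc network get lxdbr0 dns.mode
```

Note this means instances on that bridge will no longer get DNS records at all, so only do this if you don't rely on resolving instance names.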


I thought you could use a p2p NIC, but it doesn't have a parent setting.

What is the scenario you are encountering?

Was there any backwards compatibility testing done? I did a system update and suddenly none of my containers would start. lxdbr0 was set up by lxd init years ago, and eth0 was part of the default profile. Suddenly I am getting errors.

Error: Failed start validation for device “eth0”: Instance DNS name “mimir” already used on network
Try lxc info --show-log mimir for more info

~$ lxc config show mimir
architecture: x86_64
config:
  boot.autostart.delay: "10"
  boot.autostart.priority: "99"
  image.architecture: amd64
  image.description: ubuntu 18.04 LTS amd64 (release) (20181029)
  image.label: release
  image.os: ubuntu
  image.release: bionic
  image.serial: "20181029"
  image.version: "18.04"
  limits.cpu: "2"
  volatile.base_image: 30b9f587eb6fb50566f4183240933496d7b787f719aafb4b58e6a341495a38ad
  volatile.cloud-init.instance-id: 48aa28be-d718-4733-820c-1cc615dc1608
  volatile.eth0.hwaddr: 00:16:3e:7d:86:c4
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: STOPPED
  volatile.last_state.ready: "false"
  volatile.uuid: 0e1228c2-5e4c-4fc2-a3f6-231892ab72ec
devices:
  lxdbr0:
    nictype: bridged
    parent: lxdbr0
    type: nic
  shared-storage:
    path: /storage
    source: /storagepool/storage/lxc_shared
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
~$ lxc profile show default
config: {}
description: Default LXD profile
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: lxd
    type: disk
name: default
used_by:
- /1.0/instances/tech-nuage
- /1.0/instances/collaboraonline
- /1.0/instances/nextcloud
- /1.0/instances/gitlab
- /1.0/instances/airstack-docker
- /1.0/instances/mimir
- /1.0/instances/webproxy
- /1.0/instances/vulcan
- /1.0/instances/odin
~$ lxc network show lxdbr0
config:
  ipv4.address: 10.66.146.1/24
  ipv4.nat: "true"
  ipv6.address: none
  ipv6.nat: "false"
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/instances/airstack-docker
- /1.0/instances/collaboraonline
- /1.0/instances/collaboraonline
- /1.0/instances/gitlab
- /1.0/instances/gitlab
- /1.0/instances/mimir
- /1.0/instances/mimir
- /1.0/instances/nextcloud
- /1.0/instances/nextcloud
- /1.0/instances/odin
- /1.0/instances/tech-nuage
- /1.0/instances/tech-nuage
- /1.0/instances/vulcan
- /1.0/instances/webproxy
- /1.0/profiles/default
managed: true
status: Created
locations:
- none

Doing the suggested

lxc profile device remove default eno1

does nothing, because I don't have an eno1. Changing it to eth0 does allow the containers to start, but… without any networking, so that is pointless.

LXD/LXC has been working rock solid for years, and networking was set-and-forget. Someone broke something in version 5.9.

~$ lxc --version
5.9

In your case it looks like you have added (at some point) an additional unused NIC called lxdbr0 to your container. This combined with the eth0 NIC from the profile has triggered the new validation check that was recently added to LXD (to detect this sort of scenario) as both NICs are connected to the lxdbr0 network.

Try doing this to remove the extra NIC from your instance so it just uses the eth0 NIC from the profile:

lxc config device remove mimir lxdbr0
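After removing it, the expanded config should show a single NIC on lxdbr0, inherited from the default profile (a sketch of the verification steps):

```shell
# Confirm only the profile's eth0 NIC remains in the effective
# (profile + local) config.
lxc config show mimir --expanded

# Start the container; networking is provided by eth0 from the
# default profile, as before.
lxc start mimir
```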

It would be great if you could also solve my problem, which is similar. I am creating a container on a new system, and I already have other devices running with the same configuration. The lxc version is 5.11.
When I assign the profile to the container, I get the error:
Failed add validation for device “lan”: Instance DNS name “router2” already used on network
router2 is the instance I created; “lan” is the bridge interface.

architecture: aarch64
config:
  image.architecture: arm64
  image.description: Openwrt 22.03 arm64 (20230304_12:00)
  image.os: Openwrt
  image.release: "22.03"
  image.serial: "20230304_12:00"
  image.type: squashfs
  image.variant: default
  volatile.apply_template: create
  volatile.base_image: 327ad513c4ab595bbcf1520de758b310c58c669bb9e0438282505b1613b3dfd8
  volatile.cloud-init.instance-id: 7b8833c5-032a-488e-b172-65c7ed4a0b01
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.lan.hwaddr: 00:16:3e:29:1d:bb
  volatile.last_state.idmap: '[]'
  volatile.uuid: 91715417-6d31-4e5a-b7f0-ba8360cfaf62
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default

I do not see anything abnormal here.
The profile is:

lxc profile show profile2
config: {}
description: ""
devices:
  lan:
    name: lan
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
  wan:
    name: wan
    nictype: physical
    parent: eth0
    type: nic
name: profile2
used_by: []

I tried to delete router2 and create router3: same error, with “router3”.

Tell me if you need other information.

From what was said before, I concluded that the problem was eth0.
lxc profile device remove default eth0
did the trick.
But I think there are two problems:

  1. The message is totally confusing. It says:
    Failed add validation for device “lan”: Instance DNS name “router2” already used on network
    This text means "the name router2 is already used", which is false and really misleading. It also suggests that the source of the problem is the device “lan”, which is false. If the system finds a collision between two “eth0” devices, the message should say there is a collision between devices named “eth0”. Why mention names which have nothing to do with the problem?
  2. The collision was created by LXD itself, which chose “eth0” as the NIC name in its default profile even though an “eth0” device already existed. If name collisions must be avoided, it is very strange to pick one of the most common interface names as the default, without checking that it will not create problems, leaving the user perplexed and spending hours trying to understand what is happening. The default name should be something like “lxeth0”, or whatever you like, but not “eth0”.

Thanks for the explanations in this thread; I would never have found the problem without them. But I wonder why an installation I made last month, with a similar device and an existing “eth0” port, did not create problems. This is strange.

Hrm, I don’t think you’ve quite gotten to the root of the problem yet.

That is because you were showing the profile2 profile, but the problem instance was using the default profile.

If you had run lxc config show <instance> --expanded it would have shown you the complete config for the instance (i.e the profile config being applied as well as the local config from the instance).

I suspect you would have then seen that the profile was adding another NIC that was also connected to lxdbr0. Hence you likely effectively had the instance connected twice to the same network, causing a DNS name conflict.

The NICs are added in alphabetical order, so eth0 would have been added successfully, and then when it came to adding the lan interface it would have failed, saying that an instance (by way of eth0) is already using that DNS name.
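To illustrate, a hypothetical, trimmed `--expanded` view of an instance with a local eth0 plus a profile lan NIC, both attached to lxdbr0, would look something like:

```yaml
devices:
  eth0:            # local device, added first (alphabetical order)
    name: eth0
    network: lxdbr0
    type: nic
  lan:             # from the profile; fails, the DNS name is taken
    name: lan
    nictype: bridged
    parent: lxdbr0
    type: nic
```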

As an aside, the LXD-built images ship with network configuration that automatically configures the eth0 interface, so if you are using a NIC called lan from the profile it may cause you unexpected issues, because it will not auto-configure by default. If that is not behaviour you wanted anyway, then all good. 🙂

Thanks Thomas, I understand better.
