Somehow one of my network devices did not get deleted together with its container, and the device still has a folder in /var/snap/lxd/common/lxd/devices, even though no containers are currently present on the system. Is it safe to just manually delete this folder, or is there a safe command? This device is a proxy device and is blocking my IP address.
When you say “blocking my IP address”, do you mean the proxy device is still running and listening on a port on your IP address?
Yes, exactly. The device exists by itself, without any container, so I cannot do lxc config device remove because it is not attached to any container.
And you can see the process running? Have you tried killing the process?
What is the actual error you are getting?
There is an error only when I try to create a new container with the IP that is blocked by this device. So I was looking into it and found that the device folder still exists, but the container has already been removed.
Here is what happens when I try to use the blocked IP with a new container:
lxc c8 20201110144650.550 WARN cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1152 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset//lxc.monitor.c8"
lxc c8 20201110144650.550 WARN cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1152 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset//lxc.payload.c8"
lxc c8 20201110144650.551 ERROR utils - utils.c:lxc_can_use_pidfd:1846 - Kernel does not support pidfds
lxc c8 20201110144650.552 WARN cgfsng - cgroups/cgfsng.c:fchowmodat:1573 - No such file or directory - Failed to fchownat(17, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc c8 20201110144650.558 ERROR network - network.c:setup_ipv4_addr_routes:163 - Unknown error -17 - Failed to setup ipv4 address route for network device with eifindex 42
lxc c8 20201110144650.558 ERROR network - network.c:instantiate_veth:422 - Unknown error -17 - Failed to setup ip address routes for network device "vethfb838ed0"
lxc c8 20201110144650.600 ERROR network - network.c:lxc_create_network_priv:3068 - Unknown error -17 - Failed to create network device
lxc c8 20201110144650.600 ERROR start - start.c:lxc_spawn:1786 - Failed to create the network
lxc c8 20201110144650.600 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:860 - Received container state "ABORTING" instead of "RUNNING"
lxc c8 20201110144650.601 ERROR start - start.c:__lxc_start:1999 - Failed to spawn container "c8"
lxc c8 20201110144650.601 WARN start - start.c:lxc_abort:1018 - No such process - Failed to send SIGKILL to 30946
lxc 20201110144650.664 WARN commands - commands.c:lxc_cmd_rsp_recv:126 - Connection reset by peer - Failed to receive response for command "get_state"
How can I kill the device then?
Can you show the following:
The config of the container you're trying to start:
lxc config show <container> --expanded
The listening ports on your LXD host
sudo ss -tlpn
lxc config show c1 --expanded
image.description: Debian stretch amd64 (20201110_05:24)
In the ss -tlpn output there is no entry for that IP.
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 32 10.10.10.10:53 0.0.0.0:* users:(("dnsmasq",pid=7311,fd=9))
LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=1301,fd=3))
LISTEN 0 32 [fd42:1657:a2e5:b7e6::1]:53 [::]:* users:(("dnsmasq",pid=7311,fd=13))
LISTEN 0 32 [fe80::216:3eff:fe18:8a1a]%lxdbr0:53 [::]:* users:(("dnsmasq",pid=7311,fd=11))
Just to clarify: containers with other IPs start just fine. Only this one IP is somehow blocked by a "ghost" device that does not belong to any container.
Please show the output of the following on the LXD host:
ip r
default via 138…16.129 dev eth0 onlink
10.10.0.0/16 dev lxdbr0 proto kernel scope link src 10.10.10.10 linkdown
138…16.128/26 via 138…16.129 dev eth0
138…16.128/26 dev eth0 proto kernel scope link src 138…16.132
138…16.132 dev lxdbr0 proto static scope link linkdown
138…16.151 dev veth5f74fc3e scope link
Ah, OK, so this actually has nothing to do with the proxy device.
Instead it's an old veth interface that has not been removed.
If you're confident that veth interface isn't needed, you can delete it and its route using:
sudo ip link delete veth5f74fc3e
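For anyone hitting the same thing, a hedged way to spot such leftovers is to filter the host routing table for routes that point at veth interfaces and compare them against running containers. The routes below are made-up placeholder data (documentation IP ranges, not the real addresses from this thread); on a live host you would pipe the real ip r output instead:

```shell
# Stand-in for `ip r` output on a host with one stale veth route
# (addresses are hypothetical placeholders).
routes='default via 192.0.2.1 dev eth0 onlink
198.51.100.151 dev veth5f74fc3e scope link
10.10.0.0/16 dev lxdbr0 proto kernel scope link src 10.10.10.10'

# Print "<destination> <interface>" for every route on a veth device;
# on a real host: ip r | awk '/ dev veth/ {print $1, $3}'
stale=$(printf '%s\n' "$routes" | awk '/ dev veth/ {print $1, $3}')
echo "$stale"

# Any interface printed here that no running container owns can then
# be removed with: sudo ip link delete <ifname>
```

Deleting the veth link also drops the routes bound to it, which is why a single ip link delete is enough here.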
Is there any way I can prevent this from happening in the future? I have one script that creates containers and another that deletes them. One container was apparently not deleted correctly, I assume, so the IP was left hanging there.
If you can recreate the issue then we can certainly look into it, does it happen every time you stop the container?
The creation process:
lxc init images:debian/stretch c10
lxc config set c10 limits.cpu 8
lxc config set c10 limits.cpu.allowance 87%
lxc config set c10 limits.memory 8192MB
lxc config set c10 limits.memory.swap false
lxc config device add c10 root disk pool=default path=/
lxc config device set c10 root size 100GB
lxc config device set c10 root limits.read 10000MB
lxc config device set c10 root limits.write 10000MB
lxc config device add c10 eth0 nic nictype=routed parent=eth0 ipv4.address=138…16.151
lxc start c10
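One way the creation script above could defend against this is to check, before lxc init, that the host has no existing route for the IP it is about to assign. This is only a sketch under assumptions: the IP and the route line are hypothetical placeholders, and on a real host the stand-in variable would be replaced by actual ip r output:

```shell
# Hypothetical guard for the creation script: refuse an IP that is
# still routed on the host (e.g. via a leftover veth interface).
ip_addr="198.51.100.151"                                      # placeholder IP
existing_routes='198.51.100.151 dev veth5f74fc3e scope link'  # stand-in for: ip r

if printf '%s\n' "$existing_routes" | grep -q "^${ip_addr} "; then
    status="blocked"   # a route already claims this IP: pick another
else
    status="free"      # safe to pass as ipv4.address
fi
echo "IP ${ip_addr} is ${status}"
```

With a guard like this the script would skip the hanging IP instead of failing at lxc start with the "Unknown error -17" seen above.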
And when I delete containers, my script always does just this:
lxc stop c10
lxc delete c10
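The teardown could likewise verify, after lxc delete, that the container's routed IP is really gone from the host routing table. Again a hedged sketch with placeholder data (the IP and route text are invented, and a real script would read ip r instead of the stand-in variable):

```shell
# Hypothetical post-delete check for the deletion script.
container_ip="198.51.100.151"                             # placeholder IP
routes_after_delete='default via 192.0.2.1 dev eth0 onlink'  # stand-in for: ip r

if printf '%s\n' "$routes_after_delete" | grep -q "^${container_ip} "; then
    status="leftover"   # stale route: flag it, or delete the veth link
else
    status="clean"      # IP can safely be reused for the next container
fi
echo "$status"
```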
So far it has never failed to delete a container, but this hanging IP prevented the script from reusing that particular IP address for new containers.
And to be 100% clear, is this happening every time you stop a container?
No, it has happened only once so far, and since I was using a script to create new machines, it did the same thing every time, so I have no idea what could lead to this problem. I will try to create and delete a few more machines and will post the results of the test here.
This issue can be closed. The problem was resolved by removing the link with:
ip link delete veth5f74fc3e   # the interface bound to that IP
After creating and deleting more than 20 more containers, I was not able to reproduce the issue.