Container can not be started a second time after creation with Error: Common start logic: Failed to create tc filter:

layer7 · August 24, 2019, 9:03am

Hi,

lxc --version

3.16
( via snap )
“driver”: “lxc”,
“driver_version”: “3.2.1”,

uname -a

Linux node2 4.19.66

CentOS 7 OS

I used centos/7/cloud image and did some testing with cloud-init. It all worked fine ( by the way can it be that centos 6 will not eat v2 network config ? ).

As soon as i run the “reboot” command inside of the container, the container will shutdown:

[…]
[ OK ] Stopped Load/Save Random Seed.
[ OK ] Unmounted /dev/full.
[ OK ] Unmounted /proc/swaps.
[ OK ] Unmounted /dev/zero.
[ OK ] Unmounted /proc/stat.
[ OK ] Unmounted /proc/uptime.
[FAILED] Failed unmounting /dev/ptmx.
[FAILED] Failed unmounting /dev/null.
[ OK ] Unmounted /sys/devices/system/cpu/online.
[ OK ] Unmounted /dev/tty.
[ OK ] Unmounted /proc/meminfo.
[ OK ] Unmounted /dev/random.
[ OK ] Unmounted /proc/cpuinfo.
[ OK ] Unmounted /proc/sys/kernel/random/boot_id.
[ OK ] Unmounted /dev/.lxd-mounts.
[ OK ] Unmounted /dev/lxd.
[ OK ] Stopped target Swap.
[ OK ] Stopped Configure read-only root support.
[ OK ] Reached target Shutdown.
Syncing filesystems and block devices.
Sending SIGTERM to remaining processes…
Sending SIGKILL to remaining processes…
Rebooting.

But the container wont reboot, it will remain in the “STOPPED” state.

So i tried to start it manually:

lxc start lxc1465

Error: Common start logic: Failed to create tc filter:
Try lxc info --show-log lxc1465 for more info

lxc info --show-log lxc1465

Name: lxc1465
Location: none
Remote: unix://
Architecture: x86_64
Created: 2019/08/23 22:38 UTC
Status: Stopped
Type: persistent
Profiles: lxc1465

Log:

lxc lxc1465 20190823223835.767 ERROR utils - utils.c:safe_mount:1212 - No such device - Failed to mount “/proc/sys/fs/binfmt_misc” onto “/var/snap/lxd/common/lxc//proc/sys/fs/binfmt_misc”
lxc lxc1465 20190823224133.749 ERROR conf - conf.c:run_buffer:352 - Script exited with status 1
lxc lxc1465 20190823224133.749 ERROR start - start.c:lxc_fini:1064 - Failed to run lxc.hook.post-stop for container “lxc1465”
lxc lxc1465 20190823224133.749 WARN start - start.c:lxc_fini:1066 - Container will be stopped instead of rebooted
lxc 20190823224133.749 WARN commands - commands.c:lxc_cmd_rsp_recv:135 - Connection reset by peer - Failed to receive response for command “get_state”

So i was checking around a bit:

ls -la

total 0
drwxr-xr-x 2 root root 0 Aug 24 01:27 .
dr-xr-xr-x 1 root root 0 Aug 24 01:25 …
–w------- 1 root root 0 Aug 24 01:27 register
-rw-r–r-- 1 root root 0 Aug 24 01:27 status

cat /proc/sys/fs/binfmt_misc/status

enabled

ls -la /var/snap/lxd/common/lxc
total 0
drwxr-xr-x 2 root root 24 Aug 17 18:41 .
drwxr-xr-x 7 root root 137 Aug 24 01:25 …
-rw-r–r-- 1 root root 0 Aug 24 01:25 local.conf

I rebooted the server. Did not help.

The only thing that helps is to completely remove the container.

When i recreate it again, and issue again the reboot command inside of the container ( and if i am fast enough ) it will show:

lxc list

±--------±------±-----±-----±-----------±----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
±--------±------±-----±-----±-----------±----------+
| lxc1465 | ERROR | | | PERSISTENT | 0 |
±--------±------±-----±-----±-----------±----------+

and then

lxc list

±--------±--------±-----±-----±-----------±----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
±--------±--------±-----±-----±-----------±----------+
| lxc1465 | STOPPED | | | PERSISTENT | 0 |
±--------±--------±-----±-----±-----------±----------+

before i try to start the container again:

lxc info --show-log lxc1465

Name: lxc1465
Location: none
Remote: unix://
Architecture: x86_64
Created: 2019/08/24 08:27 UTC
Status: Stopped
Type: persistent
Profiles: lxc1465

Log:

lxc lxc1465 20190824083108.403 ERROR conf - conf.c:run_buffer:352 - Script exited with status 1
lxc lxc1465 20190824083108.403 ERROR start - start.c:lxc_fini:1064 - Failed to run lxc.hook.post-stop for container “lxc1465”
lxc lxc1465 20190824083108.403 WARN start - start.c:lxc_fini:1066 - Container will be stopped instead of rebooted
lxc 20190824083108.404 WARN commands - commands.c:lxc_cmd_rsp_recv:135 - Connection reset by peer - Failed to receive response for command “get_state”
lxc 20190824083108.404 WARN commands - commands.c:lxc_cmd_rsp_recv:135 - Connection reset by peer - Failed to receive response for command “get_state”

the lxd log will show:

t=2019-08-24T10:27:28+0200 lvl=info msg=“Deleting container” created=2019-08-24T00:38:31+0200 ephemeral=false name=lxc1465 project=default used=2019-08-24T00:38:35+0200
t=2019-08-24T10:27:28+0200 lvl=info msg=“Deleted container” created=2019-08-24T00:38:31+0200 ephemeral=false name=lxc1465 project=default used=2019-08-24T00:38:35+0200
t=2019-08-24T10:27:30+0200 lvl=info msg=“Creating container” ephemeral=false name=lxc1465 project=default
t=2019-08-24T10:27:30+0200 lvl=info msg=“Created container” ephemeral=false name=lxc1465 project=default
t=2019-08-24T10:27:34+0200 lvl=info msg=“Starting container” action=start created=2019-08-24T10:27:30+0200 ephemeral=false name=lxc1465 project=default stateful=false used=1970-01-01T01:00:00+0100
t=2019-08-24T10:27:34+0200 lvl=info msg=“Started container” action=start created=2019-08-24T10:27:30+0200 ephemeral=false name=lxc1465 project=default stateful=false used=1970-01-01T01:00:00+0100
t=2019-08-24T10:31:08+0200 lvl=info msg=“Container initiated reboot” action=reboot created=2019-08-24T10:27:30+0200 ephemeral=false name=lxc1465 project=default stateful=false used=2019-08-24T10:27:34+0200

Here is the lxc config file for this container:

cat lxc.conf

lxc.log.file = /var/snap/lxd/common/lxd/logs/lxc1465/lxc.log
lxc.log.level = debug
lxc.console.buffer.size = auto
lxc.console.size = auto
lxc.console.logfile = /var/snap/lxd/common/lxd/logs/lxc1465/console.log
lxc.mount.auto = proc:rw sys:rw
lxc.autodev = 1
lxc.pty.max = 1024
lxc.mount.entry = /dev/fuse dev/fuse none bind,create=file,optional 0 0
lxc.mount.entry = /dev/net/tun dev/net/tun none bind,create=file,optional 0 0
lxc.mount.entry = /proc/sys/fs/binfmt_misc proc/sys/fs/binfmt_misc none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/fs/fuse/connections sys/fs/fuse/connections none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/fs/pstore sys/fs/pstore none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/kernel/debug sys/kernel/debug none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/kernel/security sys/kernel/security none rbind,create=dir,optional 0 0
lxc.mount.entry = /dev/mqueue dev/mqueue none rbind,create=dir,optional 0 0
lxc.include = /snap/lxd/current/lxc/config//common.conf.d/
lxc.arch = linux64
lxc.hook.version = 1
lxc.hook.pre-start = /proc/3197/exe callhook /var/snap/lxd/common/lxd 5 start
lxc.hook.stop = /snap/lxd/current/bin/lxd callhook /var/snap/lxd/common/lxd 5 stopns
lxc.hook.post-stop = /snap/lxd/current/bin/lxd callhook /var/snap/lxd/common/lxd 5 stop
lxc.tty.max = 0
lxc.uts.name = lxc1465
lxc.mount.entry = /var/snap/lxd/common/lxd/devlxd dev/lxd none bind,create=dir 0 0
lxc.seccomp.profile = /var/snap/lxd/common/lxd/security/seccomp/lxc1465
lxc.idmap = u 0 1000000 1000000000
lxc.idmap = g 0 1000000 1000000000
lxc.cgroup.memory.limit_in_bytes = 1048576000
lxc.cgroup.memory.soft_limit_in_bytes = 943718400
lxc.cgroup.memory.swappiness = 0
lxc.rootfs.path = dir:/var/snap/lxd/common/lxd/containers/lxc1465/rootfs
lxc.mount.entry = /var/snap/lxd/common/lxd/devices/lxc1465/unix.tun.dev-net-tun dev/net/tun none bind,create=file 0 0
lxc.mount.auto = shmounts:/var/snap/lxd/common/lxd/shmounts/lxc1465:/dev/.lxd-mounts
lxc.net.0.name = eth0
lxc.net.0.type = phys
lxc.net.0.flags = up
lxc.net.0.link = veth56955e15

Any idea how i can debug that ? I am currently a bit out of idea’s.

Thank you !
Greetings
Oliver

stgraber · August 24, 2019, 12:33pm

Looks like systemd’s automount on /proc/sys/fs/binfmt_misc is acting up. That’s a thing we’ve noticed happen before though hasn’t been a problem on more recent systems somehow.

You may want to try running umount /proc/sys/fs/binfmt_misc on the host, see if that unblocks it. You may also want to find and stop the associated systemd unit for that mount point.

layer7 · August 24, 2019, 1:49pm

Hi,

systemctl -a | grep binfmt

proc-sys-fs-binfmt_misc.automount loaded active waiting Arbitrary Executable File Formats File System Automount Point
proc-sys-fs-binfmt_misc.mount loaded inactive dead Arbitrary Executable File Formats File System
systemd-binfmt.service loaded inactive dead Set Up Additional Binary Formats

That does not look quiet healthy.

To give it a fast test, i unmounted this:

systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=58,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=320354)

and remount it manually:

mount | grep binfmt

binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)

I also removed the container, and created it again.

Even i have debugging on in lxd deamon , i dont get more information than this:

lxc list

±--------±--------±--------------------±--------------------------------------------±-----------±----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
±--------±--------±--------------------±--------------------------------------------±-----------±----------+
| lxc1465 | RUNNING | 192.168.0.2 (eth0) | | PERSISTENT | 0 |
±--------±--------±--------------------±--------------------------------------------±-----------±----------+

running:

lxc exec lxc1465 /bin/bash and then “reboot” inside of the container

resulting in:

tail -f /var/snap/lxd/common/lxd/logs/lxc1465/lxc.log

lxc lxc1465 20190824133427.918 ERROR conf - conf.c:run_buffer:352 - Script exited with status 1
lxc lxc1465 20190824133427.918 ERROR start - start.c:lxc_fini:1064 - Failed to run lxc.hook.post-stop for container “lxc1465”
lxc lxc1465 20190824133427.918 WARN start - start.c:lxc_fini:1066 - Container will be stopped instead of rebooted
lxc 20190824133427.919 WARN commands - commands.c:lxc_cmd_rsp_recv:135 - Connection reset by peer - Failed to receive response for command “get_state”
lxc 20190824133427.919 WARN commands - commands.c:lxc_cmd_rsp_recv:135 - Connection reset by peer - Failed to receive response for command “get_state”
lxc 20190824133427.919 WARN commands - commands.c:lxc_cmd_rsp_recv:135 - Connection reset by peer - Failed to receive response for command “get_state”

tail -f /var/snap/lxd/common/lxd/logs/lxd.log

t=2019-08-24T13:57:47+0200 lvl=info msg=“Done pruning expired container backups”
t=2019-08-24T14:57:47+0200 lvl=info msg=“Pruning expired container backups”
t=2019-08-24T14:57:47+0200 lvl=info msg=“Done pruning expired container backups”
t=2019-08-24T15:30:40+0200 lvl=info msg=“Deleting container” created=2019-08-24T10:27:30+0200 ephemeral=false name=lxc1465 project=default used=2019-08-24T10:27:34+0200
t=2019-08-24T15:30:40+0200 lvl=info msg=“Deleted container” created=2019-08-24T10:27:30+0200 ephemeral=false name=lxc1465 project=default used=2019-08-24T10:27:34+0200
t=2019-08-24T15:31:26+0200 lvl=info msg=“Creating container” ephemeral=false name=lxc1465 project=default
t=2019-08-24T15:31:26+0200 lvl=info msg=“Created container” ephemeral=false name=lxc1465 project=default
t=2019-08-24T15:31:29+0200 lvl=info msg=“Starting container” action=start created=2019-08-24T15:31:26+0200 ephemeral=false name=lxc1465 project=default stateful=false used=1970-01-01T01:00:00+0100
t=2019-08-24T15:31:30+0200 lvl=info msg=“Started container” action=start created=2019-08-24T15:31:26+0200 ephemeral=false name=lxc1465 project=default stateful=false used=1970-01-01T01:00:00+0100
t=2019-08-24T15:34:27+0200 lvl=info msg=“Container initiated reboot” action=reboot created=2019-08-24T15:31:26+0200 ephemeral=false name=lxc1465 project=default stateful=false used=2019-08-24T15:31:29+0200

lxc start lxc1465

Error: Common start logic: Failed to create tc filter:
Try lxc info --show-log lxc1465 for more info

No matter what i do with binfmt_misc, this error remains.

I have started lxd.daemon with the debug flag:

ExecStart=/usr/bin/snap run lxd.daemon -debug

lxc monitor –type=logging

location: none
metadata:
context: {}
level: dbug
message: ‘New event listener: 365a531a-8c4d-4783-95d7-496d65f25051’
timestamp: “2019-08-24T15:46:10.926694662+02:00”
type: logging

But i still dont get more debug logging, what actually bothers him.

Even if i restart the physical server, the container still cant be started. I dont understand what does this “reboot” command inside of the container mess up so much, that it can not be started anymore at all.

Every idea is greatly appriciated. Thank you !

UPDATE: Ok i figure out, that this behaviour has something to do with a missing kernel module. I will report back, as soon as i found out which one.

layer7 · August 24, 2019, 7:05pm

Hi,

ok so essentially:

If you run into this binfmt_misc error, make sure that the following kernel modules are loaded into your kernel:

CONFIG_BINFMT_ELF
CONFIG_COMPAT_BINFMT_ELF
CONFIG_BINFMT_SCRIPT
CONFIG_BINFMT_MISC

If you run into the Failed to create tc filter then make sure you have:

Ingress/classifier-action Qdisc ( thats inside of QoS and/or fair queueing )
and all other modules, needed by tc

in your kernel loaded.

Once you did both, you can reboot your container by issuing the reboot command inside of your container, as well as start it with bandwidth limitations that you set via lxc set limit.*

Problem solved, thank you very much for your pointing and have a nice weekend !