/proc mounts broken after auto refresh on 3.22

From what I understand, this is an lxcfs bug, but I’m running the snap installation on Ubuntu.
This has happened twice now on the 3.22 stable channel. Why does this keep happening, and why is it not caught on the non-stable channels?

sudo ls /proc
ls: cannot access ‘/proc/stat’: Transport endpoint is not connected
ls: cannot access ‘/proc/swaps’: Transport endpoint is not connected
ls: cannot access ‘/proc/uptime’: Transport endpoint is not connected
ls: cannot access ‘/proc/cpuinfo’: Transport endpoint is not connected
ls: cannot access ‘/proc/loadavg’: Transport endpoint is not connected
ls: cannot access ‘/proc/meminfo’: Transport endpoint is not connected
ls: cannot access ‘/proc/diskstats’: Transport endpoint is not connected

DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"

snap-id:      J60k4JY0HppjwOjW8dZdYc8obXKxujRu
tracking:     3.22/stable
refresh-date: yesterday at 21:25 CDT

snap changes
ID   Status  Spawn                   Ready                   Summary
52   Done    yesterday at 12:20 CDT  yesterday at 12:20 CDT  Auto-refresh snap "lxd"
53   Done    yesterday at 21:25 CDT  yesterday at 21:25 CDT  Auto-refresh snap "lxd"

Also, can I downgrade to 3.21 or 3.20? Would those be more stable and less likely to have a breaking lxcfs change?

You cannot downgrade as the database schema has changed.

Can you show 'grep lxcfs /var/log/syslog' and 'snap changes'?

As for your original question, lxcfs development versions have been in the edge channel for years; the lack of people running those on production systems is why the recent issues we fixed went unnoticed.

We do have all the usual testing on LXCFS, both at the time things get merged and several times a day on all the distros we support. Those certainly do find bugs, which get fixed; the ones that don't get caught are the much weirder ones that automated testing and static analysis just can't find.

I did get a crash overnight on one of my own production systems. I'd still like the output of 'grep lxcfs /var/log/syslog' to see if it's the same thing, and whether we're maybe lucky enough to get slightly more detail from your case.

@brauner tells me that based on patterns we saw in my crash, he’s got a reproducer and is working on it.

The current bug likely only affects those running on very old kernels (pre-cgns, i.e. without cgroup namespace support) or those running containers with nesting enabled (or at least with a /var/lib/lxcfs path inside them).

As far as we can tell, it's not triggered by time but by something actually accessing that path. In my case it looks like it was something like updatedb or a backup script.
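If you're unsure which of your containers have nesting enabled, it can be read per container from the instance config. A minimal sketch (the container names are just whatever 'lxc list' reports on your host):

```shell
# List each container and whether security.nesting is set
# ("true" means nesting is on; empty output means the default, off).
for c in $(lxc list -c n --format csv); do
    printf '%s: %s\n' "$c" "$(lxc config get "$c" security.nesting)"
done
```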

We’re working on a fix now and will push it as soon as available to prevent any further breakage.

Thanks for the reply.

We’re on 4.15.0-55-generic #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

And I don’t know if we have nesting enabled or not. It looks like /var/lib/lxcfs does exist but lxcfs is broken:

stat /var/lib/lxcfs
stat: cannot stat '/var/lib/lxcfs': Transport endpoint is not connected

Here’s the excerpt from syslog:

/var/log/syslog.1:Mar 17 12:20:09 team-dev0 lxd.daemon[3741]: Closed liblxcfs.so
/var/log/syslog.1:Mar 17 12:20:09 team-dev0 lxd.daemon[3741]: Running destructor lxcfs_exit
/var/log/syslog.1:Mar 17 12:20:09 team-dev0 lxd.daemon[3741]: Running constructor lxcfs_init to reload liblxcfs
/var/log/syslog.1:Mar 17 21:25:09 team-dev0 lxd.daemon[3741]: Closed liblxcfs.so
/var/log/syslog.1:Mar 17 21:25:09 team-dev0 lxd.daemon[3741]: Running destructor lxcfs_exit
/var/log/syslog.1:Mar 17 21:25:09 team-dev0 lxd.daemon[3741]: Running constructor lxcfs_init to reload liblxcfs
/var/log/syslog.1:Mar 18 01:25:02 team-dev0 lxd.daemon[3741]: *** Error in `lxcfs': double free or corruption (fasttop): 0x00007f8bc4021c70 ***
/var/log/syslog.1:Mar 18 01:25:02 team-dev0 lxd.daemon[3741]: /snap/lxd/current/lib/liblxcfs.so(+0xda0b)[0x7f8d1d2efa0b]
/var/log/syslog.1:Mar 18 01:25:02 team-dev0 lxd.daemon[3741]: /snap/lxd/current/lib/liblxcfs.so(+0x9fe6)[0x7f8d1d2ebfe6]
/var/log/syslog.1:Mar 18 01:25:02 team-dev0 lxd.daemon[3741]: /snap/lxd/current/lib/liblxcfs.so(+0xa1f2)[0x7f8d1d2ec1f2]
/var/log/syslog.1:Mar 18 01:25:02 team-dev0 lxd.daemon[3741]: /snap/lxd/current/lib/liblxcfs.so(cg_readdir+0x1ff)[0x7f8d1d2ec4d0]
/var/log/syslog.1:Mar 18 01:25:02 team-dev0 lxd.daemon[3741]: lxcfs[0x401b13]
/var/log/syslog.1:Mar 18 01:25:02 team-dev0 lxd.daemon[3741]: lxcfs[0x402642]
/var/log/syslog.1:Mar 18 01:25:02 team-dev0 lxd.daemon[3741]: 00400000-00406000 r-xp 00000000 07:02 39                                 /bin/lxcfs
/var/log/syslog.1:Mar 18 01:25:02 team-dev0 lxd.daemon[3741]: 00605000-00606000 r--p 00005000 07:02 39                                 /bin/lxcfs
/var/log/syslog.1:Mar 18 01:25:02 team-dev0 lxd.daemon[3741]: 00606000-00607000 rw-p 00006000 07:02 39                                 /bin/lxcfs
/var/log/syslog.1:Mar 18 01:25:02 team-dev0 lxd.daemon[3741]: 7f8d1d2e2000-7f8d1d306000 r-xp 00000000 07:07 177                        /snap/lxd/13840/lib/liblxcfs.so
/var/log/syslog.1:Mar 18 01:25:02 team-dev0 lxd.daemon[3741]: 7f8d1d306000-7f8d1d505000 ---p 00024000 07:07 177                        /snap/lxd/13840/lib/liblxcfs.so
/var/log/syslog.1:Mar 18 01:25:02 team-dev0 lxd.daemon[3741]: 7f8d1d505000-7f8d1d506000 r--p 00023000 07:07 177                        /snap/lxd/13840/lib/liblxcfs.so
/var/log/syslog.1:Mar 18 01:25:02 team-dev0 lxd.daemon[3741]: 7f8d1d506000-7f8d1d507000 rw-p 00024000 07:07 177                        /snap/lxd/13840/lib/liblxcfs.so

Is there a way to disable lxcfs entirely? What does it get us?

There is no global way to turn it off, and I wouldn't recommend it anyway, as doing so will generally lead to odd crashes.

Without lxcfs, the memory and CPU reported inside a container will not follow the limits you apply, so software in the container will think it can use far more memory or CPU than it is actually allowed, causing it to crash when the limit is hit.

lxcfs also handles things like the uptime of the container and the load average.

Turning it off would effectively cause all containers to show the raw system-wide values for all resources rather than the values specific to the container you’re running.
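As an illustration (using a hypothetical container named c1), the difference is easy to see from inside a container with a memory limit applied:

```shell
# Hypothetical container name "c1"; apply a memory limit...
lxc config set c1 limits.memory 256MiB
# ...then read /proc/meminfo from inside the container. With lxcfs
# working, MemTotal reflects the 256 MiB limit rather than the
# host's total RAM.
lxc exec c1 -- grep MemTotal /proc/meminfo
```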

The crash above matches the one we've now isolated and fixed in lxcfs; a fix will be rolled out within the next 2 hours. Do note that installing the fix will not repair already-broken containers: those need to either have lxcfs unmounted (at which point they'll see the host values) or be fully restarted (at which point they'll see their own values again).
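For the broken containers, recovery would look something like this sketch (assuming a hypothetical container named c1; the list of lxcfs-backed /proc files is taken from the ls output earlier in the thread and may differ on your system):

```shell
# Simplest option: fully restart the affected container so it
# picks up fresh lxcfs mounts.
lxc restart c1

# Alternatively, lazily unmount the stale lxcfs mounts from inside
# the container; it will then see the host's values until restarted.
lxc exec c1 -- sh -c '
    for f in /proc/cpuinfo /proc/diskstats /proc/loadavg \
             /proc/meminfo /proc/stat /proc/swaps /proc/uptime \
             /var/lib/lxcfs; do
        umount -l "$f" 2>/dev/null
    done'
```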


Looks like the fix was sent out and I haven’t seen any issues since. Thanks!

ID   Status  Spawn                   Ready                   Summary
54   Done    yesterday at 17:33 CDT  yesterday at 17:33 CDT  Auto-refresh snap "lxd"

snap-id:      J60k4JY0HppjwOjW8dZdYc8obXKxujRu
tracking:     3.22/stable
refresh-date: yesterday at 17:33 CDT