Mount fails when starting container. ZFS filesystem already mounted [LXD 3.22]

It seems I just got the update via auto-refresh, and apparently the issue was not fixed: LXD containers on 3 other hosts have now failed in the same fashion.

Not all containers failed, however, and I’m not sure what distinguishes those that did from those that didn’t.

I worked around it the same way again, by unmounting everything under /proc that is managed by lxcfs.

for m in $(grep lxcfs /proc/mounts | grep proc | awk '{print $2}'); do umount "${m}"; done
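
Before unmounting anything, it can help to review what the loop above would touch. The following is a minimal sketch, not part of the original workaround: list_lxcfs_proc_mounts is a hypothetical helper name, and it filters slightly more narrowly than the one-liner by requiring /proc/ in the mount point field rather than "proc" anywhere on the line. It only prints and never unmounts.

```shell
# Hypothetical helper (not from the thread): print lxcfs mount points that
# sit under a proc directory. Reads a file in /proc/mounts format, defaulting
# to /proc/mounts itself. It only prints; nothing is unmounted.
list_lxcfs_proc_mounts() {
    # Field 2 of /proc/mounts is the mount point.
    awk '/lxcfs/ && $2 ~ /\/proc\// { print $2 }' "${1:-/proc/mounts}"
}
```

Once the list looks right, the output could be fed to umount, for example with list_lxcfs_proc_mounts | while read -r m; do umount "$m"; done.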

Which release is this fixed in? Is 3.22 both broken and fixed? How do I tell if I have the fix?

The snap packages do have the fix. If you got hit before you received the fix, you need to unmount lxcfs from your containers or restart them.

If you’ve hit this issue within the past 24h or so, it’s most likely something else.
In such cases, please provide the output of grep lxcfs /var/log/syslog and of snap changes.

Mar 18 01:11:02 hqserv lxd.daemon[7784]: Closed liblxcfs.so  
Mar 18 01:11:02 hqserv lxd.daemon[7784]: Running destructor lxcfs_exit
Mar 18 01:11:02 hqserv lxd.daemon[7784]: Running constructor lxcfs_init to reload liblxcfs
Mar 18 07:25:10 hqserv lxd.daemon[7784]: *** Error in `lxcfs': double free or corruption (fasttop): 0x00007f328c009050 ***
Mar 18 07:25:10 hqserv lxd.daemon[7784]: /snap/lxd/current/lib/liblxcfs.so(+0xda0b)[0x7f333c9b4a0b]
Mar 18 07:25:10 hqserv lxd.daemon[7784]: /snap/lxd/current/lib/liblxcfs.so(+0x9fe6)[0x7f333c9b0fe6]
Mar 18 07:25:10 hqserv lxd.daemon[7784]: /snap/lxd/current/lib/liblxcfs.so(+0xa1f2)[0x7f333c9b11f2]
Mar 18 07:25:10 hqserv lxd.daemon[7784]: /snap/lxd/current/lib/liblxcfs.so(cg_readdir+0x1ff)[0x7f333c9b14d0]
Mar 18 07:25:10 hqserv lxd.daemon[7784]: lxcfs[0x401ba0]
Mar 18 07:25:10 hqserv lxd.daemon[7784]: lxcfs[0x4026cf]
Mar 18 07:25:10 hqserv lxd.daemon[7784]: 00400000-00406000 r-xp 00000000 07:00 39                                 /snap/lxd/13814/bin/lxcfs
Mar 18 07:25:10 hqserv lxd.daemon[7784]: 00605000-00606000 r--p 00005000 07:00 39                                 /snap/lxd/13814/bin/lxcfs
Mar 18 07:25:10 hqserv lxd.daemon[7784]: 00606000-00607000 rw-p 00006000 07:00 39                                 /snap/lxd/13814/bin/lxcfs
Mar 18 07:25:10 hqserv lxd.daemon[7784]: 7f333c9a7000-7f333c9cb000 r-xp 00000000 07:04 177                        /snap/lxd/13840/lib/liblxcfs.so
Mar 18 07:25:10 hqserv lxd.daemon[7784]: 7f333c9cb000-7f333cbca000 ---p 00024000 07:04 177                        /snap/lxd/13840/lib/liblxcfs.so
Mar 18 07:25:10 hqserv lxd.daemon[7784]: 7f333cbca000-7f333cbcb000 r--p 00023000 07:04 177                        /snap/lxd/13840/lib/liblxcfs.so
Mar 18 07:25:10 hqserv lxd.daemon[7784]: 7f333cbcb000-7f333cbcc000 rw-p 00024000 07:04 177                        /snap/lxd/13840/lib/liblxcfs.so 

and

ID   Status  Spawn                   Ready                   Summary
38   Done    yesterday at 17:29 CET  yesterday at 17:29 CET  Auto-refresh snap "lxd"
39   Done    today at 01:10 CET      today at 01:11 CET      Auto-refresh snap "lxd"

Ok, that matches the one we’re fixing now, thanks.

I’m suffering the same issue on my container hosts - be they 16.04 or 18.04:

Mar 18 10:12:07 lxdhost lxd.daemon[1142]: Closed liblxcfs.so
Mar 18 10:12:07 lxdhost lxd.daemon[1142]: Running destructor lxcfs_exit
Mar 18 10:12:07 lxdhost lxd.daemon[1142]: Running constructor lxcfs_init to reload liblxcfs
Mar 18 11:26:58 lxdhost lxd.daemon[1142]: *** Error in `lxcfs': double free or corruption (fasttop): 0x00007f123c021d60 ***

and

# snap changes
ID   Status  Spawn                    Ready                    Summary
32   Done    yesterday at 10:11 AEDT  yesterday at 10:12 AEDT  Auto-refresh snap "lxd"
33   Done    today at 07:26 AEDT      today at 07:27 AEDT      Auto-refresh snap "lxd"

Looking forward to a fix.

In the meantime, is there an RSS feed I can subscribe to for updates on this and any other LXD-related issues?

The fix for the double free was pushed quite a few hours ago.
If you’re on 13901 or higher, then you have the fix.

Obviously, that doesn’t help systems that are already broken: those still need their containers restarted or lxcfs unmounted from them.
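
The revision check can be scripted. This is a minimal sketch, assuming the column layout of snap list lxd shown elsewhere in this thread (Rev in the third column); lxd_has_fix is a hypothetical helper name, and 13901 is simply the revision quoted above.

```shell
# Hypothetical helper (not from the thread): read `snap list lxd` output on
# stdin and succeed only if the lxd row shows revision 13901 or higher.
# Assumes the third column is "Rev", as in the listings in this thread.
lxd_has_fix() {
    awk -v min=13901 '
        $1 == "lxd" { found = 1; ok = ($3 + 0 >= min) }
        END { exit ((found && ok) ? 0 : 1) }
    '
}
```

Usage would be along the lines of snap list lxd | lxd_has_fix && echo "fix present". A passing check only says the fixed snap is installed; containers hit before the refresh still need the restart or unmount.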

https://github.com/lxc/lxd-pkg-snap/commits/latest-candidate shows the upcoming changes to the snap. Note that something being in there doesn’t mean it’s in stable yet, but it does mean that the next stable update will include it.

I’ve just finished off restarting my LXD hosts and will monitor the situation.

One thing to note is that my swap numbers are completely off inside my containers:

On host:

# free -m
              total        used        free      shared  buff/cache   available
Mem:          32167        8416       14805          65        8946       23356
Swap:          8191          35        8156

Inside containers:

# lxc exec container -- free -m
              total        used        free      shared  buff/cache   available
Mem:            953         368         108           0         477         585
Swap:   8796093021254           0 8796093021254
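
The swap total above does not look random. As a rough sketch, and purely as an assumption about the cause (nobody in the thread confirms it), the figure matches the value cgroup v1 reports for "no limit" (9223372036854771712 bytes) converted to MiB, minus the 953 MiB memory total shown in the same output. swap_guess_mib is a hypothetical name used only for this illustration.

```shell
# Illustration only: reproduce the odd swap total from two numbers.
# Assumption (not confirmed in the thread): the swap total is derived from an
# unlimited memory+swap limit minus the container's memory total.
swap_guess_mib() {
    # cgroup v1 "no limit" value in bytes: LONG_MAX rounded down to a 4 KiB page.
    unlimited_bytes=9223372036854771712
    unlimited_mib=$(( unlimited_bytes / 1024 / 1024 ))
    # $1: the "Mem: total" value in MiB as printed by free -m.
    echo $(( unlimited_mib - $1 ))
}

swap_guess_mib 953   # prints 8796093021254
```

If that assumption holds, the container has a memory limit but no swap limit, and the number is an unlimited value leaking through rather than memory corruption.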

Also, unrelated to LXD, snap changes gives me an error whereas just yesterday it showed me some changes:

# snap changes
error: no changes found

# snap list
Name  Version    Rev    Tracking  Publisher   Notes
core  16-2.43.3  8689   stable    canonical✓  core
go    1.13.8     5364   stable    mwhudson    classic
lxd   3.22       13901  stable    canonical✓  -

Is there anything special I need to do on my end to resolve this swap issue? Prior to the recent updates, it appeared to be present on 18.04 only, but it is now plaguing both platforms.

@brauner can you look into what’s going on with swap?

I’m not sure what this is about, but I’ve now improved the meminfo codepath, which should handle corner cases better.

Heads up:

After just 6 days of uptime, the /proc mount issue has returned for containers on my 16.04 host. No issues identified yet for containers on my 18.04 host.

grep lxcfs /var/log/syslog please

Apologies for the omission.

$ sudo grep lxcfs /var/log/syslog
Mar 24 07:23:41 lxcserver lxd.daemon[2753]: Closed liblxcfs.so
Mar 24 07:23:41 lxcserver lxd.daemon[2753]: Running destructor lxcfs_exit
Mar 24 07:23:41 lxcserver lxd.daemon[2753]: Running constructor lxcfs_init to reload liblxcfs
Mar 26 06:49:04 lxcserver kernel: [519660.024260] lxcfs[32252]: segfault at 0 ip 00007f02baec6a32 sp 00007f02a1ffa6b0 error 6 in libc-2.23.so[7f02bae50000+1c0000]
Mar 26 17:38:44 lxcserver lxd.daemon[24169]: Running constructor lxcfs_init to reload liblxcfs

And snap version:

$ snap list lxd
Name  Version  Rev    Tracking  Publisher   Notes
lxd   3.23     13987  stable    canonical✓  -

Snap changes:

$ snap changes lxd
ID   Status  Spawn                    Ready                    Summary
233  Done    yesterday at 17:38 AEDT  yesterday at 17:38 AEDT  Auto-refresh snaps "core18", "lxd"

Ok, could you pastebin the entire crash entry that showed up around 06:49 in your syslog?

Interestingly, I see no other entry in the logs related to this crash - just that one line. Other than syslog, is there another log file that may have this info?

Log file kern.log also has just this one line.

Maybe journalctl -u snap.lxd.daemon -n 300 will show more?

Hardly anything to do with the crash in there:

http://paste.ubuntu.com/p/NY6GpWRPdF/

@brauner ideas?

The one line segfault report from the kernel isn’t exactly much to go on…
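
One thing that single line does give is the faulting offset inside libc: the instruction pointer minus the mapping base printed in brackets. The sketch below assumes the usual x86 kernel message layout seen in the log line earlier in the thread; segv_offset is a hypothetical helper name.

```shell
# Hypothetical helper (not from the thread): given one kernel
# "segfault at ... ip ... in lib[base+size]" line, print ip - base in hex.
segv_offset() {
    ip=$(printf '%s\n' "$1" | sed -n 's/.* ip \([0-9a-f]*\) .*/\1/p')
    base=$(printf '%s\n' "$1" | sed -n 's/.*\[\([0-9a-f]*\)+[0-9a-f]*]$/\1/p')
    # Bail out if the line does not have the expected shape.
    [ -n "$ip" ] && [ -n "$base" ] || return 1
    printf '0x%x\n' "$(( 0x$ip - 0x$base ))"
}

segv_offset 'Mar 26 06:49:04 lxcserver kernel: [519660.024260] lxcfs[32252]: segfault at 0 ip 00007f02baec6a32 sp 00007f02a1ffa6b0 error 6 in libc-2.23.so[7f02bae50000+1c0000]'
```

For the line from this thread, that gives 0x76a32. With matching debug symbols for libc-2.23.so, a tool such as addr2line or gdb could map that offset to a function, though it would show where libc crashed, not which lxcfs call led there.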