Best way to solve improper mounting issue in LXD (snap related)

Hi,

I’m trying to debug a mount problem that occurred on one of our servers. This is what I know:

We have a host “centos-lxc” that runs multiple LXD containers. LXD was installed through snap.

One day when we logged in, our containers were down, and any LXD call such as lxc list, or even a bare lxc, resulted in the following:

Error: Get http://unix.socket/1.0: dial unix /var/snap/lxd/common/lxd/unix.socket: connect: no such file or directory

I initially thought this issue was socket related, so I tried a number of refreshes and other troubleshooting steps found in many threads here and elsewhere, but that was not the source of the problem.

I eventually got the lxc command to respond, and lxc list now produces the following output:

 [root@centoslxc ~]# lxc list
 Error: Get "http://unix.socket/1.0": EOF

When I investigated more carefully, I found that the person who initially configured the default LXD profile had specified a relative path as the mount point for the storage pool, as this command suggests:

[root@centoslxc ~]# lxd sql global 'SELECT * FROM storage_pools_config;'

+----+-----------------+---------+--------+-----------+
| id | storage_pool_id | node_id | key    | value     |
+----+-----------------+---------+--------+-----------+
| 3  | 3               | 1       | source | srv/lxd   |
| 4  | 4               | 1       | source | srv/store |
+----+-----------------+---------+--------+-----------+

This most likely conflicts with snap in some obscure way that I haven’t fully understood yet, but the bottom line is this error in the logs:

Oct 31 14:55:36 centoslxc lxd.daemon[238539]: t=2020-10-31T14:55:36+0100 lvl=eror msg="Failed  to start the daemon: "Failed to start the daemon: Failed initializing storage pool \"store_lxd\": Failed to mount '/var/snap/lxd/18077/srv/lxd' on '/var/snap/lxd/common/lxd/storage-pools/store_lxd': no such file or directory"
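
For what it's worth, the path in that error looks like the relative source joined onto the snap revision directory. The sketch below only illustrates that reading; resolve_source is a hypothetical helper, and the base directory /var/snap/lxd/18077 is an assumption inferred from the log line, not something LXD documents here.

```shell
#!/bin/sh
# resolve_source is a hypothetical helper, not part of LXD.
# It shows how a relative "source" value would resolve against a base
# directory, while an absolute value is left untouched.
resolve_source() {
    # $1 = base directory (assumed), $2 = source value from storage_pools_config
    case "$2" in
        /*) printf '%s\n' "$2" ;;               # absolute: used as-is
        *)  printf '%s/%s\n' "${1%/}" "$2" ;;   # relative: joined onto the base
    esac
}

# Values taken from the storage_pools_config output above.
resolve_source /var/snap/lxd/18077 srv/lxd
resolve_source /var/snap/lxd/18077 /srv/lxd
```

The first call prints the same path that appears in the error message, which is consistent with the relative source being the culprit.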

As stated, LXD fails to mount the storage pool properly even though the specified directory exists and contains the corresponding containers, as shown below:

[root@centoslxc containers]# pwd
/var/snap/lxd/18077/srv/store/containers
[root@centoslxc containers]# ls
logs  logsdmz  logssansdmz  nessus  nessus2  nessus3  nessus4  nessus5  nessus6  test  wifivpn        wifivpn2  wifivpn3

My conclusion is that moving the containers to a storage pool mounted at an absolute path such as /store/lxd would probably solve this issue for good by avoiding the odd interaction with snap. However:

I would like to know the best practice in this kind of scenario for safely moving my containers to a new storage pool without putting them at risk. I know this step will require me to delete the current storage pool, and I’m afraid of losing some of them in the process.

Thanks for your support

What storage backend is used?

srv/lxd, srv/store would be perfectly valid and normal syntax if it’s ZFS and I wouldn’t have expected us to accept such syntax with the other storage drivers.

I have no idea what parameters the person who initially configured LXD used. May I ask for the command that shows the type of storage in use?

I should add that lxc profile list doesn’t work either; it returns the same EOF error.

Can you show journalctl -u snap.lxd.daemon -n 30?

[root@centoslxc ~]# journalctl -u snap.lxd.daemon -n 30
-- Logs begin at Sat 2020-10-31 21:53:14 CET, end at Mon 2020-11-02 22:56:00 CET. --
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: 3: fd: 9: cpuset
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: 4: fd: 10: hugetlb
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: 5: fd: 11: net_cls,net_prio
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: 6: fd: 12: cpu,cpuacct
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: 7: fd: 13: memory
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: 8: fd: 14: freezer
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: 9: fd: 15: pids
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: 10: fd: 16: rdma
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: 11: fd: 18: devices
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: Kernel supports swap accounting
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: api_extensions:
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: - cgroups
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: - sys_cpu_online
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: - proc_cpuinfo
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: - proc_diskstats
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: - proc_loadavg
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: - proc_meminfo
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: - proc_stat
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: - proc_swaps
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: - proc_uptime
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: - shared_pidns
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: - cpuview_daemon
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: - loadavg_daemon
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: - pidfds
Nov 02 22:56:00 centoslxc lxd.daemon[1570]: Reloaded LXCFS
Nov 02 22:56:00 centoslxc lxd.daemon[534220]: => Re-using existing LXCFS
Nov 02 22:56:00 centoslxc lxd.daemon[534220]: ==> Cleaning up existing LXCFS namespace
Nov 02 22:56:00 centoslxc lxd.daemon[534220]: => Starting LXD
Nov 02 22:56:00 centoslxc lxd.daemon[534220]: t=2020-11-02T22:56:00+0100 lvl=warn msg="AppArmor support has been disabled because of lack of kernel support"
Nov 02 22:56:00 centoslxc lxd.daemon[534220]: t=2020-11-02T22:56:00+0100 lvl=warn msg=" - Couldn't find the CGroup blkio.weight, I/O weight limits will be ignored"

And this is what comes next:

Nov 02 22:58:11 centoslxc lxd.daemon[539396]: t=2020-11-02T22:58:11+0100 lvl=eror msg="Failed to start the daemon: Failed initializing storage pool "store_lxd": Failed to mount '/var/snap/lxd/18077/srv/lxd' on '/var/snap/lxd/common/lx>
Nov 02 22:58:11 centoslxc lxd.daemon[539396]: Error: Failed initializing storage pool "store_lxd": Failed to mount '/var/snap/lxd/18077/srv/lxd' on '/var/snap/lxd/common/lxd/storage-pools/store_lxd': no such file or directory
Nov 02 22:58:11 centoslxc lxd.daemon[539396]: => LXD failed to start
Nov 02 22:58:11 centoslxc systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Nov 02 22:58:11 centoslxc systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Nov 02 22:58:12 centoslxc systemd[1]: snap.lxd.daemon.service: Service RestartSec=100ms expired, scheduling restart.
Nov 02 22:58:12 centoslxc systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 23099.
Nov 02 22:58:12 centoslxc systemd[1]: Stopped Service for snap application lxd.daemon.
Nov 02 22:58:12 centoslxc systemd[1]: Started Service for snap application lxd.daemon.
Nov 02 22:58:12 centoslxc lxd.daemon[539912]: => Preparing the system (18077)
Nov 02 22:58:12 centoslxc lxd.daemon[539912]: ==> Loading snap configuration
Nov 02 22:58:12 centoslxc lxd.daemon[539912]: ==> Setting up mntns symlink (mnt:[4026532716])
Nov 02 22:58:12 centoslxc lxd.daemon[539912]: ==> Setting up kmod wrapper
Nov 02 22:58:12 centoslxc lxd.daemon[539912]: ==> Preparing /boot
Nov 02 22:58:12 centoslxc lxd.daemon[539912]: ==> Preparing a clean copy of /run
Nov 02 22:58:12 centoslxc lxd.daemon[539912]: ==> Preparing /run/bin
Nov 02 22:58:12 centoslxc lxd.daemon[539912]: ==> Preparing a clean copy of /etc
Nov 02 22:58:12 centoslxc lxd.daemon[539912]: ==> Preparing a clean copy of /usr/share/misc
Nov 02 22:58:12 centoslxc lxd.daemon[539912]: ==> Setting up ceph configuration
Nov 02 22:58:12 centoslxc lxd.daemon[539912]: ==> Setting up LVM configuration
Nov 02 22:58:12 centoslxc lxd.daemon[539912]: ==> Rotating logs
Nov 02 22:58:12 centoslxc lxd.daemon[539912]: ==> Escaping the systemd cgroups
Nov 02 22:58:12 centoslxc lxd.daemon[539912]: ====> Detected cgroup V1
Nov 02 22:58:12 centoslxc lxd.daemon[539912]: ==> Escaping the systemd process resource limits
Nov 02 22:58:12 centoslxc lxd.daemon[539912]: ==> Disabling shiftfs on this kernel (auto)
Nov 02 22:58:12 centoslxc lxd.daemon[1570]: Closed liblxcfs.so
Nov 02 22:58:12 centoslxc lxd.daemon[1570]: Running destructor lxcfs_exit

OK, it does indeed look like you’re dealing with a dir backend with relative paths.
Kind of surprising that this ever worked :slight_smile:

To try to fix it, create a file at /var/snap/lxd/common/lxd/database/patch.global.sql containing:

UPDATE storage_pools_config SET key='/srv/lxd' WHERE key='srv/lxd';
UPDATE storage_pools_config SET key='/srv/store' WHERE key='srv/store';

Then try interacting with LXD again to see whether it starts properly.

I still get this: Error: Get "http://unix.socket/1.0": EOF
Do you know how to fix it?

Can you get an updated output of journalctl -u snap.lxd.daemon -n 30?

Oh, the patch was obviously bad, sorry.
Try again but this time with this content:

UPDATE storage_pools_config SET value='/srv/lxd' WHERE value='srv/lxd';
UPDATE storage_pools_config SET value='/srv/store' WHERE value='srv/store';
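
As a rough sketch, the corrected statements can also be generated from the list of relative source values instead of being typed by hand. make_patch_sql is a hypothetical helper name; it simply prefixes each relative value with a slash, which assumes the pools are meant to live directly under / on the host.

```shell
#!/bin/sh
# make_patch_sql is a hypothetical helper, not an LXD command.
# For each relative source value it prints one UPDATE statement that
# rewrites the value to the same path with a leading slash.
make_patch_sql() {
    for src in "$@"; do
        case "$src" in
            /*) ;;  # already absolute, nothing to patch
            *)  printf "UPDATE storage_pools_config SET value='/%s' WHERE value='%s';\n" "$src" "$src" ;;
        esac
    done
}

# Print the statements for the two pools discussed in this thread.
make_patch_sql srv/lxd srv/store

# To write them to the patch file mentioned above (not executed here):
# make_patch_sql srv/lxd srv/store > /var/snap/lxd/common/lxd/database/patch.global.sql
```

Reviewing the printed statements before redirecting them into the patch file makes a key/value mix-up easier to spot.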

It does change the mount point, but the EOF issue is still there :frowning:

[root@centoslxc ~]# journalctl -u snap.lxd.daemon -n 30
-- Logs begin at Sat 2020-10-31 21:53:14 CET, end at Mon 2020-11-02 23:25:04 CET. --
Nov 02 23:24:52 centoslxc lxd.daemon[609563]: t=2020-11-02T23:24:52+0100 lvl=warn msg="AppArmor support has been disabled because of lack of kernel support"
Nov 02 23:24:52 centoslxc lxd.daemon[609563]: t=2020-11-02T23:24:52+0100 lvl=warn msg=" - Couldn't find the CGroup blkio.weight, I/O weight limits will be ig>
Nov 02 23:25:02 centoslxc lxd.daemon[609563]: t=2020-11-02T23:25:02+0100 lvl=eror msg="Failed to start the daemon: Failed initializing storage pool "store_l>
Nov 02 23:25:03 centoslxc lxd.daemon[609563]: Error: Failed initializing storage pool "store_lxd": Failed to mount '/var/lib/snapd/hostfs/srv/lxd' on '/var/s>
Nov 02 23:25:03 centoslxc lxd.daemon[609563]: => LXD failed to start
Nov 02 23:25:03 centoslxc systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Nov 02 23:25:03 centoslxc systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Nov 02 23:25:03 centoslxc systemd[1]: snap.lxd.daemon.service: Service RestartSec=100ms expired, scheduling restart.
Nov 02 23:25:03 centoslxc systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 23234.
Nov 02 23:25:03 centoslxc systemd[1]: Stopped Service for snap application lxd.daemon.
Nov 02 23:25:03 centoslxc systemd[1]: Started Service for snap application lxd.daemon.
Nov 02 23:25:03 centoslxc lxd.daemon[610082]: => Preparing the system (18077)
Nov 02 23:25:03 centoslxc lxd.daemon[610082]: ==> Loading snap configuration
Nov 02 23:25:03 centoslxc lxd.daemon[610082]: ==> Setting up mntns symlink (mnt:[4026532716])
Nov 02 23:25:03 centoslxc lxd.daemon[610082]: ==> Setting up kmod wrapper
Nov 02 23:25:03 centoslxc lxd.daemon[610082]: ==> Preparing /boot
Nov 02 23:25:03 centoslxc lxd.daemon[610082]: ==> Preparing a clean copy of /run
Nov 02 23:25:03 centoslxc lxd.daemon[610082]: ==> Preparing /run/bin
Nov 02 23:25:03 centoslxc lxd.daemon[610082]: ==> Preparing a clean copy of /etc
Nov 02 23:25:03 centoslxc lxd.daemon[610082]: ==> Preparing a clean copy of /usr/share/misc
Nov 02 23:25:03 centoslxc lxd.daemon[610082]: ==> Setting up ceph configuration
Nov 02 23:25:03 centoslxc lxd.daemon[610082]: ==> Setting up LVM configuration
Nov 02 23:25:03 centoslxc lxd.daemon[610082]: ==> Rotating logs
Nov 02 23:25:04 centoslxc lxd.daemon[610082]: ==> Escaping the systemd cgroups
Nov 02 23:25:04 centoslxc lxd.daemon[610082]: ====> Detected cgroup V1
Nov 02 23:25:04 centoslxc lxd.daemon[610082]: ==> Escaping the systemd process resource limits
Nov 02 23:25:04 centoslxc lxd.daemon[610082]: ==> Disabling shiftfs on this kernel (auto)
Nov 02 23:25:04 centoslxc lxd.daemon[1570]: Closed liblxcfs.so
Nov 02 23:25:04 centoslxc lxd.daemon[1570]: Running destructor lxcfs_exit
Nov 02 23:25:04 centoslxc lxd.daemon[1570]: Running constructor lxcfs_init to reload liblxcfs
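
Several of the lines above are cut off with a ">" because journalctl was viewed through a pager. A possible way around that, sketched below, is to pass --no-pager and keep only the error lines. lxd_errors is a hypothetical helper name, and the pattern is just a guess based on the log format seen in this thread.

```shell
#!/bin/sh
# lxd_errors is a hypothetical helper: it reads journal text on stdin and
# keeps only error-level LXD lines and the daemon's plain "Error:" lines.
lxd_errors() {
    grep -E 'lvl=eror|lxd\.daemon\[[0-9]+\]: Error:'
}

# Demo on two lines copied from this thread; only the "Error:" line is printed.
lxd_errors <<'EOF'
Nov 02 22:58:11 centoslxc lxd.daemon[539396]: Error: Failed initializing storage pool "store_lxd": Failed to mount '/var/snap/lxd/18077/srv/lxd' on '/var/snap/lxd/common/lxd/storage-pools/store_lxd': no such file or directory
Nov 02 22:58:11 centoslxc lxd.daemon[539396]: => LXD failed to start
EOF

# On the real host (not executed here), full untruncated lines:
# journalctl -u snap.lxd.daemon -n 200 --no-pager | lxd_errors
```

With the full line visible, the complete mount source and target can be read instead of a path that stops at the terminal width.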

Where are those two pools actually located on the system?

That’s exactly the problem I’ve been puzzling over for 3 hours now. I don’t know, and this is why I wanted to move them to a new “sanitized” storage pool!

I have these 3 folders with numeric names that seem to be some kind of save points:

[root@centoslxc lxd]# pwd
/var/snap/lxd
[root@centoslxc lxd]# ls
17886  18013  18077  common  current

And inside each of those numeric folders are the actual mount points, srv/lxd and srv/store (I can’t do an ls -laR because my current remote tty is very … painful to work with)

Oh wow, okay, that’s interesting and a massive waste of space.

Ok, so let’s try:

  • mv /var/snap/lxd/current/srv/lxd /srv/lxd
  • mv /var/snap/lxd/current/srv/store /srv/store

If that works properly, you can then go ahead and wipe the copies in the older revisions as that data shouldn’t have been there in the first place :slight_smile:

Right now, every time your snap refreshes, that data is needlessly copied onto the next revision when it should really be shared state…
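
Since the worry here is losing containers, a slightly more defensive version of the two mv commands might look like the sketch below. safe_move is a hypothetical helper; it refuses to run if the source is missing or if the destination already exists, and it assumes the LXD daemon is not running while the data is moved.

```shell
#!/bin/sh
# safe_move is a hypothetical helper, not an LXD command.
# It moves a directory only when the source exists and the destination
# does not, so an existing /srv/lxd or /srv/store is never overwritten.
safe_move() {
    src=$1
    dst=$2
    if [ ! -d "$src" ]; then
        echo "source missing: $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "destination already exists: $dst" >&2
        return 1
    fi
    # Create the parent of the destination if needed, then move.
    mkdir -p "$(dirname "$dst")" && mv "$src" "$dst"
}

# Usage for the two pools in this thread (not executed here):
# safe_move /var/snap/lxd/current/srv/lxd /srv/lxd
# safe_move /var/snap/lxd/current/srv/store /srv/store
```

Checking that the container directories are present under the new location before wiping the copies in the older revisions keeps the cleanup reversible for as long as possible.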

I’m so sorry for taking up your time like this, but your help is really saving me right now. I can’t thank you enough for that.

So I’ve done it: removed the junk and reloaded the daemon, but I still get the same EOF issue. Here are the logs once more:

Nov 02 23:41:30 centoslxc lxd.daemon[1570]: - loadavg_daemon
Nov 02 23:41:30 centoslxc lxd.daemon[1570]: - pidfds
Nov 02 23:41:30 centoslxc lxd.daemon[1570]: Reloaded LXCFS
Nov 02 23:41:30 centoslxc lxd.daemon[653078]: => Re-using existing LXCFS
Nov 02 23:41:30 centoslxc lxd.daemon[653078]: ==> Cleaning up existing LXCFS namespace
Nov 02 23:41:30 centoslxc lxd.daemon[653078]: => Starting LXD
Nov 02 23:41:30 centoslxc lxd.daemon[653078]: t=2020-11-02T23:41:30+0100 lvl=warn msg="AppArmor support has been disabled because of lack of kernel support"
Nov 02 23:41:30 centoslxc lxd.daemon[653078]: t=2020-11-02T23:41:30+0100 lvl=warn msg=" - Couldn't find the CGroup blkio.weight, I/O weight limits will be ignored"
Nov 02 23:41:41 centoslxc lxd.daemon[653078]: t=2020-11-02T23:41:41+0100 lvl=eror msg="Failed to start the daemon: Failed initializing storage pool "store_lxd": Failed to mount '/var/lib/snapd/hostfs/>
Nov 02 23:41:41 centoslxc lxd.daemon[653078]: Error: Failed initializing storage pool "store_lxd": Failed to mount '/var/lib/snapd/hostfs/srv/lxd' on '/var/snap/lxd/common/lxd/storage-pools/store_lxd': >
Nov 02 23:41:41 centoslxc lxd.daemon[653078]: => LXD failed to start
Nov 02 23:41:41 centoslxc systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Nov 02 23:41:41 centoslxc systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Nov 02 23:41:42 centoslxc systemd[1]: snap.lxd.daemon.service: Service RestartSec=100ms expired, scheduling restart.
Nov 02 23:41:42 centoslxc systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 23317.
Nov 02 23:41:42 centoslxc systemd[1]: Stopped Service for snap application lxd.daemon.
Nov 02 23:41:42 centoslxc systemd[1]: Started Service for snap application lxd.daemon.

So the situation has evolved a bit. I haven’t done anything since the last operation, but the mount issue seems to be solved and the socket issue has gone back to its original form:

[root@centoslxc ~]# lxc list
Error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: no such file or directory
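
The two errors seen in this thread are probably worth telling apart: "no such file or directory" means the socket file itself is absent, while EOF means something accepted the connection and then closed it. A small check like the one below can show which case applies; socket_state is a hypothetical helper and the path is the one from the error message.

```shell
#!/bin/sh
# socket_state is a hypothetical helper: it reports whether a path is a
# unix socket, exists but is something else, or does not exist at all.
socket_state() {
    if [ -S "$1" ]; then
        echo "socket"
    elif [ -e "$1" ]; then
        echo "not-a-socket"
    else
        echo "missing"
    fi
}

# Path taken from the error message above.
socket_state /var/snap/lxd/common/lxd/unix.socket
```

If this reports "missing", the socket unit is likely not active, which is a different situation from the daemon crashing during startup.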

Here is the result of journalctl -xe; no more mount issue:

Nov 03 00:02:52 centoslxc lxd.daemon[710068]: ====> Detected cgroup V1
Nov 03 00:02:52 centoslxc lxd.daemon[710068]: ==> Escaping the systemd process resource limits
Nov 03 00:02:52 centoslxc lxd.daemon[710068]: ==> Disabling shiftfs on this kernel (auto)
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: Closed liblxcfs.so
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: Running destructor lxcfs_exit
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: Running constructor lxcfs_init to reload liblxcfs
Nov 03 00:02:52 centoslxc kernel: new mount options do not match the existing superblock, will be ignored
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: mount namespace: 5
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: hierarchies:
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: 0: fd: 6: name=systemd
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: 1: fd: 7: perf_event
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: 2: fd: 8: blkio
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: 3: fd: 9: cpuset
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: 4: fd: 10: hugetlb
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: 5: fd: 11: net_cls,net_prio
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: 6: fd: 12: cpu,cpuacct
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: 7: fd: 13: memory
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: 8: fd: 14: freezer
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: 9: fd: 15: pids
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: 10: fd: 16: rdma
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: 11: fd: 18: devices
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: Kernel supports swap accounting
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: api_extensions:
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: - cgroups
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: - sys_cpu_online
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: - proc_cpuinfo
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: - proc_diskstats
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: - proc_loadavg
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: - proc_meminfo
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: - proc_stat
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: - proc_swaps
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: - proc_uptime
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: - shared_pidns
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: - cpuview_daemon
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: - loadavg_daemon
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: - pidfds
Nov 03 00:02:52 centoslxc lxd.daemon[1570]: Reloaded LXCFS
Nov 03 00:02:52 centoslxc lxd.daemon[710068]: => Re-using existing LXCFS
Nov 03 00:02:52 centoslxc lxd.daemon[710068]: ==> Cleaning up existing LXCFS namespace
Nov 03 00:02:52 centoslxc lxd.daemon[710068]: => Starting LXD
Nov 03 00:02:52 centoslxc lxd.daemon[710068]: t=2020-11-03T00:02:52+0100 lvl=warn msg="AppArmor support has been disabled because of lack of kernel support"
Nov 03 00:02:52 centoslxc lxd.daemon[710068]: t=2020-11-03T00:02:52+0100 lvl=warn msg=" - Couldn't find the CGroup blkio.weight, I/O weight limits will be ignored"

Try:

  • systemctl stop snap.lxd.daemon snap.lxd.daemon.unix.socket
  • systemctl start snap.lxd.daemon.unix.socket
  • lxc list

Error: Get "http://unix.socket/1.0": EOF :frowning:

Ok, can you do:

  • systemctl stop snap.lxd.daemon snap.lxd.daemon.unix.socket
  • lxd --debug --group lxd