Unable to start (snap) LXD on Debian 10 [SOLVED]

Hi there.

First of all, I’m new to LXD.
Yesterday I successfully installed LXD from snap on Debian 10. Today, on a new machine with a fresh Debian 10 install (just updated), I can’t get LXD to run.

  • installed snapd
  • installed LXD with # snap install lxd
root@morfej:~# snap install lxd
lxd 4.0.0 from Canonical✓ installed
root@morfej:/tmp# snap list
Name    Version   Rev    Tracking  Publisher   Notes
core18  20200311  1705   stable    canonical✓  base
lxd     4.0.0     14623  stable    canonical✓  -
root@morfej:~# /snap/bin/lxd init
Error: Failed to connect to local LXD: Get "http://unix.socket/1.0": read unix @->/var/snap/lxd/common/lxd/unix.socket: read: connection reset by peer

After reboot, still errors.

I tried the following:

root@morfej:/tmp# snap start lxd
error: cannot perform the following tasks:
- start of [lxd.activate lxd.daemon] (# systemctl start snap.lxd.activate.service snap.lxd.daemon.service
Job for snap.lxd.activate.service failed because the control process exited with error code.
See "systemctl status snap.lxd.activate.service" and "journalctl -xe" for details.
)
- start of [lxd.activate lxd.daemon] (exit status 1)
root@morfej:/tmp# systemctl status snap.lxd.activate.service
● snap.lxd.activate.service - Service for snap application lxd.activate
   Loaded: loaded (/etc/systemd/system/snap.lxd.activate.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2020-04-16 23:13:42 UTC; 49s ago
  Process: 1981 ExecStart=/usr/bin/snap run lxd.activate (code=exited, status=1/FAILURE)
 Main PID: 1981 (code=exited, status=1/FAILURE)
Apr 16 23:13:39 morfej systemd[1]: Starting Service for snap application lxd.activate...
Apr 16 23:13:39 morfej lxd.activate[1981]: => Starting LXD activation
Apr 16 23:13:39 morfej lxd.activate[1981]: ==> Loading snap configuration
Apr 16 23:13:39 morfej lxd.activate[1981]: ==> Checking for socket activation support
Apr 16 23:13:42 morfej lxd.activate[1981]: ===> System doesn't support socket activation, starting LXD now
Apr 16 23:13:42 morfej lxd.activate[1981]: Job for snap.lxd.daemon.service failed because the control process exited with error code.
Apr 16 23:13:42 morfej lxd.activate[1981]: See "systemctl status snap.lxd.daemon.service" and "journalctl -xe" for details.
Apr 16 23:13:42 morfej systemd[1]: snap.lxd.activate.service: Main process exited, code=exited, status=1/FAILURE
Apr 16 23:13:42 morfej systemd[1]: snap.lxd.activate.service: Failed with result 'exit-code'.
Apr 16 23:13:42 morfej systemd[1]: Failed to start Service for snap application lxd.activate.

Even more debug info:

root@morfej:/tmp# systemctl stop snap.lxd.daemon.service snap.lxd.daemon.unix.socket
root@morfej:/tmp# pkill -9 lxd
root@morfej:/tmp# lxd --debug --group lxd
DBUG[04-16|23:02:36] Connecting to a local LXD over a Unix socket 
DBUG[04-16|23:02:36] Sending request to LXD                   method=GET url=http://unix.socket/1.0 etag=
INFO[04-16|23:02:36] LXD 4.0.0 is starting in normal mode     path=/var/snap/lxd/common/lxd
INFO[04-16|23:02:36] Kernel uid/gid map: 
INFO[04-16|23:02:36]  - u 0 0 4294967295 
INFO[04-16|23:02:36]  - g 0 0 4294967295 
INFO[04-16|23:02:36] Configured LXD uid/gid map: 
INFO[04-16|23:02:36]  - u 0 1000000 1000000000 
INFO[04-16|23:02:36]  - g 0 1000000 1000000000 
INFO[04-16|23:02:36] Kernel features: 
DBUG[04-16|23:02:36] Failed to attach to host network namespace 
INFO[04-16|23:02:36]  - netnsid-based network retrieval: no 
INFO[04-16|23:02:36]  - uevent injection: no 
INFO[04-16|23:02:36]  - seccomp listener: no 
INFO[04-16|23:02:36]  - seccomp listener continue syscalls: no 
INFO[04-16|23:02:36]  - unprivileged file capabilities: yes 
INFO[04-16|23:02:36]  - cgroup layout: hybrid 
WARN[04-16|23:02:36]  - Couldn't find the CGroup hugetlb controller, hugepage limits will be ignored 
WARN[04-16|23:02:36]  - Couldn't find the CGroup memory swap accounting, swap limits will be ignored 
INFO[04-16|23:02:36]  - shiftfs support: no 
INFO[04-16|23:02:36] Initializing local database 
DBUG[04-16|23:02:36] Initializing database gateway 
DBUG[04-16|23:02:36] Start database node                      address= role=voter id=1
EROR[04-16|23:02:36] Failed to start the daemon: Failed to autobind unix socket: listen unix : listen: operation not permitted 
INFO[04-16|23:02:36] Starting shutdown sequence 
DBUG[04-16|23:02:36] Not unmounting temporary filesystems (containers are still running) 
Error: Failed to autobind unix socket: listen unix : listen: operation not permitted

I reinstalled the OS from scratch and repeated the snapd and LXD installation; exactly the same thing happens. Any clue what’s going on and how to fix it?

Thanks.

Anything useful in dmesg?

Not really. This is the dmesg log when I try to # snap start lxd:

[ 5500.254072] audit: type=1326 audit(1587082760.500:82): auid=4294967295 uid=0 gid=0 ses=4294967295 subj==unconfined pid=2943 comm="mount" exe="/bin/mount" sig=0 arch=c000003e syscall=165 compat=0 ip=0x7f2cfa41e3ca code=0x50000
[ 5500.398636] audit: type=1326 audit(1587082760.644:83): auid=4294967295 uid=0 gid=0 ses=4294967295 subj==unconfined pid=2968 comm="mount" exe="/bin/mount" sig=0 arch=c000003e syscall=165 compat=0 ip=0x7f8a4b28d3ca code=0x50000
[ 5500.649288] audit: type=1326 audit(1587082760.896:84): auid=4294967295 uid=0 gid=0 ses=4294967295 subj==unconfined pid=2992 comm="mount" exe="/bin/mount" sig=0 arch=c000003e syscall=165 compat=0 ip=0x7fd942a133ca code=0x50000
[ 5500.899775] audit: type=1326 audit(1587082761.144:85): auid=4294967295 uid=0 gid=0 ses=4294967295 subj==unconfined pid=3017 comm="mount" exe="/bin/mount" sig=0 arch=c000003e syscall=165 compat=0 ip=0x7f34a1dea3ca code=0x50000
[ 5501.148751] audit: type=1326 audit(1587082761.392:86): auid=4294967295 uid=0 gid=0 ses=4294967295 subj==unconfined pid=3040 comm="mount" exe="/bin/mount" sig=0 arch=c000003e syscall=165 compat=0 ip=0x7f1bc5b6a3ca code=0x50000
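
A note for readers decoding these records: type=1326 is the kernel's seccomp audit record, arch=c000003e is x86_64, syscall=165 on x86_64 is mount, and code=0x50000 corresponds to SECCOMP_RET_ERRNO. Whether these entries relate to the failure is not established in this thread. The helper below is only a minimal sketch for condensing such lines; the function name summarize_seccomp_audit is made up for illustration, and it assumes the field order shown in the log above.

```shell
#!/bin/sh
# Hypothetical helper: condense seccomp audit records (type=1326) read from stdin.
# Prints one line per record in the form: <comm> syscall=<nr> code=<code>
# Assumes the fields appear in the order comm, syscall, code, as in the log above.
summarize_seccomp_audit() {
    sed -n 's/.*type=1326 .*comm="\([^"]*\)".* syscall=\([0-9]*\) .*code=\(0x[0-9a-fA-F]*\).*/\1 syscall=\2 code=\3/p'
}

# Example (as root): dmesg | summarize_seccomp_audit | sort | uniq -c
```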

What’s in cat /proc/mounts?

root@morfej:~> cat /proc/mounts
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,nosuid,relatime,size=2005932k,nr_inodes=501483,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=404148k,mode=755 0 0
/dev/sda / ext4 rw,noatime,errors=remount-ro 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,name=systemd 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
bpf /sys/fs/bpf bpf rw,nosuid,nodev,noexec,relatime,mode=700 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
cgroup /sys/fs/cgroup/rdma cgroup rw,nosuid,nodev,noexec,relatime,rdma 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=35,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=10737 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
mqueue /dev/mqueue mqueue rw,relatime 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime,pagesize=2M 0 0
/dev/loop0 /snap/core18/1705 squashfs ro,nodev,relatime 0 0
/dev/loop1 /snap/lxd/14623 squashfs ro,nodev,relatime 0 0
tmpfs /run/snapd/ns tmpfs rw,nosuid,noexec,relatime,size=404148k,mode=755 0 0
nsfs /run/snapd/ns/lxd.mnt nsfs rw 0 0
tmpfs /run/user/1000 tmpfs rw,nosuid,nodev,relatime,size=404144k,mode=700,uid=1000,gid=1000 0 0

Ok, nothing standing out.

Can you show ls -lh /var/snap/lxd/common/lxd/?

Sure, here it is:

root@morfej:~> ls -lh /var/snap/lxd/common/lxd/
total 72K
drwx------ 2 root root 4.0K Apr 16 22:49 backups
drwx------ 2 root root 4.0K Apr 16 22:49 cache
drwx--x--x 2 root root 4.0K Apr 16 22:49 containers
drwx------ 3 root root 4.0K Apr 16 22:49 database
drwx--x--x 2 root root 4.0K Apr 16 22:49 devices
drwxr-xr-x 2 root root 4.0K Apr 16 22:49 devlxd
drwx------ 2 root root 4.0K Apr 16 22:49 disks
drwx------ 2 root root 4.0K Apr 16 22:49 images
drwx------ 2 root root 4.0K Apr 16 22:46 logs
drwx--x--x 2 root root 4.0K Apr 16 22:49 networks
drwx------ 2 root root 4.0K Apr 16 22:49 security
-rw-r--r-- 1 root root  761 Apr 16 22:49 server.crt
-rw------- 1 root root  288 Apr 16 22:49 server.key
drwx--x--x 2 root root 4.0K Apr 16 22:49 shmounts
drwx------ 2 root root 4.0K Apr 16 22:49 snapshots
drwx--x--x 2 root root 4.0K Apr 16 22:46 storage-pools
srw-rw---- 1 root root    0 Apr 16 22:47 unix.socket
drwx--x--x 2 root root 4.0K Apr 16 22:49 virtual-machines
drwx------ 2 root root 4.0K Apr 16 22:49 virtual-machines-snapshots

Also looks quite reasonable…

Can you try:

  • rm /var/snap/lxd/common/lxd/unix.socket
  • systemctl stop snap.lxd.daemon snap.lxd.daemon.unix.socket
  • strace -o debug lxd --debug --group lxd

Assuming it gets you the same error, pastebin the generated debug file.

(You’ll need the strace package if not already installed)

This is the standard output:

root@morfej:/etc/nginx/sites-enabled> strace -o debug lxd --debug --group lxd
INFO[04-17|13:52:34] LXD 4.0.0 is starting in normal mode     path=/var/snap/lxd/common/lxd
INFO[04-17|13:52:34] Kernel uid/gid map: 
INFO[04-17|13:52:34]  - u 0 0 4294967295 
INFO[04-17|13:52:34]  - g 0 0 4294967295 
INFO[04-17|13:52:34] Configured LXD uid/gid map: 
INFO[04-17|13:52:34]  - u 0 1000000 1000000000 
INFO[04-17|13:52:34]  - g 0 1000000 1000000000 
INFO[04-17|13:52:34] Kernel features: 
DBUG[04-17|13:52:34] Failed to attach to host network namespace 
INFO[04-17|13:52:34]  - netnsid-based network retrieval: no 
INFO[04-17|13:52:34]  - uevent injection: no 
INFO[04-17|13:52:34]  - seccomp listener: no 
INFO[04-17|13:52:34]  - seccomp listener continue syscalls: no 
INFO[04-17|13:52:34]  - unprivileged file capabilities: yes 
INFO[04-17|13:52:34]  - cgroup layout: hybrid 
WARN[04-17|13:52:34]  - Couldn't find the CGroup hugetlb controller, hugepage limits will be ignored 
WARN[04-17|13:52:34]  - Couldn't find the CGroup memory swap accounting, swap limits will be ignored 
INFO[04-17|13:52:34]  - shiftfs support: no 
INFO[04-17|13:52:34] Initializing local database 
DBUG[04-17|13:52:34] Initializing database gateway 
DBUG[04-17|13:52:34] Start database node                      address= role=voter id=1
EROR[04-17|13:52:34] Failed to start the daemon: Failed to autobind unix socket: listen unix : listen: operation not permitted 
INFO[04-17|13:52:34] Starting shutdown sequence 
DBUG[04-17|13:52:34] Not unmounting temporary filesystems (containers are still running) 
Error: Failed to autobind unix socket: listen unix : listen: operation not permitted

Generated debug file is here: https://cloud.ekirin.com/index.php/s/ijZ7EC8NiFPJqJA

Thanks for all the help, Stephane.

bind(7, {sa_family=AF_UNIX}, 2)         = 0
listen(7, 128)                          = -1 EPERM (Operation not permitted)

So the kernel doesn’t look very happy at all…
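
For anyone repeating this step, the two lines above can be picked out of a long strace log by filtering for calls that returned a permission error. This is a rough sketch; the function name failed_permission_calls is invented here, and "debug" is the output file name used in the strace command earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: show syscalls in an strace log that failed with
# EPERM or EACCES. Reads the named files, or stdin when none are given.
failed_permission_calls() {
    grep -E '= -1 (EPERM|EACCES) ' "$@"
}

# Example: failed_permission_calls debug
```

It only narrows down where to look; the preceding successful call (here the bind) still has to be read from the full log.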

Can you try:

  • nc -U /var/snap/lxd/common/lxd/unix.socket

See if that does bind it properly.

I rebooted the machine to start all the services again, since the unix.socket file was deleted. Also, I had to install netcat-openbsd to get the -U option. Here’s the result:

root@morfej:~> dir /var/snap/lxd/common/lxd/
total 72
drwx------ 2 root root 4096 Apr 16 22:49 backups/
drwx------ 2 root root 4096 Apr 16 22:49 cache/
drwx--x--x 2 root root 4096 Apr 16 22:49 containers/
drwx------ 3 root root 4096 Apr 16 22:49 database/
drwx--x--x 2 root root 4096 Apr 16 22:49 devices/
drwxr-xr-x 2 root root 4096 Apr 16 22:49 devlxd/
drwx------ 2 root root 4096 Apr 16 22:49 disks/
drwx------ 2 root root 4096 Apr 16 22:49 images/
drwx------ 2 root root 4096 Apr 16 22:46 logs/
drwx--x--x 2 root root 4096 Apr 16 22:49 networks/
drwx------ 2 root root 4096 Apr 16 22:49 security/
-rw-r--r-- 1 root root  761 Apr 16 22:49 server.crt
-rw------- 1 root root  288 Apr 16 22:49 server.key
drwx--x--x 2 root root 4096 Apr 16 22:49 shmounts/
drwx------ 2 root root 4096 Apr 16 22:49 snapshots/
drwx--x--x 2 root root 4096 Apr 16 22:46 storage-pools/
srw-rw---- 1 root root    0 Apr 17 14:30 unix.socket=
drwx--x--x 2 root root 4096 Apr 16 22:49 virtual-machines/
drwx------ 2 root root 4096 Apr 16 22:49 virtual-machines-snapshots/
root@morfej:~> nc -U /var/snap/lxd/common/lxd/unix.socket
nc: unix connect failed: Connection refused

Oh, sorry, you’ll need nc -l -U /var/snap/lxd/common/lxd/unix.socket

Looks like this command successfully listens on the socket, since no error is reported. By the way, should I update the kernel to a newer version?

Linux morfej 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux

Does that match your working system?

Can you also show snap connections?

Looks like my version of snap (2.37.4) doesn’t support this command.

root@morfej:~> snap interfaces
Slot     Plug
lxd:lxd  -
-        lxd:lxd-support
-        lxd:network
-        lxd:network-bind
-        lxd:system-observe

Ok, not very familiar with the old syntax, but this looks like the interfaces didn’t get connected for some reason.
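
In the legacy table above, a "-" in the Slot column means the plug on that row is not connected to anything. A small sketch for listing such plugs follows; the function name unconnected_plugs is hypothetical, and it assumes the two-column "Slot  Plug" layout printed by the old snap interfaces command as shown above.

```shell
#!/bin/sh
# Hypothetical helper: list plugs that have no slot, reading the legacy
# "snap interfaces" table from stdin. Skips the header row and prints the
# Plug column for rows whose Slot column is "-".
unconnected_plugs() {
    awk 'NR > 1 && $1 == "-" { print $2 }'
}

# Example: snap interfaces | unconnected_plugs
```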

Can you try snap connect core18:lxd-support lxd:lxd-support and see if that connects that one? If it does, repeat for the other 3.

I did reproduce your setup, this looks like a snapd bug…

A workaround is to do:

  • snap remove lxd (this will fail)
  • snap remove lxd (this will work)
  • snap install core
  • snap version (should show 2.44.1)
  • snap install lxd

Systems that have, or have had, a snap based on core installed before will not hit this issue. It’s only really a problem when directly installing a core18-based snap which depends on newer snapd features.
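
A quick way to check for this situation before installing LXD is to compare the running snapd version against the one the workaround ends up with. The sketch below is an illustration only: the function names are made up, 2.44.1 is simply the version mentioned in this thread (not a documented minimum), and version_ge relies on GNU sort -V.

```shell
#!/bin/sh
# Hypothetical helper: extract the snapd version from "snap version" output on stdin.
snapd_version() {
    awk '$1 == "snapd" { print $2 }'
}

# Hypothetical helper: succeed when version $1 >= version $2 (needs GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example:
#   v=$(snap version | snapd_version)
#   version_ge "$v" 2.44.1 || echo "snapd $v is older than 2.44.1; try: snap install core"
```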

I can’t believe that was the problem! One more reason to hate snaps.
Thank you for all the help, Stephane. LXD now finally works!

Eden