Error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd-user/unix.socket: connect: connection refused

Hi, I am having issues with the LXD daemon refusing to start.

lxc info shows the following:

$ lxc info
Error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd-user/unix.socket: connect: connection refused

When I try to restart the service and check journalctl -xe, I get the following:

Jan 09 14:56:05 gold systemd[1]: Started Service for snap application lxd.daemon.
░░ Subject: A start job for unit snap.lxd.daemon.service has finished successfully
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit snap.lxd.daemon.service has finished successfully.
░░
░░ The job identifier is 6922.
Jan 09 14:56:05 gold lxd.daemon[1740567]: => Preparing the system (24175)
Jan 09 14:56:05 gold lxd.daemon[1740567]: ==> Loading snap configuration
Jan 09 14:56:05 gold lxd.daemon[1740567]: ==> Setting up mntns symlink (mnt:[4026533551])
Jan 09 14:56:05 gold lxd.daemon[1740567]: ==> Setting up kmod wrapper
Jan 09 14:56:05 gold lxd.daemon[1740567]: ==> Preparing /boot
Jan 09 14:56:05 gold lxd.daemon[1740567]: ==> Preparing a clean copy of /run
Jan 09 14:56:05 gold lxd.daemon[1740567]: ==> Preparing /run/bin
Jan 09 14:56:05 gold lxd.daemon[1740567]: ==> Preparing a clean copy of /etc
Jan 09 14:56:05 gold lxd.daemon[1740567]: ==> Preparing a clean copy of /usr/share/misc
Jan 09 14:56:05 gold lxd.daemon[1740567]: ==> Setting up ceph configuration
Jan 09 14:56:05 gold lxd.daemon[1740567]: ==> Setting up LVM configuration
Jan 09 14:56:05 gold lxd.daemon[1740567]: ==> Setting up OVN configuration
Jan 09 14:56:05 gold lxd.daemon[1740567]: ==> Rotating logs
Jan 09 14:56:05 gold lxd.daemon[1740567]: ==> Setting up ZFS (2.1)
Jan 09 14:56:05 gold lxd.daemon[1740567]: ==> Escaping the systemd cgroups
Jan 09 14:56:05 gold lxd.daemon[1740567]: ====> Detected cgroup V2
Jan 09 14:56:05 gold lxd.daemon[1740567]: ==> Escaping the systemd process resource limits
Jan 09 14:56:05 gold lxd.daemon[1740567]: ==> Disabling shiftfs at user request
Jan 09 14:56:05 gold lxd.daemon[25308]: Closed liblxcfs.so
Jan 09 14:56:05 gold lxd.daemon[25308]: Running destructor lxcfs_exit
Jan 09 14:56:05 gold lxd.daemon[25308]: Running constructor lxcfs_init to reload liblxcfs
Jan 09 14:56:05 gold lxd.daemon[25308]: mount namespace: 6
Jan 09 14:56:05 gold lxd.daemon[25308]: hierarchies:
Jan 09 14:56:05 gold lxd.daemon[25308]:   0: fd:   8: cpuset,cpu,io,memory,hugetlb,pids,rdma,misc
Jan 09 14:56:05 gold lxd.daemon[25308]: Kernel supports pidfds
Jan 09 14:56:05 gold lxd.daemon[25308]: Kernel does not support swap accounting
Jan 09 14:56:05 gold lxd.daemon[25308]: api_extensions:
Jan 09 14:56:05 gold lxd.daemon[25308]: - cgroups
Jan 09 14:56:05 gold lxd.daemon[25308]: - sys_cpu_online
Jan 09 14:56:05 gold lxd.daemon[25308]: - proc_cpuinfo
Jan 09 14:56:05 gold lxd.daemon[25308]: - proc_diskstats
Jan 09 14:56:05 gold lxd.daemon[25308]: - proc_loadavg
Jan 09 14:56:05 gold lxd.daemon[25308]: - proc_meminfo
Jan 09 14:56:05 gold lxd.daemon[25308]: - proc_stat
Jan 09 14:56:05 gold lxd.daemon[25308]: - proc_swaps
Jan 09 14:56:05 gold lxd.daemon[25308]: - proc_uptime
Jan 09 14:56:05 gold lxd.daemon[25308]: - proc_slabinfo
Jan 09 14:56:05 gold lxd.daemon[25308]: - shared_pidns
Jan 09 14:56:05 gold lxd.daemon[25308]: - cpuview_daemon
Jan 09 14:56:05 gold lxd.daemon[25308]: - loadavg_daemon
Jan 09 14:56:05 gold lxd.daemon[25308]: - pidfds
Jan 09 14:56:05 gold lxd.daemon[25308]: Reloaded LXCFS
Jan 09 14:56:05 gold lxd.daemon[1740567]: => Re-using existing LXCFS
Jan 09 14:56:05 gold lxd.daemon[1740567]: ==> Reloading LXCFS
Jan 09 14:56:05 gold lxd.daemon[1740567]: => Starting LXD
Jan 09 14:56:06 gold lxd.daemon[1741161]: time="2023-01-09T14:56:06+08:00" level=warning msg=" - Couldn't find the CGroup network priority controller, network priority will be ignored"
Jan 09 14:56:06 gold lxd.daemon[1741161]: time="2023-01-09T14:56:06+08:00" level=warning msg="Instance type not operational" driver=qemu err="KVM support is missing (no /dev/kvm)" type=virtual-machine
Jan 09 14:56:06 gold lxd.daemon[1741161]: time="2023-01-09T14:56:06+08:00" level=error msg="Failed to start the daemon" err="Bind network address: listen tcp :8443: bind: address already in use"
Jan 09 14:56:06 gold lxd.daemon[1741161]: Error: Bind network address: listen tcp :8443: bind: address already in use

However, sudo netstat -plnta | grep 8443 gives nothing.

It might also be worth noting that I think these issues started after I tried installing juju and bootstrapping with localhost. I had issues with that, however, and ended up running snap remove --purge juju. This did not resolve the issue with LXD, though, even after a system restart.

Hi @gohweixun,
Please check port 8443 with sudo lsof -i :8443 and kill the PID it reports, then restart the snap.lxd.daemon service.
Regards.

Hi @cemzafer ,

I have checked 8443 with lsof -i :8443, there is nothing.

Same goes with sudo netstat -plnta | grep 8443

I have also noticed that if I snap remove lxd and then snap install lxd, lxc info works again. However, when I run snap restore to restore the data, the problem returns.

Hi,
Hmm, can you try purging LXD and reinstalling?

sudo snap remove lxd --purge
sudo snap install lxd

After the fresh install, if you get the error again, stop the daemon and start it manually:

sudo systemctl stop snap.lxd.daemon
sudo lxd --debug --group lxd

P.S. Before doing this, please post the output of lxc config show.
Regards.

Searching these forums for “/var/snap/lxd/common/lxd-user/unix.socket” turns up some similar issues from users with custom user ID mappings that aren’t accessible from the snap package.

Are you trying to run lxc as a non-root user?
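If a socket permission mismatch is the cause, inspecting the socket file itself will usually show it. The describe_socket helper below is purely illustrative (it is not an LXD tool); the paths are the ones from the error messages in this thread:

```python
import os
import stat

def describe_socket(path):
    """Report whether `path` exists, is a unix socket, and who owns it."""
    try:
        st = os.stat(path)
    except OSError as e:
        return f"{path}: not accessible ({e.strerror})"
    kind = "unix socket" if stat.S_ISSOCK(st.st_mode) else "not a socket"
    return (f"{path}: {kind}, uid={st.st_uid} gid={st.st_gid} "
            f"mode={oct(st.st_mode & 0o777)}")

# The two socket paths seen in the errors above:
print(describe_socket("/var/snap/lxd/common/lxd/unix.socket"))
print(describe_socket("/var/snap/lxd/common/lxd-user/unix.socket"))
```

If the socket exists but your user's uid/gid isn't covered by its ownership or group (normally the lxd group), a non-root lxc client will get connection refused or permission denied.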

I have tried purging LXD and reinstalling. After reinstallation, lxc info works fine, but when I run sudo snap restore to restore the data from an older snapshot, the problem reoccurs.

When I call sudo lxd --debug --group lxd, I get the following output:

sudo lxd --debug --group lxd
INFO   [2023-01-10T09:54:22+08:00] LXD is starting                               mode=normal path=/var/snap/lxd/common/lxd version=5.9
INFO   [2023-01-10T09:54:22+08:00] Kernel uid/gid map:
INFO   [2023-01-10T09:54:22+08:00]  - u 0 0 4294967295
INFO   [2023-01-10T09:54:22+08:00]  - g 0 0 4294967295
INFO   [2023-01-10T09:54:22+08:00] Configured LXD uid/gid map:
INFO   [2023-01-10T09:54:22+08:00]  - u 0 1000000 1000000000
INFO   [2023-01-10T09:54:22+08:00]  - g 0 1000000 1000000000
INFO   [2023-01-10T09:54:22+08:00] Kernel features:
INFO   [2023-01-10T09:54:22+08:00]  - closing multiple file descriptors efficiently: yes
INFO   [2023-01-10T09:54:22+08:00]  - netnsid-based network retrieval: yes
INFO   [2023-01-10T09:54:22+08:00]  - pidfds: yes
INFO   [2023-01-10T09:54:22+08:00]  - core scheduling: yes
INFO   [2023-01-10T09:54:22+08:00]  - uevent injection: yes
INFO   [2023-01-10T09:54:22+08:00]  - seccomp listener: yes
INFO   [2023-01-10T09:54:22+08:00]  - seccomp listener continue syscalls: yes
INFO   [2023-01-10T09:54:22+08:00]  - seccomp listener add file descriptors: yes
INFO   [2023-01-10T09:54:22+08:00]  - attach to namespaces via pidfds: yes
INFO   [2023-01-10T09:54:22+08:00]  - safe native terminal allocation : yes
INFO   [2023-01-10T09:54:22+08:00]  - unprivileged file capabilities: yes
INFO   [2023-01-10T09:54:22+08:00]  - cgroup layout: cgroup2
WARNING[2023-01-10T09:54:22+08:00]  - Couldn't find the CGroup hugetlb controller, hugepage limits will be ignored
WARNING[2023-01-10T09:54:22+08:00]  - Couldn't find the CGroup network priority controller, network priority will be ignored
INFO   [2023-01-10T09:54:22+08:00]  - shiftfs support: yes
INFO   [2023-01-10T09:54:22+08:00]  - idmapped mounts kernel support: yes
WARNING[2023-01-10T09:54:22+08:00] Instance type not operational                 driver=qemu err="KVM support is missing (no /dev/kvm)" type=virtual-machine
INFO   [2023-01-10T09:54:22+08:00] Instance type operational                     driver=lxc features="[]" type=container
INFO   [2023-01-10T09:54:22+08:00] Initializing local database
DEBUG  [2023-01-10T09:54:22+08:00] Refreshing local trusted certificate cache
INFO   [2023-01-10T09:54:22+08:00] Set client certificate to server certificate  fingerprint=0873aa75a432f295b901487cad44f681ffc1ac9f5e059109efd73399eef4c9b6
DEBUG  [2023-01-10T09:54:22+08:00] Initializing database gateway
INFO   [2023-01-10T09:54:22+08:00] Starting database node                        id=1 local=1 role=voter
WARNING[2023-01-10T09:54:23+08:00] Failed setting up shared mounts               err="no such file or directory"
INFO   [2023-01-10T09:54:23+08:00] Loading daemon configuration
INFO   [2023-01-10T09:54:23+08:00] Closing socket                                socket="[::]:8443" type="REST API TCP socket"
INFO   [2023-01-10T09:54:23+08:00] Closing socket                                socket=/var/snap/lxd/common/lxd/unix.socket type="REST API Unix socket"
INFO   [2023-01-10T09:54:23+08:00] Closing socket                                socket=/var/snap/lxd/common/lxd/devlxd/sock type="devlxd socket"
ERROR  [2023-01-10T09:54:23+08:00] Failed to start the daemon                    err="Bind network address: listen tcp :8443: bind: address already in use"
INFO   [2023-01-10T09:54:23+08:00] Starting shutdown sequence                    signal=interrupt
DEBUG  [2023-01-10T09:54:23+08:00] Cancel ongoing or future gRPC connection attempts
INFO   [2023-01-10T09:54:23+08:00] Stop database gateway
INFO   [2023-01-10T09:54:23+08:00] Not unmounting temporary filesystems (instances are still running)
INFO   [2023-01-10T09:54:23+08:00] Daemon stopped
Error: Bind network address: listen tcp :8443: bind: address already in use

Is there any way that I can save my profiles, containers, storage volumes, and config, and possibly import these into a fresh lxd installation without using snap restore?

Alternatively, for a snap lxd installation, is there any way that I might be able to modify lxd conf files to get it to try to bind to an alternative port, and see if that might fix the problem?

Running lxc as a non-root user has generally not been an issue for me; however, when I run lxc as root, I see the following:

sudo lxc config show
Error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: no such file or directory

I would suppose that happens because the daemon isn’t running:

(The output of sudo lxd --debug --group lxd was identical to the run shown above, again ending in Error: Bind network address: listen tcp :8443: bind: address already in use.)

I’m quite puzzled why it reports being unable to bind to tcp :8443 when nothing shows up in lsof -i :8443 or sudo netstat -plnta | grep 8443.
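For context on what that kernel error means: bind() returns EADDRINUSE only when some socket in the same network namespace already holds the address, and netstat/lsof only see the namespace they run in, so a listener left behind in a different network namespace would be invisible to both. A minimal Python sketch of the error itself (using an ephemeral port rather than 8443, which is an assumption to keep the example self-contained):

```python
import errno
import socket

# First listener takes an ephemeral port (stand-in for LXD's :8443).
s1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s1.bind(("127.0.0.1", 0))
s1.listen()
port = s1.getsockname()[1]

# A second bind to the same address fails exactly like the LXD error.
s2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s2.bind(("127.0.0.1", port))
except OSError as e:
    reproduced = (e.errno == errno.EADDRINUSE)
    print("address already in use:", reproduced)
finally:
    s1.close()
    s2.close()
```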

I don’t know whether that command works, I haven’t tried it, but you can use the lxd recover option to recover your old storage. Separately from this, you should back up your configs manually; please see this link:
https://linuxcontainers.org/lxd/docs/master/backup/
Regards.

Does this still occur after a fresh reboot?

That’s great, thanks. I managed to recover through the following process:

  1. Removal via snap remove lxd
  2. Fresh installation via snap install lxd
  3. lxd init without creating a new storage pool (I mainly wanted it to create the network bridge and configure default settings)
  4. lxd recover the default storage pool

Sidenote: I also had another storage pool for Docker, created via a process similar to that described in this tutorial; recovery was pretty straightforward too, apart from having to figure out which path to provide (it turned out to be something like /var/snap/lxd/common/lxd/disks/docker.img).

This whole process turned out to be fairly painless in the end, and your advice on the manual backup of configs is something that I will definitely be implementing.

Many thanks to @cemzafer and @tomp for suggestions on how to debug and recover!

As a side question: I’m curious about the usage of namespaces in LXD. I used rsync to copy out the contents of /var/snap/lxd/common/lxd and things seemed to work out, but I’m wondering if there are situations where things may not be properly backed up if files are not visible due to the separate namespaces.


Yes, there are situations where that wouldn’t work.
Anything that requires a mount will not show up in /var/snap/lxd/common/lxd on the host; it is only visible inside the snap’s mount namespace. For example, instances have a directory in /var/snap/lxd/common/lxd, but it may appear empty on the host, whereas inside the snap it is mounted onto a storage pool.
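One way to see this from the host is to compare mount tables: each mount namespace has its own /proc/&lt;pid&gt;/mountinfo, and mounts made inside the snap's namespace never appear in the host's table. A small sketch (the LXD path is the one from this thread; the `pid` argument would hypothetically be a process running inside the snap's namespace):

```python
def mounts_under(prefix, pid="self"):
    """List mount points below `prefix`, as seen by `pid`'s mount namespace."""
    points = []
    with open(f"/proc/{pid}/mountinfo") as f:
        for line in f:
            # Field 5 of mountinfo is the mount point.
            mount_point = line.split()[4]
            if mount_point.startswith(prefix):
                points.append(mount_point)
    return points

# On the host this may well print [], even while instances are running,
# because the storage-pool mounts live in the snap's namespace:
print(mounts_under("/var/snap/lxd/common/lxd"))
```

So an rsync of /var/snap/lxd/common/lxd from the host can silently copy empty instance directories; that is why the documented backup methods (or stopping LXD first) matter.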

See Linux Containers - LXD - Has been moved to Canonical

I have been thinking a bit about this problem and how I could avoid it in future. Since I was able to restore from my storage pools (one on an external ZFS pool, one on a loop-backed BTRFS pool), starting from a fresh installation, it seems that I must have somehow messed up my config somewhere in the database.

My LXD installation (and root installation) is on an ext4 volume. It seems to me that if the LXD installation had been on a ZFS volume with regular snapshots scheduled, the solution to this would have been as easy as rolling back /var/snap/lxd/common/lxd to a previous snapshot. Would I be right in this assumption?

Was LXD stopped when you took the backup?

Possibly, but again, that would only snapshot the contents of the ZFS dataset hosting LXD, and if you were using other storage pools mounted on top of that, those would not be included in the snapshot.