I am having issues after upgrading Ubuntu from kernel 5.15 to 6.5: I can no longer run any lxc command. For example, lxc list always fails with the same error:
$ lxc list
Error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: no such file or directory
However, from a few forum posts here, I found that I can start the daemon manually with:
$ sudo lxd --group lxd
and while this is active (the command does not terminate, just sits there), I can run my lxc commands just fine, start my instances, list them, etc.
With just
$ sudo lxd
this does not work. The command appears to behave the same and prints no error, but I cannot run any lxc commands while it is running.
I would be very grateful if someone could help me out. I have tried a lot, but nothing seems to fix it.
For debugging purposes, here are a few commands I ran and their output:
works:
$ sudo lxd --group lxd
WARN[03-18|01:48:28] - Couldn't find the CGroup hugetlb controller, hugepage limits will be ignored
WARN[03-18|01:48:28] - Couldn't find the CGroup network priority controller, network priority will be ignored
WARN[03-18|01:48:28] Instance type not operational type=virtual-machine driver=qemu err="KVM support is missing"
does not work:
$ sudo lxd
WARN[03-18|01:48:37] - Couldn't find the CGroup hugetlb controller, hugepage limits will be ignored
WARN[03-18|01:48:37] - Couldn't find the CGroup network priority controller, network priority will be ignored
WARN[03-18|01:48:37] Instance type not operational type=virtual-machine driver=qemu err="KVM support is missing"
when I do not manually start:
$ sudo systemctl status snap.lxd.daemon
× snap.lxd.daemon.service - Service for snap application lxd.daemon
Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; static)
Active: failed (Result: exit-code) since Mon 2024-03-18 00:53:12 UTC; 33min ago
TriggeredBy: × snap.lxd.daemon.unix.socket
Process: 11516 ExecStart=/usr/bin/snap run lxd.daemon (code=exited, status=1/FAILURE)
Main PID: 11516 (code=exited, status=1/FAILURE)
CPU: 197ms
Mar 18 00:53:12 instance-20210714-1042 systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 5.
Mar 18 00:53:12 instance-20210714-1042 systemd[1]: Stopped Service for snap application lxd.daemon.
Mar 18 00:53:12 instance-20210714-1042 systemd[1]: snap.lxd.daemon.service: Start request repeated too quickly.
Mar 18 00:53:12 instance-20210714-1042 systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Mar 18 00:53:12 instance-20210714-1042 systemd[1]: Failed to start Service for snap application lxd.daemon.
Mar 18 01:12:42 instance-20210714-1042 systemd[1]: snap.lxd.daemon.service: Unit cannot be reloaded because it is inactive.
when I manually start:
$ sudo systemctl start snap.lxd.daemon
$ sudo systemctl status snap.lxd.daemon
● snap.lxd.daemon.service - Service for snap application lxd.daemon
Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; static)
Active: active (running) since Mon 2024-03-18 01:39:31 UTC; 1s ago
TriggeredBy: × snap.lxd.daemon.unix.socket
Main PID: 31752 (daemon.start)
Tasks: 0 (limit: 28539)
Memory: 324.0K
CPU: 247ms
CGroup: /system.slice/snap.lxd.daemon.service
‣ 31752 /bin/sh /snap/lxd/24065/commands/daemon.start
Mar 18 01:39:31 instance-20210714-1042 lxd.daemon[25952]: - cpuview_daemon
Mar 18 01:39:31 instance-20210714-1042 lxd.daemon[25952]: - loadavg_daemon
Mar 18 01:39:31 instance-20210714-1042 lxd.daemon[25952]: - pidfds
Mar 18 01:39:31 instance-20210714-1042 lxd.daemon[25952]: Reloaded LXCFS
Mar 18 01:39:31 instance-20210714-1042 lxd.daemon[31752]: => Re-using existing LXCFS
Mar 18 01:39:31 instance-20210714-1042 lxd.daemon[31752]: => Starting LXD
Mar 18 01:39:31 instance-20210714-1042 lxd.daemon[31897]: t=2024-03-18T01:39:31+0000 lvl=warn msg=" - Couldn't find the CGroup network priority controller, network priority will be ignored"
Mar 18 01:39:31 instance-20210714-1042 lxd.daemon[31897]: t=2024-03-18T01:39:31+0000 lvl=warn msg="Instance type not operational" driver=qemu err="KVM support is missing" type=virtual-machine
Mar 18 01:39:32 instance-20210714-1042 lxd.daemon[31897]: t=2024-03-18T01:39:32+0000 lvl=eror msg="Failed to start the daemon" err="Failed initializing storage pool \"default\": Required tool 'zpool' is missing"
Mar 18 01:39:33 instance-20210714-1042 lxd.daemon[31897]: Error: Failed initializing storage pool "default": Required tool 'zpool' is missing
$ lsmod | grep zfs
zfs 5341184 6
spl 172032 1 zfs
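Since the zfs module is loaded but LXD reports that zpool is missing, one thing worth comparing is the loaded module version against whatever zpool binary the host has. A minimal sketch, assuming the module exposes its version at /sys/module/zfs/version; the function name zfs_report is made up for illustration, and I have not confirmed that a version mismatch is the actual cause:

```shell
#!/bin/sh
# Print the loaded ZFS module version and whether a zpool binary is on PATH.
# The version file is a parameter (assumed default: /sys/module/zfs/version).
zfs_report() {
    module_version_file="${1:-/sys/module/zfs/version}"
    if [ -r "$module_version_file" ]; then
        echo "zfs module: $(cat "$module_version_file")"
    else
        echo "zfs module: not loaded"
    fi
    # command -v exits non-zero when the tool cannot be found
    if zpool_path="$(command -v zpool)"; then
        echo "zpool tool: $zpool_path"
    else
        echo "zpool tool: not found on PATH"
    fi
}

zfs_report
```

This only inspects the host; it says nothing about which zpool binary the snap itself tries to use.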
$ journalctl -u snap.lxd.daemon -n 300
Hint: You are currently not seeing messages from other users and the system.
Users in groups 'adm', 'systemd-journal' can see all messages.
Pass -q to turn off this notice.
-- No entries --
$ systemctl -a | grep lxd
sys-devices-virtual-net-lxdbr0.device loaded active plugged /sys/devices/virtual/net/lxdbr0
sys-subsystem-net-devices-lxdbr0.device loaded active plugged /sys/subsystem/net/devices/lxdbr0
run-snapd-ns-lxd.mnt.mount loaded active mounted /run/snapd/ns/lxd.mnt
snap-lxd-23999.mount loaded active mounted Mount unit for lxd, revision 23999
snap-lxd-24065.mount loaded active mounted Mount unit for lxd, revision 24065
var-snap-lxd-common-ns-mntns.mount loaded active mounted /var/snap/lxd/common/ns/mntns
var-snap-lxd-common-ns-shmounts.mount loaded active mounted /var/snap/lxd/common/ns/shmounts
var-snap-lxd-common-ns.mount loaded active mounted /var/snap/lxd/common/ns
snap.lxd.lxd-0d39690d-36aa-4bd2-bb97-62289b92ccf9.scope loaded active running snap.lxd.lxd-0d39690d-36aa-4bd2-bb97-62289b92ccf9.scope
● lxd-agent-9p.service not-found inactive dead lxd-agent-9p.service
lxd-agent.service loaded inactive dead LXD - agent
snap.lxd.activate.service loaded inactive dead Service for snap application lxd.activate
● snap.lxd.daemon.service loaded failed failed Service for snap application lxd.daemon
snap.lxd.workaround.service loaded active exited /bin/true
● snap.lxd.daemon.unix.socket loaded failed failed Socket unix for snap application lxd.daemon
$ sudo cat /var/snap/lxd/common/lxd/logs/lxd.log
t=2024-03-18T01:32:16+0000 lvl=info msg="LXD is starting" mode=normal path=/var/snap/lxd/common/lxd version=4.0.9
t=2024-03-18T01:32:16+0000 lvl=info msg="Kernel uid/gid map:"
t=2024-03-18T01:32:16+0000 lvl=info msg=" - u 0 0 4294967295"
t=2024-03-18T01:32:16+0000 lvl=info msg=" - g 0 0 4294967295"
t=2024-03-18T01:32:16+0000 lvl=info msg="Configured LXD uid/gid map:"
t=2024-03-18T01:32:16+0000 lvl=info msg=" - u 0 1000000 1000000000"
t=2024-03-18T01:32:16+0000 lvl=info msg=" - g 0 1000000 1000000000"
t=2024-03-18T01:32:16+0000 lvl=info msg="Kernel features:"
t=2024-03-18T01:32:16+0000 lvl=info msg=" - closing multiple file descriptors efficiently: yes"
t=2024-03-18T01:32:16+0000 lvl=info msg=" - netnsid-based network retrieval: yes"
t=2024-03-18T01:32:16+0000 lvl=info msg=" - pidfds: yes"
t=2024-03-18T01:32:16+0000 lvl=info msg=" - core scheduling: no"
t=2024-03-18T01:32:16+0000 lvl=info msg=" - uevent injection: yes"
t=2024-03-18T01:32:16+0000 lvl=info msg=" - seccomp listener: yes"
t=2024-03-18T01:32:16+0000 lvl=info msg=" - seccomp listener continue syscalls: yes"
t=2024-03-18T01:32:16+0000 lvl=info msg=" - seccomp listener add file descriptors: yes"
t=2024-03-18T01:32:16+0000 lvl=info msg=" - attach to namespaces via pidfds: yes"
t=2024-03-18T01:32:16+0000 lvl=info msg=" - safe native terminal allocation : yes"
t=2024-03-18T01:32:16+0000 lvl=info msg=" - unprivileged file capabilities: yes"
t=2024-03-18T01:32:16+0000 lvl=info msg=" - cgroup layout: cgroup2"
t=2024-03-18T01:32:16+0000 lvl=warn msg=" - Couldn't find the CGroup network priority controller, network priority will be ignored"
t=2024-03-18T01:32:16+0000 lvl=info msg=" - shiftfs support: disabled"
t=2024-03-18T01:32:16+0000 lvl=warn msg="Instance type not operational" driver=qemu err="KVM support is missing" type=virtual-machine
t=2024-03-18T01:32:16+0000 lvl=info msg="Initializing local database"
t=2024-03-18T01:32:16+0000 lvl=info msg="Set client certificate to server certificate" fingerprint=9c43c4e47646714340c408b2ca86ccde3c6ea6eb8a96d774a16e0b377c47dee3
t=2024-03-18T01:32:16+0000 lvl=info msg="Starting database node" id=1 local=1 role=voter
t=2024-03-18T01:32:17+0000 lvl=info msg="Starting /dev/lxd handler:"
t=2024-03-18T01:32:17+0000 lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
t=2024-03-18T01:32:17+0000 lvl=info msg="REST API daemon:"
t=2024-03-18T01:32:17+0000 lvl=info msg=" - binding Unix socket" socket=/var/snap/lxd/common/lxd/unix.socket
t=2024-03-18T01:32:17+0000 lvl=info msg=" - binding TCP socket" socket=[::]:8443
t=2024-03-18T01:32:17+0000 lvl=info msg="Initializing global database"
t=2024-03-18T01:32:17+0000 lvl=info msg="Connecting to global database"
t=2024-03-18T01:32:17+0000 lvl=info msg="Connected to global database"
t=2024-03-18T01:32:17+0000 lvl=info msg="Initialized global database"
t=2024-03-18T01:32:17+0000 lvl=info msg="Firewall loaded driver" driver=nftables
t=2024-03-18T01:32:17+0000 lvl=info msg="Initializing storage pools"
t=2024-03-18T01:32:17+0000 lvl=eror msg="Failed to start the daemon" err="Failed initializing storage pool \"default\": Required tool 'zpool' is missing"
t=2024-03-18T01:32:17+0000 lvl=info msg="Starting shutdown sequence" signal=interrupt
t=2024-03-18T01:32:17+0000 lvl=info msg="Closing the database"
t=2024-03-18T01:32:17+0000 lvl=info msg="Stop database gateway"
t=2024-03-18T01:32:17+0000 lvl=info msg="Stopping REST API handler:"
t=2024-03-18T01:32:17+0000 lvl=info msg=" - closing socket" socket=[::]:8443
t=2024-03-18T01:32:17+0000 lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/unix.socket
t=2024-03-18T01:32:17+0000 lvl=info msg="Stopping /dev/lxd handler:"
t=2024-03-18T01:32:17+0000 lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
t=2024-03-18T01:32:17+0000 lvl=info msg="Unmounting temporary filesystems"
t=2024-03-18T01:32:17+0000 lvl=info msg="Done unmounting temporary filesystems"
t=2024-03-18T01:32:17+0000 lvl=info msg="Daemon stopped"
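The log above is long, and the only error-level entry in it is the storage pool failure. A minimal sketch for pulling just those entries out of the file, assuming error lines are always tagged lvl=eror as they are here (the function name lxd_log_errors is hypothetical):

```shell
#!/bin/sh
# Print only the error-level lines from an LXD log file.
# LXD spells the level "eror" in this log format, so match that literally.
lxd_log_errors() {
    grep 'lvl=eror' "$1"
}

# Usage (needs root to read the snap's log):
#   sudo sh -c '. ./this_script.sh; lxd_log_errors /var/snap/lxd/common/lxd/logs/lxd.log'
```

On the log shown here, this would leave only the "Failed to start the daemon" line about the missing zpool tool.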
$ sudo systemctl daemon-reload
$ sudo systemctl stop snap.lxd.daemon.service
$ sudo systemctl stop snap.lxd.daemon.unix.socket
$ sudo snap refresh lxd
snap "lxd" has no updates available
$ sudo systemctl reload snap.lxd.daemon
snap.lxd.daemon.service is not active, cannot reload.
$ ps aux | grep lxd.*logfile
ubuntu+ 25795 0.0 0.0 6424 1920 pts/3 S+ 01:36 0:00 grep --color=auto lxd.*logfile
$ sudo snap connections lxd
Interface Plug Slot Notes
lxd - lxd:lxd -
lxd-support lxd:lxd-support :lxd-support -
network lxd:network :network -
network-bind lxd:network-bind :network-bind -
system-observe lxd:system-observe :system-observe -
$ getent group lxd
lxd:x:118:ubuntu
$ uname -a
Linux instance-20210714-1042 6.5.0-1018-oracle #18~22.04.1-Ubuntu SMP Sat Feb 17 22:00:50 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
EDIT:
I am even more confused now. After another restart (I had already tried a few times before), I get:
$ lxc list
Error: Get "http://unix.socket/1.0": EOF
But when I now run just
$ sudo lxd
even without the --group lxd, all the lxc commands work fine.
After I interrupt that lxd process, I get:
$ lxc list
Error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: no such file or directory
Now plain sudo lxd does not work at all anymore:
$ sudo lxd
WARN[03-18|02:24:16] - Couldn't find the CGroup hugetlb controller, hugepage limits will be ignored
WARN[03-18|02:24:16] - Couldn't find the CGroup network priority controller, network priority will be ignored
WARN[03-18|02:24:16] Instance type not operational type=virtual-machine driver=qemu err="KVM support is missing"
EROR[03-18|02:24:17] Failed to start the daemon err="Failed to start dqlite server: raft_start(): io: load closed segment 0000000000045726-0000000000045731: entries batch 13 starting at byte 123992: entries count in preamble is zero"
Error: Failed to start dqlite server: raft_start(): io: load closed segment 0000000000045726-0000000000045731: entries batch 13 starting at byte 123992: entries count in preamble is zero
EDIT 2:
I have now deleted the offending segment 0000000000045726-0000000000045731 from /var/snap/lxd/common/lxd/database/global, and lxc works again if I manually run sudo lxd --group lxd. However, I still have the original issue: the systemd units snap.lxd.daemon and snap.lxd.daemon.unix.socket fail to start.
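For anyone in the same situation: deleting a raft segment is destructive, so taking a copy of the database directory first seems prudent. A minimal sketch, assuming LXD is stopped and the command is run as root; backup_dir is a hypothetical helper, not an LXD tool:

```shell
#!/bin/sh
# Copy a directory to a timestamped sibling before modifying it.
# Prints the path of the copy; returns non-zero if the copy fails.
backup_dir() {
    src="${1%/}"                                  # drop a trailing slash, if any
    dst="${src}.bak.$(date +%Y%m%d-%H%M%S)"
    cp -a "$src" "$dst" && echo "$dst"
}

# Usage (as root, with LXD stopped):
#   backup_dir /var/snap/lxd/common/lxd/database
```

The copy sits next to the original, so it can be moved back into place if removing a segment makes things worse.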