Error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: connection refused

Yesterday I was testing listening on local IP addresses, e.g. lxc config set core.https_address. I am sure I restored it. Then I was playing around with an Alpine VM; after adding a proxy and restarting it, I started getting strange errors in the VM, so I just left it. Today when booting up, LXD is no longer working.

How do I debug? I can't run any lxc commands.

notroot@desktop:~$ lxc list
Error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: connection refused

Ubuntu 20.04 desktop.
Lxc 4.0.6

Can you show journalctl -u snap.lxd.daemon -n 30?
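
For reference, when the unix socket refuses connections the usual first checks are roughly these (a sketch, assuming the snap packaging of LXD):

sudo systemctl status snap.lxd.daemon      # is the daemon running at all?
sudo journalctl -u snap.lxd.daemon -n 30   # last 30 lines from the systemd unit
sudo snap logs lxd -n 30                   # the snap's own view of the same logs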

Hmm, it seems my storage has gone. I must have done something in the wrong window :man_facepalming:.

It should be directory-based storage on this computer. Is there a way to point it back, or do I just reinstall?

-- Logs begin at Sat 2021-06-05 16:25:33 CEST, end at Thu 2021-07-15 16:10:51 CEST. --
Jul 15 06:32:30 desktop lxd.daemon[2158]: Kernel supports pidfds
Jul 15 06:32:30 desktop lxd.daemon[2158]: Kernel supports swap accounting
Jul 15 06:32:30 desktop lxd.daemon[2158]: api_extensions:
Jul 15 06:32:30 desktop lxd.daemon[2158]: - cgroups
Jul 15 06:32:30 desktop lxd.daemon[2158]: - sys_cpu_online
Jul 15 06:32:30 desktop lxd.daemon[2158]: - proc_cpuinfo
Jul 15 06:32:30 desktop lxd.daemon[2158]: - proc_diskstats
Jul 15 06:32:30 desktop lxd.daemon[2158]: - proc_loadavg
Jul 15 06:32:30 desktop lxd.daemon[2158]: - proc_meminfo
Jul 15 06:32:30 desktop lxd.daemon[2158]: - proc_stat
Jul 15 06:32:30 desktop lxd.daemon[2158]: - proc_swaps
Jul 15 06:32:30 desktop lxd.daemon[2158]: - proc_uptime
Jul 15 06:32:30 desktop lxd.daemon[2158]: - shared_pidns
Jul 15 06:32:30 desktop lxd.daemon[2158]: - cpuview_daemon
Jul 15 06:32:30 desktop lxd.daemon[2158]: - loadavg_daemon
Jul 15 06:32:30 desktop lxd.daemon[2158]: - pidfds
Jul 15 06:32:30 desktop lxd.daemon[2158]: Reloaded LXCFS
Jul 15 06:32:30 desktop lxd.daemon[2828]: => Re-using existing LXCFS
Jul 15 06:32:30 desktop lxd.daemon[2828]: => Starting LXD
Jul 15 06:32:30 desktop lxd.daemon[2958]: t=2021-07-15T06:32:30+0200 lvl=warn msg=" - Couldn't find the CGroup blkio.weight, disk priority will be ignored"
Jul 15 06:32:30 desktop lxd.daemon[2958]: t=2021-07-15T06:32:30+0200 lvl=eror msg="Failed to start the daemon: Failed initializing storage pool \"default\": Failed to run: z>
Jul 15 06:32:30 desktop lxd.daemon[2958]: Error: Failed initializing storage pool "default": Failed to run: zpool import lxd: cannot import 'lxd': no such pool available
Jul 15 06:32:31 desktop lxd.daemon[2828]: => LXD failed to start
Jul 15 06:32:31 desktop systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Jul 15 06:32:31 desktop systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Jul 15 06:32:31 desktop systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 5.
Jul 15 06:32:31 desktop systemd[1]: Stopped Service for snap application lxd.daemon.
Jul 15 06:32:31 desktop systemd[1]: snap.lxd.daemon.service: Start request repeated too quickly.
Jul 15 06:32:31 desktop systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Jul 15 06:32:31 desktop systemd[1]: Failed to start Service for snap application lxd.daemon.

My setup

  • /data is the mount point for sdb
  • /data/zfs/lxdpool.image

I can't run sudo lxd init. The lxd pool does not contain any important data. I use this for testing, so I need it to behave like a clean install. If I uninstall and reinstall, will previous data/settings remain?
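
For reference, a truly clean slate on the snap package would look roughly like this (a sketch; removing with --purge should discard the snap's data rather than keeping a snapshot of it):

sudo snap remove --purge lxd   # --purge skips the automatic snapshot, so old data/settings are dropped
sudo snap install lxd
sudo lxd init                  # start again from a fresh configuration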

So /data/zfs/lxdpool.image is a loop file that provides that lxd storage pool?

If so, maybe just run zpool import lxd -d /data/zfs and see if that loads it?
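
Roughly, assuming the pool really does live in that loop file, the sequence would be something like:

sudo zpool import -d /data/zfs           # list pools importable from that directory
sudo zpool import -d /data/zfs lxd       # import the pool LXD expects
sudo zpool status lxd                    # confirm it shows as ONLINE
sudo systemctl restart snap.lxd.daemon   # let LXD retry initializing the "default" pool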

That command worked, but I am now getting another error. Something went wrong yesterday, not sure what. I presume I did lxd init in the wrong window or something, but I am not so sure. I just remember things started going wrong when playing around with an Alpine VM, and I left it at that.

$ lxc start c1
Error: Failed preparing container for start: Failed to start device "eth0": Failed to run: ovs-vsctl --may-exist add-port vnet0 veth3b65555e: ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)
Try `lxc info --show-log c1` for more info

$ lxc info --show-log c1
Name: c1
Location: none
Remote: unix://
Architecture: x86_64
Created: 2021/07/14 11:32 UTC
Status: Stopped
Type: container
Profiles: 

Log:



$ lxc info --show-log vm1
Name: vm1
Location: none
Remote: unix://
Architecture: x86_64
Created: 2021/07/14 11:54 UTC
Status: Stopped
Type: virtual-machine
Profiles: 
Error: open /var/snap/lxd/common/lxd/logs/vm1/qemu.log: no such file or directory
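
Since the ovs-vsctl error means the Open vSwitch database socket is missing, it is worth checking whether the openvswitch service is running at all (a sketch, assuming the host uses Ubuntu's openvswitch-switch package):

sudo systemctl status openvswitch-switch   # is the OVS database/daemon up?
sudo systemctl start openvswitch-switch    # start it if it is not
sudo ovs-vsctl show                        # should now connect to the database socket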

Can you show lxc config show --expanded c1 and lxc network list?

lxc config show --expanded c1
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Alpine 3.13 amd64 (20210713_13:00)
  image.os: Alpine
  image.release: "3.13"
  image.serial: "20210713_13:00"
  image.type: squashfs
  image.variant: default
  limits.cpu: "1"
  limits.memory: 1GB
  security.secureboot: "false"
  volatile.base_image: 4694810deab600c56d8b660eba859fed6130de06b26697bb07e24d175a56e84e
  volatile.eth0.host_name: veth584eab88
  volatile.eth0.hwaddr: 00:16:3e:7f:1c:bd
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: STOPPED
  volatile.uuid: fb4f1cc5-6a5d-4834-84fa-5ced9f187895
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: vnet0
    type: nic
  root:
    path: /
    pool: default
    size: 20GB
    type: disk
ephemeral: false
profiles: []
stateful: false
description: ""

+--------+----------+---------+-----------------------+---------+
|  NAME  |   TYPE   | MANAGED |      DESCRIPTION      | USED BY |
+--------+----------+---------+-----------------------+---------+
| eno1   | physical | NO      |                       | 0       |
+--------+----------+---------+-----------------------+---------+
| lxdbr0 | bridge   | YES     |                       | 1       |
+--------+----------+---------+-----------------------+---------+
| virbr0 | bridge   | NO      |                       | 0       |
+--------+----------+---------+-----------------------+---------+
| vnet0  | bridge   | YES     |                        | 3       |
+--------+----------+---------+-----------------------+---------+

The config data looks really thin. I was doing a few lxc config set core.https_address changes on that host, such as the local IP address and so forth, and then switching back. So I am not sure if that might have caused the corruption or if it was just me…
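
For what it is worth, the listen address itself is easy to check and clear, and changing it should not touch bridges or instance config (a sketch):

lxc config get core.https_address     # show the current listen address; empty means unset
lxc config unset core.https_address   # go back to the local unix socket only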

Ok, can you show lxc network show vnet0 and ip link?

lxc network show vnet0
config:
  ipv4.address: 10.0.0.1/24
  ipv4.nat: "true"
  ipv6.address: none
  ipv6.nat: "true"
description: 
name: vnet0
type: bridge
used_by:
- /1.0/instances/c1
- /1.0/instances/c2
- /1.0/instances/c3
managed: true
status: Created
locations:
- none

Interesting, I have lost IPv4 traffic on the desktop. I definitely did not touch the network settings.

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 18:c0:4d:97:d8:27 brd ff:ff:ff:ff:ff:ff
    altname enp0s31f6
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:bf:69:5a brd ff:ff:ff:ff:ff:ff
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:bf:69:5a brd ff:ff:ff:ff:ff:ff
5: lxdbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:16:3e:59:b8:46 brd ff:ff:ff:ff:ff:ff
6: vnet0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:16:3e:89:1f:cd brd ff:ff:ff:ff:ff:ff

Any idea why all your interfaces appear to be down? That’s not something that LXD would normally do, so it may suggest some network management tool had some fun.

I checked the netplan config; it says it's handled by NetworkManager. Despite it being a desktop, I am using it remotely. The only things I was doing yesterday were testing the listen address for LXD, to find out if there was a better or more secure way than just listening on everything, and testing LXD VMs. Then I started to get strange errors inside the Alpine VM, noticed Apache was not working and could not reinstall it. I am just going to delete it and next time just use VMs for development. I have 6 LXD hosts set up, and this has not happened before. Note I am using both LXD and KVM virtual machines on the desktop, so maybe the wires got crossed with the LXD virtual machines somehow and that sent NetworkManager into a tailspin.
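
If it happens again, a quick look at what NetworkManager thinks it is doing might narrow it down (a sketch, assuming the stock Ubuntu desktop tooling):

nmcli general status                    # overall NetworkManager state and connectivity
nmcli device status                     # per-device state; shows anything unmanaged or disconnected
sudo systemctl restart NetworkManager   # blunt, but a harmless way to make it re-apply its connections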

This morning I created a container and it started without problems; I then tried to start an old container and it also worked without problems. So I am not sure what happened, but it seems to have resolved itself.