LXD VM problems - LXD-agent connect fail & shiftfs for lxd inside vm

Hello,

I started a new VM with:

lxc launch images:debian/11 kerneltest2 --vm

Now I have two problems:

  1. Sometimes the LXD-agent is not available:

Update:
Intermediate workaround:
Note: This may already be fixed in the images; only apply it if you experience the problem.
The workaround is described here:


Error: Failed to connect to lxd-agent

This appears (very often, but not always) when I restart a VM and try to connect to it via:

lxc exec kerneltest2 bash

Once it occurs, I can’t connect via bash anymore, no matter how often I restart the VM. I have to restart the host.

Notes:

  • The first start of the VM works most of the time.
  • It seems to occur less often when I wait longer between stopping and starting the VM.
  • I can always still connect via “lxc console”.

Something that could be related to the problem is this error (shown in the console when stopping the vm):

[FAILED] Failed unmounting /run/lxd_config/9p

But this message also occurs when the bash command is still working.

An additional problem:
I can’t log in as root, because “lxc console” asks for a root password, which I don’t have (I will try to (re)set the password).

System Info (Host):
OS: Debian Testing
Kernel: 5.5.0-1-amd64
LXD: 4.0.1 (snap)

  2. I can’t activate shiftfs for LXD inside the VM:

This is a special case of course, but I wanted to test some kernel modules, so I installed shiftfs via my dkms script (seemed to work well) and then I wanted to activate shiftfs in lxd.
I ran the commands:

sudo snap set lxd shiftfs.enable=true
sudo systemctl reload snap.lxd.daemon

But “lxc info” still shows (also after restarts and retries):

shiftfs: “false”


Update:
Solution:
For others reading this: the problem was that I had installed unsigned kernel modules via dkms.
The kernel modules (in this case shiftfs) were then blocked by Secure Boot (which is enabled by default in many VM images).
The solution is to disable Secure Boot in the VM.
You can do this with the following command:

lxc config set [name of vm] security.secureboot=false
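
To confirm whether Secure Boot is actually active inside the VM before and after changing the setting, something like the following might help. It is only a sketch: it assumes the mokutil tool is installed in the guest and that its --sb-state output contains "SecureBoot enabled" or "SecureBoot disabled". The helper name sb_state is made up for illustration.

```shell
#!/bin/sh
# Classify the text printed by `mokutil --sb-state`.
# Assumption: mokutil is installed inside the VM.
# sb_state is a hypothetical helper, not part of LXD or mokutil.
sb_state() {
    case "$1" in
        *"SecureBoot enabled"*)  echo "enabled" ;;
        *"SecureBoot disabled"*) echo "disabled" ;;
        *)                       echo "unknown" ;;
    esac
}

# Usage inside the VM (not executed here):
#   sb_state "$(mokutil --sb-state 2>&1)"
```

The VM most likely has to be stopped and started again before a change to security.secureboot takes effect.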

Hi,

Have you followed the steps in “Running virtual machines with LXD 4.0”? It sounds like your VM doesn’t have the lxd-agent installed.

As for shiftfs, what OS and kernel are you running?

Yes.

And LXD-agent is installed.
As I said, on first start (after booting my host) of the vm it works, but when I stop and start the VM again, it does not work anymore (most of the time).

OK, so can you see the lxd-agent process running inside the VM when you get the errors? If not, are there any logs from systemd inside the VM showing the process not starting or failing to start?

To clarify (because it is easy to overlook): I wanted to activate shiftfs inside the VM.
uname -a inside the VM shows:

Linux kerneltest 5.5.0-1-amd64 #1 SMP Debian 5.5.13-2 (2020-03-30) x86_64 GNU/Linux

The Debian kernel doesn’t have shiftfs support, so that’s normal.

Not sure what’s going on with the agent, can you get a shell through lxc console when that happens and look at systemctl --failed?
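
A small sketch of how that check could be narrowed down from the console shell. It assumes the agent runs as a systemd unit whose name contains "lxd-agent" (an assumption about the image, not confirmed in this thread). The helper name agent_units_failed is hypothetical.

```shell
#!/bin/sh
# Read `systemctl --failed` output on stdin and print only the lines
# that mention lxd-agent. Exits non-zero if no such line is found.
# Assumption: the agent's unit name contains "lxd-agent".
agent_units_failed() {
    grep -i 'lxd-agent'
}

# Usage inside the VM, via "lxc console" (not executed here):
#   systemctl --failed --no-legend | agent_units_failed
#   journalctl -b -u lxd-agent     # logs of the unit for the current boot
```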

I installed the shiftfs module via dkms (with my script here on github) and checked via “modinfo shiftfs”.
I should add that my script works on the host, so I assume it must have something to do with the VM, LXD, or the Debian VM image I use.
Still, shouldn’t LXD (or snap) show some error if it could not activate shiftfs?

Hmm, this is weird. The snap set should have done it.

Can you show cat /proc/$(cat /var/snap/lxd/common/lxd.pid)/environ | tr "\0" "\n" ?

I want to make sure that LXD_SHIFTFS_DISABLE isn’t set.
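
If you would rather test for that one variable than read the whole dump, a sketch like this could do it. The helper name env_has_var is made up. The pid file path is the one from the command above.

```shell
#!/bin/sh
# env_has_var FILE NAME
# Succeeds if the NUL-separated environ FILE contains a variable NAME.
# env_has_var is a hypothetical helper for illustration only.
env_has_var() {
    tr '\0' '\n' < "$1" | grep -q "^$2="
}

# Usage on the machine running the LXD snap (not executed here):
#   env_has_var "/proc/$(cat /var/snap/lxd/common/lxd.pid)/environ" LXD_SHIFTFS_DISABLE \
#       && echo "LXD_SHIFTFS_DISABLE is set"
```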

The rest of the check is effectively just looking for shiftfs in /proc/filesystems and if not there, it attempts a modprobe shiftfs.
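
Done by hand, that check would look roughly like the sketch below. This is not LXD's actual code, just the same idea in shell. The helper name fs_registered is hypothetical.

```shell
#!/bin/sh
# fs_registered NAME [FILE]
# Succeeds if filesystem NAME is listed in FILE (default: /proc/filesystems).
# Each line is either "nodev <name>" or just "<name>", so compare the last field.
fs_registered() {
    awk -v fs="$1" '$NF == fs { found = 1 } END { exit !found }' "${2:-/proc/filesystems}"
}

# Usage as root inside the VM (not executed here):
#   fs_registered shiftfs || modprobe shiftfs
```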

Shows this:

ARCH=x86_64-linux-gnu

SNAP_COMMON=/var/snap/lxd/common
SNAP_INSTANCE_KEY=
LXD_LXC_TEMPLATE_CONFIG=/snap/lxd/current/lxc/config/
TEMPDIR=/tmp
LD_LIBRARY_PATH=/var/lib/snapd/lib/gl:/var/lib/snapd/lib/gl32:/var/lib/snapd/void:/snap/lxd/14804/lib:/snap/lxd/14804/lib/x86_64-linux-gnu:/snap/lxd/14804/lib/x86_64-linux-gnu/ceph:/snap/lxd/14804/zfs-0.6/lib::/snap/lxd/14804/lib:/snap/lxd/14804/lib/x86_64-linux-gnu:/snap/lxd/current/lib:/snap/lxd/current/lib/x86_64-linux-gnu:/snap/lxd/current/lib/x86_64-linux-gnu/ceph
LISTEN_FDS=1
SNAP_LIBRARY_PATH=/var/lib/snapd/lib/gl:/var/lib/snapd/lib/gl32:/var/lib/snapd/void
LISTEN_PID=702
HOME=/tmp/
SNAP_USER_DATA=/root/snap/lxd/14804
LXD_EXEC_PATH=/snap/lxd/current/bin/lxd
LXD_DIR=/var/snap/lxd/common/lxd/
SNAP_REVISION=14804
TMPDIR=/tmp
SNAP_CURRENT=/snap/lxd/current
XTABLES_LIBDIR=/snap/lxd/current/lib/xtables/
JOURNAL_STREAM=9:7458
SNAP_CONTEXT=4HYDz6h0oKVUZEtmrbDzG-r9MN3vzwsxNd7oU2FLlmhORSDM9wF1
SNAP_VERSION=4.0.1
SNAP_INSTANCE_NAME=lxd
PATH=/snap/lxd/14804/usr/sbin:/snap/lxd/14804/usr/bin:/snap/lxd/14804/sbin:/snap/lxd/14804/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/lxd/current/bin
INVOCATION_ID=02b7a710525a433b913073338e68658d
LISTEN_FDNAMES=unix
SNAP_DATA=/var/snap/lxd/14804
XDG_RUNTIME_DIR=/run/user/0/snap.lxd
LXD_CLUSTER_UPDATE=/snap/lxd/current/commands/refresh
LANG=en_US.UTF-8
SNAP_ARCH=amd64
SNAP_USER_COMMON=/root/snap/lxd/common
SNAP_COOKIE=4HYDz6h0oKVUZEtmrbDzG-r9MN3vzwsxNd7oU2FLlmhORSDM9wF1
SNAP_REEXEC=
LXD_OVMF_PATH=/snap/lxd/current/share/qemu
SNAP_NAME=lxd
LXD_LXC_HOOK=/snap/lxd/current/lxc/hooks/
PWD=/var/snap/lxd/14804
SNAP=/snap/lxd/14804

Ok, so the snap appears to have done the right thing.

Do you have shiftfs in /proc/filesystems?

No.
You are right, somehow the module is not installed correctly.
Strange. Why does it work on the host and not in the VM…

Is it in /proc/modules?

No.

What does modprobe shiftfs do?

filename: /lib/modules/5.5.0-1-amd64/updates/dkms/shiftfs.ko
license: GPL v2
description: id shifting filesystem
author: Christian Brauner <christian.brauner@ubuntu.com>
author: Seth Forshee <seth.forshee@canonical.com>
author: James Bottomley
alias: fs-shiftfs
depends:
retpoline: Y
name: shiftfs
vermagic: 5.5.0-1-amd64 SMP mod_unload modversions

That doesn’t look like modprobe shiftfs output.

Sorry, I mistook modprobe for modinfo.
Shows an error:

modprobe: ERROR: could not insert ‘shiftfs’: Operation not permitted

That’s your problem then.

What does dmesg show at the bottom of it?
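
To pick the relevant lines out of a long dmesg, a filter along these lines may help. The search patterns are guesses at typical kernel messages about blocked or unsigned modules, not an exhaustive list. The helper name lockdown_hints is made up.

```shell
#!/bin/sh
# Read kernel log text on stdin and print lines that hint at module
# loading being blocked (lockdown, signature checks, Secure Boot).
# The patterns are assumptions about typical messages.
lockdown_hints() {
    grep -i -E 'lockdown|module verification failed|secure ?boot'
}

# Usage as root inside the VM (not executed here):
#   dmesg | tail -n 50 | lockdown_hints
```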