Failure with readonly mount on VM

I created an issue to track this:

Thank you Thomas!
Just looked at the bug report and have a small correction (not sure whether it matters): the host wher this error occurs is running Debian 11. The VM guest is Ubuntu 20.04.

I have reproduced this on Ubuntu 20.04 without the snap as well.

It looks like an AppArmor issue btw.

This should fix it:

Is it, as a workaround until the fixed version of lxd is released, possible to force in the configuration of the mount that the 9p driver (instead of virtio-fs) should be used? How would this be done?

The mount with readonly=false should work fine, but the fix is required for readonly=true.

I know that ‘readonly=false’ works, but I am looking for a workaround for ‘readonly=true’.

Disabling AppArmor would provide a workaround, but beyond that I do not know of one, short of the fix itself.

You could in principle mount the source directory as read only and then share that as readonly=false.

You can also use raw.apparmor to directly allow the missing path for that instance.


I’d need the host to have write access to the path, but the VM to have only read-only access. So I tried to do a bind mount of the host path /srv with option ‘ro’ to a new path /srv_ro and then specify /srv_ro as source argument in the lxd configuration. Now the VM starts up fine, but still has write access to /srv_ro, ignoring the read-only mount option, so this workaround does not work for me.
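For reference, a read-only bind mount generally needs a two-step remount, since the ro flag is ignored on the initial bind. A minimal self-contained sketch (run in a private mount namespace, using /tmp paths in place of the /srv and /srv_ro paths from the post):

```shell
# Minimal sketch of a read-only bind mount (paths are illustrative).
# The ro flag is not applied by the initial bind, so a second
# remount step is needed to make the view read-only.
unshare -rm sh -c '
  mkdir -p /tmp/srv /tmp/srv_ro
  mount --bind /tmp/srv /tmp/srv_ro
  mount -o remount,bind,ro /tmp/srv_ro
  touch /tmp/srv_ro/file 2>/dev/null && echo writable || echo read-only
'
```

If both mounts succeed, the final touch fails with EROFS and “read-only” is printed. Note this alone does not explain the behaviour seen here; as discussed later in the thread, the snap-packaged LXD runs in its own mount namespace and does not see bind mounts made in the host namespace.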

@stgraber: I tried your suggestion but could not get it working. I issued:

printf "/var/lib/snapd/hostfs/srv rw," | lxc config set vm raw.apparmor -

but get the same error message when starting the VM, and the log complains about the very same path not being accessible. It looks like my raw.apparmor setting gets ignored.

This works for me; note the trailing slash, and the r rather than rw (as QEMU opens the directory in read-only mode, as intended).

lxc config device set <instance> <disk> readonly=true
printf "/var/lib/snapd/hostfs/srv/ r," | lxc config set <instance> raw.apparmor -
lxc start <instance>

This PR adds automated tests for this functionality:

I can confirm that this works in that it makes the VM start up without an error. (Though I don’t understand why giving more AppArmor permissions breaks it.)

However, the read-only protection of the path is easy to circumvent in the VM: If I issue

mount -o remount,rw /srv

in the VM, the VM will still be able to write to /srv. It appears that all the readonly flag does is signal to the VM that it should mount the path read-only. That means the VM is in control, but for security reasons I need the host to be in control.

Can you show the output of mount | grep srv inside the VM?

My suspicion is that you’re also being affected by a different bug also fixed in VM: Fix readonly disk shares by tomponline · Pull Request #8810 · lxc/lxd · GitHub

If so:

Short version:
I can’t think of a workaround that will work properly and you’re going to need to wait until that patch lands in the snap channel you’re using.

Long version:

  1. When we originally added directory sharing support we only used 9p sharing, which supports a readonly property. However, because we run the QEMU process as non-root, directories not accessible to that unprivileged user could not be shared. To work around this, QEMU provides the virtfs-proxy-helper process, which we can start as root; the QEMU process then uses it to access the directory on the host.
  2. However, whilst adding support for this we discovered a bug in QEMU that meant that when using the virtfs-proxy-helper process the readonly property was ignored. We filed a bug upstream (which has since been fixed), and added an exception in LXD that avoided using virtfs-proxy-helper when readonly=true. This meant that only globally accessible directories could be shared as readonly, but at least it wasn’t a security issue.
  3. The problem with 9p, though, is that it is not particularly performant, so later we added optional support for virtiofs, which uses a different proxy process called virtiofsd that runs as root to allow access to directories not owned by the unprivileged user the QEMU process runs as.
  4. When virtiofsd is available we launch both it and the 9p virtfs-proxy-helper so the guest can use either. The lxd-agent process running inside the VM prefers the virtiofs share and, if that doesn’t work (due to lack of support on the host or inside the guest OS), falls back to the 9p share.
  5. The lxd-agent process mounts the share as ro if readonly=true, but we have always been aware that this is not a security feature, just a nicety to help indicate to the guest that this is a read-only directory.
  6. Later we also added AppArmor support to the QEMU process (but not the proxy processes). This is what prevented starting the VM when using a readonly=true share: due to 2. we weren’t using the proxy process for read-only shares, and so QEMU tried to access a directory that wasn’t allowed by its AppArmor profile. As an aside, the AppArmor rules have to exactly match the type of file open operation that is occurring (i.e. allowing rw access doesn’t allow r-only access).
  7. Because the upstream bug in virtfs-proxy-helper that meant the readonly setting was not respected is now fixed, yesterday I switched our 9p share to always use virtfs-proxy-helper. In the process I discovered two more bugs. Firstly, the virtiofs share doesn’t support readonly in QEMU, so although the 9p share was correctly being started in readonly mode, the virtiofs share was still writable (this is what I think you’re seeing now when remounting the share as rw). Secondly, due to a race condition in lxd-agent when loading the vsock kernel module, the lxd-agent could exit during the boot process (and be restarted by systemd), with the effect of initially mounting the share using virtiofs and then, on the subsequent restart, attempting to mount it again as virtiofs (and failing) and falling back to mounting it as 9p (causing a second mount over the same path). This behaviour depended on how quickly the vsock kernel module loaded.
  8. Because in the released LXD versions the virtiofs share is available (even if not mounted), when you modify the raw.apparmor setting to allow the QEMU process to start while directly accessing the 9p share, someone who is root inside the guest can then choose to mount the virtiofs share and would be able to write to the readonly share.
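Point 8 can be demonstrated from inside an affected guest: a root user can mount the still-writable virtiofs share directly. A sketch, assuming the lxd_srv share tag (which matches the mount output quoted later in this thread); the mount point is an arbitrary choice:

```shell
# Inside the VM guest, as root (sketch, requires an affected LXD version).
# "lxd_srv" is the virtiofs share tag; /mnt/rw-escape is arbitrary.
mkdir -p /mnt/rw-escape
mount -t virtiofs lxd_srv /mnt/rw-escape

# On affected versions the virtiofs share ignored readonly=true,
# so this write to the nominally read-only share would succeed:
touch /mnt/rw-escape/proof-of-write
```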

As this is a somewhat complex dance between LXD, QEMU and the lxd-agent, I’ve added some automated tests above to try and catch any future regressions in all 3 of the subsystems.

The output of mount | grep srv is:

lxd_srv on /srv type virtiofs (ro,relatime)

Thank you also for the full explanation. I think I understand now: 9p supports read-only access, but virtiofs always provides read-write access, so LXD should use 9p for proper read-only access. Due to a bug in an underlying component, LXD uses virtiofs in this case anyway. By default this fails on AppArmor (so AppArmor enforces the read-only security), but when I set raw.apparmor I override the AppArmor profile, so it can use virtiofs and thus provides full read-write access to the VM. Finally, lxd-agent sets the ro mount flag in the VM, but being a VM process it of course can’t enforce security.

I take it then that there isn’t really a workaround.

I am willing to help with testing if that helps. So, when there is a snap channel with the fixes included that I can try, please let me know.


The following PRs ensure that readonly=true disk devices are now truly read-only even when using one of the QEMU proxy daemons to work around AppArmor profile and unprivileged user limitations.

We now use a host-side readonly bind mount of the source directory, which is passed to the virtfs-proxy-helper (for 9p) and virtiofsd (for virtio-fs) shares. This provides a “belt and braces” approach, using the Linux kernel itself to enforce readonly access rather than relying solely on QEMU’s security restrictions.

And associated test updates:

The reason bind mounting didn’t work for you is that LXD from the snap runs inside its own mount namespace, so it did not see the bind mount you set up in the host’s mount namespace. LXD now sets up its own bind mount, and because this is done inside the snap’s mount namespace it will take effect.
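Conceptually, the fix described above is equivalent to something like the following on the host. This is a simplified sketch, not LXD’s actual code; the view path is a hypothetical placeholder and the daemon invocation is only indicated in comments:

```shell
# Simplified sketch of the new host-side flow (illustrative paths;
# requires root on an LXD host, not meant to be run as-is).
SRC=/srv                              # the disk device's source path
RO_VIEW=/run/lxd-shares/vm-srv-ro     # hypothetical read-only view

mkdir -p "$RO_VIEW"
mount --bind "$SRC" "$RO_VIEW"
mount -o remount,bind,ro "$RO_VIEW"

# The proxy daemons are then pointed at the read-only view, so even a
# guest that bypasses QEMU's own checks gets EROFS from the kernel, e.g.:
#   virtiofsd --socket-path=/run/virtiofsd.sock -o source="$RO_VIEW"
```

Any write that reaches the share through either proxy then fails in the kernel, regardless of what the guest mounts or remounts.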