I created an issue to track this:
Thank you Thomas!
Just looked at the bug report and have a small correction (not sure whether it matters): the host where this error occurs is running Debian 11. The VM guest is Ubuntu 20.04.
I have reproduced it on Ubuntu 20.04 without the snap as well.
It looks like an AppArmor issue btw.
This should fix it:
Is it, as a workaround until the fixed version of lxd is released, possible to force in the configuration of the mount that the 9p driver (instead of virtio-fs) should be used? How would this be done?
The mount with readonly=false should work fine, but the fix is required for readonly=true.
I know that ‘readonly=true’ works, but I am looking for a workaround for ‘readonly=false’.
Disabling AppArmor would provide a workaround, but beyond that I do not know of one, short of the fix itself.
You could in principle mount the source directory as read only and then share that as readonly=false.
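A minimal sketch of this workaround (paths here are placeholders matching the later posts; note that a plain bind mount inherits the source's write permission, so on many kernels the read-only flag has to be applied with a separate remount):

```shell
# Hypothetical paths: /srv is the writable source, /srv_ro the read-only view.
mkdir -p /srv_ro
mount --bind /srv /srv_ro
# Older kernels ignore "-o ro" at bind-mount creation time, so apply
# the read-only flag explicitly with a remount:
mount -o remount,bind,ro /srv_ro
# Then share the read-only view with the VM as a normal writable disk device
# (device name "srv" is an assumption):
lxc config device add <instance> srv disk source=/srv_ro path=/srv readonly=false
```

This requires root on the host, and as discussed further down in the thread, a bind mount made in the host's mount namespace may not be visible to a snap-packaged LXD.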
You can also use raw.apparmor
to directly allow the missing path for that instance.
I’d need the host to have write access to the path, but the VM to have only read-only access. So I tried to do a bind mount of the host path /srv with option ‘ro’ to a new path /srv_ro and then specify /srv_ro as source argument in the lxd configuration. Now the VM starts up fine, but still has write access to /srv_ro, ignoring the read-only mount option, so this workaround does not work for me.
@stgraber: I tried your suggestion but could not get it working. I issued:
printf "/var/lib/snapd/hostfs/srv rw," | lxc config set vm raw.apparmor -
but get the same error message when starting the VM, and the log complains about the very same path not being accessible. It looks like my raw.apparmor setting gets ignored.
This works for me. Notice the trailing slash, and the r rather than rw (as QEMU is opening the directory in read-only mode as intended):
lxc config device set <instance> <disk> readonly=true
printf "/var/lib/snapd/hostfs/srv/ r," | lxc config set <instance> raw.apparmor -
lxc start <instance>
This PR adds automated tests for this functionality:
I can confirm that this works in that it makes the VM start up without an error. (Though I don’t understand why giving more AppArmor permissions breaks it.)
However, the read-only protection of the path is easy to circumvent in the VM: If I issue
mount -o remount,rw /srv
in the VM, the VM will still be able to write to /srv. It appears that all the readonly flag does is signal to the guest to mount the path read-only. That means the VM is in control, but for security reasons I need the host to be in control.
Can you show the output of mount | grep srv inside the VM?
My suspicion is that you’re also being affected by a different bug, also fixed in “VM: Fix readonly disk shares” by tomponline · Pull Request #8810 · lxc/lxd · GitHub.
If so:
Short version:
I can’t think of a workaround that will work properly and you’re going to need to wait until that patch lands in the snap channel you’re using.
Long version:
1. When we originally added directory sharing support we only used 9p sharing, which supports a readonly property. However, because we run the QEMU process as non-root, this prevented sharing directories not accessible by the unprivileged user we run QEMU as. To work around this, QEMU provides the virtfs-proxy-helper process, which we can start as root, and the QEMU process uses it to access the directory on the host.
2. However, we discovered whilst adding support for this that there was a bug in QEMU that meant that when using the virtfs-proxy-helper process the readonly property was ignored. We filed a bug upstream (which has subsequently been fixed), and then added an exception in LXD that avoided using virtfs-proxy-helper when readonly=true. This meant that only globally accessible directories could be shared as readonly, but at least it wasn’t a security issue.
3. The problem with 9p, though, is that it is not particularly performant, and so later we added optional support for virtiofs, which uses a different proxy process called virtiofsd that is run as root to allow access to directories not owned by the unprivileged user we run the QEMU process as.
4. When virtiofsd is available we launch both it and the 9p virtfs-proxy-helper to allow the guest to use either. The lxd-agent process running inside the VM uses the virtiofs share by preference, and if that doesn’t work (either due to lack of support on the host or inside the guest OS) it falls back to the 9p share.
5. The lxd-agent process mounts the share as ro if readonly=true, but we have always been aware that this is not a security feature, just a nicety to help indicate to the guest that this is a read-only directory.
6. Later we also added AppArmor support to the QEMU process (but not the proxy processes). This is what prevented starting the VM when using a readonly=true share, because due to 2. we weren’t using the proxy process for read-only shares, and so QEMU tried to access a directory that wasn’t allowed by its AppArmor profile. As an aside, the AppArmor rules have to exactly match the type of file open operation that is occurring (i.e. allowing rw access doesn’t allow r-only access).
7. Because the upstream bug in virtfs-proxy-helper that meant the readonly setting was not respected is now fixed, yesterday I switched our 9p share to always use virtfs-proxy-helper. In the process I discovered two more bugs: firstly, the virtiofs share doesn’t have support for readonly in QEMU, so although the 9p share was correctly being started in readonly mode, the virtiofs share was still writable (this is what I think you’re finding now when remounting the mount as rw); and secondly, due to a race condition in lxd-agent when loading the vsock kernel module, the lxd-agent could exit during the boot process (and be restarted by systemd), with the effect of initially mounting the share using virtiofs, and then on subsequent restart attempting to mount it again as virtiofs (and failing) and falling back to mounting it as 9p (causing a second mount over the same path). However, this behaviour was dependent on how quickly the vsock kernel module loaded.
8. Because in the released LXD versions the virtiofs share is available (even if not mounted), when you modify the raw.apparmor setting to allow the QEMU process to start when directly accessing the 9p share, someone who is root inside the guest can then choose to mount the virtiofs share and would then be able to write to the readonly share.
As this is a somewhat complex dance between LXD, QEMU and the lxd-agent, I’ve added some automated tests above to try and catch any future regressions in all 3 of the subsystems.
The output of mount | grep srv is:
lxd_srv on /srv type virtiofs (ro,relatime)
Thank you also for the full explanation. I think I understand now: 9p supports read-only access, but virtiofs always provides read-write access, so lxd should use 9p for proper read-only access. Due to a bug in an underlying component, lxd uses virtiofs in this case anyway. By default this fails on AppArmor (so AppArmor enforces the read-only security), but when I set raw.apparmor, I override the AppArmor profile so it can use virtiofs and thus provide full read-write access to the VM. Finally, lxd-agent sets the mount flag ro in the VM, but being a VM process it of course can’t enforce security.
I take it then that there isn’t really a workaround.
I am willing to help with testing if that helps. So, when there is snap channel with the fixes included that I can try, please let me know.
The following PRs ensure that readonly=true disk devices are now truly read-only even when using one of the QEMU proxy daemons to work around AppArmor profile and unprivileged user limitations.
We now use a host-side readonly bind mount of the source directory, which is passed to the virtfs-proxy-helper (for 9p) and virtiofsd (for virtio-fs) shares. This provides a “belt and braces” approach: the Linux kernel itself enforces readonly access, rather than relying solely on QEMU’s security restrictions.
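The kernel-side enforcement described above can be demonstrated on any Linux host without LXD involved (a minimal sketch, run as root, using throwaway paths under /tmp):

```shell
# Create a read-only bind mount of a writable directory.
mkdir -p /tmp/src /tmp/src_ro
mount --bind /tmp/src /tmp/src_ro
mount -o remount,bind,ro /tmp/src_ro

# Writes through the read-only view fail with EROFS, no matter what
# the consumer of that mount does on its side:
touch /tmp/src_ro/x   # fails: Read-only file system
# The original directory remains writable for the host:
touch /tmp/src/x
```

Because the proxy daemons are handed the read-only view, a guest remounting its share rw gains nothing: every write still arrives at a mount the kernel refuses to write through.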
And associated test updates:
The reason bind mounting didn’t work for you is that LXD from the snap runs inside its own mount namespace and was not seeing the bind mount you set up in the host’s mount namespace. LXD will now set up its own bind mount, and because this is done inside the snap’s mount namespace, it will take effect.
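One way to observe this namespace split (assuming a typical snap install, where snapd preserves the snap’s mount namespace at /run/snapd/ns/lxd.mnt; run as root):

```shell
# The host namespace's view of the path:
mount | grep srv

# The same query from inside the LXD snap's mount namespace:
nsenter --mount=/run/snapd/ns/lxd.mnt mount | grep srv

# A bind mount created in the host namespace after the snap started
# will typically show up only in the first listing.
```

If the second command shows no trace of your /srv_ro bind mount, that is exactly the situation described above: LXD simply never saw it.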