I had an unpleasant surprise trying to launch a CentOS 8 image on my existing LXD machine. It seems that in the latest build (20200909_07) the LXD agent fails to start in VM mode, making the image unusable for me since I can't exec in. I can use the console, but I don't know the starting root password.
For various reasons I'd strongly prefer to use VM mode rather than running it as a container.
Using a very vanilla default profile and options, the LXD agent fails to start in VM mode. If I try CentOS 7 or CentOS 8 in container mode, it works fine and I can lxc exec.
The only output of note on the console was:
[FAILED] Failed to start LXD - agent - 9p mount.
See 'systemctl status lxd-agent-9p.service' for details.
[DEPEND] Dependency failed for LXD - agent.
$ lxc exec reportpool1 bash
Error: Failed to connect to lxd-agent
I have other images working fine (Ubuntu, and CentOS 8 from an earlier build, 20200518_07), so it seems like it might just be the latest builds.
Does anyone know how to solve this? Without the root password or lxc exec, I can't even get in to look at any logs to begin troubleshooting. It's a big locked black box.
Run command:
lxc launch images:centos/8 reportpool1 --vm
Host os: Ubuntu 18.04 with HWE
lxc --version
4.0.3
Any ideas on how to "break in" to the VM enough (or set something at launch) to set a root password and troubleshoot?
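Edit: cloud-init turned out to be the way in for me. Here's a sketch of seeding a root password at launch; it assumes the image server publishes a /cloud variant with cloud-init installed, and the password is a placeholder:

lxc init images:centos/8/cloud reportpool1 --vm
cat > user-data.yml <<'EOF'
#cloud-config
ssh_pwauth: true
chpasswd:
  expire: false
  list: |
    root:changeme
EOF
lxc config set reportpool1 user.user-data "$(cat user-data.yml)"
lxc start reportpool1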
This is a CentOS 8 regression: they had issues building the 9p kernel module in a recent kernel update and so decided to just not build it anymore, breaking everyone using it in the process…
So I'd recommend heading over to the CentOS bug tracker and complaining about it some more; maybe it will get re-enabled?
I was under the impression that Red Hat promised a very stable kernel where no such thing would ever happen, but apparently I was wrong…
Thanks. I've been debugging along those lines, and it certainly seems like 9p support is busted. I've been trying to find instructions for getting 9p running on CentOS, but despite loading the correct modules and using the centosplus kernel, it doesn't seem to want to enable 9p support. I admit I'm rather new to CentOS and 9p, though.
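In case it helps anyone else debugging, here's how I've been checking from the VM console whether the running kernel has 9p at all (the module names are the standard upstream ones):

# is 9p registered as a filesystem type?
grep 9p /proc/filesystems
# try loading the virtio transport and the filesystem module
modprobe 9pnet_virtio
modprobe 9p
# check whether the kernel even ships the modules
find /lib/modules/$(uname -r) -name '9p*'

On the affected builds the find comes back empty, since the modules simply aren't built anymore.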
Is there a simple way to just revert to a previous LXD CentOS 8 image? My older one seems to work fine, but I don't seem to be able to launch that exact image build anymore.
Sadly, no, we only keep the past 3 images around as we would otherwise very quickly run out of disk space on the image servers…
In our own test systems, the workaround I used was to manually download the old CentOS 8 kernel, forcefully install it, and then prevent the system from ever updating it.
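On the guest that looks roughly like this; the kernel version below is illustrative (use the last build that still shipped the 9p modules, e.g. fetched from the CentOS vault), and versionlock needs its dnf plugin installed first:

# download the older kernel RPM and install it alongside the current one
dnf install ./kernel-4.18.0-147.el8.x86_64.rpm
# pin kernel packages so a later 'dnf update' can't replace them
dnf install python3-dnf-plugin-versionlock
dnf versionlock add 'kernel-*'

You'd then make the old kernel the default boot entry (e.g. with grubby --set-default).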
This isn't really a solution I'd recommend, though, as it means you're not getting security updates either. In our case those test VMs are completely disconnected from the internet, so it's not a problem, but for anyone else that would likely be an issue…
We are looking at using virtiofs in the future, though it's unclear whether that would be supported any better by the CentOS 8 kernel…
Understood. Thank you anyway for your help! I'll try downgrading just the kernel and see how it goes. I'll also look harder for other workarounds; maybe we can actually use a container, or just use an Ubuntu VM instead of CentOS.
One option may also be to copy the LXD agent binary from the 9p share or from the host and run it manually in the VM. You'll also need the various certificates that come with it.
The downside to this is that the binary may get out of date as LXD gets updated.
It also won't get some of the dynamic files we generate for disk passthrough and other bits, but that may not be an issue for your use.
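As a rough sketch (the paths are my assumptions for a snap-installed LXD and may differ between versions; the per-instance config directory is where the agent binary and its certificates get generated):

# on the host: locate the instance's generated config directory
cd /var/snap/lxd/common/lxd/virtual-machines/reportpool1/config
# copy the agent and its certificates into the guest over SSH
scp lxd-agent agent.crt agent.key server.crt root@<vm-ip>:/root/lxd-agent-dir/
# inside the guest: the agent looks for the certificates in its working directory
cd /root/lxd-agent-dir && chmod +x lxd-agent && ./lxd-agent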
Yeah, dkms could be an option, though if you figure out what’s needed to make it build through dkms, the same could be applied to the kernel, restoring the module for everyone.
That's true. Assuming it works in other kernels, CentOS must carry components or changes that interfere with 9p.
But just closing the bug report as WONTFIX is really bad handling of the situation.
Given that I figured out how to create users and use cloud-init, what is the impact of NOT having the lxd-agent running? Obviously lxc exec is impacted. Is the agent critical, or just really nice to have?
Assuming the console is fine for use (plus SSH to the VM itself), could I still use the VM successfully without the agent?
@stgraber will know best, but I guess it's mostly "nice to have", because with it you can use some of the usual lxc commands, like lxc exec and lxc file; see for example Simos's blog for some details: https://blog.simos.info/how-to-use-virtual-machines-in-lxd/
(Section: Using a LXD virtual machine).
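For completeness, getting a shell that way is just:

lxc console reportpool1
# log in with the user you created, then detach with ctrl+a q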
Update: lxc info also seems to be affected.
If you can read some code, here is the source code for the lxd-agent:
Without the agent you lose:
- State information (displayed in lxc info and lxc list)
- Seeding & triggering of cloud-init on first boot
- Automated mounting of disk devices that pass a host path into the VM (though that similarly relies on 9p, so a manually-run agent would fail at that part anyway)
We are planning on adding quite a few more features including handling for CPU/memory hotplug, network device renaming and hotplug, virtiofs filesystems, …
But in general we never expect the agent to be a requirement; so long as you've configured some kind of user in the VM, you can access it using lxc console or over SSH.