@stgraber, in a video you mentioned that for production services exposed to the internet, you use Incus inside Incus. I understood it to mean that you have a VM with Incus on an Incus server, and inside the VM you also install Incus and spin up containers. I guess you do that because of the “better” isolation of a VM compared to containers, so that if there is a breakout from a container, it is still contained inside the VM.
Can you please share some insights? We are currently also thinking about going that way. Which storage provider do you use for the Incus VM? Would it be ZFS (Incus bare metal) inside ZFS (Incus VM)? Do you pass physical disks to the Incus VM or just a volume? Would you install Incus in the VM using IncusOS or go for a standard install? What are your thoughts on double-encrypted ZFS when running IncusOS inside IncusOS? How do you pass networks between the two nested Incus instances (via VLANs)? I would be happy to get some best practices based on your experience.
My production clusters typically run with either Ceph or Linstor for storage and with OVN for networking, so it’s pretty easy to make those VMs effectively equivalent to a physical server by having them use the same distributed storage and networking as the physical hosts.
They’re also joined into the same cluster, so when I look at incus cluster list, I see the physical servers and then one VM per server which is used to run the containers.
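Roughly, creating one of those VM members looks something like this; the member, pool and network names below (host01, remote, ovn0, nested01) are just placeholders:

```sh
# Create a VM on a given physical member, backed by the shared pool and OVN network.
incus create images:debian/12 nested01 --vm --target host01 \
    --storage remote --network ovn0
incus config set nested01 limits.cpu=8 limits.memory=32GiB
incus start nested01

# Inside the VM: install Incus, then join it to the existing cluster using a
# join token generated on one of the current members:
incus cluster add nested01   # prints a token to feed into "incus admin init" inside the VM
```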
I then have a fancy placement scriptlet which looks at properties of the project of the instance being placed to determine whether the instance, if it’s a container, should be placed inside one of the VMs, or whether it comes from a trusted project and can run directly on a physical host.
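A heavily stripped-down sketch of that idea is below. The real scriptlet is more involved, the user.trusted project key and the physical/nested-vms cluster groups are made up for illustration, and the exact scriptlet helpers (set_target, get_project, the request fields) should be checked against the instance placement scriptlet documentation for your Incus version:

```sh
cat > placement.star <<'EOF'
def instance_placement(request, candidate_members):
    # Trusted projects (flagged via a user.trusted config key here) and all VMs
    # go on physical members; containers from other projects go into the VM members.
    project = get_project(request.project)
    trusted = project.config.get("user.trusted", "") == "true"
    wanted = "physical" if trusted or request.type == "virtual-machine" else "nested-vms"

    # Pick the first candidate that belongs to the wanted cluster group.
    for member in candidate_members:
        if wanted in member.groups:
            set_target(member.server_name)
            return

    # No match: fall back to the built-in scheduler.
    return
EOF

incus config set instances.placement.scriptlet="$(cat placement.star)"
```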
When using Linstor, I’ve heard you say that local instances use local disks so that reads are much faster. But in those VMs, every disk is a remote one. A disk on the same host machine should still be better, since it only goes through local networking and isn’t limited by the physical connection speed. Is it possible to have a preferred remote disk in this case?
Isn’t it better in this case to use cluster groups and restricted.cluster.groups? I run into an issue when I control placement with a scriptlet in a multi-user environment: although I can control the initial placement, the scriptlet is not checked when resources are moved, maybe not even between projects. So resource placement is not really constrained, and another user may move resources around without knowing about the scriptlet’s limitations.
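What I mean is something along these lines (group and project names are only examples):

```sh
# Put the VM members into their own cluster group.
incus cluster group create nested-vms
incus cluster group add nested01 nested-vms

# Restrict the project to that group; instances in it can then only target
# members of the group, and as far as I can tell that also applies to moves.
incus project set students restricted=true restricted.cluster.groups=nested-vms
```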
On the original question, it really depends on what you want to do. If you want better host isolation, then containers inside VMs acting as cluster members, like @stgraber described, is the better solution. It’s a nice setup for exposing resources to the internet, where host isolation is important.
I do it with containers as cluster members instead, since I don’t need better host isolation but rather better resource isolation. To explain my use case: I have a multi-user cluster with constrained infrastructure, used mostly by students to run different kinds of instances. Some run computational experiments and need isolated resources so that running times can be estimated reliably. Others just need a conventional distributed cloud infrastructure to practice building services with Incus.
Since resources are constrained, most of each host’s resources are allocated to the experiments. For resource isolation, they need dedicated memory and CPU cores, which Incus can provide through CPU pinning and memory limits. The experiments and the service tests run on different pinned CPU cores to minimise interference on CPU usage. Per-instance memory limits for many instances on each host aren’t really workable on their own, though.
The solution I currently use is the placement scriptlet combined with different projects for experiments and for services. The scriptlet then checks the CPU pinning to make sure the two are isolated, and limits total memory usage. This can be done without the students knowing about it, just by configuring each project’s default profile.
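Concretely, those per-project defaults are just profile settings, something like this (the core ranges and sizes are only examples):

```sh
# Pin each project's default profile to a disjoint core range and cap memory.
incus profile set default limits.cpu=0-15 limits.memory=64GiB --project experiments
incus profile set default limits.cpu=16-23 limits.memory=16GiB --project services
```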
The main issue with this solution is that students don’t know much about how resources are organized, so they may move instances around or change instance configuration. Since the scriptlet is not checked when configuration changes or instances are moved, they can end up creating inconsistent cluster states.
The ideal solution in my case, which is something I’ll eventually do, is to have privileged containers as cluster members and control resources on those containers. Then I can control instance placement just through cluster groups and project constraints, which gives better control over resources. This is the Incus-in-Incus part that relates to the original question.
Sadly, in this case, you would be constrained to a standard incus install as there is no IncusOS container available (would be quite nice btw).
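For completeness, the kind of container member I have in mind would look roughly like this (names and limits are placeholders; security.nesting is what allows Incus to run inside the container):

```sh
# A container meant to run Incus itself and act as a cluster member.
incus create images:debian/12 member01 --target host01 \
    -c security.privileged=true \
    -c security.nesting=true \
    -c limits.cpu=0-15 \
    -c limits.memory=64GiB
incus start member01

# Then install Incus inside member01 and join it to the cluster the same way
# as a VM member, with a token from "incus cluster add" on an existing member.
```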
I don’t know if there’s a mechanism that could be used to better place the two disk copies so one lives on the physical host instead of over the network.
This would restrict everything, both containers and VMs, to that list of groups.
In my production environments I tend to want specific projects to have their containers be placed inside of VMs but their VMs still be placed on the physical hosts (and in fact specifically never be placed as nested VMs).
The placement scriptlet allows for that kind of flexibility.