Incus Server OS Upgrade Corruption of Incus Containers

I performed an OS upgrade on my Incus server. I started at Ubuntu 22.04, upgraded to Ubuntu 23.10, and then finally to Ubuntu 24.04.

All of my containers appeared to be fine at first. As @stgraber indicated to me, it is important to re-enable the Zabbly repository after the upgrade:

cd /etc/apt/sources.list.d
nano zabbly-incus-stable.sources

Change the repository back to “Enabled: yes” and change the suite to “Suites: noble”.
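For reference, the file uses the deb822 format; after editing it should look roughly like the following (the URIs, Components, and Signed-By values here are illustrative, so keep whatever your file already contains):

Enabled: yes
Types: deb
URIs: https://pkgs.zabbly.com/incus/stable
Suites: noble
Components: main
Signed-By: /etc/apt/keyrings/zabbly.asc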

After that:

apt update && apt upgrade

Your incus server should be fine.

Unfortunately, I discovered that the incus containers with nested docker applications had been improperly shut down during the Ubuntu upgrade.

As a result, the incus containers with nested docker apps had corrupted “vfs” volume folders, and all docker layers for their containers were simply gone and had to be pulled again.

I can’t really explain this. All my incus containers without nested docker are fine.

My advice is to shut down all containers before beginning an OS upgrade:

incus stop --all

and then afterwards:

incus start --all

Not sure that this is the solution, but I hope so.
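It might also be worth snapshotting every instance first to have a rollback path. A rough sketch, where “pre-upgrade” is just an example snapshot name:

for c in $(incus list -c n -f csv); do
    incus snapshot create "$c" pre-upgrade
done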

@stgraber this subject seems to go a bit deeper. I have a feeling that the:

/var/lib/incus/storage-pools/default/containers

folder structure might have been altered somehow during the Ubuntu OS upgrade on the incus host.

I can confirm that EVERY container that had nested docker appears to have lost all of its docker container layers, and it was necessary to run:

docker compose pull
docker compose up -d

on every docker app nested inside an incus container on the incus server that was upgraded from Ubuntu 22.04 → Ubuntu 23.10 → Ubuntu 24.04.
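Concretely, the recovery per container boiled down to something like the following, where my-container and /srv/app are placeholders for the actual container name and compose project directory:

incus exec my-container -- sh -c 'cd /srv/app && docker compose pull && docker compose up -d'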

Your thoughts?

What storage driver are you using?

My main guess would be a kernel change that caused some storage features to differ enough to upset Docker’s overlayfs setup.

I am using docker with overlayfs. However, if that is part of the incus container image, how does that change? Any suggestions here?
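For what it’s worth, which storage backend Docker actually ended up on inside a given container can be checked with something like this, where my-container is a placeholder:

incus exec my-container -- docker info --format '{{.Driver}}'

A healthy setup prints overlay2; a fallback prints vfs.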

Sorry, I meant what Incus storage driver is used for this pool?

Ah… zfs. On the same server I have a dir pool also. Now that I think of it, only the containers on the zfs pool were affected. It’s the pool created with “incus admin init” and I defaulted to a virtual (in a file) zfs pool. This server was originally migrated from LXD when you first released incus.
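For anyone else following along, the driver behind each pool can be confirmed with:

incus storage list
incus storage show default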

Right, so most likely what happened is some kind of change in how the overlayfs overlays are stored on top of ZFS when it’s directly mapped from the host with a manual shift as opposed to using VFS idmap shifting.

Moving to the newer Ubuntu moved you to ZFS 2.2 which now natively supports VFS idmap shifting, meaning that all the data in /var/lib/incus/containers is stored unshifted (as seen in the container) rather than having all uid/gid altered to match the container’s namespace.

My guess is that overlayfs stores data differently under VFS idmap than under the old way and this caused enough confusion to break Docker.
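As a hedged check, the server info on current Incus builds includes a kernel_features section, which should show whether idmapped mounts are available at all:

incus info | grep -A 5 kernel_features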

I’m sure someone more familiar with Docker and overlayfs could figure out exactly what’s different in the overlay data and provide a way to convert it.

Interesting. There should probably be something in the incus release notes that serves as a caution for those upgrading to Ubuntu 24.04. My experience was that although nested docker was running properly in each incus container, every app had lost its containers as though someone had run:

docker compose down
docker system prune -a

Literally everything was gone. I needed to:

docker compose pull
docker compose up -d

Of course, apps that had a “build process” were nowhere near that easy.
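For those, the closest equivalent is a full rebuild, roughly:

docker compose build --pull
docker compose up -d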

Since this involves potential data loss for any incus server resident on an Ubuntu system that is upgraded to 24.04, we need a “best practice” process for avoiding data loss for those with nested docker in their incus containers.

Here’s something to ponder. After my Incus host upgrade to Ubuntu 24.04, I was able to run “docker compose up -d” or “docker build” on all of my incus containers with nested docker. They all came back up, but had to pull all of their container layers again.

By default, docker uses overlay2 as its storage driver. Interestingly, any Ubuntu 24.04 system with nested docker no longer has a

/var/lib/docker/vfs

folder for docker container images.
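Which backends have ever held data shows up as directories under /var/lib/docker, so a quick look inside a container (my-container again being a placeholder) tells the story:

incus exec my-container -- ls /var/lib/docker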

Also, any previous incus export that had nested docker lacks its docker image data when imported to an incus server hosted on Ubuntu 24.04, regardless of the OS version of the incus container being imported.

All 70 of my incus containers are functional after the upgrade. I might note that there were no issues with containers that did not have nested docker.

I use NginX Proxy Manager as the reverse proxy for my own and my small business sites. Interestingly, this app appeared to be running normally after its docker container was re-pulled.

However, it was causing network nodes (even non-proxied ones) to misbehave. I reinstalled the container from scratch multiple times, and each time the app would freeze or restart randomly. It is nothing special and looks like most other docker apps.

Eventually, I created a btrfs pool and loaded the app into a container on that pool, and NginX Proxy Manager now runs perfectly. I am not sure what happened, but it is notable that the Ubuntu 24.04 LTS host still has some integration issues with Incus, especially as it applies to zfs pools.

Ah, okay, so yeah, that’d explain the issue you’re seeing.

I think it may be related to VFS idmap and whether it’s possible for overlayfs to be mounted on top of a VFS idmapped mount. I suspect this may not be supported yet and is forcing Docker onto vfs mode. So basically the kernel change that came with the 22.04 to 24.04 upgrade has caused differences in what’s supported at the VFS/filesystem level.

I find it a bit weird that Docker just silently switches to a different storage backend though when it has existing data in another format. I’d have expected a startup failure stating that it wants to use overlayfs but can’t.
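One way to get that hard failure rather than a silent switch is to pin the driver in /etc/docker/daemon.json inside each container; with this set, dockerd refuses to start instead of quietly falling back to vfs:

{
    "storage-driver": "overlay2"
}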

The way I would solve this properly these days, and in general the recommended setup anyway, is to create a separate volume for your Docker data and have it mounted on /var/lib/docker. This avoids any of those potential shifting/idmap issues.

For existing containers, you should be able to move from whatever you have today to a setup with a separate /var/lib/docker by doing:

  • incus storage volume create default my-docker size=100GiB
  • incus config device add my-container docker disk pool=default source=my-docker path=/mnt/docker
  • incus exec my-container -- systemctl stop docker
  • incus exec my-container -- sh -c "mv /var/lib/docker/* /mnt/docker/"
  • incus stop my-container
  • incus config device set my-container docker path=/var/lib/docker
  • incus start my-container

So basically moving all the Docker data from the container’s rootfs to the new separate volume.
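Once the container is back up, it’s worth confirming that the new volume really is mounted over /var/lib/docker and that Docker kept its storage driver, e.g.:

incus exec my-container -- findmnt /var/lib/docker
incus exec my-container -- docker info --format '{{.Driver}}'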