I’m trying to export and import VMs and containers on a few different hosts (not clustered) and I’m seeing some behaviors that I’m wondering about:
On one system, an “export” command for a non-running container sometimes hangs for several hours; in one case an export process was still active a week later.
On another system, importing a VM exported from a different host quite quickly reaches “Importing instance: 100%”, but then takes hours to actually throw an error (a missing profile, for example).
Lastly, and somewhat tangentially, is there any planned work to make missing profiles or networks more user friendly? For example, if the missing profile only contains “atomic” config settings like the number of CPU cores, it would be nice to either include the profile in the export or “flatten” the config (similar to the -e/expanded flag) so that the VM does not refer to the profile when imported.
Any ideas on how I can troubleshoot these hangs? Importing and exporting are quite important, not only for backups but also for migrations, so the lack of feedback from the command when nothing seems to be happening during an export/import is a bit jarring.
In a separate terminal window you can run the following to get real-time feedback on what is happening during the Incus actions. The --pretty flag will show one record per line.
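incus monitor --pretty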
The only issue is that it’s extremely slow compared to all the other operations. This is a zstd-3 compressed zfs dataset on an nvme drive, and from watch -n1 zfs list it looks like the dataset is expanding by only about 0.02G per 30 seconds. I really can’t understand why.
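For what it’s worth, this is roughly how I’m watching it, showing logical vs physical growth side by side (the dataset name is just a placeholder):

watch -n1 'zfs list -r -o name,used,logicalused,compressratio tank/incus'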
Also, I’m not sure this is correct, but I get the feeling that exporting with --compression zstd is significantly slower than exporting without compression and then running zstd -T0 -3 vm.tar, though I haven’t timed it yet.
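When I get around to timing it, the comparison I have in mind is roughly this (instance and file names are placeholders):

# export with incus-side zstd compression
time incus export myvm vm-zstd.tar.zst --compression zstd
# export uncompressed, then compress separately using all cores
time incus export myvm vm-plain.tar --compression none
time zstd -T0 -3 vm-plain.tar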
Could it be that the dataset is heavily compressible, e.g. lots of unused blocks which read as all zeros? If your zfs filesystem has compression enabled, which you say it does, that would mean a large amount of data written results in much smaller growth in space usage.
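You can check how much zfs is actually compressing with something like this (the dataset name is just an example):

zfs get used,logicalused,compressratio tank/incus/virtual-machines/myvm.block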
A better way to monitor the throughput is to find one of the processes which is handling the compression/decompression (use ps to find a likely process such as gzip), then:
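# replace <pid> with the actual process ID
watch -n1 cat /proc/<pid>/io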
I’m sure that a lot of it is “empty” data, but given that the export file is itself zstd compressed and is being unpacked into a zfs dataset with zstd compression, that should make writing the “empty” parts very fast. Instead, I’m seeing something like (rough estimates here) 60 minutes to import a 30GB disk image containing less than 10GB of “real” data, which works out to roughly an 8 MB/s write speed on average.
When running incus export --compression zstd I’m seeing write speeds between 40 and 100 MB/s, which of course is lower than “raw” nvme write speed due to the compression, and highly variable depending on the incoming data, but the import (i.e. the write speed) being so much slower is what really makes me scratch my head.
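To get a better picture of the actual disk throughput during the import, I could also watch the pool directly while it runs, something like (pool name is a placeholder):

zpool iostat -v tank 1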
I just tried exporting a VM with zstd compression, then deleted the VM with incus rm and re-imported it. The export took about five minutes. The import read speed (“Importing instance”) matched that of the export, which makes sense. During the next step, where incus gives no output (when it’s unpacking the virtual machine block volume), I checked ps aux, which showed three processes:
After a while only “zstd -d” remains (with a new pid). Checking cat /proc/130515/io shows both read_bytes and write_bytes at zero. I watched this for a few minutes, but only rchar, wchar, syscr and syscw were incrementing. wchar showed something like 2GB written after two minutes, which roughly matches the export speed, yet in that period the zfs dataset had only grown by about 300MB.
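For a less hand-wavy number, something like this (using the pid from above) would give the average wchar rate over ten seconds:

# sample wchar twice, 10 seconds apart, and print the average MB/s
pid=130515
a=$(awk '/^wchar/ {print $2}' /proc/$pid/io)
sleep 10
b=$(awk '/^wchar/ {print $2}' /proc/$pid/io)
echo "$(( (b - a) / 10 / 1024 / 1024 )) MB/s"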
Maybe I’m missing something obvious, but my assumption is that an export of a VM (zfs+zstd → tar+zstd) should take about the same time as an import (tar+zstd → zfs+zstd) when done on the exact same system, but instead the import is roughly an order of magnitude slower.
If I have some time to spare, I’ll try to get some actual statistics on what the different combinations give in terms of wall clock results.
That implies a ~7:1 compression ratio, which is plausible. 2GB after 2 minutes is about 17MB/s, which seems pretty slow though. What does “top” show? Is it CPU-bound?
The system load doesn’t really tell you anything. If it’s running slowly, it’s starved of some resource; the process might be waiting on CPU or blocked on I/O. If the process is single threaded, having extra cores won’t help.
Looking at the individual process’s CPU utilization may show you whether it’s using a whole core. The run states are useful too; the most common ones are “D”, meaning blocked on disk I/O, and “R”, meaning running or runnable (hence CPU-bound).
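For example, something like this should show both the state and per-process CPU usage, assuming the decompressor is zstd (pidstat needs the sysstat package):

ps -C zstd -o pid,stat,pcpu,wchan:24,comm
pidstat -p $(pgrep -d, zstd) 1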
I checked CPU utilization with btop while it was running and it wasn’t stressed. I guess it could be using just one core, but I thought that’s what the “l” in the ps STAT column means, i.e. that the process is multi-threaded (I didn’t include this previously, sorry):
I guess it’s quite possible that zstd runs single-core when invoked by incus, just like xz does by default, but that still leaves me confused as to why the export is so much faster: since --compression is passed to incus, I would assume the export should be “affected” by single-core compression as well.
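One way to check whether the zstd that incus spawns is actually multi-threaded would be something like this (the pid is whatever it happens to be at the time):

# Threads: 1 would mean it is effectively running on a single core
grep Threads /proc/<pid>/status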
So in short: zstd is actually faster than none, xz is extremely slow for some reason, and the import speed is more or less constant, except that uncompressed imports are slightly faster (less decompression overhead, I guess).
But if you don’t specify --compression and instead go for --optimized-storage, the export time is 0m40 (similar to zstd/none), while the import time is only 0m08, way faster than any of the above.
So from this I’ve learned that --optimized-storage should always be used, unless you for some reason need to ensure backups can be imported on a completely different type of storage.
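For reference, the two variants I’m comparing look roughly like this (instance and file names are placeholders):

# portable export: a tarball of the volume, importable on any storage driver
incus export myvm vm.tar --compression none
incus import vm.tar

# optimized export: uses the storage driver's native format (zfs send/receive here)
incus export myvm vm-opt.tar --optimized-storage
incus import vm-opt.tar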
Why it takes about 4x as long to import as it does to export (as seen with “none”) is still beyond me, though. Nevermind, obviously it’s the compression.
So I’d say this at least solves the slow import issue.
If I can figure out anything regarding the failed exports issue, I’ll post it as a separate thread.