mrakgr
(Marko Grdinić)
December 9, 2024, 2:51pm
1
At my job, I am spending way too much time configuring these kinds of machines, so I want to make an image that could easily be spun up on them instead. I am new to this, so I don’t know how to configure a VM with GPU support. This is what the Perplexity chatbot had to say.
From what I can tell, Incus doesn’t have the init command or the --gputype option. It does have create, which I used to create a VM instance based on Ubuntu 24.04. Now, I think I am supposed to add the GPUs to the instance and then run it, right?
How do I do that?
Also, besides adding the GPUs, do I have to add the bridges as well? Here is the full list of PCI devices as output by lspci | grep -i 'nvidia':
05:00.0 Bridge: NVIDIA Corporation Device 22a3 (rev a1)
06:00.0 Bridge: NVIDIA Corporation Device 22a3 (rev a1)
07:00.0 Bridge: NVIDIA Corporation Device 22a3 (rev a1)
08:00.0 Bridge: NVIDIA Corporation Device 22a3 (rev a1)
18:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)
2a:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)
3a:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)
5d:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)
84:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)
8b:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)
91:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)
e4:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)
How good is Incus’ GPU support? Would what I am thinking be possible using it?
I’m asking here because I don’t trust the chatbots to get this right.
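To separate the GPUs from the bridges in the listing above, the lspci output can be filtered on the device class. This is a minimal sketch that assumes the "3D controller" entries are the GPUs one would hand to a physical GPU device; the function name nvidia_gpu_addresses is made up for illustration, and it does not settle whether the bridges need to be passed through too.

```shell
# Read `lspci` output on stdin and print the PCI address (first column)
# of every NVIDIA "3D controller" entry, skipping the "Bridge" entries.
# The function name is hypothetical, not part of any tool.
nvidia_gpu_addresses() {
  awk '/3D controller: NVIDIA/ { print $1 }'
}

# Usage on the host:
#   lspci | nvidia_gpu_addresses
```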
mrakgr
(Marko Grdinić)
December 9, 2024, 2:54pm
2
For a little more background: something as simple as a reboot takes 10 minutes on these machines, and I don’t have physical access to them, so an OS reinstall requires opening a ticket with the facility that manages them, which takes over a day. I’m not looking for a career in devops engineering; I want to make the machines easier to manage so that I can spend more time actually programming them.
mrakgr
(Marko Grdinić)
December 9, 2024, 2:59pm
3
As an alternative to VMs, if the GPUs are hard to deal with, would there be an easy way to restore the machines to their original state? I am clueless as to how the public clouds do it.
stgraber
(Stéphane Graber)
December 9, 2024, 3:54pm
4
mrakgr:
From what I can tell, Incus doesn’t have the init command or the --gputype option. It does have create, which I used to create a VM instance based on Ubuntu 24.04. Now, I think I am supposed to add the GPUs to the instance and then run it, right?
You can feed a full instance definition as YAML to incus create (the official name of incus init), so that would let you define an instance including a bunch of devices.
Though it’s also possible to do it through the -d option, something like this:
-d mygpu,type=gpu -d mygpu,gputype=physical -d mygpu,pci=ADDRESS
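For more than one GPU, the same -d pattern can be repeated per PCI address. The sketch below only prints the arguments so they can be reviewed before running anything; the helper name gpu_device_args and the device names gpu0, gpu1, ... are illustrative assumptions, and the -d NAME,key=value form is taken from the line above rather than verified against a particular Incus version.

```shell
# Print one line of -d arguments per PCI address, following the
# "-d NAME,key=value" pattern quoted above.
# gpu_device_args and the gpuN device names are hypothetical.
gpu_device_args() {
  i=0
  for addr in "$@"; do
    printf '%s\n' "-d gpu$i,type=gpu -d gpu$i,gputype=physical -d gpu$i,pci=$addr"
    i=$((i + 1))
  done
}

# Preview the full command without executing it:
#   echo incus create images:ubuntu/24.04 gpu-test --vm $(gpu_device_args 18:00.0 2a:00.0)
```

Printing the command first is a deliberate choice here, since a mistyped PCI address is easier to spot in the preview than after the instance has been created.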
mrakgr
(Marko Grdinić)
December 9, 2024, 4:01pm
5
Making a full instance definition sounds good. Where could I find an example?
I don’t know the answer but let me at least point to the documentation. Instance options - Incus documentation
stgraber
(Stéphane Graber)
December 9, 2024, 6:15pm
7
The easiest way is to get an instance working the way you want it; then you can use incus config show to get the YAML for that instance and feed it back to incus create or incus launch through stdin.
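As a rough sketch of that round trip, the YAML from a working instance can be piped straight into a new incus create. The function name copy_instance_definition and the INCUS override variable are assumptions added for illustration; it is probably worth saving the YAML to a file and reviewing it first, since it describes one specific instance.

```shell
# Command used to talk to Incus; overridable so the function can be
# dry-run against a stub. The INCUS variable is a hypothetical convenience.
INCUS="${INCUS:-incus}"

# Create a new VM from the YAML of an existing, working instance.
#   $1 = source instance, $2 = new instance name, $3 = image
copy_instance_definition() {
  src="$1"
  dst="$2"
  image="$3"
  # "config show" prints the instance definition as YAML on stdout;
  # "create" reads an instance definition from stdin.
  "$INCUS" config show "$src" | "$INCUS" create "$image" "$dst" --vm
}

# Usage:
#   copy_instance_definition gpu-test gpu-test2 images:ubuntu/24.04
```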
mrakgr
(Marko Grdinić)
December 12, 2024, 10:32am
8
incus create images:ubuntu/24.04 gpu-test --vm
incus config device add gpu-test gpu0 gpu gputype=physical pci=18:00.0
incus config device add gpu-test gpu1 gpu gputype=physical pci=2a:00.0
incus start gpu-test
I executed these commands to create an instance, add two GPUs to it, and start it, but it seems to have frozen on the last command. What am I doing wrong here? Do I need to add those bridges, maybe? Are those supposed to be network devices?
stgraber
(Stéphane Graber)
December 12, 2024, 5:19pm
9
If it gets stuck on incus start when passing through physical GPUs, it’s most likely because your OS is having a bad day.
Look at dmesg for any OS kernel errors.
mrakgr
(Marko Grdinić)
December 12, 2024, 5:28pm
10
(trt) ceti@ceti16:~/inference-servers/playbooks$ incus info --show-log gpu-test
Name: gpu-test
Status: STOPPED
Type: virtual-machine
Architecture: x86_64
Created: 2024/12/09 14:25 UTC
Last Used: 1970/01/01 00:00 UTC
Error: open /var/log/incus/gpu-test/qemu.log: no such file or directory
(trt) ceti@ceti16:~/inference-servers/playbooks$ sudo dmesg | grep incus
[1112167.520026] audit: type=1400 audit(1733480169.389:108): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_dnsmasq-incusbr0_</var/lib/incus>" pid=1959456 comm="apparmor_parser"
[1112206.971602] audit: type=1400 audit(1733480208.836:109): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_archive-b4808bac-09a4-4b2e-9908-ecf3a8c05f21" pid=1959684 comm="apparmor_parser"
[1112207.072954] audit: type=1400 audit(1733480208.940:110): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_archive-b4808bac-09a4-4b2e-9908-ecf3a8c05f21" pid=1959689 comm="apparmor_parser"
[1112207.127016] audit: type=1400 audit(1733480208.992:111): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_qemu-img-<var-lib-incus-images-5c5d32eb425c389922b9246064c2cba21fc4be1fa0378ab0845c2da43f5fc08a.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-first-root.img>" pid=1959700 comm="apparmor_parser"
[1112207.155822] audit: type=1400 audit(1733480209.020:112): apparmor="DENIED" operation="open" profile="incus_qemu-img-<var-lib-incus-images-5c5d32eb425c389922b9246064c2cba21fc4be1fa0378ab0845c2da43f5fc08a.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-first-root.img>" name="/sys/devices/system/node/" pid=1959702 comm="qemu-img" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[1112207.200161] audit: type=1400 audit(1733480209.064:113): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_qemu-img-<var-lib-incus-images-5c5d32eb425c389922b9246064c2cba21fc4be1fa0378ab0845c2da43f5fc08a.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-first-root.img>" pid=1959707 comm="apparmor_parser"
[1112207.250880] audit: type=1400 audit(1733480209.116:114): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_qemu-img-<var-lib-incus-images-5c5d32eb425c389922b9246064c2cba21fc4be1fa0378ab0845c2da43f5fc08a.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-first-root.img>" pid=1959709 comm="apparmor_parser"
[1112207.273878] audit: type=1400 audit(1733480209.140:115): apparmor="DENIED" operation="open" profile="incus_qemu-img-<var-lib-incus-images-5c5d32eb425c389922b9246064c2cba21fc4be1fa0378ab0845c2da43f5fc08a.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-first-root.img>" name="/sys/devices/system/node/" pid=1959710 comm="qemu-img" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[1112211.927360] audit: type=1400 audit(1733480213.791:116): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_qemu-img-<var-lib-incus-images-5c5d32eb425c389922b9246064c2cba21fc4be1fa0378ab0845c2da43f5fc08a.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-first-root.img>" pid=1959747 comm="apparmor_parser"
[1112213.042246] incusbr0: port 1(tap712f59eb) entered blocking state
[1112213.042255] incusbr0: port 1(tap712f59eb) entered disabled state
[1112215.098083] audit: type=1400 audit(1733480216.963:117): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus-first_</var/lib/incus>" pid=1959781 comm="apparmor_parser"
[1112215.297559] incusbr0: port 1(tap712f59eb) entered blocking state
[1112215.297572] incusbr0: port 1(tap712f59eb) entered forwarding state
[1112215.297746] IPv6: ADDRCONF(NETDEV_CHANGE): incusbr0: link becomes ready
[1112336.928085] audit: type=1400 audit(1733480338.787:118): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_archive-4f4995d3-7fb2-40a3-94c3-dcf72c6b9fde" pid=1960273 comm="apparmor_parser"
[1112337.026116] audit: type=1400 audit(1733480338.887:119): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_archive-4f4995d3-7fb2-40a3-94c3-dcf72c6b9fde" pid=1960278 comm="apparmor_parser"
[1112337.119983] audit: type=1400 audit(1733480338.979:120): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_archive-1137f312-3708-42a4-a61a-2791aef6dfbc" pid=1960281 comm="apparmor_parser"
[1112338.402073] audit: type=1400 audit(1733480340.263:121): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_archive-1137f312-3708-42a4-a61a-2791aef6dfbc" pid=1960492 comm="apparmor_parser"
[1112338.497470] incusbr0: port 2(vethbcc78fe4) entered blocking state
[1112338.497478] incusbr0: port 2(vethbcc78fe4) entered disabled state
[1112338.737733] audit: type=1400 audit(1733480340.599:122): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus-snd_</var/lib/incus>" pid=1960789 comm="apparmor_parser"
[1112338.934435] incusbr0: port 2(vethbcc78fe4) entered blocking state
[1112338.934444] incusbr0: port 2(vethbcc78fe4) entered forwarding state
[1386192.011811] audit: type=1400 audit(1733754187.826:123): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_archive-ee2adc98-9553-4acd-87ae-af7333e4d33e" pid=2458422 comm="apparmor_parser"
[1386192.097772] audit: type=1400 audit(1733754187.910:124): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_archive-ee2adc98-9553-4acd-87ae-af7333e4d33e" pid=2458433 comm="apparmor_parser"
[1386192.191233] audit: type=1400 audit(1733754188.006:125): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_archive-85ad0e21-9acb-4d02-8da8-946bb1018810" pid=2458435 comm="apparmor_parser"
[1386193.743060] audit: type=1400 audit(1733754189.554:126): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_archive-85ad0e21-9acb-4d02-8da8-946bb1018810" pid=2458642 comm="apparmor_parser"
[1386277.640867] incusbr0: port 1(tap712f59eb) entered disabled state
[1386277.826244] incusbr0: port 1(tap712f59eb) entered disabled state
[1386278.221491] audit: type=1400 audit(1733754274.032:127): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus-first_</var/lib/incus>" pid=2459091 comm="apparmor_parser"
[1386294.610212] incusbr0: port 2(vethbcc78fe4) entered disabled state
[1386294.744218] incusbr0: port 2(vethbcc78fe4) entered disabled state
[1386295.626368] audit: type=1400 audit(1733754291.436:128): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus-snd_</var/lib/incus>" pid=2459299 comm="apparmor_parser"
[1386341.778042] audit: type=1400 audit(1733754337.587:129): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_archive-864c6116-5de9-4a69-96bf-8191edb32e45" pid=2459577 comm="apparmor_parser"
[1386341.865086] audit: type=1400 audit(1733754337.675:130): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_archive-864c6116-5de9-4a69-96bf-8191edb32e45" pid=2459582 comm="apparmor_parser"
[1386341.917246] audit: type=1400 audit(1733754337.727:131): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_qemu-img-<var-lib-incus-images-4d495423f46b7aee78a503e59d1c6760a17561fe059b48a28af22708f6ea6b8e.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-gpu-test-root.img>" pid=2459584 comm="apparmor_parser"
[1386341.942098] audit: type=1400 audit(1733754337.751:132): apparmor="DENIED" operation="open" profile="incus_qemu-img-<var-lib-incus-images-4d495423f46b7aee78a503e59d1c6760a17561fe059b48a28af22708f6ea6b8e.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-gpu-test-root.img>" name="/sys/devices/system/node/" pid=2459585 comm="qemu-img" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[1386341.997330] audit: type=1400 audit(1733754337.807:133): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_qemu-img-<var-lib-incus-images-4d495423f46b7aee78a503e59d1c6760a17561fe059b48a28af22708f6ea6b8e.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-gpu-test-root.img>" pid=2459589 comm="apparmor_parser"
[1386342.051581] audit: type=1400 audit(1733754337.863:134): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_qemu-img-<var-lib-incus-images-4d495423f46b7aee78a503e59d1c6760a17561fe059b48a28af22708f6ea6b8e.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-gpu-test-root.img>" pid=2459591 comm="apparmor_parser"
[1386342.080253] audit: type=1400 audit(1733754337.891:135): apparmor="DENIED" operation="open" profile="incus_qemu-img-<var-lib-incus-images-4d495423f46b7aee78a503e59d1c6760a17561fe059b48a28af22708f6ea6b8e.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-gpu-test-root.img>" name="/sys/devices/system/node/" pid=2459592 comm="qemu-img" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[1386345.533249] audit: type=1400 audit(1733754341.343:136): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_qemu-img-<var-lib-incus-images-4d495423f46b7aee78a503e59d1c6760a17561fe059b48a28af22708f6ea6b8e.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-gpu-test-root.img>" pid=2459626 comm="apparmor_parser"
[1386388.843086] audit: type=1400 audit(1733754384.650:137): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_rsync-7ccae3ba-5834-4e7e-8a2c-031231921851" pid=2459795 comm="apparmor_parser"
[1386388.943847] audit: type=1400 audit(1733754384.754:138): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_rsync-7ccae3ba-5834-4e7e-8a2c-031231921851" pid=2459801 comm="apparmor_parser"
[1631136.391676] incusbr0: port 1(tapc58f47cb) entered blocking state
[1631136.391685] incusbr0: port 1(tapc58f47cb) entered disabled state
[1631263.770556] audit: type=1400 audit(1733999254.140:139): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_rsync-894fc324-55d8-4898-a24b-d5aed7cb136a" pid=2850906 comm="apparmor_parser"
[1631263.864832] audit: type=1400 audit(1733999254.232:140): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_rsync-894fc324-55d8-4898-a24b-d5aed7cb136a" pid=2850912 comm="apparmor_parser"
[1631275.020559] incusbr0: port 2(tapc2b5370f) entered blocking state
[1631275.020568] incusbr0: port 2(tapc2b5370f) entered disabled state
[1631277.078610] audit: type=1400 audit(1733999267.448:141): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus-gpu-test1_</var/lib/incus>" pid=2851021 comm="apparmor_parser"
[1631277.285265] incusbr0: port 2(tapc2b5370f) entered blocking state
[1631277.285275] incusbr0: port 2(tapc2b5370f) entered forwarding state
[1631304.760142] incusbr0: port 2(tapc2b5370f) entered disabled state
[1631304.915777] incusbr0: port 2(tapc2b5370f) entered disabled state
[1631305.127810] audit: type=1400 audit(1733999295.496:142): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus-gpu-test1_</var/lib/incus>" pid=2851286 comm="apparmor_parser"
stgraber
(Stéphane Graber)
December 12, 2024, 5:30pm
11
It wouldn’t show up as an Incus error in the kernel log.
If incus start is stuck, then it’s because QEMU is stuck and the most likely reason for it to be stuck is the kernel being stuck.
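Since kernel messages from the GPU driver or the IOMMU do not contain the word "incus", filtering dmesg on that word hides them. A minimal sketch of a broader filter follows; the pattern list (NVRM, vfio, iommu, DMAR) and the function name gpu_kernel_messages are assumptions about which prefixes are worth looking at, not an exhaustive list.

```shell
# Read kernel log lines on stdin and keep only those that mention the
# NVIDIA driver (NVRM), VFIO, or the IOMMU. The pattern is a guess at
# the relevant prefixes; the function name is hypothetical.
gpu_kernel_messages() {
  grep -iE 'NVRM|vfio|iommu|DMAR'
}

# Usage on the host:
#   sudo dmesg | gpu_kernel_messages
```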
mrakgr
(Marko Grdinić)
December 12, 2024, 5:30pm
12
Based on this, do I maybe need to install QEMU? I thought that came with Incus, and since regular (non-GPU) VMs didn’t ask for it, I didn’t bother.
mrakgr
(Marko Grdinić)
December 12, 2024, 5:31pm
13
[527705.357113] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[527705.416458] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[527705.437229] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[527705.459246] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[527705.468725] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[527705.485661] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[527705.494514] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[527713.209295] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[527715.338172] docker0: port 1(vethed21921) entered disabled state
[527715.338533] vethfa1ab53: renamed from eth0
[527715.409089] docker0: port 1(vethed21921) entered disabled state
[527715.415321] device vethed21921 left promiscuous mode
[527715.415330] docker0: port 1(vethed21921) entered disabled state
[528255.677360] docker0: port 1(vethafa8081) entered blocking state
[528255.677367] docker0: port 1(vethafa8081) entered disabled state
[528255.677547] device vethafa8081 entered promiscuous mode
[528256.424269] eth0: renamed from veth96f91fd
[528256.440938] IPv6: ADDRCONF(NETDEV_CHANGE): vethafa8081: link becomes ready
[528256.441211] docker0: port 1(vethafa8081) entered blocking state
[528256.441220] docker0: port 1(vethafa8081) entered forwarding state
[528566.157944] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[528566.254940] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[528566.288305] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[528566.299151] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[528566.305449] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[528566.306156] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[528566.306940] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[528574.640256] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[528576.311961] docker0: port 1(vethafa8081) entered disabled state
[528576.312304] veth96f91fd: renamed from eth0
[528576.368390] docker0: port 1(vethafa8081) entered disabled state
[528576.378446] device vethafa8081 left promiscuous mode
[528576.378456] docker0: port 1(vethafa8081) entered disabled state
[529545.042042] perf: interrupt took too long (3989 > 3962), lowering kernel.perf_event_max_sample_rate to 50000
[618938.627741] perf: interrupt took too long (5008 > 4986), lowering kernel.perf_event_max_sample_rate to 39750
[831614.204849] perf: interrupt took too long (6273 > 6260), lowering kernel.perf_event_max_sample_rate to 31750
[1112165.849598] NET: Registered PF_VSOCK protocol family
[1112167.520026] audit: type=1400 audit(1733480169.389:108): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_dnsmasq-incusbr0_</var/lib/incus>" pid=1959456 comm="apparmor_parser"
[1112206.971602] audit: type=1400 audit(1733480208.836:109): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_archive-b4808bac-09a4-4b2e-9908-ecf3a8c05f21" pid=1959684 comm="apparmor_parser"
[1112207.072954] audit: type=1400 audit(1733480208.940:110): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_archive-b4808bac-09a4-4b2e-9908-ecf3a8c05f21" pid=1959689 comm="apparmor_parser"
[1112207.127016] audit: type=1400 audit(1733480208.992:111): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_qemu-img-<var-lib-incus-images-5c5d32eb425c389922b9246064c2cba21fc4be1fa0378ab0845c2da43f5fc08a.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-first-root.img>" pid=1959700 comm="apparmor_parser"
[1112207.155822] audit: type=1400 audit(1733480209.020:112): apparmor="DENIED" operation="open" profile="incus_qemu-img-<var-lib-incus-images-5c5d32eb425c389922b9246064c2cba21fc4be1fa0378ab0845c2da43f5fc08a.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-first-root.img>" name="/sys/devices/system/node/" pid=1959702 comm="qemu-img" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[1112207.200161] audit: type=1400 audit(1733480209.064:113): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_qemu-img-<var-lib-incus-images-5c5d32eb425c389922b9246064c2cba21fc4be1fa0378ab0845c2da43f5fc08a.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-first-root.img>" pid=1959707 comm="apparmor_parser"
[1112207.250880] audit: type=1400 audit(1733480209.116:114): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_qemu-img-<var-lib-incus-images-5c5d32eb425c389922b9246064c2cba21fc4be1fa0378ab0845c2da43f5fc08a.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-first-root.img>" pid=1959709 comm="apparmor_parser"
[1112207.273878] audit: type=1400 audit(1733480209.140:115): apparmor="DENIED" operation="open" profile="incus_qemu-img-<var-lib-incus-images-5c5d32eb425c389922b9246064c2cba21fc4be1fa0378ab0845c2da43f5fc08a.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-first-root.img>" name="/sys/devices/system/node/" pid=1959710 comm="qemu-img" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[1112211.927360] audit: type=1400 audit(1733480213.791:116): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_qemu-img-<var-lib-incus-images-5c5d32eb425c389922b9246064c2cba21fc4be1fa0378ab0845c2da43f5fc08a.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-first-root.img>" pid=1959747 comm="apparmor_parser"
[1112213.042246] incusbr0: port 1(tap712f59eb) entered blocking state
[1112213.042255] incusbr0: port 1(tap712f59eb) entered disabled state
[1112213.042520] device tap712f59eb entered promiscuous mode
[1112215.098083] audit: type=1400 audit(1733480216.963:117): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus-first_</var/lib/incus>" pid=1959781 comm="apparmor_parser"
[1112215.297559] incusbr0: port 1(tap712f59eb) entered blocking state
[1112215.297572] incusbr0: port 1(tap712f59eb) entered forwarding state
[1112215.297746] IPv6: ADDRCONF(NETDEV_CHANGE): incusbr0: link becomes ready
[1112336.928085] audit: type=1400 audit(1733480338.787:118): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_archive-4f4995d3-7fb2-40a3-94c3-dcf72c6b9fde" pid=1960273 comm="apparmor_parser"
[1112337.026116] audit: type=1400 audit(1733480338.887:119): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_archive-4f4995d3-7fb2-40a3-94c3-dcf72c6b9fde" pid=1960278 comm="apparmor_parser"
[1112337.119983] audit: type=1400 audit(1733480338.979:120): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_archive-1137f312-3708-42a4-a61a-2791aef6dfbc" pid=1960281 comm="apparmor_parser"
[1112338.402073] audit: type=1400 audit(1733480340.263:121): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_archive-1137f312-3708-42a4-a61a-2791aef6dfbc" pid=1960492 comm="apparmor_parser"
[1112338.497470] incusbr0: port 2(vethbcc78fe4) entered blocking state
[1112338.497478] incusbr0: port 2(vethbcc78fe4) entered disabled state
[1112338.497682] device vethbcc78fe4 entered promiscuous mode
[1112338.737733] audit: type=1400 audit(1733480340.599:122): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus-snd_</var/lib/incus>" pid=1960789 comm="apparmor_parser"
[1112338.870073] physnx9lQf: renamed from vethb6a527b7
[1112338.902386] eth0: renamed from physnx9lQf
[1112338.933641] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[1112338.934435] incusbr0: port 2(vethbcc78fe4) entered blocking state
[1112338.934444] incusbr0: port 2(vethbcc78fe4) entered forwarding state
[1309309.152187] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[1309309.170230] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[1309309.180895] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[1309309.187207] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[1309309.206429] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[1309309.216870] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[1309309.228341] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[1309313.793082] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[1368224.570322] docker0: port 1(veth5b5e142) entered blocking state
[1368224.570329] docker0: port 1(veth5b5e142) entered disabled state
[1368224.570631] device veth5b5e142 entered promiscuous mode
[1368225.227855] eth0: renamed from veth597ddbc
[1368225.259577] IPv6: ADDRCONF(NETDEV_CHANGE): veth5b5e142: link becomes ready
[1368225.259993] docker0: port 1(veth5b5e142) entered blocking state
[1368225.260001] docker0: port 1(veth5b5e142) entered forwarding state
[1369658.733384] docker0: port 1(veth5b5e142) entered disabled state
[1369658.733832] veth597ddbc: renamed from eth0
[1369658.798899] docker0: port 1(veth5b5e142) entered disabled state
[1369658.804224] device veth5b5e142 left promiscuous mode
[1369658.804237] docker0: port 1(veth5b5e142) entered disabled state
[1386007.741889] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[1386007.859988] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[1386007.866605] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[1386007.882893] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[1386007.897352] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[1386007.900982] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[1386007.904617] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[1386010.700459] NVRM: nvAssertFailedNoLog: Assertion failed: !rmapiLockIsOwner() @ rmapi.c:563
[1386192.011811] audit: type=1400 audit(1733754187.826:123): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_archive-ee2adc98-9553-4acd-87ae-af7333e4d33e" pid=2458422 comm="apparmor_parser"
[1386192.097772] audit: type=1400 audit(1733754187.910:124): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_archive-ee2adc98-9553-4acd-87ae-af7333e4d33e" pid=2458433 comm="apparmor_parser"
[1386192.191233] audit: type=1400 audit(1733754188.006:125): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_archive-85ad0e21-9acb-4d02-8da8-946bb1018810" pid=2458435 comm="apparmor_parser"
[1386193.743060] audit: type=1400 audit(1733754189.554:126): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_archive-85ad0e21-9acb-4d02-8da8-946bb1018810" pid=2458642 comm="apparmor_parser"
[1386277.640867] incusbr0: port 1(tap712f59eb) entered disabled state
[1386277.826112] device tap712f59eb left promiscuous mode
[1386277.826244] incusbr0: port 1(tap712f59eb) entered disabled state
[1386278.221491] audit: type=1400 audit(1733754274.032:127): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus-first_</var/lib/incus>" pid=2459091 comm="apparmor_parser"
[1386294.574007] physnx9lQf: renamed from eth0
[1386294.610212] incusbr0: port 2(vethbcc78fe4) entered disabled state
[1386294.615659] vethb6a527b7: renamed from physnx9lQf
[1386294.744093] device vethbcc78fe4 left promiscuous mode
[1386294.744218] incusbr0: port 2(vethbcc78fe4) entered disabled state
[1386295.626368] audit: type=1400 audit(1733754291.436:128): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus-snd_</var/lib/incus>" pid=2459299 comm="apparmor_parser"
[1386341.778042] audit: type=1400 audit(1733754337.587:129): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_archive-864c6116-5de9-4a69-96bf-8191edb32e45" pid=2459577 comm="apparmor_parser"
[1386341.865086] audit: type=1400 audit(1733754337.675:130): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_archive-864c6116-5de9-4a69-96bf-8191edb32e45" pid=2459582 comm="apparmor_parser"
[1386341.917246] audit: type=1400 audit(1733754337.727:131): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_qemu-img-<var-lib-incus-images-4d495423f46b7aee78a503e59d1c6760a17561fe059b48a28af22708f6ea6b8e.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-gpu-test-root.img>" pid=2459584 comm="apparmor_parser"
[1386341.942098] audit: type=1400 audit(1733754337.751:132): apparmor="DENIED" operation="open" profile="incus_qemu-img-<var-lib-incus-images-4d495423f46b7aee78a503e59d1c6760a17561fe059b48a28af22708f6ea6b8e.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-gpu-test-root.img>" name="/sys/devices/system/node/" pid=2459585 comm="qemu-img" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[1386341.997330] audit: type=1400 audit(1733754337.807:133): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_qemu-img-<var-lib-incus-images-4d495423f46b7aee78a503e59d1c6760a17561fe059b48a28af22708f6ea6b8e.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-gpu-test-root.img>" pid=2459589 comm="apparmor_parser"
[1386342.051581] audit: type=1400 audit(1733754337.863:134): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_qemu-img-<var-lib-incus-images-4d495423f46b7aee78a503e59d1c6760a17561fe059b48a28af22708f6ea6b8e.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-gpu-test-root.img>" pid=2459591 comm="apparmor_parser"
[1386342.080253] audit: type=1400 audit(1733754337.891:135): apparmor="DENIED" operation="open" profile="incus_qemu-img-<var-lib-incus-images-4d495423f46b7aee78a503e59d1c6760a17561fe059b48a28af22708f6ea6b8e.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-gpu-test-root.img>" name="/sys/devices/system/node/" pid=2459592 comm="qemu-img" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[1386345.533249] audit: type=1400 audit(1733754341.343:136): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_qemu-img-<var-lib-incus-images-4d495423f46b7aee78a503e59d1c6760a17561fe059b48a28af22708f6ea6b8e.rootfs>_<var-lib-incus-storage-pools-default-virtual-machines-gpu-test-root.img>" pid=2459626 comm="apparmor_parser"
[1386388.843086] audit: type=1400 audit(1733754384.650:137): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_rsync-7ccae3ba-5834-4e7e-8a2c-031231921851" pid=2459795 comm="apparmor_parser"
[1386388.943847] audit: type=1400 audit(1733754384.754:138): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_rsync-7ccae3ba-5834-4e7e-8a2c-031231921851" pid=2459801 comm="apparmor_parser"
[1445740.835025] perf: interrupt took too long (7842 > 7841), lowering kernel.perf_event_max_sample_rate to 25500
[1631136.391676] incusbr0: port 1(tapc58f47cb) entered blocking state
[1631136.391685] incusbr0: port 1(tapc58f47cb) entered disabled state
[1631136.391887] device tapc58f47cb entered promiscuous mode
[1631136.853405] NVRM: Attempting to remove device 0000:18:00.0 with non-zero usage count!
[1631263.770556] audit: type=1400 audit(1733999254.140:139): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_rsync-894fc324-55d8-4898-a24b-d5aed7cb136a" pid=2850906 comm="apparmor_parser"
[1631263.864832] audit: type=1400 audit(1733999254.232:140): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_rsync-894fc324-55d8-4898-a24b-d5aed7cb136a" pid=2850912 comm="apparmor_parser"
[1631275.020559] incusbr0: port 2(tapc2b5370f) entered blocking state
[1631275.020568] incusbr0: port 2(tapc2b5370f) entered disabled state
[1631275.020826] device tapc2b5370f entered promiscuous mode
[1631277.078610] audit: type=1400 audit(1733999267.448:141): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus-gpu-test1_</var/lib/incus>" pid=2851021 comm="apparmor_parser"
[1631277.285265] incusbr0: port 2(tapc2b5370f) entered blocking state
[1631277.285275] incusbr0: port 2(tapc2b5370f) entered forwarding state
[1631304.760142] incusbr0: port 2(tapc2b5370f) entered disabled state
[1631304.915689] device tapc2b5370f left promiscuous mode
[1631304.915777] incusbr0: port 2(tapc2b5370f) entered disabled state
[1631305.127810] audit: type=1400 audit(1733999295.496:142): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus-gpu-test1_</var/lib/incus>" pid=2851286 comm="apparmor_parser"
[1647814.253191] systemd[1]: systemd 249.11-0ubuntu3.12 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
[1647814.275677] systemd[1]: Detected architecture x86-64.
[1647814.397068] systemd[1]: Configuration file /run/systemd/system/netplan-ovs-cleanup.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
[1647814.411005] systemd[1]: /lib/systemd/system/snapd.service:23: Unknown key name 'RestartMode' in section 'Service', ignoring.
[1647814.579348] systemd[1]: Stopping Journal Service...
[1647814.579403] systemd-journald[2450]: Received SIGTERM from PID 1 (systemd).
[1647814.620595] systemd[1]: systemd-journald.service: Deactivated successfully.
[1647814.621105] systemd[1]: Stopped Journal Service.
[1647814.621148] systemd[1]: systemd-journald.service: Consumed 4min 27.162s CPU time.
[1647814.629740] systemd[1]: Starting Journal Service...
[1647814.666647] systemd[1]: Started Journal Service.
Maybe these NVRM assertions are the problem?
mrakgr
(Marko Grdinić)
December 12, 2024, 5:33pm
14
I also asked Perplexity about this earlier in the day and it told me: https://www.perplexity.ai/search/how-can-i-launch-an-incus-vm-w-7I_yY361Tgm8V2fZMa9DIg#1
I won’t lie, at that point I decided I would rather learn Ansible than mess with IOMMU groups and driver blacklisting like the chatbot suggested.
stgraber
(Stéphane Graber)
December 12, 2024, 5:45pm
15
That’s possibly the problem. It looks like those GPUs are currently in use by something.
Does incus start ever complete/fail, or does it just get stuck indefinitely?
If it is stuck indefinitely, you may want to look at ps fauxww | grep qemu to see if QEMU is stuck on the kernel.
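One way to read "stuck on the kernel" is a QEMU process in uninterruptible sleep, which ps reports with a state starting with D. The sketch below filters ps output for that case; the function name stuck_qemu_processes and the chosen ps columns are assumptions, and a D state is only a hint, not proof of a hang.

```shell
# Read `ps -eo pid,stat,wchan:32,args` output on stdin and print the
# lines for QEMU processes whose state (second column) starts with "D",
# i.e. uninterruptible sleep. The function name is hypothetical.
stuck_qemu_processes() {
  awk '$2 ~ /^D/ && /qemu/'
}

# Usage on the host:
#   ps -eo pid,stat,wchan:32,args | stuck_qemu_processes
```

The wchan column shows the kernel function the process is waiting in, which may help narrow down what it is blocked on.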
And yeah, for GPU passthrough, you’ll need clean IOMMU groups, so that means making sure that you have the system booted with something like iommu=pt, that those GPUs aren’t in use on the system, …
You can usually validate that by looking at incus info --resources
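The same two conditions can also be inspected directly in sysfs. This is a minimal sketch under the assumption that the host exposes the usual Linux locations /sys/kernel/iommu_groups and /sys/bus/pci/devices; the function names list_iommu_groups and bound_driver are made up, and the optional directory argument exists only so the functions can be pointed at a test tree.

```shell
# Print "group N: PCI_ADDRESS" for every device in every IOMMU group.
#   $1 = IOMMU groups directory (defaults to the usual sysfs location)
list_iommu_groups() {
  root="${1:-/sys/kernel/iommu_groups}"
  for dev in "$root"/*/devices/*; do
    [ -e "$dev" ] || continue          # no groups: the glob did not expand
    group="${dev%/devices/*}"          # strip "/devices/<addr>"
    group="${group##*/}"               # keep only the group number
    printf 'group %s: %s\n' "$group" "${dev##*/}"
  done
}

# Print the name of the kernel driver bound to a PCI device, or "none".
#   $1 = full PCI address such as 0000:18:00.0
#   $2 = PCI devices directory (defaults to the usual sysfs location)
bound_driver() {
  link="${2:-/sys/bus/pci/devices}/$1/driver"
  if [ -L "$link" ]; then
    basename "$(readlink "$link")"
  else
    echo none
  fi
}

# Usage on the host:
#   list_iommu_groups
#   bound_driver 0000:18:00.0
#   grep -o 'iommu=[^ ]*' /proc/cmdline
```

A GPU that shares its group with unrelated devices, or that is still bound to the host's nvidia driver while host processes use it, would be worth a closer look before attempting passthrough.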