nvidia-container-cli info shows the GPU correctly. I also installed
lxc 4.0.5 to get the nvidia hook (
However, when I start a container with nvidia.runtime: true, it fails with the following error (output from lxc info --show-log nvtest):
ERROR conf - conf.c:run_buffer:324 - Script exited with status 1
ERROR conf - conf.c:lxc_setup:3374 - Failed to run mount hooks
ERROR start - start.c:do_start:1218 - Failed to setup container "nvtest"
ERROR sync - sync.c:__sync_wait:36 - An error occurred in another process (expected sequence number 5)
WARN network - network.c:lxc_delete_network_priv:3185 - Failed to rename interface with index 3 from "eth0" to its initial name "vethe1e468a6"
ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:860 - Received container state "ABORTING" instead of "RUNNING"
ERROR start - start.c:__lxc_start:1999 - Failed to spawn container "nvtest"
WARN start - start.c:lxc_abort:1018 - No such process - Failed to send SIGKILL to 45644
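To make it easier to see which entry failed first, here is a minimal Python sketch that splits the log above into individual entries and picks out the first ERROR. The entry shape (LEVEL module - file:function:line - message) is only inferred from the lines in this post, and parse_entries and first_error are hypothetical helper names, not part of LXC or LXD.

```python
import re

# Log text copied from this post; entries may arrive on one line or many.
RAW_LOG = (
    'ERROR conf - conf.c:run_buffer:324 - Script exited with status 1 '
    'ERROR conf - conf.c:lxc_setup:3374 - Failed to run mount hooks '
    'ERROR start - start.c:do_start:1218 - Failed to setup container "nvtest" '
    'ERROR sync - sync.c:__sync_wait:36 - An error occurred in another process '
    '(expected sequence number 5) '
    'WARN network - network.c:lxc_delete_network_priv:3185 - Failed to rename '
    'interface with index 3 from "eth0" to its initial name "vethe1e468a6" '
    'ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:860 - '
    'Received container state "ABORTING" instead of "RUNNING" '
    'ERROR start - start.c:__lxc_start:1999 - Failed to spawn container "nvtest" '
    'WARN start - start.c:lxc_abort:1018 - No such process - '
    'Failed to send SIGKILL to 45644'
)

LEVELS = r'(?:ERROR|WARN|INFO|DEBUG|TRACE|NOTICE)'

# Assumed entry shape: LEVEL module - file.c:function:line - message.
# The message runs until the next "LEVEL module - " header or end of text.
ENTRY = re.compile(
    r'(?P<level>' + LEVELS + r')\s+(?P<module>\w+) - '
    r'(?P<source>[\w.]+):(?P<func>\w+):(?P<line>\d+) - '
    r'(?P<msg>.*?)(?=\s+' + LEVELS + r'\s+\w+ - |\s*$)',
    re.S,
)


def parse_entries(text):
    """Return one dict per log entry found in text."""
    return [m.groupdict() for m in ENTRY.finditer(text)]


def first_error(text):
    """Return the earliest ERROR entry, or None if there is none."""
    for entry in parse_entries(text):
        if entry['level'] == 'ERROR':
            return entry
    return None


if __name__ == '__main__':
    err = first_error(RAW_LOG)
    print(err['source'], err['func'], err['msg'])
```

On this log, the earliest error is the run_buffer entry reporting that the script exited with status 1; the later entries look like consequences of that mount hook failure rather than separate problems.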
I’ve confirmed that the container starts when nvidia.runtime is set to false. I also tried running
/usr/share/lxc/hooks/nvidia directly under bash, and its exit code is 0.
Is there any way to get further details on what went wrong in the hook?
The container has only one NIC device. All other config is left at the defaults (except
nvidia.runtime, of course).
lxc info --resources and
nvidia-smi both show the GPU correctly, and a small PyTorch example also runs on the host.