OCI container startup failure

I tried running the OCI image for mysql, but got this error. Any ideas?

lijunle@server-debian:~$ incus start mysql
Error: Failed to run: /opt/incus/bin/incusd forkstart mysql /var/lib/incus/containers /run/incus/mysql/lxc.conf: exit status 1
Try `incus info --show-log mysql` for more info

lijunle@server-debian:~$ incus info --show-log mysql
Name: mysql
Status: STOPPED
Type: container (application)
Architecture: x86_64
Created: 2024/07/12 09:52 PDT
Last Used: 2024/07/12 10:02 PDT

Log:

lxc mysql 20240712170211.284 ERROR    utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 1
lxc mysql 20240712170211.284 ERROR    conf - ../src/lxc/conf.c:lxc_setup:3940 - Failed to run mount hooks
lxc mysql 20240712170211.284 ERROR    start - ../src/lxc/start.c:do_start:1273 - Failed to setup container "mysql"
lxc mysql 20240712170211.284 ERROR    sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc mysql 20240712170211.302 WARN     network - ../src/lxc/network.c:lxc_delete_network_priv:3674 - Failed to rename interface with index 0 from "eth0" to its initial name "vethd13099d5"
lxc mysql 20240712170211.303 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:837 - Received container state "ABORTING" instead of "RUNNING"
lxc mysql 20240712170211.303 ERROR    start - ../src/lxc/start.c:__lxc_start:2114 - Failed to spawn container "mysql"
lxc mysql 20240712170211.303 WARN     start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 18 for process 5815
lxc 20240712170211.495 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20240712170211.495 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"
lijunle@server-debian:~$ uname -a
Linux server-debian 6.7.12+bpo-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.7.12-1~bpo12+1 (2024-05-06) x86_64 GNU/Linux

I moved your post into its own topic so we don’t have a debug session in the release post :slight_smile:

Can you show incus info and incus config show --expanded mysql?

That should make it easier for me to try to reproduce things.
The fact that it’s a hook failing makes it likely to be something wrong with the built-in DHCP client.

lijunle@server-debian:~$ incus info
config:
  core.https_address: :8443
  oidc.audience: (Removed)
  oidc.client.id: (Removed)
  oidc.issuer: (Removed)
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- network_sriov
- console
- restrict_dev_incus
- migration_pre_copy
- infiniband
- dev_incus_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- dev_incus_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- backup_compression
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- images_all_projects
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- zfs_delegate
- storage_api_remote_volume_snapshot_copy
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
- image_restriction_privileged
- cluster_internal_custom_volume_copy
- disk_io_bus
- storage_cephfs_create_missing
- instance_move_config
- ovn_ssl_config
- certificate_description
- disk_io_bus_virtio_blk
- loki_config_instance
- instance_create_start
- clustering_evacuation_stop_options
- boot_host_shutdown_action
- agent_config_drive
- network_state_ovn_lr
- image_template_permissions
- storage_bucket_backup
- storage_lvm_cluster
- shared_custom_block_volumes
- auth_tls_jwt
- oidc_claim
- device_usb_serial
- numa_cpu_balanced
- image_restriction_nesting
- network_integrations
- instance_memory_swap_bytes
- network_bridge_external_create
- network_zones_all_projects
- storage_zfs_vdev
- container_migration_stateful
- profiles_all_projects
- instances_scriptlet_get_instances
- instances_scriptlet_get_cluster_members
- instances_scriptlet_get_project
- network_acl_stateless
- instance_state_started_at
- networks_all_projects
- network_acls_all_projects
- storage_buckets_all_projects
- resources_load
- instance_access
- project_access
- projects_force_delete
- resources_cpu_flags
- disk_io_bus_cache_filesystem
- instance_oci
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
- oidc
auth_user_name: lijunle
auth_user_method: unix
environment:
  addresses:
  - 192.168.135.88:8443
  - '[fd42:fdaf:9d38:8908::1]:8443'
  - (Some of these entries are removed)
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    (Removed)
    -----END CERTIFICATE-----
  certificate_fingerprint: (Removed)
  driver: lxc | qemu
  driver_version: 6.0.1 | 9.0.1
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    uevent_injection: "true"
    unpriv_binfmt: "true"
    unpriv_fscaps: "true"
  kernel_version: 6.7.12+bpo-amd64
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Debian GNU/Linux
  os_version: "12"
  project: default
  server: incus
  server_clustered: false
  server_event_mode: full-mesh
  server_name: server-debian
  server_pid: 1567
  server_version: "6.3"
  storage: btrfs
  storage_version: "6.2"
  storage_supported_drivers:
  - name: btrfs
    version: "6.2"
    remote: false
  - name: dir
    version: "1"
    remote: false
lijunle@server-debian:~$ incus config show --expanded mysql
architecture: x86_64
config:
  environment.GOSU_VERSION: "1.17"
  environment.HOME: /root
  environment.MYSQL_DATABASE: wordpress
  environment.MYSQL_MAJOR: innovation
  environment.MYSQL_PASSWORD: wordpress
  environment.MYSQL_RANDOM_ROOT_PASSWORD: "1"
  environment.MYSQL_SHELL_VERSION: 9.0.0-1.el9
  environment.MYSQL_USER: wordpress
  environment.MYSQL_VERSION: 9.0.0-1.el9
  environment.PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
  environment.TERM: xterm
  image.architecture: x86_64
  image.description: docker.io/library/mysql (OCI)
  image.type: oci
  nvidia.runtime: "true"
  security.idmap.isolated: "true"
  snapshots.expiry: 1m
  snapshots.pattern: snapshot_{{creation_date|date:'2006-01-02'}}
  snapshots.schedule: '@daily'
  volatile.base_image: 72a37ddc9f839cfd84f1f6815fb31ba26f37f4c200b90e49607797480e3be446
  volatile.cloud-init.instance-id: 002f22d3-5124-4977-ba55-97bf690b634a
  volatile.container.oci: "true"
  volatile.eth0.hwaddr: 00:16:3e:da:16:a1
  volatile.eth0.name: eth0
  volatile.idmap.current: '[]'
  volatile.last_state.power: STOPPED
  volatile.last_state.ready: "false"
  volatile.uuid: 83efeec2-aa3e-4afb-a670-83e37b2a33a2
  volatile.uuid.generation: 83efeec2-aa3e-4afb-a670-83e37b2a33a2
devices:
  eth0:
    network: incusbr0
    type: nic
  gpu:
    type: gpu
  root:
    path: /
    pool: ssd
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

Overriding nvidia.runtime to false solves the issue.
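For anyone landing here with the same error, the override mentioned above could be applied roughly as follows. This is only a sketch: the helper name `disable_nvidia_runtime` is made up for illustration, and `mysql` is the instance name from the output earlier in this topic.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative, not part of Incus):
# turn off the NVIDIA runtime hook for one instance, then start it again.
disable_nvidia_runtime() {
    instance="$1"
    # Setting the key on the instance takes precedence over a value
    # coming from a profile.
    incus config set "$instance" nvidia.runtime=false || return 1
    incus start "$instance"
}

# Usage (not run automatically):
#   disable_nvidia_runtime mysql
```

Note that this only works around the failing hook; the container will start without the NVIDIA libraries and tools being passed in.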

Ah, yeah, I guess that makes sense. The NVIDIA startup hook likely makes a lot of assumptions about what the container has available as far as paths and tools, something that a lot of OCI containers are likely to be missing.

I could imagine there will be a topic about how to enable CUDA in OCI containers soon. :wink:

Yeah, I’m hoping that the containers which may need it are going to be based on a more complete base image and so will hopefully just work. If not, we’re going to need to add some debugging to the nvidia hook to see exactly what it’s missing :slight_smile:

Any update about this? I’m facing the same issue trying to run one of the docker containers from https://nvcr.io/nvidia.

Do you have instructions on what OCI repository you added and what image you used from that?
Also, can you confirm that nvidia.runtime=true works properly for images:ubuntu/24.04 or similar?

Do you have instructions on what OCI repository you added and what image you used from that?

I tried the following…

sudo incus remote add nvcr https://nvcr.io/nvidia --protocol=oci
sudo incus create nvcr:pytorch:21.03-py3 pytorch-test
sudo incus config set pytorch-test nvidia.runtime=true
sudo incus start pytorch-test --console

…which failed.

I also tried the official ubuntu docker image:

sudo incus remote add docker https://docker.io --protocol=oci
sudo incus create docker:ubuntu:24.04 ubuntu-oci-test
sudo incus config set ubuntu-oci-test nvidia.runtime=true
sudo incus start ubuntu-oci-test --console

…which similarly failed.

The log doesn’t provide much of anything useful, I guess.

Log:

lxc ubuntu-oci-test 20250110021904.836 ERROR    utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 1
lxc ubuntu-oci-test 20250110021904.836 ERROR    conf - ../src/lxc/conf.c:lxc_setup:3940 - Failed to run mount hooks
lxc ubuntu-oci-test 20250110021904.836 ERROR    start - ../src/lxc/start.c:do_start:1273 - Failed to setup container "ubuntu-oci-test"
lxc ubuntu-oci-test 20250110021904.836 ERROR    sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc ubuntu-oci-test 20250110021904.841 WARN     network - ../src/lxc/network.c:lxc_delete_network_priv:3674 - Failed to rename interface with index 0 from "eth0" to its initial name "veth860bf867"
lxc ubuntu-oci-test 20250110021904.841 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:837 - Received container state "ABORTING" instead of "RUNNING"
lxc ubuntu-oci-test 20250110021904.841 ERROR    start - ../src/lxc/start.c:__lxc_start:2114 - Failed to spawn container "ubuntu-oci-test"
lxc ubuntu-oci-test 20250110021904.841 WARN     start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 17 for process 1070264
lxc 20250110021904.926 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response

Both run when I set nvidia.runtime=false, though the former is vocal about “WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.”.

Also, can you confirm that nvidia.runtime=true works properly for images:ubuntu/24.04 or similar?

In contrast, my plain incus containers based on image “Ubuntu noble ubuntu/noble/default” (ubuntu/24.04) work well with nvidia.runtime=true (at least I can confirm that they are starting and using the GPU).

I’d be happy to experiment more if you have guidance about what to try next.

Okay, so yeah, it’s something with the OCI environment that’s tickling nvidia-container the wrong way.

Can you try setting raw.lxc=lxc.log.level=trace and then try to start again and get a new incus info --show-log output? With a bit of luck that will capture the output of the nvidia-container hook.
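In case it helps, the steps requested above could be run along these lines. This is a minimal sketch: the helper name `start_with_trace` is hypothetical, and the instance name is whatever container you are debugging.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative, not part of Incus):
# enable LXC trace logging, retry the start, then print the captured log.
start_with_trace() {
    instance="$1"
    # raw.lxc passes the line through to the generated LXC config.
    incus config set "$instance" raw.lxc=lxc.log.level=trace || return 1
    # The start is expected to fail again; keep going so the log is shown.
    incus start "$instance" || true
    incus info --show-log "$instance"
}

# Usage (not run automatically):
#   start_with_trace pytorch-test
```

Once the log has been collected, `incus config unset <instance> raw.lxc` should bring the log level back to the default.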

The trace is below, with one line of relevance appearing to be:

lxc pytorch-test 20250110160537.569 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /opt/incus/share/lxc/hooks/nvidia produced output: ERROR: Missing tool nvidia-container-cli, see https://github.com/NVIDIA/libnvidia-container

I do have nvidia-container-tools (1.17.3+dfsg-0lambda0.22.04.1) installed on the host, but I take it there is more to it than that.
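To pull that line out of the trace and to check whether the tool is visible at all, something like the sketch below may be useful. Both function names are made up for illustration. One assumption to keep in mind: the PATH of an interactive shell is not necessarily the PATH that the Incus daemon and its hooks run with, so a "found" result here does not prove the hook can see the binary.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative only).

# Keep only the log lines where LXC captured output from a hook script.
# Reads the log on stdin.
hook_output() {
    grep 'produced output:'
}

# Report whether a tool is visible on the PATH of the current shell.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
    fi
}

# Usage (not run automatically):
#   incus info --show-log pytorch-test | hook_output
#   check_tool nvidia-container-cli
```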

Full log:

Name: pytorch-test
Status: STOPPED
Type: container (application)
Architecture: x86_64
Location: lbd-vector03
Created: 2025/01/08 21:13 PST
Last Used: 2025/01/10 08:05 PST

Snapshots:
+-------+----------------------+----------------------+----------+
| NAME  |       TAKEN AT       |      EXPIRES AT      | STATEFUL |
+-------+----------------------+----------------------+----------+
| snap0 | 2025/01/09 15:50 PST | 2025/01/16 15:50 PST | NO       |
+-------+----------------------+----------------------+----------+

Log:

lxc pytorch-test 20250110160537.432 TRACE    commands - ../src/lxc/commands.c:lxc_cmd_timeout:525 - Connection refused - Command "get_state" failed to connect command socket
lxc pytorch-test 20250110160537.432 TRACE    start - ../src/lxc/start.c:lxc_init_handler:739 - Created anonymous pair {3,6} of unix sockets
lxc pytorch-test 20250110160537.432 TRACE    commands - ../src/lxc/commands.c:lxc_server_init:2138 - Created abstract unix socket "/var/lib/incus/containers/pytorch-test/command"
lxc pytorch-test 20250110160537.432 TRACE    start - ../src/lxc/start.c:lxc_init_handler:755 - Unix domain socket 8 for command server is ready
lxc pytorch-test 20250110160537.433 INFO     lxccontainer - ../src/lxc/lxccontainer.c:do_lxcapi_start:959 - Set process title to [lxc monitor] /var/lib/incus/containers pytorch-test
lxc pytorch-test 20250110160537.434 INFO     start - ../src/lxc/start.c:lxc_check_inherited:326 - Closed inherited fd 4
lxc pytorch-test 20250110160537.434 INFO     start - ../src/lxc/start.c:lxc_check_inherited:326 - Closed inherited fd 5
lxc pytorch-test 20250110160537.434 INFO     start - ../src/lxc/start.c:lxc_check_inherited:326 - Closed inherited fd 19
lxc pytorch-test 20250110160537.434 TRACE    execute - ../src/lxc/execute.c:lxc_execute:49 - Doing lxc_execute
lxc pytorch-test 20250110160537.434 INFO     lsm - ../src/lxc/lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver AppArmor
lxc pytorch-test 20250110160537.434 TRACE    start - ../src/lxc/start.c:lxc_init:779 - Initialized LSM
lxc pytorch-test 20250110160537.434 TRACE    start - ../src/lxc/start.c:lxc_serve_state_clients:484 - Set container state to STARTING
lxc pytorch-test 20250110160537.434 TRACE    start - ../src/lxc/start.c:lxc_serve_state_clients:487 - No state clients registered
lxc pytorch-test 20250110160537.434 TRACE    start - ../src/lxc/start.c:lxc_init:785 - Set container state to "STARTING"
lxc pytorch-test 20250110160537.434 TRACE    start - ../src/lxc/start.c:lxc_init:841 - Set environment variables
lxc pytorch-test 20250110160537.434 INFO     utils - ../src/lxc/utils.c:run_script_argv:590 - Executing script "/proc/1562/exe callhook /var/lib/incus "default" "pytorch-test" start" for container "pytorch-test"
lxc pytorch-test 20250110160537.434 TRACE    utils - ../src/lxc/utils.c:run_script_argv:633 - Set environment variable: LXC_HOOK_TYPE=pre-start
lxc pytorch-test 20250110160537.434 TRACE    utils - ../src/lxc/utils.c:run_script_argv:638 - Set environment variable: LXC_HOOK_SECTION=lxc
lxc pytorch-test 20250110160537.435 DEBUG    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:818 - First child 1097300 exited
lxc pytorch-test 20250110160537.469 TRACE    start - ../src/lxc/start.c:lxc_init:846 - Ran pre-start hooks
lxc pytorch-test 20250110160537.470 TRACE    start - ../src/lxc/start.c:setup_signal_fd:371 - Created signal file descriptor 5
lxc pytorch-test 20250110160537.470 TRACE    start - ../src/lxc/start.c:lxc_init:859 - Set up signal fd
lxc pytorch-test 20250110160537.470 INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:unpriv_systemd_create_scope:1498 - Running privileged, not using a systemd unit
lxc pytorch-test 20250110160537.470 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:cgroup_hierarchy_add:462 - Adding cgroup hierarchy mounted at  and base cgroup (null)
lxc pytorch-test 20250110160537.470 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:cgroup_hierarchy_add:465 - The hierarchy contains the cpuset controller
lxc pytorch-test 20250110160537.470 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:cgroup_hierarchy_add:465 - The hierarchy contains the cpu controller
lxc pytorch-test 20250110160537.470 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:cgroup_hierarchy_add:465 - The hierarchy contains the io controller
lxc pytorch-test 20250110160537.470 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:cgroup_hierarchy_add:465 - The hierarchy contains the memory controller
lxc pytorch-test 20250110160537.470 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:cgroup_hierarchy_add:465 - The hierarchy contains the hugetlb controller
lxc pytorch-test 20250110160537.470 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:cgroup_hierarchy_add:465 - The hierarchy contains the pids controller
lxc pytorch-test 20250110160537.470 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:cgroup_hierarchy_add:465 - The hierarchy contains the rdma controller
lxc pytorch-test 20250110160537.470 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:cgroup_hierarchy_add:465 - The hierarchy contains the misc controller
lxc pytorch-test 20250110160537.470 TRACE    cgroup2_devices - ../src/lxc/cgroups/cgroup2_devices.c:bpf_program_load_kernel:335 - Loaded bpf program: func#0 @0
0: R1=ctx() R10=fp0
0: (61) r2 = *(u32 *)(r1 +0)          ; R1=ctx() R2_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
1: (54) w2 &= 65535                   ; R2_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=0xffff,var_off=(0x0; 0xffff))
2: (61) r3 = *(u32 *)(r1 +0)          ; R1=ctx() R3_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
3: (74) w3 >>= 16                     ; R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=0xffff,var_off=(0x0; 0xffff))
4: (61) r4 = *(u32 *)(r1 +4)          ; R1=ctx() R4_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
5: (61) r5 = *(u32 *)(r1 +8)          ; R1=ctx() R5_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
6: (b7) r0 = 1                        ; R0_w=1
7: (95) exit
mark_precise: frame0: last_idx 7 first_idx 0 subseq_idx -1 
mark_precise: frame0: regs=r0 stack= before 6: (b7) r0 = 1
processed 8 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

lxc pytorch-test 20250110160537.470 TRACE    cgroup2_devices - ../src/lxc/cgroups/cgroup2_devices.c:bpf_devices_cgroup_supported:553 - The bpf device cgroup is supported
lxc pytorch-test 20250110160537.470 TRACE    cgroup - ../src/lxc/cgroups/cgroup.c:cgroup_init:41 - Initialized cgroup driver cgfsng
lxc pytorch-test 20250110160537.470 TRACE    cgroup - ../src/lxc/cgroups/cgroup.c:cgroup_init:48 - Unified cgroup layout
lxc pytorch-test 20250110160537.470 TRACE    start - ../src/lxc/start.c:lxc_init:866 - Initialized cgroup driver
lxc pytorch-test 20250110160537.470 DEBUG    seccomp - ../src/lxc/seccomp.c:parse_config_v2:664 - Host native arch is [3221225534]
lxc pytorch-test 20250110160537.470 TRACE    seccomp - ../src/lxc/seccomp.c:get_new_ctx:478 - Added arch 2 to main seccomp context
lxc pytorch-test 20250110160537.470 TRACE    seccomp - ../src/lxc/seccomp.c:get_new_ctx:486 - Removed native arch from main seccomp context
lxc pytorch-test 20250110160537.470 TRACE    seccomp - ../src/lxc/seccomp.c:get_new_ctx:478 - Added arch 3 to main seccomp context
lxc pytorch-test 20250110160537.470 TRACE    seccomp - ../src/lxc/seccomp.c:get_new_ctx:486 - Removed native arch from main seccomp context
lxc pytorch-test 20250110160537.470 TRACE    seccomp - ../src/lxc/seccomp.c:get_new_ctx:491 - Arch 4 already present in main seccomp context
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "[all]"
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "reject_force_umount  # comment this to allow umount -f;  not recommended"
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:532 - Set seccomp rule to reject force umounts
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:532 - Set seccomp rule to reject force umounts
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:532 - Set seccomp rule to reject force umounts
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "[all]"
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "kexec_load errno 38"
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[246:kexec_load] action[327718:errno] arch[0]
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[246:kexec_load] action[327718:errno] arch[1073741827]
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[246:kexec_load] action[327718:errno] arch[1073741886]
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "open_by_handle_at errno 38"
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[304:open_by_handle_at] action[327718:errno] arch[0]
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[304:open_by_handle_at] action[327718:errno] arch[1073741827]
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[304:open_by_handle_at] action[327718:errno] arch[1073741886]
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "init_module errno 38"
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[175:init_module] action[327718:errno] arch[0]
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[175:init_module] action[327718:errno] arch[1073741827]
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[175:init_module] action[327718:errno] arch[1073741886]
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "finit_module errno 38"
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[313:finit_module] action[327718:errno] arch[0]
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[313:finit_module] action[327718:errno] arch[1073741827]
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[313:finit_module] action[327718:errno] arch[1073741886]
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "delete_module errno 38"
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[176:delete_module] action[327718:errno] arch[0]
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[176:delete_module] action[327718:errno] arch[1073741827]
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[176:delete_module] action[327718:errno] arch[1073741886]
lxc pytorch-test 20250110160537.470 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:1036 - Merging compat seccomp contexts into main context
lxc pytorch-test 20250110160537.470 TRACE    seccomp - ../src/lxc/seccomp.c:parse_config_v2:1046 - Merged first compat seccomp context into main context
lxc pytorch-test 20250110160537.470 TRACE    seccomp - ../src/lxc/seccomp.c:parse_config_v2:1062 - Merged second compat seccomp context into main context
lxc pytorch-test 20250110160537.470 TRACE    start - ../src/lxc/start.c:lxc_init:873 - Read seccomp policy
lxc pytorch-test 20250110160537.470 TRACE    start - ../src/lxc/start.c:lxc_init:880 - Initialized LSM
lxc pytorch-test 20250110160537.470 INFO     start - ../src/lxc/start.c:lxc_init:882 - Container "pytorch-test" is initialized
lxc pytorch-test 20250110160537.470 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:__cgroup_tree_create:726 - Created 10(lxc.monitor.pytorch-test) cgroup
lxc pytorch-test 20250110160537.470 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:__cgroup_tree_create:741 - Opened newly created cgroup lxc.monitor.pytorch-test as 11
lxc pytorch-test 20250110160537.470 INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_monitor_create:1669 - The monitor process uses "lxc.monitor.pytorch-test" as cgroup
lxc pytorch-test 20250110160537.470 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:__cgfsng_delegate_controllers:3620 - Enabled "+cpuset +cpu +io +memory +hugetlb +pids +rdma +misc" controllers in the unified cgroup 10
lxc pytorch-test 20250110160537.493 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_monitor_enter:1819 - Moved monitor (1097301) into cgroup 11
lxc pytorch-test 20250110160537.493 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_monitor_enter:1833 - Moved transient process into cgroup 11
lxc pytorch-test 20250110160537.493 DEBUG    storage - ../src/lxc/storage/storage.c:get_storage_by_name:209 - Detected rootfs type "dir"
lxc pytorch-test 20250110160537.493 TRACE    conf - ../src/lxc/conf.c:lxc_rootfs_init:361 - Not pinning because container runs in user namespace
lxc pytorch-test 20250110160537.493 DEBUG    storage - ../src/lxc/storage/storage.c:get_storage_by_name:209 - Detected rootfs type "dir"
lxc pytorch-test 20250110160537.493 TRACE    sync - ../src/lxc/sync.c:lxc_sync_init:139 - Initialized synchronization infrastructure
lxc pytorch-test 20250110160537.494 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:__cgroup_tree_create:726 - Created 10(lxc.payload.pytorch-test) cgroup
lxc pytorch-test 20250110160537.494 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:__cgroup_tree_create:741 - Opened newly created cgroup lxc.payload.pytorch-test as 16
lxc pytorch-test 20250110160537.494 INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_payload_create:1777 - The container process uses "lxc.payload.pytorch-test" as inner and "lxc.payload.pytorch-test" as limit cgroup
lxc pytorch-test 20250110160537.495 TRACE    start - ../src/lxc/start.c:lxc_spawn:1709 - Spawned container directly into target cgroup via cgroup2 fd 16
lxc pytorch-test 20250110160537.495 TRACE    start - ../src/lxc/start.c:lxc_spawn:1749 - Cloned child process 1097319
lxc pytorch-test 20250110160537.495 TRACE    start - ../src/lxc/start.c:core_scheduling:1589 - Created new core scheduling domain with cookie 3565788268
lxc pytorch-test 20250110160537.495 TRACE    utils - ../src/lxc/utils.c:lxc_can_use_pidfd:1931 - Kernel supports pidfds
lxc pytorch-test 20250110160537.495 INFO     start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWUSER
lxc pytorch-test 20250110160537.495 INFO     start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWNS
lxc pytorch-test 20250110160537.495 INFO     start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWPID
lxc pytorch-test 20250110160537.495 INFO     start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWUTS
lxc pytorch-test 20250110160537.495 INFO     start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWIPC
lxc pytorch-test 20250110160537.495 INFO     start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWCGROUP
lxc pytorch-test 20250110160537.495 DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved user namespace via fd 18 and stashed path as user:/proc/1097301/fd/18
lxc pytorch-test 20250110160537.495 DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved mnt namespace via fd 19 and stashed path as mnt:/proc/1097301/fd/19
lxc pytorch-test 20250110160537.495 DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved pid namespace via fd 20 and stashed path as pid:/proc/1097301/fd/20
lxc pytorch-test 20250110160537.495 DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved uts namespace via fd 21 and stashed path as uts:/proc/1097301/fd/21
lxc pytorch-test 20250110160537.495 DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved ipc namespace via fd 22 and stashed path as ipc:/proc/1097301/fd/22
lxc pytorch-test 20250110160537.495 TRACE    start - ../src/lxc/start.c:lxc_spawn:1709 - Spawned container directly into target cgroup via cgroup2 fd 16
lxc pytorch-test 20250110160537.495 DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved cgroup namespace via fd 23 and stashed path as cgroup:/proc/1097301/fd/23
lxc pytorch-test 20250110160537.495 INFO     idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc pytorch-test 20250110160537.495 INFO     idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc pytorch-test 20250110160537.495 DEBUG    idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:186 - No newuidmap and newgidmap binary found. Trying to write directly with euid 0
lxc pytorch-test 20250110160537.495 TRACE    idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:251 - Wrote mapping "0 1000000 1000000000
"
lxc pytorch-test 20250110160537.495 TRACE    idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:251 - Wrote mapping "0 1000000 1000000000
"
lxc pytorch-test 20250110160537.495 TRACE    sync - ../src/lxc/sync.c:lxc_sync_wait_parent:110 - Child waiting for parent with sequence startup
lxc pytorch-test 20250110160537.495 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:__cgfsng_delegate_controllers:3620 - Enabled "+cpuset +cpu +io +memory +hugetlb +pids +rdma +misc" controllers in the unified cgroup 10
lxc pytorch-test 20250110160537.495 TRACE    conf - ../src/lxc/conf.c:get_minimal_idmap:4476 - Allocated minimal idmapping for ns uid 0 and ns gid 0
lxc pytorch-test 20250110160537.496 TRACE    conf - ../src/lxc/conf.c:userns_exec_1:4540 - Establishing uid mapping for "1097320" in new user namespace: nsuid 1000000000 - hostid 0 - range 1
lxc pytorch-test 20250110160537.496 TRACE    conf - ../src/lxc/conf.c:userns_exec_1:4540 - Establishing uid mapping for "1097320" in new user namespace: nsuid 0 - hostid 1000000 - range 1000000000
lxc pytorch-test 20250110160537.496 TRACE    conf - ../src/lxc/conf.c:userns_exec_1:4540 - Establishing gid mapping for "1097320" in new user namespace: nsuid 1000000000 - hostid 0 - range 1
lxc pytorch-test 20250110160537.496 TRACE    conf - ../src/lxc/conf.c:userns_exec_1:4540 - Establishing gid mapping for "1097320" in new user namespace: nsuid 0 - hostid 1000000 - range 1000000000
lxc pytorch-test 20250110160537.496 INFO     idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc pytorch-test 20250110160537.496 INFO     idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc pytorch-test 20250110160537.496 INFO     idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:176 - Caller maps host root. Writing mapping directly
lxc pytorch-test 20250110160537.496 TRACE    idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:251 - Wrote mapping "1000000000 0 1
0 1000000 1000000000
"
lxc pytorch-test 20250110160537.496 TRACE    idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:251 - Wrote mapping "1000000000 0 1
0 1000000 1000000000
"
lxc pytorch-test 20250110160537.496 TRACE    conf - ../src/lxc/conf.c:run_userns_fn:4412 - Calling function "chown_cgroup_wrapper"
lxc pytorch-test 20250110160537.496 NOTICE   utils - ../src/lxc/utils.c:lxc_drop_groups:1477 - Dropped supplimentary groups
lxc pytorch-test 20250110160537.497 TRACE    sync - ../src/lxc/sync.c:lxc_sync_barrier_child:97 - Parent waking child with sequence startup and waiting with sequence configure
lxc pytorch-test 20250110160537.497 INFO     start - ../src/lxc/start.c:do_start:1105 - Unshared CLONE_NEWNET
lxc pytorch-test 20250110160537.497 NOTICE   utils - ../src/lxc/utils.c:lxc_drop_groups:1477 - Dropped supplimentary groups
lxc pytorch-test 20250110160537.497 NOTICE   utils - ../src/lxc/utils.c:lxc_switch_uid_gid:1453 - Switched to gid 0
lxc pytorch-test 20250110160537.497 NOTICE   utils - ../src/lxc/utils.c:lxc_switch_uid_gid:1462 - Switched to uid 0
lxc pytorch-test 20250110160537.497 TRACE    sync - ../src/lxc/sync.c:lxc_sync_wake_parent:104 - Child waking parent with sequence configure
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: LIBRARY_PATH=/usr/local/cuda/lib64/stubs:
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: CUBLAS_VERSION=11.4.1.1026
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: CUDA_VERSION=11.2.1.007
lxc pytorch-test 20250110160537.497 DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved net namespace via fd 4 and stashed path as net:/proc/1097301/fd/4
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: NSIGHT_SYSTEMS_VERSION=2020.4.3.7
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: HOME=/root
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: OPENUCX_VERSION=1.9.0
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: LD_LIBRARY_PATH=/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: NVIDIA_REQUIRE_CUDA=cuda>=9.0
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: NCCL_VERSION=2.8.4
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: NVIDIA_DRIVER_CAPABILITIES=compute,utility,video
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: NVIDIA_PYTORCH_VERSION=21.03
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: DALI_BUILD=2054952
lxc pytorch-test 20250110160537.497 TRACE    start - ../src/lxc/start.c:lxc_spawn:1841 - Allocated new network namespace id
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: PYTORCH_VERSION=1.9.0a0+df837d0
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: CUSOLVER_VERSION=11.1.0.135
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: PYTORCH_BUILD_NUMBER=0
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: TRT_VERSION=7.2.2.3+cuda11.1.0.024
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: DLPROF_VERSION=21.03
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: BASH_ENV=/etc/bash.bashrc
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: NPP_VERSION=11.3.2.139
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: OPENMPI_VERSION=4.0.5
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: CUDNN_VERSION=8.1.1.33
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: NVJPEG_VERSION=11.4.0.135
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: TRTOSS_VERSION=21.03
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: CUDA_DRIVER_VERSION=460.32.03
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: TORCH_CUDA_ARCH_LIST=5.2 6.0 6.1 7.0 7.5 8.0 8.6+PTX
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: CURAND_VERSION=10.2.3.135
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: COCOAPI_VERSION=2.0+nv0.4.0
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: MOFED_VERSION=5.1-2.3.7
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: NVM_DIR=/usr/local/nvm
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: PATH=/opt/conda/bin:/opt/cmake-3.14.6-Linux-x86_64/bin/:/usr/local/mpi/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/ucx/bin:/opt/tensorrt/bin
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: DALI_VERSION=0.31.0
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: NSIGHT_COMPUTE_VERSION=2020.3.1.3
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: PYTORCH_BUILD_VERSION=1.9.0a0+df837d0
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: _CUDA_COMPAT_PATH=/usr/local/cuda/compat
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: CUSPARSE_VERSION=11.4.0.135
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: NVIDIA_BUILD_ID=21060478
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: CUDA_CACHE_DISABLE=1
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: TERM=xterm
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: PYTHONIOENCODING=utf-8
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: JUPYTER_PORT=8888
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: NVIDIA_VISIBLE_DEVICES=all
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: LC_ALL=C.UTF-8
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: CUFFT_VERSION=10.4.0.135
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: ENV=/etc/shinit_v2
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: TENSORBOARD_PORT=6006
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: NVIDIA_VISIBLE_DEVICES=none
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: NVIDIA_DRIVER_CAPABILITIES=compute,utility
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: NVIDIA_REQUIRE_CUDA=
lxc pytorch-test 20250110160537.497 TRACE    conf - ../src/lxc/conf.c:lxc_set_environment:5231 - Set environment variable: NVIDIA_REQUIRE_DRIVER=
lxc pytorch-test 20250110160537.497 TRACE    sync - ../src/lxc/sync.c:lxc_sync_wait_parent:110 - Child waiting for parent with sequence post-configure
lxc pytorch-test 20250110160537.497 DEBUG    network - ../src/lxc/network.c:netdev_configure_server_phys:1250 - Instantiated phys "vethfda708be" with ifindex "99"
lxc pytorch-test 20250110160537.500 TRACE    network - ../src/lxc/network.c:create_transient_name:3542 - Created transient name physD5EH2j for network device
lxc pytorch-test 20250110160537.525 DEBUG    network - ../src/lxc/network.c:lxc_network_move_created_netdev_priv:3593 - Moved network device "vethfda708be" with ifindex 99 to network namespace of 1097319 and renamed to physD5EH2j
lxc pytorch-test 20250110160537.525 TRACE    sync - ../src/lxc/sync.c:lxc_sync_wake_child:122 - Parent waking child with sequence post-configure
lxc pytorch-test 20250110160537.525 DEBUG    storage - ../src/lxc/storage/storage.c:get_storage_by_name:209 - Detected rootfs type "dir"
lxc pytorch-test 20250110160537.525 TRACE    mount_utils - ../src/lxc/mount_utils.c:can_use_mount_api:582 - Kernel supports mount api
lxc pytorch-test 20250110160537.525 TRACE    mount_utils - ../src/lxc/mount_utils.c:can_use_bind_mounts:607 - Kernel supports bind mounts in the new mount api
lxc pytorch-test 20250110160537.525 TRACE    mount_utils - ../src/lxc/mount_utils.c:create_detached_idmapped_mount:286 - Idmapped mount "/var/lib/incus/storage-pools/local/containers/pytorch-test/rootfs" requested with user namespace fd 12
lxc pytorch-test 20250110160537.525 TRACE    conf - ../src/lxc/conf.c:lxc_rootfs_prepare_parent:458 - Created detached idmapped mount 24
lxc pytorch-test 20250110160537.525 TRACE    network - ../src/lxc/network.c:lxc_network_send_to_child:4105 - Sent network device name "physD5EH2j" to child
lxc pytorch-test 20250110160537.525 TRACE    sync - ../src/lxc/sync.c:lxc_sync_wait_child:116 - Parent waiting for child with sequence idmapped-mounts
lxc pytorch-test 20250110160537.525 TRACE    conf - ../src/lxc/conf.c:lxc_rootfs_prepare_child:3634 - Received detached idmapped mount 17
lxc pytorch-test 20250110160537.526 TRACE    conf - ../src/lxc/conf.c:turn_into_dependent_mounts:3455 - Turned all mount table entries into dependent mount
lxc pytorch-test 20250110160537.526 TRACE    mount_utils - ../src/lxc/mount_utils.c:can_use_mount_api:582 - Kernel supports mount api
lxc pytorch-test 20250110160537.526 TRACE    mount_utils - ../src/lxc/mount_utils.c:can_use_bind_mounts:607 - Kernel supports bind mounts in the new mount api
lxc pytorch-test 20250110160537.526 TRACE    mount_utils - ../src/lxc/mount_utils.c:move_detached_mount:328 - Attach detached mount 17 to filesystem at 19
lxc pytorch-test 20250110160537.526 TRACE    dir - ../src/lxc/storage/dir.c:dir_mount:197 - Mounted "/var/lib/incus/storage-pools/local/containers/pytorch-test/rootfs" onto "/opt/incus/lib/lxc/rootfs"
lxc pytorch-test 20250110160537.526 DEBUG    conf - ../src/lxc/conf.c:lxc_mount_rootfs:1240 - Mounted rootfs "/var/lib/incus/storage-pools/local/containers/pytorch-test/rootfs" onto "/opt/incus/lib/lxc/rootfs" with options "idmap=container"
lxc pytorch-test 20250110160537.526 TRACE    conf - ../src/lxc/conf.c:lxc_mount_rootfs:1248 - Container uses separate rootfs. Opened container's rootfs
lxc pytorch-test 20250110160537.526 INFO     conf - ../src/lxc/conf.c:setup_utsname:679 - Set hostname to "pytorch-test"
lxc pytorch-test 20250110160537.526 TRACE    network - ../src/lxc/network.c:lxc_network_recv_from_parent:4130 - Received network device name "physD5EH2j" from parent
lxc pytorch-test 20250110160537.538 TRACE    network - ../src/lxc/network.c:__netdev_configure_container_common:1320 - Renamed network device from "physD5EH2j" to "eth0"
lxc pytorch-test 20250110160537.538 DEBUG    network - ../src/lxc/network.c:setup_hw_addr:3866 - Mac address "00:16:3e:e1:41:9b" on "eth0" has been setup
lxc pytorch-test 20250110160537.538 DEBUG    network - ../src/lxc/network.c:lxc_network_setup_in_child_namespaces_common:4007 - Network device "eth0" has been setup
lxc pytorch-test 20250110160537.538 INFO     network - ../src/lxc/network.c:lxc_setup_network_in_child_namespaces:4064 - Finished setting up network devices with caller assigned names
lxc pytorch-test 20250110160537.538 INFO     conf - ../src/lxc/conf.c:mount_autodev:1023 - Preparing "/dev"
lxc pytorch-test 20250110160537.538 TRACE    mount_utils - ../src/lxc/mount_utils.c:__fs_prepare:177 - Finished initializing new tmpfs filesystem context 20
lxc pytorch-test 20250110160537.538 TRACE    mount_utils - ../src/lxc/mount_utils.c:fs_set_property:215 - Set "mode" to "0755" on filesystem context 20
lxc pytorch-test 20250110160537.539 TRACE    mount_utils - ../src/lxc/mount_utils.c:fs_set_property:215 - Set "size" to "500000" on filesystem context 20
lxc pytorch-test 20250110160537.539 TRACE    mount_utils - ../src/lxc/mount_utils.c:fs_attach:266 - Mounted 22 onto 21
lxc pytorch-test 20250110160537.539 INFO     conf - ../src/lxc/conf.c:mount_autodev:1084 - Prepared "/dev"
lxc pytorch-test 20250110160537.539 DEBUG    conf - ../src/lxc/conf.c:lxc_mount_auto_mounts:539 - Invalid argument - Tried to ensure procfs is unmounted
lxc pytorch-test 20250110160537.539 TRACE    conf - ../src/lxc/conf.c:lxc_mount_auto_mounts:546 - Created procfs mountpoint under 19
lxc pytorch-test 20250110160537.539 DEBUG    conf - ../src/lxc/conf.c:lxc_mount_auto_mounts:562 - Invalid argument - Tried to ensure sysfs is unmounted
lxc pytorch-test 20250110160537.539 TRACE    conf - ../src/lxc/conf.c:lxc_mount_auto_mounts:569 - Created sysfs mountpoint under 19
lxc pytorch-test 20250110160537.539 TRACE    conf - ../src/lxc/conf.c:lxc_mount_auto_mounts:623 - Mounted automount "proc" on "/opt/incus/lib/lxc/rootfs/proc" read-write with flags 14
lxc pytorch-test 20250110160537.539 TRACE    conf - ../src/lxc/conf.c:lxc_mount_auto_mounts:623 - Mounted automount "sysfs" on "/opt/incus/lib/lxc/rootfs/sys" read-write with flags 0
lxc pytorch-test 20250110160537.539 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2219 - Remounting "/dev/fuse" on "/opt/incus/lib/lxc/rootfs/dev/fuse" to respect bind or remount options
lxc pytorch-test 20250110160537.539 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2238 - Flags for "/dev/fuse" were 4098, required extra flags are 2
lxc pytorch-test 20250110160537.539 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "/dev/fuse" on "/opt/incus/lib/lxc/rootfs/dev/fuse" with filesystem type "none"
lxc pytorch-test 20250110160537.539 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2219 - Remounting "/dev/net/tun" on "/opt/incus/lib/lxc/rootfs/dev/net/tun" to respect bind or remount options
lxc pytorch-test 20250110160537.539 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2238 - Flags for "/dev/net/tun" were 4098, required extra flags are 2
lxc pytorch-test 20250110160537.539 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "/dev/net/tun" on "/opt/incus/lib/lxc/rootfs/dev/net/tun" with filesystem type "none"
lxc pytorch-test 20250110160537.539 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2219 - Remounting "/sys/firmware/efi/efivars" on "/opt/incus/lib/lxc/rootfs/sys/firmware/efi/efivars" to respect bind or remount options
lxc pytorch-test 20250110160537.539 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2238 - Flags for "/sys/firmware/efi/efivars" were 4110, required extra flags are 14
lxc pytorch-test 20250110160537.539 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "/sys/firmware/efi/efivars" on "/opt/incus/lib/lxc/rootfs/sys/firmware/efi/efivars" with filesystem type "none"
lxc pytorch-test 20250110160537.539 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2219 - Remounting "/sys/fs/fuse/connections" on "/opt/incus/lib/lxc/rootfs/sys/fs/fuse/connections" to respect bind or remount options
lxc pytorch-test 20250110160537.539 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2238 - Flags for "/sys/fs/fuse/connections" were 4110, required extra flags are 14
lxc pytorch-test 20250110160537.539 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "/sys/fs/fuse/connections" on "/opt/incus/lib/lxc/rootfs/sys/fs/fuse/connections" with filesystem type "none"
lxc pytorch-test 20250110160537.539 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2219 - Remounting "/sys/fs/pstore" on "/opt/incus/lib/lxc/rootfs/sys/fs/pstore" to respect bind or remount options
lxc pytorch-test 20250110160537.539 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2238 - Flags for "/sys/fs/pstore" were 4110, required extra flags are 14
lxc pytorch-test 20250110160537.539 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "/sys/fs/pstore" on "/opt/incus/lib/lxc/rootfs/sys/fs/pstore" with filesystem type "none"
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2219 - Remounting "/sys/kernel/config" on "/opt/incus/lib/lxc/rootfs/sys/kernel/config" to respect bind or remount options
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2238 - Flags for "/sys/kernel/config" were 4110, required extra flags are 14
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "/sys/kernel/config" on "/opt/incus/lib/lxc/rootfs/sys/kernel/config" with filesystem type "none"
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2219 - Remounting "/sys/kernel/debug" on "/opt/incus/lib/lxc/rootfs/sys/kernel/debug" to respect bind or remount options
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2238 - Flags for "/sys/kernel/debug" were 4110, required extra flags are 14
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "/sys/kernel/debug" on "/opt/incus/lib/lxc/rootfs/sys/kernel/debug" with filesystem type "none"
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2219 - Remounting "/sys/kernel/security" on "/opt/incus/lib/lxc/rootfs/sys/kernel/security" to respect bind or remount options
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2238 - Flags for "/sys/kernel/security" were 4110, required extra flags are 14
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "/sys/kernel/security" on "/opt/incus/lib/lxc/rootfs/sys/kernel/security" with filesystem type "none"
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2219 - Remounting "/sys/kernel/tracing" on "/opt/incus/lib/lxc/rootfs/sys/kernel/tracing" to respect bind or remount options
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2238 - Flags for "/sys/kernel/tracing" were 4110, required extra flags are 14
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "/sys/kernel/tracing" on "/opt/incus/lib/lxc/rootfs/sys/kernel/tracing" with filesystem type "none"
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2219 - Remounting "/dev/mqueue" on "/opt/incus/lib/lxc/rootfs/dev/mqueue" to respect bind or remount options
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2238 - Flags for "/dev/mqueue" were 4110, required extra flags are 14
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "/dev/mqueue" on "/opt/incus/lib/lxc/rootfs/dev/mqueue" with filesystem type "none"
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2219 - Remounting "/var/lib/incus/guestapi" on "/opt/incus/lib/lxc/rootfs/dev/incus" to respect bind or remount options
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2238 - Flags for "/var/lib/incus/guestapi" were 4096, required extra flags are 0
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2247 - Mountflags already were 4096, skipping remount
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "/var/lib/incus/guestapi" on "/opt/incus/lib/lxc/rootfs/dev/incus" with filesystem type "none"
lxc pytorch-test 20250110160537.540 TRACE    conf - ../src/lxc/conf.c:parse_vfs_attr:2090 - Raising nosuid
lxc pytorch-test 20250110160537.540 TRACE    conf - ../src/lxc/conf.c:parse_vfs_attr:2090 - Raising noexec
lxc pytorch-test 20250110160537.540 TRACE    conf - ../src/lxc/conf.c:parse_vfs_attr:2090 - Raising nodev
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "shm" on "/opt/incus/lib/lxc/rootfs/dev/shm" with filesystem type "tmpfs"
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "none" on "/opt/incus/lib/lxc/rootfs/run" with filesystem type "tmpfs"
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2219 - Remounting "/var/lib/incus/containers/pytorch-test/network/hosts" on "/opt/incus/lib/lxc/rootfs/etc/hosts" to respect bind or remount options
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2238 - Flags for "/var/lib/incus/containers/pytorch-test/network/hosts" were 4096, required extra flags are 0
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2247 - Mountflags already were 4096, skipping remount
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "/var/lib/incus/containers/pytorch-test/network/hosts" on "/opt/incus/lib/lxc/rootfs/etc/hosts" with filesystem type "none"
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2219 - Remounting "/var/lib/incus/containers/pytorch-test/network/hostname" on "/opt/incus/lib/lxc/rootfs/etc/hostname" to respect bind or remount options
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2238 - Flags for "/var/lib/incus/containers/pytorch-test/network/hostname" were 4096, required extra flags are 0
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2247 - Mountflags already were 4096, skipping remount
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "/var/lib/incus/containers/pytorch-test/network/hostname" on "/opt/incus/lib/lxc/rootfs/etc/hostname" with filesystem type "none"
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2219 - Remounting "/var/lib/incus/containers/pytorch-test/network/resolv.conf" on "/opt/incus/lib/lxc/rootfs/etc/resolv.conf" to respect bind or remount options
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2238 - Flags for "/var/lib/incus/containers/pytorch-test/network/resolv.conf" were 4096, required extra flags are 0
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2247 - Mountflags already were 4096, skipping remount
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "/var/lib/incus/containers/pytorch-test/network/resolv.conf" on "/opt/incus/lib/lxc/rootfs/etc/resolv.conf" with filesystem type "none"
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2219 - Remounting "/var/lib/incus/shmounts/pytorch-test" on "/opt/incus/lib/lxc/rootfs/dev/.incus-mounts" to respect bind or remount options
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2238 - Flags for "/var/lib/incus/shmounts/pytorch-test" were 4096, required extra flags are 0
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2247 - Mountflags already were 4096, skipping remount
lxc pytorch-test 20250110160537.540 DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "/var/lib/incus/shmounts/pytorch-test" on "/opt/incus/lib/lxc/rootfs/dev/.incus-mounts" with filesystem type "none"
lxc pytorch-test 20250110160537.540 TRACE    sync - ../src/lxc/sync.c:lxc_sync_wake_parent:104 - Child waking parent with sequence idmapped-mounts
lxc pytorch-test 20250110160537.540 TRACE    conf - ../src/lxc/conf.c:parse_vfs_attr:2090 - Raising nosuid
lxc pytorch-test 20250110160537.540 TRACE    conf - ../src/lxc/conf.c:parse_vfs_attr:2090 - Raising noexec
lxc pytorch-test 20250110160537.540 TRACE    conf - ../src/lxc/conf.c:parse_vfs_attr:2090 - Raising nodev
lxc pytorch-test 20250110160537.540 TRACE    conf - ../src/lxc/conf.c:lxc_idmapped_mounts_child:2903 - Finished setting up idmapped mounts
lxc pytorch-test 20250110160537.540 TRACE    conf - ../src/lxc/conf.c:lxc_idmapped_mounts_parent:3655 - Finished receiving idmapped mount file descriptors (-9 | -9) from child
lxc pytorch-test 20250110160537.540 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_mount:2254 - Read-write cgroup mounts requested
lxc pytorch-test 20250110160537.540 TRACE    sync - ../src/lxc/sync.c:lxc_sync_wait_child:116 - Parent waiting for child with sequence cgroup-limits
lxc pytorch-test 20250110160537.540 TRACE    mount_utils - ../src/lxc/mount_utils.c:__fs_prepare:177 - Finished initializing new cgroup2 filesystem context 22
lxc pytorch-test 20250110160537.540 TRACE    mount_utils - ../src/lxc/mount_utils.c:fs_attach:266 - Mounted 23 onto 21
lxc pytorch-test 20250110160537.540 DEBUG    cgfsng - ../src/lxc/cgroups/cgfsng.c:__cgroupfs_mount:2187 - Mounted cgroup filesystem cgroup2 onto 21((null))
lxc pytorch-test 20250110160537.540 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_mount:2355 - Force mounted cgroup filesystem in new cgroup namespace
lxc pytorch-test 20250110160537.540 INFO     utils - ../src/lxc/utils.c:run_script_argv:590 - Executing script "/opt/incus/share/lxcfs/lxc.mount.hook" for container "pytorch-test"
lxc pytorch-test 20250110160537.540 TRACE    utils - ../src/lxc/utils.c:run_script_argv:633 - Set environment variable: LXC_HOOK_TYPE=mount
lxc pytorch-test 20250110160537.540 TRACE    utils - ../src/lxc/utils.c:run_script_argv:638 - Set environment variable: LXC_HOOK_SECTION=lxc
lxc pytorch-test 20250110160537.566 INFO     utils - ../src/lxc/utils.c:run_script_argv:590 - Executing script "/opt/incus/share/lxc/hooks/nvidia" for container "pytorch-test"
lxc pytorch-test 20250110160537.566 TRACE    utils - ../src/lxc/utils.c:run_script_argv:633 - Set environment variable: LXC_HOOK_TYPE=mount
lxc pytorch-test 20250110160537.566 TRACE    utils - ../src/lxc/utils.c:run_script_argv:638 - Set environment variable: LXC_HOOK_SECTION=lxc
lxc pytorch-test 20250110160537.569 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /opt/incus/share/lxc/hooks/nvidia produced output: ERROR: Missing tool nvidia-container-cli, see https://github.com/NVIDIA/libnvidia-container

lxc pytorch-test 20250110160537.569 ERROR    utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 1
lxc pytorch-test 20250110160537.569 ERROR    conf - ../src/lxc/conf.c:lxc_setup:3940 - Failed to run mount hooks
lxc pytorch-test 20250110160537.569 ERROR    start - ../src/lxc/start.c:do_start:1273 - Failed to setup container "pytorch-test"
lxc pytorch-test 20250110160537.569 TRACE    sync - ../src/lxc/sync.c:lxc_sync_wake_parent:104 - Child waking parent with sequence error
lxc pytorch-test 20250110160537.569 ERROR    sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc pytorch-test 20250110160537.569 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_USER_NS=/proc/1097301/fd/18
lxc pytorch-test 20250110160537.569 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_MNT_NS=/proc/1097301/fd/19
lxc pytorch-test 20250110160537.569 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_PID_NS=/proc/1097301/fd/20
lxc pytorch-test 20250110160537.569 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_UTS_NS=/proc/1097301/fd/21
lxc pytorch-test 20250110160537.569 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_IPC_NS=/proc/1097301/fd/22
lxc pytorch-test 20250110160537.569 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_NET_NS=/proc/1097301/fd/4
lxc pytorch-test 20250110160537.569 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_CGROUP_NS=/proc/1097301/fd/23
lxc pytorch-test 20250110160537.574 WARN     network - ../src/lxc/network.c:lxc_delete_network_priv:3674 - Failed to rename interface with index 0 from "eth0" to its initial name "vethfda708be"
lxc pytorch-test 20250110160537.574 DEBUG    network - ../src/lxc/network.c:lxc_delete_network:4220 - Deleted network devices
lxc pytorch-test 20250110160537.574 TRACE    start - ../src/lxc/start.c:lxc_serve_state_socket_pair:545 - Sent container state "ABORTING" to 6
lxc pytorch-test 20250110160537.574 TRACE    start - ../src/lxc/start.c:lxc_serve_state_clients:484 - Set container state to ABORTING
lxc pytorch-test 20250110160537.574 TRACE    start - ../src/lxc/start.c:lxc_serve_state_clients:487 - No state clients registered
lxc pytorch-test 20250110160537.574 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:837 - Received container state "ABORTING" instead of "RUNNING"
lxc pytorch-test 20250110160537.574 ERROR    start - ../src/lxc/start.c:__lxc_start:2114 - Failed to spawn container "pytorch-test"
lxc pytorch-test 20250110160537.574 TRACE    start - ../src/lxc/start.c:lxc_serve_state_clients:484 - Set container state to ABORTING
lxc pytorch-test 20250110160537.574 TRACE    start - ../src/lxc/start.c:lxc_serve_state_clients:487 - No state clients registered
lxc pytorch-test 20250110160537.574 WARN     start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 17 for process 1097319
lxc pytorch-test 20250110160537.574 TRACE    start - ../src/lxc/start.c:lxc_serve_state_clients:484 - Set container state to STOPPING
lxc pytorch-test 20250110160537.574 TRACE    start - ../src/lxc/start.c:lxc_serve_state_clients:487 - No state clients registered
lxc pytorch-test 20250110160537.574 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_USER_NS=/proc/1097301/fd/18
lxc pytorch-test 20250110160537.574 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_MNT_NS=/proc/1097301/fd/19
lxc pytorch-test 20250110160537.574 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_PID_NS=/proc/1097301/fd/20
lxc pytorch-test 20250110160537.574 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_UTS_NS=/proc/1097301/fd/21
lxc pytorch-test 20250110160537.574 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_IPC_NS=/proc/1097301/fd/22
lxc pytorch-test 20250110160537.574 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_NET_NS=/proc/1097301/fd/4
lxc pytorch-test 20250110160537.574 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_CGROUP_NS=/proc/1097301/fd/23
lxc pytorch-test 20250110160537.574 INFO     utils - ../src/lxc/utils.c:run_script_argv:590 - Executing script "/opt/incus/bin/incusd callhook /var/lib/incus "default" "pytorch-test" stopns" for container "pytorch-test"
lxc pytorch-test 20250110160537.574 TRACE    utils - ../src/lxc/utils.c:run_script_argv:633 - Set environment variable: LXC_HOOK_TYPE=stop
lxc pytorch-test 20250110160537.574 TRACE    utils - ../src/lxc/utils.c:run_script_argv:638 - Set environment variable: LXC_HOOK_SECTION=lxc
lxc pytorch-test 20250110160537.656 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:cgroup_tree_remove:491 - Removed cgroup tree 10(lxc.payload.pytorch-test)
lxc pytorch-test 20250110160537.656 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:__cgroup_tree_create:726 - Reusing 10(lxc.pivot) cgroup
lxc pytorch-test 20250110160537.656 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:__cgroup_tree_create:741 - Opened cgroup lxc.pivot as 3
lxc pytorch-test 20250110160537.668 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_monitor_destroy:927 - Removed cgroup tree 10(lxc.monitor.pytorch-test)
lxc pytorch-test 20250110160537.668 TRACE    start - ../src/lxc/start.c:lxc_end:964 - Closed command socket
lxc pytorch-test 20250110160537.668 TRACE    start - ../src/lxc/start.c:lxc_end:975 - Set container state to "STOPPED"
lxc 20250110160537.668 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20250110160537.668 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"
lxc pytorch-test 20250110160537.668 INFO     utils - ../src/lxc/utils.c:run_script_argv:590 - Executing script "/opt/incus/share/lxcfs/lxc.reboot.hook" for container "pytorch-test"
lxc pytorch-test 20250110160537.668 TRACE    utils - ../src/lxc/utils.c:run_script_argv:633 - Set environment variable: LXC_HOOK_TYPE=post-stop
lxc pytorch-test 20250110160537.668 TRACE    utils - ../src/lxc/utils.c:run_script_argv:638 - Set environment variable: LXC_HOOK_SECTION=lxc
lxc pytorch-test 20250110160538.172 INFO     utils - ../src/lxc/utils.c:run_script_argv:590 - Executing script "/opt/incus/bin/incusd callhook /var/lib/incus "default" "pytorch-test" stop" for container "pytorch-test"
lxc pytorch-test 20250110160538.172 TRACE    utils - ../src/lxc/utils.c:run_script_argv:633 - Set environment variable: LXC_HOOK_TYPE=post-stop
lxc pytorch-test 20250110160538.172 TRACE    utils - ../src/lxc/utils.c:run_script_argv:638 - Set environment variable: LXC_HOOK_SECTION=lxc
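
With trace logging on, the one line that matters (the hook output naming the missing tool) is easy to miss. A minimal sketch of a filter that keeps only ERROR entries and captured hook output; the function name `filter_start_log` is made up for illustration, and the patterns are based only on the log format shown above:

```shell
# Hypothetical helper: read an LXC start log on stdin and keep only
# ERROR entries plus any output captured from hook scripts.
filter_start_log() {
    grep -E ' ERROR |produced output:'
}

# Usage, with the instance name from the log above:
#   incus info --show-log pytorch-test | filter_start_log
```

On the log above this should surface the "Missing tool nvidia-container-cli" line next to the "Failed to run mount hooks" error.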

I do have nvidia-container-tools (1.17.3+dfsg-0lambda0.22.04.1) installed on the host.

Doh! I was inspecting the wrong cluster member… problem solved. These containers now happily have access to the NVIDIA GPUs.
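
For anyone hitting the same thing in a cluster: the hook output names the binary `nvidia-container-cli`, and it has to be present on the member that actually starts the container, not just on whichever member you happen to be logged in to. A rough sketch of the check, to be run on that member; `check_tool` is a made-up helper name, and the `incus list` column letters in the comment are my assumption of the usual name/location columns:

```shell
# Hypothetical helper: report whether a binary is on PATH on this host.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# First find out which cluster member hosts the instance
# (assumed columns: n = name, L = location):
#   incus list pytorch-test -c nL
# Then, on that member:
check_tool nvidia-container-cli || true
```

If this prints "missing" on the member shown as the instance's location, that matches the mount hook failure in the log.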