Hello everyone,
I have a server running Ubuntu 20.04 Server with the PREEMPT-RT patch and an SR-IOV-capable 10G Intel network card with two physical interfaces (Intel Corporation Ethernet Controller 10G X550T). I would like to provision multiple containers and assign a virtual function (VF) to each. To do that, I created an SR-IOV network with the following command:
lxc network create sriov0 --type=sriov parent=eno2 mtu=1500
where eno2 is the name of the network interface corresponding to one of the physical functions of the network card.
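For anyone trying to reproduce this, it may help to first check how many VFs are enabled on eno2 and what their host-side interface names are. The following is only a sketch and not part of the setup described above: it assumes the standard sysfs SR-IOV layout (sriov_numvfs, sriov_totalvfs and virtfn<N> entries under the PF's device directory). The function name sriov_summary and its optional second argument (an alternative sysfs root, handy for dry runs) are made up for illustration.

```shell
#!/bin/sh
# Summarise the SR-IOV state of a physical function (PF) via sysfs.
#   $1: PF interface name (eno2 in this post)
#   $2: sysfs net root, defaults to /sys/class/net (hypothetical parameter, for dry runs)
sriov_summary() {
    pf="$1"
    root="${2:-/sys/class/net}"
    dev="$root/$pf/device"

    if [ ! -r "$dev/sriov_totalvfs" ]; then
        echo "$pf: no SR-IOV support exposed in sysfs" >&2
        return 1
    fi

    echo "$pf: $(cat "$dev/sriov_numvfs") of $(cat "$dev/sriov_totalvfs") VFs enabled"

    # Each enabled VF appears as virtfn<N>. A VF that is bound to a host
    # driver also exposes its current interface name under net/.
    for vf in "$dev"/virtfn*; do
        if [ ! -e "$vf" ]; then
            continue
        fi
        names=""
        for n in "$vf"/net/*; do
            if [ -e "$n" ]; then
                names="$names ${n##*/}"
            fi
        done
        echo "  ${vf##*/}:${names:- no host netdev}"
    done
}
```

Running `sriov_summary eno2` on the host before and after starting each container should show whether a free VF is available and which name it currently has on the host.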
Then, to attach containers to this network, I tried the following:
- Modify the configuration of an existing (stopped) container and add the following device:
eth1:
  name: eth1
  type: nic
  network: sriov0
or directly:
eth1:
  name: eth1
  type: nic
  nictype: sriov
  parent: eno2
This works for the first container, but usually when I try to spawn a second one with the same device configuration, I get an error like this one:
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart another-container /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/another-container/lxc.conf:
Try `lxc info --show-log another-container` for more info
giu@server: lxc info --show-log another-container
Name: another-container
Location: none
Remote: unix://
Architecture: x86_64
Created: 2020/12/22 15:37 UTC
Status: Stopped
Type: container
Profiles: default
Log:
lxc another-container 20201222153823.955 WARN cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1152 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset/system/lxc.monitor.another-container"
lxc another-container 20201222153823.967 WARN cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1152 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset/system/lxc.payload.another-container"
lxc another-container 20201222153824.200 WARN cgfsng - cgroups/cgfsng.c:fchowmodat:1573 - No such file or directory - Failed to fchownat(17, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc another-container 20201222153825.355 ERROR network - network.c:__instantiate_ns_common:882 - File exists - Failed to rename network device "vetheac53f63" to "eth0"
lxc another-container 20201222153825.355 ERROR network - network.c:lxc_setup_network_in_child_namespaces:3528 - File exists - Failed to setup netdev
lxc another-container 20201222153825.355 ERROR conf - conf.c:lxc_setup:3295 - Failed to setup network
lxc another-container 20201222153825.355 ERROR start - start.c:do_start:1218 - Failed to setup container "another-container"
lxc another-container 20201222153825.356 ERROR sync - sync.c:__sync_wait:36 - An error occurred in another process (expected sequence number 5)
lxc another-container 20201222153825.360 WARN network - network.c:lxc_delete_network_priv:3185 - Failed to rename interface with index 0 from "eth0" to its initial name "vetheac53f63"
lxc another-container 20201222153825.362 WARN network - network.c:lxc_delete_network_priv:3185 - Failed to rename interface with index 0 from "eth2" to its initial name "eth0"
lxc another-container 20201222153825.362 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:860 - Received container state "ABORTING" instead of "RUNNING"
lxc another-container 20201222153825.362 ERROR start - start.c:__lxc_start:1999 - Failed to spawn container "another-container"
lxc another-container 20201222153825.362 WARN start - start.c:lxc_abort:1013 - No such process - Failed to send SIGKILL via pidfd 30 for process 2081594
lxc 20201222153825.814 WARN commands - commands.c:lxc_cmd_rsp_recv:126 - Connection reset by peer - Failed to receive response for command "get_state"
Here is the output of lxc start with the debug flag enabled:
DBUG[12-22|16:01:22] Connecting to a local LXD over a Unix socket
DBUG[12-22|16:01:22] Sending request to LXD method=GET url=http://unix.socket/1.0 etag=
DBUG[12-22|16:01:22] Got response struct from LXD
DBUG[12-22|16:01:22]
{
"config": {
"core.https_address": "[::]:8443",
"core.trust_password": true
},
"api_extensions": [
"storage_zfs_remove_snapshots",
"container_host_shutdown_timeout",
"container_stop_priority",
"container_syscall_filtering",
"auth_pki",
"container_last_used_at",
"etag",
"patch",
"usb_devices",
"https_allowed_credentials",
"image_compression_algorithm",
"directory_manipulation",
"container_cpu_time",
"storage_zfs_use_refquota",
"storage_lvm_mount_options",
"network",
"profile_usedby",
"container_push",
"container_exec_recording",
"certificate_update",
"container_exec_signal_handling",
"gpu_devices",
"container_image_properties",
"migration_progress",
"id_map",
"network_firewall_filtering",
"network_routes",
"storage",
"file_delete",
"file_append",
"network_dhcp_expiry",
"storage_lvm_vg_rename",
"storage_lvm_thinpool_rename",
"network_vlan",
"image_create_aliases",
"container_stateless_copy",
"container_only_migration",
"storage_zfs_clone_copy",
"unix_device_rename",
"storage_lvm_use_thinpool",
"storage_rsync_bwlimit",
"network_vxlan_interface",
"storage_btrfs_mount_options",
"entity_description",
"image_force_refresh",
"storage_lvm_lv_resizing",
"id_map_base",
"file_symlinks",
"container_push_target",
"network_vlan_physical",
"storage_images_delete",
"container_edit_metadata",
"container_snapshot_stateful_migration",
"storage_driver_ceph",
"storage_ceph_user_name",
"resource_limits",
"storage_volatile_initial_source",
"storage_ceph_force_osd_reuse",
"storage_block_filesystem_btrfs",
"resources",
"kernel_limits",
"storage_api_volume_rename",
"macaroon_authentication",
"network_sriov",
"console",
"restrict_devlxd",
"migration_pre_copy",
"infiniband",
"maas_network",
"devlxd_events",
"proxy",
"network_dhcp_gateway",
"file_get_symlink",
"network_leases",
"unix_device_hotplug",
"storage_api_local_volume_handling",
"operation_description",
"clustering",
"event_lifecycle",
"storage_api_remote_volume_handling",
"nvidia_runtime",
"container_mount_propagation",
"container_backup",
"devlxd_images",
"container_local_cross_pool_handling",
"proxy_unix",
"proxy_udp",
"clustering_join",
"proxy_tcp_udp_multi_port_handling",
"network_state",
"proxy_unix_dac_properties",
"container_protection_delete",
"unix_priv_drop",
"pprof_http",
"proxy_haproxy_protocol",
"network_hwaddr",
"proxy_nat",
"network_nat_order",
"container_full",
"candid_authentication",
"backup_compression",
"candid_config",
"nvidia_runtime_config",
"storage_api_volume_snapshots",
"storage_unmapped",
"projects",
"candid_config_key",
"network_vxlan_ttl",
"container_incremental_copy",
"usb_optional_vendorid",
"snapshot_scheduling",
"container_copy_project",
"clustering_server_address",
"clustering_image_replication",
"container_protection_shift",
"snapshot_expiry",
"container_backup_override_pool",
"snapshot_expiry_creation",
"network_leases_location",
"resources_cpu_socket",
"resources_gpu",
"resources_numa",
"kernel_features",
"id_map_current",
"event_location",
"storage_api_remote_volume_snapshots",
"network_nat_address",
"container_nic_routes",
"rbac",
"cluster_internal_copy",
"seccomp_notify",
"lxc_features",
"container_nic_ipvlan",
"network_vlan_sriov",
"storage_cephfs",
"container_nic_ipfilter",
"resources_v2",
"container_exec_user_group_cwd",
"container_syscall_intercept",
"container_disk_shift",
"storage_shifted",
"resources_infiniband",
"daemon_storage",
"instances",
"image_types",
"resources_disk_sata",
"clustering_roles",
"images_expiry",
"resources_network_firmware",
"backup_compression_algorithm",
"ceph_data_pool_name",
"container_syscall_intercept_mount",
"compression_squashfs",
"container_raw_mount",
"container_nic_routed",
"container_syscall_intercept_mount_fuse",
"container_disk_ceph",
"virtual-machines",
"image_profiles",
"clustering_architecture",
"resources_disk_id",
"storage_lvm_stripes",
"vm_boot_priority",
"unix_hotplug_devices",
"api_filtering",
"instance_nic_network",
"clustering_sizing",
"firewall_driver",
"projects_limits",
"container_syscall_intercept_hugetlbfs",
"limits_hugepages",
"container_nic_routed_gateway",
"projects_restrictions",
"custom_volume_snapshot_expiry",
"volume_snapshot_scheduling",
"trust_ca_certificates",
"snapshot_disk_usage",
"clustering_edit_roles",
"container_nic_routed_host_address",
"container_nic_ipvlan_gateway",
"resources_usb_pci",
"resources_cpu_threads_numa",
"resources_cpu_core_die",
"api_os",
"container_nic_routed_host_table",
"container_nic_ipvlan_host_table",
"container_nic_ipvlan_mode",
"resources_system",
"images_push_relay",
"network_dns_search",
"container_nic_routed_limits",
"instance_nic_bridged_vlan",
"network_state_bond_bridge",
"usedby_consistency",
"custom_block_volumes",
"clustering_failure_domains",
"resources_gpu_mdev",
"console_vga_type",
"projects_limits_disk",
"network_type_macvlan",
"network_type_sriov",
"container_syscall_intercept_bpf_devices",
"network_type_ovn",
"projects_networks",
"projects_networks_restricted_uplinks",
"custom_volume_backup",
"backup_override_name",
"storage_rsync_compression",
"network_type_physical",
"network_ovn_external_subnets",
"network_ovn_nat",
"network_ovn_external_routes_remove",
"tpm_device_type",
"storage_zfs_clone_copy_rebase",
"gpu_mdev",
"resources_pci_iommu",
"resources_network_usb",
"resources_disk_address",
"network_physical_ovn_ingress_mode",
"network_ovn_dhcp",
"network_physical_routes_anycast",
"projects_limits_instances"
],
"api_status": "stable",
"api_version": "1.0",
"auth": "trusted",
"public": false,
"auth_methods": [
"tls"
],
"environment": {
"addresses": [
"192.168.1.1:8443",
"10.41.74.9:8443",
"10.0.3.1:8443",
"192.168.122.1:8443",
"192.168.100.128:8443",
"172.18.0.1:8443",
"172.17.0.1:8443",
"10.92.231.1:8443",
"[fd42:ab0c:92d6:8603::1]:8443",
"10.168.233.1:8443",
"[fd42:3467:d55c:6566::1]:8443",
"10.140.85.1:8443",
"[fd42:4350:257b:233d::1]:8443"
],
"architectures": [
"x86_64",
"i686"
],
"certificate": "-----BEGIN CERTIFICATE-----\nMIICBTCCAYqgAwIBAgIRAJx/SEicvTkyOxvr6KU49bYwCgYIKoZIzj0EAwMwNDEc\nMBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzEUMBIGA1UEAwwLcm9vdEB2aXJ0\ndGIwHhcNMjAwOTI4MTU0MzQxWhcNMzAwOTI2MTU0MzQxWjA0MRwwGgYDVQQKExNs\naW51eGNvbnRhaW5lcnMub3JnMRQwEgYDVQQDDAtyb290QHZpcnR0YjB2MBAGByqG\nSM49AgEGBSuBBAAiA2IABAfjbZY17bwD0qzpr3gMKuS4U/njUjFGAy2ZWYY4NK0p\n8a+55lNZ+jfR4b64/Y0xNbnr0ZoV9EmTCvaiAvVEEH3NMTRZl7YTQOCO1my6QHqq\n6vZDJb5tesY6K4PYo9btAqNgMF4wDgYDVR0PAQH/BAQDAgWgMBMGA1UdJQQMMAoG\nCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwKQYDVR0RBCIwIIIGdmlydHRihwR/AAAB\nhxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMDA2kAMGYCMQCN6Wl6fRiFt3kb\nuZjM3PdsdVh9y6b7GW14EfQaf5TxTSxawXjIJsLJPhVFZJxbw5sCMQDJFIXB7Aqg\naRcELjPipTY0g3x7aT0Zkp5HCK9bdx5LScVnz3gCl8KU1YxetmCdU1E=\n-----END CERTIFICATE-----\n",
"certificate_fingerprint": "b6b923261d635207816ade36695ecf9caad2a0e36bcd8b5fa8ec0f046dacff96",
"driver": "lxc | qemu",
"driver_version": "4.0.5 | 5.2.0",
"firewall": "xtables",
"kernel": "Linux",
"kernel_architecture": "x86_64",
"kernel_features": {
"netnsid_getifaddrs": "true",
"seccomp_listener": "true",
"seccomp_listener_continue": "true",
"shiftfs": "false",
"uevent_injection": "true",
"unpriv_fscaps": "true"
},
"kernel_version": "5.6.19-rt12opt",
"lxc_features": {
"cgroup2": "true",
"devpts_fd": "true",
"mount_injection_file": "true",
"network_gateway_device_route": "true",
"network_ipvlan": "true",
"network_l2proxy": "true",
"network_phys_macvlan_mtu": "true",
"network_veth_router": "true",
"pidfd": "true",
"seccomp_allow_deny_syntax": "true",
"seccomp_notify": "true",
"seccomp_proxy_send_notify_fd": "true"
},
"os_name": "Ubuntu",
"os_version": "20.04",
"project": "default",
"server": "lxd",
"server_clustered": false,
"server_name": "virttb01",
"server_pid": 1128851,
"server_version": "4.9",
"storage": "btrfs",
"storage_version": "4.15.1"
}
}
DBUG[12-22|16:01:22] Sending request to LXD method=GET url=http://unix.socket/1.0/instances/another-container etag=
DBUG[12-22|16:01:22] Got response struct from LXD
DBUG[12-22|16:01:22]
{
"architecture": "x86_64",
"config": {
"image.architecture": "amd64",
"image.description": "ubuntu 20.04 LTS amd64 (release) (20201210)",
"image.label": "release",
"image.os": "ubuntu",
"image.release": "focal",
"image.serial": "20201210",
"image.type": "squashfs",
"image.version": "20.04",
"volatile.base_image": "e0c3495ffd489748aa5151628fa56619e6143958f041223cb4970731ef939cb6",
"volatile.eth0.hwaddr": "00:16:3e:b5:4a:8a",
"volatile.eth1.hwaddr": "00:16:3e:7f:d9:10",
"volatile.eth1.name": "eth1",
"volatile.idmap.base": "0",
"volatile.idmap.current": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]",
"volatile.idmap.next": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]",
"volatile.last_state.idmap": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]",
"volatile.last_state.power": "STOPPED",
"volatile.sriov0.hwaddr": "00:16:3e:14:80:c6",
"volatile.sriov0.name": "eth2",
"volatile.uuid": "cc8d3ee7-d3c9-4cde-8aad-16a446b88987"
},
"devices": {
"eth1": {
"name": "eth1",
"network": "sriov0",
"type": "nic"
}
},
"ephemeral": false,
"profiles": [
"default"
],
"stateful": false,
"description": "",
"created_at": "2020-12-22T15:37:29.129694059Z",
"expanded_config": {
"image.architecture": "amd64",
"image.description": "ubuntu 20.04 LTS amd64 (release) (20201210)",
"image.label": "release",
"image.os": "ubuntu",
"image.release": "focal",
"image.serial": "20201210",
"image.type": "squashfs",
"image.version": "20.04",
"volatile.base_image": "e0c3495ffd489748aa5151628fa56619e6143958f041223cb4970731ef939cb6",
"volatile.eth0.hwaddr": "00:16:3e:b5:4a:8a",
"volatile.eth1.hwaddr": "00:16:3e:7f:d9:10",
"volatile.eth1.name": "eth1",
"volatile.idmap.base": "0",
"volatile.idmap.current": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]",
"volatile.idmap.next": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]",
"volatile.last_state.idmap": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]",
"volatile.last_state.power": "STOPPED",
"volatile.sriov0.hwaddr": "00:16:3e:14:80:c6",
"volatile.sriov0.name": "eth2",
"volatile.uuid": "cc8d3ee7-d3c9-4cde-8aad-16a446b88987"
},
"expanded_devices": {
"eth0": {
"name": "eth0",
"network": "lxdbr0",
"type": "nic"
},
"eth1": {
"name": "eth1",
"network": "sriov0",
"type": "nic"
},
"root": {
"path": "/",
"pool": "default",
"type": "disk"
}
},
"name": "another-container",
"status": "Stopped",
"status_code": 102,
"last_used_at": "2020-12-22T15:45:25.087660315Z",
"location": "none",
"type": "container"
}
DBUG[12-22|16:01:22] Connected to the websocket: ws://unix.socket/1.0/events
DBUG[12-22|16:01:22] Sending request to LXD method=PUT url=http://unix.socket/1.0/instances/another-container/state etag=
DBUG[12-22|16:01:22]
{
"action": "start",
"timeout": 0,
"force": false,
"stateful": false
}
DBUG[12-22|16:01:22] Got operation from LXD
DBUG[12-22|16:01:22]
{
"id": "49317f1e-3720-4168-b0a7-06a7a1a5e768",
"class": "task",
"description": "Starting container",
"created_at": "2020-12-22T16:01:22.089896976Z",
"updated_at": "2020-12-22T16:01:22.089896976Z",
"status": "Running",
"status_code": 103,
"resources": {
"containers": [
"/1.0/containers/another-container"
]
},
"metadata": null,
"may_cancel": false,
"err": "",
"location": "none"
}
DBUG[12-22|16:01:22] Sending request to LXD method=GET url=http://unix.socket/1.0/operations/49317f1e-3720-4168-b0a7-06a7a1a5e768 etag=
DBUG[12-22|16:01:22] Got response struct from LXD
DBUG[12-22|16:01:22]
{
"id": "49317f1e-3720-4168-b0a7-06a7a1a5e768",
"class": "task",
"description": "Starting container",
"created_at": "2020-12-22T16:01:22.089896976Z",
"updated_at": "2020-12-22T16:01:22.089896976Z",
"status": "Running",
"status_code": 103,
"resources": {
"containers": [
"/1.0/containers/another-container"
]
},
"metadata": null,
"may_cancel": false,
"err": "",
"location": "none"
}
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart another-container /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/another-container/lxc.conf:
Try `lxc info --show-log another-container` for more info
Sometimes I instead get a timeout error, apparently because the daemon tries to bind the container to a virtual function that does not exist (it looks like an interface naming issue).
- Attach the container to the network with:
lxc network attach sriov0 a-container eth1 eth1
This just fails with the error:
Error: Failed to start device "eth1": Parent device 'sriov0' doesn't exist
- Create the container with a profile like this one:
config: {}
description: ""
devices:
  eth1:
    name: eth1
    network: sriov0
    type: nic
  eth2:
    name: eth2
    network: br0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: sriov-profile
used_by: []
With this I get the same error as with the first approach (modifying the container's device configuration).
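Since the approaches above may fail for different reasons, it can be easier to compare them using only the ERROR entries of each container log. This is a minimal sketch, assuming the log line layout shown earlier ("lxc <name> <timestamp> LEVEL ..."). The helper name lxc_log_errors is hypothetical.

```shell
# Print only the ERROR entries of an LXC log read from stdin, dropping the
# "lxc <name> <timestamp> ERROR" prefix so that two runs can be diffed.
lxc_log_errors() {
    grep ' ERROR ' | sed -E 's/^lxc( [^ ]+)? [0-9.]+ ERROR +//'
}
```

For example: `lxc info --show-log another-container | lxc_log_errors`.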
Am I doing something wrong?
Let me know if you need any other logs or debug output.