Issues creating multiple containers with SR-IOV NICs

Hello everyone,

I have a server running Ubuntu 20.04 Server with the PREEMPT_RT patch and an SR-IOV-capable 10G Intel network card with two physical interfaces (Intel Corporation Ethernet Controller 10G X550T). I would like to provision multiple containers and assign a virtual function to each. To do that, I create an SR-IOV network with the following options:

lxc network create sriov0 --type=sriov parent=eno2 mtu=1500

where eno2 is the name of the network interface corresponding to one of the physical functions of the network card.
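
For reference, the resulting managed network can be inspected with:

lxc network show sriov0
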
Then, to attach containers to this network, I tried the following:

  1. Modify the configuration of an existing (stopped) container and add the following device:
eth1:
  name: eth1
  type: nic
  network: sriov0

or directly:

eth1:
  name: eth1
  type: nic
  nictype: sriov
  parent: eno2
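
The same device can also be added from the CLI, along these lines (c1 here is a placeholder container name; use network=sriov0 instead of nictype/parent for the managed-network variant):

lxc config device add c1 eth1 nic nictype=sriov parent=eno2 name=eth1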

This works for the first container, but usually when I try to spawn a second one with the same device configuration, I get an error like this one:

Error: Failed to run: /snap/lxd/current/bin/lxd forkstart another-container /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/another-container/lxc.conf: 
Try `lxc info --show-log another-container` for more info
giu@server: lxc info --show-log another-container
Name: another-container
Location: none
Remote: unix://
Architecture: x86_64
Created: 2020/12/22 15:37 UTC
Status: Stopped
Type: container
Profiles: default

Log:

lxc another-container 20201222153823.955 WARN     cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1152 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset/system/lxc.monitor.another-container"
lxc another-container 20201222153823.967 WARN     cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1152 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset/system/lxc.payload.another-container"
lxc another-container 20201222153824.200 WARN     cgfsng - cgroups/cgfsng.c:fchowmodat:1573 - No such file or directory - Failed to fchownat(17, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc another-container 20201222153825.355 ERROR    network - network.c:__instantiate_ns_common:882 - File exists - Failed to rename network device "vetheac53f63" to "eth0"
lxc another-container 20201222153825.355 ERROR    network - network.c:lxc_setup_network_in_child_namespaces:3528 - File exists - Failed to setup netdev
lxc another-container 20201222153825.355 ERROR    conf - conf.c:lxc_setup:3295 - Failed to setup network
lxc another-container 20201222153825.355 ERROR    start - start.c:do_start:1218 - Failed to setup container "another-container"
lxc another-container 20201222153825.356 ERROR    sync - sync.c:__sync_wait:36 - An error occurred in another process (expected sequence number 5)
lxc another-container 20201222153825.360 WARN     network - network.c:lxc_delete_network_priv:3185 - Failed to rename interface with index 0 from "eth0" to its initial name "vetheac53f63"
lxc another-container 20201222153825.362 WARN     network - network.c:lxc_delete_network_priv:3185 - Failed to rename interface with index 0 from "eth2" to its initial name "eth0"
lxc another-container 20201222153825.362 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:860 - Received container state "ABORTING" instead of "RUNNING"
lxc another-container 20201222153825.362 ERROR    start - start.c:__lxc_start:1999 - Failed to spawn container "another-container"
lxc another-container 20201222153825.362 WARN     start - start.c:lxc_abort:1013 - No such process - Failed to send SIGKILL via pidfd 30 for process 2081594
lxc 20201222153825.814 WARN     commands - commands.c:lxc_cmd_rsp_recv:126 - Connection reset by peer - Failed to receive response for command "get_state"

Here is the output of lxc start with the debug flag:

DBUG[12-22|16:01:22] Connecting to a local LXD over a Unix socket 
DBUG[12-22|16:01:22] Sending request to LXD                   method=GET url=http://unix.socket/1.0 etag=
DBUG[12-22|16:01:22] Got response struct from LXD 
DBUG[12-22|16:01:22] 
	{
		"config": {
			"core.https_address": "[::]:8443",
			"core.trust_password": true
		},
		"api_extensions": [
			"storage_zfs_remove_snapshots",
			"container_host_shutdown_timeout",
			"container_stop_priority",
			"container_syscall_filtering",
			"auth_pki",
			"container_last_used_at",
			"etag",
			"patch",
			"usb_devices",
			"https_allowed_credentials",
			"image_compression_algorithm",
			"directory_manipulation",
			"container_cpu_time",
			"storage_zfs_use_refquota",
			"storage_lvm_mount_options",
			"network",
			"profile_usedby",
			"container_push",
			"container_exec_recording",
			"certificate_update",
			"container_exec_signal_handling",
			"gpu_devices",
			"container_image_properties",
			"migration_progress",
			"id_map",
			"network_firewall_filtering",
			"network_routes",
			"storage",
			"file_delete",
			"file_append",
			"network_dhcp_expiry",
			"storage_lvm_vg_rename",
			"storage_lvm_thinpool_rename",
			"network_vlan",
			"image_create_aliases",
			"container_stateless_copy",
			"container_only_migration",
			"storage_zfs_clone_copy",
			"unix_device_rename",
			"storage_lvm_use_thinpool",
			"storage_rsync_bwlimit",
			"network_vxlan_interface",
			"storage_btrfs_mount_options",
			"entity_description",
			"image_force_refresh",
			"storage_lvm_lv_resizing",
			"id_map_base",
			"file_symlinks",
			"container_push_target",
			"network_vlan_physical",
			"storage_images_delete",
			"container_edit_metadata",
			"container_snapshot_stateful_migration",
			"storage_driver_ceph",
			"storage_ceph_user_name",
			"resource_limits",
			"storage_volatile_initial_source",
			"storage_ceph_force_osd_reuse",
			"storage_block_filesystem_btrfs",
			"resources",
			"kernel_limits",
			"storage_api_volume_rename",
			"macaroon_authentication",
			"network_sriov",
			"console",
			"restrict_devlxd",
			"migration_pre_copy",
			"infiniband",
			"maas_network",
			"devlxd_events",
			"proxy",
			"network_dhcp_gateway",
			"file_get_symlink",
			"network_leases",
			"unix_device_hotplug",
			"storage_api_local_volume_handling",
			"operation_description",
			"clustering",
			"event_lifecycle",
			"storage_api_remote_volume_handling",
			"nvidia_runtime",
			"container_mount_propagation",
			"container_backup",
			"devlxd_images",
			"container_local_cross_pool_handling",
			"proxy_unix",
			"proxy_udp",
			"clustering_join",
			"proxy_tcp_udp_multi_port_handling",
			"network_state",
			"proxy_unix_dac_properties",
			"container_protection_delete",
			"unix_priv_drop",
			"pprof_http",
			"proxy_haproxy_protocol",
			"network_hwaddr",
			"proxy_nat",
			"network_nat_order",
			"container_full",
			"candid_authentication",
			"backup_compression",
			"candid_config",
			"nvidia_runtime_config",
			"storage_api_volume_snapshots",
			"storage_unmapped",
			"projects",
			"candid_config_key",
			"network_vxlan_ttl",
			"container_incremental_copy",
			"usb_optional_vendorid",
			"snapshot_scheduling",
			"container_copy_project",
			"clustering_server_address",
			"clustering_image_replication",
			"container_protection_shift",
			"snapshot_expiry",
			"container_backup_override_pool",
			"snapshot_expiry_creation",
			"network_leases_location",
			"resources_cpu_socket",
			"resources_gpu",
			"resources_numa",
			"kernel_features",
			"id_map_current",
			"event_location",
			"storage_api_remote_volume_snapshots",
			"network_nat_address",
			"container_nic_routes",
			"rbac",
			"cluster_internal_copy",
			"seccomp_notify",
			"lxc_features",
			"container_nic_ipvlan",
			"network_vlan_sriov",
			"storage_cephfs",
			"container_nic_ipfilter",
			"resources_v2",
			"container_exec_user_group_cwd",
			"container_syscall_intercept",
			"container_disk_shift",
			"storage_shifted",
			"resources_infiniband",
			"daemon_storage",
			"instances",
			"image_types",
			"resources_disk_sata",
			"clustering_roles",
			"images_expiry",
			"resources_network_firmware",
			"backup_compression_algorithm",
			"ceph_data_pool_name",
			"container_syscall_intercept_mount",
			"compression_squashfs",
			"container_raw_mount",
			"container_nic_routed",
			"container_syscall_intercept_mount_fuse",
			"container_disk_ceph",
			"virtual-machines",
			"image_profiles",
			"clustering_architecture",
			"resources_disk_id",
			"storage_lvm_stripes",
			"vm_boot_priority",
			"unix_hotplug_devices",
			"api_filtering",
			"instance_nic_network",
			"clustering_sizing",
			"firewall_driver",
			"projects_limits",
			"container_syscall_intercept_hugetlbfs",
			"limits_hugepages",
			"container_nic_routed_gateway",
			"projects_restrictions",
			"custom_volume_snapshot_expiry",
			"volume_snapshot_scheduling",
			"trust_ca_certificates",
			"snapshot_disk_usage",
			"clustering_edit_roles",
			"container_nic_routed_host_address",
			"container_nic_ipvlan_gateway",
			"resources_usb_pci",
			"resources_cpu_threads_numa",
			"resources_cpu_core_die",
			"api_os",
			"container_nic_routed_host_table",
			"container_nic_ipvlan_host_table",
			"container_nic_ipvlan_mode",
			"resources_system",
			"images_push_relay",
			"network_dns_search",
			"container_nic_routed_limits",
			"instance_nic_bridged_vlan",
			"network_state_bond_bridge",
			"usedby_consistency",
			"custom_block_volumes",
			"clustering_failure_domains",
			"resources_gpu_mdev",
			"console_vga_type",
			"projects_limits_disk",
			"network_type_macvlan",
			"network_type_sriov",
			"container_syscall_intercept_bpf_devices",
			"network_type_ovn",
			"projects_networks",
			"projects_networks_restricted_uplinks",
			"custom_volume_backup",
			"backup_override_name",
			"storage_rsync_compression",
			"network_type_physical",
			"network_ovn_external_subnets",
			"network_ovn_nat",
			"network_ovn_external_routes_remove",
			"tpm_device_type",
			"storage_zfs_clone_copy_rebase",
			"gpu_mdev",
			"resources_pci_iommu",
			"resources_network_usb",
			"resources_disk_address",
			"network_physical_ovn_ingress_mode",
			"network_ovn_dhcp",
			"network_physical_routes_anycast",
			"projects_limits_instances"
		],
		"api_status": "stable",
		"api_version": "1.0",
		"auth": "trusted",
		"public": false,
		"auth_methods": [
			"tls"
		],
		"environment": {
			"addresses": [
				"192.168.1.1:8443",
				"10.41.74.9:8443",
				"10.0.3.1:8443",
				"192.168.122.1:8443",
				"192.168.100.128:8443",
				"172.18.0.1:8443",
				"172.17.0.1:8443",
				"10.92.231.1:8443",
				"[fd42:ab0c:92d6:8603::1]:8443",
				"10.168.233.1:8443",
				"[fd42:3467:d55c:6566::1]:8443",
				"10.140.85.1:8443",
				"[fd42:4350:257b:233d::1]:8443"
			],
			"architectures": [
				"x86_64",
				"i686"
			],
			"certificate": "-----BEGIN CERTIFICATE-----\nMIICBTCCAYqgAwIBAgIRAJx/SEicvTkyOxvr6KU49bYwCgYIKoZIzj0EAwMwNDEc\nMBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzEUMBIGA1UEAwwLcm9vdEB2aXJ0\ndGIwHhcNMjAwOTI4MTU0MzQxWhcNMzAwOTI2MTU0MzQxWjA0MRwwGgYDVQQKExNs\naW51eGNvbnRhaW5lcnMub3JnMRQwEgYDVQQDDAtyb290QHZpcnR0YjB2MBAGByqG\nSM49AgEGBSuBBAAiA2IABAfjbZY17bwD0qzpr3gMKuS4U/njUjFGAy2ZWYY4NK0p\n8a+55lNZ+jfR4b64/Y0xNbnr0ZoV9EmTCvaiAvVEEH3NMTRZl7YTQOCO1my6QHqq\n6vZDJb5tesY6K4PYo9btAqNgMF4wDgYDVR0PAQH/BAQDAgWgMBMGA1UdJQQMMAoG\nCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwKQYDVR0RBCIwIIIGdmlydHRihwR/AAAB\nhxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMDA2kAMGYCMQCN6Wl6fRiFt3kb\nuZjM3PdsdVh9y6b7GW14EfQaf5TxTSxawXjIJsLJPhVFZJxbw5sCMQDJFIXB7Aqg\naRcELjPipTY0g3x7aT0Zkp5HCK9bdx5LScVnz3gCl8KU1YxetmCdU1E=\n-----END CERTIFICATE-----\n",
			"certificate_fingerprint": "b6b923261d635207816ade36695ecf9caad2a0e36bcd8b5fa8ec0f046dacff96",
			"driver": "lxc | qemu",
			"driver_version": "4.0.5 | 5.2.0",
			"firewall": "xtables",
			"kernel": "Linux",
			"kernel_architecture": "x86_64",
			"kernel_features": {
				"netnsid_getifaddrs": "true",
				"seccomp_listener": "true",
				"seccomp_listener_continue": "true",
				"shiftfs": "false",
				"uevent_injection": "true",
				"unpriv_fscaps": "true"
			},
			"kernel_version": "5.6.19-rt12opt",
			"lxc_features": {
				"cgroup2": "true",
				"devpts_fd": "true",
				"mount_injection_file": "true",
				"network_gateway_device_route": "true",
				"network_ipvlan": "true",
				"network_l2proxy": "true",
				"network_phys_macvlan_mtu": "true",
				"network_veth_router": "true",
				"pidfd": "true",
				"seccomp_allow_deny_syntax": "true",
				"seccomp_notify": "true",
				"seccomp_proxy_send_notify_fd": "true"
			},
			"os_name": "Ubuntu",
			"os_version": "20.04",
			"project": "default",
			"server": "lxd",
			"server_clustered": false,
			"server_name": "virttb01",
			"server_pid": 1128851,
			"server_version": "4.9",
			"storage": "btrfs",
			"storage_version": "4.15.1"
		}
	} 
DBUG[12-22|16:01:22] Sending request to LXD                   method=GET url=http://unix.socket/1.0/instances/another-container etag=
DBUG[12-22|16:01:22] Got response struct from LXD 
DBUG[12-22|16:01:22] 
	{
		"architecture": "x86_64",
		"config": {
			"image.architecture": "amd64",
			"image.description": "ubuntu 20.04 LTS amd64 (release) (20201210)",
			"image.label": "release",
			"image.os": "ubuntu",
			"image.release": "focal",
			"image.serial": "20201210",
			"image.type": "squashfs",
			"image.version": "20.04",
			"volatile.base_image": "e0c3495ffd489748aa5151628fa56619e6143958f041223cb4970731ef939cb6",
			"volatile.eth0.hwaddr": "00:16:3e:b5:4a:8a",
			"volatile.eth1.hwaddr": "00:16:3e:7f:d9:10",
			"volatile.eth1.name": "eth1",
			"volatile.idmap.base": "0",
			"volatile.idmap.current": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]",
			"volatile.idmap.next": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]",
			"volatile.last_state.idmap": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]",
			"volatile.last_state.power": "STOPPED",
			"volatile.sriov0.hwaddr": "00:16:3e:14:80:c6",
			"volatile.sriov0.name": "eth2",
			"volatile.uuid": "cc8d3ee7-d3c9-4cde-8aad-16a446b88987"
		},
		"devices": {
			"eth1": {
				"name": "eth1",
				"network": "sriov0",
				"type": "nic"
			}
		},
		"ephemeral": false,
		"profiles": [
			"default"
		],
		"stateful": false,
		"description": "",
		"created_at": "2020-12-22T15:37:29.129694059Z",
		"expanded_config": {
			"image.architecture": "amd64",
			"image.description": "ubuntu 20.04 LTS amd64 (release) (20201210)",
			"image.label": "release",
			"image.os": "ubuntu",
			"image.release": "focal",
			"image.serial": "20201210",
			"image.type": "squashfs",
			"image.version": "20.04",
			"volatile.base_image": "e0c3495ffd489748aa5151628fa56619e6143958f041223cb4970731ef939cb6",
			"volatile.eth0.hwaddr": "00:16:3e:b5:4a:8a",
			"volatile.eth1.hwaddr": "00:16:3e:7f:d9:10",
			"volatile.eth1.name": "eth1",
			"volatile.idmap.base": "0",
			"volatile.idmap.current": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]",
			"volatile.idmap.next": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]",
			"volatile.last_state.idmap": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]",
			"volatile.last_state.power": "STOPPED",
			"volatile.sriov0.hwaddr": "00:16:3e:14:80:c6",
			"volatile.sriov0.name": "eth2",
			"volatile.uuid": "cc8d3ee7-d3c9-4cde-8aad-16a446b88987"
		},
		"expanded_devices": {
			"eth0": {
				"name": "eth0",
				"network": "lxdbr0",
				"type": "nic"
			},
			"eth1": {
				"name": "eth1",
				"network": "sriov0",
				"type": "nic"
			},
			"root": {
				"path": "/",
				"pool": "default",
				"type": "disk"
			}
		},
		"name": "another-container",
		"status": "Stopped",
		"status_code": 102,
		"last_used_at": "2020-12-22T15:45:25.087660315Z",
		"location": "none",
		"type": "container"
	} 
DBUG[12-22|16:01:22] Connected to the websocket: ws://unix.socket/1.0/events 
DBUG[12-22|16:01:22] Sending request to LXD                   method=PUT url=http://unix.socket/1.0/instances/another-container/state etag=
DBUG[12-22|16:01:22] 
	{
		"action": "start",
		"timeout": 0,
		"force": false,
		"stateful": false
	} 
DBUG[12-22|16:01:22] Got operation from LXD 
DBUG[12-22|16:01:22] 
	{
		"id": "49317f1e-3720-4168-b0a7-06a7a1a5e768",
		"class": "task",
		"description": "Starting container",
		"created_at": "2020-12-22T16:01:22.089896976Z",
		"updated_at": "2020-12-22T16:01:22.089896976Z",
		"status": "Running",
		"status_code": 103,
		"resources": {
			"containers": [
				"/1.0/containers/another-container"
			]
		},
		"metadata": null,
		"may_cancel": false,
		"err": "",
		"location": "none"
	} 
DBUG[12-22|16:01:22] Sending request to LXD                   method=GET url=http://unix.socket/1.0/operations/49317f1e-3720-4168-b0a7-06a7a1a5e768 etag=
DBUG[12-22|16:01:22] Got response struct from LXD 
DBUG[12-22|16:01:22] 
	{
		"id": "49317f1e-3720-4168-b0a7-06a7a1a5e768",
		"class": "task",
		"description": "Starting container",
		"created_at": "2020-12-22T16:01:22.089896976Z",
		"updated_at": "2020-12-22T16:01:22.089896976Z",
		"status": "Running",
		"status_code": 103,
		"resources": {
			"containers": [
				"/1.0/containers/another-container"
			]
		},
		"metadata": null,
		"may_cancel": false,
		"err": "",
		"location": "none"
	} 
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart another-container /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/another-container/lxc.conf: 
Try `lxc info --show-log another-container` for more info

Sometimes I instead get a time-out error caused by the daemon trying to bind the container to a virtual function that does not exist (which looks like an issue with naming conventions).

  2. Attach the container to the network with:
lxc network attach sriov0 a-container eth1 eth1

This just fails with the error:

Error: Failed to start device "eth1": Parent device 'sriov0' doesn't exist

  3. Create the container with a profile like this one:
config: {}
description: ""
devices:
  eth1:
    name: eth1
    network: sriov0
    type: nic
  eth2:
    name: eth2
    network: br0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: sriov-profile
used_by: []

With this I get the same error as in 1).

Am I doing something wrong?

Let me know if you need me to provide other logs/debug prints.

The SR-IOV managed network type is still pretty new and I haven’t had time to play with it myself, but @tomp may be able to help should he pop in before we get back to work in January.

Alternatively, you may want to try the older way of doing SR-IOV and see if that behaves better:

type: nic
nictype: sriov
parent: eno2
mtu: 1500
name: eth1

Thanks for the prompt reply. Unfortunately I have tried both, and that doesn’t work either: same error as in 1).

OK thanks, so it’s not specific to the sriov managed network but a more general issue with sriov NICs.

Please can you enable debug logging and then show the output when you start both the first and second containers:

sudo snap set lxd daemon.debug=true; sudo systemctl reload snap.lxd.daemon
sudo tail -f /var/snap/lxd/common/lxd/logs/lxd.log

Thanks

Also, can you show the output of lxc config show <instance> --expanded for the case where you’ve used an sriov NIC directly? Thanks

Alright.

Let’s start with the second case, i.e. directly using an sriov NIC. I created (without starting) two containers, c1 and c2, with the following configuration (output of lxc config show <instance> --expanded):

architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 20.04 LTS amd64 (release) (20201210)
  image.label: release
  image.os: ubuntu
  image.release: focal
  image.serial: "20201210"
  image.type: squashfs
  image.version: "20.04"
  volatile.apply_template: create
  volatile.base_image: e0c3495ffd489748aa5151628fa56619e6143958f041223cb4970731ef939cb6
  volatile.eth0.hwaddr: 00:16:3e:c9:eb:99
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[]'
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  eth1:
    mtu: "1500"
    name: eth1
    nictype: sriov
    parent: eno2
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

Here is the log after calling lxc start c1:

t=2020-12-23T09:28:15+0000 lvl=dbug msg="\n\t{\n\t\t\"action\": \"start\",\n\t\t\"timeout\": 0,\n\t\t\"force\": false,\n\t\t\"stateful\": fal
se\n\t}" 
t=2020-12-23T09:28:15+0000 lvl=dbug msg="New task Operation: abad5dc5-ec1b-4cbb-a544-665d6cf916ec" 
t=2020-12-23T09:28:15+0000 lvl=dbug msg="Started task operation: abad5dc5-ec1b-4cbb-a544-665d6cf916ec" 
t=2020-12-23T09:28:15+0000 lvl=dbug msg="\n\t{\n\t\t\"type\": \"async\",\n\t\t\"status\": \"Operation created\",\n\t\t\"status_code\": 100,\n
\t\t\"operation\": \"/1.0/operations/abad5dc5-ec1b-4cbb-a544-665d6cf916ec\",\n\t\t\"error_code\": 0,\n\t\t\"error\": \"\",\n\t\t\"metadata\":
 {\n\t\t\t\"id\": \"abad5dc5-ec1b-4cbb-a544-665d6cf916ec\",\n\t\t\t\"class\": \"task\",\n\t\t\t\"description\": \"Starting container\",\n\t\t
\t\"created_at\": \"2020-12-23T09:28:15.21422414Z\",\n\t\t\t\"updated_at\": \"2020-12-23T09:28:15.21422414Z\",\n\t\t\t\"status\": \"Running\"
,\n\t\t\t\"status_code\": 103,\n\t\t\t\"resources\": {\n\t\t\t\t\"containers\": [\n\t\t\t\t\t\"/1.0/containers/c1\"\n\t\t\t\t]\n\t\t\t},\n\t\
t\t\"metadata\": null,\n\t\t\t\"may_cancel\": false,\n\t\t\t\"err\": \"\",\n\t\t\t\"location\": \"none\"\n\t\t}\n\t}" 
t=2020-12-23T09:28:15+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0/operations/abad5dc5-ec1b-4cbb-a544-665d6cf916ec username=mfpp
t=2020-12-23T09:28:15+0000 lvl=dbug msg="MountInstance started" driver=btrfs instance=c1 pool=default project=default
t=2020-12-23T09:28:15+0000 lvl=dbug msg="MountInstance finished" driver=btrfs instance=c1 pool=default project=default
t=2020-12-23T09:28:15+0000 lvl=dbug msg="Container idmap changed, remapping" instance=c1 instanceType=container project=default
t=2020-12-23T09:28:15+0000 lvl=dbug msg="Updated metadata for task Operation: abad5dc5-ec1b-4cbb-a544-665d6cf916ec" 
t=2020-12-23T09:28:17+0000 lvl=dbug msg="Updated metadata for task Operation: abad5dc5-ec1b-4cbb-a544-665d6cf916ec" 
t=2020-12-23T09:28:17+0000 lvl=dbug msg="Starting device" device=eth0 instance=c1 instanceType=container project=default type=nic
t=2020-12-23T09:28:17+0000 lvl=dbug msg="Scheduler: network: veth9932adbf has been added: updating network priorities" 
t=2020-12-23T09:28:17+0000 lvl=dbug msg="Scheduler: network: veth2a852da7 has been added: updating network priorities" 
t=2020-12-23T09:28:17+0000 lvl=dbug msg="Starting device" device=eth1 instance=c1 instanceType=container project=default type=nic
t=2020-12-23T09:28:18+0000 lvl=dbug msg="Scheduler: network: eth1 has been added: updating network priorities" 
t=2020-12-23T09:28:18+0000 lvl=dbug msg="Starting device" device=root instance=c1 instanceType=container project=default type=disk
t=2020-12-23T09:28:18+0000 lvl=dbug msg="UpdateInstanceBackupFile started" driver=btrfs instance=c1 pool=default project=default
t=2020-12-23T09:28:18+0000 lvl=dbug msg="UpdateInstanceBackupFile finished" driver=btrfs instance=c1 pool=default project=default
t=2020-12-23T09:28:18+0000 lvl=info msg="Starting container" action=start created=2020-12-23T09:19:39+0000 ephemeral=false instance=c1 instan
ceType=container project=default stateful=false used=1970-01-01T00:00:00+0000
t=2020-12-23T09:28:18+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url="/internal/containers/c1/onstart?project=default" username=root
t=2020-12-23T09:28:19+0000 lvl=dbug msg="Scheduler: container c1 started: re-balancing" 
t=2020-12-23T09:28:20+0000 lvl=info msg="Started container" action=start created=2020-12-23T09:19:39+0000 ephemeral=false instance=c1 instanc
eType=container project=default stateful=false used=1970-01-01T00:00:00+0000
t=2020-12-23T09:28:20+0000 lvl=dbug msg="Success for task operation: abad5dc5-ec1b-4cbb-a544-665d6cf916ec" 
t=2020-12-23T09:28:20+0000 lvl=dbug msg="Event listener finished: 5f06786c-56fb-427e-a40f-8a856274f460" 
t=2020-12-23T09:28:20+0000 lvl=dbug msg="Disconnected event listener: 5f06786c-56fb-427e-a40f-8a856274f460"

And here is the log after starting c2:

t=2020-12-23T09:31:29+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=mfpp                                        
t=2020-12-23T09:31:29+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0/instances/c2 username=mfpp                           
t=2020-12-23T09:31:29+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0/events username=mfpp                                 
t=2020-12-23T09:31:29+0000 lvl=dbug msg="New event listener: 2e5053b9-2934-479f-9f42-321e1df01bd8"                                           
t=2020-12-23T09:31:29+0000 lvl=dbug msg=Handling ip=@ method=PUT protocol=unix url=/1.0/instances/c2/state username=mfpp                     
t=2020-12-23T09:31:29+0000 lvl=dbug msg="\n\t{\n\t\t\"action\": \"start\",\n\t\t\"timeout\": 0,\n\t\t\"force\": false,\n\t\t\"stateful\": fal
se\n\t}"                                                                                                                                     
t=2020-12-23T09:31:29+0000 lvl=dbug msg="New task Operation: bb462b74-3b3a-40eb-89ce-ecdaf54191c0"                                           
t=2020-12-23T09:31:29+0000 lvl=dbug msg="Started task operation: bb462b74-3b3a-40eb-89ce-ecdaf54191c0"                                       
t=2020-12-23T09:31:29+0000 lvl=dbug msg="\n\t{\n\t\t\"type\": \"async\",\n\t\t\"status\": \"Operation created\",\n\t\t\"status_code\": 100,\n
\t\t\"operation\": \"/1.0/operations/bb462b74-3b3a-40eb-89ce-ecdaf54191c0\",\n\t\t\"error_code\": 0,\n\t\t\"error\": \"\",\n\t\t\"metadata\":
 {\n\t\t\t\"id\": \"bb462b74-3b3a-40eb-89ce-ecdaf54191c0\",\n\t\t\t\"class\": \"task\",\n\t\t\t\"description\": \"Starting container\",\n\t\t
\t\"created_at\": \"2020-12-23T09:31:29.323045233Z\",\n\t\t\t\"updated_at\": \"2020-12-23T09:31:29.323045233Z\",\n\t\t\t\"status\": \"Running
\",\n\t\t\t\"status_code\": 103,\n\t\t\t\"resources\": {\n\t\t\t\t\"containers\": [\n\t\t\t\t\t\"/1.0/containers/c2\"\n\t\t\t\t]\n\t\t\t},\n\
t\t\t\"metadata\": null,\n\t\t\t\"may_cancel\": false,\n\t\t\t\"err\": \"\",\n\t\t\t\"location\": \"none\"\n\t\t}\n\t}"                      
t=2020-12-23T09:31:29+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0/operations/bb462b74-3b3a-40eb-89ce-ecdaf54191c0 username=mfpp
t=2020-12-23T09:31:33+0000 lvl=dbug msg="UpdateInstanceBackupFile started" driver=btrfs instance=c2 pool=default project=default
t=2020-12-23T09:31:33+0000 lvl=dbug msg="UpdateInstanceBackupFile finished" driver=btrfs instance=c2 pool=default project=default            
t=2020-12-23T09:31:33+0000 lvl=info msg="Starting container" action=start created=2020-12-23T09:20:17+0000 ephemeral=false instance=c2 instanceType=container project=default stateful=false used=1970-01-01T00:00:00+0000                                                    
t=2020-12-23T09:31:33+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url="/internal/containers/c2/onstart?project=default" username=root                                                                                                                                        
t=2020-12-23T09:31:33+0000 lvl=dbug msg="Scheduler: container c2 started: re-balancing" 
t=2020-12-23T09:31:33+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url="/internal/containers/c2/onstopns?netns=%2Fproc%2F2243156%2Ffd%2F4&project=default&target=stop" username=root                                                                                          
t=2020-12-23T09:31:33+0000 lvl=dbug msg="Stopping device" device=eth1 instance=c2 instanceType=container project=default type=nic            
t=2020-12-23T09:31:33+0000 lvl=eror msg="Failed to stop device" devName=eth1 err="Failed to detach interface: \"eth1\" to \"eth0\": Failed to run: /snap/lxd/current/bin/lxd forknet detach -- /proc/2243156/fd/4 2239372 eth1 eth0: Error: Failed to run: ip address flush dev eth1: Device \"eth1\" does not exist." instance=c2 instanceType=container project=default                                                              
t=2020-12-23T09:31:33+0000 lvl=dbug msg="Stopping device" device=eth0 instance=c2 instanceType=container project=default type=nic
t=2020-12-23T09:31:34+0000 lvl=dbug msg="Clearing instance firewall static filters" dev=eth0 host_name=veth7d0b803e hwaddr=00:16:3e:e4:93:c9 instance=c2 ipv4=0.0.0.0 ipv6=:: parent=lxdbr0 project=default                                                                               
t=2020-12-23T09:31:34+0000 lvl=dbug msg="Clearing instance firewall dynamic filters" dev=eth0 host_name=veth7d0b803e hwaddr=00:16:3e:e4:93:c9
 instance=c2 ipv4=<nil> ipv6=<nil> parent=lxdbr0 project=default
t=2020-12-23T09:31:34+0000 lvl=eror msg="Failed starting container" action=start created=2020-12-23T09:20:17+0000 ephemeral=false instance=c2
 instanceType=container project=default stateful=false used=1970-01-01T00:00:00+0000
t=2020-12-23T09:31:34+0000 lvl=dbug msg="Failure for task operation: bb462b74-3b3a-40eb-89ce-ecdaf54191c0: Failed to run: /snap/lxd/current/b
in/lxd forkstart c2 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/c2/lxc.conf: " 
t=2020-12-23T09:31:34+0000 lvl=dbug msg="Event listener finished: 2e5053b9-2934-479f-9f42-321e1df01bd8" 
t=2020-12-23T09:31:34+0000 lvl=dbug msg="Disconnected event listener: 2e5053b9-2934-479f-9f42-321e1df01bd8" 
t=2020-12-23T09:31:34+0000 lvl=dbug msg="Scheduler: network: eth0 has been added: updating network priorities" 
t=2020-12-23T09:31:34+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url="/internal/containers/c2/onstop?project=default&target=stop" username=root
t=2020-12-23T09:31:34+0000 lvl=dbug msg="Container initiated" action=stop created=2020-12-23T09:20:17+0000 ephemeral=false instance=c2 instan
ceType=container project=default stateful=false used=2020-12-23T09:31:33+0000
t=2020-12-23T09:31:34+0000 lvl=dbug msg="Container stopped, cleaning up" instance=c2 instanceType=container project=default
t=2020-12-23T09:31:34+0000 lvl=dbug msg="Stopping device" device=root instance=c2 instanceType=container project=default type=disk
t=2020-12-23T09:31:34+0000 lvl=dbug msg="UnmountInstance started" driver=btrfs instance=c2 pool=default project=default
t=2020-12-23T09:31:34+0000 lvl=dbug msg="UnmountInstance finished" driver=btrfs instance=c2 pool=default project=default
t=2020-12-23T09:31:34+0000 lvl=info msg="Shut down container" action=stop created=2020-12-23T09:20:17+0000 ephemeral=false instance=c2 instan
ceType=container project=default stateful=false used=2020-12-23T09:31:33+0000
t=2020-12-23T09:31:34+0000 lvl=dbug msg="Scheduler: container c2 stopped: re-balancing"

Now the first case, i.e. using the managed network. I created the following network:

config:
  mtu: "1500"
  parent: eno2
description: ""
name: sriov0
type: sriov
used_by: []
managed: true
status: Created
locations:
- none

Then I created (without starting them) two containers, c1-net and c2-net, with the following configuration:

architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 20.04 LTS amd64 (release) (20201210)
  image.label: release
  image.os: ubuntu
  image.release: focal
  image.serial: "20201210"
  image.type: squashfs
  image.version: "20.04"
  volatile.apply_template: create
  volatile.base_image: e0c3495ffd489748aa5151628fa56619e6143958f041223cb4970731ef939cb6
  volatile.eth0.hwaddr: 00:16:3e:bd:e5:04
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[]'
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  eth1:
    name: eth1
    network: sriov0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

Here is the log after lxc start c1-net (this one already failed):

t=2020-12-23T09:42:56+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=mfpp
t=2020-12-23T09:42:56+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url="/1.0/instances?recursion=1" username=mfpp                
t=2020-12-23T09:42:56+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=mfpp                                        
t=2020-12-23T09:42:56+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0/instances/c1-net username=mfpp                       
t=2020-12-23T09:42:56+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0/events username=mfpp                                 
t=2020-12-23T09:42:56+0000 lvl=dbug msg="New event listener: 2bf9ed46-5c7a-4fec-9348-10ffd5e8c210"                                           
t=2020-12-23T09:42:56+0000 lvl=dbug msg=Handling ip=@ method=PUT protocol=unix url=/1.0/instances/c1-net/state username=mfpp                 
t=2020-12-23T09:42:56+0000 lvl=dbug msg="\n\t{\n\t\t\"action\": \"start\",\n\t\t\"timeout\": 0,\n\t\t\"force\": false,\n\t\t\"stateful\": fal
se\n\t}"                                                                                                                                     
t=2020-12-23T09:42:56+0000 lvl=dbug msg="New task Operation: 3503f93d-7bf8-4312-9426-fbb043e1b2ab"                                           
t=2020-12-23T09:42:56+0000 lvl=dbug msg="Started task operation: 3503f93d-7bf8-4312-9426-fbb043e1b2ab"                                       
t=2020-12-23T09:42:56+0000 lvl=dbug msg="\n\t{\n\t\t\"type\": \"async\",\n\t\t\"status\": \"Operation created\",\n\t\t\"status_code\": 100,\$
\t\t\"operation\": \"/1.0/operations/3503f93d-7bf8-4312-9426-fbb043e1b2ab\",\n\t\t\"error_code\": 0,\n\t\t\"error\": \"\",\n\t\t\"metadata\":
 {\n\t\t\t\"id\": \"3503f93d-7bf8-4312-9426-fbb043e1b2ab\",\n\t\t\t\"class\": \"task\",\n\t\t\t\"description\": \"Starting container\",\n\t\t
\t\"created_at\": \"2020-12-23T09:42:56.976850063Z\",\n\t\t\t\"updated_at\": \"2020-12-23T09:42:56.976850063Z\",\n\t\t\t\"status\": \"Running
\",\n\t\t\t\"status_code\": 103,\n\t\t\t\"resources\": {\n\t\t\t\t\"containers\": [\n\t\t\t\t\t\"/1.0/containers/c1-net\"\n\t\t\t\t]\n\t\t\t}
,\n\t\t\t\"metadata\": null,\n\t\t\t\"may_cancel\": false,\n\t\t\t\"err\": \"\",\n\t\t\t\"location\": \"none\"\n\t\t}\n\t}"                  
t=2020-12-23T09:42:56+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0/operations/3503f93d-7bf8-4312-9426-fbb043e1b2ab username=mfpp
t=2020-12-23T09:42:56+0000 lvl=dbug msg="MountInstance started" driver=btrfs instance=c1-net pool=default project=default
t=2020-12-23T09:42:56+0000 lvl=dbug msg="MountInstance finished" driver=btrfs instance=c1-net pool=default project=default
t=2020-12-23T09:42:56+0000 lvl=dbug msg="Container idmap changed, remapping" instance=c1-net instanceType=container project=default
t=2020-12-23T09:42:56+0000 lvl=dbug msg="Updated metadata for task Operation: 3503f93d-7bf8-4312-9426-fbb043e1b2ab" 
t=2020-12-23T09:42:59+0000 lvl=dbug msg="Updated metadata for task Operation: 3503f93d-7bf8-4312-9426-fbb043e1b2ab" 
t=2020-12-23T09:42:59+0000 lvl=dbug msg="Starting device" device=eth0 instance=c1-net instanceType=container project=default type=nic
t=2020-12-23T09:42:59+0000 lvl=dbug msg="Scheduler: network: veth086f0056 has been added: updating network priorities" 
t=2020-12-23T09:42:59+0000 lvl=dbug msg="Scheduler: network: veth87088b71 has been added: updating network priorities" 
t=2020-12-23T09:42:59+0000 lvl=dbug msg="Starting device" device=eth1 instance=c1-net instanceType=container project=default type=nic
t=2020-12-23T09:43:00+0000 lvl=dbug msg="Scheduler: network: eth0 has been added: updating network priorities" 
t=2020-12-23T09:43:00+0000 lvl=dbug msg="Starting device" device=root instance=c1-net instanceType=container project=default type=disk
t=2020-12-23T09:43:00+0000 lvl=dbug msg="UpdateInstanceBackupFile started" driver=btrfs instance=c1-net pool=default project=default
t=2020-12-23T09:43:00+0000 lvl=dbug msg="UpdateInstanceBackupFile finished" driver=btrfs instance=c1-net pool=default project=default
t=2020-12-23T09:43:00+0000 lvl=info msg="Starting container" action=start created=2020-12-23T09:40:19+0000 ephemeral=false instance=c1-net in
stanceType=container project=default stateful=false used=1970-01-01T00:00:00+0000
t=2020-12-23T09:43:00+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url="/internal/containers/c1-net/onstart?project=default" username=root
t=2020-12-23T09:43:00+0000 lvl=dbug msg="Scheduler: container c1-net started: re-balancing"
t=2020-12-23T09:43:02+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url="/internal/containers/c1-net/onstopns?netns=%2Fproc%2F2245888%2Ffd%2F4&project=default&target=stop" username=root
t=2020-12-23T09:43:02+0000 lvl=dbug msg="Stopping device" device=eth1 instance=c1-net instanceType=container project=default type=nic
t=2020-12-23T09:43:02+0000 lvl=eror msg="Failed to stop device" devName=eth1 err="Failed to detach interface: \"eth1\" to \"eth0\": Failed to
 run: /snap/lxd/current/bin/lxd forknet detach -- /proc/2245888/fd/4 2239372 eth1 eth0: Error: Failed to run: ip address flush dev eth1: Devi
ce \"eth1\" does not exist." instance=c1-net instanceType=container project=default
t=2020-12-23T09:43:02+0000 lvl=dbug msg="Stopping device" device=eth0 instance=c1-net instanceType=container project=default type=nic
t=2020-12-23T09:43:02+0000 lvl=dbug msg="Clearing instance firewall static filters" dev=eth0 host_name=veth87088b71 hwaddr=00:16:3e:bd:e5:04 
instance=c1-net ipv4=0.0.0.0 ipv6=:: parent=lxdbr0 project=default
t=2020-12-23T09:43:02+0000 lvl=dbug msg="Clearing instance firewall dynamic filters" dev=eth0 host_name=veth87088b71 hwaddr=00:16:3e:bd:e5:04
 instance=c1-net ipv4=<nil> ipv6=<nil> parent=lxdbr0 project=default
t=2020-12-23T09:43:02+0000 lvl=eror msg="Failed starting container" action=start created=2020-12-23T09:40:19+0000 ephemeral=false instance=c1
-net instanceType=container project=default stateful=false used=1970-01-01T00:00:00+0000
t=2020-12-23T09:43:02+0000 lvl=dbug msg="Failure for task operation: 3503f93d-7bf8-4312-9426-fbb043e1b2ab: Failed to run: /snap/lxd/current/b
in/lxd forkstart c1-net /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/c1-net/lxc.conf: " 
t=2020-12-23T09:43:02+0000 lvl=dbug msg="Event listener finished: 2bf9ed46-5c7a-4fec-9348-10ffd5e8c210" 
t=2020-12-23T09:43:02+0000 lvl=dbug msg="Disconnected event listener: 2bf9ed46-5c7a-4fec-9348-10ffd5e8c210" 
t=2020-12-23T09:43:03+0000 lvl=dbug msg="Scheduler: network: eth0 has been added: updating network priorities" 
t=2020-12-23T09:43:03+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url="/internal/containers/c1-net/onstop?project=default&target=stop" username=root
t=2020-12-23T09:43:03+0000 lvl=dbug msg="Container initiated" action=stop created=2020-12-23T09:40:19+0000 ephemeral=false instance=c1-net in
stanceType=container project=default stateful=false used=2020-12-23T09:43:00+0000
t=2020-12-23T09:43:03+0000 lvl=dbug msg="Container stopped, cleaning up" instance=c1-net instanceType=container project=default
t=2020-12-23T09:43:03+0000 lvl=dbug msg="Stopping device" device=root instance=c1-net instanceType=container project=default type=disk
t=2020-12-23T09:43:03+0000 lvl=dbug msg="UnmountInstance started" driver=btrfs instance=c1-net pool=default project=default
t=2020-12-23T09:43:03+0000 lvl=dbug msg="UnmountInstance finished" driver=btrfs instance=c1-net pool=default project=default
t=2020-12-23T09:43:03+0000 lvl=info msg="Shut down container" action=stop created=2020-12-23T09:40:19+0000 ephemeral=false instance=c1-net in
stanceType=container project=default stateful=false used=2020-12-23T09:43:00+0000
t=2020-12-23T09:43:03+0000 lvl=dbug msg="Scheduler: container c1-net stopped: re-balancing"

And here is the log after lxc start c2-net (this also failed):

t=2020-12-23T09:44:30+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=mfpp
t=2020-12-23T09:44:30+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url="/1.0/instances?recursion=2" username=mfpp                
t=2020-12-23T09:44:30+0000 lvl=dbug msg="GetInstanceUsage started" driver=btrfs instance=ubu-rt pool=default project=default                 
t=2020-12-23T09:44:30+0000 lvl=dbug msg="GetInstanceUsage started" driver=btrfs instance=ubu-netperf-1 pool=default project=default          
t=2020-12-23T09:44:30+0000 lvl=dbug msg="GetInstanceUsage finished" driver=btrfs instance=ubu-rt pool=default project=default                
t=2020-12-23T09:44:30+0000 lvl=dbug msg="GetInstanceUsage started" driver=btrfs instance=ubuntu-preempt pool=default project=default         
t=2020-12-23T09:44:30+0000 lvl=dbug msg="GetInstanceUsage finished" driver=btrfs instance=ubu-netperf-1 pool=default project=default         
t=2020-12-23T09:44:30+0000 lvl=dbug msg="GetInstanceUsage finished" driver=btrfs instance=ubuntu-preempt pool=default project=default        
t=2020-12-23T09:44:30+0000 lvl=dbug msg="GetInstanceUsage started" driver=btrfs instance=c1-net pool=default project=default                 
t=2020-12-23T09:44:30+0000 lvl=dbug msg="GetInstanceUsage started" driver=btrfs instance=c2-net pool=default project=default                 
t=2020-12-23T09:44:30+0000 lvl=dbug msg="GetInstanceUsage finished" driver=btrfs instance=c1-net pool=default project=default                
t=2020-12-23T09:44:30+0000 lvl=dbug msg="GetInstanceUsage finished" driver=btrfs instance=c2-net pool=default project=default                
t=2020-12-23T09:44:36+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=mfpp                                        
t=2020-12-23T09:44:36+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0/instances/c2-net username=mfpp                       
t=2020-12-23T09:44:36+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0/events username=mfpp                                 
t=2020-12-23T09:44:36+0000 lvl=dbug msg="New event listener: 12cbb610-1468-41b1-af4b-432458ceb245"                                           
t=2020-12-23T09:44:36+0000 lvl=dbug msg=Handling ip=@ method=PUT protocol=unix url=/1.0/instances/c2-net/state username=mfpp                 
t=2020-12-23T09:44:36+0000 lvl=dbug msg="\n\t{\n\t\t\"action\": \"start\",\n\t\t\"timeout\": 0,\n\t\t\"force\": false,\n\t\t\"stateful\": fal
se\n\t}"                                                                                                                                     
t=2020-12-23T09:44:36+0000 lvl=dbug msg="New task Operation: e6e4f4ef-8c78-4af7-8705-8b259ad7ea8b"                                           
t=2020-12-23T09:44:36+0000 lvl=dbug msg="Started task operation: e6e4f4ef-8c78-4af7-8705-8b259ad7ea8b"                                       
t=2020-12-23T09:44:36+0000 lvl=dbug msg="\n\t{\n\t\t\"type\": \"async\",\n\t\t\"status\": \"Operation created\",\n\t\t\"status_code\": 100,\n
\t\t\"operation\": \"/1.0/operations/e6e4f4ef-8c78-4af7-8705-8b259ad7ea8b\",\n\t\t\"error_code\": 0,\n\t\t\"error\": \"\",\n\t\t\"metadata\":
 {\n\t\t\t\"id\": \"e6e4f4ef-8c78-4af7-8705-8b259ad7ea8b\",\n\t\t\t\"class\": \"task\",\n\t\t\t\"description\": \"Starting container\",\n\t\t
\t\"created_at\": \"2020-12-23T09:44:36.340948318Z\",\n\t\t\t\"updated_at\": \"2020-12-23T09:44:36.340948318Z\",\n\t\t\t\"status\": \"Running
\",\n\t\t\t\"status_code\": 103,\n\t\t\t\"resources\": {\n\t\t\t\t\"containers\": [\n\t\t\t\t\t\"/1.0/containers/c2-net\"\n\t\t\t\t]\n\t\t\t}
,\n\t\t\t\"metadata\": null,\n\t\t\t\"may_cancel\": false,\n\t\t\t\"err\": \"\",\n\t\t\t\"location\": \"none\"\n\t\t}\n\t}" 
t=2020-12-23T09:44:36+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0/operations/e6e4f4ef-8c78-4af7-8705-8b259ad7ea8b username=mfpp
t=2020-12-23T09:44:36+0000 lvl=dbug msg="MountInstance started" driver=btrfs instance=c2-net pool=default project=default
t=2020-12-23T09:44:36+0000 lvl=dbug msg="MountInstance finished" driver=btrfs instance=c2-net pool=default project=default
t=2020-12-23T09:44:36+0000 lvl=dbug msg="Container idmap changed, remapping" instance=c2-net instanceType=container project=default
t=2020-12-23T09:44:36+0000 lvl=dbug msg="Updated metadata for task Operation: e6e4f4ef-8c78-4af7-8705-8b259ad7ea8b" 
t=2020-12-23T09:44:38+0000 lvl=dbug msg="Updated metadata for task Operation: e6e4f4ef-8c78-4af7-8705-8b259ad7ea8b" 
t=2020-12-23T09:44:38+0000 lvl=dbug msg="Starting device" device=eth0 instance=c2-net instanceType=container project=default type=nic
t=2020-12-23T09:44:38+0000 lvl=dbug msg="Scheduler: network: vethf96355a6 has been added: updating network priorities" 
t=2020-12-23T09:44:38+0000 lvl=dbug msg="Scheduler: network: veth0969abd0 has been added: updating network priorities" 
t=2020-12-23T09:44:38+0000 lvl=dbug msg="Starting device" device=eth1 instance=c2-net instanceType=container project=default type=nic [9/797]
t=2020-12-23T09:44:39+0000 lvl=dbug msg="Scheduler: network: eth0 has been added: updating network priorities"                               
t=2020-12-23T09:44:40+0000 lvl=dbug msg="Starting device" device=root instance=c2-net instanceType=container project=default type=disk
t=2020-12-23T09:44:40+0000 lvl=dbug msg="UpdateInstanceBackupFile started" driver=btrfs instance=c2-net pool=default project=default
t=2020-12-23T09:44:40+0000 lvl=dbug msg="UpdateInstanceBackupFile finished" driver=btrfs instance=c2-net pool=default project=default
t=2020-12-23T09:44:40+0000 lvl=info msg="Starting container" action=start created=2020-12-23T09:41:12+0000 ephemeral=false instance=c2-net in
stanceType=container project=default stateful=false used=1970-01-01T00:00:00+0000
t=2020-12-23T09:44:40+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url="/internal/containers/c2-net/onstart?project=default" username=root
t=2020-12-23T09:44:40+0000 lvl=dbug msg="Scheduler: container c2-net started: re-balancing" 
t=2020-12-23T09:44:41+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url="/internal/containers/c2-net/onstopns?netns=%2Fproc%2F2246313%2Ffd%2F4&project=default&target=stop" username=root
t=2020-12-23T09:44:41+0000 lvl=dbug msg="Stopping device" device=eth1 instance=c2-net instanceType=container project=default type=nic
t=2020-12-23T09:44:41+0000 lvl=eror msg="Failed to stop device" devName=eth1 err="Failed to detach interface: \"eth1\" to \"eth0\": Failed to
 run: /snap/lxd/current/bin/lxd forknet detach -- /proc/2246313/fd/4 2239372 eth1 eth0: Error: Failed to run: ip address flush dev eth1: Devi
ce \"eth1\" does not exist." instance=c2-net instanceType=container project=default
t=2020-12-23T09:44:41+0000 lvl=dbug msg="Stopping device" device=eth0 instance=c2-net instanceType=container project=default type=nic
t=2020-12-23T09:44:42+0000 lvl=dbug msg="Clearing instance firewall static filters" dev=eth0 host_name=veth0969abd0 hwaddr=00:16:3e:b5:72:fd 
instance=c2-net ipv4=0.0.0.0 ipv6=:: parent=lxdbr0 project=default
t=2020-12-23T09:44:42+0000 lvl=dbug msg="Clearing instance firewall dynamic filters" dev=eth0 host_name=veth0969abd0 hwaddr=00:16:3e:b5:72:fd
 instance=c2-net ipv4=<nil> ipv6=<nil> parent=lxdbr0 project=default
t=2020-12-23T09:44:42+0000 lvl=eror msg="Failed starting container" action=start created=2020-12-23T09:41:12+0000 ephemeral=false instance=c2
-net instanceType=container project=default stateful=false used=1970-01-01T00:00:00+0000
t=2020-12-23T09:44:42+0000 lvl=dbug msg="Failure for task operation: e6e4f4ef-8c78-4af7-8705-8b259ad7ea8b: Failed to run: /snap/lxd/current/b
in/lxd forkstart c2-net /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/c2-net/lxc.conf: " 
t=2020-12-23T09:44:42+0000 lvl=dbug msg="Event listener finished: 12cbb610-1468-41b1-af4b-432458ceb245" 
t=2020-12-23T09:44:42+0000 lvl=dbug msg="Disconnected event listener: 12cbb610-1468-41b1-af4b-432458ceb245" 
t=2020-12-23T09:44:42+0000 lvl=dbug msg="Scheduler: network: eth0 has been added: updating network priorities" 
t=2020-12-23T09:44:42+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url="/internal/containers/c2-net/onstop?project=default&target=stop" username=root
t=2020-12-23T09:44:42+0000 lvl=dbug msg="Container initiated" action=stop created=2020-12-23T09:41:12+0000 ephemeral=false instance=c2-net in
stanceType=container project=default stateful=false used=2020-12-23T09:44:40+0000
t=2020-12-23T09:44:42+0000 lvl=dbug msg="Container stopped, cleaning up" instance=c2-net instanceType=container project=default
t=2020-12-23T09:44:42+0000 lvl=dbug msg="Stopping device" device=root instance=c2-net instanceType=container project=default type=disk
t=2020-12-23T09:44:42+0000 lvl=dbug msg="UnmountInstance started" driver=btrfs instance=c2-net pool=default project=default
t=2020-12-23T09:44:42+0000 lvl=dbug msg="UnmountInstance finished" driver=btrfs instance=c2-net pool=default project=default
t=2020-12-23T09:44:42+0000 lvl=info msg="Shut down container" action=stop created=2020-12-23T09:41:12+0000 ephemeral=false instance=c2-net in
stanceType=container project=default stateful=false used=2020-12-23T09:44:40+0000
t=2020-12-23T09:44:42+0000 lvl=dbug msg="Scheduler: container c2-net stopped: re-balancing" 

This network card supports up to 64 VFs per physical interface; I enabled only 16. Here are the names of the corresponding interfaces:

VF Device	Parent PF PCI BDF	Parent PF Description
=========	=================	=====================
eno2v1		0000:01:00.1		Intel Corporation Ethernet Controller 10G X550T
eno2v13		0000:01:00.1		Intel Corporation Ethernet Controller 10G X550T
eno2v14		0000:01:00.1		Intel Corporation Ethernet Controller 10G X550T
eno2v2		0000:01:00.1		Intel Corporation Ethernet Controller 10G X550T
eno2v4		0000:01:00.1		Intel Corporation Ethernet Controller 10G X550T
eno2v5		0000:01:00.1		Intel Corporation Ethernet Controller 10G X550T
eno2v6		0000:01:00.1		Intel Corporation Ethernet Controller 10G X550T
eno2v9		0000:01:00.1		Intel Corporation Ethernet Controller 10G X550T
eth0		0000:01:00.1		Intel Corporation Ethernet Controller 10G X550T
eth10		0000:01:00.1		Intel Corporation Ethernet Controller 10G X550T
eth11		0000:01:00.1		Intel Corporation Ethernet Controller 10G X550T
eth14		0000:01:00.1		Intel Corporation Ethernet Controller 10G X550T
eth3		0000:01:00.1		Intel Corporation Ethernet Controller 10G X550T
eth6		0000:01:00.1		Intel Corporation Ethernet Controller 10G X550T
eth7		0000:01:00.1		Intel Corporation Ethernet Controller 10G X550T
eth9		0000:01:00.1		Intel Corporation Ethernet Controller 10G X550T

Thanks for that.

Looking at your original logs, the line:

lxc another-container 20201222153825.355 ERROR    network - network.c:__instantiate_ns_common:882 - File exists - Failed to rename network device "vetheac53f63" to "eth0"

seems to indicate an issue with the lxdbr0 eth0 interface. Is it possible to try your sriov tests without the container also having an eth0 interface connected to lxdbr0 (i.e. move the sriov device to eth0)?

The general process I follow is to simplify the config as much as possible to try and find the cause. There is nothing in the debug logs I can see that indicates an issue with allocating an SR-IOV VF device; however, it may be that there is some naming conflict going on with the veth device.

Thanks! Ok, that seems to work much better. I created a profile which has by default only one sriov network device called “eth0”. I now sometimes get the following error when trying to start multiple containers consecutively:

Creating yet-another-container
Starting yet-another-container
Error: Failed preparing container for start: Failed to start device "eth0": Bind of interface "eno2v2" took too long
Try `lxc info --show-log local:yet-another-container` for more info
giu@server:~$ lxc info --show-log local:yet-another-container
Name: yet-another-container
Location: none
Remote: unix://
Architecture: x86_64
Created: 2020/12/23 15:56 UTC
Status: Stopped
Type: container
Profiles: vipac-sriov-net

Log:

lxc 20201223155714.537 TRACE    commands - commands.c:lxc_cmd:302 - Connection refused - Command "get_state" failed to connect command socket
lxc 20201223155714.539 TRACE    commands - commands.c:lxc_cmd:302 - Connection refused - Command "get_state" failed to connect command socket
lxc 20201223155714.539 TRACE    commands - commands.c:lxc_cmd:302 - Connection refused - Command "get_state" failed to connect command socket

And here is the daemon log:

t=2020-12-23T15:56:55+0000 lvl=dbug msg="Failure for task operation: 81f5563a-fcac-4256-af9e-487966178bd3: Failed preparing container for start: Failed to start device \"eth0\": Bind of interface \"eno2v2\" took too long" 
t=2020-12-23T15:56:55+0000 lvl=dbug msg="Event listener finished: 65b53964-4c84-4a30-b5b2-327c361106bf" 
t=2020-12-23T15:56:55+0000 lvl=dbug msg="Disconnected event listener: 65b53964-4c84-4a30-b5b2-327c361106bf" 
t=2020-12-23T15:57:14+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=mfpp
t=2020-12-23T15:57:14+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0/instances/yet-another-container username=mfpp
t=2020-12-23T15:57:14+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0/instances/yet-another-container/state username=mfpp
t=2020-12-23T15:57:14+0000 lvl=dbug msg="GetInstanceUsage started" driver=btrfs instance=yet-another-container pool=default project=default
t=2020-12-23T15:57:14+0000 lvl=dbug msg="GetInstanceUsage finished" driver=btrfs instance=yet-another-container pool=default project=default
t=2020-12-23T15:57:14+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url="/1.0/instances/yet-another-container/snapshots?recursion=1" username=giu
t=2020-12-23T15:57:14+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0/instances/yet-another-container/logs/lxc.log username=giu

In this case the container gets created but doesn’t start. If I then retry starting it manually, it usually goes through.
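
As a stop-gap I just retry by hand; a small loop along these lines (just a sketch of that manual retry) does the same:

for i in 1 2 3; do lxc start yet-another-container && break; sleep 2; done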

OK, great to know; that gives me something to work with to try and recreate it.

When the VF takes too long to bind, is that when you start the containers at or near the same time as each other? If not, how far apart are the starts when it still occurs?

Sorry for the late reply; unfortunately this wasn’t a priority after Christmas. Anyway, in the meantime I have been doing some network testing with Docker, which doesn’t support SR-IOV out of the box, so I had to rely on some scripts (https://github.com/jpetazzo/pipework) to move the VFs into the containers’ namespaces. In doing so I found an interesting fact which (apparently) ended up solving the issue for LXD as well.

Basically, the PF I use for experiments is called eno2, and when I add new virtual functions some end up being named ethX and others eno2vX, but not always in the same way. The ones named ethX always ended up causing all sorts of issues because of the naming, so I wrote a script to rename them to eno2vX, so that the interfaces of all VFs follow the same pattern. After that, all the problems mentioned above were gone. I didn’t dig into the details, but if you need more info that might help (or you happen to know the reason behind that behavior, as I am no networking expert), please let me know :smiley:
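
Roughly, the rename pass does something like this (a simplified sketch, not the exact script; it assumes the PF is eno2 and derives the target name from the virtfn index):

#!/bin/sh
# Rename every bound VF interface of the PF to the <PF>v<index> pattern.
PF=eno2
for vf in /sys/class/net/"$PF"/device/virtfn*; do
    idx="${vf##*virtfn}"                # VF index N, taken from the virtfn<N> symlink
    cur="$(ls "$vf/net" 2>/dev/null)"   # current interface name (empty if unbound)
    want="${PF}v${idx}"
    if [ -n "$cur" ] && [ "$cur" != "$want" ]; then
        sudo ip link set "$cur" down            # the link must be down to rename it
        sudo ip link set "$cur" name "$want"
    fi
done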

Glad to hear you figured it out. How did you rename them? Was something else perhaps renaming them as they came up?

I didn’t check what naming rules are in place on the server, but whenever I create them manually from the command line with something like:

echo <num_vfs> | sudo tee /sys/class/net/<PF_NIC>/device/sriov_numvfs > /dev/null

they get automatically named as described above. To rename them, I just do, for example:

sudo ip link set eth1 name eno2v1

Again, I have no idea what convention is used to name them as they pop up. Any idea what to look for?

So LXD uses the contents of /sys/class/net/<PF Interface>/device/sriov_numvfs to determine how many active VFs are available. It then iterates over each one looking for a free VF interface by checking for a file name under /sys/class/net/<PF Interface>/device/virtfn<VF Index>/net/. This name is the expected name of the free VF interface.
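
In shell terms that lookup is roughly equivalent to the following (a sketch, using enp1s0f0 as the PF):

for vf in /sys/class/net/enp1s0f0/device/virtfn*; do
    printf '%s -> %s\n' "${vf##*/}" "$(ls "$vf/net")"   # virtfn<N> -> expected VF interface name
done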

Note: at this time LXD will check that the VF interface it finds is not referenced by any other NIC and skip it if it is, but it does not consider whether the VF could be in use by something outside of LXD. In the case of passing the VF into a VM or container it would be unbound from the host OS (or moved into the container’s network namespace) and so wouldn’t appear in /sys/class/net/<PF Interface>/device/virtfn<VF Index>/net/ anyway, but for other uses LXD may try and use an in-use VF.

On my system, where enp1s0f0 is the PF and enp1s0f0v0 is the first VF interface:

ls /sys/class/net/enp1s0f0/device/virtfn0/net/
enp1s0f0v0

ip link show enp1s0f0v0
3: enp1s0f0v0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ae:91:f5:91:c3:7c brd ff:ff:ff:ff:ff:ff

So it doesn’t really matter what the interface is named in /sys/class/net/enp1s0f0/device/virtfn0/net/, but LXD does expect that name to match what the actual interface is called when it is activated.

So if something is renaming some of the VF interfaces when they are activated, then this may explain why LXD is getting confused.
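
One thing that may be worth checking (an assumption on my part, I haven’t verified it applies to your setup) is what udev would name one of the ethX VFs, for example:

udevadm test-builtin net_id /sys/class/net/eth3
udevadm info /sys/class/net/eth3 | grep ID_NET_NAME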

And the error you were seeing, Bind of interface "eno2v2" took too long, is LXD looking for the expected interface name after it has unbound the VF from the host (to apply the VF settings) and then rebound it: it waits a few seconds for /sys/class/net/<VF interface name> to exist. So after that error it is worth checking whether the VF interface reported in the error does exist (in which case we probably just need to increase the wait time); if it doesn’t exist, or is called something else, then that suggests something is renaming it.
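
For example, after hitting that error you could check whether the VF reported in the error message is present under its expected name:

ls -d /sys/class/net/eno2v2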