I’d like to connect an LXD container to Open vSwitch using a LACP bond, but haven’t been able to make it work. I’m using Ubuntu 20.04 with LXD v4.0.8 and Open vSwitch 2.13.3.
First, why do I want to do this? I’m putting together a simulation of the core network of an ISP, to test software and config changes before they go into production. The closer the simulation is to reality, the more I can test and the easier it is to keep the configs in sync. So I’d like to use a bond, even though that wouldn’t ordinarily be necessary in an all-virtual setup.
Anyway, I have no problem creating a switch with Open vSwitch and starting a container with multiple interfaces connected to the switch. I just use ovs-vsctl to add a virtual switch (a bridge in Open vSwitch parlance), for example ovs-vsctl add-br switch1, and add devices to the container config:
$ lxc config show gw1
...
devices:
  eno1:
    host_name: gw1-eno1
    name: eno1
    nictype: bridged
    parent: switch1
    type: nic
  eno2:
    host_name: gw1-eno2
    name: eno2
    nictype: bridged
    parent: switch1
    type: nic
...
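For reference, the same device entries can be created from the command line instead of editing the config by hand. This is only a sketch: add_bridged_nic is a helper name made up for this post, and the DRY_RUN switch is an assumption added so the command can be inspected before it is run.

```shell
# Hypothetical helper (not part of LXD): prints (DRY_RUN=1) or runs the
# "lxc config device add" command that attaches one bridged NIC to a
# container, mirroring the device entries shown above.
add_bridged_nic() {
    container=$1
    bridge=$2
    nic=$3
    # Build the command as positional parameters so each argument stays intact.
    set -- lxc config device add "$container" "$nic" nic \
        nictype=bridged parent="$bridge" name="$nic" \
        host_name="$container-$nic"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Calling add_bridged_nic gw1 switch1 eno1 and then add_bridged_nic gw1 switch1 eno2 should produce the two entries above.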
When the container is started, the ports and interfaces are automatically added to switch1 as expected:
$ sudo ovs-vsctl show
eb23be7d-1882-4a2d-8fa0-6eb2e4c01e58
    Bridge switch1
        Port gw1-eno2
            Interface gw1-eno2
        Port switch1
            Interface switch1
                type: internal
        Port gw1-eno1
            Interface gw1-eno1
    ovs_version: "2.13.3"
It’s also straightforward enough to create a bond from within the Linux container with eno1 and eno2 as members, e.g. using the ifupdown syntax in /etc/network/interfaces:
auto eno1
iface eno1 inet manual
    bond-master bond0
    bond-mode 4

auto eno2
iface eno2 inet manual
    bond-master bond0
    bond-mode 4

auto bond0
iface bond0 inet manual
    bond-mode 4
    bond-miimon 100
    bond_downdelay 200
    bond_updelay 200
    bond-lacp-rate 1
    bond-slaves none
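To see from inside the container whether bond0 actually came up in 802.3ad mode and found a LACP partner, the kernel's status file /proc/net/bonding/bond0 can be filtered down to the relevant lines. A minimal sketch; bond_summary is a name invented here, and it only assumes the usual field labels of that file.

```shell
# Hypothetical helper: reads the contents of /proc/net/bonding/<bond> on
# stdin and keeps only the lines that show the bonding mode, the member
# interfaces, their link status and the LACP partner address.
bond_summary() {
    grep -E '(Bonding Mode|Slave Interface|MII Status|Partner Mac Address):'
}
```

It could be used as lxc exec gw1 -- cat /proc/net/bonding/bond0 | bond_summary. As far as I know, a partner MAC address of all zeros means that no LACP partner has answered.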
Now, to create a bond in OVS, I must add at least two interfaces to a port. Without changing the above config I might try to do this with the existing ports and interfaces the container created, but that doesn’t work when the ports and interfaces already exist:
$ sudo ovs-vsctl add-bond switch1 gw1-bond0 gw1-eno1 gw1-eno2
ovs-vsctl: cannot create an interface named gw1-eno1 because a port named gw1-eno1 already exists on bridge switch1
Nor does it work to run the command above while the container is stopped, to prepare the switch first, and then start the container. OVS complains that the devices don’t exist yet, and the container then fails to start:
$ sudo ovs-vsctl add-bond switch1 gw1-bond0 gw1-eno1 gw1-eno2
ovs-vsctl: Error detected while setting up 'gw1-eno1': could not open network device gw1-eno1 (No such device). See ovs-vswitchd log for details.
ovs-vsctl: Error detected while setting up 'gw1-eno2': could not open network device gw1-eno2 (No such device). See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch".
$ lxc start gw1
Error: Failed preparing container for start: Failed to start device "eno1": Failed to run: ovs-vsctl --may-exist add-port switch1 gw1-eno1: ovs-vsctl: cannot create a port named gw1-eno1 because an interface named gw1-eno1 already exists on bridge switch1
Try `lxc info --show-log gw1` for more info
So next, I tried another strategy. It’s possible to add interfaces of type internal to OVS. This creates a virtual device of type openvswitch, which can then be declared as a physical device in the container. For example:
$ sudo ovs-vsctl add-port switch1 gw1-eno1 -- set interface gw1-eno1 type=internal
$ lxc config show gw1
...
devices:
  eno1:
    name: eno1
    nictype: physical
    parent: gw1-eno1
    type: nic
...
This seems to work for single devices. When the container is started, the device disappears from the default namespace, as expected when the container takes ownership of it, and reappears when the container is stopped.
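The disappearing and reappearing can be checked on the host without reading through the full ip link output. A small sketch: in_host_namespace is a helper name made up for this post, and SYSFS_NET is a hypothetical override that is only there so the function can be tried against a scratch directory.

```shell
# Hypothetical helper: succeeds if the named interface is visible in the
# current (host) network namespace, judged by its entry under
# /sys/class/net. SYSFS_NET is an invented override for testing.
in_host_namespace() {
    [ -e "${SYSFS_NET:-/sys/class/net}/$1" ]
}
```

For example, in_host_namespace gw1-eno1 && echo "still on the host" should print nothing while the container holds the device and print the message again after lxc stop gw1.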
Following this, I thought I could create a bond port in OVS with two internal interfaces, and declare both those interfaces as physical in the container. However, the behavior is inconsistent and seemingly random each time the container is started, and it never actually works. For example:
$ sudo ovs-vsctl add-bond switch1 gw1-bond0 gw1-eno2 gw1-eno3 -- set interface gw1-eno2 type=internal -- set interface gw1-eno3 type=internal
$ lxc config show gw1
...
devices:
  eno1:
    name: eno1
    nictype: physical
    parent: gw1-eno1
    type: nic
  eno2:
    name: eno2
    nictype: physical
    parent: gw1-eno2
    type: nic
  eno3:
    name: eno3
    nictype: physical
    parent: gw1-eno3
    type: nic
...
$ lxc start gw1 # succeeds, but device still in default namespace and not present in container
$ ip link
...
60: gw1-eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ba:d7:64:72:45:71 brd ff:ff:ff:ff:ff:ff
62: gw1-eno3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 62:fd:17:77:e9:e2 brd ff:ff:ff:ff:ff:ff
$ lxc stop gw1
$ lxc start gw1
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart gw1 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/gw1/lxc.conf:
Try `lxc info --show-log gw1` for more info
$ lxc info --show-log gw1
Name: gw1
Location: none
Remote: unix://
Architecture: x86_64
Created: 2021/12/12 14:13 CET
Status: Stopped
Type: container
Profiles: default
Log:
lxc gw1 20220127094709.544 WARN conf - conf.c:lxc_map_ids:3579 - newuidmap binary is missing
lxc gw1 20220127094709.544 WARN conf - conf.c:lxc_map_ids:3585 - newgidmap binary is missing
lxc gw1 20220127094709.544 WARN conf - conf.c:lxc_map_ids:3579 - newuidmap binary is missing
lxc gw1 20220127094709.544 WARN conf - conf.c:lxc_map_ids:3585 - newgidmap binary is missing
lxc gw1 20220127094709.544 WARN cgfsng - cgroups/cgfsng.c:fchowmodat:1251 - No such file or directory - Failed to fchownat(40, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc gw1 20220127094709.638 ERROR network - network.c:lxc_network_setup_in_child_namespaces_common:3866 - Invalid argument - Failed to set network device "eno2" up
lxc gw1 20220127094709.638 ERROR network - network.c:lxc_setup_network_in_child_namespaces:4003 - Invalid argument - Failed to setup netdev
lxc gw1 20220127094709.638 ERROR conf - conf.c:lxc_setup:4338 - Failed to setup network
lxc gw1 20220127094709.638 ERROR start - start.c:do_start:1275 - Failed to setup container "gw1"
lxc gw1 20220127094709.638 ERROR sync - sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 3)
lxc gw1 20220127094709.643 WARN network - network.c:lxc_delete_network_priv:3617 - Failed to rename interface with index 0 from "eno3" to its initial name "gw1-eno3"
lxc gw1 20220127094709.643 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:867 - Received container state "ABORTING" instead of "RUNNING"
lxc gw1 20220127094709.644 ERROR start - start.c:__lxc_start:2074 - Failed to spawn container "gw1"
lxc gw1 20220127094709.644 WARN start - start.c:lxc_abort:1039 - No such process - Failed to send SIGKILL via pidfd 41 for process 6683
lxc gw1 20220127094714.670 WARN conf - conf.c:lxc_map_ids:3579 - newuidmap binary is missing
lxc gw1 20220127094714.670 WARN conf - conf.c:lxc_map_ids:3585 - newgidmap binary is missing
lxc 20220127094714.691 ERROR af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220127094714.691 ERROR commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors
$ lxc start gw1
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart gw1 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/gw1/lxc.conf:
Try `lxc info --show-log gw1` for more info
$ lxc info --show-log gw1
Name: gw1
Location: none
Remote: unix://
Architecture: x86_64
Created: 2021/12/12 14:13 CET
Status: Stopped
Type: container
Profiles: default
Log:
lxc gw1 20220127095246.351 WARN conf - conf.c:lxc_map_ids:3579 - newuidmap binary is missing
lxc gw1 20220127095246.351 WARN conf - conf.c:lxc_map_ids:3585 - newgidmap binary is missing
lxc gw1 20220127095246.352 WARN conf - conf.c:lxc_map_ids:3579 - newuidmap binary is missing
lxc gw1 20220127095246.352 WARN conf - conf.c:lxc_map_ids:3585 - newgidmap binary is missing
lxc gw1 20220127095246.352 WARN cgfsng - cgroups/cgfsng.c:fchowmodat:1251 - No such file or directory - Failed to fchownat(40, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc gw1 20220127095246.431 ERROR network - network.c:__netdev_configure_container_common:1275 - No such device - Failed to retrieve ifindex for network device with name physvgsM2g
lxc gw1 20220127095246.431 ERROR network - network.c:lxc_setup_network_in_child_namespaces:4003 - No such device - Failed to setup netdev
lxc gw1 20220127095246.431 ERROR conf - conf.c:lxc_setup:4338 - Failed to setup network
lxc gw1 20220127095246.431 ERROR start - start.c:do_start:1275 - Failed to setup container "gw1"
lxc gw1 20220127095246.431 ERROR sync - sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 3)
lxc gw1 20220127095246.436 WARN network - network.c:lxc_delete_network_priv:3617 - Failed to rename interface with index 0 from "eno3" to its initial name "gw1-eno3"
lxc gw1 20220127095246.436 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:867 - Received container state "ABORTING" instead of "RUNNING"
lxc gw1 20220127095246.436 ERROR start - start.c:__lxc_start:2074 - Failed to spawn container "gw1"
lxc gw1 20220127095246.436 WARN start - start.c:lxc_abort:1039 - No such process - Failed to send SIGKILL via pidfd 41 for process 6871
lxc gw1 20220127095251.465 WARN conf - conf.c:lxc_map_ids:3579 - newuidmap binary is missing
lxc gw1 20220127095251.465 WARN conf - conf.c:lxc_map_ids:3585 - newgidmap binary is missing
lxc 20220127095251.480 ERROR af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220127095251.480 ERROR commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors
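One detail that may matter once the container does start: the add-bond command above never sets the port's lacp column, and as far as I understand the OVS documentation, LACP stays off on a port until it is set to active or passive. The sketch below shows how the OVS side could be switched to LACP and inspected. enable_and_check_lacp is a helper name invented for this post and DRY_RUN is an added assumption; the underlying ovs-vsctl and ovs-appctl commands are the standard ones.

```shell
# Hypothetical helper: prints (DRY_RUN=1) or runs the OVS commands that
# turn on active LACP for a bond port and then report bond and LACP state.
# The port name must not contain spaces, since $cmd is split on whitespace.
enable_and_check_lacp() {
    port=$1
    for cmd in \
        "sudo ovs-vsctl set port $port lacp=active" \
        "sudo ovs-appctl bond/show $port" \
        "sudo ovs-appctl lacp/show $port"
    do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}
```

With DRY_RUN=1 set, enable_and_check_lacp gw1-bond0 only prints the three commands. This does not address the namespace problem shown in the logs; it only covers the LACP side of the bond.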
Is there another way to make bond interfaces work with Open vSwitch?