LXD 3.13 has been released

stgraber · May 9, 2019, 3:44am

Introduction

The LXD team is very excited to announce the release of LXD 3.13!

This is another very exciting LXD release, packed with useful features and a lot of bugfixes and performance improvements!

The latest addition to the LXD team, @tomp, has been busy improving the LXD networking experience, with quite a few new features and bugfixes already making it into this release.

We’ve also gotten all the plumbing needed for system call interception done and in place in this release, currently handling mknod on supported systems.

Cluster users will enjoy this release too, thanks to scaling improvements, reducing the load on the leader a bit and improving container copies and migration, especially on CEPH clusters.

Enterprise users will like the addition of Role Based Access Control through the external Canonical RBAC service, making it possible to control permissions to individual projects on your LXD servers and assign roles to your users and groups.

And we’ve even managed to get quotas working for the dir storage backend at last, thanks to the addition of filesystem project quotas in recent kernels.

Enjoy!

New features

Cluster: Improved heartbeat interval

In a LXD cluster, the current leader periodically sends a heartbeat to all other cluster members. The main purpose of this is to detect offline cluster members, marking them as offline in the database so that queries no longer block on them. A secondary use for those heartbeats is to refresh the list of database nodes.

Previously, this was done every 4s with all cluster members being contacted at the same time, resulting in spikes in CPU and network traffic, especially on the current cluster leader.

LXD 3.13 changes that by bumping the interval to 10s and by adding randomization to the timing of the heartbeats so that not all cluster members are contacted at the same time. Extra logic was also added to detect cluster members that get added during a heartbeat run.
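To make the idea of randomized heartbeat timing concrete, here is a minimal shell sketch. It is only an illustration and not LXD's actual implementation: the function name `heartbeat_offsets` and the member names are hypothetical, and the only value taken from the text above is the 10s interval.

```shell
# Illustration only, not LXD's code: give every cluster member a random
# offset inside one heartbeat interval so they are not all contacted at once.
# Usage: heartbeat_offsets INTERVAL_SECONDS MEMBER...
heartbeat_offsets() {
    interval=$1
    shift
    # Print one member per line, then let awk attach a random offset to each.
    for member in "$@"; do
        printf '%s\n' "$member"
    done | awk -v interval="$interval" '
        BEGIN { srand() }
        { printf "%s %.3f\n", $1, rand() * interval }
    '
}

# Example call (hypothetical member names):
#   heartbeat_offsets 10 node1 node2 node3
```

Each output line pairs a member with the number of seconds to wait before contacting it, which spreads the load on the leader across the interval instead of concentrating it at one instant.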

Cluster Internal container copy

LXD 3.13 now properly implements one-step container copies, similar to how you would normally copy a container on a standalone LXD instance. Prior to this, the client had to know whether to perform a copy (if staying on the same cluster member) or a migration (if going to another cluster member); this is now all done internally.

A side benefit of this fix is that all CEPH copies are now near instantaneous on clusters as those do not require any migration at all.

Initial syscall interception support

LXD 3.13, when combined with a 5.0 or higher kernel as well as the very latest libseccomp and liblxc, can now intercept and mediate system calls in userspace.

For this first pass, we have focused on mknod, implementing a basic allow list of devices which can now be created by unprivileged containers.

It will take a little while before this feature can be commonly used as we will need an upstream release of both libseccomp and liblxc and are waiting for further improvements to the feature in the kernel too.

We will be building upon this capability to allow specific filesystems to be mounted inside unprivileged containers in the future as well as allow things like kernel module loading and more (all will require opt-in from the administrator).

Role Based Access Control (RBAC)

Users of the Canonical RBAC service can now integrate LXD with it.

LXD will register all its projects with RBAC, allowing administrators to assign roles to users/groups for specific projects or for the entire LXD instance.

Currently this includes the following permissions:

Full administrative access to LXD
Management of containers (creation, deletion, re-configuration, …)
Operation of containers (start/stop/restart, exec, console, …)
Management of images (creation, deletion, aliases, …)
Management of profiles (creation, deletion, re-configuration, …)
Management of the project itself (re-configuration)
Read-only access (view everything tied to a project)

This gets us one step closer to being able to run a shared LXD cluster with unprivileged users being able to run containers on it without concerns of them escalating their privileges.

IPVLAN support

LXD can now make use of the recent implementation of ipvlan in LXC.
When running a suitably recent version of LXC, IPVLAN can now be configured in LXD through a nic device:

Setting the nictype property to ipvlan
Setting the parent property to the expected outgoing device
For IPv4, setting ipv4.address to the desired address
For IPv6, setting ipv6.address to the desired address

Here is an example of it in action:

stgraber@castiana:~$ lxc init ubuntu:18.04 ipvlan
Creating ipvlan
stgraber@castiana:~$ lxc config device add ipvlan eth0 nic nictype=ipvlan parent=wlan0 ipv4.address=172.17.0.100 ipv6.address=2001:470:b0f8:1000:1::100
Device eth0 added to ipvlan
stgraber@castiana:~$ lxc start ipvlan
stgraber@castiana:~$ lxc exec ipvlan bash
root@ipvlan:~# ifconfig 
eth0: flags=4291<UP,BROADCAST,RUNNING,NOARP,MULTICAST>  mtu 1500
        inet 172.17.0.100  netmask 255.255.255.255  broadcast 255.255.255.255
        inet6 2001:470:b0f8:1000:1::100  prefixlen 128  scopeid 0x0<global>
        inet6 fe80::28:f800:12b:bdf8  prefixlen 64  scopeid 0x20<link>
        ether 00:28:f8:2b:bd:f8  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 5 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

root@ipvlan:~# ip -4 route show
default dev eth0 

root@ipvlan:~# ip -6 route show
2001:470:b0f8:1000:1::100 dev eth0 proto kernel metric 256 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
default dev eth0 metric 1024 pref medium

root@ipvlan:~# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=57 time=14.4 ms
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 14.476/14.476/14.476/0.000 ms

root@ipvlan:~# ping6 -n 2607:f8b0:400b:800::2004
PING 2607:f8b0:400b:800::2004(2607:f8b0:400b:800::2004) 56 data bytes
64 bytes from 2607:f8b0:400b:800::2004: icmp_seq=1 ttl=57 time=21.2 ms
--- 2607:f8b0:400b:800::2004 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 21.245/21.245/21.245/0.000 ms
root@ipvlan:~#

Quota support on `dir` storage backend

Support for the project quota feature of recent Linux kernels has been added.

When the backing filesystem for a dir type storage pool is suitably configured, container quotas can now be set as with other storage backends and disk usage is also properly reported.

stgraber@castiana:~$ sudo truncate -s 10G /tmp/ext4.img
stgraber@castiana:~$ sudo mkfs.ext4 /tmp/ext4.img 
mke2fs 1.44.6 (5-Mar-2019)
Discarding device blocks: done                            
Creating filesystem with 2621440 4k blocks and 655360 inodes
Filesystem UUID: d8ab56d9-1e84-40ee-921a-c68c06ad6625
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done     
stgraber@castiana:~$ sudo tune2fs -O project -Q prjquota /tmp/ext4.img 
tune2fs 1.44.6 (5-Mar-2019)

stgraber@castiana:~$ sudo mount -o prjquota /tmp/ext4.img /mnt/
stgraber@castiana:~$ sudo rmdir /mnt/lost+found/
stgraber@castiana:~$ lxc storage create mnt dir source=/mnt
Storage pool mnt created

stgraber@castiana:~$ lxc launch ubuntu:18.04 c1 -s mnt
Creating c1
Starting c1
stgraber@castiana:~$ lxc exec c1 -- df -h
Filesystem                                           Size  Used Avail Use% Mounted on
/var/lib/lxd/storage-pools/mnt/containers/c1/rootfs  9.8G  742M  8.6G   8% /
none                                                 492K     0  492K   0% /dev
udev                                                 7.7G     0  7.7G   0% /dev/tty
tmpfs                                                100K     0  100K   0% /dev/lxd
tmpfs                                                100K     0  100K   0% /dev/.lxd-mounts
tmpfs                                                7.8G     0  7.8G   0% /dev/shm
tmpfs                                                7.8G  152K  7.8G   1% /run
tmpfs                                                5.0M     0  5.0M   0% /run/lock
tmpfs                                                7.8G     0  7.8G   0% /sys/fs/cgroup

stgraber@castiana:~$ lxc config device set c1 root size 1GB
stgraber@castiana:~$ lxc exec c1 -- df -h
Filesystem                                           Size  Used Avail Use% Mounted on
/var/lib/lxd/storage-pools/mnt/containers/c1/rootfs  954M  706M  249M  74% /
none                                                 492K     0  492K   0% /dev
udev                                                 7.7G     0  7.7G   0% /dev/tty
tmpfs                                                100K     0  100K   0% /dev/lxd
tmpfs                                                100K     0  100K   0% /dev/.lxd-mounts
tmpfs                                                7.8G     0  7.8G   0% /dev/shm
tmpfs                                                7.8G  152K  7.8G   1% /run
tmpfs                                                5.0M     0  5.0M   0% /run/lock
tmpfs                                                7.8G     0  7.8G   0% /sys/fs/cgroup

stgraber@castiana:~$ lxc info c1
Name: c1
Location: none
Remote: unix://
Architecture: x86_64
Created: 2019/05/09 16:09 UTC
Status: Running
Type: persistent
Profiles: default
Pid: 10096
Ips:
  eth0:	inet	10.166.11.38	vethKM0DFY
  eth0:	inet6	2001:470:b368:4242:216:3eff:fe4b:2c3	vethKM0DFY
  eth0:	inet6	fe80::216:3eff:fe4b:2c3	vethKM0DFY
  lo:	inet	127.0.0.1
  lo:	inet6	::1
Resources:
  Processes: 24
  Disk usage:
    root: 739.77MB
  CPU usage:
    CPU usage (in seconds): 7
  Memory usage:
    Memory (current): 104.91MB
    Memory (peak): 229.67MB
  Network usage:
    lo:
      Bytes received: 1.23kB
      Bytes sent: 1.23kB
      Packets received: 12
      Packets sent: 12
    eth0:
      Bytes received: 480.35kB
      Bytes sent: 27.21kB
      Packets received: 332
      Packets sent: 277

Routes on container NIC devices

New ipv4.routes and ipv6.routes options on the nic devices make it possible to tie a particular route to a specific container, making it follow the container as it’s moved between hosts.

This will usually be a better option than using the similarly named key on the network itself.
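As a rough sketch of how these options might be used, the helper below adds a bridged NIC with one IPv4 and one IPv6 route tied to the container. The function name `add_routed_nic`, the `LXC` override variable, and the addresses in the example call are assumptions made for illustration; the device keys themselves are the ones named above.

```shell
# LXC names the client binary; setting LXC=echo turns every call into a dry run.
LXC="${LXC:-lxc}"

# Hypothetical helper: attach a bridged NIC whose routes follow the container.
# Usage: add_routed_nic CONTAINER PARENT_BRIDGE IPV4_ROUTE IPV6_ROUTE
add_routed_nic() {
    "$LXC" config device add "$1" eth0 nic \
        nictype=bridged \
        parent="$2" \
        ipv4.routes="$3" \
        ipv6.routes="$4"
}

# Example call (documentation addresses, adjust to your own network):
#   add_routed_nic c1 lxdbr0 192.0.2.20/32 2001:db8::20/128
```

Because the routes live on the container's own device rather than on the network, they should move with the container when it is migrated to another host.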

Configurable NAT source address

New ipv4.nat.address and ipv6.nat.address properties on LXD networks now make it possible to override the outgoing IP address for a particular bridge.
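A minimal sketch of setting these properties follows. The helper name `set_nat_source`, the `LXC` override variable, and the example addresses are hypothetical; only the two property names come from the text above.

```shell
# LXC names the client binary; setting LXC=echo turns every call into a dry run.
LXC="${LXC:-lxc}"

# Hypothetical helper: override the outgoing NAT address of a LXD bridge.
# Usage: set_nat_source NETWORK IPV4_ADDRESS [IPV6_ADDRESS]
set_nat_source() {
    "$LXC" network set "$1" ipv4.nat.address "$2"
    # The IPv6 override is optional in this sketch.
    if [ -n "${3:-}" ]; then
        "$LXC" network set "$1" ipv6.nat.address "$3"
    fi
}

# Example call (documentation addresses, adjust to your own network):
#   set_nat_source lxdbr0 203.0.113.5 2001:db8::5
```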

LXC features exported in API

Similar to what was done in the previous release with kernel features, specific LXC features which LXD can use when present are now exported by the LXD API so that clients can check what advanced feature to expect on the target.

  lxc_features:
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    seccomp_notify: "true"
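
A script can check for one of these features before relying on it. The sketch below parses the text output of lxc info rather than calling the API directly; the function name `has_lxc_feature` is hypothetical, and the parsing assumes the indented layout shown above.

```shell
# Hypothetical helper: succeed if the named LXC feature is reported as "true".
# Reads `lxc info` output on standard input.
# Usage: lxc info | has_lxc_feature network_ipvlan
has_lxc_feature() {
    awk -v name="$1" '
        { match($0, /^ */); indent = RLENGTH }
        # Remember where the lxc_features block starts and how deep it is.
        $1 == "lxc_features:" { in_block = 1; block_indent = indent; next }
        # A line at the same or a shallower indent ends the block.
        in_block && indent <= block_indent { in_block = 0 }
        in_block && $1 == (name ":") { found = ($2 == "\"true\""); exit }
        END { exit(found ? 0 : 1) }
    '
}

# Example:
#   if lxc info | has_lxc_feature network_ipvlan; then
#       echo "IPVLAN is available on this server"
#   fi
```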

Bugs fixed

client: Consider volumeOnly option when migrating
client: Copy volume config and description
client: Don’t crash on missing stdin
client: Fix copy from snapshot
client: Fix copying between two unix sockets
doc: Adds missing packages to install guide
doc: Correct host_name property
doc: Update storage documentation
i18n: Update translations from weblate
lxc/copy: Don’t strip volatile keys on refresh
lxc/utils: Updates progress to stop outputting if msg is longer than window
lxd/api: Rename alias* commands to imageAlias*
lxd/api: Rename apiProject* to project*
lxd/api: Rename certificateFingerprint* to certficate*
lxd/api: Rename operation functions for consistency
lxd/api: Rename serverResources to api10Resources
lxd/api: Rename snapshotHandler to containerSnapshotHandler
lxd/api: Replace Command with APIEndpoint
lxd/api: Sort API commands list
lxd/candid: Cleanup config handling
lxd/certificates: Make certificate add more robust
lxd/certificates: Port to APIEndpoint
lxd/cluster: Avoid panic in Gateway
lxd/cluster: Fix race condition during join
lxd/cluster: Port to APIEndpoint
lxd/cluster: Use current time for hearbeat
lxd/cluster: Workaround new raft logging
lxd/containers: Avoid costly storage calls during snapshot
lxd/containers: Change disable_ipv6=1 to accept_ra=0 on host side interface
lxd/containers: Don’t fail on old libseccomp
lxd/containers: Don’t needlessly mount snapshots
lxd/containers: Early check for running container refresh
lxd/containers: Fix bad operation type
lxd/containers: Fix profile snapshot settings
lxd/containers: Moves network limits to network up hook
lxd/containers: Only run network up hook for nics that need it
lxd/containers: Optimize snapshot retrieval
lxd/containers: Port to APIEndpoint
lxd/containers: Remove unused arg from network limits function
lxd/containers: Speed up simple snapshot list
lxd/daemon: Port to APIEndpoint
lxd: Don’t allow remote access to internal API
lxd: Fix volume migration with snapshots
lxd: Have Authenticate return the protocol
lxd: More reliably grab interface host name
lxd: Port from HasApiExtension to LXCFeatures
lxd: Rename parseAddr to proxyParseAddr
lxd: Use idmap.Equals
lxd/db: Fix substr handling for containers
lxd/db: Parent filter for ContainerList
lxd/db/profiles: Fix cross-project updates
lxd/db: Properly handle unsetting keys
lxd/event: Port to APIEndpoint
lxd/images: Fix project handling on copy
lxd/images: Fix simplestreams cache expiry
lxd/images: Port to APIEndpoint
lxd/images: Properly handle invalid protocols
lxd/images: Replicate images to the right project
lxd/internal: Port to APIEndpoint
lxd/migration: Fix feature negotiation
lxd/network: Filter leases by project
lxd/network: Fix DNS records for projects
lxd/network: Port to APIEndpoint
lxd/operation: Port to APIEndpoint
lxd/patches: Fix LVM VG name
lxd/profiles: Optimize container updates
lxd/profiles: Port to APIEndpoint
lxd/projects: Port to APIEndpoint
lxd/proxy: Correctly handle unix: path rewriting with empty bind=
lxd/proxy: Don’t wrap string literal
lxd/proxy: Fix goroutine leak
lxd/proxy: Handle mnts for abstract unix sockets
lxd/proxy: Make helpers static
lxd/proxy: Make logfile close on exec
lxd/proxy: Only attach to mntns for unix sockets
lxd/proxy: Retry epoll on EINTR
lxd/proxy: Use standard macros on exit
lxd/proxy: Validate the addresses
lxd/resource: Port to APIEndpoint
lxd/storage: Don’t hardcode default project
lxd/storage: Fix error message on differing maps
lxd/storage: Handle XFS with leftover journal entries
lxd/storage: Port to APIEndpoint
lxd/storage/btrfs: Don’t make ro snapshots when unpriv
lxd/storage/ceph: Don’t mix stderr with json
lxd/storage/ceph: Fix snapshot of running containers
lxd/storage/ceph: Fix snapshot of running xfs/btrfs
lxd/storage/ceph: Fix UUID re-generation
lxd/storage/ceph: Only rewrite UUID once
lxd/sys: Cleanup State struct
scripts/bash: Add bash completion for profile/container device get, set, unset
shared: Add StringMapHasStringKey helper function
shared: Fix $SNAP handling under new snappy
shared: Fix Windows build
shared/idmap: Add comparison function
shared/netutils: Adapt to kernel changes
shared/netutils: Add AbstractUnixReceiveFdData()
shared/netutils: Export peer link id in getifaddrs
shared/netutils: Handle SCM_CREDENTIALS when receiving fds
shared/netutils: Move network cgo to shared/netutils
shared/netutils: Move send/recv fd functions
shared/network: Fix reporting of down interfaces
shared/network: Get HostName field when possible
shared/osarch: Add i586 to arch aliases
tests: Extend migration tests
tests: Handle built-in shiftfs
tests: Updates config tests to use host_name for nic tests

Try it for yourself

This new LXD release is already available for you to try on our demo service.

Downloads

The release tarballs can be found on our download page.

simos · May 9, 2019, 7:31pm

Great news!

I had a look at the network_ipvlan LXC feature. In LXD 3.13 (channel: candidate),

$ lxc info 
...
  kernel_version: 4.15.0-48-generic
  lxc_features:
    mount_injection_file: "true"
    network_gateway_device_route: "false"
    network_ipvlan: "false"
    network_l2proxy: "false"
    seccomp_notify: "false"
...
$

It is a feature that is not enabled in my case. Is it user-configurable?

https://github.com/lxc/lxd/blob/master/lxd/daemon.go#L580

It appears it is not user-configurable. As mentioned in the announcement, it relates to the version of liblxc that is bundled in the snap package.

The master branch of liblxc knows about network_ipvlan:

github.com

lxc/lxc/blob/main/src/lxc/api_extensions.h#L48


      
          	"network_veth_router",
          	"cgroup2_devices",
          	"cgroup2",
          	"pidfd",
          	"cgroup_advanced_isolation",
          	"network_bridge_vlan",
          	"time_namespace",
          	"seccomp_allow_deny_syntax",
          	"devpts_fd",
          #ifdef HAVE_DECL_SECCOMP_NOTIFY_FD
          	"seccomp_notify_fd_active",
          	"seccomp_proxy_send_notify_fd",
          #endif /* HAVE_DECL_SECCOMP_NOTIFY_FD */
          	"idmapped_mounts",
          	"idmapped_mounts_v2",
          	"core_scheduling",
          	"cgroup2_auto_mounting",
          };
          
          static size_t nr_api_extensions = sizeof(api_extensions) / sizeof(*api_extensions);

But what version of liblxc does the 3.13 LXD snap package have?

$ snap run --shell lxd
bash-4.3$ ls -l /snap/lxd/current/lib/liblxc*
lrwxrwxrwx 1 0 0      15 May  9 17:12 /snap/lxd/current/lib/liblxc.so.1 -> liblxc.so.1.5.0
-rwxr-xr-x 1 0 0 1068656 May  9 17:29 /snap/lxd/current/lib/liblxc.so.1.5.0
-rwxr-xr-x 1 0 0   80688 May  9 17:29 /snap/lxd/current/lib/liblxcfs.so
bash-4.3$

It mentions it is a liblxc 1.5.0 version. There is no string network_ipvlan in liblxc.so.1.5.0.
How does this library relate to the lxc/lxc repository on GitHub (LXC - Linux Containers)? I could not find a relevant 1.5.0 tag in the source code repository.

I have used the snap package from the candidate channel, which bundles the following LXC version, tag: lxc-3.1.0. It is not recent enough to have the network_ipvlan goodness.

github.com

canonical/lxd-pkg-snap/blob/latest-candidate/snapcraft.yaml#L401


      
                    -DNETWORK_HTTP_BOOT_ENABLE=TRUE \
                    -DTPM2_ENABLE=TRUE \
                    -DTPM2_CONFIG_ENABLE=TRUE \
                    $@
          EOF
              ) | bash -e
          
              cp Build/*/${TARGET_BUILD_TYPE}*/FV/${FV_CODE}.fd "${TARGET_CODE}"
              cp Build/*/${TARGET_BUILD_TYPE}*/FV/${FV_VARS}.fd "${TARGET_VARS}"
          
              if [ "$(uname -m)" = "aarch64" ]; then
                  truncate -s 64m "${TARGET_CODE}"
                  truncate -s 64m "${TARGET_VARS}"
              fi
          }
          
          # Create the firmware path
          mkdir -p "${CRAFT_PART_INSTALL}/share/qemu/"
          
          # Primary firmware (4MB, no CSM)
          build_edk2 \

But what snap package channel is recent enough to have the git version of liblxc? Is it edge?

github.com

canonical/lxd-pkg-snap/blob/latest-edge/snapcraft.yaml#L399


      
                    -DSMM_REQUIRE=FALSE \
                    -DSECURE_BOOT_ENABLE=TRUE \
                    -DNETWORK_IP4_ENABLE=TRUE \
                    -DNETWORK_IP6_ENABLE=TRUE \
                    -DNETWORK_TLS_ENABLE=TRUE \
                    -DNETWORK_HTTP_BOOT_ENABLE=TRUE \
                    -DTPM2_ENABLE=TRUE \
                    -DTPM2_CONFIG_ENABLE=TRUE \
                    $@
          EOF
              ) | bash -e
          
              cp Build/*/${TARGET_BUILD_TYPE}*/FV/${FV_CODE}.fd "${TARGET_CODE}"
              cp Build/*/${TARGET_BUILD_TYPE}*/FV/${FV_VARS}.fd "${TARGET_VARS}"
          
              if [ "$(uname -m)" = "aarch64" ]; then
                  truncate -s 64m "${TARGET_CODE}"
                  truncate -s 64m "${TARGET_VARS}"
              fi
          }

It is edge. Does edge have a recent enough LXD that includes IPVLAN support?

$ snap info lxd
...
channels:
  stable:        3.12        2019-04-16 (10601) 56MB -
  candidate:     3.13        2019-05-09 (10732) 56MB -
  beta:          ↑                                   
  edge:          git-566ee20 2019-05-09 (10738) 56MB -
...

There is a commit, 566ee20. Is that recent enough to have IPVLAN support in LXD? Here are the commits (Commits · lxc/incus · GitHub); Discourse does not show a good snapshot of the page.

So, we could switch to the edge snap channel of LXD and experience the full goodness of the new features. But that would be utterly inappropriate for the stability of the system. Because, would things break in LXD if you switch forward and back between the stable and edge channels? The LXD version is almost the same, but liblxc differs quite a bit; almost six months of changes to the code.

So, what do we do? You know what we do, but let’s capture first the error message when you try IPVLAN on a liblxc that is not recent enough. They say it is good for SEO.

$ lxc launch ubuntu:18.04 mycontainer --profile default --profile ipvlan
Creating mycontainer
Error: Failed container creation: Create container: Create LXC container: Initialize LXC: LXC is missing one or more API extensions: network_ipvlan, network_l2proxy, network_gateway_device_route

Let’s switch the snap package of LXD to the edge channel.

$ snap switch --channel edge lxd
"lxd" switched to the "edge" channel
$ snap refresh
lxd (edge) git-566ee20 from Canonical✓ refreshed

Will it work now?

$ lxc launch ubuntu:18.04 mycontainer --profile default --profile ipvlan
Creating mycontainer
Starting mycontainer
$ lxc list mycontainer
+-------------+---------+------+------+------------+-----------+
|    NAME     |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
+-------------+---------+------+------+------------+-----------+
| mycontainer | RUNNING |      |      | PERSISTENT | 0         |
+-------------+---------+------+------+------------+-----------+

No IP address from the LAN. What went wrong? Isn’t IPVLAN supposed to let the container get the IP address automatically from the LAN? Probably not, considering that it is Layer 3 (not Layer 2 that macvlan is). Scratch that then, we start over again.

To cut this short, you need to tell LXD (ipv4.address=...) the IP address for the container. Then, LXD will be able to set up what is needed. And you need to instruct the container of the DNS server settings because without DNS, cloud-init takes time to complete the bootup sequence (and create the ubuntu account).

In a nutshell,

You need to get LXD to set up the IP address for the container, because that’s the way IPVLAN works.
You do not get a DNS server autoconfigured, so you need to configure it in some way, such as with cloud-init from a LXD profile.
You do not need to (cannot?) add a default route. LXD/ipvlan does that for you. See below for what the default route looks like.

ubuntu@mycontainer:~$ route 
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         0.0.0.0         0.0.0.0         U     0      0        0 eth0
ubuntu@mycontainer:~$ ping -c 1 www.google.com
PING www.google.com (216.58.198.4) 56(84) bytes of data.
64 bytes from mil04s03-in-f4.1e100.net (216.58.198.4): icmp_seq=1 ttl=54 time=76.2 ms

--- www.google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 76.275/76.275/76.275/0.000 ms
ubuntu@mycontainer:~$

stgraber · May 9, 2019, 8:05pm

@simos correct, the LXC version in the stable snap is 3.1 (as lxc info should show as driver_version). You need current master to get IPVLAN, which the LXD edge snap has.

And indeed, IPVLAN doesn’t have a default gateway in the normal sense of the term, so that’s configured for you. DNS is up to you to configure, though. You should be able to use network-config (netplan) for that, possibly by adding your DNS config to the loopback device so that netplan doesn’t attempt to mess with the ipvlan device?

simos · May 10, 2019, 7:47pm

I tried with network-config on eth0 to set the nameserver and it worked well.

As is, I use a LXD profile per ipvlan container, because each profile needs to specify a unique LAN IP address.

I noticed that the ipvlan container cannot communicate with the host, which follows the case with macvlan.

tomp · May 11, 2019, 8:05am

Good to hear simos.

You can also use a shared profile and add individual ipvlan NICs to a container.

Yes, ipvlan, like macvlan, stops containers and the host from communicating. I believe you can add an ipvlan interface to the host to overcome this, though.

saulcosta · June 9, 2019, 11:10am

Have the changes to the network limits affected how limits are applied to individual containers? It appears our limits have suddenly stopped working, so we’re trying to determine the root cause. Currently we apply them by overwriting the devices config for the container.

tomp · June 10, 2019, 7:52am

Can you give an example of your config, and a bit more info on what you think isn’t working or has changed?

The network limits shouldn’t have changed, there were some tests added as part of 3.13 to ensure behaviour remained consistent:

github.com

lxc/lxd/blob/lxd-3.13/test/suites/container_devices.sh

test_container_devices_nic() {
  ensure_import_testimage
  ensure_has_localhost_remote "${LXD_ADDR}"

  veth_host_name="veth$$"
  ct_name="nictest$$"
  ipRand=$(shuf -i 0-9 -n 1)
  brName="lxdt$$"

  # Standard bridge with random subnet and a bunch of options
  lxc network create ${brName}
  lxc network set lxdt$$ dns.mode dynamic
  lxc network set lxdt$$ dns.domain blah
  lxc network set lxdt$$ ipv4.routing false
  lxc network set lxdt$$ ipv6.routing false
  lxc network set lxdt$$ ipv6.dhcp.stateful true
  lxc network set lxdt$$ bridge.hwaddr 00:11:22:33:44:55
  [ "$(cat /sys/class/net/lxdt$$/address)" = "00:11:22:33:44:55" ]

  # Test pre-launch profile config is applied at launch.


saulcosta · June 10, 2019, 11:10pm

Sure thing!

We currently update the devices value of the LXD container. The Ruby code looks something like this (we set the network and disk limits here):

# config = get the LXD container's config using the LXD API
config.devices = {
  eth0: {
    nictype: :bridged,
    parent: :lxdbr0,
    type: :nic,
    'limits.ingress' => '25Mbit',
    'limits.egress' => '5Mbit'
  },
  root: {
    path: '/',
    pool: 'default',
    size: '2048MB',
    type: :disk,
    'limits.read' => '10MB',
    'limits.write' => '10MB'
  }
}
# save the LXD container's config using the LXD API

This allows us to set dynamic network and disk limits based on the instance size we’ve selected. Previously both were working, but now only the disk one is.

Now, we’re only able to set it on the profile level, and only if the profile has that device, which is just our default profile. We do it by running this on the host machine:

lxc profile device set default eth0 limits.ingress 50000000

Note that the 50000000 could also be 50Mbit, but for some reason, this also didn’t appear to be working. We haven’t gone back and tried to reproduce that yet, however.

The instance launches with the eth0 interface already attached thanks to the default profile, so ideally we can continue to alter the limits for that device once it’s already been attached.

tomp · June 11, 2019, 8:23am

You’re correct that you can only set limits on the device at the profile level when the device is being added to the container as part of the profile. If you add a standalone device to a container then you can specify limits on a per-container basis.

I’m not understanding what the issue is that you’re having though, do you get an error compared to pre-3.13? Or is it that the limits do not apply?

Thanks
Tom

saulcosta · June 11, 2019, 4:15pm

Ah, thanks for clarifying.

It’s that the limits do not apply. I know for sure they used to because we tested it thoroughly when we added them originally, but realized just this week they were no longer being used. I saw some changes to network limits mentioned in the 3.13 changelog which is why I brought it up here.

It sounds like our best bet would be to either keep applying more general limits on the default profile or remove the eth0 device from the default profile and add it on a per container basis.

I wonder if perhaps previously the default profile didn’t have the eth0 device added to it. If that’s the case, the code above would have added it on a per container basis, where it sounds like it would have included the limit.

Thanks for your help! I’ll do a bit more experimentation now that I know limits can only be applied when the device is originally attached.

stgraber · June 11, 2019, 4:31pm

From what I understand, @saulcosta has the eth0 device in his default profile with some initial limits; those limits then get overridden on a per-container basis by adding an eth0 device local to the container.

In this case I would certainly expect the limit defined on the eth0 device on the container to be the effective traffic limit.

There shouldn’t be any need for @saulcosta to alter his default profile here. Adding an eth0 device directly to the container should take precedence and immediately change the limit on the running container.

Am I missing something?
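
For reference, the container-local override described above could be sketched as follows. The helper name `override_nic_limits` and the `LXC` override variable are hypothetical; the device keys and example values mirror the ones quoted earlier in this thread.

```shell
# LXC names the client binary; setting LXC=echo turns every call into a dry run.
LXC="${LXC:-lxc}"

# Hypothetical helper: add a container-local eth0 that carries its own limits,
# taking precedence over the eth0 inherited from the profile.
# Usage: override_nic_limits CONTAINER PARENT_BRIDGE INGRESS EGRESS
override_nic_limits() {
    "$LXC" config device add "$1" eth0 nic \
        nictype=bridged \
        parent="$2" \
        limits.ingress="$3" \
        limits.egress="$4"
}

# Example call:
#   override_nic_limits c1 lxdbr0 25Mbit 5Mbit
```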

tomp · June 11, 2019, 4:41pm

Yes that should be fine, there is a test for that scenario here:

github.com

lxc/lxd/blob/lxd-3.13/test/suites/container_devices.sh#L45-L70


# Test hot plugging a container nic with different settings to profile with the same name.
lxc config device add "${ct_name}" eth0 nic \
  nictype=bridged \
  parent=${brName} \
  ipv4.routes="192.0.2.2${ipRand}/32" \
  ipv6.routes="2001:db8::2${ipRand}/128" \
  limits.ingress=3Mbit \
  limits.egress=4Mbit \
  host_name="${veth_host_name}"


if ! ip -4 r list dev "${veth_host_name}" | grep "192.0.2.2${ipRand}" ; then
  echo "ipv4.routes invalid"
  false
fi
if ! ip -6 r list dev "${veth_host_name}" | grep "2001:db8::2${ipRand}" ; then
  echo "ipv4.routes invalid"
  false
fi
if ! tc class show dev "${veth_host_name}" | grep "3Mbit" ; then
  echo "limits.ingress invalid"


tomp · June 11, 2019, 4:50pm

@stgraber just checked doing a non-hotplug variant too and it appears to work fine. tc shows the limits applied.

saulcosta · June 11, 2019, 5:33pm

@stgraber that was my expectation as well and how it previously seemed to be performing. Here’re some more details on the configuration we currently have.

The default profile (applicable part):

devices:                                                   
  eth0:                                                    
    limits.egress: 20Mbit                                  
    limits.ingress: 50Mbit                                 
    nictype: bridged                                       
    parent: lxdbr0                                         
    type: nic                                              
  root:                                                    
    path: /                                                
    pool: default                                          
    type: disk

And here are the starting devices for the container, which does have the default profile:

root:
  limits.read: 10MB
  limits.write: 10MB
  path: /
  pool: default
  size: 354MB
  type: disk

When running a speedtest in that container, I get:

workspace $ speedtest
Retrieving speedtest.net configuration...
Testing from Google Cloud (34.66.108.68)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Kansas Research and Education Network (Wichita, KS) [43.14 km]: 34.618 ms
Testing download speed................................................................................
Download: 45.26 Mbit/s
Testing upload speed................................................................................................
Upload: 19.99 Mbit/s

This makes sense, given the limits on the default profile.

I then can apply the network limits using the approach in the Ruby code above, which does apply this config to the container:

eth0:
  limits.egress: 15Mbit
  limits.ingress: 25Mbit
  nictype: bridged
  parent: lxdbr0
  type: nic
root:
  limits.read: 10MB
  limits.write: 10MB
  path: /
  pool: default
  size: 361MB
  type: disk

However, the speedtest still uses the limits from the default profile:

workspace $ speedtest
Retrieving speedtest.net configuration...
Testing from Google Cloud (34.66.108.68)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Cox - Wichita (Wichita, KS) [43.14 km]: 26.761 ms
Testing download speed................................................................................
Download: 43.89 Mbit/s
Testing upload speed................................................................................................
Upload: 20.06 Mbit/s

speedtest is a utility that can be installed with sudo pip install speedtest-cli.

tomp · June 11, 2019, 5:44pm

Can you apply a container level limit and then check the settings are applied:

 sudo tc class show dev $(lxc config get test1 volatile.eth0.host_name)

saulcosta · June 11, 2019, 6:27pm

I couldn’t get that command to work with my container name, but here’s the only output from the config for that container that includes volatile.eth0:

volatile.eth0.hwaddr: 00:16:3e:0c:92:f9
volatile.eth0.name: eth0

tomp · June 11, 2019, 6:58pm

Can you try running ip link on the host and look for the parent interfaces starting “veth”.

If you run sudo tc class show dev X where X is each parent veth device name before the “@” sign, then you should be able to see which interfaces have what limits applied.
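
The per-interface check above can be scripted. This is a small sketch, assuming the usual ip link output where each interface line looks like "12: vethXXXX@if11: <...>"; the function name `veth_host_names` is hypothetical.

```shell
# Hypothetical helper: print the host-side veth names (the part before "@").
# Reads `ip link` output on standard input.
veth_host_names() {
    awk -F': ' '$2 ~ /^veth/ { sub(/@.*/, "", $2); print $2 }'
}

# Example: show the traffic classes of every veth device on the host.
#   for dev in $(ip link | veth_host_names); do
#       echo "== $dev"
#       sudo tc class show dev "$dev"
#   done
```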

We need to ascertain whether the issue is with limits settings not being applied to the OS or whether the OS isn’t restricting them.

tomp · June 11, 2019, 7:04pm

For testing purposes, if you set a host_name property on one of your containers then you’ll be more able to identify its peer on the host.

lxc config device set ct1 eth0 host_name myct1

sudo tc class show dev myct1

saulcosta · June 11, 2019, 7:24pm

Looks like the limit is indeed being applied. sudo tc class show dev myct1 returns:

class htb 1:10 root prio rate 25Mbit ceil 25Mbit burst 1600b cburst 1600b
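
To compare the applied rate against the configured limit in a script, the rate field can be pulled out of that output. This is a sketch that assumes the htb class line format shown above; the function name `tc_class_rate` is hypothetical.

```shell
# Hypothetical helper: print the value that follows each "rate" keyword.
# Reads `tc class show dev X` output on standard input.
tc_class_rate() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "rate") print $(i + 1) }'
}

# Example:
#   sudo tc class show dev myct1 | tc_class_rate
```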
