Cannot launch containers with LXD 3.15 and LXC 3.2.1

I have been using LXD for a couple of weeks for testing purposes (having upgraded from plain LXC), but today I can’t start my containers anymore.

$ lxc start testing
Error: Failed to run: /usr/bin/lxd forkstart testing /var/lib/lxd/containers /var/log/lxd/testing/lxc.conf: 
Try `lxc info --show-log testing` for more info

The log below complains about no space being left on the device, yet I currently have 30 GB free:

$ lxc info --show-log local:testing
Name: testing
Location: none
Remote: unix://
Architecture: x86_64
Created: 2019/07/26 19:53 UTC
Status: Stopped
Type: persistent
Profiles: default

Log:

lxc testing 20190726195337.101 ERROR    cgfsng - cgroups/cgfsng.c:__do_cgroup_enter:1500 - No space left on device - Failed to enter cgroup "/sys/fs/cgroup/cpuset//lxc.monitor/testing/cgroup.procs"
lxc testing 20190726195337.101 ERROR    start - start.c:__lxc_start:2009 - Failed to enter monitor cgroup
lxc testing 20190726195337.101 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:872 - Received container state "STOPPING" instead of "RUNNING"
lxc testing 20190726195337.133 WARN     cgfsng - cgroups/cgfsng.c:cgfsng_monitor_destroy:1180 - No space left on device - Failed to move monitor 3411 to "/sys/fs/cgroup/cpuset//lxc.pivot/cgroup.procs"

lxc 20190726195337.133 WARN     commands - commands.c:lxc_cmd_rsp_recv:134 - Connection reset by peer - Failed to receive response for command "get_state"
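
A note on the error itself: on cgroup v1, the kernel's cpuset controller reports "No space left on device" (ENOSPC) when a task is moved into a cpuset whose cpuset.cpus or cpuset.mems is empty, so the message may have nothing to do with disk space. Below is a minimal sketch for checking that, assuming the cpuset hierarchy is mounted at /sys/fs/cgroup/cpuset as in the log above. The function name report_empty_cpusets is made up for illustration and is not part of LXC or LXD.

```shell
#!/bin/sh
# List every cpuset.cpus / cpuset.mems file under a cpuset root that is empty.
# report_empty_cpusets is a hypothetical helper, not an LXC or LXD command.
report_empty_cpusets() {
    root="$1"
    find "$root" -type f \( -name cpuset.cpus -o -name cpuset.mems \) | sort |
    while IFS= read -r file; do
        # Strip whitespace so a file holding only a newline counts as empty.
        if [ -z "$(tr -d '[:space:]' < "$file")" ]; then
            echo "empty: $file"
        fi
    done
}
```

Calling `report_empty_cpusets /sys/fs/cgroup/cpuset` right after a failed start should show whether the lxc.monitor or lxc.pivot cpusets named in the log are among the empty ones.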

Testing config:

$ lxc config show testing --expanded
architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 16.04 LTS amd64 (release) (20190628)
  image.label: release
  image.os: ubuntu
  image.release: xenial
  image.serial: "20190628"
  image.version: "16.04"
  volatile.base_image: 8b430b6d827140412a85a1f76f0fc76ebc42c3e1ca8d628cb90b12e9cef175c9
  volatile.eth0.hwaddr: 00:16:3e:b0:64:b4
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.power: STOPPED
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

I looked up different posts that I thought described similar problems to my own and tried:

  • changing directory permissions to certain folders in /var/lib/lxd
  • setting and unsetting the security.privileged flag on specific containers

None of it worked… I tried the nuclear option and completely removed

  • /var/lib/lxd
  • /var/cache/lxd
  • /var/log/lxd

Then I restarted the machine and re-initialized my environment.

$ sudo lxd init
[sudo] password for ffernand: 
Would you like to use LXD clustering? (yes/no) [default=no]: 
Do you want to configure a new storage pool? (yes/no) [default=yes]: 
Name of the new storage pool [default=default]: 
Name of the storage backend to use (dir, lvm) [default=dir]: 
Would you like to connect to a MAAS server? (yes/no) [default=no]: 
Would you like to create a new local network bridge? (yes/no) [default=yes]: 
What should the new bridge be called? [default=lxdbr0]: 
What IPv4 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 10.0.99.1/24
Would you like LXD to NAT IPv4 traffic on your bridge? [default=yes]: 
What IPv6 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
Would you like LXD to be available over the network? (yes/no) [default=no]: 
Would you like stale cached images to be updated automatically? (yes/no) [default=yes] 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: 

But I still get the same error… I can’t seem to recover.

Using LXD 3.15 on ArchLinux:

$ uname -a
Linux theoden 5.2.2-arch1-1-ARCH #1 SMP PREEMPT Sun Jul 21 19:18:34 UTC 2019 x86_64 GNU/Linux

After a number of failed attempts, I started scouring the pacman logs and found something I had failed to notice: LXC (not LXD) was upgraded to 3.2.1 by the package manager two days ago.

Downgrading to LXC 3.1.0 did the trick: after a reboot, I can now start my LXD containers without issue.

I can also reliably reproduce the error by upgrading to LXC 3.2.1 and rebooting. Restarting the LXD service is not enough to reproduce it; a reboot is necessary.

LXD 3.15 doesn’t seem to work with LXC 3.2.1, at least on ArchLinux and a 5.2.2 kernel.

If anyone wants more log/diagnostic data to help diagnose why that’s the case, just let me know.

Because LXD is provided by ArchLinux AUR packaging, I thought maybe it was because LXD 3.15 was originally built against LXC 3.1.0 and that I just needed to rebuild LXD against LXC 3.2.1, but I verified that does not work either.

However, I did find someone with the same problem and their solution was to do the following (which I can verify works for me as well).

$ lxc start testing
<error text>

$ echo 0 | sudo tee /sys/fs/cgroup/cpuset//lxc.monitor/cpuset.cpus
$ echo 0 | sudo tee /sys/fs/cgroup/cpuset//lxc.pivot/cpuset.cpus
$ lxc start testing
<error text>

$ echo 0 | sudo tee /sys/fs/cgroup/cpuset/lxc.payload/cpuset.cpus
$ lxc start testing
<success>

The intermediate start attempts above are necessary. To reiterate:

  • attempt to start the container (fails)
  • set cpuset.cpus for lxc.monitor and lxc.pivot
  • attempt to start the container again (fails)
  • set cpuset.cpus for lxc.payload
  • start the container (succeeds)
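
The steps above could be wrapped in a small helper, sketched here under a few assumptions: it runs as root, the cpuset hierarchy lives at /sys/fs/cgroup/cpuset, and CPU 0 is an acceptable value. The names CPUSET_ROOT and seed_cpuset are invented for this example. Cgroups that do not exist yet are skipped, presumably the reason a failed start attempt has to come first.

```shell
#!/bin/sh
# Write a CPU list into cpuset.cpus for each named cgroup under the cpuset root.
# CPUSET_ROOT and seed_cpuset are illustrative names, not part of LXC or LXD.
CPUSET_ROOT="${CPUSET_ROOT:-/sys/fs/cgroup/cpuset}"

seed_cpuset() {
    cpus="$1"
    shift
    for name in "$@"; do
        file="$CPUSET_ROOT/$name/cpuset.cpus"
        if [ -e "$file" ]; then
            # Same effect as: echo 0 | sudo tee .../cpuset.cpus
            printf '%s\n' "$cpus" > "$file" || return 1
        else
            # The cgroup only appears once a start has been attempted.
            echo "skipping $name: $file does not exist yet" >&2
        fi
    done
}
```

The sequence would then be: attempt `lxc start testing`, run `seed_cpuset 0 lxc.monitor lxc.pivot`, attempt the start again, run `seed_cpuset 0 lxc.payload`, and start the container a third time.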

For now, downgrading LXC to 3.1.0 is good enough for me.
Again, if anyone wants more log/diagnostic data to help diagnose, just let me know.

@brauner can you look into this?

Seems like a pretty clear regression between 3.1 and 3.2, though not something that’s been hitting the snap as far as I can tell (though we do have some cpuset config in there to avoid past issues).


should fix the issue. :slight_smile:

I’ve removed the Arch LXC distro package and built three versions of LXC from source:

  • 3.1.0
  • 3.2.1
  • brauner’s branch containing the fix

I can confirm that starting an LXD container worked against LXC 3.1.0, failed against LXC 3.2.1, and subsequently succeeded against brauner’s branch.

Much appreciated @brauner & @stgraber!