Cannot start a copied container: Failed to change ACLs

I have a container which has been copied from another host (using lxc cp).
When I try to start it I get an error:

$ lxc start foo
Error: Common start logic: Failed to change ACLs on /var/snap/lxd/common/lxd/storage-pools/local/containers/foo/rootfs/var/log/journal
Try `lxc info --show-log foo` for more info

There appear to be no log messages in the output of lxc info --show-log foo:

Name: foo
Location: myhost
Remote: unix://
Architecture: x86_64
Created: 2020/05/29 08:07 UTC
Status: Stopped
Type: container
Profiles: default

Log:

Here is the container config:

$ lxc config show foo --expanded
architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 18.04 LTS amd64 (daily) (20190529)
  image.label: daily
  image.os: ubuntu
  image.release: bionic
  image.serial: "20190529"
  image.version: "18.04"
  volatile.apply_template: copy
  volatile.base_image: 3c09483ccd69f33a4819532c103f482f219ae4591cc0d860dfb94193e97a2627
  volatile.eth0.hwaddr: 00:16:3e:f1:71:6a
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":5200},{"Isuid":true,"Isgid":false,"Hostid":5200,"Nsid":5200,"Maprange":200},{"Isuid":true,"Isgid":false,"Hostid":1005400,"Nsid":5400,"Maprange":999994600},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1045},{"Isuid":false,"Isgid":true,"Hostid":1045,"Nsid":1045,"Maprange":11},{"Isuid":false,"Isgid":true,"Hostid":1001056,"Nsid":1056,"Maprange":999998944}]'
devices:
  eth0:
    name: eth0
    network: lxdfan0
    type: nic
  root:
    path: /
    pool: local
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

It looks like the container files are owned by the host root, not by the container root:

$ sudo ls -l  /mnt/lxd/containers/foo/rootfs/var/log/journal
total 0
drwxr-sr-x+ 1 root systemd-journal 10204 Mar 21 15:17 a3f8c2c4a763425b8733002e27ec8b79
$ sudo getfacl  /mnt/lxd/containers/foo/rootfs/var/log/journal
getfacl: Removing leading '/' from absolute path names
# file: mnt/lxd/containers/foo/rootfs/var/log/journal
# owner: root
# group: systemd-journal
# flags: -s-
user::rwx
group::r-x
group:adm:r-x
mask::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:adm:r-x
default:mask::r-x
default:other::r-x

I mounted the local storage pool under /mnt/lxd/; it does not appear to be mounted under /var/snap/lxd/common/lxd/storage-pools/local/. When configuring with lxd init, I chose clustering and used a block device for the local storage pool. Strangely, the source of the local pool is not shown:

$ lxc storage show local 
config:
  btrfs.mount_options: noatime,nodiratime,compress=lzo,user_subvol_rm_allowed
description: ""
name: local
driver: btrfs
used_by:
- /1.0/containers/foo
...
- /1.0/images/40775fd923e2a77f56ce3c028ce22ad43b9254bb12766b12eeeefb32a3a145da
- /1.0/profiles/default
status: Created
locations:
- myhost
- myotherhost

I have the latest version of lxd/lxc from snap installed on Ubuntu 20.04:

$ lxd --version
4.1
$ lxc --version
4.1
$ uname -a
Linux myhost 5.4.0-33-generic #37-Ubuntu SMP Thu May 21 12:53:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

I would be grateful for any hints for diagnosing this issue.

It’s normal that you don’t see things mounted under /var/snap/lxd/common.
Snaps operate in a separate mount namespace so they don’t disturb the host.

You can inspect things through /var/snap/lxd/common/ns/var/snap/lxd/common/lxd/storage-pools/... if you need to.

When clustered, lxc storage show does not show host-specific options like source. For that, you need to specify --target with the name of one of the cluster members.
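
As a rough sketch of the --target usage described above, the helper below asks each cluster member for its own view of the pool and prints the source line. The function name pool_source_per_member is made up for illustration; the pool name local and the member names myhost and myotherhost come from the output earlier in this thread.

```shell
# Print "<member><TAB><source>" for each cluster member.
# pool_source_per_member is a hypothetical helper name, not an LXD command.
pool_source_per_member() {
    pool=$1
    shift
    for member in "$@"; do
        # --target selects which cluster member's host-specific config to show.
        src=$(lxc storage show "$pool" --target "$member" |
            awk '/^[ \t]*source:/ { print $2 }')
        printf '%s\t%s\n' "$member" "${src:-unknown}"
    done
}
```

For example: pool_source_per_member local myhost myotherhost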

As for the actual shifting issue, that’s most often caused by the number of ACL entries having increased to a point where we can’t add more. There was a bug causing that in much older LXD releases, so that may be the source of this issue.
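
To check whether some files really carry an unusually large number of ACL entries, something like the following may help. It is only a sketch: acl_entry_counts is a made-up helper that reads getfacl output on stdin and prints the number of ACL entries per file. It says nothing about where the actual limit lies, which depends on the filesystem.

```shell
# Read `getfacl` output on stdin and print "<entry count><TAB><file>" per file.
# acl_entry_counts is a hypothetical helper name used for illustration.
acl_entry_counts() {
    awk '
        # Each file section starts with a "# file: <path>" header.
        /^# file: / {
            if (file != "") print n "\t" file
            file = substr($0, 9)
            n = 0
            next
        }
        # Count access and default entries; other "#" comment lines are skipped.
        /^(default:)?(user|group|mask|other):/ { n++ }
        END { if (file != "") print n "\t" file }
    '
}
```

With the pool mounted as in the original post, it could be used as: sudo getfacl -R -n /mnt/lxd/containers/foo/rootfs/var/log/journal | acl_entry_counts | sort -rn | head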

Easiest workaround in this case would likely be to blow away that directory unless you expect to need access to older journal entries for this container.
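
If the old journal entries might still be needed, one alternative to deleting the directory is to move it out of the container's rootfs, so that LXD no longer has to shift it. This is a minimal sketch under a few assumptions: the pool is mounted somewhere reachable from the host (/mnt/lxd in the original post), the container is stopped, the backup directory already exists, and the function is run as root. The name move_journal_aside and the backup location are made up for illustration.

```shell
# Move <pool mount>/containers/<name>/rootfs/var/log/journal to a backup
# directory outside the rootfs. move_journal_aside is a hypothetical helper.
move_journal_aside() {
    pool_mount=$1
    container=$2
    backup_dir=$3
    journal="$pool_mount/containers/$container/rootfs/var/log/journal"

    if [ ! -d "$journal" ]; then
        echo "no journal directory at $journal" >&2
        return 1
    fi
    if [ ! -d "$backup_dir" ]; then
        echo "backup directory $backup_dir does not exist" >&2
        return 1
    fi

    # The destination must be outside the rootfs; anything left inside it
    # would still be processed when the container starts.
    mv -- "$journal" "$backup_dir/journal-$container"
}
```

For example, with a backup directory of your choosing: move_journal_aside /mnt/lxd foo /root/journal-backups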

Deleting the container's /var/log/journal directory did indeed do the trick! I expected the problem to show up for other files as well, but apparently it did not. Thanks for the suggestion!

I wonder why the ACL entries have to be added rather than changed. I can manually change the ACL entries for the user root and the group adm to the respective IDs of the container. In that case, lxc start complains about ACLs on a sub-directory of /var/log/journal. I guess that, with some effort, I could manually change all the ACLs, but I was not sure.

Regarding lxc storage show, the source is indeed displayed with the --target option. I wonder why lxc storage info displays the values of space used: and total space: without --target. Shouldn’t they also be host-specific?

Yeah, it feels like used/total should be nuked when no target is specified. Right now you’re likely to just get the value for whatever server answered your request.