Moved LXC container fails to start with the Ceph driver (LXD 3.0.3, clustered configuration)

Hello,

Sorry, the post got kind of messy.

I have a test Ceph cluster and am doing some testing with LXD.

Before this, I purged lxd and lxcfs (I ran `apt install lxd` and `apt purge lxd lxcfs`, and did the same with Ceph, many times during my testing).

I see there is an error line about br0, but I use the same configuration on other hosts and it works there (I don't know if it matters).

  1. Created a 3-node test Ceph cluster.
  2. Created an LXD cluster with two instances with basic settings, using the br0 bridge.
  3. lxc launch ubuntu
  4. lxc start advanced-ghoul (love these names :) ) works fine.
  5. Moved the new container to the other LXD instance, paralel-linux:
    lxc stop advanced-ghoul;
    lxc move advanced-ghoul --target paralel-linux;
    lxc start advanced-ghoul
    And I get this error:
    `Error: Failed to run: /usr/lib/lxd/lxd forkstart advanced-ghoul /var/lib/lxd/containers /var/log/lxd/advanced-ghoul/lxc.conf:

root@paralel-linux:/# cat /var/log/lxd/advanced-ghoul/lxc.conf
lxc.log.file = /var/log/lxd/advanced-ghoul/lxc.log
lxc.log.level = warn
lxc.console.buffer.size = auto
lxc.console.size = auto
lxc.console.logfile = /var/log/lxd/advanced-ghoul/console.log
lxc.mount.auto = proc:rw sys:rw
lxc.autodev = 1
lxc.pty.max = 1024
lxc.mount.entry = /dev/fuse dev/fuse none bind,create=file,optional
lxc.mount.entry = /dev/net/tun dev/net/tun none bind,create=file,optional
lxc.mount.entry = /proc/sys/fs/binfmt_misc proc/sys/fs/binfmt_misc none rbind,create=dir,optional
lxc.mount.entry = /sys/firmware/efi/efivars sys/firmware/efi/efivars none rbind,create=dir,optional
lxc.mount.entry = /sys/fs/fuse/connections sys/fs/fuse/connections none rbind,create=dir,optional
lxc.mount.entry = /sys/fs/pstore sys/fs/pstore none rbind,create=dir,optional
lxc.mount.entry = /sys/kernel/debug sys/kernel/debug none rbind,create=dir,optional
lxc.mount.entry = /sys/kernel/security sys/kernel/security none rbind,create=dir,optional
lxc.mount.entry = /dev/mqueue dev/mqueue none rbind,create=dir,optional
lxc.include = /usr/share/lxc/config/common.conf.d/
lxc.arch = linux64
lxc.hook.pre-start = /usr/lib/lxd/lxd callhook /var/lib/lxd 1 start
lxc.hook.post-stop = /usr/lib/lxd/lxd callhook /var/lib/lxd 1 stop
lxc.tty.max = 0
lxc.uts.name = advanced-ghoul
lxc.mount.entry = /var/lib/lxd/devlxd dev/lxd none bind,create=dir 0 0
lxc.apparmor.profile = lxd-advanced-ghoul_</var/lib/lxd>//&:lxd-advanced-ghoul_:
lxc.seccomp.profile = /var/lib/lxd/security/seccomp/advanced-ghoul
lxc.idmap = u 0 165536 65536
lxc.idmap = g 0 165536 65536
lxc.rootfs.path = dir:/var/lib/lxd/containers/advanced-ghoul/rootfs
lxc.net.0.type = veth
lxc.net.0.flags = up
lxc.net.0.link = br0
lxc.net.0.hwaddr = 00:16:3e:3e:a9:b2
lxc.net.0.name = eth0
lxc.mount.entry = /var/lib/lxd/shmounts/advanced-ghoul dev/.lxd-mounts none bind,create=dir 0 0

Try lxc info --show-log advanced-ghoul for more info

lxc advanced-ghoul 20190416140309.246 ERROR conf - conf.c:lxc_map_ids:2999 - newuidmap failed to write mapping “newuidmap: uid range [0-65536) -> [165536-231072) not allowed”: newuidmap 176631 0 165536 65536
lxc advanced-ghoul 20190416140309.246 ERROR start - start.c:lxc_spawn:1708 - Failed to set up id mapping.
lxc advanced-ghoul 20190416140309.321 WARN network - network.c:lxc_delete_network_priv:2613 - Invalid argument - Failed to remove interface “vethITFA9N” from “br0”
lxc advanced-ghoul 20190416140309.321 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:842 - Received container state “ABORTING” instead of “RUNNING”
lxc advanced-ghoul 20190416140309.322 ERROR start - start.c:__lxc_start:1939 - Failed to spawn container “advanced-ghoul”
lxc advanced-ghoul 20190416140309.326 ERROR conf - conf.c:lxc_map_ids:2999 - newuidmap failed to write mapping “newuidmap: uid range [0-65536) -> [165536-231072) not allowed”: newuidmap 176646 0 165536 65536 65536 0 1
lxc advanced-ghoul 20190416140309.326 ERROR conf - conf.c:userns_exec_1:4352 - Error setting up {g,u}id mappings for child process “176646”
lxc advanced-ghoul 20190416140309.327 WARN cgfsng - cgroups/cgfsng.c:cgfsng_payload_destroy:1122 - Failed to destroy cgroups
lxc 20190416140309.327 WARN commands - commands.c:lxc_cmd_rsp_recv:132 - Connection reset by peer - Failed to receive response for command “get_state”`

The same thing happens in reverse: a container works if I create it on node2 first, and won't start once I move it to node1.

lxc intimate-rattler 20190416141831.226 ERROR conf - conf.c:lxc_map_ids:2999 - newuidmap failed to write mapping “newuidmap: uid range [0-65536) -> [100000-165536) not allowed”: newuidmap 34224 0 100000 65536
lxc intimate-rattler 20190416141831.226 ERROR start - start.c:lxc_spawn:1708 - Failed to set up id mapping.
lxc intimate-rattler 20190416141831.744 WARN network - network.c:lxc_delete_network_priv:2613 - Invalid argument - Failed to remove interface “veth8GSNBD” from “br0”
lxc intimate-rattler 20190416141831.745 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:842 - Received container state “ABORTING” instead of “RUNNING”
lxc intimate-rattler 20190416141831.750 ERROR start - start.c:__lxc_start:1939 - Failed to spawn container “intimate-rattler”
lxc intimate-rattler 20190416141831.787 ERROR conf - conf.c:lxc_map_ids:2999 - newuidmap failed to write mapping “newuidmap: uid range [0-65536) -> [100000-165536) not allowed”: newuidmap 34243 0 100000 65536 65536 0 1
lxc intimate-rattler 20190416141831.787 ERROR conf - conf.c:userns_exec_1:4352 - Error setting up {g,u}id mappings for child process “34243”
lxc intimate-rattler 20190416141831.790 WARN cgfsng - cgroups/cgfsng.c:cgfsng_payload_destroy:1122 - Failed to destroy cgroups
lxc 20190416141831.790 WARN commands - commands.c:lxc_cmd_rsp_recv:132 - Connection reset by peer - Failed to receive response for command “get_state”
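
For reading these messages: the host range in the newuidmap error follows from the `lxc.idmap` line (container start, host start, count), so `u 0 165536 65536` gives the half-open range [165536-231072). Here is a minimal sketch of that arithmetic and of pulling the ranges out of the message. The helper names `host_range` and `parse_newuidmap_error` are made up for illustration and are not part of LXC or LXD.

```python
import re

# Hypothetical helper: the half-open host range covered by an idmap entry
# with the given host start and count.
def host_range(host_start, count):
    return (host_start, host_start + count)

# Matches the range part of the newuidmap message quoted in the logs above.
RANGE_RE = re.compile(r"uid range \[(\d+)-(\d+)\) -> \[(\d+)-(\d+)\)")

# Hypothetical helper: extract the container and host ranges from the message.
def parse_newuidmap_error(message):
    match = RANGE_RE.search(message)
    if match is None:
        raise ValueError("no uid range found in message")
    c_lo, c_hi, h_lo, h_hi = map(int, match.groups())
    return {"container": (c_lo, c_hi), "host": (h_lo, h_hi)}

print(host_range(165536, 65536))  # (165536, 231072)
print(host_range(100000, 65536))  # (100000, 165536)
```

So the two failing containers asked for host ranges starting at 165536 and 100000 respectively, each 65536 IDs long.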

Well, I tested with other network configurations and got the same error.

lxc sharp-malamute 20190417060338.883 ERROR conf - conf.c:lxc_map_ids:2999 - newuidmap failed to write mapping “newuidmap: uid range [0-65536) -> [100000-165536) not allowed”: newuidmap 8213 0 100000 65536
lxc sharp-malamute 20190417060338.883 ERROR start - start.c:lxc_spawn:1708 - Failed to set up id mapping.

Any ideas?

If I create container1 on node1, then run `lxc move container1 --target node2`, then `lxc copy container1 container2 --target node2`, the copied container (container2) starts without errors.

Without copying, I can't get the same container to work after the move.

The problem was that the /etc/subuid and /etc/subgid entries for root and lxd were different between the servers:
astunkojis@ceph-osd:~$ cat /etc/subgid
andriusjurkus:100000:65536
lxd:165536:65536
root:165536:65536

root@paralel-linux:/# cat /etc/subgid
lxd:100000:65536
root:100000:65536
andriusjurkus:231072:65536
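
To make the mismatch explicit, here is a rough sketch that parses the two outputs above and checks whether the range a container was created with is covered on the other host. Two assumptions are built in: a mapping is only accepted when it fits inside a single entry for the calling user, and root is the relevant user because LXD runs as root. The function names are hypothetical. The sample data is copied from the /etc/subgid output above; /etc/subuid is not shown in this post, so check it the same way on your own hosts.

```python
def parse_subid(text):
    """Parse /etc/subuid- or /etc/subgid-style text into {name: [(start, count), ...]}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, start, count = line.split(":")
        entries.setdefault(name, []).append((int(start), int(count)))
    return entries

def range_allowed(entries, name, host_start, count):
    """True if [host_start, host_start + count) fits inside one entry owned by name."""
    return any(
        start <= host_start and host_start + count <= start + size
        for start, size in entries.get(name, [])
    )

def differing_names(a, b, names=("root", "lxd")):
    """Names whose entries differ between the two hosts."""
    return [n for n in names if a.get(n) != b.get(n)]

# Data as shown in the two `cat /etc/subgid` outputs above.
ceph_osd = parse_subid("andriusjurkus:100000:65536\nlxd:165536:65536\nroot:165536:65536")
paralel_linux = parse_subid("lxd:100000:65536\nroot:100000:65536\nandriusjurkus:231072:65536")

print(differing_names(ceph_osd, paralel_linux))              # ['root', 'lxd']
print(range_allowed(ceph_osd, "root", 165536, 65536))        # True
print(range_allowed(paralel_linux, "root", 165536, 65536))   # False
```

The last line corresponds to advanced-ghoul: its idmap starts at 165536, which root owns on ceph-osd but not on paralel-linux.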

I don't know whether this is considered a bug somewhere or not.