with ctrl-c yes, and then
systemctl start lxd.service lxd.socket
but now it is restarted, but i can neither start nor create containers
root@srv2:/usr/local/bin# lxc info --show-log local:testcuda
Name: testcuda
Remote: unix://
Architecture: x86_64
Created: 2020/04/29 12:47 UTC
Status: Stopped
Type: persistent
Profiles: all_gpu_250GB
Log:
lxc testcuda 20200429124712.759 ERROR conf - conf.c:lxc_map_ids:2999 - newuidmap failed to write mapping “newuidmap: uid range [0-65536) -> [786432-851968) not allowed”: newuidmap 19272 0 786432 65536
lxc testcuda 20200429124712.759 ERROR start - start.c:lxc_spawn:1708 - Failed to set up id mapping.
lxc testcuda 20200429124712.853 WARN network - network.c:lxc_delete_network_priv:2613 - Invalid argument - Failed to remove interface “vethSKTQU0” from “lxdbr0”
lxc testcuda 20200429124712.853 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:842 - Received container state “ABORTING” instead of “RUNNING”
lxc testcuda 20200429124712.854 ERROR start - start.c:__lxc_start:1939 - Failed to spawn container “testcuda”
lxc testcuda 20200429124712.861 ERROR conf - conf.c:lxc_map_ids:2999 - newuidmap failed to write mapping “newuidmap: uid range [0-65536) -> [786432-851968) not allowed”: newuidmap 19289 0 786432 65536 65536 0 1
lxc testcuda 20200429124712.861 ERROR conf - conf.c:userns_exec_1:4352 - Error setting up {g,u}id mappings for child process “19289”
lxc testcuda 20200429124712.862 WARN cgfsng - cgroups/cgfsng.c:cgfsng_payload_destroy:1122 - Failed to destroy cgroups
lxc 20200429124712.862 WARN commands - commands.c:lxc_cmd_rsp_recv:132 - Connection reset by peer - Failed to receive response for command “get_state”
This is very weird, LXD really shouldn’t pick such low uid/gid.
Can you show lxc config show --expanded testcuda
and also do a more simple:
- lxc init images:alpine/edge a1
- lxc config show --expanded a1
root@srv2:/home/jpe# lxc config show --expanded testcuda
architecture: x86_64
config:
image.architecture: amd64
image.description: ubuntu 18.04 LTS amd64 (release) (20200407)
image.label: release
image.os: ubuntu
image.release: bionic
image.serial: “20200407”
image.version: “18.04”
limits.cpu: “6”
limits.memory: 26GB
nvidia.runtime: “true”
security.idmap.isolated: “true”
volatile.base_image: 2cfc5a5567b8d74c0986f3d8a77a2a78e58fe22ea9abd2693112031f85afa1a1
volatile.eth0.hwaddr: 00:16:3e:07:3b:00
volatile.idmap.base: “786432”
volatile.idmap.next: ‘[{“Isuid”:true,“Isgid”:false,“Hostid”:786432,“Nsid”:0,“Maprange”:65536},{“Isuid”:false,“Isgid”:true,“Hostid”:786432,“Nsid”:0,“Maprange”:65536}]’
volatile.last_state.idmap: ‘[{“Isuid”:true,“Isgid”:false,“Hostid”:786432,“Nsid”:0,“Maprange”:65536},{“Isuid”:false,“Isgid”:true,“Hostid”:786432,“Nsid”:0,“Maprange”:65536}]’
volatile.last_state.power: STOPPED
devices:
eth0:
name: eth0
nictype: bridged
parent: lxdbr0
type: nic
gpu:
type: gpu
root:
path: /
pool: cpool
size: 250GB
type: disk
ephemeral: false
profiles:
- all_gpu_250GB
stateful: false
description: “”
root@srv2:/home/jpe# lxc init images:alpine/edge a1
Creating a1
root@srv2:/home/jpe# lxc config show --expanded a1
architecture: x86_64
config:
image.architecture: amd64
image.description: Alpine edge amd64 (20200428_13:00)
image.os: Alpine
image.release: edge
image.serial: “20200428_13:00”
volatile.apply_template: create
volatile.base_image: 68d1dce55b29fc2bc75b558887e1c03799c74a73a3d433e473c8e23061002456
volatile.eth0.hwaddr: 00:16:3e:ff:68:0d
volatile.idmap.base: “0”
volatile.idmap.next: ‘[{“Isuid”:true,“Isgid”:false,“Hostid”:1000000,“Nsid”:0,“Maprange”:1000000000},{“Isuid”:false,“Isgid”:true,“Hostid”:1000000,“Nsid”:0,“Maprange”:1000000000}]’
volatile.last_state.idmap: ‘[{“Isuid”:true,“Isgid”:false,“Hostid”:1000000,“Nsid”:0,“Maprange”:1000000000},{“Isuid”:false,“Isgid”:true,“Hostid”:1000000,“Nsid”:0,“Maprange”:1000000000}]’
devices:
eth0:
name: eth0
nictype: bridged
parent: lxdbr0
type: nic
root:
path: /
pool: cpool
type: disk
ephemeral: false
profiles:
- default
stateful: false
description: “”
this a1 container even runs!
it has something to do with the profile i use, because i just tested without specifying a profile and that works
the profile i use ->
lxc profile show all_gpu_250GB
config:
limits.cpu: “6”
limits.memory: 26GB
nvidia.runtime: “true”
security.idmap.isolated: “true”
description: Default LXD profile
devices:
eth0:
name: eth0
nictype: bridged
parent: lxdbr0
type: nic
gpu:
type: gpu
root:
path: /
pool: cpool
size: 250GB
type: disk
name: all_gpu_250GB
used_by:
- /1.0/containers/nbanar
- /1.0/containers/fivez
- /1.0/containers/mdebruyn
- /1.0/containers/elotfi
- /1.0/containers/madhumita
- /1.0/containers/walter
- /1.0/containers/tulkens
- /1.0/containers/testcuda
Pretty sure it’s isolated misbehaving but that’s odd as it should be restricted to its normal range…
Can you try:
- lxc launch images:alpine/edge a2 -c security.idmap.isolated=true
And confirm this fails similarly?
root@srv2:/home/jpe# lxc launch images:alpine/edge a2 -c security.idmap.isolated=true
Creating a2
Starting a2
Error: Failed to run: /usr/lib/lxd/lxd forkstart a2 /var/lib/lxd/containers /var/log/lxd/a2/lxc.conf:
Try lxc info --show-log local:a2
for more info
root@srv2:/home/jpe# lxc info --show-log local:a2
Name: a2
Remote: unix://
Architecture: x86_64
Created: 2020/04/29 14:50 UTC
Status: Stopped
Type: persistent
Profiles: default
Log:
lxc a2 20200429145033.633 ERROR conf - conf.c:lxc_map_ids:2999 - newuidmap failed to write mapping “newuidmap: uid range [0-65536) -> [851968-917504) not allowed”: newuidmap 32741 0 851968 65536
lxc a2 20200429145033.633 ERROR start - start.c:lxc_spawn:1708 - Failed to set up id mapping.
lxc a2 20200429145033.718 WARN network - network.c:lxc_delete_network_priv:2613 - Invalid argument - Failed to remove interface “veth3VE09O” from “lxdbr0”
lxc a2 20200429145033.718 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:842 - Received container state “ABORTING” instead of “RUNNING”
lxc a2 20200429145033.719 ERROR start - start.c:__lxc_start:1939 - Failed to spawn container “a2”
lxc a2 20200429145033.725 ERROR conf - conf.c:lxc_map_ids:2999 - newuidmap failed to write mapping “newuidmap: uid range [0-65536) -> [851968-917504) not allowed”: newuidmap 32757 0 851968 65536 65536 0 1
lxc a2 20200429145033.725 ERROR conf - conf.c:userns_exec_1:4352 - Error setting up {g,u}id mappings for child process “32757”
lxc a2 20200429145033.726 WARN cgfsng - cgroups/cgfsng.c:cgfsng_payload_destroy:1122 - Failed to destroy cgroups
lxc 20200429145033.726 WARN commands - commands.c:lxc_cmd_rsp_recv:132 - Connection reset by peer - Failed to receive response for command “get_state”
Ok, thanks, can you show lxc config show --expanded a2
?
What OS is this running on btw?
root@srv2:/home/jpe# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
root@srv2:/home/jpe# lxc config show --expanded a2
architecture: x86_64
config:
image.architecture: amd64
image.description: Alpine edge amd64 (20200428_13:00)
image.os: Alpine
image.release: edge
image.serial: “20200428_13:00”
security.idmap.isolated: “true”
volatile.base_image: 68d1dce55b29fc2bc75b558887e1c03799c74a73a3d433e473c8e23061002456
volatile.eth0.hwaddr: 00:16:3e:b3:4e:a3
volatile.idmap.base: “851968”
volatile.idmap.next: ‘[{“Isuid”:true,“Isgid”:false,“Hostid”:851968,“Nsid”:0,“Maprange”:65536},{“Isuid”:false,“Isgid”:true,“Hostid”:851968,“Nsid”:0,“Maprange”:65536}]’
volatile.last_state.idmap: ‘[{“Isuid”:true,“Isgid”:false,“Hostid”:851968,“Nsid”:0,“Maprange”:65536},{“Isuid”:false,“Isgid”:true,“Hostid”:851968,“Nsid”:0,“Maprange”:65536}]’
volatile.last_state.power: STOPPED
devices:
eth0:
name: eth0
nictype: bridged
parent: lxdbr0
type: nic
root:
path: /
pool: cpool
type: disk
ephemeral: false
profiles:
- default
stateful: false
description: “”
Ok, worth noting that if you were to switch over to the snap, you’d work around this issue as it doesn’t rely on subuid/subgid.
If that’s an option for you, you can move across with:
- snap install lxd
- lxd.migrate
I’m trying to reproduce the issue of idmap.isolated
not respecting the configured range now, might be something we fixed in a later release.
ok, i am willing to switch over to snap, i presume i first uninstall lxd? And what about the existing containers?
Nope, just run those two commands and your containers will be kept around and moved across, don’t remove anything, the migration tool does it for you.
lxd migrate takes its time…
What is it showing?
root@srv2:/home/jpe# lxd migrate
WARN[04-29|17:56:56] CGroup memory swap accounting is disabled, swap limits will be ignored.
so for more than an hour.
Ah, that explains it
ctrl+c that and run lxd.migrate
(note the dot instead of space)
Newer LXD versions would yell at you when running lxd migrate
telling you it’s invalid, but 3.0.3 that you’re running now, just spawns a new copy of the daemon…