jpe here is already conflicting with LXD; I don’t know if that causes the issue.
Technically there are uid/gid ranges it could select after the LXD range, but it looks like the tool is getting confused.
I normally place LXD from 1000000 rather than 100000, which frees the first few ranges for use by the system. I’m not sure whether that makes things easier for useradd, though.
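If it helps, here is a rough sketch for spotting overlapping entries in a subuid-style file (`name:start:count` per line). The function name `find_overlaps` and the sample entries (including the `alice` user) are made up for illustration; on a real host you would point it at /etc/subuid or /etc/subgid instead.

```shell
#!/bin/sh
# find_overlaps FILE
# Prints every pair of entries in FILE (format name:start:count)
# whose half-open ranges [start, start+count) overlap.
find_overlaps() {
    awk -F: '
        BEGIN { n = 0 }
        # Only consider well-formed three-field lines.
        NF == 3 { name[n] = $1; lo[n] = $2 + 0; hi[n] = $2 + $3; n++ }
        END {
            for (i = 0; i < n; i++)
                for (j = i + 1; j < n; j++)
                    if (lo[i] < hi[j] && lo[j] < hi[i])
                        printf "%s [%d-%d) overlaps %s [%d-%d)\n",
                            name[i], lo[i], hi[i], name[j], lo[j], hi[j]
        }
    ' "$1"
}

# Hypothetical sample data, not taken from the affected server.
sample=$(mktemp)
printf 'root:100000:65536\njpe:100000:65536\nalice:165536:65536\n' > "$sample"
find_overlaps "$sample"
# prints: root [100000-165536) overlaps jpe [100000-165536)
rm -f "$sample"
```

An overlap is not necessarily an error by itself, so treat the output as a hint about where to look rather than a verdict.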
There seems to be something wrong:
Error: Failed to run: /usr/lib/lxd/lxd forkstart walter /var/lib/lxd/containers /var/log/lxd/walter/lxc.conf:
Try lxc info --show-log walter for more info
root@srv2:/usr/local/bin# lxc info --show-log walter
Name: walter
Remote: unix://
Architecture: x86_64
Created: 2020/04/29 10:48 UTC
Status: Stopped
Type: persistent
Profiles: all_gpu_250GB
Log:
lxc walter 20200429120514.464 ERROR cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1219 - File exists - Failed to create directory “/sys/fs/cgroup/unified//lxc/walter”
lxc walter 20200429120514.464 ERROR cgfsng - cgroups/cgfsng.c:create_path_for_hierarchy:1243 - Failed to create cgroup “/sys/fs/cgroup/unified//lxc/walter”
lxc walter 20200429120514.464 ERROR cgfsng - cgroups/cgfsng.c:cgfsng_payload_create:1321 - Failed to create cgroup “/sys/fs/cgroup/unified//lxc/walter”
lxc walter 20200429120514.475 ERROR conf - conf.c:lxc_map_ids:2999 - newuidmap failed to write mapping “newuidmap: uid range [0-65536) -> [655360-720896) not allowed”: newuidmap 30934 0 655360 65536
lxc walter 20200429120514.475 ERROR start - start.c:lxc_spawn:1708 - Failed to set up id mapping.
lxc walter 20200429120514.541 WARN network - network.c:lxc_delete_network_priv:2613 - Invalid argument - Failed to remove interface “veth37FWQT” from “lxdbr0”
lxc walter 20200429120514.541 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:842 - Received container state “ABORTING” instead of “RUNNING”
lxc walter 20200429120514.542 ERROR start - start.c:__lxc_start:1939 - Failed to spawn container “walter”
lxc walter 20200429120514.548 ERROR conf - conf.c:lxc_map_ids:2999 - newuidmap failed to write mapping “newuidmap: uid range [0-65536) -> [655360-720896) not allowed”: newuidmap 30949 0 655360 65536 65536 0 1
lxc walter 20200429120514.548 ERROR conf - conf.c:userns_exec_1:4352 - Error setting up {g,u}id mappings for child process “30949”
lxc walter 20200429120514.549 WARN cgfsng - cgroups/cgfsng.c:cgfsng_payload_destroy:1122 - Failed to destroy cgroups
lxc 20200429120514.549 WARN commands - commands.c:lxc_cmd_rsp_recv:132 - Connection reset by peer - Failed to receive response for command “get_state”
Ok, that looks good, so I think LXD will just need a bit of a nudge to re-allocate a new range for the containers.
If you haven’t already, restart the LXD daemon with systemctl restart lxd.
Then you’ll want to temporarily mark your containers as privileged:
lxc config set NAME security.privileged true
And then mark them as unprivileged again:
lxc config unset NAME security.privileged
And finally try to start them. This should cause a new map to be calculated and then applied during that startup.
If that doesn’t work, then you’ll have to temporarily start each container in between marking it privileged and unsetting the privileged flag. Hopefully that’s not necessary, as it would double the time needed to fix this.
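For several containers, the sequence above could be scripted roughly like this. The `LXC` variable and the `remap_container` helper are assumptions of this sketch, not part of LXD. As written it only prints the commands it would run, so nothing is changed until you set `LXC=lxc`.

```shell
#!/bin/sh
# Dry-run by default: each command is echoed instead of executed.
# Change to LXC=lxc to actually apply the changes.
LXC="echo lxc"

# remap_container NAME
# Marks the container privileged, unmarks it again, then starts it.
# Stops at the first command that fails.
remap_container() {
    $LXC config set "$1" security.privileged true &&
    $LXC config unset "$1" security.privileged &&
    $LXC start "$1"
}

# The two container names mentioned in this thread.
for name in walter jefcuda; do
    remap_container "$name"
done
```

If a plain toggle is not enough, the fallback of starting the container while it is still privileged would need an extra start and stop between the set and unset steps; that is left out here.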
I also tried to recreate jefcuda, but got an error as well:
root@srv2:/usr/local/bin# lxc launch ubuntu: -p all_gpu_250GB jefcuda
Creating jefcuda
Starting jefcuda
Error: Failed to run: /usr/lib/lxd/lxd forkstart jefcuda /var/lib/lxd/containers /var/log/lxd/jefcuda/lxc.conf:
Try lxc info --show-log local:jefcuda for more info
root@srv2:/usr/local/bin# lxc info --show-log local:jefcuda
Name: jefcuda
Remote: unix://
Architecture: x86_64
Created: 2020/04/29 12:32 UTC
Status: Stopped
Type: persistent
Profiles: all_gpu_250GB
Log:
lxc jefcuda 20200429123251.618 ERROR cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1219 - File exists - Failed to create directory “/sys/fs/cgroup/unified//lxc/jefcuda”
lxc jefcuda 20200429123251.618 ERROR cgfsng - cgroups/cgfsng.c:create_path_for_hierarchy:1243 - Failed to create cgroup “/sys/fs/cgroup/unified//lxc/jefcuda”
lxc jefcuda 20200429123251.618 ERROR cgfsng - cgroups/cgfsng.c:cgfsng_payload_create:1321 - Failed to create cgroup “/sys/fs/cgroup/unified//lxc/jefcuda”
lxc jefcuda 20200429123251.618 ERROR cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1219 - File exists - Failed to create directory “/sys/fs/cgroup/unified//lxc/jefcuda-1”
lxc jefcuda 20200429123251.618 ERROR cgfsng - cgroups/cgfsng.c:create_path_for_hierarchy:1243 - Failed to create cgroup “/sys/fs/cgroup/unified//lxc/jefcuda-1”
lxc jefcuda 20200429123251.618 ERROR cgfsng - cgroups/cgfsng.c:cgfsng_payload_create:1321 - Failed to create cgroup “/sys/fs/cgroup/unified//lxc/jefcuda-1”
lxc jefcuda 20200429123251.618 ERROR cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1219 - File exists - Failed to create directory “/sys/fs/cgroup/unified//lxc/jefcuda-2”
lxc jefcuda 20200429123251.618 ERROR cgfsng - cgroups/cgfsng.c:create_path_for_hierarchy:1243 - Failed to create cgroup “/sys/fs/cgroup/unified//lxc/jefcuda-2”
lxc jefcuda 20200429123251.618 ERROR cgfsng - cgroups/cgfsng.c:cgfsng_payload_create:1321 - Failed to create cgroup “/sys/fs/cgroup/unified//lxc/jefcuda-2”
lxc jefcuda 20200429123251.629 ERROR conf - conf.c:lxc_map_ids:2999 - newuidmap failed to write mapping “newuidmap: uid range [0-65536) -> [786432-851968) not allowed”: newuidmap 14696 0 786432 65536
lxc jefcuda 20200429123251.629 ERROR start - start.c:lxc_spawn:1708 - Failed to set up id mapping.
lxc jefcuda 20200429123251.701 WARN network - network.c:lxc_delete_network_priv:2613 - Invalid argument - Failed to remove interface “vethNL1705” from “lxdbr0”
lxc jefcuda 20200429123251.701 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:842 - Received container state “ABORTING” instead of “RUNNING”
lxc jefcuda 20200429123251.702 ERROR start - start.c:__lxc_start:1939 - Failed to spawn container “jefcuda”
lxc jefcuda 20200429123251.709 ERROR conf - conf.c:lxc_map_ids:2999 - newuidmap failed to write mapping “newuidmap: uid range [0-65536) -> [786432-851968) not allowed”: newuidmap 14713 0 786432 65536 65536 0 1
lxc jefcuda 20200429123251.709 ERROR conf - conf.c:userns_exec_1:4352 - Error setting up {g,u}id mappings for child process “14713”
lxc jefcuda 20200429123251.710 WARN cgfsng - cgroups/cgfsng.c:cgfsng_payload_destroy:1122 - Failed to destroy cgroups
lxc 20200429123251.710 WARN commands - commands.c:lxc_cmd_rsp_recv:132 - Connection reset by peer - Failed to receive response for command “get_state”