Failed to run: /usr/sbin/lxd forkstart dddd /var/lib/lxd/containers /var/log/lxd/dddd/lxc.conf

Hi guys,
I created a container instance named dddd and started it through the REST API, but the start occasionally fails. The failure only happens on the first start after the container is created.

{"type":"sync","status":"Success","status_code":200,"operation":"","error_code":0,"error":"","metadata":{"id":"b19149bd-74c8-4c94-ba22-1069c6aeaec6","class":"task","description":"Starting container","created_at":"2021-06-25T14:37:00.19801582+08:00","updated_at":"2021-06-25T14:37:00.19801582+08:00","status":"Failure","status_code":400,"resources":{"containers":["/1.0/containers/dddd"]},"metadata":null,"may_cancel":false,"err":"Failed to run: /usr/sbin/lxd forkstart dddd /var/lib/lxd/containers /var/log/lxd/dddd/lxc.conf: ","location":"none"}}
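
Note that the outer envelope of this response reports Success/200, while the operation nested under "metadata" reports Failure/400, so a client has to look at the inner status to notice the failed start. A minimal Python sketch of that check, assuming the response always has the shape shown above (the function name operation_error is hypothetical, not part of any LXD client):

```python
import json
from typing import Optional


def operation_error(raw: str) -> Optional[str]:
    """Return the operation's error text, or None if it succeeded.

    Assumes the response shape shown above: a sync envelope whose
    "metadata" field holds the operation object.
    """
    envelope = json.loads(raw)
    operation = envelope.get("metadata") or {}
    # The envelope's own status only says the query itself worked;
    # the result of the start lives in the nested operation.
    if operation.get("status_code", 200) >= 400:
        return operation.get("err") or operation.get("status", "unknown error")
    return None
```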

lxc info returns:
[root@cube:~]# lxc info --show-log dddd
Name: dddd
Location: none
Remote: unix://
Architecture: aarch64
Created: 2021/06/25 08:40 UTC
Status: Stopped
Type: persistent
Profiles: docker

Log:

lxc dddd 20210625084025.312 WARN initutils - initutils.c:setproctitle:341 - Invalid argument - Failed to set cmdline
lxc dddd 20210625084026.175 ERROR start - start.c:proc_pidfd_open:1644 - Function not implemented - Failed to send signal through pidfd
lxc dddd 20210625084026.385 ERROR start - start.c:start:2121 - No such file or directory - Failed to exec "/sbin/init"
lxc dddd 20210625084026.389 ERROR sync - sync.c:__sync_wait:62 - An error occurred in another process (expected sequence number 7)
lxc dddd 20210625084026.389 WARN network - network.c:lxc_delete_network_priv:3374 - Failed to rename interface with index 109 from "eth0" to its initial name "veth7433008c"
lxc dddd 20210625084026.390 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:873 - Received container state "ABORTING" instead of "RUNNING"
lxc dddd 20210625084026.394 ERROR start - start.c:__lxc_start:2036 - Failed to spawn container "dddd"
lxc 20210625084027.740 WARN commands - commands.c:lxc_cmd_rsp_recv:135 - Connection reset by peer - Failed to receive response for command "get_state"

Any ideas?

Here is my config:

[root@cube:/container/data/tmp]# lxc config show --expanded dddd
architecture: aarch64
config:
  environment.TZ: Asia/Shanghai
  image.description: Debian 64bit
  image.os: Debian for Docker
  image.release: stretch
  limits.cpu: "3"
  limits.memory: 1000MB
  raw.lxc: |-
    lxc.cgroup.devices.allow=a
    lxc.init.cmd=/sbin/init systemd.unified_cgroup_hierarchy=1
  security.nesting: "true"
  security.privileged: "true"
  volatile.apply_template: create
  volatile.base_image: 39ca92012fb33a189a45dd52651a60cc6e161a696bb67846e69d8da0275adf92
  volatile.eth0.hwaddr: 00:16:3e:ef:57:d7
  volatile.eth0.name: eth0
  volatile.idmap.base: "0"
  volatile.idmap.current: '[]'
  volatile.idmap.next: '[]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: STOPPED
devices:
  eth0:
    ipv4.address: 192.168.255.40
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: os
    type: disk
ephemeral: false
profiles:
- docker
stateful: false
description: ""
lxc dddd 20210625084026.385 ERROR start - start.c:start:2121 - No such file or directory - Failed to exec "/sbin/init"

This error tells you that your container doesn’t have a /sbin/init in its filesystem and so can’t be started.

Thanks. But this issue only occurs on the first start, and only occasionally. If the start fails, a second try succeeds, and so does every start after that. The start action is issued through the REST API. I also tested with the lxc command, and the issue never occurs there.

That’s odd. lxc start just talks to the REST API too; you can see its queries with --debug.

Maybe it’s some kind of race condition where some storage that LXD uses isn’t yet mounted by your system on boot and trying a bit later has it succeed?
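
One way to test that theory is to poll for the init binary in the container's root filesystem before issuing the start. The Python sketch below is only a diagnostic aid under assumptions: the rootfs path is guessed from the forkstart command line in the first post, it may not be visible on the host at that location depending on the storage backend, and the helper name wait_for_init is made up for illustration.

```python
import os
import time


def wait_for_init(rootfs: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until <rootfs>/sbin/init exists or the timeout expires."""
    init_path = os.path.join(rootfs, "sbin", "init")
    deadline = time.monotonic() + timeout
    while True:
        # lexists() does not follow the final symlink: /sbin/init is often
        # a symlink whose absolute target only resolves inside the container.
        if os.path.lexists(init_path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Assumed path, built from the forkstart command line in the first post:
# ready = wait_for_init("/var/lib/lxd/containers/dddd/rootfs")
```

If this returns False right after creation and True a little later, that would point at the root filesystem not being populated or mounted yet when the first start is attempted.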

I ran a test: if I wait 30 seconds before the start action, the issue does not occur (0 failures out of 10).
If I start it immediately, the issue occurs (2 failures out of 10).
It looks like some condition is not met, but I don’t know which one.
Any more suggestions?
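
Until the missing condition is identified, a client-side retry is one possible workaround, since in the tests above a second attempt has always succeeded. This is a rough Python sketch, not a fix for the root cause: start is a caller-supplied function (hypothetical here) that performs the REST start call and returns an error string on failure or None on success, and the attempt count and delay are arbitrary defaults.

```python
import time
from typing import Callable, Optional


def start_with_retry(
    start: Callable[[], Optional[str]],
    attempts: int = 3,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call start() until it returns None (success) or attempts run out.

    Returns None on success, otherwise the error from the last attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    error: Optional[str] = None
    for attempt in range(1, attempts + 1):
        error = start()
        if error is None:
            return None
        # Wait before the next try, but not after the final one.
        if attempt < attempts:
            sleep(delay)
    return error
```

The sleep parameter is injectable only so the helper can be exercised without real delays.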