Error: All containers fail to start after upgrade

Hi,

I’m using Alpine Linux and just ran an upgrade, after which all of my containers are stopped and cannot be started.

LXC/LXD version: 4.4 (not sure what it was before the upgrade)


$ sudo lxc list
+-----------+---------+------+------+-----------+-----------+
|   NAME    |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS |
+-----------+---------+------+------+-----------+-----------+
| caddy     | STOPPED |      |      | CONTAINER | 0         |
+-----------+---------+------+------+-----------+-----------+
| git       | STOPPED |      |      | CONTAINER | 0         |
+-----------+---------+------+------+-----------+-----------+
| jellyfin  | STOPPED |      |      | CONTAINER | 1         |
+-----------+---------+------+------+-----------+-----------+
| teamspeak | STOPPED |      |      | CONTAINER | 0         |
+-----------+---------+------+------+-----------+-----------+
| terraria  | STOPPED |      |      | CONTAINER | 0         |
+-----------+---------+------+------+-----------+-----------+
| ubiquiti  | STOPPED |      |      | CONTAINER | 0         |
+-----------+---------+------+------+-----------+-----------+
| wiki      | STOPPED |      |      | CONTAINER | 0         |
+-----------+---------+------+------+-----------+-----------+

When I try to start a container, I get an error.

$ sudo lxc start caddy
Error: Failed to run: /usr/sbin/lxd forkstart caddy /var/lib/lxd/containers /var/log/lxd/caddy/lxc.conf:
Try `lxc info --show-log caddy` for more info

$ sudo lxc info --show-log caddy
Name: caddy
Location: none
Remote: unix://
Architecture: x86_64
Created: 2020/06/26 19:06 UTC
Status: Stopped
Type: container
Profiles: default

Log:

lxc caddy 20200829221449.887 ERROR    conf - conf.c:run_buffer:324 - Script exited with status 1
lxc caddy 20200829221449.887 ERROR    start - start.c:lxc_init:798 - Failed to run lxc.hook.pre-start for container "caddy"
lxc caddy 20200829221449.887 ERROR    start - start.c:__lxc_start:1945 - Failed to initialize container "caddy"
lxc caddy 20200829221450.292 ERROR    conf - conf.c:run_buffer:324 - Script exited with status 1
lxc caddy 20200829221450.292 ERROR    start - start.c:lxc_end:965 - Failed to run lxc.hook.post-stop for container "caddy"
lxc caddy 20200829221450.292 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:841 - No such file or directory - Failed to receive the container state

I’m really not sure where to start with debugging, so any help would be appreciated.

Thanks,
Colt

I ran lxd in debug mode and tried to start caddy again.
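For reference, this was roughly the invocation, assuming Alpine’s OpenRC service for the daemon is named lxd (stop the regular service first, then run the daemon in the foreground):

$ sudo rc-service lxd stop
$ sudo lxd --debug --group lxd

The debug output from the start attempt: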

DBUG[08-29|16:58:30] Handling                                 method=GET url=/1.0 ip=@ user=
DBUG[08-29|16:58:30] Handling                                 user= method=GET url=/1.0/instances/caddy ip=@
DBUG[08-29|16:58:30] Handling                                 method=GET url=/1.0/events ip=@ user=
DBUG[08-29|16:58:30] New event listener: d4128cd5-ff31-41e6-b4c0-e5d7f4fa349f
DBUG[08-29|16:58:30] Handling                                 method=PUT url=/1.0/instances/caddy/state ip=@ user=
DBUG[08-29|16:58:30]
        {
                "action": "start",
                "timeout": 0,
                "force": false,
                "stateful": false
        }
DBUG[08-29|16:58:30] New task Operation: cfc425f7-0447-4dea-9531-a7b38a513c5c
DBUG[08-29|16:58:30] Started task operation: cfc425f7-0447-4dea-9531-a7b38a513c5c
DBUG[08-29|16:58:30]
        {
                "type": "async",
                "status": "Operation created",
                "status_code": 100,
                "operation": "/1.0/operations/cfc425f7-0447-4dea-9531-a7b38a513c5c",
                "error_code": 0,
                "error": "",
                "metadata": {
                        "id": "cfc425f7-0447-4dea-9531-a7b38a513c5c",
                        "class": "task",
                        "description": "Starting container",
                        "created_at": "2020-08-29T16:58:30.989650923-06:00",
                        "updated_at": "2020-08-29T16:58:30.989650923-06:00",
                        "status": "Running",
                        "status_code": 103,
                        "resources": {
                                "containers": [
                                        "/1.0/containers/caddy"
                                ]
                        },
                        "metadata": null,
                        "may_cancel": false,
                        "err": "",
                        "location": "none"
                }
        }
DBUG[08-29|16:58:30] Handling                                 method=GET url=/1.0/operations/cfc425f7-0447-4dea-9531-a7b38a513c5c ip=@ user=
DBUG[08-29|16:58:30] Scheduler: network: vethfffb2f26 has been added: updating network priorities
DBUG[08-29|16:58:31] Scheduler: network: veth3a4a26a4 has been added: updating network priorities
DBUG[08-29|16:58:31] MountInstance started                    driver=zfs pool=default project=default instance=caddy
DBUG[08-29|16:58:31] MountInstance finished                   driver=zfs pool=default project=default instance=caddy
DBUG[08-29|16:58:31] UpdateInstanceBackupFile started         driver=zfs pool=default instance=caddy project=default
DBUG[08-29|16:58:31] UpdateInstanceBackupFile finished        driver=zfs pool=default instance=caddy project=default
DBUG[08-29|16:58:31] MountInstance started                    driver=zfs pool=default instance=caddy project=default
DBUG[08-29|16:58:31] MountInstance finished                   driver=zfs pool=default instance=caddy project=default
INFO[08-29|16:58:31] Starting container                       project=default name=caddy action=start created=2020-06-26T13:06:15-0600 ephemeral=false used=2020-08-28T14:54:43-0600 stateful=false
DBUG[08-29|16:58:31] Handling                                 method=GET url=/internal/containers/24/onstart ip=@ user=
DBUG[08-29|16:58:31] MountInstance started                    pool=default driver=zfs project=default instance=caddy
DBUG[08-29|16:58:31] MountInstance finished                   pool=default driver=zfs project=default instance=caddy
EROR[08-29|16:58:31] The start hook failed                    container=caddy err="Failed to run: apparmor_parser --version: "
DBUG[08-29|16:58:31] Handling                                 method=GET url="/internal/containers/24/onstopns?target=stop&netns=" ip=@ user=
DBUG[08-29|16:58:31] Stopping device                          device=eth0 project=default instance=caddy
DBUG[08-29|16:58:31] Clearing instance firewall static filters ipv6=:: project=default instance=caddy parent=lxdbr0 dev=eth0 host_name=veth3a4a26a4 hwaddr=00:16:3e:24:7b:21 ipv4=0.0.0.0
DBUG[08-29|16:58:31] Clearing instance firewall dynamic filters project=default instance=caddy parent=lxdbr0 dev=eth0 host_name=veth3a4a26a4 hwaddr=00:16:3e:24:7b:21 ipv4=<nil> ipv6=<nil>
DBUG[08-29|16:58:31] Stopping device                          device=root project=default instance=caddy
DBUG[08-29|16:58:31] Stopping device                          project=default instance=caddy device=httpport
DBUG[08-29|16:58:31] Stopping device                          device=httpsport project=default instance=caddy
DBUG[08-29|16:58:31] Handling                                 method=GET url="/internal/containers/24/onstop?target=stop" ip=@ user=
EROR[08-29|16:58:31] The stop hook failed                     err="Container is already running a start operation" container=caddy
EROR[08-29|16:58:31] Failed starting container                project=default name=caddy action=start created=2020-06-26T13:06:15-0600 ephemeral=false used=2020-08-28T14:54:43-0600 stateful=false
DBUG[08-29|16:58:31] Failure for task operation: cfc425f7-0447-4dea-9531-a7b38a513c5c: Failed to run: /usr/sbin/lxd forkstart caddy /var/lib/lxd/containers /var/log/lxd/caddy/lxc.conf:
DBUG[08-29|16:58:31] Event listener finished: d4128cd5-ff31-41e6-b4c0-e5d7f4fa349f
DBUG[08-29|16:58:31] Disconnected event listener: d4128cd5-ff31-41e6-b4c0-e5d7f4fa349f

This might be a duplicate of: LXD 4.4 -- container doesn't start unless apparmor is installed

Ah, that’s a bug that we fixed shortly after 4.4 was released.

I’d recommend upgrading to 4.5 if that’s available to you; otherwise, rebuild 4.4 with e88d0ea6392fb059a31faedc47c0d3fd77b5deaa applied, which is the fix for this issue.
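If you go the rebuild route, the rough shape of it would be something like this (untested sketch; adjust to however the Alpine package fetches and builds the LXD source, and check the exact tag name for the 4.4 release):

$ git clone https://github.com/lxc/lxd
$ cd lxd
$ git checkout lxd-4.4        # or whatever tag the 4.4 release uses
$ git cherry-pick e88d0ea6392fb059a31faedc47c0d3fd77b5deaa
$ make deps                   # prints the CGO_* environment variables to export
$ make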

As an emergency workaround, putting a script in /usr/local/bin/apparmor_parser that just outputs:

AppArmor parser version 0.0.0
Copyright (C) 1999-2008 Novell Inc.
Copyright 2009-2018 Canonical Ltd.

should get the broken code to behave.
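Something along these lines should do (untested sketch; make sure /usr/local/bin is in the daemon’s PATH and the script is executable):

$ cat /usr/local/bin/apparmor_parser
#!/bin/sh
# Fake apparmor_parser: only needs to satisfy LXD 4.4's "apparmor_parser --version" check.
cat << 'EOF'
AppArmor parser version 0.0.0
Copyright (C) 1999-2008 Novell Inc.
Copyright 2009-2018 Canonical Ltd.
EOF

$ sudo chmod +x /usr/local/bin/apparmor_parser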

Interesting. I might see if I can get some builds set up and try to help the Alpine package stay more up to date. I appreciate it.

To be fair, we released 4.5 yesterday :wink:

The fix for 4.4 was pushed to the upstream branch very shortly after the 4.4 release, but the only way a distro would pick up that kind of fix is if someone files a bug against the package and the maintainer includes it directly. Given that we release almost monthly, it’s not too surprising that this didn’t happen.