After lxd upgrade to 3.x: "Error: Failed container creation: No root device could be found."


(Brian Candler) #1

A few days ago I updated some Ubuntu 16.04 (with ZFS) boxes to lxd 3.0.1 from backports. The upgrade seemed to go OK: I saw it made a bunch of changes and got rid of my old /etc/default/lxd-bridge. Existing containers are still running. But now I find that I can’t create new containers:

root@nuc1:~# lxc launch ubuntu:16.04 snf-image
Creating snf-image
Error: Failed container creation: No root device could be found.

I’ve tried looking around, and don’t see anything obviously wrong:

root@nuc1:~# lxc image list
| ALIAS | FINGERPRINT  | PUBLIC |                 DESCRIPTION                 |  ARCH  |   SIZE   |         UPLOAD DATE         |
|       | f2228450779f | no     | ubuntu 16.04 LTS amd64 (release) (20180703) | x86_64 | 157.56MB | Jul 6, 2018 at 1:07pm (UTC) |

root@nuc1:~# lxc storage list
|  NAME   | DESCRIPTION | DRIVER | SOURCE  | USED BY |
| default |             | zfs    | zfs/lxd | 9       |

root@nuc1:~# lxc storage show default
config:
  source: zfs/lxd
  zfs.pool_name: zfs/lxd
description: ""
name: default
driver: zfs
used_by:
- /1.0/containers/xxxx   << snipped >>
status: Created
locations:
- none

root@nuc1:~# zfs list zfs/liblxd
zfs/liblxd   159M  39.2G   159M  /var/lib/lxd
root@nuc1:~# zfs list -r zfs/lxd
NAME                                                                                      USED  AVAIL  REFER  MOUNTPOINT
zfs/lxd                                                                                  16.5G  39.2G    96K  none
zfs/lxd/containers                                                                       15.8G  39.2G    96K  none
zfs/lxd/containers/builder                                                               15.3M
... goes on to list other containers, then images

I also saw that newer lxd/lxd-client packages have been released since then, so I’ve just upgraded to those too, but with the same result. “systemctl restart lxd” didn’t make a difference either.

Looking in /var/log/lxd/lxd.log:

lvl=warn msg="Failed to update instance types: Get lookup on read udp> i/o timeout" t=2018-07-06T14:25:56+0100
ephemeral=false lvl=info msg="Creating container" name=snf-image t=2018-07-06T14:25:58+0100
created=2018-07-06T14:25:58+0100 ephemeral=false lvl=info msg="Deleting container" name=snf-image t=2018-07-06T14:25:58+0100 used=1970-01-01T01:00:00+0100
created=2018-07-06T14:25:58+0100 ephemeral=false lvl=info msg="Deleted container" name=snf-image t=2018-07-06T14:25:58+0100 used=1970-01-01T01:00:00+0100
ephemeral=false lvl=eror msg="Failed creating container" name=snf-image t=2018-07-06T14:25:58+0100
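The log lines are logfmt-style key=value pairs, so `grep lvl=eror` already gets you most of the way. If you want to pull the warnings and errors out programmatically across several boxes, a minimal sketch follows. It assumes every value is either a bare token or a double-quoted string, and it only filters on the two levels that actually appear in this log (`warn` and `eror`).

```python
import re

# key=value where value is a double-quoted string (with backslash
# escapes) or a run of non-space characters.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_logfmt(line):
    """Parse one logfmt-style log line into a dict of key -> value."""
    fields = {}
    for key, value in _PAIR.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            # Drop the surrounding quotes and undo backslash escapes.
            value = re.sub(r'\\(.)', r'\1', value[1:-1])
        fields[key] = value
    return fields


def problems(lines, levels=("warn", "eror")):
    """Yield (level, msg, name) for entries at the given log levels."""
    for line in lines:
        f = parse_logfmt(line)
        if f.get("lvl") in levels:
            yield f["lvl"], f.get("msg", ""), f.get("name", "")
```

Fed the five lines above, `problems()` would return the instance-types warning and the final "Failed creating container" entry, skipping the info-level create/delete messages.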

I think the error about failing to resolve against (which happens when I do systemctl restart lxd) is a different problem to investigate separately: I can definitely resolve using dig @ Furthermore, if I run tcpdump while restarting lxd, I can see the queries being answered.

14:29:27.844268 IP (tos 0x0, ttl 64, id 3133, offset 0, flags [DF], proto UDP (17), length 72) > 39723+ AAAA? (44)
14:29:27.844812 IP (tos 0x0, ttl 64, id 3134, offset 0, flags [DF], proto UDP (17), length 72) > 1940+ A? (44)
14:29:27.846230 IP (tos 0x0, ttl 64, id 24796, offset 0, flags [none], proto UDP (17), length 603) > 39723 3/13/15 CNAME, AAAA 2001:67c:1562::41, AAAA 2001:67c:1560:8001::21 (575)
14:29:27.847933 IP (tos 0x0, ttl 64, id 24797, offset 0, flags [none], proto UDP (17), length 547) > 1940 3/13/13 CNAME, A, A (519)
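One detail in the capture that might be worth checking (a hypothesis, not a diagnosis): the two replies carry 575 and 519 bytes of DNS payload, the figures in the final parentheses. RFC 1035 limits DNS messages over UDP to 512 bytes unless the client has advertised a larger buffer via EDNS0 (RFC 6891), and a client that sizes its receive buffer to that classic limit would only see part of such a reply. Whether these particular queries advertised EDNS0 can't be told from this output. The sketch below sends a plain query with no EDNS0 record straight to a resolver, much like `dig @server`, and reports the reply size, TC (truncation) bit and section counts. The function names are illustrative, and it only handles an IPv4 resolver address.

```python
import random
import socket
import struct

# RFC 1035: DNS messages over UDP are limited to 512 bytes unless the
# client advertises a larger buffer with EDNS0 (RFC 6891).
CLASSIC_UDP_LIMIT = 512


def build_query(name, qtype=1, txid=None):
    """Build a minimal recursive DNS query with no EDNS0 OPT record."""
    if txid is None:
        txid = random.randrange(0x10000)
    # Header: ID, flags (only RD set), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A, 28 = AAAA) and QCLASS (1 = IN).
    return header + qname + struct.pack(">HH", qtype, 1)


def describe_reply(reply):
    """Summarise a reply: total size, TC bit, RCODE and section counts."""
    txid, flags, qd, an, ns, ar = struct.unpack(">HHHHHH", reply[:12])
    return {
        "id": txid,
        "size": len(reply),
        "truncated_flag": bool(flags & 0x0200),
        "rcode": flags & 0x000F,
        "counts": (an, ns, ar),
        "over_classic_limit": len(reply) > CLASSIC_UDP_LIMIT,
    }


def ask(server, name, qtype=1, timeout=2.0):
    """Send one query straight to an IPv4 resolver, like `dig @server`."""
    query = build_query(name, qtype)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(query, (server, 53))
        # Read with a large buffer so an oversized reply is not cut off.
        reply, _ = s.recvfrom(4096)
    info = describe_reply(reply)
    if info["id"] != struct.unpack(">H", query[:2])[0]:
        raise ValueError("reply ID does not match query ID")
    return info
```

For example, `ask("192.0.2.1", "example.com", qtype=28)`, with 192.0.2.1 standing in for the router's address. The `counts` tuple corresponds to tcpdump's answer/authority/additional figures (3/13/15 above). This only shows what the router sends back; it says nothing about how lxd handles it.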

But the primary problem is not being able to create containers, with “No root device could be found.”

Any suggestions for where else I can look?

Thanks … Brian.

apt history from initial upgrade:

Start-Date: 2018-07-01  22:20:03
Commandline: apt-get install -t xenial-backports lxd lxd-client python-pylxd
Install: xdelta3:amd64 (3.0.8-dfsg-1ubuntu2, automatic), liblxc-common:amd64 (3.0.1-0ubuntu1~16.04.1, automatic)
Upgrade: lxd:amd64 (2.0.11-0ubuntu1~16.04.4, 3.0.1-0ubuntu1~16.04.1), liblxc1:amd64 (2.0.8-0ubuntu1~16.04.2, 3.0.1-0ubuntu1~16.04.1), lxd-client:amd64 (2.0.11-0ubuntu1~16.04.4, 3.0.1-0ubuntu1~16.04.1), lxcfs:amd64 (2.0.8-0ubuntu1~16.04.2, 3.0.1-0ubuntu2~16.04.1)
Remove: lxc-common:amd64 (2.0.8-0ubuntu1~16.04.2)
End-Date: 2018-07-01  22:21:50

And subsequent upgrade:

Start-Date: 2018-07-06  14:15:36
Commandline: apt-get dist-upgrade
Upgrade: ... lxd:amd64 (3.0.1-0ubuntu1~16.04.1, 3.0.1-0ubuntu1~16.04.2), ... lxd-client:amd64 (3.0.1-0ubuntu1~16.04.1, 3.0.1-0ubuntu1~16.04.2), ...
End-Date: 2018-07-06  14:15:54
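The first history entry shows lxd going from 2.0.11 straight to 3.0.1, i.e. across the 2.15 storage changes mentioned in the follow-up post below. If you look after several boxes, a small script can flag which ones made that jump. This is a sketch only: the 2.15 boundary is taken from that post, and the version handling is deliberately simplistic (it keeps just the numeric upstream part and does not implement Debian's full version-comparison rules).

```python
import re

# One "pkg:arch (old, new)" entry from an apt history "Upgrade:" line.
_ENTRY = re.compile(r"([\w.+-]+):(\w+) \(([^,()]+), ([^,()]+)\)")


def upstream(version):
    """Reduce a Debian version like '2.0.11-0ubuntu1~16.04.4' to (2, 0, 11)."""
    # Drop any epoch ("1:"), the Debian revision ("-...") and "~" suffixes.
    core = version.split(":")[-1].split("-")[0].split("~")[0]
    return tuple(int(p) for p in re.findall(r"\d+", core))


def upgrades(line):
    """Map package name -> (old, new) for an apt history 'Upgrade:' line."""
    return {pkg: (old, new) for pkg, _arch, old, new in _ENTRY.findall(line)}


def crosses(old, new, boundary=(2, 15)):
    """True if an upgrade goes from below `boundary` to at or above it."""
    return upstream(old) < boundary <= upstream(new)
```

Run over the two entries above, `crosses()` is true for the 2018-07-01 lxd upgrade and false for the 3.0.1 packaging update on 2018-07-06.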

(Brian Candler) #2

Sorry for quick follow-up, but I think I’ve fixed it.

After finding and reading this post about changes to storage in lxd 2.15, I found that none of my profiles had a ‘root’ section: e.g.

root@nuc1:~# lxc profile show default
config:
  environment.http_proxy: ""
  user.network_mode: ""
description: Default LXD profile
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
name: default
used_by:
- <list of containers>

I edited it to add

  root:
    path: /
    pool: default
    type: disk

and now it appears to be happy. If I was supposed to read somewhere that I had to do that, unfortunately I missed it.
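The same device can also be added from the command line with `lxc profile device add default root disk path=/ pool=default`, which should be equivalent to the edit above. To check the other profiles, one option is to fetch each as JSON (for example with `lxc query /1.0/profiles/<name>`, assuming your client has the `query` subcommand) and look for a disk device whose path is `/`. A minimal sketch of that check follows; the function names are illustrative, and the device name `root` and pool `default` are simply the values used in this thread.

```python
import json


def root_disks(profile):
    """Names of disk devices mounted at / in one profile (a parsed dict)."""
    devices = profile.get("devices") or {}
    return [
        name
        for name, dev in devices.items()
        if dev.get("type") == "disk" and dev.get("path") == "/"
    ]


def root_device(pool="default"):
    """The device entry from the edit above, keyed by device name."""
    return {"root": {"path": "/", "pool": pool, "type": "disk"}}


def check(profile_json):
    """One-line report for a profile given as a JSON string."""
    profile = json.loads(profile_json)
    found = root_disks(profile)
    if found:
        return f"{profile['name']}: root device {found[0]!r} present"
    return f"{profile['name']}: no root disk device"
```

This looks at one profile at a time. A container can also get its root disk from another of its profiles or from its own local devices, so a "no root disk device" result for one profile is not conclusive for every container.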

(Brian Candler) #3

Update: the other problem, of resolving, is reproducible:

root@nuc1:~# lxc launch images:debian/jessie/amd64 snf-image-jessie
Creating snf-image-jessie
Error: Failed container creation: Get lookup on read udp> i/o timeout

However I believe it’s something to do with DNSSEC, because if I change my resolver to instead of the Mikrotik router, it works.

If so, I don’t know where in the stack this validation is being done. The Linux built-in resolver library doesn’t care, since it works when I’m using as my cache:

root@nuc1:~# ping
PING ( 56(84) bytes of data.
64 bytes from ( icmp_seq=1 ttl=54 time=10.3 ms
64 bytes from ( icmp_seq=2 ttl=54 time=10.3 ms
--- ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 10.353/10.357/10.361/0.004 ms

So I wonder if lxd itself is doing extra validation, or using a different resolver library (thinks: golang DNS resolver library, bypassing the system one?)
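On the golang question: Go's `net` package documents two resolvers on Unix. The pure-Go one reads /etc/resolv.conf itself and sends DNS queries directly to the nameservers listed there; the cgo one calls the C library (getaddrinfo and friends), which is the path ping goes through. Which one this lxd build uses is an assumption, not something established above. Go lets you force the choice through the GODEBUG environment variable (`GODEBUG=netdns=cgo` for the C library, `netdns=go` for pure Go, `netdns=go+1` or `netdns=cgo+1` to also log the decision), so setting that in lxd's environment might be a quick way to test the theory. The sketch below just puts the two views side by side: the nameservers a pure-Go resolver would query directly, and the answer the C library gives.

```python
import socket


def nameservers(resolv_conf_text):
    """Nameserver addresses from resolv.conf text, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # resolv.conf treats lines starting with '#' or ';' as comments.
        if not stripped or stripped[0] in "#;":
            continue
        parts = stripped.split()
        if parts[0] == "nameserver" and len(parts) >= 2:
            servers.append(parts[1])
    return servers


def system_lookup(host):
    """Addresses from the C library's getaddrinfo(), i.e. the NSS path."""
    infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}
```

For instance, `nameservers(open("/etc/resolv.conf").read())` shows which servers are in play for the pure-Go path, while `system_lookup()` on the image server's name should match what ping resolved. If the C library succeeds while a direct query to the listed nameserver misbehaves, that would point at the resolver difference rather than at DNSSEC.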