Cloud-init is blocked by snapd in the new ubuntu images

ykazakov · April 16, 2019, 5:23pm

I want to automate creation of lxd container based on ubuntu images, with custom settings and scripts. Naturally, I tried using the cloud-init settings in lxd profiles, which worked well for ubuntu:18.04.

Unfortunately, with the newer images, ubuntu:18.10, ubuntu-daily:18.10, ubuntu-daily:19.04, cloud-init does no longer work (which can be verified, e.g., using these steps).

After some investigations, I figured that the cloud-init services are blocked by snapd.seeded.service

$ systemctl list-jobs
JOB UNIT                                 TYPE  STATE  
122 cloud-config.service                 start waiting
107 snapd.autoimport.service             start waiting
  2 multi-user.target                    start waiting
121 cloud-init.target                    start waiting
  1 graphical.target                     start waiting
127 cloud-final.service                  start waiting
 86 systemd-update-utmp-runlevel.service start waiting
105 snapd.seeded.service                 start running

$ less /lib/systemd/system/cloud-config.service
[Unit]
Description=Apply the settings specified in cloud-config
After=network-online.target cloud-config.target
After=snapd.seeded.service
Wants=network-online.target cloud-config.target

[Service]
Type=oneshot
ExecStart=/usr/bin/cloud-init modules --mode=config
RemainAfterExit=yes
TimeoutSec=0

# Output needs to appear in instance console output
StandardOutput=journal+console

[Install]
WantedBy=cloud-init.target

Maybe the issue is caused by this bug, I am not sure. I use the latest version of snap (within the container) for which this bug should have been fixed:

$ snap version
snap    2.38+19.04
snapd   2.38+19.04
series  16
ubuntu  19.04
kernel  4.14.98-v7+

Or may be it is because snap does not work well in unprivileged containers? Is there a workaround? I could manually remove snapd from the container, but I really like to automate this using profiles.

simos · April 16, 2019, 6:27pm

My first view is that it’s not snap that may block cloud-init but rather that the snap.seeded target does not complete in order for cloud-init to take over and do its work.

simos · April 16, 2019, 6:41pm

I just tested cloud-init on a Ubuntu:18.10 container and it worked for me.
Do you want to write a working minimal test to try as well?

gpatel-fr · April 16, 2019, 7:34pm

according to this, the snap.seeded is just a flag used to signal that snapd does not work yet.
From a few threads on askubuntu it seems that sometimes snaps does not initialize correctly first time and it’s needed to either restart the computer or even install the hello world snap so that the core installation finishes. Never happened to me but not unthinkable nonetheless.

If the OP don’t need snapd in his containers, he could just create a new image with a container from which snapd had been uninstalled.

ykazakov · April 16, 2019, 8:01pm

My host is Raspbian Stretch on RPi 3 B+.
I just tried on another machine (Ubuntu 18.04 i386) with ubuntu-daily:19.04, and surprisingly, everything went fine.

$ snap version
snap    2.38+19.04
snapd   2.38+19.04
series  16
ubuntu  19.04
kernel  4.15.0-45-generic

So it could be a platform-specific issue. Maybe the kernel on RPi is too old and/or lack some features? Any hints how to diagnose?

Creating a new image with uninstalled snapd could be a good idea. Would cloud-init work in such an image? In my understanding it runs only once when the container has been created, and this has already happened in for the created image.

gpatel-fr · April 16, 2019, 8:19pm

I don’t see why it would not run again for a new container created from an image.

ykazakov · April 16, 2019, 9:35pm

I never created images before, so I don’t now what part of a container is saved in the images.
cloud-init should somehow remember that it already has run so that it dos not generate, e.g., new ssh keys on every reboot. Would this information be saved in the image?

I just tried creating an image from a container with the removed snapd

lxc publish container-without-snapd --alias image-without-snapd

and created a container out of that with cloud-init settings

lxc launch image-without-snapd test-no-snapd -c user.user-data="$(cat cloud-init.sh)"

with

$ cat cloud-init.sh 
#!/bin/bash

touch /root/init-got-run

that seem to have worked indeed!

lxc exec test-no-snapd -- sh -c "ls -la /root"
total 8
drwx------ 1 root root   70 Apr 16 20:58 .
drwxr-xr-x 1 root root  162 Apr 15 03:06 ..
-rw-r--r-- 1 root root 3106 Aug  6  2018 .bashrc
-rw-r--r-- 1 root root  148 Aug  6  2018 .profile
drwx------ 1 root root   30 Apr 16 19:39 .ssh
-rw-r--r-- 1 root root    0 Apr 16 20:58 init-got-run
drwxr-xr-x 1 root root    6 Apr 16 19:40 snap

but, interestingly, it looks like cloud-init has 2 boot records:

$ lxc exec test-no-snapd -- cloud-init analyze show
...
2 boot records analyzed

Are log files saved in images?

I created another container out of the same image, and at least they seem to have different generated ssh keys and mac addresses. The log file /var/log/cloud-init-output.log starts the same for both of the containers (which part is probably taken from the parent container from which the image was built) but then they become different. Ssh keys and mac addresses seem to be generated two times according to these logs. Anyway, it looks indeed that cloud-init for images created out of containers works as expected.

gpatel-fr · April 16, 2019, 9:39pm

everything is saved, you have to clean up yourself. If you have personal files, unencrypted password files and credit card account numbers in a container, it’s best to remove them before turning it into an image for distribution on the internet.

simos · April 17, 2019, 9:43am

Note that cloud-init is supposed to run once. When you create a new container image from an existing container, you need to clean up the cloud-init files so that cloud-init runs once in the new container as well.

ykazakov · April 17, 2019, 9:57am

I did not clean up any files, but it seems that cloud-init still did its job. It ran all initialisation scripts (modules) to set up ssh keys, network addresses, etc. It ran my user-data script. And it did not repeat that upon the next reboot. It is unclear to me how cloud-init has detected that the new container is different from the container from which the image was created, if they suppose to have the same files.

gpatel-fr · April 17, 2019, 1:11pm

the service itself is certainly not disabled once it has run, it’s easy to see (by looking at syslog) it runs again when the container is restarted.
now in /var/log/cloud-init.log

2019-04-17 12:20:08,859 - util.py[DEBUG]: Read 6 bytes from /var/lib/cloud/data/instance-id
2019-04-17 12:20:08,859 - stages.py[DEBUG]: previous iid found to be test1
2019-04-17 12:20:08,860 - util.py[DEBUG]: Writing to /var/lib/cloud/data/instance-id - wb: [644] 6 bytes
2019-04-17 12:20:08,862 - util.py[DEBUG]: Writing to /run/cloud-init/.instance-id - wb: [644] 6 bytes
2019-04-17 12:20:08,863 - util.py[DEBUG]: Writing to /var/lib/cloud/data/previous-instance-id - wb: [644] 6 bytes
2019-04-17 12:20:08,867 - util.py[DEBUG]: Writing to /var/lib/cloud/instance/obj.pkl - wb: [400] 6526 bytes
2019-04-17 12:20:08,870 - main.py[DEBUG]: [net] init will now be targeting instance id: test1. new=False

so it seems that it relies simply on the host name. That’s probably why you can’t change it from lxd.
So i’d guess that if you delete or rename your original container, and create a new container from the published image with the same name as the original container, cloud-image will not run.

ykazakov · April 17, 2019, 1:41pm

Indeed the hostname does not get renamed when renaming the container. Then why/how it gets renamed when the container is created from an image? Presumably /etc/hostname is already present in the image if it is created from an old container.

gpatel-fr · April 17, 2019, 10:07pm

Looking a bit at cloudinit, it runs indeed at each container start, and decides what to do according to some rules. Nothing is very clear for me ATM, but it seems that the full new container handling is decided on detection of change of the ‘iid’. Is the iid just the container name, or something else I’m not sure. i have seen also that the network handling by cloud init runs indeed for a new container, but it can also run for some network configuration changes passed by the host.
Whatever the means the host use to pass the init files (I have seen references to /run in the log), the handling by cloudinit has nothing magical, it’s a bunch of json files that are read as any old files.
For anyone interested, the handling is in python modules under dist-packages/cloudinit.