Container fails to launch/start when using cloud-init templating

I’ve been banging my head against this for days and I’m out of ideas for what to try. Rather than dump everything that may or may not be useful into the initial post, I’m going to sketch out the problem, hopefully get pointed in the right direction for troubleshooting, and then we can get specific.

The long and short of it is this: I have a CentOS 7.6 image I’m attempting to deploy on a CentOS 7.6 host via LXD 3.16. I’m using the two-tarball method, importing a rootfs tarball and a metadata tarball into LXD. The rootfs is a chroot install of a corporate CentOS 7.6 image, not one from the LXD images repo. I did not use distrobuilder to create the image, as I have only just discovered it, and tar’ing these up myself and importing them was working fine.
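For anyone following along, the metadata half of the two-tarball layout can be sketched roughly like this. The directory name, tarball names, and alias are placeholders rather than the ones actually used here, and the lxc image import line is left as a comment because it needs a running LXD.

```shell
# Sketch: pack metadata.yaml plus the templates/ directory into the
# metadata tarball. All paths here are hypothetical placeholders.
build_metadata_tarball() {
    src_dir=$1      # directory holding metadata.yaml and templates/
    out_file=$2     # tarball to create (an absolute path is safest)
    # -C keeps archive paths relative, so metadata.yaml sits at the top level
    tar -czf "$out_file" -C "$src_dir" metadata.yaml templates
}

# Usage (not run here):
#   build_metadata_tarball ./image-meta "$PWD/metadata.tar.gz"
#   lxc image import metadata.tar.gz rootfs.tar.gz --alias my-centos76
```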

I’m attempting to use cloud-init to set the network configs of the container interface since I don’t have a DHCP server available on this network. The cloud-init (18.2) RPM was installed into the image source file system.
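To make the goal concrete, a static network config in cloud-init's version 1 format might look like the sketch below. The addresses are documentation-range placeholders (192.0.2.0/24), not the real network, and the container name in the usage comment is made up. The user.network-config key is the one the network template reads.

```shell
# Sketch: write a static cloud-init network config (version 1 format).
# The addresses are documentation-range placeholders, not real values.
write_static_network_config() {
    out_file=$1
    cat > "$out_file" <<'EOF'
version: 1
config:
  - type: physical
    name: eth0
    subnets:
      - type: static
        address: 192.0.2.10/24
        gateway: 192.0.2.1
EOF
}

# Usage (not run here; the container name is a placeholder):
#   write_static_network_config network-config.yaml
#   lxc config set my-container user.network-config "$(cat network-config.yaml)"
```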

When I don’t mention the cloud-init templates in metadata.yaml, the container launches fine.
When I add the template stanzas in, the container refuses to launch or subsequently start. The log error is “No such file or directory” — but which?

The formatting of the templates section matches that of the examples. Each of the files I reference is in the templates directory, which is included in the metadata tarball, and I’ve confirmed via an lxc export on the target host that it is still there. What could it possibly be looking for that is not there? Which direction do I move in?
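The presence check described above can be scripted along these lines. This is a minimal sketch that assumes the metadata tarball has been unpacked into a directory and that each template: entry sits on its own line, as in the metadata.yaml in this thread. It only checks that the files exist; it says nothing about whether a template parses.

```shell
# Sketch: list every "template:" entry in metadata.yaml and report any
# that has no matching file under templates/. Returns non-zero if one is missing.
check_templates() {
    meta_dir=$1     # directory where the metadata tarball was unpacked
    missing=0
    for tpl in $(awk '$1 == "template:" { print $2 }' "$meta_dir/metadata.yaml"); do
        if [ ! -f "$meta_dir/templates/$tpl" ]; then
            echo "missing: templates/$tpl"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (hypothetical directory name):
#   check_templates ./image-meta
```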

Thanks,
Shawn

# lxc info --show-log local:webfe-temp3
Name: webfe-temp3
Location: none
Remote: unix://
Architecture: x86_64
Created: 2019/09/04 21:59 UTC
Status: Stopped
Type: persistent
Profiles: default, webfe

Log:

lxc webfe-temp3 20190904215929.107 ERROR    conf - conf.c:run_buffer:352 - Script exited with status 1
lxc webfe-temp3 20190904215929.107 ERROR    start - start.c:lxc_init:887 - Failed to run lxc.hook.pre-start for container "webfe-temp3"
lxc webfe-temp3 20190904215929.107 ERROR    start - start.c:__lxc_start:1991 - Failed to initialize container "webfe-temp3"
lxc webfe-temp3 20190904215959.130 ERROR    conf - conf.c:run_buffer:352 - Script exited with status 1
lxc webfe-temp3 20190904215959.130 ERROR    start - start.c:lxc_fini:1024 - Failed to run "lxc.hook.stop" hook
lxc webfe-temp3 20190904215959.133 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:864 - No such file or directory - Failed to receive the container state

Can you show your metadata.yaml?

creation_date: 1567633420
properties:
 architecture: "x86_64"
 description: "CentOS 7.6.1810 "
 os: "centos"
 release: "7.6.1810"
public: yes
aliases: circonus_container
templates:
  /etc/hostname:
    when:
    - create
    - copy
    template: hostname.tpl
  /var/lib/cloud/seed/nocloud-net/meta-data:
    when:
    - create
    - copy
    template: cloud-init-meta.tpl
  /var/lib/cloud/seed/nocloud-net/network-config:
    when:
    - create
    - copy
    template: cloud-init-network.tpl
  /var/lib/cloud/seed/nocloud-net/user-data:
    when:
    - create
    - copy
    template: cloud-init-user.tpl
    properties:
      default: |
        #cloud-config
        {}
  /var/lib/cloud/seed/nocloud-net/vendor-data:
    when:
    - create
    - copy
    template: cloud-init-vendor.tpl
    properties:
      default: |
        #cloud-config
        {}

I’m not seeing anything obvious that would explain it.

Are you getting any more details if running lxc monitor while starting your container?

If not, you may want to run strace -f -o trace -p PID-OF-LXD and look at trace for what path is causing the error.
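Once a trace file exists, something like the following sketch can narrow it down to the paths that failed with "No such file or directory". It assumes strace's default output format, where the path is the first quoted string on the line; the helper name is made up for illustration.

```shell
# Sketch: pull the paths that failed with ENOENT out of an strace log.
# Assumes the default strace line format (path is the first quoted string).
enoent_paths() {
    trace_file=$1
    grep 'ENOENT' "$trace_file" | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' | sort -u
}

# Usage (not run here):
#   lxc monitor --type=logging          # in one terminal, while starting the container
#   strace -f -o trace -p PID-OF-LXD    # in another; stop with Ctrl-C
#   enoent_paths trace
```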

Monitor? Dang, I completely missed that. You mean something like this?

location: none
metadata:
  context:
    container: webfe-temp-netonly
    err: |-
      Failed to render template: [Error (where: parser) in <string> | Line 1 Col 33 near '\_get("user.network-config", "") == "" %}version: 1
      config:
          - type: physical
            name: eth0
            subnets:
                - type: '] If-condition is malformed.
  level: eror
  message: The start hook failed
timestamp: "2019-09-06T02:42:48.065698689Z"
type: logging

$ cat cloud-init-network.tpl
{% if config\_get("user.network-config", "") == "" %}version: 1
config:
    - type: physical
      name: eth0
      subnets:
          - type: {% if config_get("user.network_mode", "") == "link-local" %}manual{% else %}dhcp{% endif %}
            control: auto{% else %}{{ config_get("user.network-config", "") }}{% endif %}

It’s that config\_get, isn’t it? And at == "" %}version: 1, should there be a newline after the %}?

That was it, thank you so much. Always double-check what you copy in from another source, kids :slight_smile:
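For anyone who lands here with the same error: the parser message points at \_get, so the sketch below assumes the stray backslash (a markdown escape picked up when copying from a rendered page) was the whole problem. With it removed, the first line of the template reads {% if config_get("user.network-config", "") == "" %}version: 1. The helper names are made up for illustration.

```shell
# Sketch: flag markdown-style escaped underscores ("\_") in a template file.
find_escaped_underscores() {
    # -n prints line numbers; -F treats the pattern as a literal string
    grep -nF '\_' "$1"
}

# Strip the stray backslashes in place (keeps a .bak copy of the original).
fix_escaped_underscores() {
    sed -i.bak 's/\\_/_/g' "$1"
}

# Usage:
#   find_escaped_underscores templates/cloud-init-network.tpl
#   fix_escaped_underscores templates/cloud-init-network.tpl
```

Running the first helper over every file in templates/ before rebuilding the metadata tarball is a cheap way to catch this class of copy-and-paste damage.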