Wrong interface assignment when creating containers with Terraform and multiple interfaces

Hi. Before posting this in the terraform-lxd provider repository, I thought I'd check here first.

My LXD host is set up with two bridge interfaces (bridge-utils), and I configure a profile that attaches both of them to containers, so each container gets two interfaces as well. I do not use DHCP; my network configuration is static. I believe it is applied via cloud-init (I'm not certain of the mechanism, but it works).

See my terraform code below for reference.

Half of the time the container network comes up wrong (it rarely happens when creating a single container, but almost always happens when I create around 8 at once). The container's internal interface gets attached to the wrong host interface: if I ping something from within the container and watch the traffic with tcpdump, I can see it leaving via the wrong host interface.

For example, here is the relevant part of the output of lxc config edit <container>:

config:
  ...
  volatile.eth0.host_name: vetheyxxxyyy
  volatile.eth0.hwaddr: 00:1x:xx:xx:xx:yy
  volatile.eth0.name: eth1
  volatile.eth1.host_name: veth6xxx6xxx
  volatile.eth1.hwaddr: 00:1x:xx:xx:xx:xx
  volatile.eth1.name: eth0

I have to fix these two lines and restart the container, after which the network works as expected:

volatile.eth0.name: eth0
volatile.eth1.name: eth1
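For reference, the manual fix can be sketched roughly as below (container name is illustrative; I'm assuming LXD allows setting volatile.* keys while the instance is stopped — if lxc config set refuses them, lxc config edit achieves the same):

```shell
# Stop the container before touching its volatile config.
lxc stop mycontainer1

# Swap the interface names back so eth0/eth1 match the profile devices.
lxc config set mycontainer1 volatile.eth0.name eth0
lxc config set mycontainer1 volatile.eth1.name eth1

# Start it again; the interfaces should now map to the right bridges.
lxc start mycontainer1
```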

Looks like a race condition to me. Should I report this on the terraform-lxd GitHub, or is it something for you guys to look at?

My profile:

$ lxc profile show myprofile
config: {}
description: Created by Terraform
devices:
  eth0:
    nictype: bridged
    parent: br0
    type: nic
  eth1:
    nictype: bridged
    parent: br1
    type: nic

Here is some of my Terraform code:

# this is modules/container/main.tf, scroll down for network.yml template
resource "lxd_container" "container" {
  name = var.name
  image = var.image
  ephemeral = var.ephemeral
  profiles = var.profiles

  config = {
    "user.network-config" : data.template_file.network_config.rendered
  }
}

data "template_file" "network_config" {
  template = file("${path.module}/network.yml")
  vars = {
    eth0_ip = var.eth0_ip
    eth0_netmask = var.eth0_netmask
    eth0_gateway = var.eth0_gateway
    eth1_ip = var.eth1_ip
    eth1_netmask = var.eth1_netmask
    nameserver = var.nameserver
  }
}

# this is modules/container/network.yml
network:
  version: 1
  config:
    - type: physical
      name: eth0
      subnets:
        - type: static
          ipv4: true
          address: ${eth0_ip}
          netmask: ${eth0_netmask}
          gateway: ${eth0_gateway}
          control: auto
    - type: physical
      name: eth1
      subnets:
        - type: static
          ipv4: true
          address: ${eth1_ip}
          netmask: ${eth1_netmask}
          control: auto
# this is the project root
module "mycontainer1" {
  source = "./modules/container"
  name = "mycontainer1"
  image = "images:debian/buster/cloud/amd64"
  profiles = ["myprofile"]

  eth0_ip = "xxxxx"
  eth0_netmask = "255.255.255.0"
  eth0_gateway = "xxxxxxx"
  eth1_ip = "xxxxxxx"
  eth1_netmask = "255.255.255.0"
  nameserver = "xxxxxxx"
}

Feel free to ask any questions. The code above was sanitized a bit, so there may be a few copy/paste mistakes.

Edit 1: there is another LXD/Terraform glitch: when you use modules, Terraform fails to download the LXD provider binaries. I had to declare the provider inside the module itself and pass variables into the module. If anyone testing this with Terraform hits the same issue, let me know and I will help.
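To illustrate what I mean by declaring the provider inside the module, a minimal sketch (file name is illustrative; check the terraform-lxd registry page for the exact source address and a version constraint):

```hcl
# modules/container/versions.tf (illustrative)
terraform {
  required_providers {
    lxd = {
      source = "terraform-lxd/lxd"
    }
  }
}
```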

Is Terraform creating these instances concurrently? If so, it could be a bug in LXD’s interface assignment logic.

Please could you open an issue here: Issues · lxc/lxd · GitHub

Thanks

Is Terraform creating these instances concurrently?

I believe so, as all of the containers appear in the lxc ls output at once.
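(As a stopgap while the race is unfixed, serializing resource creation should avoid it, at the cost of slower applies. Terraform runs up to 10 concurrent operations by default; this limits it to one:)

```shell
# Create resources one at a time instead of the default 10 concurrent operations.
terraform apply -parallelism=1
```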

Thanks @tomp. What information in particular from my post should I include in the bug report?

Everything you mentioned here basically. That way it won’t get lost in the forum :slight_smile:


Thanks so much, that was very fast. Looking forward to trying it.


@tomp
if you don’t mind me asking, when will this fix be published in the downloadable snap version? I noticed that 5.0 got updated on 20 April (yesterday), but no release was published on GitHub. I refreshed LXD from the snap and tried deploying containers, but the problem persists, so I assume the fix has not been included in a release yet. Just wondering.

Thank you in advance.

The next time @stgraber cherry-picks and pushes the LXD 5.0 snap package, it will be available a few hours after that.
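In the meantime you can check which snap revision you have installed and refresh manually once the rebuilt package lands (the channel shown by snap info is whichever one you are tracking):

```shell
# Show the installed revision and the revisions available per channel.
snap info lxd

# Pull the latest build from the tracked channel once it is published.
sudo snap refresh lxd
```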

You can keep an eye out for the cherry-pick of the commit in:
