Configuring host and container networking in 18.04

I’ve recently acquired a new server from OVH and installed 18.04 on it.
I installed LXD via snap and enabled the LXD bridge lxdbr0 during lxd init.

The host config is this:

  network:
    version: 2
    renderer: networkd
    ethernets:
      enp1s0:
        dhcp4: false
        addresses:
          - 66.70.XXX.222/32
          - 198.50.XXX.40/32
          - 198.50.XXX.41/32
          - 2607:5300:XXXX:42de::0/64
          - 2607:5300:XXXX:42de::1/64
        gateway4: 66.70.XXX.254
        gateway6: 2607:5300:XXXX:42ff:ff:ff:ff:ff
        nameservers:
          addresses: [8.8.8.8, 8.8.4.4]
        routes:
          - to: 66.70.XXX.254/32
            via: 0.0.0.0
            scope: link
          - to: 2607:5300:XXXX:42ff:ff:ff:ff:ff/64
            scope: link

Following guides from OVH, and from these forums, I have been unable to get the network working.
With every guide I've tried, the external IP address is pingable from inside the host machine, but not from the internet.
I have made sure the container has the MAC address assigned by OVH to the failover IP I am trying to use; it is configured both in netplan and in the LXD config.
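
For reference, the LXD side of that is roughly the following (the container name c1 is just an example, and lxc config device override may need a reasonably recent LXD; the MAC is the virtual MAC OVH generated for the failover IP):

  # pin the OVH virtual MAC on the container's NIC, then restart it
  lxc config device override c1 eth0 hwaddr=02:00:00:XX:71:18
  lxc restart c1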

If I swap the host machine's config to place the IPs on lxdbr0, the external IP to the VM works!
But then the host machine has no external IPs.

  network:
    version: 2
    renderer: networkd
    ethernets:
      enp1s0:
        dhcp4: false
    bridges:
      lxdbr0:
        interfaces: [ enp1s0 ]
        addresses:
          - 66.70.XXX.222/32
          - 198.50.XXX.40/32
          - 198.50.XXX.41/32
          - 2607:5300:XXXX:42de::0/64
          - 2607:5300:XXXX:42de::1/64
        gateway4: 66.70.XXX.254
        gateway6: 2607:5300:XXXX:42ff:ff:ff:ff:ff
        macaddress: 00:1b:21:dc:5c:e2
        nameservers:
          addresses: [8.8.8.8, 8.8.4.4]
        routes:
          - to: 66.70.XXX.254/32
            via: 0.0.0.0
            scope: link
          - to: 2607:5300:XXXX:42ff:ff:ff:ff:ff/64
            scope: link

Nothing I try has worked.
I have also tried putting the network on a different bridge; the bridge itself comes up fine, but it results in the same issues as putting the network on the main NIC.

To replicate:

  • Launch an 18.04 server
  • Configure the HOST network for 1 failover IP on the default NIC
  • Configure the HOST network for 1 main IP on the default NIC
  • Launch a VM with LXD
  • Assign a MAC address to the Failover IP you wish to use with OVH
  • Configure the VM network for 1 failover IP on the default NIC
  • Try to ping the Failover IP from the internet and get no response

Any help would be appreciated

Through a lot of painful debugging, I’ve managed to get it all working… Temporarily.

If I follow the configuration of setting lxdbr0 up in netplan, and then leaving it managed in lxd…

  • My VMs have internet access and their external IPs work.
  • The VMs can talk to the host machine.
  • The host machine has no network

If I log into the host machine (by going through the VM, or via physical access) and then run netplan apply with the original config, the network on both the host and the VM works as expected.

So:

  • Start host machine with lxdbr0 configured in netplan
    • Have netplan bring the network up
    • LXD starts and modifies the lxdbr0 bridge
  • Log in to the machine and run netplan apply again
    • This overwrites part of LXD’s modifications to lxdbr0 but leaves others

Whenever anything changes, the network randomly dies again.
I’m unsure how to make this persistent, but I’m fairly certain it has to do with configuration in LXD.
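
In case it helps anyone reproduce this, the before/after state of the bridge can be compared with plain iproute2 commands, nothing LXD-specific:

  ip addr show lxdbr0        # addresses currently on the bridge
  ip route show dev lxdbr0   # routes attached to it
  bridge link show           # which interfaces are enslaved to which bridge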

I have roughly the same netplan config as you and it’s working very well for me, though I add the following to my bridge:

parameters:
    forward-delay: 0
    stp: false

I didn’t want spanning tree running on it, so that part was clear to me. I don’t recall why I set forward-delay; it was likely just a setting I saw somewhere, and it works, so I don’t touch it.
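
For reference, those settings sit under the bridge definition in netplan; mine looks roughly like this (addresses, gateways and routes omitted, and the NIC name is just an example):

  network:
    version: 2
    renderer: networkd
    ethernets:
      enp1s0:
        dhcp4: false
    bridges:
      lxdbr0:
        interfaces: [ enp1s0 ]
        parameters:
          forward-delay: 0
          stp: false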

One question I would have is whether LXD thinks it’s supposed to manage this bridge. If you have it in netplan, I think you want it unmanaged and just let the system do it. What does:
lxc network show lxdbr0
say?
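
If it does show up as managed, I believe you can at least stop LXD from putting its own addresses and NAT on it with something like the following (config keys from memory, so double-check against the LXD docs):

  lxc network set lxdbr0 ipv4.address none
  lxc network set lxdbr0 ipv6.address none
  lxc network set lxdbr0 ipv4.nat false
  lxc network set lxdbr0 ipv6.nat false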

When I configure lxdbr0 to be unmanaged by LXD, the network just doesn’t work at all for the VMs, though it works as expected for the host.

Obviously, when it’s managed, LXD deletes all my address records and continues on merrily, making the VMs work but not the host.

I will try the forward delay and stp.

Weird; on all 4 of my servers I leave it unmanaged. Some containers use DHCP, others use static addresses, and they can all get out. Now, we don’t have the number of IPs you do with failovers, at least not on those hosts, so I guess that could be part of it.

How are the container networks defined? Via netplan as well or something else?

The machines can usually get out; that’s not the problem. The issue is connections coming in: the external failover IP I am assigning, XXX.42, can’t be pinged from outside the machine.

I’ve tried simply using cloud-init, I’ve tried OVH’s via-0.0.0.0 style, and the standard netplan config for an IP.

I’ll post the two here shortly.

OVH to Gateway:

  network:
    version: 2
    renderer: networkd
    ethernets:
      eth0:
        dhcp4: yes
        dhcp6: yes
        addresses:
          - 198.50.XXX.42/32
        gateway4: 66.70.XXX.254
        macaddress: 02:00:00:XX:71:18
        nameservers:
          addresses: [8.8.8.8, 8.8.4.4]
        routes:
          - to: 66.70.XXX.254/32
            via: 0.0.0.0
            scope: link

OVH to 0.0.0.0:

network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      dhcp4: true
      dhcp6: true
      addresses:
        - 198.50.XXX.42/32
      macaddress: 02:00:00:XX:71:18
      nameservers:
        addresses: [8.8.8.8, 8.8.4.4]
      routes:
        - to: 0.0.0.0/0
          via: 66.70.XXX.254
          on-link: true

These are the two styles OVH suggests; then obviously I also do the cloud-init as mentioned. It works, but not for external IPs.
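
To be clear, the cloud-init route just means handing one of those netplan documents to the container through LXD, roughly like this (the container name c1 and file name are placeholders, and the image needs cloud-init):

  lxc launch ubuntu:18.04 c1 -c user.network-config="$(cat eth0-netplan.yaml)"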

I’m completely out of ideas, other than: have you looked at the ARP tables outside the machine (and maybe on the host), and run tcpdump to see whether the pings are at least making it to the host, or not even getting there?
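
Something along these lines on the host would show whether the ARP and ICMP traffic is arriving at all (interface name and IP adjusted to yours):

  sudo tcpdump -ni enp1s0 'arp or (icmp and host 198.50.XXX.42)'
  ip neigh show    # the host's own ARP/neighbour table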

I’m not really sure how to check the ARP tables outside the machine, but…

~$ tcpdump host 198.50.XXX.42 and port not 22 -n -s 0 -vvv
tcpdump: listening on enp1s0, link-type EN10MB (Ethernet), capture size 262144 bytes
21:28:22.991354 IP (tos 0x0, ttl 119, id 55667, offset 0, flags [none], proto ICMP (1), length 60)
    66.46.XXX.34 > 198.50.XXX.42: ICMP echo request, id 62464, seq 55827, length 40

So the pings are making it to the host, but they don’t go any further than that.

ARP in the VM:

arp -a
? (66.70.XXX.254) at <incomplete> on eth0

ARP in the Host:

arp -a
? (66.70.XXX.254) at 00:f1:04:06:ff:ff [ether] on lxdbr0
? (66.70.XXX.252) at 18:8b:9d:e6:65:75 [ether] on lxdbr0
? (66.70.XXX.253) at cc:46:d6:64:75:fb [ether] on lxdbr0