Routed network not working on newly created Incus containers (but working correctly on LXD migrated containers)

Hello

I have just migrated from LXD to Incus.
I have one LXD container that was migrated correctly to Incus: let’s call it LXDcontainer.
This container has two profiles, default and routed, with the following network configurations:

  • default :
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  • routed :
config:
  user.network-config: |
    version: 2
    ethernets:
        eth0:
            addresses:
            - my_static_ip/32
            nameservers:
                addresses:
                - my_dns_ip
                search: []
            routes:
            -   to: 0.0.0.0/0
                via: 169.254.0.1
                on-link: true
description: Routed profile
devices:
  eth0:
    ipv4.address: my_static_ip
    nictype: routed
    parent: ens3
    type: nic

LXDcontainer seems to work fine, and in particular I can access internet from the container.

I then created an Incus container: let’s call it INCUScontainer.
I added the exact same profiles to INCUScontainer, but now there seems to be no internet access from within the container. The container can be reached from the internet, though (using the IP address my_static_ip specified in routed).

If I remove the second profile (routed), I can then access the web from the container (but as expected, I can no longer access my container from the web using my_static_ip).

This really bugs me. I don’t understand why the routed profile causes internet connection troubles from within the container only for newly created containers, and not for the old LXD container.

Any idea how I can identify what is causing the issue?

Hi!

The routed profile is supposed to contain hard-coded IP addresses from your LAN.
That means you need a separate profile per container. Do you do that?

Because if you use the same routed-type profile for two containers, then you have an IP conflict at play.

Thanks for your reply @simos

Every time I run tests with one of the containers (LXDcontainer or INCUScontainer), I make sure the other one is stopped. In any case, when both are running and using the same routed profile, Incus indeed raises an error saying that the same IP cannot be assigned to two distinct containers.

It should not be a problem to use the same routed profile as long as two or more containers are never running concurrently with it, right?

Just in case, I removed the routed profile from all containers except INCUScontainer. But the problem is still there.

I did the same, but with LXDcontainer, and it works.

Any idea what I should check? I tried to compare the configurations of the two containers, but apart from the profiles I don’t know what else could impact the containers’ connectivity.

How do you test that you cannot access the Internet from within the Incus container? Do you run ping www.google.com? And does ping 8.8.8.8 work? Because if it does, then the issue is related to DNS while the rest of the networking is fine.
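To make that triage repeatable, here is a minimal Python sketch (a hypothetical helper, not part of Incus; 8.8.8.8 and www.google.com are just common public probes) that separates raw IP connectivity from name resolution:

```python
import socket

def triage_network(probe_ip="8.8.8.8", probe_port=53, probe_host="www.google.com"):
    """Distinguish 'no connectivity at all' from 'DNS-only failure'."""
    try:
        # TCP connect to a raw IP: succeeds even when DNS is broken.
        socket.create_connection((probe_ip, probe_port), timeout=3).close()
        ip_ok = True
    except OSError:
        ip_ok = False
    try:
        # Name resolution only; fails with 'Temporary failure in name
        # resolution' semantics when DNS is the problem.
        socket.gethostbyname(probe_host)
        dns_ok = True
    except OSError:
        dns_ok = False
    if ip_ok and not dns_ok:
        return "DNS problem, routing fine"
    if not ip_ok:
        return "no raw IP connectivity"
    return "both OK"

print(triage_network())
```

Run inside the container: "DNS problem, routing fine" corresponds exactly to the situation described later in this thread (ping 8.8.8.8 works, ping google.com fails).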

Each instance (whether a container or a VM) has a unique MAC address, from a range of MAC addresses reserved for instances. Your router on your LAN sees those MAC addresses. In most cases it does not pay attention, but if your router is one of those strict routers, it may not like the same IP address occasionally arriving from a different MAC address. If that is the case, you can switch the MAC address of the Incus container to that of the old container. Not the proudest networking tip, but it could be a workaround.

Of course, you can create a separate routed profile with a different static IP address to verify whether your router is really doing this to you.

For some reason ping is not working even on my host (and so not in my containers either). I am not sure, but I think this started when I upgraded my host OS from Ubuntu 22.04 LTS to Ubuntu 24.04 LTS (right before moving to Incus).

But anyway, to test the connection I use wget, curl, or nc. It does not seem to be a DNS issue.

I’ll try your solution

My bad, the ping issue was related to the external firewall configured at my cloud provider (OVH).

Actually, after I also disabled the external firewall specific to my_static_ip, I am now able to ping 8.8.8.8 from within the container. But ping google.com fails with Temporary failure in name resolution, so it is indeed a DNS issue.

Strangely, everything still works fine with LXDcontainer… the DNS configuration is the same (defined in routed).

Do you think the DNS issue is related to the cloud provider as well?

I just tried swapping the MAC addresses of the two containers, but it does not solve the problem.

If this Incus installation is on a cloud provider, and you are using routed to get an instance to receive a public IP address, then you are likely hitting on restrictions set up by your cloud provider.

Most cloud providers require you to specify the MAC address for the extra public IPv4 address you have requested. Is that a mainstream cloud provider?

If you get DNS errors, those are easy to diagnose using tshark. You look at the packets as they travel between the container and the host, and as they arrive from the Internet.

Yes, this is what I am doing, but I do not recall encountering such issues with LXD on the exact same server (I struggled a bit at first, before @tomp redirected me to your blog post, which was really helpful by the way). Maybe something has changed on my cloud provider or in my setup.

Yes, it is OVH. The my_dns_ip I use in the routed config is OVH’s (I double-checked that it is still the same using resolvectl status).

Ok thanks for the tip, I’ll investigate further

I have found one way to fix the problem:

In the file /etc/resolv.conf of INCUScontainer, when I change the default nameserver from 127.0.0.53 to my_dns_ip, the problem is solved: no DNS issue when I ping google.com (or any other domain) from within the container.

Surprisingly, the same file in LXDcontainer has nameserver 127.0.0.53 (and not nameserver my_dns_ip) and everything works fine.

I am wondering:

  • what could cause such a different behaviour on the Incus network?
  • is the file /etc/resolv.conf of containers supposed to be modified? Shouldn’t it be based on the Incus configuration (the routed profile in this case)?

The nameserver 127.0.0.53 indicates that this is a stub resolver. This means we do not know what really took place, because we are only seeing the front end, which looks the same everywhere.

systemd-resolved is used here. Run resolvectl to view the actual settings and compare between the two runtimes.
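A quick way to spot the stub-resolver situation programmatically — a minimal sketch (the helper names are mine, and the sample content is made up for illustration) that parses resolv.conf text and flags when it points at systemd-resolved’s 127.0.0.53 stub:

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf content."""
    servers = []
    for line in resolv_conf_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers

def uses_systemd_stub(resolv_conf_text):
    """True when the file only shows systemd-resolved's local stub,
    i.e. the real upstream servers must be read with resolvectl."""
    return nameservers(resolv_conf_text) == ["127.0.0.53"]

# Illustrative content, like what both containers in this thread showed:
sample = "nameserver 127.0.0.53\noptions edns0 trust-ad\nsearch .\n"
print(uses_systemd_stub(sample))  # True: resolv.conf alone tells you nothing
```

When this returns True, /etc/resolv.conf is just the front end; the real upstream servers live in resolvectl output, which is why the two containers look identical in resolv.conf yet behave differently.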

Thanks for the explanation!

I did that earlier but I am not sure what I can deduce from the info I get:

  1. INCUScontainer with (default and) routed profile added:
~$ resolvectl
Global
         Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
  resolv.conf mode: stub

Link 71 (eth0)
    Current Scopes: none
         Protocols: -DefaultRoute -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
  2. INCUScontainer with routed profile removed (and default still there):
Global
         Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
  resolv.conf mode: stub

Link 73 (eth0)
    Current Scopes: DNS
         Protocols: +DefaultRoute -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
       DNS Servers: 10.24.65.1 fd42:4bed:5ca8:5821::1 fe80::216:3eff:fef3:bdb8
        DNS Domain: incus
  3. LXDcontainer with (default and) routed profile added:
$ resolvectl
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (br-15702ee19731)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 3 (docker0)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 75 (eth0)
    Current Scopes: DNS
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: my_dns_ip
       DNS Servers: my_dns_ip

where my_dns_ip is the DNS IP configured in profile routed.

There is clearly a DNS issue in case 1 and not in case 3, but I don’t know what is causing the difference. It really seems related to my Incus configuration; probably something I missed. I’ll investigate with tshark.

  1. I reset my server to a snapshot I took just before installing Incus and migrating from LXD to Incus.

  2. I created a new container on lxd:
    lxc launch ubuntu:22.04 TestDNSlxd --profile default --profile routed --storage default
    When opening a shell in TestDNSlxd, both ping 8.8.8.8 and ping google.com work fine.

  3. I then shut down LXD (sudo snap stop lxd), installed Incus without migrating from LXD, and initialized Incus.

  4. I created the exact same routed profile on Incus and created a similar container: incus launch images:ubuntu/22.04 TestDNSincus --profile default --profile ip_routing --storage default (same command as in step 2, but with lxc replaced by incus). But now, when opening a shell in TestDNSincus, ping 8.8.8.8 works fine while ping google.com returns a Temporary failure in name resolution.

The two potential explanations I see:

Could there be another explanation?

Here’s me running the routed profile.

$ incus profile show routed
config:
  user.network-config: |
    #cloud-config
    version: 2
    ethernets:
        eth0:
          addresses:
          - 192.168.1.200/32
          nameservers:
            addresses:
            - 8.8.8.8
            search: []
          routes:
          - to: 0.0.0.0/0
            via: 169.254.0.1
            on-link: true
description: Routed profile for Incus
devices:
  eth0:
    ipv4.address: 192.168.1.200
    name: eth0
    nictype: routed
    parent: br0
    type: nic
name: routed
used_by: []
project: default
$ incus launch images:ubuntu/24.04/cloud routed --profile default --profile routed
Launching routed
$ incus list routed
+--------+---------+----------------------+------+-----------+-----------+
|  NAME  |  STATE  |         IPV4         | IPV6 |   TYPE    | SNAPSHOTS |
+--------+---------+----------------------+------+-----------+-----------+
| routed | RUNNING | 192.168.1.200 (eth0) |      | CONTAINER | 0         |
+--------+---------+----------------------+------+-----------+-----------+
$ incus shell routed
root@routed:~# ping -c 3 www.google.com
PING www.google.com (216.58.213.100) 56(84) bytes of data.
64 bytes from lhr25s02-in-f4.1e100.net (216.58.213.100): icmp_seq=1 ttl=55 time=13.8 ms
64 bytes from lhr25s02-in-f4.1e100.net (216.58.213.100): icmp_seq=2 ttl=55 time=14.1 ms
64 bytes from lhr25s02-in-f4.1e100.net (216.58.213.100): icmp_seq=3 ttl=55 time=13.7 ms

--- www.google.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1003ms
rtt min/avg/max/mdev = 13.707/13.886/14.139/0.113 ms
root@routed:~# logout
$

From your description this also works for you (you can ping), but there is a small issue with DNS resolution, which requires a bit of troubleshooting.
If you can set up a test server that exhibits the issue and give me access, I can look into this.

@rfruit gave me access to a testing VPS that exhibited this issue and I had a look.
This is the write-up.

There was a routed Incus profile that, when applied to an image, would create an instance without working DNS settings; you had to configure the DNS manually. This only happened with images from images.linuxcontainers.org, though some old images from a few years ago (pre-Incus) still worked.

On my system, the same routed configuration worked and produced instances with a proper network configuration. What could be happening?

[image: yaml-difference — the two routed profiles side by side, differing only in indentation]

Is there any significance with the above innocuous indentation? Sure, there is.

$ incus exec TestDNS -- cat /var/log/cloud-init-output.log
Cloud-init v. 24.2-0ubuntu1~24.04.2 running 'init-local' at Mon, 16 Sep 2024 20:58:29 +0000. Up 0.57 seconds.
2024-09-16 20:58:29,931 - util.py[WARNING]: Failed loading yaml blob. Invalid format at line 13 column 16: "mapping values are not allowed here
  in "<unicode string>", line 13, column 16:
                via: 169.254.0.1
                   ^"
...
$

So, there is significance in the indentation at that part of the profile.
If those lines are not indented properly, the network configuration fails. It does not fail completely, because the routed profile also carries configuration (the NIC device entry) that is not part of cloud-init and still works. Hence the difficulty of the troubleshooting.
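The parse failure is easy to reproduce outside of cloud-init. A minimal sketch with PyYAML (assuming the yaml package is installed; cloud-init parses its input with the same library) — the bad variant mis-indents via relative to to, the kind of one-column difference at play here:

```python
import yaml  # PyYAML

# Consistent indentation: 'to', 'via' and 'on-link' keys all align.
good = """\
routes:
-   to: 0.0.0.0/0
    via: 169.254.0.1
    on-link: true
"""

# 'via' is indented deeper than 'to', so the parser treats it as a
# continuation of the previous scalar and then chokes on the colon.
bad = """\
routes:
- to: 0.0.0.0/0
    via: 169.254.0.1
    on-link: true
"""

print(yaml.safe_load(good))  # parses into one route entry

try:
    yaml.safe_load(bad)
except yaml.YAMLError as err:
    print(err)  # "mapping values are not allowed here", as in cloud-init's log
```

The error message matches the util.py warning from /var/log/cloud-init-output.log above, down to the caret pointing at the via line.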

When you use cloud-init instructions in a profile, those instructions are pushed by Incus into the container image. The instance is launched, and it runs whatever instructions are supplied. If there is an error, it can be found in /var/log/cloud-init-output.log.

But were previous versions of cloud-init more relaxed in accepting badly formed YAML? Let’s test this with Incus.

First, we need an old Ubuntu 20.04 container image, from April 2020. Here it is.
Then, we download two of its files.

$ wget http://cloud-images-archive.ubuntu.com/releases/focal/release-20200423/ubuntu-20.04-server-cloudimg-amd64-lxd.tar.xz
...
$ wget http://cloud-images-archive.ubuntu.com/releases/focal/release-20200423/ubuntu-20.04-server-cloudimg-amd64-root.tar.xz
...
$ incus image import ubuntu-20.04-server-cloudimg-amd64-lxd.tar.xz ubuntu-20.04-server-cloudimg-amd64-root.tar.xz --alias ubuntu-from-canonical-20.04
Image imported with fingerprint: af34e9b8cb04c78250a4967306a45ca1dee482e91aca49760d22b070086d42fa
$ incus launch ubuntu-from-canonical-20.04 TestDNS2 --profile default --profile routed
Launching TestDNS2
$ incus exec TestDNS2 -- cat /var/log/cloud-init-output.log
Cloud-init v. 20.1-10-g71af48df-0ubuntu5 running 'init-local' at Mon, 16 Sep 2024 21:23:06 +0000. Up 0.74 seconds.
2024-09-16 21:23:06,063 - util.py[WARNING]: Failed loading yaml blob. Invalid format at line 13 column 16: "mapping values are not allowed here
  in "<unicode string>", line 13, column 16:
                via: 169.254.0.1
                   ^"
2024-09-16 21:23:06,063 - util.py[WARNING]: Getting data from <class 'cloudinit.sources.DataSourceNoCloud.DataSourceNoCloud'> failed
$

It is likely that YAML parsing was never more relaxed in this respect; the formatting detail is simply so innocuous that it is easy not to notice it.

Is there a validator for cloud-init files?
There’s cloud-init schema, though it does not appear to work with user-data.

$  cat routed.yaml 
#cloud-config
version: 2
ethernets:
    eth0:
      addresses:
      - 192.168.1.200/32
      nameservers:
        addresses:
        - 8.8.8.8
        search: []
      routes:
      - to: 0.0.0.0/0
        via: 169.254.0.1
        on-link: true
$  cloud-init schema --config-file routed.yaml
Invalid user-data routed.yaml
Error: Cloud config schema errors: ethernets: Additional properties are not allowed ('ethernets' was unexpected)

Error: Invalid schema: user-data
$ 

There is supposed to be a validator invoked as cloud-init devel schema, according to some forums, though it did not work for me. The cloud-init in Ubuntu 24.04 does not have devel schema.


Try: cloud-init schema -t network-config --config-file routed.yaml

However, testing on Ubuntu 22.04 I get:

# cloud-init schema -t network-config --config-file /etc/netplan/01-netcfg.yaml
Skipping network-config schema validation for version: 2. No netplan API available.

I suspect “cloud-init devel schema” means “the schema subcommand in the devel branch of cloud-init”

Aside: I believe the #cloud-config header line is only required for user-data, not network-config.

EDIT: this works in Ubuntu 24.04:

# cloud-init schema -t network-config --config-file /etc/netplan/50-cloud-init.yaml
<< snip warnings about gateway4/gateway6 >>
Valid schema /etc/netplan/50-cloud-init.yaml

Just to complement @simos’s answer, for n00bs like me: for cloud-init to be used in the container (and thus for the profile to work properly), one must use a /cloud image from http://images.linuxcontainers.org/.

See also:

Warning: for people who moved from LXD to Incus like me, the ubuntu: remote does not seem to be available in Incus (only images:). The ubuntu: remote had images with cloud-init, while on the images: remote one must pay attention to pick the cloud image and not the default one.
