CentOS 8 containers unable to automatically get IPv4 addresses after update

These are the results from our last test run earlier today on the Oracle images:

PASS: IPv4 address: oracle-7-unpriv
PASS: IPv6 address: oracle-7-unpriv
PASS: DNS resolution: oracle-7-unpriv
PASS: systemd clean: oracle-7-unpriv

PASS: IPv4 address: oracle-7-priv
PASS: IPv6 address: oracle-7-priv
PASS: DNS resolution: oracle-7-priv
PASS: systemd clean: oracle-7-priv

PASS: IPv4 address: oracle-7-cloud-unpriv
PASS: IPv6 address: oracle-7-cloud-unpriv
PASS: DNS resolution: oracle-7-cloud-unpriv
PASS: cloud-init user-data provisioning: oracle-7-cloud-unpriv
PASS: cloud-init vendor-data provisioning: oracle-7-cloud-unpriv
PASS: systemd clean: oracle-7-cloud-unpriv

PASS: IPv4 address: oracle-7-cloud-priv
PASS: IPv6 address: oracle-7-cloud-priv
PASS: DNS resolution: oracle-7-cloud-priv
PASS: cloud-init user-data provisioning: oracle-7-cloud-priv
PASS: cloud-init vendor-data provisioning: oracle-7-cloud-priv
PASS: systemd clean: oracle-7-cloud-priv

PASS: IPv4 address: oracle-8-unpriv
PASS: IPv6 address: oracle-8-unpriv
PASS: DNS resolution: oracle-8-unpriv
PASS: systemd clean: oracle-8-unpriv

PASS: IPv4 address: oracle-8-priv
PASS: IPv6 address: oracle-8-priv
PASS: DNS resolution: oracle-8-priv
PASS: systemd clean: oracle-8-priv

PASS: IPv4 address: oracle-8-cloud-unpriv
PASS: IPv6 address: oracle-8-cloud-unpriv
PASS: DNS resolution: oracle-8-cloud-unpriv
PASS: cloud-init user-data provisioning: oracle-8-cloud-unpriv
PASS: cloud-init vendor-data provisioning: oracle-8-cloud-unpriv
PASS: systemd clean: oracle-8-cloud-unpriv

PASS: IPv4 address: oracle-8-cloud-priv
PASS: IPv6 address: oracle-8-cloud-priv
PASS: DNS resolution: oracle-8-cloud-priv
PASS: cloud-init user-data provisioning: oracle-8-cloud-priv
PASS: cloud-init vendor-data provisioning: oracle-8-cloud-priv
PASS: systemd clean: oracle-8-cloud-priv

This shows that our test system, a plain Ubuntu 20.04 host, got working networking on all Oracle images, privileged or not, cloud or not.
So far, most users who reported this issue turned out to be running kernels with broken network interface ownership, which in turn breaks NetworkManager.

To see if that’s what’s affecting you, check ls -lh /sys/class/net/ inside your container.

It should look like:

stgraber@shell01:~$ ls -lh /sys/class/net/
total 0
lrwxrwxrwx 1 root   root       0 Jul 18 16:10 eth0 -> ../../devices/virtual/net/eth0
lrwxrwxrwx 1 root   root       0 Jul 18 16:10 lo -> ../../devices/virtual/net/lo
stgraber@shell01:~$ 

If eth0 is owned by nobody:nogroup, that indicates your kernel doesn’t properly handle network interface ownership in unprivileged containers, and NetworkManager will therefore refuse to use the interface.
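
The check above can also be scripted. This is only a minimal sketch: the function names are made up for illustration, it assumes a `stat` that supports `-c` (GNU coreutils or busybox), and it assumes that an unmapped owner shows up as the kernel's overflow UID (65534 by default, displayed as "nobody").

```shell
#!/bin/sh
# Hypothetical helper: print the numeric UID owning an interface's sysfs entry.
# $1 = interface name, $2 = sysfs net directory (defaults to /sys/class/net).
iface_owner_uid() {
    # stat does not follow symlinks by default, so this reports the owner
    # of the link itself, the same owner that "ls -lh" displays.
    stat -c '%u' "${2:-/sys/class/net}/$1"
}

# Hypothetical helper: succeed if the given UID is the kernel's overflow UID,
# which is what unmapped IDs appear as inside a user namespace.
is_overflow_uid() {
    overflow=$(cat /proc/sys/kernel/overflowuid 2>/dev/null || echo 65534)
    [ "$1" -eq "$overflow" ]
}

# Print a one-line verdict for an interface; returns 2 if it cannot be read.
check_iface() {
    uid=$(iface_owner_uid "$1" "$2") || return 2
    if is_overflow_uid "$uid"; then
        echo "$1: owned by unmapped uid $uid, NetworkManager will likely ignore it"
    else
        echo "$1: owned by uid $uid"
    fi
}
```

Running `check_iface eth0` inside the container then gives a one-line answer instead of having to read the `ls` output by eye.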

Thanks.

In this system, both the container and the host are Oracle Linux 8. As mentioned, using a privileged container does work around the issue, and the container successfully gets an IP address from DHCP in the privileged case.

If the Oracle 8 container is unprivileged then it does indeed have “nobody” as the owner of the virtual eth0, as shown below, and then is unable to get a DHCP address.

Results for each case are shown below.

Container (privileged)

[ubuntu@o83sv2 ~]$ lxc exec ora83d14 bash
[root@ora83d14 ~]# ls -lh /sys/class/net
total 0
lrwxrwxrwx. 1 root root 0 Jul 19 03:40 eth0 -> ../../devices/virtual/net/eth0
lrwxrwxrwx. 1 root root 0 Jul 19 03:40 lo -> ../../devices/virtual/net/lo
[root@ora83d14 ~]# cat /etc/oracle-release
Oracle Linux Server release 8.4
[root@ora83d14 ~]# uname -a
Linux ora83d14 5.4.17-2102.203.5.el8uek.x86_64 #2 SMP Mon Jun 28 16:44:26 PDT 2021 x86_64 x86_64 x86_64 GNU/Linux
[root@ora83d14 ~]#

Container (unprivileged):

[root@oel83d12 rules.d]# ls -lh /sys/class/net
total 0
lrwxrwxrwx. 1 nobody nobody 0 Jul 19 08:32 eth0 -> ../../devices/virtual/net/eth0
lrwxrwxrwx. 1 root root 0 Jul 19 08:32 lo -> ../../devices/virtual/net/lo
[root@oel83d12 rules.d]# uname -a
Linux oel83d12 5.4.17-2102.203.5.el8uek.x86_64 #2 SMP Mon Jun 28 16:44:26 PDT 2021 x86_64 x86_64 x86_64 GNU/Linux
[root@oel83d12 rules.d]# cat /etc/oracle-release
Oracle Linux Server release 8.4
[root@oel83d12 rules.d]#

Right, so in this setup, NM will indeed not work. Your best bet short of finding a way to run a kernel with the needed fix is to manually switch over to something other than NM for your network configuration.

Just a note (unfortunately not helpful in the case of Oracle Linux): the CentOS 8 Stream image comes without NM.

Could you describe the needed fix? (I’m assuming this isn’t something that can be applied without building a new kernel.)

If I had the details I could at least approach someone at Oracle about incorporating it into their UEK.

@brauner should have the link to the relevant kernel pull request and patches

@trystan I did reach out to Avi Miller (@AviAtOracle) via Twitter. Although he is no longer the Product Director for Oracle Linux, he would be able to comment on it and/or route it internally at Oracle. I haven’t heard back from Avi yet, but it hasn’t even been 24 hours.

Pull request:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ebb4a4bf76f164457184a3f43ebc1552416bc823

Patches:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f70ce185687bbe4e2d7ff126a8c890631f5fc2af
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0666a3aee762cd4f7981c2eed0fd8cab87533539
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=303a42769c4c4d8e5e3ad928df87eb36f8c1fa60
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2c4f9401ceb00167a3bfd322a28aa87b646a253f
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b8f33e5d76a7a1b87e0cc760d05bf2477b4e91d6
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3b52fc5d7876a312e6a964d7e626ba05ab1ea6b2
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6dee9f3893c823dff9c7f33fe0a598ee25c78f7
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d755407d4444c3e0fbd7d7c3aa666d595e9ab217
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ef6a4c88e9e11bc32cd02b052d04745af9691412

Thanks so much for getting that info! Much appreciated.

I decided to attempt booting the elrepo mainline kernel and sure enough the appropriate permissions were applied and NM acted on the interface.

However, I’m noticing that with a manually created bridge, the container will not pull an IP (NM reports it as connecting and waits for an IP indefinitely) if the host has an IP assigned to the same bridge.

As soon as I set the host’s IPv4 address on the bridge to ‘disabled’, the container is able to obtain an IP.

For devices with a single interface this presents an issue as the only two options to bring the container into the same IP space as the host (macvlan or bridged interface) are not available.

What platform are you running on?
This behavior of only the host or a container being able to get an address usually points towards the network switch filtering MAC addresses.

Oracle Linux 8 w/ the bridge (STP off) on top of a team

team details:

{"runner":{"name":"loadbalance","tx_balancer":{"name":"basic"}},"link_watch":{"name":"ethtool","delay_up":"0","delay_down":"0"}}

So I’m assuming that “team” is some kind of wrapper around standard Linux bonding. All that is fine. If there is MAC filtering, it’s likely outside of your system (done by the physical switch or a vSwitch in a virtual environment).

I swapped from Red Hat’s new ‘team’ feature to the legacy ‘bonding’ using the closest comparable settings, and this resolved it. Both host and container are now happily sharing the interface via a bridge.

As for the NM inside the LXC container: UEK kernel needs privileged mode, elrepo mainline 5.13 kernel does not. Hopefully I can convince someone at Oracle to add that PR to their kernel.

Hey folks. As @Gilbert_Standen mentioned, I’m no longer an Oracle Linux PM, but I did send this thread to the current PM for him to review with our engineering team.

Thanks!

This has been logged internally as bug 33141684. If any of you have a paid Oracle Linux support subscription, I recommend opening an SR for this issue so that we get some customers attached to the bug (which raises its priority).

Can someone please shed some light on how this is a solution? I’m not really certain how quickly I can see this pull request making its way to my LXC host kernel.

I’m running some CentOS 8 containers. Even with static IPs set inside the container, I have to run ifup after each restart, which is REALLY inconvenient when you need a container to start up and provide DHCP and DNS to your network.


Hello.
After updating AlmaLinux (RHEL-based) in LXC I faced the same problem. The update installed NetworkManager (before that, network-scripts managed the network). The solution for me:

systemctl disable NetworkManager.service
reboot

Unfortunately, in EL8 network-scripts is not installed by default.

So the process for EL8 or any of its clone distros is:


systemctl disable NetworkManager
dhclient eth0
yum -y install network-scripts
systemctl enable network --now
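
The legacy `network` service brings up interfaces from per-interface files in /etc/sysconfig/network-scripts. Container images often ship one for eth0 already; if yours does not, something like the following could create a DHCP one. This is a sketch only: the helper name is made up, and the four keys are the usual minimum rather than anything taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal DHCP ifcfg file for network-scripts.
# $1 = interface name (e.g. eth0)
# $2 = target directory (defaults to /etc/sysconfig/network-scripts)
write_dhcp_ifcfg() {
    iface=$1
    dir=${2:-/etc/sysconfig/network-scripts}
    # NM_CONTROLLED=no asks NetworkManager to leave the interface alone
    # in case it ever gets re-enabled.
    cat > "$dir/ifcfg-$iface" <<EOF
DEVICE=$iface
BOOTPROTO=dhcp
ONBOOT=yes
NM_CONTROLLED=no
EOF
}
```

Calling `write_dhcp_ifcfg eth0` as root before `systemctl enable network --now` would give the service something to act on; check first whether the file already exists so you don't overwrite a static configuration.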

I’ve managed to create a cloud-init bootcmd section which automatically works around this issue for RHEL8-derived containers.
It disables NetworkManager, which is ignoring veth-derived devices (e.g. eth0), then brings up eth0 manually, and then installs and enables old-school networking, which does work.

config:
  raw.idmap: both 1001 1001
  user.user-data: |-
    #cloud-config
    package_update: true
    timezone: Europe/London
    bootcmd:
      - [ cloud-init-per, once, nmdis, systemctl, disable, NetworkManager, --now ]
      - [ cloud-init-per, once, eth0up, dhclient, eth0 ]
      - [ cloud-init-per, once, epel, yum, -y, install, epel-release, network-scripts ]
      - [ cloud-init-per, once, nwup, systemctl, enable, network, --now ]