When LXD creates NAT rules for proxy, could it also SNAT back hairpin connections from lxdbr0?

johanehnberg · April 14, 2022, 7:37am

The listen config currently creates hairpin rules nicely, but there is one typical case where the generated rules do not quite suffice. Consider the following setup:

One server with one public IP
Container 1: a reverse proxy with a listen config for ports 80 and 443 on the public IP
Containers 2 and 3: application servers that are normally accessed through the reverse proxy

When the application servers want to address each other in a location-agnostic manner, they should use the FQDN, which resolves to the public IP. The current NAT rules translate the destination of packets sent from an application container to the reverse proxy's local IP, but the response is not SNAT’ed or MASQUERADEd back, resulting in a timeout.

Affected examples include federated cloud applications such as Nextcloud as well as Nextcloud’s integration with Collabora Office. Basically this applies to any application that wants to talk over a REST HTTPS API with another one when it happens to be on the same server.

There are obviously a ton of workarounds possible (IPv6, split horizon DNS, application intelligence, manually added NAT rules), so this question is really about having something elegant that integrates well with LXD. That way, it would not require a lot of orchestration as the container landscape evolves.

Listen rules can use ports, but since SNAT rules cannot rely on ports, such rules cannot be created 1:1 with the listen rules. The only theoretical approach I’ve come up with so far is SNAT based on connection marking. Have any other approaches already been explored in LXD? Are there any solutions for this that are implemented outside of LXD?

tomp · April 14, 2022, 8:12am

What are you referring to when you say “listen rules”? This isn’t a concept I am familiar with in LXD.

Can you show the output of sudo iptables-save (or sudo nft list ruleset if you’re using nftables), along with a reproducer command that isn’t working (e.g. curl) and lxc config show <instance> --expanded for the relevant instances?

johanehnberg · April 14, 2022, 8:24am

Sorry, I mean the port forwarding function in LXD for the proxy device. For some reason, I remembered it as listen, which is the first key for it. Probably because, the way we use it, it is not really a proxy.

The relevant bits from iptables-save -t nat on one host:

# Generated by iptables-save v1.8.4 on Thu Apr 14 06:15:28 2022
*nat
:PREROUTING ACCEPT [27004:1767638]
:INPUT ACCEPT [7093:620126]
:OUTPUT ACCEPT [8839:663732]
:POSTROUTING ACCEPT [34487:2337309]
-A PREROUTING -d 91.190.196.250/32 -p tcp -m tcp --dport 80 -m comment --comment "generated for LXD container proxy-aec2 (tcp91.190.196.250port80)" -j DNAT --to-destination 10.10.2.5:80
-A PREROUTING -d 91.190.196.250/32 -p tcp -m tcp --dport 443 -m comment --comment "generated for LXD container proxy-aec2 (tcp91.190.196.250port443)" -j DNAT --to-destination 10.10.2.5:443
-A OUTPUT -d 91.190.196.250/32 -p tcp -m tcp --dport 80 -m comment --comment "generated for LXD container proxy-aec2 (tcp91.190.196.250port80)" -j DNAT --to-destination 10.10.2.5:80
-A OUTPUT -d 91.190.196.250/32 -p tcp -m tcp --dport 443 -m comment --comment "generated for LXD container proxy-aec2 (tcp91.190.196.250port443)" -j DNAT --to-destination 10.10.2.5:443
-A POSTROUTING -s 10.10.2.5/32 -d 10.10.2.5/32 -p tcp -m tcp --dport 80 -m comment --comment "generated for LXD container proxy-aec2 (tcp91.190.196.250port80)" -j MASQUERADE
-A POSTROUTING -s 10.10.2.5/32 -d 10.10.2.5/32 -p tcp -m tcp --dport 443 -m comment --comment "generated for LXD container proxy-aec2 (tcp91.190.196.250port443)" -j MASQUERADE
-A POSTROUTING -s 10.10.2.0/24 ! -d 10.10.2.0/24 -m comment --comment "generated for LXD network lxdbr0" -j MASQUERADE
COMMIT
# Completed on Thu Apr 14 06:15:28 2022

The relevant config:

devices:
  eth0:
    ipv4.address: 10.10.2.5
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  tcp91.190.196.250port80:
    connect: tcp:10.10.2.5:80
    listen: tcp:91.190.196.250:80
    nat: "true"
    type: proxy
  tcp91.190.196.250port443:
    connect: tcp:10.10.2.5:443
    listen: tcp:91.190.196.250:443
    nat: "true"
    type: proxy
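
For reference, the “manually added NAT rules” workaround mentioned in the first post could look roughly like the sketch below. It only prints the iptables commands rather than applying them. The subnet 10.10.2.0/24, the proxy address 10.10.2.5 and the ports come from the output above; the helper name hairpin_rule is made up for illustration, and this is not a rule that LXD itself generates.

```shell
#!/bin/sh
# Print (do not apply) a MASQUERADE rule for bridge-local traffic that was
# DNATed to the proxy container, so that replies return via the host.
# Args: bridge subnet, proxy container IP, TCP port.
# hairpin_rule is a hypothetical helper, not part of LXD.
hairpin_rule() {
    printf 'iptables -t nat -A POSTROUTING -s %s -d %s -p tcp -m tcp --dport %s -j MASQUERADE\n' "$1" "$2" "$3"
}

# Values taken from the iptables-save output and device config in this thread.
for port in 80 443; do
    hairpin_rule 10.10.2.0/24 10.10.2.5 "$port"
done
```

If the printed rules look right, they can be applied by piping the output to sudo sh. With such a rule the proxy sees the host's lxdbr0 address as the client for hairpinned connections, and the rule has to be maintained by hand as containers change, which is the orchestration burden the question is trying to avoid.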

tomp · April 14, 2022, 8:28am

And the reproducer commands/setup details to show the issue?

tomp · April 14, 2022, 8:29am

I’m looking to understand the issue more concretely, as there may already be a fix in LXD that needs a specific kernel module to be loaded.

johanehnberg · April 14, 2022, 8:30am

I think the simplest reproducer is to run this on container 2 or 3:
telnet 91.190.196.250 80

tomp · April 14, 2022, 8:33am

Do you have the br_netfilter kernel module loaded? You can check with:

sudo lsmod | grep br_netfilter

We have detection for this kind of hairpin request in LXD:

https://github.com/lxc/lxd/blob/master/lxd/device/proxy.go#L413-L439

err = network.BridgeNetfilterEnabled(ipVersion)
if err != nil {
	msg := fmt.Sprintf("IPv%d bridge netfilter not enabled. Instances using the bridge will not be able to connect to the proxy listen IP", ipVersion)
	d.logger.Warn(msg, logger.Ctx{"err": err})
	err := d.state.Cluster.UpsertWarningLocalNode(d.inst.Project(), cluster.TypeInstance, d.inst.ID(), db.WarningProxyBridgeNetfilterNotEnabled, fmt.Sprintf("%s: %v", msg, err))
	if err != nil {
		logger.Warn("Failed to create warning", logger.Ctx{"err": err})
	}
} else {
	err = warnings.ResolveWarningsByLocalNodeAndProjectAndTypeAndEntity(d.state.Cluster, d.inst.Project(), db.WarningProxyBridgeNetfilterNotEnabled, cluster.TypeInstance, d.inst.ID())
	if err != nil {
		logger.Warn("Failed to resolve warning", logger.Ctx{"err": err})
	}

	if hostName == "" {
		return fmt.Errorf("Proxy cannot find bridge port host_name to enable hairpin mode")
	}

	// br_netfilter is enabled, so we need to enable hairpin mode on instance's bridge port otherwise
	// the instances on the bridge will not be able to connect to the proxy device's listn IP and the

This file has been truncated.
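
A slightly more scriptable version of the lsmod check, as a rough sketch: it looks for the module name in the first column of /proc/modules and, if found, prints the bridge-nf-call-iptables sysctl. The function name module_loaded and its optional second argument (an alternative file to read) are invented here for illustration. If br_netfilter were built into the kernel rather than loaded as a module, this check would presumably report it as not loaded.

```shell
#!/bin/sh
# Succeed if module $1 appears in the first column of a /proc/modules-style
# file ($2, defaulting to /proc/modules). Hypothetical helper.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}" 2>/dev/null
}

if module_loaded br_netfilter; then
    echo "br_netfilter: loaded"
    # The net.bridge.* keys only exist while the module is loaded.
    cat /proc/sys/net/bridge/bridge-nf-call-iptables
else
    echo "br_netfilter: not loaded"
fi
```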

johanehnberg · April 14, 2022, 8:34am

From outside, it works as expected. But from any host on lxdbr0’s subnet, it times out since the reply packet does not return from 91.190.196.250. As such, it is a hairpin NAT case.

tomp · April 14, 2022, 8:34am

You can also check the LXD logs for the “bridge netfilter not enabled” warning from the code above.

tomp · April 14, 2022, 8:38am

Oh and lxc warning ls and lxc warning show may help too.

johanehnberg · April 14, 2022, 8:41am

OK, so the logs indeed show the message and br_netfilter is not loaded. It does not show up in warnings though.

So the best-integrated way is to enable br_netfilter. I assume there are some drawbacks, since this is not enabled automatically?

tomp · April 14, 2022, 8:42am

Yes, loading br_netfilter and then restarting the instance will unlock that functionality.
The reason we never load the br_netfilter module is because it will then potentially apply the system’s existing firewall rules to intra-bridge traffic, which may cause unexpected disruption, depending on the rules in place.

johanehnberg · April 14, 2022, 8:56am

Thanks!

Here is the shorthand for anyone finding themselves on this thread:

lxc profile set default linux.kernel_modules br_netfilter

Or for lxd init yaml:

profiles:
- name: default
  config:
    linux.kernel_modules: br_netfilter

johanehnberg · April 18, 2022, 9:41am

It is worth adding a note about a caveat here:

Adding the br_netfilter module setting to a running container breaks its existing hairpin NAT setup (at least within a few days), while the new rule that relies on br_netfilter does not take effect until the container has been restarted.

I noticed this when certain quite specific PHP routines started failing with a timeout. They tried to fetch an HTTPS resource that happened to be hosted on the same container.

kamzar1 · December 22, 2022, 2:29pm

Got a similar issue that mentions br_netfilter, but it is related to the forward listen_ip:

level=warning msg="IPv4 bridge netfilter not enabled. Instances using the bridge will not be able to connect to the forward listen IPs" driver=bridge err="br_netfilter kernel module not loaded" network=lxdbr0 project=default

This might solve the issue:
modprobe br_netfilter

And perhaps this as well?

echo "net.bridge.bridge-nf-call-iptables=1" >> /etc/sysctl.conf
sysctl -p

Though I don’t have iptables and am using nftables.
I thought nftables did not require br_netfilter anymore.
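
One thing to watch with the two commands above: modprobe only lasts until the next reboot, and the net.bridge.* sysctl keys only exist while br_netfilter is loaded, so sysctl -p can fail on that line if the module is missing. A possible way to make both persistent is sketched below, assuming a systemd-based host that reads /etc/modules-load.d and /etc/sysctl.d. The function name persist_br_netfilter, the file names, and the extra ip6tables key are choices made for this example, not something LXD sets up.

```shell
#!/bin/sh
# Write boot-time config for br_netfilter under the root directory given as $1
# (default "/"). Passing a scratch directory only previews the files.
# persist_br_netfilter and the file names are hypothetical choices.
persist_br_netfilter() {
    root="${1:-/}"
    mkdir -p "$root/etc/modules-load.d" "$root/etc/sysctl.d"
    # modules-load.d files list one module name per line.
    echo "br_netfilter" > "$root/etc/modules-load.d/br_netfilter.conf"
    cat > "$root/etc/sysctl.d/99-bridge-nf.conf" <<'EOF'
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
}

# Preview in a scratch directory; no system files are touched here.
preview="$(mktemp -d)"
persist_br_netfilter "$preview"
cat "$preview/etc/modules-load.d/br_netfilter.conf" "$preview/etc/sysctl.d/99-bridge-nf.conf"
```

Writing to the real locations would mean running the function as root with no argument. As noted earlier in the thread, enabling br_netfilter can apply the host's existing firewall rules to intra-bridge traffic, so the same caution applies here.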