How to investigate why IPv4 proxy stops for one container?

I have a setup with multiple LXD containers. All of them do things like this:

  proxy_587:
    connect: tcp:10.68.0.104:587
    listen: tcp:65.21.191.121:587
    nat: "true"
    type: proxy 

Only one of them hits a problem where it stops responding on the IPv4 port-forwards. Every other one works.

There’s no duplication of port forwards either, so it’s not like one container is taking over port forwards of another.

A simple lxc restart <container> works, and the port forwards come back.

I could script this on a systemd timer, but I’d love to know where to even start poking at this. I’m using nftables on an Ubuntu 22.04 host.
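
(For completeness, the band-aid I’d been considering is just a scheduled restart. A transient timer would be enough - a sketch, with the schedule and unit name made up, and assuming the usual snap install path for the lxc client:)

# Transient systemd timer that restarts the container on a schedule (daily here, purely illustrative);
# /snap/bin/lxc is where the snap-packaged LXD puts the client on Ubuntu.
sudo systemd-run --on-calendar=daily --unit=restart-mx /snap/bin/lxc restart mx
# Remove it again with: sudo systemctl stop restart-mx.timer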

We might be having similar issues; I’ll cross-post if I come up with a solution that doesn’t originate from the container’s OS, but I’m baffled too.

Future debugging note - with nftables, you can use ‘nft monitor trace’, but you’ll need to do some setup.

Following an article in Fedora Magazine (Network address translation part 1 – packet tracing - Fedora Magazine):

# Create the debugging table, and a chain in it that can be deleted later to remove debugging
nft 'add table inet trace_debug'
nft 'add chain inet trace_debug trace_pre { type filter hook prerouting priority -200000; }'
# Add a rule to match particular packets, and use meta nftrace set 1 to turn on tracing
# Set PUBLIC_IP
export PUBLIC_IP=1.1.1.1
nft "insert rule inet trace_debug trace_pre ip daddr ${PUBLIC_IP} tcp dport 587 tcp flags syn limit rate 1/second meta nftrace set 1"
# Start tracing
nft monitor trace

Of course, I learned this after it broke again. What I did find is that restarting the container moves the rules around in the chain (to be expected - I guess it’s appending). The rules look the same, though. It could be an ordering issue, but that would imply something is moving the rules after they were created.

Cheers @shimmy, not sure your issue is the same as mine. I don’t have dhclient issues etc - the containers come up just fine; it’s only the MX container that randomly loses IPv4 routing on the forwarded ports (it’s running just fine on IPv6 and is reachable). telnet says ‘no route to host’ for the port when it’s unavailable, which makes me wonder if one of the LXD default chain entries is rejecting it somehow?

Can you get the output of sudo nft list ruleset when it is working and when it isn’t working, please?

Yep, will do. It takes at least a week to manifest after I’ve done a restart of the container. Have monitoring on the public port now, so I can at least look at it much closer to when it happens.
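
(The monitoring is nothing sophisticated - roughly a loop like this, with the address/port from this setup; the log path is arbitrary and the alerting is to taste:)

#!/bin/sh
# Probe the forwarded submission port every minute and log failures with a
# timestamp, so the failure time can be lined up against syslog later.
while true; do
  if ! nc -z -w 5 65.21.191.121 587; then
    echo "$(date -Is) port 587 unreachable" >> /var/log/port-587-monitor.log
  fi
  sleep 60
done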

Ok, 20 days later, it failed (right as I was going to sleep too, so I was grumpy and only captured a trace and the ruleset, didn’t look in the container).

Reading the nft traces, the difference is that a working trace goes through the following chains:

  • trace_pre (the tracing-enable chain)
  • prert.mx.proxy_[port] (from the proxy statement)
  • fwd.lxdbr0
  • pstrt.mx.proxy_[port]
  • fwdpstrt.lxdbr0
  • pstrt.lxdbr0
  • pstrt.[other containers].proxy_[port] (but not the mx container)

A broken trace is shorter:

  • trace_pre
  • fwdprert.lxdbr0
  • fwd.lxdbr0
  • fwdpstrt.lxdbr0
  • pstrt.lxdbr0
  • pstrt.[container].proxy_[port] (all of them)

The packet trace then repeats, because of retries on the client side.
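
(The traces were captured by teeing the monitor output into files so they could be compared afterwards - roughly this, with the filename pattern made up:)

# Watch the trace live while also saving it for later comparison
sudo nft monitor trace | tee nft-trace-$(date +%Y%m%d-%H%M%S).txt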

Working out whether I can attach the captures… - https://nextcloud.cricalix.net/index.php/s/kx5g8HeENZCSeMy has them.

Have now had a chance to start log diving, and there’s a veeeeery suspicious set of entries in syslog on the host: the LXD snap upgraded itself via a snap refresh. The log file is included at the URL in the previous message. 23:17 is the refresh; 23:31 is me logging in and doing lxc restart mx to kick it so I could go back to bed.

The most finger-pointing line is:
Jan 30 23:31:51 mackerel lxd.daemon[2466942]: time="2023-01-30T23:31:51Z" level=warning msg="IPv4 bridge netfilter not enabled. Instances using the bridge will not be able to connect to the proxy listen IP" device=proxy_587 driver=proxy err="br_netfilter kernel module not loaded" instance=mx project=default

which is repeated for port 25 (and earlier on during the refresh for lxdbr0!)

I note that a restart of the container has put the *.mx.* rules at the end of the ruleset. I have to wonder if what I’m seeing is a combination of “the mx container is configured badly in some way” plus container restarts putting the rules in an odd place because of that mis-configuration, and then the lxd refresh is causing a different rewrite of the rules and breaking access.

None of my containers specify that they need br_netfilter.

None of the other containers emit that log message.
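
(For the record, checking the module state is plain Linux, nothing LXD-specific - roughly:)

lsmod | grep br_netfilter                    # no output means the module is not loaded
sudo modprobe br_netfilter                   # load it by hand for testing
sysctl net.bridge.bridge-nf-call-iptables    # this key only appears once the module is loaded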

The devices config for the mx container is (disks elided for brevity):

devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
    ipv4.address: 10.68.0.104
    ipv6.address: 2a01:4f9:c012:7dfb:76c9:1422:e591:48ad
...
  proxy_25:
    connect: tcp:10.68.0.104:25
    listen: tcp:65.21.191.121:25
    nat: "true"
    type: proxy
  proxy_587:
    connect: tcp:10.68.0.104:587
    listen: tcp:65.21.191.121:587
    nat: "true"
    type: proxy

For the mail container, which has never had issues:

devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
    ipv4.address: 10.68.0.103
    ipv6.address: 2a01:4f9:c012:7dfb:2a0e:72e8:6687:2235
...
  proxy_993:
    connect: tcp:10.68.0.103:993
    listen: tcp:65.21.191.121:993
    nat: "true"
    type: proxy

Appears to be similar to “When LXD creates NAT rules for proxy, could it also SNAT back hairpin connections from lxdbr0?” … but why this one container? Why not all of them? In all cases, I’m not hairpinning on the bridge from one container to another; it’s external traffic to the container that breaks.

The br_netfilter issue is unlikely to be the cause, as it would be happening the whole time for all instances.

The most likely scenario is that the snap refresh is causing LXD to reload its firewall rules (normal), but somehow the new rule ordering is conflicting with other rules on the system.

Do you have docker installed on the host by any chance?

I did ask for the full sudo nft list ruleset output before/after, as that will really help to see the state when it’s working and when it’s not working.

You could try simulating a snap refresh by reloading LXD (which doesn’t restart containers):

sudo systemctl reload snap.lxd.daemon

No docker.

The full rules dumps, broken and working, are at Nextcloud - I can inline them if you prefer, for record keeping.

Reloading the daemon reproduces the failure (and only that one container’s connectivity is affected).

Rebooted the host and let it start all the containers from a clean slate. All the containers have a delay on startup and are sequenced: squid, db, mail, then everything else (so if it were related to this, I’d sort of expect the www container to fail the same way).
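
(The sequencing is just the normal LXD autostart keys - roughly this, with the numbers illustrative:)

# Higher boot.autostart.priority starts earlier; boot.autostart.delay is the number of
# seconds to wait after starting this instance before starting the next one.
lxc config set squid boot.autostart.priority 50
lxc config set db boot.autostart.priority 40
lxc config set mail boot.autostart.priority 30
lxc config set mx boot.autostart.delay 10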

Post-reboot rules from nft added to the link above.

Tried some more reproduction.

  1. Reboot the host
  2. Before lxd starts services, nft ruleset is 2-before-containers-rules.txt
  3. After the containers start, nft ruleset is now 3-after-containers-start-rules.txt
  4. nc 65.21.191.121 25 results in the Postfix ESMTP banner
  5. systemctl reload snap.lxd.daemon, nft ruleset is now 5-post-reload-lxd-rules.txt
  6. The same nc command fails.
  7. Reboot the host to clean-slate everything again.

It reads to me that pre-reload, the rules are organised as NIC, then container proxy rules. Post-reload, the ordering is container proxy, then NIC rules.
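
(The numbered rule dumps above were captured with plain redirects, and the re-ordering shows up clearly in a diff:)

sudo nft list ruleset > 3-after-containers-start-rules.txt
sudo systemctl reload snap.lxd.daemon
sudo nft list ruleset > 5-post-reload-lxd-rules.txt
diff 3-after-containers-start-rules.txt 5-post-reload-lxd-rules.txt | less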

Thanks for this.
Yes I can see the issue.

You appear to have a separate network forward listening on 65.21.191.121 and forwarding to 10.68.0.179, so they are in conflict.

Can you show me lxc network forward list lxdbr0?

lxc network forward list lxdbr0
+----------------+-------------+------------------------+-------+
| LISTEN ADDRESS | DESCRIPTION | DEFAULT TARGET ADDRESS | PORTS |
+----------------+-------------+------------------------+-------+
| 65.21.191.121  |             |                        | 5     |
+----------------+-------------+------------------------+-------+

Interesting about .179 - that’s not an allocated IP, which would explain things failing.

lxc list
+---------+---------+--------------------+-----------------------------------------------+-----------+-----------+
|  NAME   |  STATE  |        IPV4        |                     IPV6                      |   TYPE    | SNAPSHOTS |
+---------+---------+--------------------+-----------------------------------------------+-----------+-----------+
| db      | RUNNING | 10.68.0.238 (eth0) | 2a01:4f9:c012:7dfb:78c5:3af:9b47:fbaa (eth0)  | CONTAINER | 0         |
+---------+---------+--------------------+-----------------------------------------------+-----------+-----------+
| emoncms | RUNNING | 10.68.0.135 (eth0) | 2a01:4f9:c012:7dfb:fb5a:3cb4:83fe:1d65 (eth0) | CONTAINER | 0         |
+---------+---------+--------------------+-----------------------------------------------+-----------+-----------+
| mail    | RUNNING | 10.68.0.103 (eth0) | 2a01:4f9:c012:7dfb:2a0e:72e8:6687:2235 (eth0) | CONTAINER | 0         |
+---------+---------+--------------------+-----------------------------------------------+-----------+-----------+
| mx      | RUNNING | 10.68.0.104 (eth0) | 2a01:4f9:c012:7dfb:76c9:1422:e591:48ad (eth0) | CONTAINER | 0         |
+---------+---------+--------------------+-----------------------------------------------+-----------+-----------+
| squid   | RUNNING | 10.68.0.33 (eth0)  | 2a01:4f9:c012:7dfb:c015:6563:6362:70f (eth0)  | CONTAINER | 0         |
+---------+---------+--------------------+-----------------------------------------------+-----------+-----------+
| vpn     | RUNNING | 10.9.8.1 (wg0)     | 2a01:4f9:c012:7dfb:c030:da81:5a3e:bb00 (eth0) | CONTAINER | 0         |
|         |         | 10.68.0.170 (eth0) |                                               |           |           |
+---------+---------+--------------------+-----------------------------------------------+-----------+-----------+
| www     | RUNNING | 10.68.0.73 (eth0)  | 2a01:4f9:c012:7dfb:c1d1:5420:9291:bf24 (eth0) | CONTAINER | 0         |
+---------+---------+--------------------+-----------------------------------------------+-----------+-----------+

Checking for port forwards:

grep proxy_ *
mail.yaml:  proxy_993:
mx.yaml:  proxy_25:
mx.yaml:  proxy_587:
vpn.yaml:  proxy_51820:
www.yaml:  proxy_80:
www.yaml:  proxy_443:

All of the containers were basically created by editing a profile from the yaml file and then booting from that profile.

Details of the forwarding list show clearly that the wrong IP is in use:

lxc network forward show lxdbr0 65.21.191.121
description: ""
config: {}
ports:
- description: SMTP
  protocol: tcp
  listen_port: "25"
  target_port: "25"
  target_address: 10.68.0.179
- description: SMTP Submission
  protocol: tcp
  listen_port: "587"
  target_port: "587"
  target_address: 10.68.0.179
- description: SMTP Submission backup
  protocol: tcp
  listen_port: "588"
  target_port: "588"
  target_address: 10.68.0.179
- description: IMAP
  protocol: tcp
  listen_port: "143"
  target_port: "143"
  target_address: 10.68.0.103
- description: IMAPS
  protocol: tcp
  listen_port: "993"
  target_port: "993"
  target_address: 10.68.0.103
listen_address: 65.21.191.121
location: none

If I go back in my history, sure enough, lxc network forward port add lxdbr0 65.21.191.121 tcp 589 10.68.0.179 587

That must have been from before I used the proxy setup; indeed, my blog notes-to-self point out that I discovered both ways of doing things (https://www.cricalix.net/2022/09/14/rebuilding-cricalix-net-part-2/), but apparently I never got around to deleting all the forwards.

I’ve now edited the forwarding list and removed all the forwards.

lxc network forward list lxdbr0
+----------------+-------------+------------------------+-------+
| LISTEN ADDRESS | DESCRIPTION | DEFAULT TARGET ADDRESS | PORTS |
+----------------+-------------+------------------------+-------+
| 65.21.191.121  |             |                        | 0     |
+----------------+-------------+------------------------+-------+
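
(For reference, individual entries can also be dropped without going through the editor - if I’m reading the help right, lxc network forward port remove is the counterpart of the port add command I dug out of my history. Ports taken from the forward listing above:)

lxc network forward port remove lxdbr0 65.21.191.121 tcp 25
lxc network forward port remove lxdbr0 65.21.191.121 tcp 587
lxc network forward port remove lxdbr0 65.21.191.121 tcp 588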

systemctl reload snap.lxd.daemon, and:

nc 65.21.191.121 25
220 ESMTP Postfix

Thank you @tomp, that was the brick to the head I needed to work out what was wrong.

Is this something that LXD should detect, if it can? Basically, “you’re assigning a proxy for a port in a profile, but you’re also forwarding that port on the NIC; this will end badly”?
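
In the meantime, a crude manual check is possible from the CLI. This is just a sketch, not an LXD feature - my container names and listen address, it only compares TCP listen ports, and it leans on awk-parsing the YAML output, which is fragile:

# Ports claimed by the network forward on lxdbr0
lxc network forward show lxdbr0 65.21.191.121 | awk '/listen_port/ {gsub(/"/, ""); print $2}' | sort > /tmp/forward-ports.txt
# Ports claimed by proxy devices on each container (expanded config includes profile devices)
for c in mx mail www vpn; do
  lxc config show --expanded "$c" | awk -F: '/listen: tcp/ {print $NF}'
done | sort > /tmp/proxy-ports.txt
# Any port present in both lists is a conflict
comm -12 /tmp/forward-ports.txt /tmp/proxy-ports.txt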


Yes we could probably do with that conflict check indeed.

Shall I go and file an issue on the GitHub side, pointing to this discussion?

Yes thanks