How to investigate why IPv4 proxy stops for one container?

cricalix · December 29, 2022, 5:54pm

I have a setup with multiple LXD containers. All of them do things like this:

  proxy_587:
    connect: tcp:10.68.0.104:587
    listen: tcp:65.21.191.121:587
    nat: "true"
    type: proxy

Only one of them hits a problem where it stops responding on the IPv4 port-forwards. Every other one works.

There’s no duplication of port forwards either, so it’s not like one container is taking over port forwards of another.

A simple lxc restart <container> works, and the port forwards come back.

I could script this on a systemd timer, but I’d love to know where to even start poking at this. I’m using nftables on an Ubuntu 22.04 host.

shimmy · January 1, 2023, 8:32pm

We might be having similar issues; I’ll cross post if I come up with a solution that doesn’t originate from the container’s OS; but I’m baffled too.

cricalix · January 8, 2023, 10:45am

Future debugging note - with nftables, you can use ‘nft monitor trace’, but you’ll need to do some setup.

Following an article in fedora magazine (Network address translation part 1 – packet tracing - Fedora Magazine):

# Create a new chain that can be dropped to remove debugging
nft 'add chain inet trace_debug trace_pre { type filter hook prerouting priority -200000; }'
# Add a rule to match particular packets,  and use meta nftrace set 1 to turn on tracing
# Set PUBLIC_IP
export PUBLIC_IP=1.1.1.1
nft "insert rule inet trace_debug trace_pre ip daddr ${PUBLIC_IP} tcp dport 587 tcp flags syn limit rate 1/second meta nftrace set 1"
# Start tracing
nft monitor trace

Of course, I learned this after it broke again. What I did find is that restarting the container moves the rules around in the chain (to be expected, I guess it’s appending). Rules look the same though. Could be an ordering issue, but that would imply something is moving the rules after they were created.

Cheers @shimmy, not sure your issue is the same as mine. I don’t have dhclient issues etc - the containers come up just fine, it’s only the MX container that randomly loses ip routing on the forwarded ports (it’s running just fine on IPv6 and is reachable). telnet says ‘no route to host’ for the port when it’s unavailable, which might be one of the LXD default chain entries somehow?

tomp · January 10, 2023, 3:00pm

Can you get the output of sudo nft list ruleset when it is working and when it isn’t working please.

cricalix · January 10, 2023, 9:38pm

Yep, will do. It takes at least a week to manifest after I’ve done a restart of the container. Have monitoring on the public port now, so I can at least look at it much closer to when it happens.

cricalix · January 31, 2023, 8:14am

Ok, 20 days later, it failed (right as I was going to sleep too, so I was grumpy and only captured a trace and the ruleset, didn’t look in the container).

Reading the nft traces, the difference is that a working trace goes through the following chains:

trace_pre (the tracing-enable chain)
prert.mx.proxy_[port] (from the proxy statement)
fwd.lxdbr0
pstrt.mx.proxy_[port]
fwdpstrt.lxdbr0
pstrt.lxdbr0
pstrt.[other containers].proxy_[port] (but not the mx container)

A broken trace is shorter.

trace_pre
fwdprert.lxdbr0
fwd.lxdbr0
fwdpstrt.lxdbr0
pstrt.lxdbr0
pstrt.[container].proxy_[port] (all of them)

The packet trace then repeats, because of retries on the client side.

Working out whether I can attach the captures… - https://nextcloud.cricalix.net/index.php/s/kx5g8HeENZCSeMy has them.

cricalix · January 31, 2023, 8:44am

Have now had a chance to start log diving, and there’s a veeeeery suspicious set of entries in syslog on the host. LXD snap upgraded via a snap refresh. Log file is included at the URL in the previous message. 23:17 is the refresh, 23:31 is me logging in and doing lxc restart mx to kick it so I could go back to bed.

The most finger-pointing line is:
Jan 30 23:31:51 mackerel lxd.daemon[2466942]: time="2023-01-30T23:31:51Z" level=warning msg="IPv4 bridge netfilter not enabled. Instances using the bridge will not be able to connect to the proxy listen IP" device=proxy_587 driver=proxy err="br_netfilter kernel module not loaded" instance=mx project=default

which is repeated for port 25 (and earlier on during the refresh for lxdbr0!)

I note that a restart of the container has put the *.mx.* rules at the end of the ruleset. I have to wonder if what I’m seeing is a combination of “the mx container is configured badly in some way” plus container restarts putting the rules in an odd place because of that mis-configuration, and then the lxd refresh is causing a different rewrite of the rules and breaking access.

None of my containers specify that they need br_netfilter.

None of the other containers emit that log message.

The devices config for the mx container is (disks elided for brevity):

devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
    ipv4.address: 10.68.0.104
    ipv6.address: 2a01:4f9:c012:7dfb:76c9:1422:e591:48ad
...
  proxy_25:
    connect: tcp:10.68.0.104:25
    listen: tcp:65.21.191.121:25
    nat: "true"
    type: proxy
  proxy_587:
    connect: tcp:10.68.0.104:587
    listen: tcp:65.21.191.121:587
    nat: "true"
    type: proxy

For the mail container, which has never had issues:

devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
    ipv4.address: 10.68.0.103
    ipv6.address: 2a01:4f9:c012:7dfb:2a0e:72e8:6687:2235
...
  proxy_993:
    connect: tcp:10.68.0.103:993
    listen: tcp:65.21.191.121:993
    nat: "true"
    type: proxy

cricalix · January 31, 2023, 8:47am

Appears to be similar to When LXD creates NAT rules for proxy, could it also SNAT back hairpin connections from lxdbr0? … but why this one container? Why not all of them? In all cases, I’m not hair-pinning on the bridge from one container to another, it’s external traffic to the container that breaks.

tomp · January 31, 2023, 9:19am

The br_netfilter issue is unlikely to be the issue as it would be happening the whole time for all instances.

The most likely scenario sounds like snap refresh is causing LXD to reload its firewall rules (normal) but somehow the fresh ordering is conflicting with other rules on the system.

Do you have docker installed on the host by any chance?

I did ask for the full sudo nft list ruleset output before/after, as that will really help to see the state when its working and not working.

tomp · January 31, 2023, 9:20am

You could try simulating a snap refresh by reloading LXD (which doesn’t restart containers):

sudo systemctl reload snap.lxd.daemon

cricalix · January 31, 2023, 7:25pm

No docker.

The full rules dump, broken and working, are at Nextcloud - can inline them if you prefer for record keeping.

cricalix · January 31, 2023, 7:35pm

Reloading the daemon causes the failure to happen (and only in that one container’s connectivity).

Rebooted the host, and let it start all the containers from a clean slate. All the containers have a delay on the startup, and are sequenced. Squid, db, mail, everything else (so if it was related to this, I’d sort of expect the www container to fail the same way).

Post-reboot rules from nft added to the link above.

Tried some more reproduction.

Reboot the host
Before lxd starts services, nft ruleset is 2-before-containers-rules.txt
After the containers start, nft ruleset is now 3-after-containers-start-rules.txt
nc 65.21.191.121 25 results in the Postfix ESMTP banner
systemctl reload snap.lxd.daemon, nft ruleset is now 5-post-reload-lxd-rules.txt
The same nc command fails.
Reboot the host to clean-slate everything again.

It reads to me that pre-reload, the rules are organised as NIC, then container proxy rules. Post-reload, the ordering is container proxy, then NIC rules.

tomp · February 2, 2023, 10:06am

Thanks for this.
Yes I can see the issue.

You appear to have a separate network forward listening on 65.21.191.121 forwarding to 10.68.0.179.

Can you show me lxc network forward list lxdbr0?

So they are in conflict.

cricalix · February 2, 2023, 6:00pm

lxc network forward list lxdbr0
+----------------+-------------+------------------------+-------+
| LISTEN ADDRESS | DESCRIPTION | DEFAULT TARGET ADDRESS | PORTS |
+----------------+-------------+------------------------+-------+
| 65.21.191.121  |             |                        | 5     |
+----------------+-------------+------------------------+-------+

Interesting about .179 - that’s not an allocated IP, which would explain things failing.

lxc list
+---------+---------+--------------------+-----------------------------------------------+-----------+-----------+
|  NAME   |  STATE  |        IPV4        |                     IPV6                      |   TYPE    | SNAPSHOTS |
+---------+---------+--------------------+-----------------------------------------------+-----------+-----------+
| db      | RUNNING | 10.68.0.238 (eth0) | 2a01:4f9:c012:7dfb:78c5:3af:9b47:fbaa (eth0)  | CONTAINER | 0         |
+---------+---------+--------------------+-----------------------------------------------+-----------+-----------+
| emoncms | RUNNING | 10.68.0.135 (eth0) | 2a01:4f9:c012:7dfb:fb5a:3cb4:83fe:1d65 (eth0) | CONTAINER | 0         |
+---------+---------+--------------------+-----------------------------------------------+-----------+-----------+
| mail    | RUNNING | 10.68.0.103 (eth0) | 2a01:4f9:c012:7dfb:2a0e:72e8:6687:2235 (eth0) | CONTAINER | 0         |
+---------+---------+--------------------+-----------------------------------------------+-----------+-----------+
| mx      | RUNNING | 10.68.0.104 (eth0) | 2a01:4f9:c012:7dfb:76c9:1422:e591:48ad (eth0) | CONTAINER | 0         |
+---------+---------+--------------------+-----------------------------------------------+-----------+-----------+
| squid   | RUNNING | 10.68.0.33 (eth0)  | 2a01:4f9:c012:7dfb:c015:6563:6362:70f (eth0)  | CONTAINER | 0         |
+---------+---------+--------------------+-----------------------------------------------+-----------+-----------+
| vpn     | RUNNING | 10.9.8.1 (wg0)     | 2a01:4f9:c012:7dfb:c030:da81:5a3e:bb00 (eth0) | CONTAINER | 0         |
|         |         | 10.68.0.170 (eth0) |                                               |           |           |
+---------+---------+--------------------+-----------------------------------------------+-----------+-----------+
| www     | RUNNING | 10.68.0.73 (eth0)  | 2a01:4f9:c012:7dfb:c1d1:5420:9291:bf24 (eth0) | CONTAINER | 0         |
+---------+---------+--------------------+-----------------------------------------------+-----------+-----------+

Checking for port forwards:

grep proxy_ *
mail.yaml:  proxy_993:
mx.yaml:  proxy_25:
mx.yaml:  proxy_587:
vpn.yaml:  proxy_51820:
www.yaml:  proxy_80:
www.yaml:  proxy_443:

All instances of containers were basically profile edited from the yaml file, then booted from that profile.

Details on the forwarding list show clearly that the wrong IP is in use.

lxc network forward show lxdbr0 65.21.191.121
description: ""
config: {}
ports:
- description: SMTP
  protocol: tcp
  listen_port: "25"
  target_port: "25"
  target_address: 10.68.0.179
- description: SMTP Submission
  protocol: tcp
  listen_port: "587"
  target_port: "587"
  target_address: 10.68.0.179
- description: SMTP Submission backup
  protocol: tcp
  listen_port: "588"
  target_port: "588"
  target_address: 10.68.0.179
- description: IMAP
  protocol: tcp
  listen_port: "143"
  target_port: "143"
  target_address: 10.68.0.103
- description: IMAPS
  protocol: tcp
  listen_port: "993"
  target_port: "993"
  target_address: 10.68.0.103
listen_address: 65.21.191.121
location: none

If I go back in my history, sure enough, lxc network forward port add lxdbr0 65.21.191.121 tcp 589 10.68.0.179 587

That must have been from before I used the proxy setup; indeed, my blog notes-to-self point out I discovered both ways of doing things (https://www.cricalix.net/2022/09/14/rebuilding-cricalix-net-part-2/), but apparently I never decided to delete all the forwards.

I’ve now edited the forwarding list, removed all the forwards.

lxc network forward list lxdbr0
+----------------+-------------+------------------------+-------+
| LISTEN ADDRESS | DESCRIPTION | DEFAULT TARGET ADDRESS | PORTS |
+----------------+-------------+------------------------+-------+
| 65.21.191.121  |             |                        | 0     |
+----------------+-------------+------------------------+-------+

systemctl reload snap.lxd.daemon, and

nc 65.21.191.121 25
220 ESMTP Postfix

Thank you @tomp, that was the brick to the head I needed to work out what was wrong.

Is this something that lxd should detect, if it can? Basically, “you’re assigning a proxy for a port in a profile, but you’re also forwarding on the NIC, this will end badly”?

tomp · February 2, 2023, 8:37pm

Yes we could probably do with that conflict check indeed.

cricalix · February 3, 2023, 8:38am

Shall I go file an issue on the Github side, pointing to this discussion?

tomp · February 3, 2023, 10:52am

Yes thanks

tomp · February 16, 2023, 10:46am

github.com/lxc/lxd

LXD does not cross-check if network forwards conflict with proxy port forwarding

opened 05:19PM - 03 Feb 23 UTC

cricalix

Bug Easy

For context, see https://discuss.linuxcontainers.org/t/how-to-investigate-why-ip…v4-proxy-stops-for-one-container/16014/. # Required information ``` driver: lxc driver_version: 5.0.2 firewall: nftables kernel: Linux kernel_architecture: x86_64 kernel_features: idmapped_mounts: "true" netnsid_getifaddrs: "true" seccomp_listener: "true" seccomp_listener_continue: "true" shiftfs: "false" uevent_injection: "true" unpriv_fscaps: "true" kernel_version: 5.15.0-58-generic lxc_features: cgroup2: "true" core_scheduling: "true" devpts_fd: "true" idmapped_mounts_v2: "true" mount_injection_file: "true" network_gateway_device_route: "true" network_ipvlan: "true" network_l2proxy: "true" network_phys_macvlan_mtu: "true" network_veth_router: "true" pidfd: "true" seccomp_allow_deny_syntax: "true" seccomp_notify: "true" seccomp_proxy_send_notify_fd: "true" os_name: Ubuntu os_version: "22.04" project: default server: lxd server_clustered: false server_event_mode: full-mesh server_name: mackerel server_pid: 1110326 server_version: "5.10" storage: zfs storage_version: 2.1.4-0ubuntu0.1 ``` # Issue description If both `lxc network forward port add lxdbr0` and profile proxy configurations are used to set up port forwarding, a `snap refresh` or `systemctl reload` of lxd will potentially break the firewall rules used to enact the forwarding. My specific environment is nftables, with `type: proxy`, `nat: "true"` devices set up in profiles. # Steps to reproduce 1. Boot a new container, using DHCP addressing. (Do not, at this point, enable static/long term memory of addresses in lxd) 2. Add a network forward for a TCP port from the public IP of the host to the internal IP of the container (tcp 25 as an example). 3. Verify the forwarding works (set up something to listen on the port) 4. Add a new profile from default, and add a devices section that sets up a NAT proxy for a port. For instance, ``` proxy_25: connect: tcp:10.68.0.103:25 # container ip listen: tcp:1.2.3.4:25 # public ip nat: "true" type: proxy ``` 5. Stop and delete the container 6. Create the container again, from the profile that has the proxy device; it should get a different IP. If it doesn't, make it. 7. Verify that the port forwarding works. If it doesn't, reboot the host and see if it does. If it doesn't, I'm not sure what's different from my scenario. 8. `systemctl reload snap.lxd.daemon` 9. Verify that the port forwarding works. It **should not**. Review `nft list ruleset` for the port. It should appear in several chains with both the current container IP, and the one used in the `network forward port` command. The network forward should have put the forward in something like `chain fwdprert.lxdbr0`. It'll be a `dnat` rule, with the receiving IP being the old container IP. The proxy forward should have put the forward in `chain prert.[container].[proxy_name]`. It'll be the same `dnat` rule, but the receiving IP will be the current container IP. This feels like something that lxd should be able to detect, as it has full context on both network forwards and proxy devices being set up with the same public IP and port.