Best practice for postrouting rule order with nftables?

johanehnberg · November 19, 2024, 10:57am

Hello,

I have a number of hosts that use multiple public IPs, with ports on each IP directed to different containers. For most cases, simple proxy devices work fine. There are, however, cases where outbound traffic needs to be SNATed to the same public IP that the container has forwards on, such as for containers sending email.

The feature request Allow setting SNAT address per container · Issue #8614 · canonical/lxd · GitHub would make this manageable basically as a per-container ipv[46].nat.address. Currently we have simple SNAT rules added separately for this purpose. However, our current script feels clumsy, and I wanted to hear if anyone has a better practice for such a case?

Issue 1: our script relies on /etc/ufw/before.rules, which in turn relies on iptables-nft. Incus is using nftables natively so we have to manually adjust the chain priority since incus’ ipv4.nat.order: after has no effect in this context (the nftables equivalent for ipv4.nat.order: after would have incus adjust its hook priority by +10 or similar).
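As a rough illustration of sidestepping the priority juggling, the SNAT rule could live in its own native nftables table whose chain hooks postrouting just ahead of srcnat. This is only a sketch: the table name per_container_snat, the function name, the addresses and the uplink interface are all placeholders, and it assumes that the first NAT chain to set up a mapping for a connection wins.

```shell
#!/bin/sh
# Print an nftables ruleset with a dedicated SNAT chain that hooks postrouting
# ahead of the standard srcnat priority (100). Everything named here is a
# placeholder for illustration, not something taken from Incus or ufw.
render_snat_table() {
    container_ip="$1"   # hypothetical container address, e.g. 10.0.0.5
    public_ip="$2"      # hypothetical public address, e.g. 192.0.2.10
    uplink="$3"         # hypothetical outbound interface, e.g. eth0
    cat <<EOF
table ip per_container_snat
delete table ip per_container_snat
table ip per_container_snat {
    chain postrouting {
        type nat hook postrouting priority srcnat - 10; policy accept;
        ip saddr ${container_ip} oifname "${uplink}" snat to ${public_ip}
    }
}
EOF
}
```

Something like render_snat_table 10.0.0.5 192.0.2.10 eth0 | nft -f - would then load it. The leading table/delete table pair is there so that reloading replaces the table instead of appending duplicate rules.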

Issue 2: our script does not inherently connect with the container config. It would take a lot of work to automate detection of container IP changes etc., so for now that is done manually.

Any suggestions to overcome these with something more elegant are very welcome!

johanehnberg · November 19, 2024, 2:05pm

For completeness, our approach to fixing the priority is the following adjustment in nft after ufw has loaded:

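# Capture the chain as loaded by ufw, rewriting its priority from srcnat (100) to 90
NFTR="$(nft list chain ip nat POSTROUTING | sed 's/srcnat/90/')"
# Drop the original chain, then re-create it from the edited dump
nft delete chain ip nat POSTROUTING
echo "$NFTR" | nft --file -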

This script could just as well load the whole ruleset, but that still seems clumsy.
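One possible refinement, sketched here under assumptions and not tested against a live ufw setup, is to put the removal and the re-creation into a single nft transaction so there is no window in which the chain is missing. The helper name build_priority_swap is made up; it also flushes the chain before deleting it, in case the running kernel refuses to delete a chain that still holds rules.

```shell
#!/bin/sh
# Read an "nft list chain ip nat POSTROUTING" dump on stdin and print one
# transaction that replaces the chain with a copy hooked at priority 90.
build_priority_swap() {
    echo 'flush chain ip nat POSTROUTING'
    echo 'delete chain ip nat POSTROUTING'
    # Depending on the nft version, the dump shows "srcnat" or "100".
    sed -E 's/priority (srcnat|100);/priority 90;/'
}
```

It would be used as nft list chain ip nat POSTROUTING | build_priority_swap | nft -f -, with nft applying the whole input as one batch.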

stgraber · November 19, 2024, 2:54pm

I have a bit of planned work for a customer to implement a way to do SNAT from within a network forward. So you’d basically have the network forward own the external IP and then dispatch various ports to various instances, but also have a config flag on that forward indicating to snat the traffic going out to the forward IP.

I think that might actually work for your use case, as forwarding tcp/25 from a network forward onto a target instance and then having that upcoming snat=true config key would make it so that anything coming out of that server from tcp/25 gets SNATed too. That, however, assumes that the mail server uses its own port as the source; if not, then that won’t really work for you.

Adding an ipv4.nat.address or similar config key on the NIC device itself could work too for your case (not really for my customer’s).

johanehnberg · November 19, 2024, 3:37pm

Outbound traffic often uses a random source port, so a fixed one would require application logic to be available. It would also be hard in the case of nested setups such as Docker that masquerade. But it is certainly an encouraging start.

I’d be happy to receive a suggestion for sponsoring the NIC-based config key on johan@molnix.com. I will also ask our devs if anyone is up for contributing a PR for it. While we don’t internally use golang, chances are one of us knows it.

johanehnberg · November 19, 2024, 3:40pm

Since we are at it, what is the use or effect of ipv4.nat.order: after in the context of nftables?

stgraber · November 19, 2024, 5:49pm

No effect whatsoever, I just checked and the Append flag is never looked at in the nftables driver, which makes sense given how nftables works.

stgraber · November 19, 2024, 5:55pm

I don’t mind explaining how I usually handle that stuff here for anyone else who’s wondering.

Most of that kind of small feature work I usually handle as part of a prepaid bank of hours.
Folks often start with something like 4 to 10 hours or so (250 CAD/hour), and I then slowly consume those hours either through consultation/help related to the customer’s infrastructure or by implementing smaller features of this kind.

My guesstimate for ipv4.nat.address and ipv6.nat.address on NIC devices would be 4-6 hours of work. The actual implementation should be pretty easy through our existing firewalling code; the main complication will be hooking up the IP ownership validation, basically ensuring that the user’s project and parent networks allow for that particular IP or a subnet containing that IP to be used.

This may not affect you specifically, but it’s something that’s needed to avoid less privileged users being able to hijack IP addresses.

And as always, if you have staff interested in contributing this, that’d be great, the more folks we have contributing and getting familiar with the codebase the better!

johanehnberg · November 20, 2024, 7:56am

What about
before → priority srcnat - 10
after → priority srcnat + 10

Currently on nftables, the order is actually a tossup that comes down to loading order or table & chain name.
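To make the proposed mapping concrete, here is a small sketch of it. It is not Incus code, and the function name is invented for illustration. It relies on srcnat being 100 in nftables and on lower priorities running earlier on the same hook.

```shell
#!/bin/sh
# Map a nat.order value to a numeric postrouting priority, following the
# before/after proposal above. Illustration only, not how Incus does it.
nat_order_priority() {
    srcnat=100   # numeric value of the srcnat priority keyword
    case "$1" in
        before) echo $((srcnat - 10)) ;;   # runs ahead of chains at srcnat
        after)  echo $((srcnat + 10)) ;;   # runs behind chains at srcnat
        *)      echo "unknown order: $1" >&2; return 1 ;;
    esac
}
```

With that, a chain loaded by ufw at plain srcnat would sit predictably between the two, regardless of loading order or table name.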

We have two developers interested in contributing. Would the nftables order-as-priority implementation be desired as a PR? It makes for an excellent starter issue before taking on NIC ipv4.nat.address.

I assume the validations for SNAT would align with How to configure network forwards - Incus documentation? Beyond that, anyone with the rights to change a NIC can presumably already set a static IP, create more NICs on other networks, and create proxy devices to the same effect. There may be some angle that I am missing though.

stgraber · November 21, 2024, 1:32am

Yep, I think that’d make sense!

stgraber · November 21, 2024, 1:35am

Yeah, the same restriction as picking an address for a new network forward would work fine here.
It’s also similar validation to what’s done for ipv4.routes.external.

As for your other comment: a user can set whatever IP they want in the instance, and that’s fine, but restricted projects usually prevent proxy devices and only let you attach to networks that are in your project and are Incus managed, among other restrictions, specifically to prevent this kind of potential abuse. That keeps you to only whatever IPs/subnets are allowed for the project.