Reasoning behind in/out/fwd netfilter rules

To me, the fwd, in, and out chains look kind of useless, since they use an accept policy. That means that explicitly accepting certain packets has no effect. So is there a reason why they exist?

This is what they currently look like on Arch Linux with LXD 5.2:

table inet lxd {
        chain pstrt.lxdbr0 {
                type nat hook postrouting priority srcnat; policy accept;
                ip saddr 10.149.19.0/24 ip daddr != 10.149.19.0/24 masquerade
                ip6 saddr fd42:8ec3:2d17:407a::/64 ip6 daddr != fd42:8ec3:2d17:407a::/64 masquerade
        }

        chain fwd.lxdbr0 {
                type filter hook forward priority filter; policy accept;
                ip version 4 oifname "lxdbr0" accept
                ip version 4 iifname "lxdbr0" accept
                ip6 version 6 oifname "lxdbr0" accept
                ip6 version 6 iifname "lxdbr0" accept
        }

        chain in.lxdbr0 {
                type filter hook input priority filter; policy accept;
                iifname "lxdbr0" tcp dport 53 accept
                iifname "lxdbr0" udp dport 53 accept
                iifname "lxdbr0" icmp type { destination-unreachable, time-exceeded, parameter-problem } accept
                iifname "lxdbr0" udp dport 67 accept
                iifname "lxdbr0" icmpv6 type { destination-unreachable, packet-too-big, time-exceeded, parameter-problem, nd-router-solicit, nd-neighbor-solicit, nd-neighbor-advert, mld2-listener-report } accept
                iifname "lxdbr0" udp dport 547 accept
        }

        chain out.lxdbr0 {
                type filter hook output priority filter; policy accept;
                oifname "lxdbr0" tcp sport 53 accept
                oifname "lxdbr0" udp sport 53 accept
                oifname "lxdbr0" icmp type { destination-unreachable, time-exceeded, parameter-problem } accept
                oifname "lxdbr0" udp sport 67 accept
                oifname "lxdbr0" icmpv6 type { destination-unreachable, packet-too-big, time-exceeded, parameter-problem, echo-request, nd-router-advert, nd-neighbor-solicit, nd-neighbor-advert, mld2-listener-report } accept
                oifname "lxdbr0" udp sport 547 accept
        }
}

If we dropped all non-matching traffic, it would also drop traffic that is allowed in other tables. This is how nftables works, unfortunately: a packet has to be accepted in all hooked tables for it to pass, whereas it only has to be dropped in a single table to be blocked.

See the discussion here for more detail.

To actually drop traffic, you can use our network ACL feature.
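To make that behaviour concrete, here is a minimal sketch (the table name `hostfw` is hypothetical) of two tables hooked on input:

```
table inet lxd {
        chain in {
                type filter hook input priority filter; policy accept;
                # accept only ends evaluation within this table
                tcp dport 53 accept
        }
}

table inet hostfw {
        chain in {
                type filter hook input priority filter; policy drop;
                # no matching rule here, so the drop policy discards
                # the same DNS packet that the lxd table accepted
        }
}
```

The DNS packet is accepted in `lxd` but still traverses the base chain in `hostfw`, where the drop policy wins.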

Correct me if I’m wrong, but the way I understand it, with the policy of LXD’s chains being accept, all the other rules have no effect, and we could remove them without any change in behavior. That doesn’t mean we should switch to drop, just that we could basically remove those rules because they don’t do anything.

That being said, IMO using LXD on its own without an additional firewall is pretty bad, because it both enables IP forwarding and sets the forward policy to accept, effectively turning your device into a router.

And if you decide to use a firewall, the only LXD rules that have an effect are the NAT rules. If the firewall has a drop chain, it doesn’t matter whether it has a lower or higher priority than LXD’s chains: LXD’s rules need to be copied into the firewall configuration to have any effect, meaning there’s effectively no difference between enabling and disabling LXD’s internal firewall.

For that reason I’d suggest letting LXD’s firewall use a default-drop chain to make it secure by default. If users have their own firewall, they can disable LXD’s and copy its rules into their firewall configuration.

On top of all that, LXD could try to be a better citizen than e.g. Docker (which is REALLY bad in this regard) and provide integrations with firewalls so you don’t have to hardcode IP addresses and hope that they don’t change. Ideally the interface would be generic enough that LXD wouldn’t have to support every firewall on earth explicitly, e.g. by calling an external script where the user can react to changes in LXD’s network setup.

The alternative would be to do it the other way around and let users subscribe to firewall changes through the LXD server API. That should be made as easy as possible, though, because most people don’t want to compile code to do scripting on their server.

Again, correct me if I’m wrong.

You are not wrong. Those rules, apart from the SNAT rules, are not really doing anything at this time by themselves. Really, they are a skeleton ruleset used to allow the managed bridge services if the user enables the ACL feature (How to configure network ACLs - LXD documentation).

There doesn’t appear to be consensus yet on how different applications should interact when creating nftables rules. The table namespace feature is great for simplifying rule management within an application, but the nftables behaviour of applying a drop rule in any table, even if the traffic is allowed in another table, is awkward, as it undoes some of the benefits of table namespaces that could otherwise have been realised. This is different from when LXD uses xtables (iptables/ip6tables/ebtables), as there are standard chains we can add rules to so that our traffic policy interacts with other applications’ rules.

This can be disabled by setting ipv4.routing=false on the managed bridge (see Bridge network - LXD documentation).
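For example, for a managed bridge named lxdbr0 (the commands below follow the standard `lxc network set` pattern):

```
lxc network set lxdbr0 ipv4.routing=false
lxc network set lxdbr0 ipv6.routing=false
```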

That would be very disruptive for existing users, and even if we only applied it to newly created networks, it would complicate getting started quickly with LXD (as a novice user would immediately have to think about their outbound internet policy). We wouldn’t be able to add a default drop policy to the entire machine, as it would potentially disconnect remote users or interfere with other applications, so at best we could add a default drop rule for traffic to/from the lxdbr0 interface, which wouldn’t improve the security implications of enabling router mode for the other interfaces on the system anyway.

We don’t go in for external hooks with LXD, as our experience with LXC indicated that adding hook support made it hard to understand how everyone uses the application and made changing things risky, as it was bound to break someone’s integration workflow. Also, in a cluster environment it’s not always clear where the hook should be run. We do have a REST API you can subscribe to to get events, although I’m not sure which specific events you would be interested in.
One thing that comes to mind is something @stgraber wrote as a proof of concept for our BGP integration, which uses the event stream to monitor for instances starting and then reads their config to get their IPs in order to advertise them via BGP.

See https://github.com/stgraber/lxd-bgp, which may provide some inspiration.

Hopefully in the future we will see some consensus around how applications should coexist when using nftables, or perhaps its behavior will be changed to allow an allow rule to apply irrespective of a drop rule in another table.


We wouldn’t be able to add a default drop policy to the entire machine as it would potentially disconnect any remote users or interfere with other applications, so at best we could add a default drop rule to traffic to/from the lxdbr0 interface, which wouldn’t improve security of enabling router mode for the other interfaces on the system anyway.

Right, but couldn’t we just drop forwards instead? That would still allow all traffic, as without a firewall, but would basically undo enabling forwarding (except for lxdbr0).
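For illustration, such a chain might look roughly like this (a hand-written sketch, not what LXD currently generates):

```
chain fwd.lxdbr0 {
        type filter hook forward priority filter; policy drop;
        iifname "lxdbr0" accept
        oifname "lxdbr0" accept
}
```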

We do have a REST API you can subscribe to to get events, although im not sure what specific events you would be interested in.

A list of LXD-managed interfaces with all their addresses. As you can see in my OP, there are basically just two important rules, in the postrouting chain, which masquerade all traffic that wants to leave lxdbr0. That requires knowing the addresses that were generated when creating the interface, though.
Sure, usually you’d just set up all your bridges once, copy the addresses to your firewall and never touch it again, but it’d obviously be easier if you could just modify networks using the LXD CLI however you want and have your firewall adjust automatically.

Hopefully in the future we will see some consensus around how applications should coexist when using nftables, or perhaps its behavior will be changed to allow an allow rule to apply irrespective of a drop rule in another table.

Is that an issue on LXD’s side or on nftables’ side? It sounds more like an nftables limitation to me.

It’s an nftables behaviour. It’s not clear yet what the correct way of approaching multiple applications managing the firewall is.

That’s what Docker does (see LXD and Docker Firewall Redux - How to deal with FORWARD policy set to drop) and it causes no end of confusion and problem reports on these forums. We’re certainly not keen to add system-wide default drop rules for any sort of traffic, due to the potential for unexpected blocking of other applications’ traffic.

Do you mean the equivalent of lxc network ls or lxc ls?

Yes, but there’d have to be an event for network configuration changes so you can update it accordingly.
While writing this answer I started looking into how you’d do that, and unfortunately the REST API documentation seems to be pretty bad when it comes to events.
There’s no mention of which event types there are, so you have to check the code. And even then it’s not clear what they mean. My current assumption, after reading the code for 10 minutes, is that a network change would trigger an operation event with some kind of otherwise unspecified payload, but it’s probably in a similar format to what you’d send to start that operation.

You can use lxc monitor to subscribe to events (and get an idea of the type of API requests being made).

E.g.

lxc monitor --loglevel=info --pretty --type=lifecycle
INFO   [2022-07-18T10:11:27+01:00] Action: network-updated, Source: /1.0/networks/lxdbr0, Requestor: unix/user (@) 

Shows when lxdbr0 network was updated, at which point that can trigger you to pull its latest config/info.
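As a sketch, a small consumer of that event stream could filter for these updates. Note the payload shape used below is an assumption based on the `lxc monitor` output above, not a documented schema:

```python
def is_network_update(event: dict) -> bool:
    """Return True if the event looks like a managed-network update.

    Assumes lifecycle events carry an "action" and "source" field
    in their metadata, mirroring the `lxc monitor` output.
    """
    if event.get("type") != "lifecycle":
        return False
    meta = event.get("metadata") or {}
    return (meta.get("action") == "network-updated"
            and str(meta.get("source", "")).startswith("/1.0/networks/"))


# Sample event, shaped like the monitor output shown above.
sample = {
    "type": "lifecycle",
    "metadata": {
        "action": "network-updated",
        "source": "/1.0/networks/lxdbr0",
    },
}
print(is_network_update(sample))  # prints True
```

On a match, the consumer would then pull the network's latest config via the REST API and regenerate its firewall rules.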

The events documentation is here: Events - LXD documentation, with a list of the event types.

And recently they got their own constants in the api package too: api package - github.com/lxc/lxd/shared/api - Go Packages

Oh nice, so the documentation is there; we just have to make it easier to find.
Through Google I came to this page.
Neither the linked Go nor Python documentation lists any event types. The Python implementation at least has some constants in its code.
But the Python doc links to GitHub, which also doesn’t have any description of the event types.

Also, the doc is 404ing right now :grin:

Either way, I think with some additional cross-links the documentation will be way easier to find (I can make a PR for that).
Also, while we’ve now come to a conclusion on how such a solution could be implemented using the REST API, it looks like most people (including me) might never need it, because they probably rarely change the bridge config.

Thanks, I’ve asked @ru-fu to take a look at fixing/improving the cross-links.
We’re in the process of redoing our documentation, so some of the structure has been changing.


I was just about to create a new topic with the same questions as @m1cha, but luckily I found this thread and don’t need to explain the whole issue with LXD’s nft tables.

But it bugs me slightly to see @tomp’s statement:

Hopefully in the future we will see some consensus around how applications should coexist when using nftables, or perhaps its behavior will be changed to allow an allow rule to apply irrespective of a drop rule in another table.

Although it’s an understandable issue with nftables, this comment seems to indicate there will be no solution unless someone else solves the problem. This will leave LXD in an awkward situation as nftables usage increases.

Expected behavior

It should be possible to obtain a hardened instance firewall without crippling LXD. It should be possible to forward packets only to LXD-managed networks while denying everything else. It should also be possible to harden the input firewall rules to accept only a few things, plus a few extra LXD-generated rules.

How it works currently

To allow instances internet access, you need to allow all forwarded packets, unless you configure firewall rules by hand. Similar issues apply to the input and output rules.

The issue

It seems the main issue is that LXD is using a base chain instead of a regular chain. Indeed, the base chain is useless if there is another base chain using the same hook which rejects packets, as was the case in this thread, for example.

A secondary base chain would only make sense for adding drop verdict statements. As is, a secondary base chain with only accept rules is pointless.

A possible solution

Be able to change the behaviour of LXD’s generated firewall rules so that LXD’s rules are contained in a regular chain and are reached via a jump from another base chain’s set of rules.

Just as there are network configuration options that control a few aspects of how these rules are added, how about adding a few more so the user can configure them correctly?

Make the defaults add rules as they are now, so there is no change for anyone who doesn’t run into these issues. As @m1cha was suggesting, add an ipv4.nft.forward.policy configuration option so users can change the default policy of these chains.

It would also be nice to be able to add the chains as regular chains rather than base chains, and to be able to add something like a vmap to jump from another base chain into the correct LXD regular chain.

Something like ip4.nft.input.type=jump, ip4.nft.input.table=filter, ip4.nft.input.chain=INPUT, ip4.nft.input.vmap_line=5 would produce a regular chain with the input rules and add a jump statement on line 5 of the INPUT chain in the filter table. Here I’m guessing ip4.nft.input.type could be either base, jump or goto; in the latter two cases LXD would make a custom input chain, and the vmap would either jump or goto the correct chain at the specified line. The default value would be base.

This is basically how the LXD xtables driver works. It injects rules into the main base chains (or uses its own chains with jump rules from the main chains).

This comes with its own set of problems, as multiple applications will now be managing the same ruleset and potentially affecting each other’s rules due to ordering or default policy (e.g. LXD and Docker Firewall Redux - How to deal with FORWARD policy set to drop). There are numerous examples of problems like this in the forums.

It would be good if nftables provided a way to state that an accept in a base chain was final, so that it couldn’t then potentially be dropped by other base chains (which is how the drop verdict already behaves across nftables chains). Then we would just need to control the priority LXD uses for its netfilter hooks in the base chains (which could be a setting) to ensure it’s ordered how the user wants.

Otherwise, one of the main benefits of nftables (the use of separate table namespaces), which allows for isolated rules for each application, is lost, and we’re back to trying to order the rules correctly by controlling the startup order of each application (which, as we’ve seen, can then break if you reload them in a different order later).

The rules LXD adds only allow instances access to the managed bridge’s (lxdbr0) services (such as DNS, DHCP, ping) and provide SNAT to the external interfaces. LXD doesn’t add drop/reject rules (with the exception of the ACL feature).

So with the nftables driver, these default accept rules are really only effective at providing instances with access to the managed services when the LXD ACL feature is enabled (as that adds a default drop rule for lxdbr0 traffic).

My suggestion right now would be to ensure that the manual rules you add affect only traffic on non-LXD-managed interfaces (as LXD will only add rules for its own interfaces), without adding a default drop/reject for all interfaces (i.e. add a default drop/reject rule for all interfaces except lxdbr0).

Or you can turn off the LXD firewall rules entirely (ipv{n}.firewall=false) and manage them centrally via whichever firewall configuration software you are using (this is what I do, as I prefer to have a system’s firewall policy managed centrally).
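For example, per managed network (assuming lxdbr0, using the standard `lxc network set` pattern):

```
lxc network set lxdbr0 ipv4.firewall=false
lxc network set lxdbr0 ipv6.firewall=false
```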

It’s a tricky one for sure. Whichever way we do it, LXD’s rules will potentially (likely, in my experience) be affected by rules added by other applications/systems. I’m not against using non-base chains and then adding jump rules into the main base chains, although there isn’t, as far as I know, a standard set of base chains, except those added by the nftables iptables shim commands.

Indeed it would not be bad, but I’m not sure this problem should be ignored and treated as a case of wishful thinking. I think nftables has this behaviour not as a bug, but by design. Since LXD is developed to ease the usage of containers, it should adapt to how nftables actually works to help end users run their containers/VMs. And I don’t think the solution it currently provides is really helpful.

To explain what I mean, I want to consider a situation which I believe the LXD team wants to be able to solve. I also do not wish to consider the case in which the netfilter team changes their mind on how this verdict behaviour works.

LXD use case

Set up a hardened server that runs LXD containers/VMs with internet access. By a hardened server, I mean one with a set of firewall accept rules and a drop policy for at least the input and forward hooks. I want to consider the control obtained by ipv{n}.firewall=false, as was mentioned.

Solution 1: ipv{n}.firewall=true

In this case, if the input and forward hooks have a drop policy in another chain, I’ll basically not have any connectivity in my containers, so it doesn’t make sense to use these rules in the first place.

The other option would be to remove the drop policy from my other base chain, which ranges from undesirable to unacceptable, depending on how important the data on that server is.

I guess this solution can be discarded, as both options are unacceptable. This leaves the following as our only real solution.

Solution 2: ipv{n}.firewall=false

Ideally, I would copy the current tables and use those. But what if I want to add another managed LXD network? I would have to manually add rules for every new network and research how to do it properly. In what way is LXD making my life easier here? Although I finally get control over the firewall rules, I thought the purpose of LXD was to be better than using LXC and having to configure everything yourself.

Conclusion

Solution 1 is unacceptable, while solution 2 means doing everything yourself (which is contrary to the automation LXD should provide).

I can agree that adding these rules automatically has its downsides as well, but in my initial post I proposed a configurable solution: give the user control to choose to use a regular chain. It would not be the default, but the user could choose to change this behaviour.

The only remaining concern is what you mentioned: that “there is no standard set of base chains”. Similar to what I mentioned above, it would be nice if the user could choose this behaviour. It would not be automatic. There would be three configuration options for the user to choose for each base hook:

  1. Should the firewall rules be placed in a base chain or in a regular chain?

  2. If the user chooses a regular chain, then they must also provide which base chain should be used for the jump. This should not be auto-detected, as that comes with its own problems.

  3. On which line should the jump rule be added? Maybe default this to the first or last rule, so the user only needs to change it if necessary.

I’ve seen how LXD may even add one base chain for every managed network. As I mentioned before, all that would be needed is a vmap jump line to choose the appropriate chain, and there would even be fewer rule checks per packet.

Current input chains for two networks

table inet lxd {
	chain in.lxdbr0 {
		type filter hook input priority filter; policy accept;
		iifname "lxdbr0" tcp dport 53 accept
		iifname "lxdbr0" udp dport 53 accept
		iifname "lxdbr0" icmp type { destination-unreachable, time-exceeded, parameter-problem } accept
		iifname "lxdbr0" udp dport 67 accept
		iifname "lxdbr0" icmpv6 type { destination-unreachable, packet-too-big, time-exceeded, parameter-problem, nd-router-solicit, nd-neighbor-solicit, nd-neighbor-advert, mld2-listener-report } accept
		iifname "lxdbr0" udp dport 547 accept
	}

	chain in.lxdcustombr0 {
		type filter hook input priority filter; policy accept;
		iifname "lxdcustombr0" tcp dport 53 accept
		iifname "lxdcustombr0" udp dport 53 accept
		iifname "lxdcustombr0" icmp type { destination-unreachable, time-exceeded, parameter-problem } accept
		iifname "lxdcustombr0" udp dport 67 accept
		iifname "lxdcustombr0" icmpv6 type { destination-unreachable, packet-too-big, time-exceeded, parameter-problem, nd-router-solicit, nd-neighbor-solicit, nd-neighbor-advert, mld2-listener-report } accept
		iifname "lxdcustombr0" udp dport 547 accept
	}
}

How it would look with this alternate configuration

Set the following configuration variables:

  • ip4.nft.input.type=jump
  • ip4.nft.input.table=filter
  • ip4.nft.input.chain=INPUT
  • ip4.nft.input.insert_position=first

This would produce:

table ip filter {
	chain INPUT {
		type filter hook input priority filter; policy drop;
		iifname vmap { "lxdbr0" : jump inet lxd in.lxdbr0, "lxdcustombr0" : jump inet lxd in.lxdcustombr0 }
	}
}

table inet lxd {
	chain in.lxdbr0 {
		tcp dport 53 accept
		udp dport 53 accept
		icmp type { destination-unreachable, time-exceeded, parameter-problem } accept
		udp dport 67 accept
		icmpv6 type { destination-unreachable, packet-too-big, time-exceeded, parameter-problem, nd-router-solicit, nd-neighbor-solicit, nd-neighbor-advert, mld2-listener-report } accept
		udp dport 547 accept
	}

	chain in.lxdcustombr0 {
		tcp dport 53 accept
		udp dport 53 accept
		icmp type { destination-unreachable, time-exceeded, parameter-problem } accept
		udp dport 67 accept
		icmpv6 type { destination-unreachable, packet-too-big, time-exceeded, parameter-problem, nd-router-solicit, nd-neighbor-solicit, nd-neighbor-advert, mld2-listener-report } accept
		udp dport 547 accept
	}
}

I can’t guarantee those rules would compile perfectly, since I edited them by hand, but I think they are understandable. In this case we would get the correct behaviour of accepting those packets, and there would even be fewer checks, since the check on iifname was already made in the vmap entry. If those rules did not apply, the chain would just return to the next position in the base chain and continue from there.

Regular chains can’t have a policy in nftables (only base chains can), but a final drop rule at the end of each regular chain would have the same effect. That would also make it possible to harden the LXD firewall rules without affecting networks not managed by LXD, since those are never sent into the regular chains by the vmap entry.

My main point was that doing it the way you propose is how we do it for xtables, and that also has a different set of problems; but the solution there is the same, that is, to modify the system firewall rules manually.

Arguably, making the xtables and nftables drivers behave the same could be an approach we take, but we would still encounter ordering conflicts with other firewalls and applications (e.g. Docker).

Your point about having a setting that places the rules in a certain position in the ruleset doesn’t address the ordering issues when applications that modify the firewall are restarted in an order different from the boot-time order and then apply their rules in a new order.