Best advice for networking for public server with IPv4 allocation

hifly · December 14, 2022, 12:30pm

Hi, I’m asking this as a separate question, but it relates in various ways to the network acls thread.

There are a lot of options for networking with LXD, but all have various limitations, and it’s not clear to me what the default recommended option is for new installs. I think I have read and absorbed the videos on the YouTube channel, the LXD documentation website and the blog series at http://stgraber.org on “creating an inexpensive cluster”. However, I’m struggling to condense this all and make a decision…

I have two sites: one with a /24 IPv4 allocation, the other with a /28 allocation.

I’m starting with individual machines running LXD and I’m content to develop a solution to backup and move machines between individual host machines. However, the direction of travel is towards a cluster + some extra individual machines in each location.

The /24 site is broadly facing a switch, and multiple machines can simply start answering requests on any IP to claim it. The /28 is more restrictive and has a firewall in front, currently doing proxy ARP to push individual IPs to individual servers. Probably only 2-4 IPs are available for use, currently contiguous, but they might not be in the future (proxy ARP avoids the IP space that would be wasted by subnetting).

The host machine has a bond over multiple NICs; that bond is in a bridge (br0), which is the main interface the physical machine uses to get internet access (there is another bond which makes up a private net between machines as well, but that’s not relevant for now).

I will be setting up (only) around 10-20 containers, mostly running very simple services, e.g. exposing a few ports (mail/HTTP, similar), and I would prefer to limit the outbound traffic to a few select services as well. Whilst some could survive with forwarded/proxied traffic, for the machines that I want to allocate a public IPv4 address, I’ve experimented with “bridged”, “routed”, and “ipvlan” devices. None seem perfect (or more likely I don’t understand how to use them).

Bridged means creating the networking inside the container. I would prefer the container not to have an option to choose IP addresses and this to be set on the host if possible. Also, it seems more complex to prevent machines having unrestricted access to each other, but I’m not so worried about that.
Routed seems better. I can create a device in the instance config which sets up the public IP. Also, networking is easily controlled between containers.
ipvlan didn’t seem to offer any benefits to me over routed?

I liked the idea of using the ACLs feature for networking, but I don’t see that I can use it for outbound traffic in any of the above cases? This does seem to massively reduce the utility of the ACL feature; in fact I’m not totally sure what you can use it for apart from controlling inter-instance traffic. I guess it could be used on an internal managed bridge which has a NAT to the outside world? What do people actually use it for?
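For reference, the managed-bridge case guessed at above could be sketched roughly as follows. This is an illustration under assumptions, not a configuration confirmed in this thread: the ACL name web-egress, the bridge name lxdbr0, the chosen ports, and the run/apply_egress_acl helpers are all hypothetical. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed. Set DRY_RUN=0 to apply.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create an ACL that allows a small set of outbound services and attach it
# to a managed bridge. Arguments: ACL name, managed network name.
apply_egress_acl() {
    acl="$1"
    net="$2"
    run lxc network acl create "$acl"
    # Egress = traffic leaving the instances on this network.
    run lxc network acl rule add "$acl" egress action=allow protocol=tcp destination_port=80,443
    run lxc network acl rule add "$acl" egress action=allow protocol=udp destination_port=53
    # Attach the ACL to the network (assumed here to be a managed bridge).
    run lxc network set "$net" security.acls="$acl"
}

# Hypothetical names, for illustration only.
apply_egress_acl web-egress lxdbr0
```

On a bridge network, rules like these filter at the boundary between the bridge and the host rather than between instances on the same bridge.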

However, the specifics aren’t covered in the blog series on Stéphane Graber's website:

Inexpensive highly available LXD cluster: 6 months later | Stéphane Graber's website

However, I get the impression that OVN networking can make use of ACLs for outbound networking control? There are a lot of docs on it, but I’m struggling to grok the big picture of what it really is and how devices work. Q: Is OVN the solution to what I’m looking for here? Am I right in understanding it’s something like the simple internal NAT bridge, but I can also route external public IPv4 traffic into it? I’m not quite getting the big picture to be honest…

For now I’m just planning to do the same as I did with my old linux-vserver installs and run iptables rules on the host. However, this is mildly inconvenient in that it’s out of sync with creating the container instances and it’s not so simple to apply traffic profiles to classes of servers (sure, I can create chains with traffic profiles).

It seems like a useful feature would be for unmanaged networks, to be able to apply ACLs to the devices in the instances (feature request?). So eg in the case of a bridge or routed device in an instance, attached to the unmanaged device on the host, to be able to apply profiles/config to the instance device? This seems possible with OVN, hence wondering if the correct solution is to persevere with understanding that and migrating everything to that?

If the answer is OVN, can I use this across both a cluster and a few independent machines? Is this a good solution overall?

Alternatively, would I be better to look at “forwarding” for the entire solution? (None of the instances are high traffic, so load isn’t a concern.) What I’m unclear about with forwarding is how this interacts with the outbound IP and NAT. I don’t see anything in the documentation on this, so I presume it doesn’t affect outbound traffic from the instance? In cases like Let’s Encrypt (or FTP) you have some challenges matching outbound and incoming traffic, and instances work better if they are aware of their external IP, etc. So I don’t think forwarding is going to work as a general solution, as I still need to figure out my outbound traffic through some other mechanism?

So far “routed” appears best for my needs (mainly because I can configure IP address outside the container). Is there another option I should research though? Any hidden issues? Can I combine forwarding and “routed”? eg to avoid writing iptables DNAT port rewriting rules (port 80 → 3080), can I write these using “forwarding” and push the IP via “routed”?

Why do others go with “bridged”? Or in fact what do people generally go with? Why?

rocket · December 14, 2022, 7:40pm

I am interested in this topic as well. I have a similar setup I am looking at. I would have 3 servers with a /28 subnet on their external NICs … and an internal NIC … would you set up OVN on the internal NIC address space? Does that need to use the external space for the virtual router addresses?

tomp · December 16, 2022, 9:47am

Indeed routed sounds like your best bet at the moment as it offers basic, controlled connectivity without any bells and whistles (leaving you fully in control of it). The biggest downside of routed, and why we use bridged by default is that routed doesn’t support broadcast/multicast and thus doesn’t support automatic IP configuration, and requires manual configuration inside the instance.

But this is what it was designed for anyway: situations where you want to statically pass a specific set of IPs into an instance, without needing to route larger prefixes to your host.
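As a rough sketch of that static case (not a configuration confirmed in this thread), a routed NIC carrying one public address could be added as below. The instance name web1, the 192.0.2.10 documentation address, and the run/add_routed_nic helpers are hypothetical; br0 is the host bridge described in the question. The script only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed. Set DRY_RUN=0 to apply.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Attach one public address to an instance through a routed NIC.
# Arguments: instance name, host parent interface, IPv4 address.
add_routed_nic() {
    run lxc config device add "$1" eth0 nic \
        nictype=routed parent="$2" ipv4.address="$3"
}

# Hypothetical values, for illustration only.
add_routed_nic web1 br0 192.0.2.10
```

The address is set in the instance's device config on the host, which matches the goal in the question of keeping IP assignment out of the container's hands.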

WRT ACLs, we discussed this over at Network ACLs possible on routed devices? - #4 by tomp

But to summarise:

OVN networks support the full ACL functionality (including intra-network filtering). But OVN is quite a bit more complex and has more “moving parts”, so it is not ideal for smaller setups (but perfectly possible). It does require a shared uplink network for each LXD host so that the virtual router IPs can be announced from any LXD host. See How to configure network ACLs - LXD documentation and LXD network ACLs - YouTube
Bridged networks don’t support intra-network filtering (because netfilter doesn’t support providing ingress and egress port at the same time for a single rule). They do support filtering at the boundary between the bridge and the host though. See Prevent cross-talk - #8 by tomp for an example. Also see other limitations here How to configure network ACLs - LXD documentation
You can use the host_name setting on routed NICs to give each NIC a predictable/known host-side interface name. This should help with manual firewalling, as you can reference that interface in your rules. For example: lxc config device set <instance> eth0 host_name=foo (see Type: nic - LXD documentation)
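To make the host_name suggestion concrete, here is a minimal sketch that pins the host-side interface name and then references it in iptables rules limiting what the instance may send out. The instance name web1, the interface name veth-web1, the allowed ports, and the run/pin_and_filter helpers are hypothetical, and the rule set is only one possible policy. The script prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed. Set DRY_RUN=0 to apply.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Give the routed NIC a fixed host-side name, then filter on that name.
# Arguments: instance name, host-side interface name (15 characters max on Linux).
pin_and_filter() {
    inst="$1"
    ifname="$2"
    run lxc config device set "$inst" eth0 host_name="$ifname"
    # Let replies to already-established connections (e.g. inbound HTTP) through.
    run iptables -A FORWARD -i "$ifname" -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    # New outbound connections: only HTTPS and DNS in this example.
    run iptables -A FORWARD -i "$ifname" -p tcp --dport 443 -j ACCEPT
    run iptables -A FORWARD -i "$ifname" -p udp --dport 53 -j ACCEPT
    # Everything else coming from this instance is dropped.
    run iptables -A FORWARD -i "$ifname" -j DROP
}

# Hypothetical values, for illustration only.
pin_and_filter web1 veth-web1
```

Keeping the per-instance rules in one function is one way to apply the same traffic profile to a class of servers when instances are created.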

hifly · December 16, 2022, 11:29am

Could you discuss whether OVN would be a sensible solution for a couple of machines in a non clustered environment? (Of course feel free to add reasons why it would be sensible for clustered use, I was trying to find out the worst case limitations)

The biggest limitation that I’m finding with routed/bridged networks is firewall integration. I don’t have an issue with writing my own rules, or using my own rule compiler, but LXD is a bad citizen here (at least for iptables) in that it takes over the whole root INPUT/OUTPUT chains. I think it’s a fairly trivial change to make it put its (iptables) rules into a separate chain (or chains), but obviously that’s not done at present. (Note: I have worked around this for one server by putting all MY rules into chains that I can then carefully insert around the LXD rules, but this doesn’t scale well in general.)

I’m struggling to understand the OVN setup? The top of the docs say “… following steps to create a standalone OVN network that is connected to a managed LXD parent bridge network (for example, lxdbr0) for outbound connectivity.” However, I don’t see how in general I can have a managed network as my internet facing network?

eg if I were using this in a home/office setting and relying on upstream NAT for connectivity (eg say in some local ipv4 private space), then some separate machine would run the DHCP for that environment, so surely I can’t use a “managed bridge” to connect to that?

If I move the machine outside onto some “DMZ” environment, say I attach it to some public /28 space then it will be sharing that space with other machines, so surely that makes using a “managed bridge” tricky? Say I wanted 2x standalone LXD machines here, both with OVN and sharing some small ipv4 pool, surely they can’t both have a managed bridge config?

In the clustered docs it talks about using an unmanaged interface? So why can’t that be used in the original case? However, the docs don’t make it clear what features are lost from this? Can all the ACL stuff still be used?

What happens if you want to run 2x clusters in the same public address space? Am I right in thinking this isn’t possible?

Public IPv4 address space is crazy expensive and getting more so. Using bridges with a CIDR block is not necessarily optimal. Generally you want to allocate a few individual IPs to a pool (which might not even be contiguous). I’m not yet clear how to do that within the OVN/LXD bridging setup.

One of the attractive elements of OVN (if I have understood it correctly?) is that you could use it to bridge datacentres and share the public external IPs across data centres? So I could have some public service announced on an IP in Amsterdam, but if for some reason it makes more sense to attach that service to a machine in London, then OVN will take care of routing that to the correct datacentre? (Obviously you need to accept the latency cost)

However, this also implies that an OVN cluster can span multiple LXD clusters? Correct? (Because LXD clusters are encouraged to stay within a single low-latency space, e.g. a single datacentre.) So that is what is confusing me about the need to use managed LXD interfaces, etc.

A diagram or some other talk on this would be helpful! I don’t disagree that there are good docs on the low-level specifics of how to do very specific things. However, from an outsider’s point of view, I’m missing the big picture on how to actually architect a non-trivial LXD/OVN setup.

Thanks for the links above!

tomp · December 16, 2022, 12:45pm

If you remove the lxdbr0 you’re not using anyway (I believe) then it won’t add any firewall rules.

tomp · December 16, 2022, 12:46pm

If you’re connecting to an external network which provides DHCP you wouldn’t use a managed bridge, correct.

tomp · December 16, 2022, 12:46pm

You could have the managed bridge operate using a dedicated subnet and then route that to the LXD host. This is how some users have done it.
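A minimal sketch of that approach, assuming the upstream router or firewall already has a static route for the dedicated subnet pointing at the LXD host. The network name lxdbr1, the 198.51.100.0/29 documentation prefix, and the run/create_routed_bridge helpers are hypothetical. NAT is switched off so instances keep their public source addresses. The script only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed. Set DRY_RUN=0 to apply.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create a managed bridge that owns a dedicated, externally routed subnet.
# Arguments: network name, bridge address in CIDR form (gateway for instances).
create_routed_bridge() {
    run lxc network create "$1" \
        ipv4.address="$2" ipv4.nat=false ipv6.address=none
}

# Hypothetical values, for illustration only.
create_routed_bridge lxdbr1 198.51.100.1/29
```

The trade-off is the one raised earlier in the thread: a whole prefix has to be routed to the host, which costs more address space than handing out individual IPs.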

tomp · December 16, 2022, 12:47pm

This is what routed is for. Or you can use a smaller subnet with managed bridges.

tomp · December 16, 2022, 12:50pm

In principle it would be possible. But unlikely for 2 reasons:

An LXD cluster expects to run within the same rack/location from a latency expectation perspective.
LXD’s OVN implementation uses a shared L2 for its uplink network, so it can move the virtual router IP between LXD hosts in the case one goes down. We’ve discussed adding a configuration setting to allow controlling which cluster members would be considered as uplink chassis candidates, but at this time all cluster members are candidates, so need to operate on the same L2 uplink network.