A quick check before I waste hours:

From my reading, I think I need a “source” of IP addresses for containers if I’m to use MACVLAN. The subnet I’ll be using has no DHCP server on it and carries traffic for this LXD cluster as well as a K8S cluster.

I think that means I’ll need to either:

  • write a small program that maps containers to IP addresses in a database and assigns an IP address when a container is created;
  • start a dnsmasq service for this subnet, possibly running on 127.x.x.x or a socket, and have the containers obtain their IP addresses from it.

I can see that the first option could use dnsmasq with a simple wrapper to maintain this database of container_name <=> IP address.

I’d prefer to not supply a general DHCP server on this subnet, so I’d like some means of ensuring all requests are from LXD containers.

I believe LXD already starts dnsmasq for BRIDGE subnets. Is there some way to get it to start the same service for MACVLAN subnets?

I’m interested in going down this route because firewalling is a pain in the neck with the clustered bridged networks, and the NAT involved is unnecessary overhead.

LXD doesn’t support running dnsmasq on a macvlan network. With such networks there is no guarantee that there is a host interface associated with the network, and even when there is, one main design limitation of macvlan is that the host cannot interact with the guests, so dnsmasq running on the host would not be able to give out IPs to your containers.

It would also come with the clear issue of now offering DHCP to your entire network, not just what’s running on this one host.

You can configure the instances to assign IPs manually using their own internal network config, or you could automate that using cloud-init, similar to how it can be done with routed NIC types (that don’t support DHCP).
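As a minimal sketch of the cloud-init approach: the container name c1, the 192.0.2.0/24 subnet, and the parent interface eth0 below are all illustrative assumptions, and this assumes a recent LXD with the cloud-init.* instance keys and an image that runs cloud-init.

```shell
# Create a container with a macvlan NIC (names/addresses are examples).
lxc init ubuntu:22.04 c1
lxc config device add c1 eth0 nic nictype=macvlan parent=eth0

# Hand the instance a static address via a netplan-style
# cloud-init network-config instead of relying on DHCP.
lxc config set c1 cloud-init.network-config "$(cat <<'EOF'
version: 2
ethernets:
  eth0:
    addresses:
      - 192.0.2.10/24
    routes:
      - to: default
        via: 192.0.2.1
    nameservers:
      addresses: [192.0.2.1]
EOF
)"

lxc start c1
```

The address itself could come from whatever per-container database or wrapper you maintain, which matches option 1 in the original post.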

See How to get LXD containers get IP from the LAN with routed network

Thanks @stgraber.

Idea 2 was creating a VLAN and putting the MACVLAN on it, along with the dnsmasq instance (the service running on the cluster’s primary host). This is, again, replicating something already implemented in LXD.

Idea 3 was giving up on this and running flannel for the networking, which requires an etcd service and, I’m sure, has its own constraints.

All of these increase config complexity, but provided it’s a one-off cost, I can live with that.

That was the thinking behind my option 1 in my post. I hadn’t got as far as the exact mechanism.

Can you describe what you are trying to achieve overall? It sounds like macvlan might not be the best approach. Macvlan is for when you want to connect the containers to the external network so they can access network resources as if they were physically connected (i.e. not isolated from the rest of the network).

Thanks @tomp.

I would like:

  • the containers in the cluster to be able to talk to each other using http(s) APIs;
  • nothing on this physical subnet to be able to talk to them (or vice versa);
  • HTTPS ingress via proxies (mostly NGINX) from this physical network to the LXD subnet;
  • egress (for software download, DNS etc) preferably also via proxies (e.g. squid, dnsmasq);
  • access from cluster’s primary host to the containers for problem resolution (lxc exec is fine);
  • the ability, if I choose, to firewall some containers so their traffic is restricted to the proxies only.

I’d prefer minimal (system) overhead (wouldn’t everyone?), and I’d like containers etc. to keep their own identifiers and have these appear in reports, logs and so on for traceability. DNS lookups (or an equivalent such as etcd) are good, as they make inter-container connections easier to configure automatically, although a lightweight message-queue subscription route may be possible.

The workload is distributed ML [multiple (low hundreds) medium to long running (hours to days to weeks) standard parameterised container setups using cloud-init on multiple (tens) hosts with the necessary hardware], and there are a number of large databases / file sources (NFS), which will be made available to the physical subnet and proxied (in some fashion) into this LXD subnet. There is, potentially, a constant large volume of network traffic within the LXD subnet as the ML containers asynchronously coordinate their workloads.

The container lifetimes follow a typical ML path - quick cycle times (hours-days) with D.S. interaction via some variety of IDE when the project is young, getting longer (days-weeks) with minimal interaction but increased monitoring as it stabilises, and potentially moving to constant, perpetual (on-line) ML when “stable”.

At present, a single “private” (LXD) subnet would be fine for the small number of teams, provided firewalling can be made to work. There are open-source tools for resource management.

I’d prefer to not go the kubernetes route as it’s not designed for this kind of workload, but that’s Plan B as many people are using it for a wide variety of use-cases so I can probably find a base to plagiarise.

You could look at the fan bridge type (IPv4 only); see bridge.mode and fan.* in https://linuxcontainers.org/lxd/docs/master/networks#network-bridge and https://wiki.ubuntu.com/FanNetworking
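A minimal sketch of setting up a fan bridge, assuming the cluster hosts sit on a 10.0.0.0/16 underlay; the network name lxdfan0 and the underlay subnet are illustrative, not prescriptive.

```shell
# Create a fan-mode bridge spanning the cluster hosts.
lxc network create lxdfan0 bridge.mode=fan fan.underlay_subnet=10.0.0.0/16

# Attach it to the default profile so new containers use it.
lxc profile device add default eth0 nic network=lxdfan0
```

Each host then gets its own slice of the fan overlay, and containers on different hosts can reach each other without any external DHCP on the physical subnet.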

Alternatively, if you’re looking for a lower-overhead/hardware-assisted solution, then using macvlan on top of a private VLAN interface makes sense. You can specify the vlan property on the macvlan NIC and LXD will use an existing VLAN interface on the host’s parent (or create one if needed).

See vlan on https://linuxcontainers.org/lxd/docs/master/instances#nic-macvlan
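A one-line sketch of that vlan property, with the container name c1, parent interface eth0 and VLAN ID 100 as assumed example values:

```shell
# Attach the container's NIC as macvlan on VLAN 100 of the host's eth0.
# LXD reuses the eth0.100 interface if it already exists, or creates it.
lxc config device add c1 eth0 nic nictype=macvlan parent=eth0 vlan=100
```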

The hosts would not be able to communicate with the instances, but you could use the proxy device to expose port(s) from the host into an instance (or vice versa). See https://linuxcontainers.org/lxd/docs/master/instances#type-proxy
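For example, a proxy device forwarding HTTPS ingress from the host into a container (the container name web1, device name https and port numbers are illustrative):

```shell
# Listen on the host's port 8443 and forward to port 443
# inside the container's network namespace.
lxc config device add web1 https proxy \
    listen=tcp:0.0.0.0:8443 connect=tcp:127.0.0.1:443
```

This sidesteps the macvlan host-to-guest limitation for the NGINX ingress case mentioned earlier.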

You could also run a private DHCP server inside a container and that would be used by the instances on that private VLAN.
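A sketch of that in-container DHCP server using dnsmasq; the container name dhcp1, address range and router option are assumed example values:

```shell
# Configure dnsmasq inside the "dhcp1" container to serve the private VLAN.
lxc exec dhcp1 -- sh -c 'cat > /etc/dnsmasq.d/private-vlan.conf <<EOF
interface=eth0
bind-interfaces
dhcp-range=192.0.2.50,192.0.2.150,12h
dhcp-option=option:router,192.0.2.1
EOF
systemctl restart dnsmasq'
```

Because the VLAN is private, this avoids the earlier concern about offering DHCP to the whole physical network.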

For other network services that need outbound external network access (like DNS or an outbound proxy), you could run an instance that provides these services to the private VLAN, and then add an additional NIC device to that instance (perhaps another macvlan or bridged device) to be used as the port for external network access.
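A sketch of that dual-NIC gateway instance; the container name gateway1 and the bridge lxdbr0 used for the external leg are assumptions for illustration:

```shell
# First NIC: the private VLAN the other containers live on.
lxc config device add gateway1 eth0 nic nictype=macvlan parent=eth0 vlan=100

# Second NIC: external access for DNS forwarding, squid, etc.
lxc config device add gateway1 eth1 nic nictype=bridged parent=lxdbr0
```

The gateway container then runs the proxies, and the ML containers can be firewalled so their only path out is via its private-VLAN address.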

Thanks @tomp - a day’s worth or read and experimenting there, but it’ll save me a week of errors and frustration. :slight_smile:
