LXD in production environment

Hi,

Just out of curiosity, is anyone here using LXD in a production environment?

Regards

Yes, I am.
Currently Xenial + LXD 2.21, upgrading to 3.0 soon.
One server has 60 containers.
I prepare containers on a staging server and then move them to the production server.
There is automatic image creation for backup on a dedicated server.
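For anyone curious, a rough sketch of such a staging-to-production flow with stock lxc commands (the remote name prod, the container name web01 and the paths are placeholders, not the poster's actual setup):

# on the staging server: register the production server as a remote (one time)
lxc remote add prod production.example.com

# push a prepared container to the production server
lxc stop web01
lxc copy web01 prod:web01

# backup: publish the container as an image and export it for the backup server
lxc publish web01 --alias web01-backup
lxc image export web01-backup /backups/web01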


Been running LXC 1.0 in production since 2014, and currently in the process of migrating my ~250 containers to LXD 3.0 on Xenial.

Still running like a charm, but managing things the old LXC way is becoming a pain.


I’m deciding whether to migrate to LXD.

Do your containers run different network services, or do all 60 containers run the same application (like Apache)?

Are the servers running LXD in your environment configured with bridge or macvlan interfaces?

Do you host your ~250 containers on more than one host server? Is live migration a common task in your day-to-day?

Is the storage pool on a block device?

I am hosting on one LXD host. My existing setup uses LVM storage, but the new build I am working on is going to use a ZFS block device.
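As a reference point, a minimal sketch of creating a ZFS pool on a whole block device with LXD (the device path and pool name are placeholders; LXD creates the zpool on the device):

# create a ZFS-backed storage pool directly on a spare block device
lxc storage create tank zfs source=/dev/sdb

# launch a container on that pool
lxc launch ubuntu:18.04 c1 --storage tank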

LXC in production since 2012, and migrated to LXD (dir) in 2017.


What kinds of services run on your containers?

I intend to run Postgres, MySQL, an NFS server (and NFS mount points), GlusterFS mount points, Apache, and WildFly in some LXC containers (but I’m a bit apprehensive).

I’m studying all the documentation available at:

I’m trying to enumerate a set of best practices for using LXD in a production environment!

Thanks!

Running nginx, PHP, Tomcat, MySQL, Hadoop, Docker and almost everything else :smiley:


Hi.
We use a bridge, with the default configuration.

Because of the large number of containers, we had to increase some sysctl values. I can tell you which ones if you need them.

Regards


Here is what we set in /etc/sysctl.conf:

fs.inotify.max_queued_events = 1048576
fs.inotify.max_user_instances = 1048576
fs.inotify.max_user_watches = 1048576
vm.max_map_count = 262144
kernel.dmesg_restrict = 1
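For what it's worth, assuming those lines live in /etc/sysctl.conf, they can be applied on the LXD host without a reboot:

# reload /etc/sysctl.conf
sudo sysctl -p /etc/sysctl.conf

# or reload everything, including /etc/sysctl.d/*.conf
sudo sysctl --system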

And in /etc/security/limits.conf:

*               soft    nofile          1048576
*               hard    nofile          1048576
root            soft    nofile          1048576
root            hard    nofile          1048576
*               soft    memlock         unlimited
*               hard    memlock         unlimited

Thanks for the clarifications! I’ll prepare my setup with these adjustments.

Do you use one profile per container, or do all containers share the same profile?

Are your disk limits applied via a profile, or manually via the command line?

These might also be a good idea to add in /etc/sysctl.conf:

net.ipv4.neigh.default.gc_thresh3=8192
net.ipv6.neigh.default.gc_thresh3=8192

They increase the maximum number of entries in the ARP/neighbour tables, for IPv4 and IPv6 respectively, and will allow you to have a lot more containers.

On that note, I believe the standard Linux bridge has a hard limit of 1024 ports, so in case you might go over that, either balance your containers across different bridges or, my preference, go with Open vSwitch.
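If anyone wants to try the Open vSwitch route, a minimal sketch (the bridge name ovsbr0 is made up, this assumes the profile already has a bridged eth0 NIC, and the syntax shown is the LXD 3.x one):

# create an Open vSwitch bridge on the host
sudo apt install openvswitch-switch
sudo ovs-vsctl add-br ovsbr0

# point the profile's NIC at the OVS bridge instead of the default lxdbr0
lxc profile device set default eth0 parent ovsbr0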


We have been running LXD in production for about 2 years now. Each container hosts a WordPress site for our customers (nginx and PHP) - we have about 650 sites spread across 10 LXD container servers. We also run MariaDB servers for the WordPress sites in other containers.

So far, so good. We have run into minor issues along the way (more of a learning-curve for containers), but overall, the containers run very well.

Things we really like about LXD:

  • For our workloads (web site hosting) the containers offer a great way to isolate each site from one another
  • Using the BTRFS filesystem allows us to take snapshots of each site in less than a couple of seconds (see the sketch after this list)
  • Starting/stopping containers is very fast; very little downtime for our customers
  • Very easy to spin up a new container
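For illustration, the snapshot workflow looks roughly like this (the container and snapshot names are made up; on a btrfs pool these are copy-on-write, hence near-instant):

# near-instant copy-on-write snapshot of a site container
lxc snapshot site01 pre-update

# roll back if an update goes wrong
lxc restore site01 pre-update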

Things we struggle with:

  • No centralized management tool to see all container servers with their running containers (LXDUI is per-server only at this time)
  • Getting per-container stats is difficult - especially to see which sites are misbehaving
  • Until LXD 3.7 (just released), there was no easy way to incrementally move a site from one container server to another (see the sketch after this list)
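Presumably that last point refers to the new --refresh flag on lxc copy in LXD 3.7; a rough sketch with placeholder names:

# initial full copy to the target container server
lxc copy site01 server2:site01

# later runs only transfer what changed since the previous copy
lxc copy site01 server2:site01 --refresh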

Do you connect WordPress to the MariaDB servers using TCP/IP, or do you somehow share/bind the MariaDB unix socket between containers?

Why did you choose BTRFS instead of ZFS?

All containers share the same profile because the same software runs inside.
We defined CPU and RAM limits in each container's configuration, but you could create dedicated profiles.
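For example (the container name is hypothetical), such limits can be set per container, or the same keys can go into a shared profile:

# per-container CPU and RAM limits
lxc config set web01 limits.cpu 2
lxc config set web01 limits.memory 4GB

# or the same keys in a shared profile
lxc profile set default limits.cpu 2
lxc profile set default limits.memory 4GB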


You mean disk size? We are not using ZFS, so we don’t have limits.


We run MariaDB on a different server and connect over TCP.

We have been using BTRFS for so long, it is not even a question anymore. I tried ZFS in the past, and it required lots more RAM than BTRFS to get similar performance. Maybe things have changed over the past couple of years. For us, BTRFS requires minimal overhead to perform well, and the snapshots are almost instantaneous. We don’t use BTRFS to create any RAID devices. The underlying storage for the VM already does this.


@rkelleyrtp @fridobox @druggo @datablitz7 Have you thought about using Kubernetes/Docker?

My use case requires persistent system containers rather than application hosting or stateless containers. Pets vs cattle!