How to implement smarter container provisioning for LXD Cluster?

Hi guys, any recommendations or starting points on how to extend the current round-robin deployment order on a cluster?

I would like containers to be launched on the node with the lowest load instead of round-robin.

LXD doesn’t provide any CPU load average information, so you would need to pull that data in yourself and create containers based on your own metrics and judgement. (LXCFS can now provide per-container load averages.)
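As a starting point, the placement logic can be as simple as picking the least-loaded member and passing it to `lxc launch --target`. A minimal sketch; the node names are examples, and how you actually collect the load figures (SSH, a metrics agent, LXCFS) is up to you:

```python
# Hypothetical sketch: pick the least-loaded cluster member and build the
# corresponding `lxc launch` command.

def pick_least_loaded(loads):
    """loads: dict mapping cluster member name -> 5-minute load average."""
    return min(loads, key=loads.get)

def launch_command(image, name, loads):
    # `lxc launch --target <member>` pins the new container to one member.
    target = pick_least_loaded(loads)
    return ["lxc", "launch", image, name, "--target", target]
```

You would run the returned command with `subprocess.run`; `--target` is the standard `lxc launch` flag for choosing a cluster member.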

From the LXD side it’s probably much easier to do a round-robin approach, as CPU load averages can change pretty quickly (I know you have 1/5/15-minute load averages, but that doesn’t mean loads don’t spike), so it can be hard to analyse and judge where it is best to deploy a container.

For example, if I have an LXD host dedicated to running cron jobs once an hour and I looked at the load averages, they would be quite low, so I’d think “great” and deploy a web server to it. Then once an hour, when the cron jobs cause a spike, it is possible the web server could drastically slow down or become non-responsive (a pretty unlikely scenario, but you get the idea).

As you move forward with this idea, you will eventually need a bunch of metrics to identify the correct placement of containers. For example, you may want to ensure two similar containers are not on the same site. In addition, you will want to track CPU, RAM, swap, disk I/O, and network stats.
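To make that concrete, placement reduces to filtering out ineligible members (anti-affinity for similar containers) and scoring the rest across several metrics. A rough sketch; the metric names, weights, and data layout are invented for illustration:

```python
def score(member):
    # Lower is better. Weights are arbitrary illustrative values; each
    # metric is assumed to be normalised to the 0..1 range beforehand.
    return 0.5 * member["cpu"] + 0.3 * member["ram"] + 0.2 * member["io"]

def place(role, members):
    # Anti-affinity: skip members already running a container of this role.
    eligible = [m for m in members if role not in m["roles"]]
    if not eligible:
        raise RuntimeError("no eligible cluster member for role " + role)
    return min(eligible, key=score)["name"]
```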

Something I have done is use a Redis DB to track these stats on all LXD servers. I find this easier than managing a “real” DB to keep the data. Plus, you can easily set up Redis on all your LXD servers and have a highly available instance across multiple servers.

As for collecting the data, you can either script your own tools to get the data or use something like “collectd” to grab them. Once you get the raw data, simply add those items to the Redis DB with an expiry timer.
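For the Redis side, writing each sample with a TTL (e.g. `SETEX`) means stale servers age out of the dataset automatically. A sketch assuming the `redis-py` client; the host name and key layout are made up:

```python
import json
import time

def stats_record(host, cpu_load, ram_used):
    # One key per LXD server; the value is a JSON blob of the latest sample.
    key = f"lxd:stats:{host}"
    value = json.dumps({"cpu": cpu_load, "ram": ram_used, "ts": int(time.time())})
    return key, value

# With redis-py (pip install redis), the write side would look like:
#   import redis
#   r = redis.Redis(host="redis.example.com")
#   key, value = stats_record("lxd01", 0.42, 0.61)
#   r.setex(key, 120, value)  # sample expires after 120 s
```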

My $0.02

Thank you @turtle0x1, you opened my eyes: load now (or over the last 15 minutes) ≠ load the next hour, or the next day, etc…

The scenario I’m playing with right now is a service running hundreds of small websites (one container per site) from around the world, so indeed loads could change a lot at different hours of the day due to the different time zones involved.

Your answer made me think a lot about how these variables change over time. Maybe it is best to hard-code a set of rules to try to balance loads based on the users’ timezone + type of load + container size. That would also allow building a map of “overlapping loads”, and resource provisioning that takes those factors and variables into account could potentially give me higher density per node.
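One way to sketch those hard-coded rules is a toy peak-hours table per audience region, so the scheduler can avoid co-locating sites whose peaks overlap. The regions and hour windows below are invented for illustration:

```python
# Hypothetical peak traffic windows (UTC hours) per audience region.
PEAK_HOURS = {
    "EU":   set(range(8, 18)),
    "US":   set(range(14, 24)),
    "APAC": set(range(0, 10)),
}

def peaks_overlap(region_a, region_b):
    # Two sites "collide" if their audiences are busy at the same hours,
    # so they should preferably land on different nodes.
    return bool(PEAK_HOURS[region_a] & PEAK_HOURS[region_b])
```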

Thank you for the Redis tip @rkelleyrtp. Do you think Kafka would do the job as well? I am reading that it can write data to disk directly and efficiently and uses barely any RAM (which is pretty scarce in my case, while I barely use disk I/O).

Hi @Yosu_Cadilla,

I don’t know about Kafka, but I do know Redis is very fast with a small footprint. We have been using it reliably for two years to track our container activity.

Also, I manage 16 LXD servers that run multiple websites, as you described to @turtle0x1 above, using Debian containers with NGINX and PHP 7.2. All totalled, I think we are over 770 websites spread across the LXD servers. For this application workload, LXD works great! Let me know if you want to discuss the particulars of how we set up our websites and LXD servers.


Thanks for the offer @rkelleyrtp, I would love to know more about how you set up your websites and LXD servers.