High load when launching container

_stfn · January 24, 2020, 10:16pm

I’ve noticed my load averages spikes to ± 2.50 when launching a new container. When the container is based on an image I’ve created, it looks like the load spike is caused by gzip and tar (I think because the image has to be unpacked?).

Do I need to worry about the high load averages and, if so, do I need to queue the creation of new containers in order to keep them to a minimum?

stgraber · January 24, 2020, 10:20pm

If using an optimized storage driver (anything but dir effectively), then only the first container created from an image results in an unpack. We also prevent multiple unpacks in parallel in that case.

Otherwise, you can try to throttle a bit, but it may be interesting to know where your load comes from.
The load average on Linux is simply the number of processes running or waiting to be running.
An ideally used system would have a load average identical to the number of CPU cores.

It’s also worth noting that I/O wait causes the load to increase when other processes may not be impacted as they’re not accessing the same storage device.

All that to say, it tends to be better to focus on the actual responsiveness of the system and monitor things like latency and throughput of the services running on the system, rather than looking at a pretty artificial value such as the load average.

_stfn · January 27, 2020, 8:42am

Thanks for the explanation!

I’m using ZFS as the storage driver, but I still see the gzip and tar processes come up every time I launch a container based on my custom image. So far it still looks like those processes are causing the high load average. Or are those processes needed during container creation, even if the base image was already unpacked?

Also, the node on which I’m testing has just one core, so the average of ± 2.50 is too high. But like you said, I shouldn’t focus primarily on the load average. It just felt a bit scary seeing it go up so much during the creation of a new container.

stgraber · January 27, 2020, 9:14am

Could you run lxc monitor --type=logging --pretty while creating a container from that image?
It should show what’s going on as far as image re-use and the like.

_stfn · January 27, 2020, 9:22am

This caught my attention while creating a container from my image:

DBUG[01-27|09:16:59] Filesystem of pool has changed since cached image volume created, regenerating image volume driver=zfs fingerprint=*** pool=default
DBUG[01-27|09:16:59] DeleteImage started                      fingerprint=*** pool=default driver=zfs
DBUG[01-27|09:17:00] DeleteImage finished                     driver=zfs fingerprint=*** pool=default

Could this cause LXD to unpack the image every time a container is created?

stgraber · January 27, 2020, 9:32am

Yes, that’s definitely an issue. Could you show the full debug log so we have a bit more context on what led you to that part?

@tomp implemented that logic, he may be able to quickly track it down.

tomp · January 27, 2020, 9:34am

Yep will take a look.

_stfn · January 27, 2020, 9:43am

Here you go. Thanks in advance.

DBUG[01-27|09:41:06] New event listener: 00176d84-9096-4ad6-be08-14d3cb19d107 
DBUG[01-27|09:41:15] Handling                                 ip=@ method=GET url=/1.0 user=
DBUG[01-27|09:41:15] Handling                                 ip=@ method=GET url=/1.0/images/aliases/base user=
DBUG[01-27|09:41:15] Handling                                 method=GET url=/1.0/images/65fb19045ca25b3bebe02763dbe55af55e56564c07b66927939e20a5123323f2 user= ip=@
DBUG[01-27|09:41:15] Handling                                 user= ip=@ method=GET url=/1.0/events
DBUG[01-27|09:41:15] New event listener: 902dc450-d8bf-4a4b-af7c-35a18d5f3a13 
DBUG[01-27|09:41:15] Handling                                 ip=@ method=POST url=/1.0/instances user=
DBUG[01-27|09:41:15] Responding to container create 
DBUG[01-27|09:41:15] New task Operation: 0cc16a88-af54-4bcb-8b9a-2cf76e403d04 
DBUG[01-27|09:41:15] Started task operation: 0cc16a88-af54-4bcb-8b9a-2cf76e403d04 
DBUG[01-27|09:41:15] Handling                                 ip=@ method=GET url=/1.0/operations/0cc16a88-af54-4bcb-8b9a-2cf76e403d04 user=
INFO[01-27|09:41:15] Creating container                       ephemeral=false name=test-container project=default
INFO[01-27|09:41:15] Created container                        ephemeral=false name=test-container project=default
DBUG[01-27|09:41:15] CreateInstanceFromImage started          driver=zfs instance=test-container pool=default project=default
DBUG[01-27|09:41:15] EnsureImage started                      driver=zfs fingerprint=65fb19045ca25b3bebe02763dbe55af55e56564c07b66927939e20a5123323f2 pool=default
DBUG[01-27|09:41:15] Filesystem of pool has changed since cached image volume created, regenerating image volume pool=default driver=zfs fingerprint=65fb19045ca25b3bebe02763dbe55af55e56564c07b66927939e20a5123323f2
DBUG[01-27|09:41:15] DeleteImage started                      fingerprint=65fb19045ca25b3bebe02763dbe55af55e56564c07b66927939e20a5123323f2 pool=default driver=zfs
DBUG[01-27|09:41:15] DeleteImage finished                     driver=zfs fingerprint=65fb19045ca25b3bebe02763dbe55af55e56564c07b66927939e20a5123323f2 pool=default
DBUG[01-27|09:41:15] EnsureImage finished                     driver=zfs fingerprint=65fb19045ca25b3bebe02763dbe55af55e56564c07b66927939e20a5123323f2 pool=default
DBUG[01-27|09:41:16] CreateInstanceFromImage finished         driver=zfs instance=test-container pool=default project=default
DBUG[01-27|09:41:16] UpdateInstanceBackupFile started         driver=zfs instance=test-container pool=default project=default
DBUG[01-27|09:41:16] UpdateInstanceBackupFile finished        project=default driver=zfs instance=test-container pool=default
DBUG[01-27|09:41:16] Success for task operation: 0cc16a88-af54-4bcb-8b9a-2cf76e403d04 
DBUG[01-27|09:41:16] Handling                                 url=/1.0/instances/test-container user= ip=@ method=GET
DBUG[01-27|09:41:16] Handling                                 ip=@ method=PUT url=/1.0/instances/test-container/state user=
DBUG[01-27|09:41:16] New task Operation: d366f109-f0a8-4ca4-99b8-bb6825693b27 
DBUG[01-27|09:41:16] Started task operation: d366f109-f0a8-4ca4-99b8-bb6825693b27 
DBUG[01-27|09:41:16] Container idmap changed, remapping 
DBUG[01-27|09:41:16] Updated metadata for task Operation: d366f109-f0a8-4ca4-99b8-bb6825693b27 
DBUG[01-27|09:41:16] MountInstance started                    driver=zfs instance=test-container pool=default project=default
DBUG[01-27|09:41:16] Handling                                 ip=@ method=GET url=/1.0/operations/d366f109-f0a8-4ca4-99b8-bb6825693b27 user=
DBUG[01-27|09:41:16] MountInstance finished                   pool=default project=default driver=zfs instance=test-container
DBUG[01-27|09:41:22] Updated metadata for task Operation: d366f109-f0a8-4ca4-99b8-bb6825693b27 
DBUG[01-27|09:41:22] Scheduler: network: vetha62625e0 has been added: updating network priorities 
DBUG[01-27|09:41:22] Scheduler: network: veth81c5965e has been added: updating network priorities 
DBUG[01-27|09:41:22] MountInstance started                    driver=zfs instance=test-container pool=default project=default
DBUG[01-27|09:41:22] MountInstance finished                   pool=default project=default driver=zfs instance=test-container
DBUG[01-27|09:41:22] UpdateInstanceBackupFile started         driver=zfs instance=test-container pool=default project=default
DBUG[01-27|09:41:22] UpdateInstanceBackupFile finished        driver=zfs instance=test-container pool=default project=default
DBUG[01-27|09:41:22] MountInstance started                    driver=zfs instance=test-container pool=default project=default
DBUG[01-27|09:41:22] MountInstance finished                   driver=zfs instance=test-container pool=default project=default
INFO[01-27|09:41:22] Starting container                       action=start created="2020-01-27 09:41:15.369792115 +0000 UTC" ephemeral=false name=test-container project=default stateful=false used="1970-01-01 00:00:00 +0000 UTC"
DBUG[01-27|09:41:22] Handling                                 user= ip=@ method=GET url=/internal/containers/66/onstart
DBUG[01-27|09:41:22] MountInstance started                    pool=default project=default driver=zfs instance=test-container
DBUG[01-27|09:41:22] MountInstance finished                   driver=zfs instance=test-container pool=default project=default
DBUG[01-27|09:41:23] Scheduler: container test-container started: re-balancing 
INFO[01-27|09:41:23] Started container                        name=test-container project=default stateful=false used="1970-01-01 00:00:00 +0000 UTC" action=start created="2020-01-27 09:41:15.369792115 +0000 UTC" ephemeral=false
DBUG[01-27|09:41:23] Success for task operation: d366f109-f0a8-4ca4-99b8-bb6825693b27 
DBUG[01-27|09:41:23] Event listener finished: 902dc450-d8bf-4a4b-af7c-35a18d5f3a13 
DBUG[01-27|09:41:23] Disconnected event listener: 902dc450-d8bf-4a4b-af7c-35a18d5f3a13

tomp · January 27, 2020, 10:54am

This should fix it:

_stfn · January 28, 2020, 8:56am

Awesome! Thanks for resolving this so quickly. Keep up the amazing work.

Will this fix be included in the next release or can I get it sooner?

tomp · January 28, 2020, 10:02am

I believe the next release is due at the end of day tomorrow (UTC), but @stgraber can confirm.

stgraber · January 28, 2020, 9:02pm

Depending on what we still have pending, I’m looking at either late tomorrow or reasonably early Thursday. In any case, I want it out before I fly to Brussels for FOSDEM