_stfn
(Stefan)
January 24, 2020, 10:16pm
1
I’ve noticed my load average spikes to ±2.50 when launching a new container. When the container is based on an image I’ve created, the spike appears to be caused by `gzip` and `tar` (I think because the image has to be unpacked?).
Do I need to worry about the high load averages and, if so, do I need to queue the creation of new containers in order to keep them to a minimum?
stgraber
(Stéphane Graber)
January 24, 2020, 10:20pm
2
If using an optimized storage driver (anything but `dir`, effectively), then only the first container created from an image results in an unpack. We also prevent multiple parallel unpacks in that case.
Otherwise, you can try to throttle a bit, but it may be interesting to know where your load comes from.
The load average on Linux is simply the number of processes that are running or waiting to run.
An ideally used system would have a load average identical to the number of CPU cores.
It’s also worth noting that I/O wait causes the load to increase when other processes may not be impacted as they’re not accessing the same storage device.
All that to say, it tends to be better to focus on the actual responsiveness of the system and monitor things like latency and throughput of the services running on the system, rather than looking at a pretty artificial value such as the load average.
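The rule of thumb above (load roughly equal to the core count on an ideally used system) can be checked directly. A minimal Python sketch; the per-core interpretation is mine, not an LXD metric:

```python
import os

# Load average counts processes that are runnable or in uninterruptible
# sleep (e.g. waiting on I/O), averaged over 1, 5 and 15 minutes.
load1, load5, load15 = os.getloadavg()
cores = os.cpu_count()

# Per-core load around 1.0 means the CPUs are fully but not over-used;
# a brief spike during a one-off task such as an image unpack is
# usually harmless if it settles back down quickly.
print(f"1-min load: {load1:.2f} over {cores} core(s) "
      f"= {load1 / cores:.2f} per core")
```

On a one-core machine, a 1-minute load of 2.50 would print as 2.50 per core, i.e. more work queued than the CPU can run at once, which matches the spikes described above.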
_stfn
(Stefan)
January 27, 2020, 8:42am
3
Thanks for the explanation!
I’m using ZFS as the storage driver, but I still see the `gzip` and `tar` processes come up every time I launch a container based on my custom image. So far it still looks like those processes are causing the high load average. Or are those processes needed during container creation, even if the base image was already unpacked?
Also, the node on which I’m testing has just one core, so the average of ± 2.50 is too high. But like you said, I shouldn’t focus primarily on the load average. It just felt a bit scary seeing it go up so much during the creation of a new container.
stgraber
(Stéphane Graber)
January 27, 2020, 9:14am
4
Could you run `lxc monitor --type=logging --pretty` while creating a container from that image?
It should show what’s going on as far as image re-use and the like.
_stfn
(Stefan)
January 27, 2020, 9:22am
5
This caught my attention while creating a container from my image:
DBUG[01-27|09:16:59] Filesystem of pool has changed since cached image volume created, regenerating image volume driver=zfs fingerprint=*** pool=default
DBUG[01-27|09:16:59] DeleteImage started fingerprint=*** pool=default driver=zfs
DBUG[01-27|09:17:00] DeleteImage finished driver=zfs fingerprint=*** pool=default
Could this cause LXD to unpack the image every time a container is created?
stgraber
(Stéphane Graber)
January 27, 2020, 9:32am
6
Yes, that’s definitely an issue. Could you show the full debug log so we have a bit more context on what led you to that part?
@tomp implemented that logic, he may be able to quickly track it down.
_stfn
(Stefan)
January 27, 2020, 9:43am
8
Here you go. Thanks in advance.
DBUG[01-27|09:41:06] New event listener: 00176d84-9096-4ad6-be08-14d3cb19d107
DBUG[01-27|09:41:15] Handling ip=@ method=GET url=/1.0 user=
DBUG[01-27|09:41:15] Handling ip=@ method=GET url=/1.0/images/aliases/base user=
DBUG[01-27|09:41:15] Handling method=GET url=/1.0/images/65fb19045ca25b3bebe02763dbe55af55e56564c07b66927939e20a5123323f2 user= ip=@
DBUG[01-27|09:41:15] Handling user= ip=@ method=GET url=/1.0/events
DBUG[01-27|09:41:15] New event listener: 902dc450-d8bf-4a4b-af7c-35a18d5f3a13
DBUG[01-27|09:41:15] Handling ip=@ method=POST url=/1.0/instances user=
DBUG[01-27|09:41:15] Responding to container create
DBUG[01-27|09:41:15] New task Operation: 0cc16a88-af54-4bcb-8b9a-2cf76e403d04
DBUG[01-27|09:41:15] Started task operation: 0cc16a88-af54-4bcb-8b9a-2cf76e403d04
DBUG[01-27|09:41:15] Handling ip=@ method=GET url=/1.0/operations/0cc16a88-af54-4bcb-8b9a-2cf76e403d04 user=
INFO[01-27|09:41:15] Creating container ephemeral=false name=test-container project=default
INFO[01-27|09:41:15] Created container ephemeral=false name=test-container project=default
DBUG[01-27|09:41:15] CreateInstanceFromImage started driver=zfs instance=test-container pool=default project=default
DBUG[01-27|09:41:15] EnsureImage started driver=zfs fingerprint=65fb19045ca25b3bebe02763dbe55af55e56564c07b66927939e20a5123323f2 pool=default
DBUG[01-27|09:41:15] Filesystem of pool has changed since cached image volume created, regenerating image volume pool=default driver=zfs fingerprint=65fb19045ca25b3bebe02763dbe55af55e56564c07b66927939e20a5123323f2
DBUG[01-27|09:41:15] DeleteImage started fingerprint=65fb19045ca25b3bebe02763dbe55af55e56564c07b66927939e20a5123323f2 pool=default driver=zfs
DBUG[01-27|09:41:15] DeleteImage finished driver=zfs fingerprint=65fb19045ca25b3bebe02763dbe55af55e56564c07b66927939e20a5123323f2 pool=default
DBUG[01-27|09:41:15] EnsureImage finished driver=zfs fingerprint=65fb19045ca25b3bebe02763dbe55af55e56564c07b66927939e20a5123323f2 pool=default
DBUG[01-27|09:41:16] CreateInstanceFromImage finished driver=zfs instance=test-container pool=default project=default
DBUG[01-27|09:41:16] UpdateInstanceBackupFile started driver=zfs instance=test-container pool=default project=default
DBUG[01-27|09:41:16] UpdateInstanceBackupFile finished project=default driver=zfs instance=test-container pool=default
DBUG[01-27|09:41:16] Success for task operation: 0cc16a88-af54-4bcb-8b9a-2cf76e403d04
DBUG[01-27|09:41:16] Handling url=/1.0/instances/test-container user= ip=@ method=GET
DBUG[01-27|09:41:16] Handling ip=@ method=PUT url=/1.0/instances/test-container/state user=
DBUG[01-27|09:41:16] New task Operation: d366f109-f0a8-4ca4-99b8-bb6825693b27
DBUG[01-27|09:41:16] Started task operation: d366f109-f0a8-4ca4-99b8-bb6825693b27
DBUG[01-27|09:41:16] Container idmap changed, remapping
DBUG[01-27|09:41:16] Updated metadata for task Operation: d366f109-f0a8-4ca4-99b8-bb6825693b27
DBUG[01-27|09:41:16] MountInstance started driver=zfs instance=test-container pool=default project=default
DBUG[01-27|09:41:16] Handling ip=@ method=GET url=/1.0/operations/d366f109-f0a8-4ca4-99b8-bb6825693b27 user=
DBUG[01-27|09:41:16] MountInstance finished pool=default project=default driver=zfs instance=test-container
DBUG[01-27|09:41:22] Updated metadata for task Operation: d366f109-f0a8-4ca4-99b8-bb6825693b27
DBUG[01-27|09:41:22] Scheduler: network: vetha62625e0 has been added: updating network priorities
DBUG[01-27|09:41:22] Scheduler: network: veth81c5965e has been added: updating network priorities
DBUG[01-27|09:41:22] MountInstance started driver=zfs instance=test-container pool=default project=default
DBUG[01-27|09:41:22] MountInstance finished pool=default project=default driver=zfs instance=test-container
DBUG[01-27|09:41:22] UpdateInstanceBackupFile started driver=zfs instance=test-container pool=default project=default
DBUG[01-27|09:41:22] UpdateInstanceBackupFile finished driver=zfs instance=test-container pool=default project=default
DBUG[01-27|09:41:22] MountInstance started driver=zfs instance=test-container pool=default project=default
DBUG[01-27|09:41:22] MountInstance finished driver=zfs instance=test-container pool=default project=default
INFO[01-27|09:41:22] Starting container action=start created="2020-01-27 09:41:15.369792115 +0000 UTC" ephemeral=false name=test-container project=default stateful=false used="1970-01-01 00:00:00 +0000 UTC"
DBUG[01-27|09:41:22] Handling user= ip=@ method=GET url=/internal/containers/66/onstart
DBUG[01-27|09:41:22] MountInstance started pool=default project=default driver=zfs instance=test-container
DBUG[01-27|09:41:22] MountInstance finished driver=zfs instance=test-container pool=default project=default
DBUG[01-27|09:41:23] Scheduler: container test-container started: re-balancing
INFO[01-27|09:41:23] Started container name=test-container project=default stateful=false used="1970-01-01 00:00:00 +0000 UTC" action=start created="2020-01-27 09:41:15.369792115 +0000 UTC" ephemeral=false
DBUG[01-27|09:41:23] Success for task operation: d366f109-f0a8-4ca4-99b8-bb6825693b27
DBUG[01-27|09:41:23] Event listener finished: 902dc450-d8bf-4a4b-af7c-35a18d5f3a13
DBUG[01-27|09:41:23] Disconnected event listener: 902dc450-d8bf-4a4b-af7c-35a18d5f3a13
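For longer captures, lines in this pretty logfmt-style layout can be filtered programmatically, e.g. to pull out just the `EnsureImage`/`DeleteImage` events. A small sketch; the parser is my own illustration, not an LXD tool, and the fingerprint is truncated:

```python
import re

def parse_log_line(line):
    """Split a pretty-printed LXD debug line into level, time, message
    and key=value fields. Returns None for lines that don't match."""
    m = re.match(r'(\w+)\[([^\]]+)\]\s+(.*)', line)
    if not m:
        return None
    level, ts, rest = m.groups()
    fields = dict(re.findall(r'(\w+)=(\S+)', rest))
    # The free-text message is whatever remains once the key=value
    # pairs are stripped out.
    msg = re.sub(r'\s*\w+=\S+', '', rest).strip()
    return {"level": level, "time": ts, "msg": msg, **fields}

line = ('DBUG[01-27|09:41:15] DeleteImage started '
        'fingerprint=65fb19045ca2 pool=default driver=zfs')
event = parse_log_line(line)
print(event["msg"], event["driver"])  # -> DeleteImage started zfs
```

Feeding the whole capture through this and keeping only events whose message mentions the image volume makes the regeneration loop above easy to spot.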
_stfn
(Stefan)
January 28, 2020, 8:56am
10
Awesome! Thanks for resolving this so quickly. Keep up the amazing work.
Will this fix be included in the next release or can I get it sooner?
tomp
(Thomas Parrott)
January 28, 2020, 10:02am
11
I believe the next release is due at the end of day tomorrow (UTC), but @stgraber can confirm.
stgraber
(Stéphane Graber)
January 28, 2020, 9:02pm
12
Depending on what we still have pending, I’m looking at either late tomorrow or reasonably early Thursday. In any case, I want it out before I fly to Brussels for FOSDEM.