FYI
Launching ubuntu24
Error: Failed instance creation: Unable to fetch https://images.linuxcontainers.org/images/ubuntu/noble/amd64/default/20250615_07:42/incus.tar.xz: 503 Service Unavailable
Thanks to @stgraber - everything appears to be fully restored now.
Appreciate the quick resolution!
It’s back online now. We have pretty good monitoring on all those services but it sometimes takes a little while to get to the bottom of what’s going on and get things back online…
In this instance, the primary cluster serving images.linuxcontainers.org ran into a Ceph failure on one of the 3 servers. Rather than just failing (which would be fine), this caused all I/O to the shared filesystem to hang. The trigger appears to have been a kernel bug somewhere in the CephFS driver: we can see a kernel stack trace several hours before things went really bad, with the system load climbing slowly from that point on.
There also appears to have been some abuse of the rsync server, which probably exacerbated the issue by hammering the already struggling shared storage.
In any case, things are back online now, except for the rsync server, which is being kept offline on purpose and is headed for decommissioning.
We changed the way we recommend folks run mirrors well over a year ago and the current way doesn’t use rsync at all, so hopefully we can just get rid of that and avoid bots/attack scripts flooding it.