I don’t see nixos images listed on https://images.linuxcontainers.org/ anymore. Were they removed for some reason?
Only images that pass our automatic tests get published. Also, all images have a 10-day expiry, after which they are automatically removed from the image server.
NixOS has been failing its tests for the past 10 days apparently.
Here’s the most recent failure: Console - amd64 - Jenkins
Once the NixOS image files downloaded from upstream once again pass the tests, the images will automatically re-appear.
@adamcstephens FYI ^
Thanks for the explanation. Have you considered keeping the last known-good image in a series indefinitely?
We did consider it but decided not to, at least not for now.
As is clear here, we don’t exactly have folks keeping on top of every build, noticing when things have been failing for a few days, and reaching out to people familiar with the distro in question to resolve the issue.
So if images didn’t expire and disappear completely, we might never notice the failures.
Which then gets us to why shipping old images is a problem. Folks generally assume the images are up to date and so don’t apply package updates right after deployment. If we start shipping old images, we will start shipping images with some pretty nasty security issues…
The best path forward would likely be to have a team of folks who more actively monitor, update and fix our images. But it’s the kind of work that very few find interesting, so we mostly deal with drive-by, reactive contributions rather than long-term, continuous contributions from a stable team.
I actually do follow the image-nixos jobs, so if those fail for a couple of days, I dig into them. Unfortunately, the test jobs for all distros are intermixed, and I don’t see a way for Jenkins to give me a filtered RSS feed for them: they’re all named test-image, which makes them too noisy for me to follow. As a result, I wasn’t aware these were failing. Some assistance in improving the visibility here would be great.
I’d also note that only unstable images are affected by this problem, but unfortunately the current setup blocks publishing the working stable images too. It would be very nice if these could be decoupled so that there was granular publishing and (as above) visibility. Nixpkgs unstable is a rolling release and can occasionally have breakages that won’t impact the stable release.
This should be the fix, but it still needs to be reviewed and will then take a few days to reach unstable before any of our images are published again. nixos/channel: fix channel linkage if broken channel link already exists by adamcstephens · Pull Request #513441 · NixOS/nixpkgs · GitHub
@adamcstephens You can send a change to test-image to have it ignore failures on unstable. We have similar rules for a couple of other spots where we have images for pre-release/development distros.
For visibility, yeah, Jenkins doesn’t make that easy for sure…
I have a pretty hacky script which tries to track things down:
#!/usr/bin/python3
import json
from urllib.request import urlopen

from colorama import Fore, Style


def get_json(url, depth=0):
    return json.loads(urlopen("%s/api/json?depth=%d" % (url, depth)).read().decode())


# Load the test results.
results = {}
data = get_json("https://jenkins.linuxcontainers.org/job/test-image", 1)
for build in data['builds']:
    if build['inProgress']:
        status = "RUNNING"
    else:
        status = build['result']

    for action in build['actions']:
        if "parameters" not in action:
            continue
        image_url = action['parameters'][1]['value']
        break

    if image_url not in results:
        results[image_url] = (status, build['url'])

# Go over the image list.
data = get_json("https://jenkins.linuxcontainers.org", 1)
for job in data['jobs']:
    if not job['name'].startswith("image-"):
        continue

    name = job['name']
    url = ""
    status = "UNKNOWN"
    if 'lastSuccessfulBuild' in job and job['lastSuccessfulBuild']:
        data = results.get(job['lastSuccessfulBuild']['url'], None)
        if data:
            status = data[0]
            if status in ("FAILURE", "RUNNING"):
                url = data[1]

    if status == "UNKNOWN":
        status = f"{Fore.CYAN}{status}{Style.RESET_ALL}"
    elif status == "SUCCESS":
        status = f"{Fore.GREEN}{status}{Style.RESET_ALL}"
    elif status == "FAILURE":
        status = f"{Fore.RED}{status}{Style.RESET_ALL}"
    elif status == "RUNNING":
        status = f"{Fore.YELLOW}{status}{Style.RESET_ALL}"

    print(f"{name : <30}{status : <20}{url : <50}")
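If you only care about one distro, the same data can be narrowed down before printing. This is a rough sketch rather than anything we run ourselves: the helper names image_builds and failing are made up for illustration, and it relies on the same assumption as the script above, namely that the second build parameter of a test-image build holds the URL of the image job being tested (so NixOS builds contain "image-nixos").

```python
def image_builds(builds, needle):
    """Yield (status, image_url, build_url) for test-image builds whose
    tested image URL contains `needle` (for example "image-nixos")."""
    for build in builds:
        status = "RUNNING" if build.get("inProgress") else build.get("result")

        image_url = None
        for action in build.get("actions", []):
            params = action.get("parameters")
            if params and len(params) > 1:
                # Assumption carried over from the script above: the second
                # parameter is the URL of the image job under test.
                image_url = params[1]["value"]
                break

        if image_url and needle in image_url:
            yield status, image_url, build["url"]


def failing(builds, needle):
    """Return only the failed builds for the given distro."""
    return [b for b in image_builds(builds, needle) if b[0] == "FAILURE"]
```

With get_json from the script above, something like failing(get_json("https://jenkins.linuxcontainers.org/job/test-image", 1)["builds"], "image-nixos") should give the list of failed NixOS test runs, which could then be dropped into a cron job or a notification of your choice.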
I depend on these nixos images being available (even unstable).
The default incus cache expiration (ttl) is 10 days of inactivity. Does it seem like a good strategy to increase this to something like 60 days to allow for more grace? Is there a better strategy to ensure a known good image is always saved somewhere locally?
Definitely not going to be doing 60 days.
The best bet here is to have the tests ignore failures on the unstable image, though that means potentially broken images will be heading to the image server.
Note that I was referring to ‘my local’ expiration (not the community’s). Does setting my local server configuration to expire images after more than 60 days of inactivity seem reasonable and acceptable? Is there a better way?
The expiry for cached images on your local server works differently. It defaults to 10 days, but that’s not 10 days since the image was created or first downloaded; it’s 10 days since it was last used. So if you keep creating containers from the image, it won’t ever expire.
That said, the image going away from the image server will still prevent you from running a simple incus launch images:nixos/unstable. To specifically use the older locally cached image, you’ll need to look at incus image list and launch based on the fingerprint or attach a local alias to the image.
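To make the fingerprint/alias route concrete, here is a small sketch in Python. The helper names are invented for illustration, and it assumes that the JSON from incus image list --format json carries "os" and "release" keys under each image’s "properties" (worth checking against your own output; the comparison is case-insensitive for that reason). The commands are only built as argument lists, so nothing is executed until you pass them to subprocess.run yourself.

```python
import json
import subprocess


def local_images():
    """Return the local image store as parsed JSON (needs the incus CLI)."""
    out = subprocess.run(
        ["incus", "image", "list", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def find_fingerprints(images, os_name, release):
    """Return fingerprints of images whose properties match os/release.

    Assumption: images expose properties["os"] and properties["release"].
    """
    wanted = (os_name.lower(), release.lower())
    found = []
    for image in images:
        props = image.get("properties") or {}
        key = (str(props.get("os", "")).lower(),
               str(props.get("release", "")).lower())
        if key == wanted:
            found.append(image["fingerprint"])
    return found


def alias_command(fingerprint, alias):
    """Command that attaches a local alias to a cached image."""
    return ["incus", "image", "alias", "create", alias, fingerprint]


def launch_command(source, instance):
    """Command that launches an instance from a fingerprint or local alias."""
    return ["incus", "launch", source, instance]
```

For example, find_fingerprints(local_images(), "nixos", "unstable") should list the cached candidates; after running alias_command(fp, "nixos-unstable-cached"), the instance can be created with launch_command("nixos-unstable-cached", "c1"). The alias and instance names here are placeholders.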
Oh right, I see you already know that since you mention inactivity.
So yeah, bumping it to more than 10 days is perfectly fine. The only downside is more disk usage if you’re using a large number of images very occasionally, but if you only use a few images, then whether they’re set to 10 or 360 days won’t really make any difference.
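As a simplified model of the "days since last use" rule described above (not the actual Incus implementation, and with a made-up function name), the check boils down to something like the following. To my knowledge the server setting involved is images.remote_cache_expiry, so raising it would look like incus config set images.remote_cache_expiry 60.

```python
from datetime import datetime, timedelta


def cached_image_expired(last_used, now, expiry_days=10):
    """Simplified model: a cached image expires once it has gone unused
    for more than `expiry_days`, regardless of when it was downloaded."""
    return now - last_used > timedelta(days=expiry_days)


# An image last used 30 days ago is gone with the default of 10 days,
# but would still be kept with the expiry raised to 60 days.
now = datetime(2025, 1, 31)
print(cached_image_expired(datetime(2025, 1, 1), now))
print(cached_image_expired(datetime(2025, 1, 1), now, expiry_days=60))
```

Every launch from the image resets last_used in this model, which is why an image in regular use never expires.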
Confirmed on all points - thank you!