LXD image servers

Introduction

Some of you may have noticed some bumpiness with our UK and US mirrors over the past few weeks/months. This was caused by a mix of hardware failures and network issues on those mirror servers, which are operated by Canonical Ltd.

We have now designed a new, simpler, less resource-intensive and much more reliable way to handle our regional mirrors.

Unfortunately, due to other priority work, the Canonical IS team has indicated that they are unlikely to have time to change their side of the infrastructure over the coming months.

As a result, we will be phasing out the mirror servers operated by Canonical Ltd. Initially, we will serve the traffic directly from our main servers located in Canada. We are hoping that some of our community members will lend us a hand and offer to operate regional servers to improve latency for our users.

Infrastructure

The updated image distribution infrastructure looks like this:

  • Image builders generating our images daily (by Canonical Ltd. and @stgraber)
  • Test servers validating a subset of our images for quality (by @stgraber)
  • Image signing and publishing server (by @stgraber)
  • Cluster of 3 fully redundant web servers serving the original copy of our images (by @stgraber)

Those main web servers run in a datacenter in Canada with a variety of Tier 1 transit providers (Zayo, HE, Cogent, Arelion) and 20Gbps of peak transit capacity.

However good and diversified your set of transit providers may be, you can’t beat physics. To provide a great experience to our worldwide user base, we need servers closer to our users, with at minimum some capacity in Europe and Asia.

The main servers in Canada do GeoIP lookups on all client requests and can then dispatch users to a server closer to them when available.

While our servers have very fast connectivity, the total daily throughput of our mirrors is pretty low at somewhere between 300Mbps and 500Mbps for the total worldwide traffic. The faster connectivity is mostly useful to handle spikes in demand and to always be able to provide a download as fast as a client’s own internet connection.
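For a rough sense of scale, here is a back-of-the-envelope conversion of those rates into data moved per day. It assumes the 300Mbps to 500Mbps figures are sustained averages over a full day, which the numbers above don't state explicitly, and `mbps_to_gb_per_day` is a made-up helper name used only for this illustration.

```shell
#!/bin/sh
# Convert an average rate in Mbps into decimal gigabytes transferred per day.
# mbps_to_gb_per_day is a hypothetical helper, not part of any LXD tooling.
mbps_to_gb_per_day() {
    # Mbps * 86400 s/day = megabits per day; / 8 = megabytes; / 1000 = gigabytes
    echo $(( $1 * 86400 / 8 / 1000 ))
}

mbps_to_gb_per_day 300   # prints 3240
mbps_to_gb_per_day 500   # prints 5400
```

Under that assumption, the worldwide traffic is roughly 3 to 5.5TB per day.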

Security

Before going into more detail about running regional mirrors, we need to touch on how any of this can be safe. After all, we never want one of our users to download an altered, compromised image.

To avoid such issues, the main servers that we operate will ALWAYS be the ones serving the index files, which LXD fetches over HTTPS. This means LXD can trust the index files it downloaded. Downloading the actual image artifacts is where regional mirrors come into play: LXD will download the artifacts from the regional mirror over HTTP (with fallback on HTTPS), validating the SHA256 hash of every downloaded file against the hash contained in the index file.

Any attempt at altering an image file will therefore cause a download validation failure when LXD checks the downloaded hash against the expected hash from the trusted index file.
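To make the idea concrete, here is a minimal sketch of that kind of check done by hand. LXD performs the validation internally, so this is only an illustration: `verify_sha256` is a hypothetical helper, and the expected hash is assumed to have been copied from the trusted index file.

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_HEX
# Returns 0 when the SHA256 of FILE equals EXPECTED_HEX, non-zero otherwise.
# Hypothetical helper that mimics the check; it is not LXD's own code.
verify_sha256() {
    # sha256sum prints "<hex digest>  <file name>"; keep only the digest.
    actual=$(sha256sum "$1" | cut -d' ' -f1)
    [ "$actual" = "$2" ]
}

# Usage (file name and hash are placeholders):
#   verify_sha256 rootfs.squashfs "<sha256 from the index>" || echo "hash mismatch"
```

A mirror that returns a modified file fails this comparison, which is why a misbehaving mirror can break downloads but cannot get an altered image accepted.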

This effectively means that while we definitely want our mirrors to be functional and well maintained, we don’t have to strictly trust their operators as the worst that can happen is a denial of service by returning broken files to LXD. Should this happen, we’d simply change the GeoIP rules in the main servers and stop sending users to the affected server.

Running a mirror

Running a mirror for internal company or personal use is effectively the same as running a public regional mirror, so you can follow the same instructions and just skip the last step.

To do so, we’ve come up with a pretty simple nginx configuration file you can use.
This has been tested on Ubuntu 22.04 LTS though it should work just as well on other platforms.

If you intend to become a public regional mirror, you’ll need the following:

  • At least 1Gbps of symmetric internet access
  • IPv4 and IPv6 connectivity
  • A DNS record pointing to both the IPv4 and IPv6 address of your server
  • Let’s Encrypt or another valid TLS certificate for your DNS record
  • 100GB of spare disk space
  • System TCP congestion control set to bbr (net.ipv4.tcp_congestion_control=bbr)
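For the last item in the list, here is one way the setting could be applied on a typical Linux system. This is a sketch under assumptions: the file name `90-tcp-bbr.conf` is an arbitrary choice, `bbr_sysctl_conf` is a hypothetical helper, and your kernel needs to have BBR available.

```shell
#!/bin/sh
# Print the sysctl setting named in the requirements list.
# bbr_sysctl_conf is a hypothetical helper used only for this example.
bbr_sysctl_conf() {
    echo "net.ipv4.tcp_congestion_control=bbr"
}

# To apply it persistently (needs root; the file name is an arbitrary choice):
#   bbr_sysctl_conf | sudo tee /etc/sysctl.d/90-tcp-bbr.conf
#   sudo sysctl --system
# To confirm the running value afterwards:
#   sysctl net.ipv4.tcp_congestion_control
```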

The nginx config is as follows:

# Set up a local cache of up to 100GB; entries not accessed for 3 days are removed.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=STATIC:10m inactive=3d max_size=100g;

server {
    # Listeners
    listen 80;
    listen 443 ssl;
    listen [::]:80 ipv6only=on;
    listen [::]:443 ssl ipv6only=on;
    server_name lxd-images.example.net;

    # Enable HTTPS with Let's Encrypt
    ssl_certificate /etc/letsencrypt/live/lxd-images.example.net/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/lxd-images.example.net/privkey.pem;
    include /etc/letsencrypt/options-ssl-nginx.conf;
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;

    # Disable all client logging
    access_log off;

    # Default to bouncing everything back to the original server.
    location / {
        return 302 "$scheme://images.linuxcontainers.org$request_uri";
    }

    # For the images themselves, serve directly from the local cache.
    location /images {
        proxy_pass                      https://images.linuxcontainers.org/images;
        proxy_cache                     STATIC;
        proxy_cache_key                 $proxy_host$request_uri;
        proxy_cache_lock                on;
        proxy_cache_valid               200 3d;
        proxy_cache_valid               301 302 1m;
        proxy_cache_valid               404 5m;
        proxy_http_version              1.1;
        proxy_max_temp_file_size        2048m;
        proxy_set_header                "Connection" "";
        proxy_set_header                "X-LXC-No-GeoIP" "on";
        proxy_ssl_protocols             TLSv1.3;
        proxy_ssl_server_name           on;
        proxy_ssl_trusted_certificate   /etc/ssl/certs/ca-certificates.crt;
        proxy_ssl_verify                on;
    }
}

This configuration is for a virtual host named lxd-images.example.net. It is configured to operate as a caching reverse proxy for the LXD image artifacts, sending everything else back to the main servers. The cache is configured to not exceed 100GB and to cache files for up to 3 days (which is the maximum lifetime of a file in our case).

The benefit of this caching reverse proxy approach is that only the files that are being accessed will be downloaded to your server. Images that nobody in your region uses will simply never be downloaded, saving a whole bunch of space.
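Once nginx is running with this configuration, a quick way to sanity-check the "bounce everything else back" behavior is to look at the redirect returned for a non-image path. The sketch below assumes curl and awk are installed and uses the example host name from the config; `redirect_target` is a hypothetical helper.

```shell
#!/bin/sh
# Extract the Location header value from raw HTTP response headers on stdin.
# redirect_target is a hypothetical helper used only for this example.
redirect_target() {
    # Strip carriage returns first, since HTTP header lines end in CRLF.
    tr -d '\r' | awk 'tolower($1) == "location:" { print $2 }'
}

# Usage against the example virtual host (requires network access):
#   curl -sI http://lxd-images.example.net/ | redirect_target
# With the config above this should print a URL on images.linuxcontainers.org.
```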

To use your image server directly, you can use:

lxc remote add my-images https://lxd-images.example.net --protocol=simplestreams --public

And then launch an instance using it with:

lxc launch my-images:ubuntu/22.04 foo

Then if you want us to send you some traffic, send a private message to @stgraber with:

  • Who you are (who to credit)
  • Where your server is
  • What kind of connectivity you have
  • What’s the DNS record for your server

We’ll then do some tests to make sure connectivity is working as expected and will then add it to the rotation on our side. You can ask us to remove you from GeoIP at any point and should also notify us of any planned maintenance lasting more than a couple of minutes so we can send the traffic elsewhere during that time.

Regional mirrors

Here is the list of currently operating mirrors:

Server                      | Location         | Speed   | Operator       | Notes
images.linuxcontainers.org  | Montreal, Canada | 20Gbps  | @stgraber      | Primary servers
us.lxd.images.canonical.com | Boston, USA      | 100Gbps | Canonical Ltd. | Being phased out
uk.lxd.images.canonical.com | London, UK       | 100Gbps | Canonical Ltd. | Being phased out

Please note that we NEVER recommend directly connecting to a particular mirror.
Our users should always interact with https://images.linuxcontainers.org which will then redirect to a regional mirror if applicable. Mirrors can get decommissioned or change addresses at any time.
