Snap refresh has caused lots of issues - LXD on Ubuntu 18.10

We are running into lots of issues with the automated snap update process on our Ubuntu 18.10 servers. Over the past few days, all of our container servers were auto-upgraded from LXD 3.14 to 3.15, which has caused loads of problems.

Specifically:

  • The snap refresh command hangs due to an NFS mount point from our backup servers. As a result, the refresh command fails and the “lxc” command gets removed. The fix is to forcefully unmount the NFS drive, which clears up the refresh command. The problem is that all our backups then fail.

  • A second, more serious issue is that the “snap refresh” command seems to have unmounted the /proc filesystem on our containers, and we have to restart them. When this issue occurs, the “free -m” command fails.

Furthermore, it seems there is NO WAY to disable the snap update process. There is a 2-year-long thread about it (https://forum.snapcraft.io/t/disabling-automatic-refresh-for-snap-from-store/707/243). The only way around this is to add a couple of firewall rules that drop all connections to the snap store: “sudo iptables -A OUTPUT -d search.apps.ubuntu.com -j DROP” and “sudo iptables -A OUTPUT -d api.snapcraft.io -j DROP”

Given these are production servers, we take any outage very seriously, and the snap tool has totally destroyed our uptime numbers for our clients. I cannot believe there is no way to disable the snap update command. This is totally unacceptable.

Is there a non-snap version of LXD we can install via a separate repository? Are there any other workarounds we can use to keep the snap update from occurring automatically?

The /proc issue is an LXCFS 3.1.1 bug, not snap specific. You can read more in the many other forum posts about it or in the GitHub issue here: https://github.com/lxc/lxcfs/issues/295

The NFS issue is interesting, do you have any idea why it’s hanging?
Is it because the backup server itself isn’t responsive at the time, so any filesystem access under its mountpoint is held by the kernel or does it look like something else?

As for controlling refresh, you can configure a time window during which refreshes happen, using snapd's refresh.timer system option.
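A minimal sketch of setting such a window is shown here. It assumes a snapd version that supports the refresh.timer system option; the window value (Fridays, 23:00 to 01:00), the REFRESH_WINDOW and APPLY variables and the run helper are illustrative names chosen for this example, not anything taken from this thread.

```shell
#!/bin/sh
# Sketch: restrict snapd's automatic refreshes to a maintenance window.
# The default window below is an example value, not a recommendation.
REFRESH_WINDOW="${REFRESH_WINDOW:-fri,23:00-01:00}"

# Print each command instead of running it unless APPLY=1 is set,
# so the script can be reviewed before it changes anything.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Tell snapd when it is allowed to refresh snaps.
run sudo snap set system "refresh.timer=${REFRESH_WINDOW}"

# Show the configured timer and the next scheduled refresh.
run sudo snap refresh --time
```

Run as-is, it only prints the two commands; run with APPLY=1 to actually apply them. This narrows when refreshes happen but does not turn them off.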

For production environments where you may want to do staging on a number of machines before updating the rest, the snap proxy lets you do revision overrides of any snap that uses it: https://docs.ubuntu.com/snap-store-proxy/en/

Thanks Stephane.

The NFS mount point is valid - no issues. My suspicion is the snap update command tries to unmount some file systems which seems to lock the NFS mount point. The snap process hangs indefinitely until we forcefully unmount the NFS share. Imagine if my containers were hosted on an NFS mount point during the snap refresh - all the containers would die!

And, this is exactly why we want to move away from the Snap tool. Edge conditions like this are not normally caught in standard test cases, and there is NO WAY to disable snap updates. Case in point: it seems a number of people have run into lots of issues with LXD 3.15. Using automated snap updates is unacceptable in a production environment.

We don’t want workarounds (snap proxy server, etc). We want the ability to review and test the updates ourselves before adding them to the system. Relying on Canonical’s test cases (regardless of how good they are) is a very risky way of upgrading production servers.

I also think that automatic updating of LXD is unacceptable. It has forced me to reboot my system on at least two different occasions, at inconvenient times.

The /proc issue may not be snap related, but snap updated my system with this bug at an unexpected time. I found out accidentally when I wanted to find out how busy my system was with “uptime”, which did not work.

The snap store proxy seems like a good solution for a big company that can afford the time and resources needed to deploy it. It doesn’t seem trivial to install and I am hesitant to dedicate several hours installing it and testing it.

Can the snap store proxy be installed in a container of the host that needs to use it?

Can you share where the NFS mountpoint is located on your filesystem, what version of NFS it is and whether autofs is involved in mounting that?

Hopefully that will let us figure out why that would cause a hang.
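One way to collect that information is sketched below. It assumes a Linux host where /proc/mounts is readable; the function names are made up for this example. The mount options column normally includes the negotiated NFS version (a vers= entry), and any autofs-managed mount points show up with the autofs filesystem type.

```shell
#!/bin/sh
# Sketch: report NFS and autofs mounts from a mounts table.
# Each line of /proc/mounts is: device mountpoint fstype options dump pass

# Print mount point, filesystem type and options for every NFS mount.
list_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2, $3, $4 }' "${1:-/proc/mounts}"
}

# Print the mount point of every autofs-managed location.
list_autofs_mounts() {
    awk '$3 == "autofs" { print $2 }' "${1:-/proc/mounts}"
}

if [ -r /proc/mounts ]; then
    echo "NFS mounts:"
    list_nfs_mounts
    echo "autofs mounts:"
    list_autofs_mounts
fi
```

It needs to be run while the backup share is mounted, since the share only appears in the table during the nightly backup.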

An alternative to the proxy workaround is to:

  1. Run sudo snap login to log in to the Snap Store with your Ubuntu One account.
  2. Then, go to the Ubuntu One website and change the password of your account.

By doing so, there will be no snap updates.

It happened to me, and I was mystified as to when my LXD would eventually get updated.
Just adding it here in case it may be helpful to someone.

Hi Stephane,

Some notes about the NFS Server:

  • The NFS server is running CentOS 6.9 with kernel 2.6.32-696.16.1.el6.centos.plus.x86_64.
  • The drive is a 1 TB SSD that is exported to our LXD servers and used to back up btrfs snapshots of our containers
  • The NFS mount point is connected each night via our nightly backup script

Some notes about the LXD Server:

  • Running Ubuntu 18.10 with kernel 4.18.0-17-generic
  • LXD/LXC version 3.15

Also, we have a number of other LXD servers that did not have this issue, but the backup script runs at various times on all the servers. Thus, I suspect this is a timing issue when snap update runs at the same time the backup drive is mounted.
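If it really is a timing issue, the backup script could skip mounting the share while snapd is busy. The following is only a sketch of that idea, not something we have tested: snap_change_in_progress is a hypothetical helper name, and it assumes that the second column of the “snap changes” output reads “Doing” while a change is still running.

```shell
#!/bin/sh
# Sketch: detect a running snap change before mounting the backup share.

# Reads `snap changes` output on stdin and succeeds (exit 0) if any
# change after the header line has the status "Doing".
snap_change_in_progress() {
    awk 'NR > 1 && $2 == "Doing" { found = 1 } END { exit !found }'
}

# Only query snapd if the snap command is available on this host.
if command -v snap >/dev/null 2>&1 &&
   snap changes 2>/dev/null | snap_change_in_progress; then
    echo "snap change in progress, skipping backup mount for now"
else
    echo "no snap change in progress"
fi
```

This would not prevent a refresh from starting after the share has already been mounted, so it narrows the window rather than closing it.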

Thanks Simos. But I don’t have an Ubuntu One account. Is that a paid service?

Where is the NFS mount on the server’s filesystem when it’s mounted during a backup?

Ubuntu One is free. It’s a single sign-on service used for Ubuntu and Canonical services (such as launchpad.net for the development of Ubuntu).

You can create an account at https://login.ubuntu.com/

After you run sudo snap login with your new account, you can then run any snap command without the sudo at the start. That is, if you are already logged in to the Snap Store with your Ubuntu One SSO account, you can run snap install somepackage to install snap packages. No more sudo.

Canonical has a few paid services like Landscape and Livepatch. With your Ubuntu One SSO account, you can also use these services in the free tier.

Here are the variables used for our backup script. Notice the snapshot dir is part of the lxd storage-pool directory. This is probably why the snap update is having issues.

BACKUP_DIR="/usr/local/BACKUP_CACHE"
BTRFS_SNAPDIR="/var/lib/lxd/storage-pools/default/.nightly_snaps"
CONTAINER_BASE_DIR="/var/lib/lxd/storage-pools/default/containers"

Thanks Simos!