Snap lxd auto refresh stuck on 'Copy snap "lxd" data'

The snap auto refresh my lxd server, from 5.1 to 5.2, but all of my lxd servers are stuck on ‘Copy snap “lxd” data’.

root@dev-physical-0-19:~# snap list
Name           Version      Rev    Tracking         Publisher   Notes
core           16-2.55.5    13250  latest/stable    canonical✓  core
core18         20220428     2409   latest/stable    canonical✓  base
core20         20220512     1494   latest/stable    canonical✓  base
distrobuilder  2.1          1364   latest/stable    stgraber    classic
lxd            5.1-1f6f485  23037  latest/stable/…  canonical✓  disabled
snapd          2.55.5       15904  latest/stable    canonical✓  snapd
root@dev-physical-0-19:~# snap changes
ID   Status  Spawn                   Ready               Summary
37   Undone  yesterday at 21:04 CST  today at 09:57 CST  Auto-refresh snap "lxd"
38   Done    today at 09:50 CST      today at 09:50 CST  Install "core" snap
39   Undone  today at 10:02 CST      today at 10:33 CST  Auto-refresh snap "lxd"
40   Done    today at 10:21 CST      today at 10:21 CST  Refresh all snaps: no updates
41   Undone  today at 10:40 CST      today at 10:59 CST  Refresh "lxd" snap
42   Done    today at 11:00 CST      today at 11:00 CST  Running service command
43   Done    today at 11:01 CST      today at 11:01 CST  Running service command
44   Undone  today at 11:14 CST      today at 11:23 CST  Refresh "lxd" snap
45   Doing   today at 11:24 CST      -                   Refresh "lxd" snap

root@dev-physical-0-19:~# snap tasks 45
Status  Spawn               Ready               Summary
Done    today at 11:24 CST  today at 11:24 CST  Ensure prerequisites for "lxd" are available
Done    today at 11:24 CST  today at 11:24 CST  Download snap "lxd" (23155) from channel "latest/stable/ubuntu-21.10"
Done    today at 11:24 CST  today at 11:24 CST  Fetch and check assertions for snap "lxd" (23155)
Done    today at 11:24 CST  today at 11:24 CST  Mount snap "lxd" (23155)
Done    today at 11:24 CST  today at 11:24 CST  Run pre-refresh hook of "lxd" snap if present
Done    today at 11:24 CST  today at 11:24 CST  Stop snap "lxd" services
Done    today at 11:24 CST  today at 11:24 CST  Remove aliases for snap "lxd"
Done    today at 11:24 CST  today at 11:24 CST  Make current revision for snap "lxd" unavailable
Doing   today at 11:24 CST  -                   Copy snap "lxd" data
Do      today at 11:24 CST  -                   Setup snap "lxd" (23155) security profiles
Do      today at 11:24 CST  -                   Make snap "lxd" (23155) available to the system
Do      today at 11:24 CST  -                   Automatically connect eligible plugs and slots of snap "lxd"
Do      today at 11:24 CST  -                   Set automatic aliases for snap "lxd"
Do      today at 11:24 CST  -                   Setup snap "lxd" aliases
Do      today at 11:24 CST  -                   Run post-refresh hook of "lxd" snap if present
Do      today at 11:24 CST  -                   Start snap "lxd" (23155) services
Do      today at 11:24 CST  -                   Remove data for snap "lxd" (23001)
Do      today at 11:24 CST  -                   Remove snap "lxd" (23001) from the system
Do      today at 11:24 CST  -                   Clean up "lxd" (23155) install
Do      today at 11:24 CST  -                   Run configure hook of "lxd" snap if present
Do      today at 11:24 CST  -                   Run health check of "lxd" snap
Doing   today at 11:24 CST  -                   Handling re-refresh of "lxd" as needed

I restart the snapd service, but it’s not working, after restart, it still stuck on refreshing the lxd, any ideas?

some information:

root@dev-physical-0-19:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 21.10
Release:        21.10
Codename:       impish
root@dev-physical-0-19:~# uname -a
Linux dev-physical-0-19 5.13.0-30-generic #33-Ubuntu SMP Fri Feb 4 17:03:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
root@dev-physical-0-19:~# systemctl status snapd
● snapd.service - Snap Daemon
     Loaded: loaded (/lib/systemd/system/snapd.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2022-06-01 11:24:02 CST; 3h 36min ago
TriggeredBy: ● snapd.socket
   Main PID: 856052 (snapd)
      Tasks: 63 (limit: 462584)
     Memory: 425.6M
     CGroup: /system.slice/snapd.service
             ├─ 732061 sync
             ├─ 783222 sync
             ├─ 856052 /usr/lib/snapd/snapd
             ├─ 868088 sync
             ├─1102884 sync
             ├─3440815 sync
             ├─3447357 sync
             └─4164420 sync

Jun 01 11:24:02 dev-physical-0-19 snapd[856052]: patch.go:63: Patching system state level 6 to sublevel 2...
Jun 01 11:24:02 dev-physical-0-19 snapd[856052]: patch.go:63: Patching system state level 6 to sublevel 3...
Jun 01 11:24:02 dev-physical-0-19 snapd[856052]: daemon.go:247: started snapd/2.55.5 (series 16; classic) ubuntu/21.10 (amd64) linux/5.13.0-30-generic.
Jun 01 11:24:02 dev-physical-0-19 snapd[856052]: daemon.go:340: adjusting startup timeout by 1m0s (pessimistic estimate of 30s plus 5s per snap)
Jun 01 11:24:02 dev-physical-0-19 systemd[1]: Started Snap Daemon.
Jun 01 11:24:35 dev-physical-0-19 snapd[856052]: services.go:919: RemoveSnapServices - disabling snap.lxd.activate.service
Jun 01 11:24:35 dev-physical-0-19 snapd[856052]: services.go:895: RemoveSnapServices - socket snap.lxd.user-daemon.unix.socket
Jun 01 11:24:35 dev-physical-0-19 snapd[856052]: services.go:919: RemoveSnapServices - disabling snap.lxd.user-daemon.service
Jun 01 11:24:35 dev-physical-0-19 snapd[856052]: services.go:895: RemoveSnapServices - socket snap.lxd.daemon.unix.socket
Jun 01 11:24:35 dev-physical-0-19 snapd[856052]: services.go:919: RemoveSnapServices - disabling snap.lxd.daemon.service

This looks like this snapd issue here:

Though the lxd service is down, but the containers are still running, as the lxd server is running on the production environment, I must act very carefully.

Can I reinstall the lxd service and import the data of containers? I am afraid to restart the server, if all the container are down and the lxd service can’t be started , it will be a disaster!!

Restarting snapd has been reported to fix the issue:

I already restarted the snapd a few hours ago, I have done it for serveral times, it’s not working :smiling_face_with_tear:.

Finally, I found out the primary cause, the sync of operating system is stucked, and he snap refresh is stucked by the sync operation.

root@dev-physical-0-19:~# systemctl status snapd
× snapd.service - Snap Daemon
     Loaded: loaded (/lib/systemd/system/snapd.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2022-06-01 18:09:47 CST; 36min ago
TriggeredBy: ○ snapd.socket
    Process: 951327 ExecStart=/usr/lib/snapd/snapd (code=exited, status=1/FAILURE)
   Main PID: 951327 (code=exited, status=1/FAILURE)
      Tasks: 7 (limit: 462584)
     Memory: 2.0M
     CGroup: /system.slice/snapd.service
             ├─ 732061 sync
             ├─ 783222 sync
             ├─ 868088 sync
             ├─1102884 sync
             ├─3440815 sync
             ├─3447357 sync
             └─4164420 sync

And the sync is I/O opertion, it can’t be killed:

root@dev-physical-0-19:~# kill -9 732061 783222 868088 1102884 3440815 3447357 4164420 
root@dev-physical-0-19:~# ps aux | grep sync
systemd+  107947  0.0  0.0  88004  6736 ?        Ssl  May10   0:05 /lib/systemd/systemd-timesyncd
guest     153721  0.6  0.0 1343048 22784 ?       Sl   May31  12:35 ./bin/sync_center
root      732061  0.0  0.0   5752   972 ?        D    11:17   0:00 sync
root      767675  0.0  0.0   5752   948 ?        D    11:19   0:00 sync
root      783222  0.0  0.0   5752   972 ?        D    11:20   0:00 sync
root      849466  0.0  0.0   5752   976 ?        D    18:04   0:00 sync
root      868088  0.0  0.0   5752   928 ?        D    11:24   0:00 sync
root     1102884  0.0  0.0   5752   932 ?        D    May31   0:00 sync

The Memory of my physcial server is very large:

root@dev-physical-0-19:/data/lxc/containers# free -h 
               total        used        free      shared  buff/cache   available
Mem:           376Gi       106Gi        34Gi        27Gi       236Gi       241Gi
Swap:             0B          0B          0B

I had manually release the cache, but the sync process still exists:

root@dev-physical-0-19:~# echo 1 > /proc/sys/vm/drop_caches
root@dev-physical-0-19:~# echo 2 > /proc/sys/vm/drop_caches
root@dev-physical-0-19:~# echo 3 > /proc/sys/vm/drop_caches
root@dev-physical-0-19:~# 
root@dev-physical-0-19:~# systemctl status snapd
× snapd.service - Snap Daemon
     Loaded: loaded (/lib/systemd/system/snapd.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2022-06-01 18:09:47 CST; 35min ago
TriggeredBy: ○ snapd.socket
    Process: 951327 ExecStart=/usr/lib/snapd/snapd (code=exited, status=1/FAILURE)
   Main PID: 951327 (code=exited, status=1/FAILURE)
      Tasks: 7 (limit: 462584)
     Memory: 2.0M
     CGroup: /system.slice/snapd.service
             ├─ 732061 sync
             ├─ 783222 sync
             ├─ 868088 sync
             ├─1102884 sync
             ├─3440815 sync
             ├─3447357 sync
             └─4164420 sync

Jun 01 18:09:47 dev-physical-0-19 systemd[1]: snapd.service: Failed with result 'exit-code'.
Jun 01 18:09:47 dev-physical-0-19 systemd[1]: snapd.service: Unit process 1102884 (sync) remains running after unit stopped.
Jun 01 18:09:47 dev-physical-0-19 systemd[1]: snapd.service: Unit process 3440815 (sync) remains running after unit stopped.
Jun 01 18:09:47 dev-physical-0-19 systemd[1]: snapd.service: Unit process 3447357 (sync) remains running after unit stopped.
Jun 01 18:09:47 dev-physical-0-19 systemd[1]: snapd.service: Unit process 4164420 (sync) remains running after unit stopped.
Jun 01 18:09:47 dev-physical-0-19 systemd[1]: snapd.service: Unit process 732061 (sync) remains running after unit stopped.
Jun 01 18:09:47 dev-physical-0-19 systemd[1]: snapd.service: Unit process 783222 (sync) remains running after unit stopped.
Jun 01 18:09:47 dev-physical-0-19 systemd[1]: snapd.service: Unit process 868088 (sync) remains running after unit stopped.
Jun 01 18:09:47 dev-physical-0-19 systemd[1]: Stopped Snap Daemon.
Jun 01 18:09:47 dev-physical-0-19 systemd[1]: snapd.service: Triggering OnFailure= dependencies.

root@dev-physical-0-19:~# free -h
               total        used        free      shared  buff/cache   available
Mem:           376Gi       130Gi       206Gi        19Gi        39Gi       225Gi
Swap:             0B          0B          0B

root@dev-physical-0-19:~# systemctl start snapd 
root@dev-physical-0-19:~# systemctl status snapd
● snapd.service - Snap Daemon
     Loaded: loaded (/lib/systemd/system/snapd.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2022-06-01 19:02:23 CST; 10s ago
TriggeredBy: ● snapd.socket
   Main PID: 2106649 (snapd)
      Tasks: 51 (limit: 462584)
     Memory: 96.1M
     CGroup: /system.slice/snapd.service
             ├─ 732061 sync
             ├─ 783222 sync
             ├─ 868088 sync
             ├─1102884 sync
             ├─2106649 /usr/lib/snapd/snapd
             ├─2111485 sync
             ├─3440815 sync
             ├─3447357 sync
             └─4164420 sync

Jun 01 19:02:19 dev-physical-0-19 snapd[2106649]: AppArmor status: apparmor is enabled and all features are available
Jun 01 19:02:20 dev-physical-0-19 snapd[2106649]: AppArmor status: apparmor is enabled and all features are available
Jun 01 19:02:20 dev-physical-0-19 snapd[2106649]: overlord.go:263: Acquiring state lock file
Jun 01 19:02:20 dev-physical-0-19 snapd[2106649]: overlord.go:268: Acquired state lock file
Jun 01 19:02:21 dev-physical-0-19 snapd[2106649]: patch.go:63: Patching system state level 6 to sublevel 1...
Jun 01 19:02:21 dev-physical-0-19 snapd[2106649]: patch.go:63: Patching system state level 6 to sublevel 2...
Jun 01 19:02:21 dev-physical-0-19 snapd[2106649]: patch.go:63: Patching system state level 6 to sublevel 3...
Jun 01 19:02:21 dev-physical-0-19 snapd[2106649]: daemon.go:247: started snapd/2.55.5 (series 16; classic) ubuntu/21.10 (amd64) linux/5.13.0-30-generic.
Jun 01 19:02:21 dev-physical-0-19 snapd[2106649]: daemon.go:340: adjusting startup timeout by 1m0s (pessimistic estimate of 30s plus 5s per snap)
Jun 01 19:02:23 dev-physical-0-19 systemd[1]: Started Snap Daemon.

root@dev-physical-0-19:~# snap logs lxd
2022-06-01T11:00:38+08:00 systemd[1]: Starting Service for snap application lxd.activate...
2022-06-01T11:00:38+08:00 lxd.activate[380563]: The LXD snap was unable to run aa-exec, this usually indicates a LXD sideload.
2022-06-01T11:00:38+08:00 lxd.activate[380563]: When sideloading, make sure to manually connect all interfaces.
2022-06-01T11:00:38+08:00 systemd[1]: snap.lxd.activate.service: Deactivated successfully.
2022-06-01T11:00:38+08:00 systemd[1]: Finished Service for snap application lxd.activate.
2022-06-01T11:23:23+08:00 systemd[1]: Starting Service for snap application lxd.activate...
2022-06-01T11:23:24+08:00 lxd.activate[844538]: The LXD snap was unable to run aa-exec, this usually indicates a LXD sideload.
2022-06-01T11:23:24+08:00 lxd.activate[844538]: When sideloading, make sure to manually connect all interfaces.
2022-06-01T11:23:24+08:00 systemd[1]: snap.lxd.activate.service: Deactivated successfully.
2022-06-01T11:23:24+08:00 systemd[1]: Finished Service for snap application lxd.activate.
root@dev-physical-0-19:~# snap changes
ID   Status  Spawn                   Ready               Summary
37   Undone  yesterday at 21:04 CST  today at 09:57 CST  Auto-refresh snap "lxd"
38   Done    today at 09:50 CST      today at 09:50 CST  Install "core" snap
39   Undone  today at 10:02 CST      today at 10:33 CST  Auto-refresh snap "lxd"
40   Done    today at 10:21 CST      today at 10:21 CST  Refresh all snaps: no updates
41   Undone  today at 10:40 CST      today at 10:59 CST  Refresh "lxd" snap
42   Done    today at 11:00 CST      today at 11:00 CST  Running service command
43   Done    today at 11:01 CST      today at 11:01 CST  Running service command
44   Undone  today at 11:14 CST      today at 11:23 CST  Refresh "lxd" snap
45   Doing   today at 11:24 CST      -                   Refresh "lxd" snap

root@dev-physical-0-19:~# snap tasks 45
Status  Spawn               Ready               Summary
Done    today at 11:24 CST  today at 11:24 CST  Ensure prerequisites for "lxd" are available
Done    today at 11:24 CST  today at 11:24 CST  Download snap "lxd" (23155) from channel "latest/stable/ubuntu-21.10"
Done    today at 11:24 CST  today at 11:24 CST  Fetch and check assertions for snap "lxd" (23155)
Done    today at 11:24 CST  today at 11:24 CST  Mount snap "lxd" (23155)
Done    today at 11:24 CST  today at 11:24 CST  Run pre-refresh hook of "lxd" snap if present
Done    today at 11:24 CST  today at 11:24 CST  Stop snap "lxd" services
Done    today at 11:24 CST  today at 11:24 CST  Remove aliases for snap "lxd"
Done    today at 11:24 CST  today at 11:24 CST  Make current revision for snap "lxd" unavailable
Doing   today at 11:24 CST  -                   Copy snap "lxd" data
Do      today at 11:24 CST  -                   Setup snap "lxd" (23155) security profiles
Do      today at 11:24 CST  -                   Make snap "lxd" (23155) available to the system
Do      today at 11:24 CST  -                   Automatically connect eligible plugs and slots of snap "lxd"
Do      today at 11:24 CST  -                   Set automatic aliases for snap "lxd"
Do      today at 11:24 CST  -                   Setup snap "lxd" aliases
Do      today at 11:24 CST  -                   Run post-refresh hook of "lxd" snap if present
Do      today at 11:24 CST  -                   Start snap "lxd" (23155) services
Do      today at 11:24 CST  -                   Remove data for snap "lxd" (23001)
Do      today at 11:24 CST  -                   Remove snap "lxd" (23001) from the system
Do      today at 11:24 CST  -                   Clean up "lxd" (23155) install
Do      today at 11:24 CST  -                   Run configure hook of "lxd" snap if present
Do      today at 11:24 CST  -                   Run health check of "lxd" snap
Doing   today at 11:24 CST  -                   Handling re-refresh of "lxd" as needed

If I restart the physcail server, it will be help or not? Any solutions?

A system restart should solve it too. But I think those logs would be useful to the snapd team to try and solve this issue, can you post them to Bug #1943419 “LXD snap auto refresh stuck at copy snap data phas...” : Bugs : snapd as well please?

ok, I already post my comment, wish that will help them to fix the problem.

1 Like

sync processes stuck for a long time in I/O wait (D) state is typically the sign of a kernel issue. There’s no way to kill those processes so a service restart cannot succeed.