My previous disks used ext4 as the base filesystem; the new disks use ZFS.
In both cases I created a btrfs pool for Docker, as per the guide.
On the new disks, however, I cannot get Docker to run successfully. When it tries to pull an image, it gets stuck on extracting, and the LXD daemon hangs for that container.
What information can I provide to help troubleshoot this, please?
uname -a
Linux server 6.5.0-21-generic #21~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 9 13:32:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
lxd --version
5.20
It seems the issue might be filesystem related? The docker pool usage never gets above 18MiB, and docker seems to get stuck either on downloading or extracting.
htop shows containerd in the D (disk sleep) state, and one CPU core is pegged at 100%.
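One way to confirm which tasks are stuck is to list everything in the D state from a shell rather than reading it off htop. The sketch below is only an illustration: `filter_dstate` and `list_dstate` are names made up here, and it assumes a procps-style `ps` that accepts `-eo pid=,stat=,comm=`.

```shell
# Keep only lines whose second column (process state) starts with "D",
# i.e. tasks in uninterruptible sleep. Expects "PID STAT COMMAND" lines on stdin.
filter_dstate() {
    awk '$2 ~ /^D/ { print }'
}

# Take a live snapshot of all processes and keep the D-state ones.
# The trailing "=" on each column suppresses the header line.
list_dstate() {
    ps -eo pid=,stat=,comm= | filter_dstate
}
```

Running `list_dstate` a few times while the pull is stuck shows whether containerd stays blocked. On kernels that expose it, reading `/proc/<pid>/stack` as root for one of those PIDs can show where in the kernel the task is waiting.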
The container is idmapped, and from recollection on the old system I had enabled shiftfs for this. Could this be the issue? I read a post suggesting that shiftfs was no longer needed, as idmapping was supported directly in the kernel, but does this work for docker using overlay2?
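To check rather than guess whether the host reports in-kernel idmapped mount support, the server info output can be inspected. This is a rough sketch that assumes `incus info` (or `lxc info`) prints a `kernel_features` section with keys such as `idmapped_mounts`, which is how I remember it; `kernel_feature` is a helper name invented for this example.

```shell
# Print the value of one kernel feature key from server info text on stdin.
# $1 is the key name, e.g. idmapped_mounts (assumed to appear as `key: "value"`).
kernel_feature() {
    awk -v key="$1" '
        $1 == (key ":") {
            gsub(/"/, "", $2)   # strip the surrounding quotes from the value
            print $2
            exit
        }
    '
}
```

Typical use would be `incus info | kernel_feature idmapped_mounts` on the host. Inside the container, `docker info --format '{{.Driver}}'` reports which storage driver Docker actually selected.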
Yeah, this forum is for the Linux Containers projects. LXD was forked a while ago. The fork is called Incus.
I think your best bet is to use this as an opportunity to migrate to Incus. You will get a lot of support in this forum for Incus questions and issues. You will also benefit from all the cool new features that have been added to recent versions of Incus. There is also the issue of images. In the near future LXD will not have access to the image servers that the Linux Containers project runs.
If you want to stay with LXD, then the Ubuntu forums are the place to ask questions. Canonical pulled LXD out of the Linux Containers project. This is what led to the fork.
incus storage list
+--------+--------+---------------------------------+-------------+---------+---------+
| NAME   | DRIVER | SOURCE                          | DESCRIPTION | USED BY | STATE   |
+--------+--------+---------------------------------+-------------+---------+---------+
| docker | btrfs  | /var/lib/incus/disks/docker.img |             | 1       | CREATED |
+--------+--------+---------------------------------+-------------+---------+---------+
| lxd    | zfs    | rpool/lxd                       |             | 10      | CREATED |
+--------+--------+---------------------------------+-------------+---------+---------+
journalctl -o short-precise -k -b -1
...
Feb 25 20:50:08.876589 server kernel: tmpfs: Bad value for 'uid'
Feb 25 20:51:02.372538 server kernel: INFO: task kworker/u16:7:222 blocked for more than 362 seconds.
Feb 25 20:51:02.372631 server kernel: Tainted: P O 6.5.0-21-generic #21~22.04.1-Ubuntu
Feb 25 20:51:02.372652 server kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 25 20:51:02.372668 server kernel: task:kworker/u16:7 state:D stack:0 pid:222 ppid:2 flags:0x00004000
Feb 25 20:51:02.372685 server kernel: Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
Feb 25 20:51:02.372698 server kernel: Call Trace:
Feb 25 20:51:02.372713 server kernel: <TASK>
Feb 25 20:51:02.372727 server kernel: __schedule+0x2cc/0x750
Feb 25 20:51:02.372741 server kernel: schedule+0x63/0x110
Feb 25 20:51:02.372753 server kernel: schedule_preempt_disabled+0x15/0x30
Feb 25 20:51:02.372781 server kernel: __mutex_lock.constprop.0+0x3f8/0x7a0
Feb 25 20:51:02.372799 server kernel: __mutex_lock_slowpath+0x13/0x20
Feb 25 20:51:02.372811 server kernel: mutex_lock+0x3c/0x50
Feb 25 20:51:02.372824 server kernel: btrfs_start_delalloc_roots+0xb3/0x2c0 [btrfs]
Feb 25 20:51:02.372837 server kernel: shrink_delalloc+0x116/0x2c0 [btrfs]
Feb 25 20:51:02.372851 server kernel: ? __btrfs_end_transaction+0x102/0x250 [btrfs]
Feb 25 20:51:02.372864 server kernel: flush_space+0x172/0x2e0 [btrfs]
Feb 25 20:51:02.372877 server kernel: btrfs_async_reclaim_metadata_space+0x19b/0x2b0 [btrfs]
Feb 25 20:51:02.372889 server kernel: process_one_work+0x23d/0x450
Feb 25 20:51:02.372904 server kernel: worker_thread+0x50/0x3f0
Feb 25 20:51:02.372918 server kernel: ? __pfx_worker_thread+0x10/0x10
Feb 25 20:51:02.372930 server kernel: kthread+0xef/0x120
Feb 25 20:51:02.372943 server kernel: ? __pfx_kthread+0x10/0x10
Feb 25 20:51:02.372965 server kernel: ret_from_fork+0x44/0x70
Feb 25 20:51:02.372980 server kernel: ? __pfx_kthread+0x10/0x10
Feb 25 20:51:02.372993 server kernel: ret_from_fork_asm+0x1b/0x30
Feb 25 20:51:02.373005 server kernel: </TASK>
Feb 25 20:51:02.373019 server kernel: INFO: task kworker/u16:1:21882 blocked for more than 362 seconds.
Feb 25 20:51:02.373033 server kernel: Tainted: P O 6.5.0-21-generic #21~22.04.1-Ubuntu
Feb 25 20:51:02.373045 server kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 25 20:51:02.373058 server kernel: task:kworker/u16:1 state:D stack:0 pid:21882 ppid:2 flags:0x00004000
Feb 25 20:51:02.373072 server kernel: Workqueue: writeback wb_workfn (flush-btrfs-2)
Feb 25 20:51:02.373084 server kernel: Call Trace:
Feb 25 20:51:02.373096 server kernel: <TASK>
Feb 25 20:51:02.373108 server kernel: __schedule+0x2cc/0x750
Feb 25 20:51:02.373120 server kernel: ? mempool_alloc_slab+0x15/0x20
Feb 25 20:51:02.373144 server kernel: ? __pfx_wbt_inflight_cb+0x10/0x10
Feb 25 20:51:02.373159 server kernel: schedule+0x63/0x110
Feb 25 20:51:02.373171 server kernel: io_schedule+0x46/0x80
Feb 25 20:51:02.373188 server kernel: rq_qos_wait+0xc1/0x160
Feb 25 20:51:02.373202 server kernel: ? __pfx_wbt_cleanup_cb+0x10/0x10
Feb 25 20:51:02.373215 server kernel: ? __pfx_rq_qos_wake_function+0x10/0x10
Feb 25 20:51:02.373228 server kernel: ? __pfx_wbt_inflight_cb+0x10/0x10
Feb 25 20:51:02.373240 server kernel: wbt_wait+0xb3/0x100
Feb 25 20:51:02.373254 server kernel: __rq_qos_throttle+0x25/0x40
Feb 25 20:51:02.373266 server kernel: blk_mq_get_new_requests+0xcc/0x190
Feb 25 20:51:02.373279 server kernel: blk_mq_submit_bio+0x352/0x570
Feb 25 20:51:02.373290 server kernel: __submit_bio+0xb3/0x1c0
Feb 25 20:51:02.373311 server kernel: submit_bio_noacct_nocheck+0x13c/0x1f0
Feb 25 20:51:02.373326 server kernel: submit_bio_noacct+0x17c/0x5f0
Feb 25 20:51:02.373338 server kernel: submit_bio+0x6c/0x80
Feb 25 20:51:02.373351 server kernel: btrfs_submit_dev_bio+0xf9/0x1e0 [btrfs]
Feb 25 20:51:02.373364 server kernel: __btrfs_submit_bio+0x12f/0x170 [btrfs]
Feb 25 20:51:02.373379 server kernel: btrfs_submit_chunk+0x166/0x530 [btrfs]
Feb 25 20:51:02.373391 server kernel: btrfs_submit_bio+0x1b/0x30 [btrfs]
Feb 25 20:51:02.373405 server kernel: submit_one_bio+0x3a/0x60 [btrfs]
Feb 25 20:51:02.373417 server kernel: extent_writepages+0xe6/0x130 [btrfs]
Feb 25 20:51:02.373429 server kernel: ? __pfx_end_bio_extent_writepage+0x10/0x10 [btrfs]
Feb 25 20:51:02.373443 server kernel: btrfs_writepages+0xe/0x20 [btrfs]
Feb 25 20:51:02.373455 server kernel: do_writepages+0xcd/0x1e0
Feb 25 20:51:02.373468 server kernel: __writeback_single_inode+0x44/0x290
Feb 25 20:51:02.373486 server kernel: writeback_sb_inodes+0x218/0x500
Feb 25 20:51:02.373502 server kernel: __writeback_inodes_wb+0x54/0x100
Feb 25 20:51:02.373516 server kernel: ? queue_io+0x115/0x120
Feb 25 20:51:02.373528 server kernel: wb_writeback+0x2a8/0x320
Feb 25 20:51:02.373547 server kernel: wb_do_writeback+0x1f1/0x2a0
Feb 25 20:51:02.373561 server kernel: wb_workfn+0x5f/0x230
Feb 25 20:51:02.373575 server kernel: ? finish_task_switch.isra.0+0x85/0x2a0
Feb 25 20:51:02.373590 server kernel: ? __schedule+0x2d4/0x750
Feb 25 20:51:02.373604 server kernel: process_one_work+0x23d/0x450
Feb 25 20:51:02.373617 server kernel: worker_thread+0x50/0x3f0
Feb 25 20:51:02.373629 server kernel: ? __pfx_worker_thread+0x10/0x10
Feb 25 20:51:02.373641 server kernel: kthread+0xef/0x120
Feb 25 20:51:02.373654 server kernel: ? __pfx_kthread+0x10/0x10
Feb 25 20:51:02.373673 server kernel: ret_from_fork+0x44/0x70
Feb 25 20:51:02.373687 server kernel: ? __pfx_kthread+0x10/0x10
Feb 25 20:51:02.373700 server kernel: ret_from_fork_asm+0x1b/0x30
Feb 25 20:51:02.373713 server kernel: </TASK>
The trace above is repeated many times in the lead-up to me shutting down the system.
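Since the same reports repeat, it can help to collapse them into a count per workqueue. Below is a minimal sketch, assuming the log lines look like the excerpt above (an `INFO: task ... blocked for more than N seconds` line followed later by a `Workqueue:` line); `summarize_hung_tasks` is a name chosen for this example.

```shell
# Count hung-task reports per workqueue from kernel log text on stdin.
# Reports without a Workqueue: line (non-kworker tasks) are not counted.
summarize_hung_tasks() {
    awk '
        # Start of a hung-task report.
        /INFO: task .* blocked for more than [0-9]+ seconds/ { hung = 1; next }
        # First Workqueue: line after a report names what the worker was doing.
        hung && /Workqueue: / {
            sub(/^.*Workqueue: /, "")
            count[$0]++
            hung = 0
        }
        END { for (wq in count) printf "%d %s\n", count[wq], wq }
    ' | sort -k2
}
```

It can be fed straight from the journal, for example `journalctl -o short-precise -k -b -1 | summarize_hung_tasks`. In the excerpt above, both blocked workers belong to btrfs workqueues (metadata space reclaim and writeback).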
Interesting that this has only surfaced after rebuilding the system on new disks. Essentially, the new system is a replica of the old one, except that the base filesystem is ZFS rather than ext4 on mdadm+lvm.
Do you know where I can go to figure this out? Thanks.
This may not be a wise approach, but I simply removed the btrfs disk from the container and am now using the ZFS backing filesystem directly. Docker now uses overlay2 by default on top of ZFS.