Docker not working in Incus container

I’ve recently shifted my server to new disks, and as part of that have been migrating LXD containers.

On the previous system, I ran nested docker containers in LXD containers, and that worked fine.

I set it up as per the Ubuntu tutorial: https://ubuntu.com/tutorials/how-to-run-docker-inside-lxd-containers#3-install-docker

My previous disks used ext4 as the base filesystem, the new disks use zfs.

However, in both cases I created a btrfs pool for docker, as per the guide.
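From memory, the setup was roughly along these lines (the exact tutorial steps may differ slightly):

lxc storage create docker btrfs
lxc storage volume create docker ampdev
lxc config device add ampdev docker disk pool=docker source=ampdev path=/var/lib/docker
lxc config set ampdev security.nesting=true security.syscalls.intercept.mknod=true security.syscalls.intercept.setxattr=true
lxc restart ampdev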

On the new disks, however, I cannot get docker to successfully run. When it tries to pull an image, it gets stuck on extracting, and the LXD daemon hangs for that container.

What information can I provide to help troubleshoot please?

I also feel a bit dense. I’ve had such a smooth ride with LXD for so long that I didn’t notice it had been pulled from the Linux Containers project…

Will support questions still be answered here or should I head over to the Ubuntu discussion forum?

uname -a
Linux server 6.5.0-21-generic #21~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb  9 13:32:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
lxd --version
5.20

It seems the issue might be filesystem-related? The docker pool usage never gets above 18MiB, and docker seems to get stuck either on downloading or extracting.
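For what it’s worth, I’m watching the pool usage with something like:

lxc storage info docker    # reports total space and "space used" for the pool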

htop shows containerd in the D (disk sleep) state, and a CPU core is maxed out at 100%.

The container is idmapped, and from recollection on the old system I had enabled shiftfs for this. Could this be the issue? I read a post suggesting that shiftfs was no longer needed, as idmapping was supported directly in the kernel, but does this work for docker using overlay2?
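If it helps, I believe the kernel feature detection can be checked with something like:

lxc info | grep -A10 kernel_features    # should list idmapped_mounts and shiftfs with true/false values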

Hi Greelan.

Yeah, this forum is for the Linux Containers projects. LXD was forked a while ago; the fork is called Incus.

I think your best bet is to use this as an opportunity to migrate to Incus. You will get a lot of support in this forum for Incus questions and issues, and you will also benefit from all the cool new features that have been added to recent versions of Incus. There is also the issue of images: in the near future, LXD will no longer have access to the image servers that the Linux Containers project runs.

If you want to stay with LXD, then the Ubuntu forums is the place to ask questions. Canonical pulled LXD out of the Linux Containers project, which is what led to the fork.

Yeah, I’ve since been reading about Incus. Might make the move when it goes LTS. In the meantime, I need to sort my issue, so off to the Ubuntu forums.

The first LTS of Incus will be in the next few months.

See you soon. :slight_smile:

Sooner than you think xD

I’ve done the migration. Exactly the same issue persists with Incus.

E.g. if I do a pull in the container, first I get an error, and then on the retry it just hangs:

root@ampdev:~# docker pull cubecoders/ampbase
Using default tag: latest
latest: Pulling from cubecoders/ampbase
1f7ce2fa46ab: Already exists 
4a500e9aa46f: Extracting [==================================================>]  230.4MB/230.4MB
8f9283f00ee5: Download complete 
failed to register layer: error creating overlay mount to /var/lib/docker/overlay2/69d3458cdcae5368e080ee150882a9c077fa89fd03952fae2e07a5a19d164aae/merged: no such file or directory

root@ampdev:~# docker pull cubecoders/ampbase
Using default tag: latest
latest: Pulling from cubecoders/ampbase
1f7ce2fa46ab: Pull complete 
4a500e9aa46f: Extracting [==============================================>    ]  213.9MB/230.4MB
8f9283f00ee5: Download complete 

htop in the container shows dockerd with D (disk sleep) status.

I have to do a hard reboot of the system to recover access to the container.

Hoping now for some hints xD

Cool. Could you change the title of your post? @stgraber, or another moderator, could you change the tags and category?

I am not sure what the issue is, but maybe someone else in the forum can help with the specifics.

I can’t change the title

When that happens, is there anything useful in dmesg on the host?
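For example, something along the lines of:

dmesg -T | grep -iE 'btrfs|blocked|hung'    # human-readable timestamps, filtered to likely suspects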

Maybe you showed it before, but can you show incus config show --expanded NAME and also incus storage list?


Thanks Stéphane.

As requested:

incus config show --expanded ampdev
architecture: x86_64
config:
  boot.autostart: "true"
  image.architecture: amd64
  image.description: ubuntu 22.04 LTS amd64 (release) (20230302)
  image.label: release
  image.os: ubuntu
  image.release: jammy
  image.serial: "20230302"
  image.type: squashfs
  image.version: "22.04"
  security.idmap.base: "1524288"
  security.idmap.isolated: "true"
  security.nesting: "true"
  security.protection.delete: "true"
  security.syscalls.intercept.mknod: "true"
  security.syscalls.intercept.setxattr: "true"
  volatile.base_image: 72565f3fbae414d317b90569b6d7aa308c482fdf562aaf0c2eaa6e50fa39747b
  volatile.cloud-init.instance-id: c57a4852-c4cf-402e-abad-32873ab85b3c
  volatile.eth0.host_name: vethef963ad7
  volatile.idmap.base: "1524288"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1524288,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1524288,"Nsid":0,"Maprange":65536}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1524288,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1524288,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
  volatile.uuid: adaf0fb6-b312-45cb-960c-c5ecc7c54185
  volatile.uuid.generation: adaf0fb6-b312-45cb-960c-c5ecc7c54185
devices:
  docker:
    path: /var/lib/docker
    pool: docker
    source: ampdev
    type: disk
  eth0:
    hwaddr: 00:16:3e:16:33:b1
    name: eth0
    nictype: bridged
    parent: br66
    security.mac_filtering: "true"
    type: nic
  root:
    path: /
    pool: lxd
    type: disk
ephemeral: false
profiles:
- default
- dmz
stateful: false
description: ""
incus storage list
+--------+--------+---------------------------------+-------------+---------+---------+
|  NAME  | DRIVER |             SOURCE              | DESCRIPTION | USED BY |  STATE  |
+--------+--------+---------------------------------+-------------+---------+---------+
| docker | btrfs  | /var/lib/incus/disks/docker.img |             | 1       | CREATED |
+--------+--------+---------------------------------+-------------+---------+---------+
| lxd    | zfs    | rpool/lxd                       |             | 10      | CREATED |
+--------+--------+---------------------------------+-------------+---------+---------+
journalctl -o short-precise -k -b -1
...
Feb 25 20:50:08.876589 server kernel: tmpfs: Bad value for 'uid'
Feb 25 20:51:02.372538 server kernel: INFO: task kworker/u16:7:222 blocked for more than 362 seconds.
Feb 25 20:51:02.372631 server kernel:       Tainted: P           O       6.5.0-21-generic #21~22.04.1-Ubuntu
Feb 25 20:51:02.372652 server kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 25 20:51:02.372668 server kernel: task:kworker/u16:7   state:D stack:0     pid:222   ppid:2      flags:0x00004000
Feb 25 20:51:02.372685 server kernel: Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
Feb 25 20:51:02.372698 server kernel: Call Trace:
Feb 25 20:51:02.372713 server kernel:  <TASK>
Feb 25 20:51:02.372727 server kernel:  __schedule+0x2cc/0x750
Feb 25 20:51:02.372741 server kernel:  schedule+0x63/0x110
Feb 25 20:51:02.372753 server kernel:  schedule_preempt_disabled+0x15/0x30
Feb 25 20:51:02.372781 server kernel:  __mutex_lock.constprop.0+0x3f8/0x7a0
Feb 25 20:51:02.372799 server kernel:  __mutex_lock_slowpath+0x13/0x20
Feb 25 20:51:02.372811 server kernel:  mutex_lock+0x3c/0x50
Feb 25 20:51:02.372824 server kernel:  btrfs_start_delalloc_roots+0xb3/0x2c0 [btrfs]
Feb 25 20:51:02.372837 server kernel:  shrink_delalloc+0x116/0x2c0 [btrfs]
Feb 25 20:51:02.372851 server kernel:  ? __btrfs_end_transaction+0x102/0x250 [btrfs]
Feb 25 20:51:02.372864 server kernel:  flush_space+0x172/0x2e0 [btrfs]
Feb 25 20:51:02.372877 server kernel:  btrfs_async_reclaim_metadata_space+0x19b/0x2b0 [btrfs]
Feb 25 20:51:02.372889 server kernel:  process_one_work+0x23d/0x450
Feb 25 20:51:02.372904 server kernel:  worker_thread+0x50/0x3f0
Feb 25 20:51:02.372918 server kernel:  ? __pfx_worker_thread+0x10/0x10
Feb 25 20:51:02.372930 server kernel:  kthread+0xef/0x120
Feb 25 20:51:02.372943 server kernel:  ? __pfx_kthread+0x10/0x10
Feb 25 20:51:02.372965 server kernel:  ret_from_fork+0x44/0x70
Feb 25 20:51:02.372980 server kernel:  ? __pfx_kthread+0x10/0x10
Feb 25 20:51:02.372993 server kernel:  ret_from_fork_asm+0x1b/0x30
Feb 25 20:51:02.373005 server kernel:  </TASK>
Feb 25 20:51:02.373019 server kernel: INFO: task kworker/u16:1:21882 blocked for more than 362 seconds.
Feb 25 20:51:02.373033 server kernel:       Tainted: P           O       6.5.0-21-generic #21~22.04.1-Ubuntu
Feb 25 20:51:02.373045 server kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 25 20:51:02.373058 server kernel: task:kworker/u16:1   state:D stack:0     pid:21882 ppid:2      flags:0x00004000     
Feb 25 20:51:02.373072 server kernel: Workqueue: writeback wb_workfn (flush-btrfs-2)
Feb 25 20:51:02.373084 server kernel: Call Trace:
Feb 25 20:51:02.373096 server kernel:  <TASK>
Feb 25 20:51:02.373108 server kernel:  __schedule+0x2cc/0x750 
Feb 25 20:51:02.373120 server kernel:  ? mempool_alloc_slab+0x15/0x20
Feb 25 20:51:02.373144 server kernel:  ? __pfx_wbt_inflight_cb+0x10/0x10
Feb 25 20:51:02.373159 server kernel:  schedule+0x63/0x110
Feb 25 20:51:02.373171 server kernel:  io_schedule+0x46/0x80
Feb 25 20:51:02.373188 server kernel:  rq_qos_wait+0xc1/0x160 
Feb 25 20:51:02.373202 server kernel:  ? __pfx_wbt_cleanup_cb+0x10/0x10
Feb 25 20:51:02.373215 server kernel:  ? __pfx_rq_qos_wake_function+0x10/0x10
Feb 25 20:51:02.373228 server kernel:  ? __pfx_wbt_inflight_cb+0x10/0x10
Feb 25 20:51:02.373240 server kernel:  wbt_wait+0xb3/0x100
Feb 25 20:51:02.373254 server kernel:  __rq_qos_throttle+0x25/0x40
Feb 25 20:51:02.373266 server kernel:  blk_mq_get_new_requests+0xcc/0x190
Feb 25 20:51:02.373279 server kernel:  blk_mq_submit_bio+0x352/0x570
Feb 25 20:51:02.373290 server kernel:  __submit_bio+0xb3/0x1c0
Feb 25 20:51:02.373311 server kernel:  submit_bio_noacct_nocheck+0x13c/0x1f0
Feb 25 20:51:02.373326 server kernel:  submit_bio_noacct+0x17c/0x5f0
Feb 25 20:51:02.373338 server kernel:  submit_bio+0x6c/0x80
Feb 25 20:51:02.373351 server kernel:  btrfs_submit_dev_bio+0xf9/0x1e0 [btrfs]
Feb 25 20:51:02.373364 server kernel:  __btrfs_submit_bio+0x12f/0x170 [btrfs]
Feb 25 20:51:02.373379 server kernel:  btrfs_submit_chunk+0x166/0x530 [btrfs]
Feb 25 20:51:02.373391 server kernel:  btrfs_submit_bio+0x1b/0x30 [btrfs]
Feb 25 20:51:02.373405 server kernel:  submit_one_bio+0x3a/0x60 [btrfs]
Feb 25 20:51:02.373417 server kernel:  extent_writepages+0xe6/0x130 [btrfs]
Feb 25 20:51:02.373429 server kernel:  ? __pfx_end_bio_extent_writepage+0x10/0x10 [btrfs]
Feb 25 20:51:02.373443 server kernel:  btrfs_writepages+0xe/0x20 [btrfs]
Feb 25 20:51:02.373455 server kernel:  do_writepages+0xcd/0x1e0
Feb 25 20:51:02.373468 server kernel:  __writeback_single_inode+0x44/0x290
Feb 25 20:51:02.373486 server kernel:  writeback_sb_inodes+0x218/0x500
Feb 25 20:51:02.373502 server kernel:  __writeback_inodes_wb+0x54/0x100
Feb 25 20:51:02.373516 server kernel:  ? queue_io+0x115/0x120 
Feb 25 20:51:02.373528 server kernel:  wb_writeback+0x2a8/0x320
Feb 25 20:51:02.373547 server kernel:  wb_do_writeback+0x1f1/0x2a0
Feb 25 20:51:02.373561 server kernel:  wb_workfn+0x5f/0x230
Feb 25 20:51:02.373575 server kernel:  ? finish_task_switch.isra.0+0x85/0x2a0
Feb 25 20:51:02.373590 server kernel:  ? __schedule+0x2d4/0x750
Feb 25 20:51:02.373604 server kernel:  process_one_work+0x23d/0x450
Feb 25 20:51:02.373617 server kernel:  worker_thread+0x50/0x3f0
Feb 25 20:51:02.373629 server kernel:  ? __pfx_worker_thread+0x10/0x10
Feb 25 20:51:02.373641 server kernel:  kthread+0xef/0x120
Feb 25 20:51:02.373654 server kernel:  ? __pfx_kthread+0x10/0x10
Feb 25 20:51:02.373673 server kernel:  ret_from_fork+0x44/0x70
Feb 25 20:51:02.373687 server kernel:  ? __pfx_kthread+0x10/0x10
Feb 25 20:51:02.373700 server kernel:  ret_from_fork_asm+0x1b/0x30
Feb 25 20:51:02.373713 server kernel:  </TASK>

The above is spammed multiple times in the lead-up to me shutting down the system.

Okay, so you’re dealing with a btrfs kernel bug of some kind…

:smiling_face_with_tear:

Interesting that this has only surfaced after rebuilding the system on new disks. Essentially the new system is a replica, other than the base filesystem being zfs rather than ext4 on mdadm+lvm.

Do you know anywhere I can go to figure this out? Thanks

You probably want to reboot your system, make sure the docker container doesn’t start, then do something like:

mount /var/lib/incus/disks/docker.img /mnt
btrfs scrub start /mnt

Or whatever the usual commands are to fully analyze and repair a filesystem.
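For example, something along these lines (btrfs check wants the filesystem unmounted):

btrfs scrub status /mnt                        # confirm the scrub finished without errors
umount /mnt
btrfs check /var/lib/incus/disks/docker.img    # read-only consistency check of the unmounted image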

Tried btrfs scrub, and also btrfs check; no errors reported. I’d have thought it would be OK anyway, since the filesystem was created by LXD/Incus?

No difference unfortunately.

Guess I am stuck with this issue?

This may not be a wise approach, but I simply removed the btrfs disk device from the container and am now just using the zfs backing filesystem directly. Docker then defaults to overlay2 on top of zfs.
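Roughly what I did (with docker stopped in the container first):

incus config device remove ampdev docker    # detach the btrfs-backed /var/lib/docker disk
incus storage delete docker                 # optionally drop the now-unused btrfs pool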

Docker now works fine. Fingers crossed.
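Confirmed the driver with something like:

docker info | grep -iE 'storage driver|backing filesystem'    # shows overlay2 with a zfs backing filesystem here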


Are you using zfs 2.2?

Yes