Weird error for zfs storage backend

Hello everyone,

First, I’m having a fantastic time using LXD so far, and I’d like to express some gratitude to everyone who works on it and answers questions on this forum.

I just installed NixOS on my new notebook (XPS 13 7390 2-in-1) and had to tweak some things to get my machine working, like using a newer kernel (5.3.7). When I set up LXD and tried to launch a container, I got the following error:

$ lxc launch ubuntu:18.04 test-container
Error: Failed instance creation: Create container from image: Unpack failed, Failed to run: unsquashfs -f -d /var/lib/lxd/storage-pools/lxd-zpool/images/309080474/rootfs -n /var/lib/lxd/images/d6f281a2e523674bcd9822f3f61be337c51828fb0dc94c8a200ab216d12a0fff.rootfs: FATAL ERROR:write_file: failed to create file /var/lib/lxd/storage-pools/lxd-zpool/images/309080474/rootfs/usr/lib/git-core/git-credential-cache--daemon, because Too many open files.

The docs mentioned something about “Too many open files” errors, so I adjusted my ulimits like so, but to no avail:

$ ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         unlimited
-m: resident set size (kbytes)      unlimited
-u: processes                       127490
-n: file descriptors                1048576
-l: locked-in-memory size (kbytes)  unlimited
-v: address space (kbytes)          unlimited
-x: file locks                      unlimited
-i: pending signals                 127490
-q: bytes in POSIX msg queues       819200
-e: max nice                        0
-r: max rt priority                 0
-N 15:                              unlimited

Switching the storage backend to dir fixes my problem, but I’d love to use zfs. Does anybody have a clue what’s wrong here?

Here’s my lxd preseed:

config: {}
cluster: null
networks:
- config:
    ipv4.address: auto
    ipv6.address: none
  description: ""
  managed: false
  name: lxdbr0
  type: ""
storage_pools:
- name: lxd-zpool
  driver: zfs
  config:
    source: rpool/root/lxd
profiles:
- config:
    environment.TZ: Asia/Tokyo
  description: ""
  devices:
    eth0:
      name: eth0
      nictype: bridged
      parent: lxdbr0
      type: nic
    root:
      path: /
      pool: lxd-zpool
      type: disk
  name: default

and zfs info:

$ zfs list
NAME                              USED  AVAIL     REFER  MOUNTPOINT
rpool                            94.3G   828G       96K  none
rpool/home                       63.1G   828G     63.1G  legacy
rpool/root                       31.2G   828G       96K  none
rpool/root/lxd                    672K   828G       96K  none
rpool/root/lxd/containers          96K   828G       96K  none
rpool/root/lxd/custom              96K   828G       96K  none
rpool/root/lxd/custom-snapshots    96K   828G       96K  none
rpool/root/lxd/deleted             96K   828G       96K  none
rpool/root/lxd/images              96K   828G       96K  none
rpool/root/lxd/snapshots           96K   828G       96K  none
rpool/root/nixos                 31.2G   828G     31.2G  legacy
$ zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
rpool   952G  94.3G   858G        -         -     0%     9%  1.00x    ONLINE  -

Hiya!

To increase the number of open files, you need to set the limit at the system level (for example in /etc/security/limits.conf), not at the user level. You may run lxc from your user account, but the LXD daemon does not inherit your user account’s limits.

There is some discussion on this in the topic "Not able to set ulimit ( max open files) inside vms" (it was not marked as solved, though, so you will need to read it through).
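For reference, here is a minimal sketch of the kind of system-wide entries meant above, assuming you go the /etc/security/limits.conf route; the 1048576 value mirrors what LXD’s production-setup guide suggests, so treat the exact numbers as illustrative:

# /etc/security/limits.conf (illustrative values)
*       soft    nofile  1048576
*       hard    nofile  1048576
root    soft    nofile  1048576
root    hard    nofile  1048576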

To increase the number of open files, you need to set the limit at the system level (for example in /etc/security/limits.conf), not at the user level.

Thank you! I was aware of this and thought I had set this parameter system-wide, but it turns out I was setting the wrong thing; correcting it solved the problem.

For NixOS users: the correct place to raise the limit is not security.pam.loginLimits but systemd.extraConfig, since lxd runs as a systemd service and so PAM login limits don’t apply to it. My config looks like the following:

  # cf. https://github.com/lxc/lxd/blob/master/doc/production-setup.md
  boot.kernel.sysctl = {
    "fs.inotify.max_queued_events" = 1048576;
    "fs.inotify.max_user_instances" = 1048576;
    "fs.inotify.max_user_watches" = 1048576;
  };
  systemd.extraConfig = ''
    DefaultLimitNOFILE=1048576
    DefaultLimitMEMLOCK=infinity
  '';
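
After a rebuild (and a restart of the lxd service so it picks up the new default), one quick way to confirm the new limit actually reached the daemon is to query the unit; this assumes the service is named lxd.service, and the output shown is illustrative:

$ systemctl show lxd.service -p LimitNOFILE
LimitNOFILE=1048576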