I'm playing around with TrueNAS SCALE 24.04 (Debian Bookworm) to see whether I can install and run Incus on it using its developer mode.
Installation works as expected when following the Zabbly instructions. I can launch a new container and it starts as usual:
+--------+---------+-----------------------+------+-----------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+--------+---------+-----------------------+------+-----------+-----------+
| gentoo | RUNNING | 10.108.180.218 (eth0) | | CONTAINER | 0 |
+--------+---------+-----------------------+------+-----------+-----------+
incus exec gentoo bash:
bash: /root/.bashrc: Permission denied
gentoo ~ #
The issue is that as soon as I try to get a shell in the new container I get the Permission denied error above. Gentoo is just one example; other distros report exactly the same error. Sometimes they don't even get an IP address, or other services fail to start. Setting security.privileged to true makes the issue go away, but that is a bad workaround.
Given that everything works in privileged mode, it seems that something is missing or needs to be configured to allow unprivileged mode. I have a feeling it has something to do with ID mapping, but I'm not quite sure how to debug this further. I hope someone can give me a hint on how to solve it.
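One way to check the ID-mapping suspicion directly is to look at the map the kernel actually applied to the container's processes. It is exposed in /proc/<pid>/uid_map (and gid_map) as lines of three numbers: the first ID inside the namespace, the first ID in the parent namespace, and the length of the range. The sketch below is only an illustration of reading that file; struct id_map_entry, parse_id_map_line and print_id_map are made-up names, not part of Incus or LXC.

```c
#include <stdio.h>

/* One line of /proc/<pid>/uid_map or gid_map: IDs starting at `inside`
 * in the namespace map to IDs starting at `outside` in the parent
 * namespace, for `count` consecutive IDs. */
struct id_map_entry {
	unsigned int inside;
	unsigned int outside;
	unsigned int count;
};

/* Parse a single map line. Returns 0 on success, -1 on malformed input. */
int parse_id_map_line(const char *line, struct id_map_entry *out)
{
	if (sscanf(line, "%u %u %u",
	    &out->inside, &out->outside, &out->count) != 3)
		return -1;
	if (out->count == 0)
		return -1;
	return 0;
}

/* Print every entry of the given map file (hypothetical helper).
 * Returns the number of entries printed, or -1 if the file cannot
 * be opened. */
int print_id_map(const char *path)
{
	FILE *f = fopen(path, "r");
	char line[128];
	int n = 0;

	if (f == NULL)
		return -1;
	while (fgets(line, sizeof (line), f) != NULL) {
		struct id_map_entry e;

		if (parse_id_map_line(line, &e) != 0)
			continue;
		/* Use 64-bit arithmetic so inside + count cannot wrap. */
		printf("inside %u..%llu -> parent %u..%llu\n",
		    e.inside,
		    (unsigned long long)e.inside + e.count - 1,
		    e.outside,
		    (unsigned long long)e.outside + e.count - 1);
		n++;
	}
	fclose(f);
	return n;
}
```

Calling print_id_map("/proc/self/uid_map") from a program run inside the container should show a non-zero parent start for an unprivileged instance. Running incus exec gentoo -- cat /proc/self/uid_map gives the same raw numbers without compiling anything. If the map looks sane, the mapping itself was applied and the problem is more likely in how the filesystem honours it.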
Yes, I get a shell and can do basic things, but as soon as I try to start any additional services they also report "permission denied". It actually starts during container boot already; take a look at the first lines of console.log:
INIT: version 3.09 booting
OpenRC 0.54 is starting up Gentoo Linux (x86_64) [LXC]
* Mounting /run ... [ ok ]
* Caching service dependencies ... [ ok ]
mount: /sys/fs/cgroup: none already mounted on /dev.
dmesg(1) may have more information after failed mount system call.
/etc/init.d/cgroups: line 92: echo: write error: Device or resource busy
[ !! ]
* ERROR: sysctl failed to start
* Creating user login records ... [ ok ]
* Wiping /tmp directory ... [ ok ]
* Bringing up network interface lo ...RTNETLINK answers: File exists
[ ok ]
* Updating /etc/mtab ... * Creating mtab symbolic link
[ ok ]
* Create Volatile Files and Directories ... [ ok ]
That one starts perfectly fine, as it should. Now look at the output of the failing instance:
INIT: version 3.09 booting
OpenRC 0.54 is starting up Gentoo Linux (x86_64) [LXC]
* Mounting /run ... [ ok ]
* Caching service dependencies ... [ ok ]
mount: /sys/fs/cgroup: none already mounted on /dev.
dmesg(1) may have more information after failed mount system call.
/etc/init.d/cgroups: line 92: echo: write error: Device or resource busy
[ !! ]
* ERROR: sysctl failed to start
mkdir: cannot create directory ‘/var/lib/misc’: Permission denied
* failed to create needed directory /var/lib/misc
[ ok ]
* Updating /etc/mtab ... * /etc is not writable; unable to create /etc/mtab
[ !! ]
* Create Volatile Files and Directories ... [ ok ]
INIT: Entering runlevel: 3
[ !! ]
* ERROR: sysctl failed to start
mkdir: cannot create directory ‘/var/lib/misc’: Permission denied
* failed to create needed directory /var/lib/misc
I reduced the output to show the differences. The second log contains many more "Permission denied" lines. So the issue already starts during container start, and each OS fails at a different place.
As mentioned before, something needs to be reconfigured on the OS. I checked both /etc/subuid and /etc/subgid; they contain:
root:1000000:1000000000
However, could it be that the host system doesn't read them, or ignores them? But why can the container be started using:
Here you would run lxc-checkconfig (install LXC to get it), a script that shows whether something major is missing that prevents Incus (or LXC) from functioning properly.
Then post the output.
As requested here is the output of lxc-checkconfig:
root@truenas[~]# lxc-checkconfig
LXC version 5.0.2
Kernel configuration not found at /proc/config.gz; searching...
Kernel configuration found at /boot/config-6.6.20-production+truenas
--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: enabled
Network namespace: enabled
--- Control groups ---
Cgroups: enabled
Cgroup namespace: enabled
Cgroup v1 mount points:
Cgroup v2 mount points:
- /sys/fs/cgroup
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: enabled
Cgroup cpuset: enabled
--- Misc ---
Veth pair device: enabled, loaded
Macvlan: enabled, not loaded
Vlan: enabled, loaded
Bridges: enabled, loaded
Advanced netfilter: enabled, loaded
CONFIG_IP_NF_TARGET_MASQUERADE: enabled, not loaded
CONFIG_IP6_NF_TARGET_MASQUERADE: enabled, not loaded
CONFIG_NETFILTER_XT_TARGET_CHECKSUM: enabled, not loaded
CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled, not loaded
FUSE (for use with lxcfs): enabled, not loaded
--- Checkpoint/Restore ---
checkpoint restore: enabled
CONFIG_FHANDLE: enabled
CONFIG_EVENTFD: enabled
CONFIG_EPOLL: enabled
CONFIG_UNIX_DIAG: enabled
CONFIG_INET_DIAG: enabled
CONFIG_PACKET_DIAG: enabled
CONFIG_NETLINK_DIAG: enabled
File capabilities: enabled
Note : Before booting a new kernel, you can check its configuration
usage : CONFIG=/path/to/config /bin/lxc-checkconfig
No errors from what I can see. Meanwhile I installed the TrueNAS RC1 version, and there everything works exactly as designed. In other words, something has changed between RC1 and the release version.
Both are basic installs using "DIR" as the storage driver, which should rule out any issues in that direction.
After running a few more tests and comparing the working and non-working systems, it finally turned out that the issue is related to the ZFS kernel module. The working Bookworm system is a standard netinstall with default settings (single boot partition), whereas TrueNAS has multiple ZFS partitions. So they are not the same, but that got me one step further and made me look deeper into ZFS compatibility. I came across the following forum posts:
These pointed me in the right direction: update the ZFS module on the non-working system. TrueNAS was released with version 2.2.3-1, which should contain full ID-mapping support but evidently doesn't work with Incus (probably because of the TrueNAS modifications). So I followed the instructions from @stgraber at ZFS builds, which installed 2.2.4, rebooted, and the permission denied issue was gone.
Success, now I have a working TrueNAS with Incus LTS which is pretty cool!
That leaves one obvious question: what is the difference between @stgraber's ZFS sources and the TrueNAS tree? ID mapping is still a new feature in ZFS, so will it take some more time to stabilise? Maybe @stgraber can give some useful input on which area to concentrate on to find the needle in the haystack.
TrueNAS, or rather IX-Systems, made a lot of improvements to ZFS to allow better memory allocation and usage, and probably more for their own needs. One of these changes broke something in ID mapping.
Probably best to stay on upstream sources for now, as everything still seems to work.
There are indeed a lot of changes in the TrueNAS ZFS sources. I was able to track it down to a security function they added which isn't fully namespace aware, or rather doesn't work as expected. Changing one line of code to use the correct variable passed in by the function call made it work again. I will watch their upcoming release to see whether this gets solved.
Sure, I'm happy to share what the issue is and how I fixed it (as a workaround).
To be clear, this is related to the TrueNAS (IX-Systems) ZFS sources. Mainstream ZFS worked the last time I tested it.
As mentioned, there are a lot of specific security and permission enhancements in the TrueNAS ZFS sources to support their advanced usage. The issue boils down to one function, "zpl_permission", in file module/os/linux/zfs/zpl_xattr.c, lines 1517-1544:
	/*
	 * If NFSv4 ACLs are not being used, go back to
	 * generic_permission(). If ACL is trivial and the
	 * mask is representable by POSIX permissions, then
	 * also go back to generic_permission().
	 */
	if ((ITOZSB(ip)->z_acl_type != ZFS_ACLTYPE_NFSV4) ||
	    ((ITOZ(ip)->z_pflags & ZFS_ACL_TRIVIAL && GENERIC_MASK(mask)))) {
#if (defined(HAVE_IOPS_PERMISSION_USERNS) || \
	defined(HAVE_IOPS_PERMISSION_IDMAP))
		/*
		 * For some reason generic_permission doesn't work for
		 * namespace mounts at this level when given zfs_init_idmap.
		 * Pass the incoming idmap through instead to avoid the
		 * permission denied issue.
		 */
		if (idmap != &nop_mnt_idmap) {
			return (generic_permission(idmap, ip, mask));
		}
		return (generic_permission(zfs_init_idmap, ip, mask));
#else
		return (generic_permission(ip, mask));
#endif
	}

	for (i = 0; i < ARRAY_SIZE(mask2zfs); i++) {
		if (mask & mask2zfs[i].kmask) {
			to_check |= mask2zfs[i].zfsperm;
		}
	}

	/*
	 * We're being asked to check something that doesn't contain an
	 * NFSv4 ACE. Pass back to default kernel permissions check.
	 */
	if (to_check == 0) {
#if (defined(HAVE_IOPS_PERMISSION_USERNS) || \
	defined(HAVE_IOPS_PERMISSION_IDMAP))
		/*
		 * For some reason generic_permission doesn't work for
		 * namespace mounts at this level when given zfs_init_idmap.
		 * Pass the incoming idmap through instead to avoid the
		 * permission denied issue.
		 */
		if (idmap != &nop_mnt_idmap) {
			return (generic_permission(idmap, ip, mask));
		}
		return (generic_permission(zfs_init_idmap, ip, mask));
#else
		return (generic_permission(ip, mask));
#endif
	}
I put a comment into the code to remember why I made the change. After this change everything works nicely again. I'm far from being an expert on this, but from my understanding the original code doesn't honor the "user_namespace" when one is passed in, if that makes sense?
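To picture why ignoring the passed-in mapping could produce exactly the "/root/.bashrc: Permission denied" symptom, here is a deliberately simplified userspace model. It is not kernel code and not the TrueNAS implementation. It assumes the container's files are stored with unshifted owner IDs (root is 0 on disk) and are presented to the container through a mapping that shifts them by 1000000, the start of the subuid range quoted earlier. All toy_* names and the single-shift mapping are hypothetical, and groups, capabilities and ACLs are left out entirely.

```c
#include <stdbool.h>

/* Same numeric values as the kernel's MAY_EXEC/MAY_WRITE/MAY_READ. */
#define	TOY_MAY_EXEC	0x1
#define	TOY_MAY_WRITE	0x2
#define	TOY_MAY_READ	0x4

/* Toy stand-in for a mount ID mapping: every on-disk ID is shifted
 * by `shift`. A shift of 0 plays the role of the no-op mapping. */
struct toy_idmap {
	unsigned int shift;
};

/* Toy inode: only the on-disk owner and the permission bits. */
struct toy_inode {
	unsigned int disk_uid;
	unsigned int mode;
};

/* Owner of the inode as seen through the given mapping. */
unsigned int toy_owner(const struct toy_idmap *map,
    const struct toy_inode *ip)
{
	return (ip->disk_uid + map->shift);
}

/* Owner/other check only: pick the owner bits if the caller matches
 * the mapped owner, the "other" bits if not, then test the mask. */
bool toy_permission(const struct toy_idmap *map,
    const struct toy_inode *ip, unsigned int caller_uid,
    unsigned int mask)
{
	unsigned int bits;

	if (toy_owner(map, ip) == caller_uid)
		bits = (ip->mode >> 6) & 7;
	else
		bits = ip->mode & 7;
	return ((mask & ~bits) == 0);
}
```

In this model a file with on-disk owner 0 and mode 0600, read by container root (host UID 1000000), passes the check with a mapping of shift 1000000 and fails with shift 0, because the owner no longer matches and the "other" bits are empty. That is only an analogy for what passing zfs_init_idmap instead of the incoming idmap might do; the real generic_permission() path is considerably more involved.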