How to access disk devices (e.g. /dev/xvdb) and partitions (e.g. /dev/xvdb1) from inside a container?

Hi,

I’m trying to run a privileged container in such a way that it will have direct access to the disks available to the host system, but I can’t seem to figure out how to do this reliably.

So far, I’ve tried to use: lxc config device add c1 xvdb disk source=/dev/xvdb

That works for exposing /dev/xvdb to the container, but when I then try to use it to create a ZFS pool, I get an error because the two partitions that ZFS creates are not exposed inside the container.

For example:

(host)$ lxc launch dxos-dev c1
Creating c1
Starting c1

(host)$ ls -l /dev/xvdb*
brw-rw---- 1 root disk 202, 16 Apr 26 22:01 /dev/xvdb

(host)$ lxc config device add c1 xvdb unix-block source=/dev/xvdb
Device xvdb added to c1

(host)$ lxc exec c1 /bin/bash

(container)# ls -l /dev/xvdb*
brw-rw---- 1 root root 202, 16 Apr 26 22:02 /dev/xvdb

(container)# zpool create tank /dev/xvdb
cannot label 'xvdb': failed to detect device partitions on '/dev/xvdb1': 19

(container)# ls -l /dev/xvdb*
brw-rw---- 1 root root 202, 16 Apr 26 22:04 /dev/xvdb

(container)# exit
exit

(host)$ ls -l /dev/xvdb*
brw-rw---- 1 root disk 202, 16 Apr 26 22:04 /dev/xvdb
brw-rw---- 1 root disk 202, 17 Apr 26 22:04 /dev/xvdb1
brw-rw---- 1 root disk 202, 25 Apr 26 22:04 /dev/xvdb9

The issue appears to be that the partitions ZFS generates when zpool create is called do not automatically get exposed inside the container, which causes zpool create to fail.

As can be seen in the last command (run on the host), the disk was properly partitioned by the zpool create command.

Is there a way to make it so these disk devices from the host’s /dev directory get automatically exposed inside the container?

In case it’s useful, I’ve configured my default profile like the following:

(host)$ lxc profile show default
config:
  raw.lxc: |
    lxc.apparmor.profile = unconfined
    lxc.cgroup.devices.allow = a
    lxc.mount.auto = proc:rw
    lxc.mount.auto = sys:rw
    lxc.mount.auto = cgroup-full:rw
  security.privileged: "true"
description: Default LXD profile
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
  zfs:
    source: /dev/zfs
    type: unix-char
name: default
used_by:
- /1.0/containers/c1

Both the host VM running in AWS, and the container, are running the same OS (i.e. same rootfs contents) which is based on a recent Ubuntu 18.04 beta release.


I’ve been doing some more reading of various things I’ve found via Google searches…

Does udev work inside containers? I get the feeling that udev doesn’t work in the container, which I think would explain my issue (the device nodes for the new partitions not automatically getting created in the container).
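
One way to check this, I suppose, would be to watch for device events from inside the container (this is just the standard udevadm command, nothing LXD-specific):

(container)# udevadm monitor --kernel --udev

and see whether anything arrives when a new partition is created on the host.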

And if I’m understanding things correctly, LXD doesn’t support any sort of “hot plug” mechanism for automatically adding the partitions after they’re created? Is that right?

If I manually run the following on the host:

(host)$ lxc config device add c1 xvdb1 unix-block source=/dev/xvdb1
(host)$ lxc config device add c1 xvdb9 unix-block source=/dev/xvdb9

and then re-attempt the “zpool create” command inside the container, it works:

(host)$ lxc exec c1 /bin/bash
(container)# zpool create tank /dev/xvdb
(container)# zpool list
NAME      SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
default  14.9G  5.88G  8.99G         -    16%    39%  1.00x  ONLINE  -
rpool    69.5G  37.1G  32.4G         -    25%    53%  1.00x  ONLINE  -
tank     7.94G   282K  7.94G         -     0%     0%  1.00x  ONLINE  -

So what are my options? So far I’m thinking that I can either:

  1. Create some udev rules on the host, to dynamically call “lxc config device add …” as the host detects new devices and partitions (rough sketch after this list), or…

  2. Manually create the device nodes I expect using “mknod” from within the container (also sketched below)
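
To make these options concrete, here is roughly what I’m picturing (untested sketches; the rule file and helper script names are just placeholders):

Option 1, a udev rule on the host such as /etc/udev/rules.d/99-lxd-passthrough.rules:

ACTION=="add", SUBSYSTEM=="block", KERNEL=="xvd*", RUN+="/usr/local/bin/lxd-passthrough.sh %k"

with a small helper script that forwards the new node to the container:

#!/bin/sh
# /usr/local/bin/lxd-passthrough.sh (hypothetical helper)
# $1 is the kernel name of the new block device, e.g. "xvdb1"
DEV="$1"
lxc config device add c1 "${DEV}" unix-block source="/dev/${DEV}"

Option 2, manually creating the nodes inside the container using the major/minor numbers visible on the host (202,17 and 202,25 in the listing above):

(container)# mknod /dev/xvdb1 b 202 17
(container)# mknod /dev/xvdb9 b 202 25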

Does this sound about right? Am I missing anything, and/or completely mistaken in my analysis so far?

Does udev work inside containers? I get the feeling that udev doesn’t work in the container, which I think would explain my issue (the device nodes for the new partitions not automatically getting created in the container).

It doesn’t in unprivileged containers, but I have kernel patches to do some enablement there. For privileged containers that don’t drop CAP_MKNOD it will work. Note, however, that udev itself doesn’t create device nodes on newer systems; that is done by the kernel via devtmpfs:

21 26 0:6 / /dev rw,nosuid,relatime shared:2 - devtmpfs udev rw,size=4001752k,nr_inodes=1000438,mode=755

which isn’t namespaced. This means newly created device nodes will show up in the host’s devtmpfs mount and not inside the container, which mounts a tmpfs that “mocks” a devtmpfs. This also explains what you’re seeing in the first part of your post. So without LXD’s assistance you won’t be able to get away with plain udev, unfortunately.
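
You can see the difference by comparing the /dev mount on both sides (assuming findmnt is available):

(host)$ findmnt -o TARGET,FSTYPE,SOURCE /dev
(container)# findmnt -o TARGET,FSTYPE,SOURCE /dev

The host side should report devtmpfs, while the container side should report a plain tmpfs.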

And if I’m understanding things correctly, LXD doesn’t support any sort of “hot plug” mechanism for automatically adding the partitions after they’re created? Is that right?

LXD 3.0 has support for hotplugging devices on demand, even for unix-{char,block}. If you know in advance the name of the device node that is going to show up, you can set required=false as a property when adding the device to the container’s config, e.g.

(host)$ lxc config device add c1 xvdb1 unix-block source=/dev/xvdb1 required=false

at which point LXD should hotplug the node as soon as it shows up.
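
For the ZFS case you showed above, where the whole-disk pool ends up with partitions 1 and 9, that would mean registering both ahead of time, e.g.:

(host)$ lxc config device add c1 xvdb1 unix-block source=/dev/xvdb1 required=false
(host)$ lxc config device add c1 xvdb9 unix-block source=/dev/xvdb9 required=false

and then running zpool create inside the container as before.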

Thank you for the reply!

Is it possible to bind-mount the host’s “devtmpfs” into the container? I tried to simply mount the host’s “/dev” directory onto “/dev” in the container, and that mostly worked, but then it wouldn’t let me “lxc exec” into the container, which made me think this was the wrong way to accomplish what I want.
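
For reference, the attempt looked something like this (the “hostdev” device name is arbitrary, and I may well have gone about it the wrong way):

(host)$ lxc config device add c1 hostdev disk source=/dev path=/dev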

I can grant this container any privileges that it needs. I’m investigating the benefits of moving an enterprise appliance/application that manages storage, ZFS, NFS, iSCSI, etc. into a container, hopefully for easier upgrades, better maintainability, and so on. Currently this application runs directly on a host system, without using containers, and has root access so it can manage everything it needs. If I’m able to move it into a container, it would be the only container running on the host, and I can specifically tailor both the host and the container to meet my needs.

Using the “required=false” functionality from LXD 3.0 sounds like it may help, but it means that I need to maintain a list of all disk devices that could be hot-plugged into the host (e.g. xvd*, sd*, etc.), so they can be passed into the container using “lxc config device add”. I’m not sure whether I can come up with such a list. That said, given a disk device, I’ll know the partition numbers that ZFS will generate, so I think this would work well for the device partitions.
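
For example, given a known set of disks, I could imagine pre-registering the ZFS partitions up front with something like this (the disk names here are just examples):

(host)$ for d in xvdb xvdc; do for p in 1 9; do lxc config device add c1 ${d}${p} unix-block source=/dev/${d}${p} required=false; done; done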

Thanks for the help so far; I really appreciate it. This is my first time trying out LXC/LXD, so I’m learning as I go. I’ll keep trying out the different options and weigh the benefits and drawbacks of each. If there are any other approaches I can try, please let me know.

Thanks again!

I wouldn’t recommend bind-mounting the host’s devtmpfs or mounting a new copy of it. The main problem you’ll hit there is that all the tty/pts devices used for input/output of all container processes will now point to the host, with a number of interesting side effects…

The udev-rules approach sounds like it may do what you want, though coming up with the right filter may be tricky, and it’s not a feature we have implemented yet.

Oh, and one thing to be careful about: ZFS’s understanding of namespaces is very lacking. A zpool created from inside your container will be visible on the host, and setting the mountpoint property of the resulting datasets may or may not be interpreted relative to the host (I’d expect rebooting the host to lead to all of them becoming relative to the host). This may cause some major headaches…
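
If you do go down this path, it’s worth checking where the datasets actually end up mounted on both sides after creating the pool, e.g.:

(host)$ zfs get mountpoint,mounted tank
(container)# zfs get mountpoint,mounted tank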

The ZFS on Linux team has been planning to make ZFS more namespace aware in general, including administrative delegation down the line, but there’s been very little progress on this over the past few years.