Hidepid=2 not working in lxc


#1

Hi Community

I am having problems setting hidepid=2 in LXC containers (LXC version 3.0.0). The problem appears on different systems: one runs Debian stretch with a Proxmox kernel (4.15.17-3-pve), the other runs Ubuntu 18.04 with kernel 4.15.0-20-generic.

You can find more information about my problem in the proxmox forum (this post is from us):

Does anyone have any hints beyond what has already been written in the Proxmox forum?

We have had this problem since about August 2016. It is a big security issue for us because multiple customers have access to the same servers and they should not be able to see each other's processes. Because it has existed for so long, we are really looking forward to a solution.

If you need more information, please ask!

Thanks in advance for your responses!


(Christian Brauner) #2

I need to hear exactly what is failing or not working. It’s not obvious to me what the issue is. Furthermore, I need at least:

  • the container’s config file
  • the trace log (lxc-start <container-name> -l trace -o <container-name>.log)

#3

Hi

Thanks for your response!

We created an AppArmor profile that allows remounting /proc:

# /etc/apparmor.d/lxc/lxc-default-cgns-with-proc-remount
profile lxc-default-cgns-with-proc-remount flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/lxc/container-base>

  # these are copied from lxc-container-default-cgns:
  deny mount fstype=devpts,
  mount fstype=cgroup -> /sys/fs/cgroup/**,

  # This will allow remounting /proc, eg to change hidepid
  mount options=(rw, nosuid, nodev, noexec, remount, silent, relatime) -> /proc/,
}

Then we reloaded AppArmor:

# apparmor_parser -r -W -T /etc/apparmor.d/lxc-containers

Next, we added the profile to the container:

lxc.apparmor.profile = lxc-default-cgns-with-proc-remount

So the container config looks like:

arch: amd64
cores: 2
cpulimit: 2
cpuunits: 1024
hostname: hostname.example.com
memory: 4096
net0: name=eth0,bridge=vmbr0,gw=46.231.201.193,hwaddr=AA:A1:3A:9D:41:31,ip=46.231.201.198/26,type=veth
onboot: 1
ostype: debian
rootfs: zfsvols:subvol-198-disk-1,acl=1
swap: 1024
lxc.apparmor.profile: lxc-container-default-cgns-with-proc-remount

After that, remounting /proc should work, but it fails:

$ mount -o remount,hidepid=2 /proc
mount: cannot remount block device proc read-write, is write-protected

and afterwards mount still gives the same output:

$ mount | grep proc
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)

You can find the trace on:
https://my.swissstash.com/public/5ba4cdb409

I hope this information is enough for you to give me a hint.


(Stéphane Graber) #4

What does the dmesg output look like after the failure? I’d expect to see an apparmor denial in there which would show you exactly what flags need to be included in the profile.
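A minimal sketch of how that check could be scripted on the host. The helper name `apparmor_denials` is hypothetical, not part of LXC or AppArmor; it only filters kernel log text for lines carrying `apparmor="DENIED"`, optionally restricted to one profile.

```shell
# Filter AppArmor denials out of kernel log text read from stdin.
# Optional first argument: a profile name to restrict the output to.
# (Hypothetical helper for illustration only.)
apparmor_denials() {
    profile="${1:-}"
    if [ -n "$profile" ]; then
        grep -F 'apparmor="DENIED"' | grep -F "profile=\"$profile\""
    else
        grep -F 'apparmor="DENIED"'
    fi
}

# Typical use on the host, right after the failed remount:
#   dmesg | apparmor_denials
#   dmesg | apparmor_denials lxc-container-default-cgns-with-proc-remount
```

If a mount-related denial shows up, its `operation=` and flags fields are the part to compare against the mount rule in the profile.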


#5

Nothing is reported in dmesg from apparmor. The only entries on the host during the start are:

[Wed Jul 11 14:37:20 2018] IPv6: ADDRCONF(NETDEV_UP): veth189i0: link is not ready
[Wed Jul 11 14:37:21 2018] vmbr0: port 3(veth189i0) entered blocking state
[Wed Jul 11 14:37:21 2018] vmbr0: port 3(veth189i0) entered disabled state
[Wed Jul 11 14:37:21 2018] device veth189i0 entered promiscuous mode
[Wed Jul 11 14:37:21 2018] eth0: renamed from veth1RQE7S
[Wed Jul 11 14:37:21 2018] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[Wed Jul 11 14:37:21 2018] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[Wed Jul 11 14:37:21 2018] vmbr0: port 3(veth189i0) entered blocking state
[Wed Jul 11 14:37:21 2018] vmbr0: port 3(veth189i0) entered forwarding state

The only apparmor thing which is frequently reported is this:

[Wed Jul 11 14:39:10 2018] audit: type=1400 audit(1531312748.197:153): apparmor="DENIED" operation="file_lock" profile="lxc-container-default-cgns-with-proc-remount" pid=25613 comm="(ionclean)" family="unix" sock_type="dgram" protocol=0 addr=none

But this is not reported during the remount or start of the container.


(Christian Brauner) #6

Sorry, I missed this somehow. I’m looking.


(Christian Brauner) #7

So as a first guess I suspect that the remount rule for /proc needs to be:

  mount options=(rw, nosuid, nodev, noexec, remount, silent, relatime, hidepid) -> /proc/,

but I suspect that AppArmor doesn’t yet handle hidepid. If that’s the case though this should be a very easy fix.


(Philip Iezzi) #8

Thanks for the hint. We tried that a long time ago and I just tried it again: it is not supported. apparmor_parser already complains when loading the lxc-default-cgns-with-proc-remount profile:

$ apparmor_parser -r -W -T /etc/apparmor.d/lxc-containers
  unsupported mount options

(Christian Brauner) #9

Ok, there’s some kernel nonsense going on related to remounting I guess. So what you can do as a workaround is to set:

lxc.mount.auto = cgroup:rw sys:rw
lxc.mount.entry = proc proc proc rw,remount,nodev,nosuid,noexec,relatime,hidepid=2 0 0

in the container’s config file. That should give you a /proc in the container with hidepid=2, without needing (or being able) to remount it later.


(Christian Brauner) #10

Actually, thinking about it, it should suffice to only add:

lxc.mount.entry = proc proc proc rw,remount,nodev,nosuid,noexec,relatime,hidepid=2 0 0

(Philip Iezzi) #11

Hi @brauner,
I can’t get this running. The container won’t start up. I set the following in the pct config (running LXC on Proxmox VE / Debian Stretch):

lxc.mount.entry: proc proc proc rw,remount,nodev,nosuid,noexec,relatime,hidepid=2 0 0

Starting the container with pct (the LXC tools wrapper provided by Proxmox VE):

$ pct start 198
Job for pve-container@198.service failed because the control process exited with error code.
See "systemctl status pve-container@198.service" and "journalctl -xe" for details.
command 'systemctl start pve-container@198' failed: exit code 1

Starting in foreground produces:

$ lxc-start -F -n 198
lxc-start: 198: cgroups/cgfsng.c: create_path_for_hierarchy: 1752 Path "/sys/fs/cgroup/pids//lxc/198" already existed.
lxc-start: 198: cgroups/cgfsng.c: cgfsng_create: 1862 Failed to create cgroup "/sys/fs/cgroup/pids//lxc/198"
lxc-start: 198: cgroups/cgfsng.c: create_path_for_hierarchy: 1752 Path "/sys/fs/cgroup/cpuset//lxc/198-1" already existed.
lxc-start: 198: cgroups/cgfsng.c: cgfsng_create: 1862 Failed to create cgroup "/sys/fs/cgroup/cpuset//lxc/198-1"
lxc-start: 198: cgroups/cgfsng.c: create_path_for_hierarchy: 1752 Path "/sys/fs/cgroup/hugetlb//lxc/198-2" already existed.
lxc-start: 198: cgroups/cgfsng.c: cgfsng_create: 1862 Failed to create cgroup "/sys/fs/cgroup/hugetlb//lxc/198-2"
lxc-start: 198: cgroups/cgfsng.c: create_path_for_hierarchy: 1752 Path "/sys/fs/cgroup/cpuset//lxc/198-3" already existed.
lxc-start: 198: cgroups/cgfsng.c: cgfsng_create: 1862 Failed to create cgroup "/sys/fs/cgroup/cpuset//lxc/198-3"
lxc-start: 198: cgroups/cgfsng.c: create_path_for_hierarchy: 1752 Path "/sys/fs/cgroup/hugetlb//lxc/198-4" already existed.
lxc-start: 198: cgroups/cgfsng.c: cgfsng_create: 1862 Failed to create cgroup "/sys/fs/cgroup/hugetlb//lxc/198-4"
lxc-start: 198: cgroups/cgfsng.c: create_path_for_hierarchy: 1752 Path "/sys/fs/cgroup/cpuset//lxc/198-5" already existed.
lxc-start: 198: cgroups/cgfsng.c: cgfsng_create: 1862 Failed to create cgroup "/sys/fs/cgroup/cpuset//lxc/198-5"
lxc-start: 198: utils.c: safe_mount: 1669 Device or resource busy - Failed to mount proc onto /usr/lib/x86_64-linux-gnu/lxc/rootfs/proc
lxc-start: 198: conf.c: mount_entry: 1926 Device or resource busy - Failed to mount "proc" on "/usr/lib/x86_64-linux-gnu/lxc/rootfs/proc"
lxc-start: 198: conf.c: lxc_setup: 3407 Failed to setup mount entries
lxc-start: 198: start.c: do_start: 1198 Failed to setup container "198"
lxc-start: 198: sync.c: __sync_wait: 57 An error occurred in another process (expected sequence number 5)
lxc-start: 198: start.c: __lxc_start: 1883 Failed to spawn container "198"
The container failed to start.
Additional information can be obtained by setting the --logfile and --logpriority options.

Content of /var/lib/lxc/198/config:

lxc.arch = amd64
lxc.include = /usr/share/lxc/config/debian.common.conf
lxc.monitor.unshare = 1
lxc.tty.max = 2
lxc.environment = TERM=linux
lxc.uts.name = vtest.onlime.ch
lxc.cgroup.memory.limit_in_bytes = 4294967296
lxc.cgroup.memory.memsw.limit_in_bytes = 5368709120
lxc.cgroup.cpu.cfs_period_us = 100000
lxc.cgroup.cpu.cfs_quota_us = 200000
lxc.cgroup.cpu.shares = 1024
lxc.rootfs.path = /var/lib/lxc/198/rootfs
lxc.net.0.type = veth
lxc.net.0.veth.pair = veth198i0
lxc.net.0.hwaddr = AA:A1:3A:9D:41:31
lxc.net.0.name = eth0
lxc.mount.entry = proc proc proc rw,remount,nodev,nosuid,noexec,relatime,hidepid=2 0 0
lxc.cgroup.cpuset.cpus = 4,6

If I set lxc.mount.entry = proc /proc proc rw,remount,nodev,nosuid,noexec,relatime,hidepid=2 0 0, the container starts up correctly, but /proc is still not mounted with hidepid=2 inside the container.


(Christian Brauner) #12

Seems like /proc is already mounted. Try to set:

lxc.mount.auto =
lxc.mount.auto = sys:mixed cgroup:mixed
lxc.mount.entry = proc proc proc rw,remount,nodev,nosuid,noexec,relatime,hidepid=2 0 0

and try again.


(Philip Iezzi) #13

Wow @brauner, you made my day! Thank you so much!

Your solution seems to work:

lxc.mount.auto = 
lxc.mount.auto = sys:mixed cgroup:mixed
lxc.mount.entry = proc proc proc rw,remount,nodev,nosuid,noexec,relatime,hidepid=2 0 0

Great!

Inside LXC container:

$ mount | grep 'type proc'
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime,hidepid=2)
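A scripted version of this check, as a minimal sketch. The helper name `proc_hidepid` is hypothetical; it expects mount table lines in the /proc/mounts format (device, mount point, type, options, ...) on stdin and prints the hidepid value of each proc mount on /proc, or 0 when the option is absent.

```shell
# Print the hidepid value of the proc filesystem mounted on /proc.
# Input: lines in /proc/mounts format on stdin.
# Prints "0" when no hidepid option is set; exits non-zero if /proc is not listed.
# (Hypothetical helper for illustration only.)
proc_hidepid() {
    awk '$2 == "/proc" && $3 == "proc" {
        value = "0"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^hidepid=/) value = substr(opts[i], 9)
        print value
        found = 1
    }
    END { exit found ? 0 : 1 }'
}

# Typical use inside the container:
#   proc_hidepid < /proc/mounts
```

Reading /proc/mounts rather than parsing the output of mount avoids depending on the "on ... type ... (...)" display format.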

I am just wondering whether this has any negative side effects, as previously (without this workaround) there were a number of /proc/* mounts which now no longer exist:

--- mounts-orig.txt	2018-07-22 23:23:53.000000000 +0200
+++ mounts-hidepid2.txt	2018-07-22 23:23:45.000000000 +0200
@@ -1,19 +1,10 @@
 rpool/zfsdisks/subvol-198-disk-1 on / type zfs (rw,noatime,xattr,posixacl)
 none on /dev type tmpfs (rw,relatime,size=492k,mode=755)
-proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
-proc on /proc/sys/net type proc (rw,nosuid,nodev,noexec,relatime)
-proc on /proc/sys type proc (ro,nosuid,nodev,noexec,relatime)
-proc on /proc/sysrq-trigger type proc (ro,nosuid,nodev,noexec,relatime)
+proc on /proc type proc (rw,nosuid,nodev,noexec,relatime,hidepid=2)
 sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
 sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)
 sysfs on /sys/devices/virtual/net type sysfs (rw,relatime)
 sysfs on /sys/devices/virtual/net type sysfs (rw,nosuid,nodev,noexec,relatime)
-lxcfs on /proc/cpuinfo type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
-lxcfs on /proc/diskstats type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
-lxcfs on /proc/meminfo type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
-lxcfs on /proc/stat type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
-lxcfs on /proc/swaps type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
-lxcfs on /proc/uptime type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
 fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
 devpts on /dev/console type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
 devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666,max=1024)

Before I run this on a large production host, can you please explain why all those /proc/* got mounted previously and are missing now? With your workaround I can still e.g. cat /proc/cpuinfo and get correct information.


(Christian Brauner) #14

Can you show the output of cat /proc/1/mountinfo, run as root in the container, please?


(Philip Iezzi) #15

With your workaround applied:

$ cat /proc/1/mountinfo
471 172 0:51 / / rw,noatime master:36 - zfs rpool/zfsdisks/subvol-198-disk-1 rw,xattr,posixacl
472 471 0:61 / /dev rw,relatime - tmpfs none rw,size=492k,mode=755
473 471 0:62 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
474 473 0:62 / /sys ro,nosuid,nodev,noexec,relatime - sysfs sysfs rw
475 474 0:62 / /sys/devices/virtual/net rw,relatime - sysfs sysfs rw
476 475 0:62 /devices/virtual/net /sys/devices/virtual/net rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
477 474 0:44 / /sys/fs/fuse/connections rw,relatime master:30 - fusectl fusectl rw
478 471 0:60 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw,hidepid=2
479 472 0:22 /0 /dev/console rw,nosuid,noexec,relatime master:3 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
173 472 0:63 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=666,max=1024
174 472 0:63 /ptmx /dev/ptmx rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=666,max=1024
175 472 0:63 /0 /dev/tty1 rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=666,max=1024
176 472 0:63 /1 /dev/tty2 rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=666,max=1024
177 472 0:64 / /dev/shm rw,nosuid,nodev - tmpfs tmpfs rw
178 471 0:65 / /run rw,nosuid,nodev - tmpfs tmpfs rw,mode=755
179 178 0:66 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,size=5120k
180 474 0:67 / /sys/fs/cgroup ro,nosuid,nodev,noexec - tmpfs tmpfs ro,mode=755
181 180 0:28 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd
182 180 0:31 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,memory
183 180 0:30 / /sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,rdma
184 180 0:38 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,cpuset
185 180 0:40 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,blkio
186 180 0:34 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,cpu,cpuacct
209 180 0:32 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,net_cls,net_prio
210 180 0:37 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,hugetlb
211 180 0:33 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,freezer
212 180 0:36 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,pids
213 180 0:35 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,perf_event
214 180 0:39 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,devices
215 472 0:59 / /dev/mqueue rw,relatime - mqueue mqueue rw
216 472 0:224 / /dev/hugepages rw,relatime - hugetlbfs hugetlbfs rw,pagesize=2M
217 478 0:54 / /proc/sys/fs/binfmt_misc rw,nosuid,nodev,noexec,relatime - binfmt_misc binfmt_misc rw
108 178 0:58 / /run/user/1000 rw,nosuid,nodev,relatime - tmpfs tmpfs rw,size=4944332k,mode=700,uid=1000,gid=1000
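To see at a glance which mounts at or below /proc remain, the mountinfo output can be filtered on the mount point (field 5). This is only a sketch; the helper name `proc_mounts_from_mountinfo` is hypothetical, and it assumes the standard mountinfo layout in which the filesystem type follows the "-" separator.

```shell
# List mount point and filesystem type for every mount at or below /proc.
# Input: lines in /proc/<pid>/mountinfo format on stdin.
# (Hypothetical helper for illustration only.)
proc_mounts_from_mountinfo() {
    awk '$5 == "/proc" || index($5, "/proc/") == 1 {
        fstype = "?"
        # Optional fields start at column 7; the type follows the "-" separator.
        for (i = 7; i <= NF; i++)
            if ($i == "-") { fstype = $(i + 1); break }
        print $5, fstype
    }'
}

# Typical use, as root in the container:
#   proc_mounts_from_mountinfo < /proc/1/mountinfo
```

Applied to the output above, this leaves only /proc itself and /proc/sys/fs/binfmt_misc, which matches the missing /proc/sys, /proc/sysrq-trigger and lxcfs entries in the earlier diff.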

(Christian Brauner) #16

The remount in the lxc.mount.entry line for proc causes the additional lxcfs mounts to be stripped, so you’d be missing those, as well as the additional proc mounts. You can work around this by doing:

lxc.mount.auto =
lxc.mount.auto = cgroup:rw:force sys:rw
lxc.mount.entry = proc proc proc rw,remount,nodev,nosuid,noexec,relatime,hidepid=2 0 0
lxc.mount.entry = proc/sys proc/sys proc ro,bind,relative 0 0
lxc.mount.entry = proc/sys/net proc/sys/net proc rw,bind,relative 0 0
lxc.mount.entry = proc/sysrq-trigger proc/sysrq-trigger proc ro,bind,relative 0 0

(Christian Brauner) #17

To copy exactly what LXC is doing right now you’d need:

lxc.mount.auto =
lxc.mount.auto = cgroup:rw:force sys:rw
lxc.mount.entry = proc proc proc rw,remount,nodev,nosuid,noexec,relatime,hidepid=2 0 0
lxc.mount.entry = proc/sys proc/sys proc ro,bind,relative 0 0
lxc.mount.entry = proc/sys/net proc/sys/net proc rw,bind,relative 0 0
lxc.mount.entry = proc/sysrq-trigger proc/sysrq-trigger proc ro,bind,relative 0 0

lxc.mount.entry = /var/lib/lxcfs/proc/cpuinfo proc/cpuinfo none bind,optional 0 0
lxc.mount.entry = /var/lib/lxcfs/proc/diskstats proc/diskstats none bind,optional 0 0
lxc.mount.entry = /var/lib/lxcfs/proc/meminfo proc/meminfo none bind,optional 0 0
lxc.mount.entry = /var/lib/lxcfs/proc/stat proc/stat none bind,optional 0 0
lxc.mount.entry = /var/lib/lxcfs/proc/swaps proc/swaps none bind,optional 0 0
lxc.mount.entry = /var/lib/lxcfs/proc/uptime proc/uptime none bind,optional 0 0

(Philip Iezzi) #18

Hi @brauner, thanks a lot for your workaround, which has been working great for 3 weeks now. To match the original mounts I had to use lxc.mount.auto: sys:mixed cgroup:mixed instead of lxc.mount.auto = cgroup:rw:force sys:rw.

Final solution:

lxc.mount.auto: 
lxc.mount.auto: sys:mixed cgroup:mixed
lxc.mount.entry: proc proc proc rw,remount,nodev,nosuid,noexec,relatime,hidepid=2 0 0
lxc.mount.entry: proc/sys proc/sys proc ro,bind,relative 0 0
lxc.mount.entry: proc/sys/net proc/sys/net proc rw,bind,relative 0 0
lxc.mount.entry: proc/sysrq-trigger proc/sysrq-trigger proc ro,bind,relative 0 0
lxc.mount.entry: /var/lib/lxcfs/proc/cpuinfo proc/cpuinfo none bind,optional 0 0
lxc.mount.entry: /var/lib/lxcfs/proc/diskstats proc/diskstats none bind,optional 0 0
lxc.mount.entry: /var/lib/lxcfs/proc/meminfo proc/meminfo none bind,optional 0 0
lxc.mount.entry: /var/lib/lxcfs/proc/stat proc/stat none bind,optional 0 0
lxc.mount.entry: /var/lib/lxcfs/proc/swaps proc/swaps none bind,optional 0 0
lxc.mount.entry: /var/lib/lxcfs/proc/uptime proc/uptime none bind,optional 0 0
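A quick way to confirm the effect from a customer's point of view, as a rough sketch only. The helper name `foreign_pids` is hypothetical and it assumes GNU `stat` (for `-c '%u'`). It lists numeric entries under a proc directory that are not owned by the given uid; with hidepid=2 in effect, an unprivileged user running it against /proc would normally be expected to get no output.

```shell
# List numeric entries under a proc directory NOT owned by the given uid.
# Arguments: [procdir (default /proc)] [uid (default: current user)]
# Assumes GNU stat. (Hypothetical helper for illustration only.)
foreign_pids() {
    procdir="${1:-/proc}"
    uid="${2:-$(id -u)}"
    for d in "$procdir"/[0-9]*; do
        [ -d "$d" ] || continue
        # A process may exit between the glob and the stat; skip it then.
        owner=$(stat -c '%u' "$d" 2>/dev/null) || continue
        [ "$owner" = "$uid" ] || echo "${d##*/}"
    done
}

# Typical use as an unprivileged user inside the container:
#   foreign_pids
```

Running the same command as root is not a useful test, since root still sees every process regardless of hidepid.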

I consider this only a workaround, and I hope that remounting /proc with the hidepid mount option will become possible again in the future, using the initially proposed AppArmor profile or something similar. So if any lxc/lxcfs developer reads this, please investigate further.