Sockets + VFS idmap shifting = error

On an Ubuntu 22.04 host with kernel 6.2.0-39, VFS idmap shifting works for folders and single files (although with a little quirk). But when I try to add a disk device with shift=true for a socket:

incus launch images:ubuntu/jammy/cloud test
incus config device add test wayland_socket disk shift=true source=/run/user/1000/wayland-0 path=/mnt/wayland-0

I get an error:

Error: Failed to start device "wayland_socket": Required idmapping abilities not available

Is my kernel just too old? Or is there some other reason why shift=true works only for folders and single files?

With raw.idmap set manually and no shift=true, the container starts smoothly, but I’m trying to avoid raw.idmap.

I think it’s better to use the proxy device type when dealing with sockets. Proxy devices can handle communication for Unix sockets as well.

I don’t have an exact explanation for why it’s not working; I guess Unix sockets get special handling in this kind of scenario. Using some kind of pipe between them seems to help cross the boundary between namespaces, since it works much like socat between two sockets.

Unfortunately, when using a proxy device for the Wayland socket, apps like Chrome started in the container quit immediately. This is a known issue and the workaround is to use a disk device instead.

So the question remains - is the kernel in Ubuntu 22.04 too old? Or is there some other reason why shift=true works only for folders and single files?

I’d give a newer kernel a shot. Support for VFS idmap shifting on tmpfs, which is what’s behind /run, was introduced in January/February of this year, so it’s quite possibly not supported in your current kernel.
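A quick way to check both conditions on the host before trying shift=true is sketched below. The 6.3 threshold is my assumption for the first mainline release with idmapped tmpfs mounts (it isn’t stated in this thread, and distro kernels may differ), and the helper names kernel_at_least and fs_type are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of Incus.

# Succeeds if kernel version $1 is at least version $2 (relies on sort -V).
kernel_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Prints the filesystem type backing a path, or "unknown" if it can't be read.
fs_type() {
  stat -f -c %T "$1" 2>/dev/null || echo "unknown"
}

# Assumption: 6.3 is the first mainline kernel with idmapped tmpfs mounts.
min_kernel="6.3"
if kernel_at_least "$(uname -r)" "${min_kernel}"; then
  echo "kernel $(uname -r) is >= ${min_kernel}"
else
  echo "kernel $(uname -r) is older than ${min_kernel}"
fi
echo "/run is backed by: $(fs_type /run)"
```

If /run reports tmpfs and the kernel is older than the threshold, the "Required idmapping abilities not available" error is at least consistent with missing kernel support.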

Alternatively, the proxy device definitely is supposed to work. The behavior you’re describing sounds like an issue caused by your proxy device not using the security.uid and security.gid options (not to be confused with uid/gid). security.uid and security.gid should be set to the user on the host from which the traffic should appear to come (likely 1000 for both). uid and gid are instead for who in the container should be able to connect to the socket; this may similarly be 1000 for both.

If not set, security.uid and security.gid default to 0, which makes it look to Wayland as if the connections come from the root user. These may get rejected, causing the issues you’re mentioning.
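To make the difference between the two pairs of options concrete, here is a small sketch that only prints the device command instead of running it. The helper name proxy_device_cmd and the example IDs are assumptions for illustration; the option names are the ones described above.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) an "incus config device add"
# command for a Unix-socket proxy device with both ownership pairs set.
# Arguments: instance, device name, host socket, container socket,
#            host uid, host gid, container uid, container gid.
proxy_device_cmd() {
  printf 'incus config device add %s %s proxy bind=container connect=unix:%s listen=unix:%s security.uid=%s security.gid=%s uid=%s gid=%s\n' \
    "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8"
}

# Example values are assumptions: host user 1000:1000, container user 1000:1000.
proxy_device_cmd test wayland_socket /run/user/1000/wayland-0 /mnt/wayland-0 1000 1000 1000 1000
```

Here security.uid/security.gid describe the host side of the connection, while uid/gid set the owner of the listening socket inside the container.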

I tried what you suggested, but in my case the proxy device didn’t work for the Wayland socket and needed two workarounds for the PulseAudio socket. Let me show you two profiles I made for testing. One uses proxy devices and breaks Chrome, the other uses disk devices and everything works smoothly.

gui_proxy profile - Chrome won’t start

config:
  cloud-init.user-data: |
    #cloud-config
    package_update: true
    package_upgrade: true
    package_reboot_if_required: true
    packages:
      - pulseaudio-utils
    write_files:
    - path: /var/lib/cloud/scripts/per-boot/set_up_sockets.sh
      permissions: 0755
      content: |
        #!/bin/bash
        user="ubuntu"
        uid=$( id ${user} -u )
        gid=$( id ${user} -g )
        if [[ ${uid} =~ ^[0-9]+$ ]]; then
          mnt_dir=/mnt
          tmp_dir=/tmp/.X11-unix
          run_dir=/run/user/${uid}
          [[ ! -d "${tmp_dir}" ]] && mkdir -p "${tmp_dir}" && chmod 777 "${tmp_dir}"
          [[ ! -d "${run_dir}" ]] && mkdir -p "${run_dir}" && chmod 700 "${run_dir}" && chown ${uid}:${gid} "${run_dir}"
          [[ ! -d "${run_dir}/pulse" ]] && mkdir -p "${run_dir}/pulse" && chmod 700 "${run_dir}/pulse" && chown ${uid}:${gid} "${run_dir}/pulse"
          [[ -e "${mnt_dir}/X0" ]] && [[ -d "${tmp_dir}" ]] && [[ ! -e "${tmp_dir}/X0" ]] && touch "${tmp_dir}/X0" && sudo mount --bind "${mnt_dir}/X0" "${tmp_dir}/X0"
          [[ -e "${mnt_dir}/wayland-0" ]] && [[ -d "${run_dir}" ]] && [[ ! -e "${run_dir}/wayland-0" ]] && touch "${run_dir}/wayland-0" && sudo mount --bind "${mnt_dir}/wayland-0" "${run_dir}/wayland-0"
          [[ -e "${mnt_dir}/native" ]]  && [[ -d "${run_dir}/pulse" ]] && [[ ! -e "${run_dir}/pulse/native" ]] && touch "${run_dir}/pulse/native" && sudo mount --bind "${mnt_dir}/native" "${run_dir}/pulse/native"
        fi
    - path: /var/lib/cloud/scripts/per-once/set_up_env_vars.sh
      permissions: 0755
      content: |
        #!/bin/bash
        profile="/home/ubuntu/.profile"
        if [[ -f "${profile}" ]]; then
          echo "export WAYLAND_DISPLAY=wayland-0" >> "${profile}"
          echo "export XDG_SESSION_TYPE=wayland" >> "${profile}"
          echo "export QT_QPA_PLATFORM=wayland" >> "${profile}"
          echo "export DISPLAY=:0" >> "${profile}"
        fi
description: GUI Wayland and X11 profile with pulseaudio
devices:
  gpu:
    type: gpu
    gid: 44
  wayland_socket:
    bind: container
    connect: unix:/run/user/1000/wayland-0
    listen: unix:/mnt/wayland-0
    security.gid: "1000"
    security.uid: "1000"
    uid: "1000"
    gid: "1002"
    mode: "0775"
    type: proxy
  x11_socket:
    bind: container
    connect: unix:/tmp/.X11-unix/X0
    listen: unix:/mnt/X0
    security.gid: "1000"
    security.uid: "1000"
    uid: "1000"
    gid: "1002"
    mode: "0775"
    type: proxy
  pulseaudio_socket:
    bind: container
    connect: unix:/run/user/1000/pulse/native
    listen: unix:/mnt/native
    security.gid: "1000"
    security.uid: "1000"
    uid: "1000"
    gid: "1002"
    mode: "0666"
    type: proxy

This profile puts the sockets in their respective folders and sets up the necessary environment variables. But you need to restart the container for the environment variables to take effect. So:

incus launch images:ubuntu/jammy/cloud -p default -p gui_proxy test

Wait a minute, restart container and log in:

incus restart test
incus exec test -- sudo --user ubuntu --login

Now all the sockets have a proper owner and exactly the same permissions as on the host:

$ ll /mnt/
srwxrwxr-x  1 ubuntu ubuntu    0 Dec 27 21:37 X0=
srw-rw-rw-  1 ubuntu ubuntu    0 Dec 27 21:37 native=
srwxrwxr-x  1 ubuntu ubuntu    0 Dec 27 21:37 wayland-0=

$ ll /tmp/.X11-unix/X?
srwxrwxr-x 1 ubuntu ubuntu 0 Dec 27 21:37 /tmp/.X11-unix/X0=

$ ll /run/user/*/wayland-?
srwxrwxr-x 1 ubuntu ubuntu 0 Dec 27 21:37 /run/user/1000/wayland-0=

$ ll /run/user/*/pulse/native
srw-rw-rw- 1 ubuntu ubuntu 0 Dec 27 21:37 /run/user/1000/pulse/native=

$ printenv | grep -i display
WAYLAND_DISPLAY=wayland-0
DISPLAY=:0

But when I install Chrome, it won’t start (packages libegl1 and upower are to minimize errors thrown by Chrome):

sudo apt install wget
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install ~/google-chrome-stable_current_amd64.deb
sudo apt install libegl1 upower

Now when I start Chrome in wayland mode (X11 mode works fine) I get some errors and it quits immediately:

$ google-chrome --enable-features=UseOzonePlatform --ozone-platform=wayland

[324:324:1227/210435.327390:ERROR:viz_main_impl.cc(196)] Exiting GPU process due to errors during initialization
[390:390:1227/210435.474112:ERROR:viz_main_impl.cc(196)] Exiting GPU process due to errors during initialization
[361:7:1227/210435.578647:ERROR:command_buffer_proxy_impl.cc(127)] ContextResult::kTransientFailure: Failed to send GpuControl.CreateCommandBuffer.

[269:269:1227/210435.604420:ERROR:wayland_event_watcher.cc(43)] libwayland: wl_display@1: error 1: invalid arguments for wl_shm@5.create_pool

[1227/210435.620741:ERROR:file_io_posix.cc(145)] open /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq: No such file or directory (2)
[1227/210435.621123:ERROR:file_io_posix.cc(145)] open /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq: No such file or directory (2)
Trace/breakpoint trap (core dumped)

The first three errors are the same as when I run Chrome in container using disk devices. The libwayland error is probably the critical one.

On top of that, PulseAudio won’t work properly without two workarounds. When tested by running the pactl info command 10+ times in a row, it shows access denied some of the time. So first, you need to copy the PulseAudio cookie from the host to the container:

incus file push -p --mode=600 --gid=1000 --uid=1000 ~/.config/pulse/cookie test/home/ubuntu/.config/pulse/

and disable shared memory for PulseAudio inside container:

sudo sed -i "s/; enable-shm = yes/enable-shm = no/g" /etc/pulse/client.conf

All those problems go away when using disk devices for sockets.
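The intermittent access denied from pactl info mentioned above is easier to compare between profiles if you count it. A minimal sketch, assuming pactl exits non-zero when the connection is refused; count_failures is a made-up helper name:

```shell
#!/bin/sh
# Runs a command N times and prints how many runs exited non-zero.
# Usage: count_failures N command [args...]
count_failures() {
  runs="$1"
  shift
  failures=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    # Discard the command's output; only its exit status is counted.
    "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    i=$((i + 1))
  done
  echo "$failures"
}

# Inside the container, as the ubuntu user (assumes pactl is installed):
#   count_failures 20 pactl info
```

A result of 0 with one profile and a non-zero count with the other would point at the device type rather than at PulseAudio itself.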

gui_disk profile - everything works fine

config:
  raw.idmap: |-
    uid 1000 1000
    gid 1000 1002
  cloud-init.user-data: |
    #cloud-config
    package_update: true
    package_upgrade: true
    package_reboot_if_required: true
    packages:
      - pulseaudio-utils
    write_files:
    - path: /var/lib/cloud/scripts/per-boot/set_up_sockets.sh
      permissions: 0755
      content: |
        #!/bin/bash
        user="ubuntu"
        uid=$( id ${user} -u )
        gid=$( id ${user} -g )
        if [[ ${uid} =~ ^[0-9]+$ ]]; then
          mnt_dir=/mnt
          tmp_dir=/tmp/.X11-unix
          run_dir=/run/user/${uid}
          [[ ! -d "${tmp_dir}" ]] && mkdir -p "${tmp_dir}" && chmod 777 "${tmp_dir}"
          [[ ! -d "${run_dir}" ]] && mkdir -p "${run_dir}" && chmod 700 "${run_dir}" && chown ${uid}:${gid} "${run_dir}"
          [[ ! -d "${run_dir}/pulse" ]] && mkdir -p "${run_dir}/pulse" && chmod 700 "${run_dir}/pulse" && chown ${uid}:${gid} "${run_dir}/pulse"
          [[ -e "${mnt_dir}/X0" ]] && [[ -d "${tmp_dir}" ]] && [[ ! -e "${tmp_dir}/X0" ]] && touch "${tmp_dir}/X0" && sudo mount --bind "${mnt_dir}/X0" "${tmp_dir}/X0"
          [[ -e "${mnt_dir}/wayland-0" ]] && [[ -d "${run_dir}" ]] && [[ ! -e "${run_dir}/wayland-0" ]] && touch "${run_dir}/wayland-0" && sudo mount --bind "${mnt_dir}/wayland-0" "${run_dir}/wayland-0"
          [[ -e "${mnt_dir}/native" ]]  && [[ -d "${run_dir}/pulse" ]] && [[ ! -e "${run_dir}/pulse/native" ]] && touch "${run_dir}/pulse/native" && sudo mount --bind "${mnt_dir}/native" "${run_dir}/pulse/native"
        fi
    - path: /var/lib/cloud/scripts/per-once/set_up_env_vars.sh
      permissions: 0755
      content: |
        #!/bin/bash
        profile="/home/ubuntu/.profile"
        if [[ -f "${profile}" ]]; then
          echo "export WAYLAND_DISPLAY=wayland-0" >> "${profile}"
          echo "export XDG_SESSION_TYPE=wayland" >> "${profile}"
          echo "export QT_QPA_PLATFORM=wayland" >> "${profile}"
          echo "export DISPLAY=:0" >> "${profile}"
        fi
description: GUI Wayland and X11 profile with pulseaudio
devices:
  gpu:
    type: gpu
    gid: 44
  wayland_socket:
    source: /run/user/1000/wayland-0
    path: /mnt/wayland-0
    type: disk
  x11_socket:
    source: /tmp/.X11-unix/X0
    path: /mnt/X0
    type: disk
  pulseaudio_socket:
    source: /run/user/1000/pulse/native
    path: /mnt/native
    type: disk

The only difference between the two profiles is that this one uses disk devices and raw.idmap for proper shifting. Chrome still throws some errors, but works, and you don’t have to use any workarounds for PulseAudio.

Is there something I’m missing when making a profile with proxy devices? Why would disk devices work so much better for sockets?

Ubuntu 22.04 now has a newer kernel, 6.5.0, so I decided to check whether the shift=true option now works for sockets in folders using tmpfs. I have good news and bad news.

Shifting on wayland and pulse sockets in /run works fine. Pulse requires copying a cookie (see previous post), but that’s just a minor inconvenience.

On the other hand, X11 socket X0 in /tmp doesn’t work. Applications that use it will throw an error:

Authorization required, but no authorization protocol specified
Error: Can't open display: :0

The Xwayland socket X1 (env var DISPLAY=:1) doesn’t produce this error; instead, the application hangs.

Steps to replicate X11 socket behavior with shift=true on disk device are:

incus launch images:ubuntu/jammy/cloud test
incus config device add test x11_socket disk shift=true source=/tmp/.X11-unix/X0 path=/mnt/X0
incus exec test -- sudo --user ubuntu --login

touch "/tmp/.X11-unix/X0"
sudo mount --bind "/mnt/X0" "/tmp/.X11-unix/X0"
export DISPLAY=:0
sudo apt update
sudo apt install x11-apps
xclock

You can replace the touch and mount combo with ln -sf "/mnt/X0" "/tmp/.X11-unix/X0", but the effect is the same.

When using raw.idmap instead of shift=true on disk device, everything works fine:

incus launch images:ubuntu/jammy/cloud test
printf "uid $(id -u) 1000\ngid $(id -g) 1002" | incus config set test raw.idmap -
incus config device add test x11_socket disk source=/tmp/.X11-unix/X0 path=/mnt/X0
incus exec test -- sudo --user ubuntu --login

touch "/tmp/.X11-unix/X0"
sudo mount --bind "/mnt/X0" "/tmp/.X11-unix/X0"
export DISPLAY=:0
sudo apt update
sudo apt install x11-apps
xclock
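The raw.idmap value piped in by the printf above is just two lines in the form "uid <host id> <container id>" and "gid <host id> <container id>". A small sketch that builds the same string; make_raw_idmap is a hypothetical helper, and the container IDs 1000 and 1002 are the ones used in this post:

```shell
#!/bin/sh
# Hypothetical helper: prints a raw.idmap value that maps one host uid/gid
# to one container uid/gid.
# Arguments: host uid, container uid, host gid, container gid.
make_raw_idmap() {
  printf 'uid %s %s\ngid %s %s' "$1" "$2" "$3" "$4"
}

# Map the current host user to uid 1000 / gid 1002 inside the container.
make_raw_idmap "$(id -u)" 1000 "$(id -g)" 1002
echo
```

The output of make_raw_idmap can be piped into incus config set test raw.idmap - in the same way as the printf in the steps above.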

Using a proxy device for the abstract Unix socket also works fine:

incus launch images:ubuntu/jammy/cloud test
incus config device add test x11_socket proxy bind=container connect=unix:@/tmp/.X11-unix/X0 listen=unix:@/tmp/.X11-unix/X0 security.uid=$(id -u) security.gid=$(id -g)
incus exec test -- sudo --user ubuntu --login

export DISPLAY=:0
sudo apt update
sudo apt install x11-apps
xclock

We’ll see what changes Ubuntu 24.04 will bring with an even newer kernel.