Container error after changing shiftfs (false/true)

The situation is:

  1. I created the container with shiftfs already enabled; everything worked fine for months, until:
  2. During some kernel updates the DKMS module builds fail (reasons not important here), including the one for shiftfs, without telling me (Arch Linux problem :roll_eyes:).
  3. LXD then disables shiftfs (also without telling me), because it can’t find the module.
  4. When I start the container, LXD automatically tries to “remap” the container (it says “remapping”).
  5. It fails.

Now, even after I re-enable shiftfs, the container no longer starts normally.
The errors reported were:

  1. Error: Error occurred when starting proxy device: Error: remove /mnt/wayland1/wayland-0: permission denied
  2. Error: Error occurred when starting proxy device: Error: Failed to receive fd from listener process

Container Log shows:

lxc gaming1 20200823144555.732 WARN     cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1152 - file already exists - Failed to create directory "/sys/fs/cgroup/cpuset//lxc.monitor.gaming1"
lxc gaming1 20200823144555.734 WARN     cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1152 - file already exists - Failed to create directory "/sys/fs/cgroup/cpuset//lxc.payload.gaming1"
lxc gaming1 20200823144555.754 WARN     cgfsng - cgroups/cgfsng.c:fchowmodat:1570 - file or folder not found  - Failed to fchownat(17, memory.oom.group, 65536, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )

Note: “file already exists” and “file or folder not found” are my translations of the localized messages.


Solutions I tried:

  • Restarting the computer
  • Setting security.privileged to true and unsetting it again, to apply shiftfs once again.

Interestingly, the container starts fine with security.privileged set to true,
but fails as soon as I unset security.privileged.

I will probably try to remove all profiles with the proxy devices and apply them again.


Regarding general behaviour of LXD:
I would like LXD to warn me and ask before disabling shiftfs and remapping my container.


Additional information:

Other containers start fine, but they were never remapped (they were not started while shiftfs was disabled), and most of them have neither proxy devices nor disk devices using shiftfs.

Container config:

architecture: x86_64
config:
  image.architecture: x86_64
  image.description: Archlinux  x86_64 (20200723_1910)
  image.name: archlinux--x86_64-default-20200723_1910
  image.os: archlinux
  image.serial: "20200723_1910"
  image.variant: default
  security.idmap.isolated: "true"
  volatile.base_image: 68bb944fc0055f0d07dc18cfc513ca011356095cd68000b8fbf61e44d769553e
  volatile.eth0.hwaddr: 00:16:3e:fa:4c:45
  volatile.idmap.base: "1065536"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1065536,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1065536,"Nsid":0,"Maprange":65536}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1065536,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1065536,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: STOPPED
devices:
  Pulseovernetwork:
    bind: container
    connect: tcp:127.0.0.1:4713
    listen: tcp:127.0.0.1:4713
    type: proxy
  Wayland0:
    bind: container
    connect: unix:/run/user/1000/wayland-0
    gid: "1000"
    listen: unix:/mnt/wayland1/wayland-0
    mode: "0777"
    security.gid: "1000"
    security.uid: "1000"
    type: proxy
    uid: "1000"
  XWayland1:
    bind: container
    connect: unix:/tmp/.X11-unix/X1
    gid: "1000"
    listen: unix:/mnt/wayland1/X1
    mode: "0777"
    security.gid: "1000"
    security.uid: "1000"
    type: proxy
    uid: "1000"
  eth0:
    name: eth0
    nictype: macvlan
    parent: enp3s0
    type: nic
  gamesold:
    path: /media/Games
    shift: "true"
    source: /media/Games
    type: disk
  linuxgamedata:
    path: /media/[...]/linux_games/
    shift: "true"
    source: /media/[...]/linux_games/
    type: disk
  mygpu:
    type: gpu
  root:
    path: /
    pool: one
    type: disk
  steamdata:
    path: /media/[...]/steam1_stuff
    shift: "true"
    source: /media/[...]/steam1_stuff
    type: disk
  winedata2:
    path: /media/[...]/winedata2/
    shift: "true"
    source: /media/[...]/winedata2/
    type: disk
  winedata3:
    path: /media/[...]/winedata3/
    shift: "true"
    source: /media/[...]/winedata3/
    type: disk
ephemeral: false
profiles:
- macvlan1
- gamingplus
- wayland1
- xwayland1
stateful: false
description: ""

Note: For privacy reasons I replaced parts of the paths with […], but these are regular paths which always worked fine.
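For reference, the volatile.idmap.current value above is a JSON array of mapping entries: each entry maps the container (namespace) ids Nsid through Nsid+Maprange-1 onto host ids starting at Hostid. A rough Python sketch of that translation (the helper function is mine, not part of LXD):

```python
import json

# volatile.idmap.current, copied from the config above
idmap_json = (
    '[{"Isuid":true,"Isgid":false,"Hostid":1065536,"Nsid":0,"Maprange":65536},'
    '{"Isuid":false,"Isgid":true,"Hostid":1065536,"Nsid":0,"Maprange":65536}]'
)

def container_to_host(idmap, nsid, uid=True):
    """Translate a container uid (or gid) to the host id it is stored as."""
    for e in idmap:
        if (uid and e["Isuid"]) or (not uid and e["Isgid"]):
            if e["Nsid"] <= nsid < e["Nsid"] + e["Maprange"]:
                return e["Hostid"] + (nsid - e["Nsid"])
    return None  # id is not covered by the map

idmap = json.loads(idmap_json)
print(container_to_host(idmap, 0))     # container root -> host uid 1065536
print(container_to_host(idmap, 1000))  # container uid 1000 -> host uid 1066536
```

This is why the rootfs listing later in the thread shows 1065536 as the owner of most files: that is container root, as seen from the host.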

lxc info:

config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    [...]
    
    -----END CERTIFICATE-----
  certificate_fingerprint: [...]
  driver: lxc
  driver_version: 4.0.4
  firewall: xtables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "true"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.7.12-arch1-1
  lxc_features:
    cgroup2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
  os_name: Arch Linux
  os_version: ""
  project: default
  server: lxd
  server_clustered: false
  server_name: dalaran
  server_pid: 938
  server_version: "4.4"
  storage: btrfs
  storage_version: "5.7"

Update:

After removing the profiles for both Wayland sockets (which add the unix proxy devices), the container starts again.

But now almost every top-level directory shows “nobody” as its owner:

-rw-r--r--   1 nobody nobody 1276 Jul  1 08:42 README
lrwxrwxrwx   1 nobody nobody    7 May 20 00:42 bin -> usr/bin
drwxr-xr-x   1 nobody nobody    0 Jul 28 23:08 boot
drwxr-xr-x   9 root   root    500 Aug 23 17:44 dev
drwxr-xr-x   1 nobody nobody 1982 Aug 21 17:14 etc
drwxr-xr-x   1 nobody nobody   12 Jul 28 22:05 home
lrwxrwxrwx   1 nobody nobody    7 May 20 00:42 lib -> usr/lib
lrwxrwxrwx   1 nobody nobody    7 May 20 00:42 lib64 -> usr/lib
drwxr-xr-x   1 nobody nobody   36 Jul 31 03:35 media
drwxr-xr-x   1 nobody nobody   16 Jul 31 17:45 mnt
drwxr-xr-x   1 nobody nobody    0 May 20 00:42 opt
dr-xr-xr-x 359 nobody nobody    0 Aug 23 17:44 proc
drwxr-x---   1 nobody nobody  160 Aug 19 22:57 root
drwxr-xr-x   8 root   root    220 Aug 23 17:44 run
lrwxrwxrwx   1 nobody nobody    7 May 20 00:42 sbin -> usr/bin
drwxr-xr-x   1 nobody nobody   14 Jul 23 21:10 srv
dr-xr-xr-x  13 nobody nobody    0 Aug 23 17:44 sys
drwxrwxrwt   3 root   root     60 Aug 23 17:44 tmp
drwxr-xr-x   1 nobody nobody   80 Aug 21 17:09 usr
drwxr-xr-x   1 nobody nobody  116 Aug 21 17:14 var
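“nobody” here is the classic symptom of an idmap mismatch: any file whose on-disk owner falls outside the container’s active uid map is presented as the kernel overflow id 65534, which resolves to nobody. A small sketch of that behaviour (my own helper, using only the uid half of this container’s map):

```python
import json

OVERFLOW_ID = 65534  # unmapped ids appear as this, i.e. "nobody"

# The uid entry of this container's idmap
idmap = json.loads(
    '[{"Isuid":true,"Isgid":false,"Hostid":1065536,"Nsid":0,"Maprange":65536}]'
)

def host_to_container(idmap, host_uid):
    """What the container sees for a file owned by host_uid on disk."""
    for e in idmap:
        if e["Hostid"] <= host_uid < e["Hostid"] + e["Maprange"]:
            return e["Nsid"] + (host_uid - e["Hostid"])
    return OVERFLOW_ID

print(host_to_container(idmap, 1065536))  # inside the map -> container root (0)
print(host_to_container(idmap, 0))        # host root, outside the map -> 65534
```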

Do you still happen to have the error it showed while shifting?

With the container running, can you do ls -lh /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/default/containers/NAME/rootfs/ so we can see what the actual uid/gid is?

And lxc config show --expanded NAME too.

The shifting error would be useful to have as otherwise configuring LXD to have it shift the container again will likely just result in the same failure.

You mean the “remapping”?
Sorry, I guess not; the lxd.log shows nothing suspicious, and the container logs only contain the messages I got when I tried to start them.
(I should have copied it before I rebooted and restarted, but I didn’t :slightly_frowning_face:. )
Is there a way it could still be somewhere?

I use the version built for Arch, so I hope that’s the right thing:

cd /var/lib/lxd/storage-pools/one/containers/gaming1
ls -lh rootfs                                                             (08-23 20:48)
total 20K
lrwxrwxrwx 1 1065536 1065536    7 20. Mai 00:42 bin -> usr/bin/
drwxr-xr-x 1 1065536 1065536    0 28. Jul 23:08 boot/
drwxr-xr-x 1 1065536 1065536    0 23. Jul 21:11 dev/
drwxr-xr-x 1 1065536 1065536 2,0K 21. Aug 17:14 etc/
drwxr-xr-x 1 1065536 1065536   12 28. Jul 22:05 home/
lrwxrwxrwx 1 1065536 1065536    7 20. Mai 00:42 lib -> usr/lib/
lrwxrwxrwx 1 1065536 1065536    7 20. Mai 00:42 lib64 -> usr/lib/
drwxr-xr-x 1 1065536 1065536   36 31. Jul 03:35 media/
drwxr-xr-x 1 1065536 1065536   16 31. Jul 17:45 mnt/
drwxr-xr-x 1 1065536 1065536    0 20. Mai 00:42 opt/
dr-xr-xr-x 1 1065536 1065536    0  1. Jul 08:42 proc/
-rw-r--r-- 1 1065536 1065536 1,3K  1. Jul 08:42 README
drwxr-x--- 1 1065536 1065536  160 19. Aug 22:57 root/
drwxr-xr-x 1 1065536 1065536    0  1. Jul 08:42 run/
lrwxrwxrwx 1 1065536 1065536    7 20. Mai 00:42 sbin -> usr/bin/
drwxr-xr-x 1 1065536 1065536   14 23. Jul 21:10 srv/
dr-xr-xr-x 1 1065536 1065536    0  1. Jul 08:42 sys/
drwxrwxrwt 1 root    root       0  1. Jul 08:42 tmp/
drwxr-xr-x 1 1065536 1065536   80 21. Aug 17:09 usr/
drwxr-xr-x 1 1065536 1065536  116 21. Aug 17:14 var/

See “Container config” in post 1.

Ok, cool, so your container appears to be shifted on the filesystem with a base host uid of 1065536.

What’s lxc config show --expanded gaming1 showing you?

Sounds promising.

Update:
Seems it’s the same idmap base:

 volatile.idmap.base: "1065536"

Note: I replaced some parts of the paths with […] for privacy reasons.

lxc config show --expanded gaming1              (08-25 16:39)
architecture: x86_64
config:
  image.architecture: x86_64
  image.description: Archlinux  x86_64 (20200723_1910)
  image.name: archlinux--x86_64-default-20200723_1910
  image.os: archlinux
  image.serial: "20200723_1910"
  image.variant: default
  security.idmap.isolated: "true"
  volatile.base_image: 68bb944fc0055f0d07dc18cfc513ca011356095cd68000b8fbf61e44d769553e
  volatile.eth0.hwaddr: 00:16:3e:fa:4c:45
  volatile.idmap.base: "1065536"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1065536,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1065536,"Nsid":0,"Maprange":65536}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1065536,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1065536,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: STOPPED
devices:
  Pulseovernetwork:
    bind: container
    connect: tcp:127.0.0.1:4713
    listen: tcp:127.0.0.1:4713
    type: proxy
  eth0:
    name: eth0
    nictype: macvlan
    parent: enp3s0
    type: nic
  gamesold:
    path: /media/Games
    shift: "true"
    source: /media/Games
    type: disk
  linuxgamedata:
    path: /media/[...]/linux_games/
    shift: "true"
    source: /media/[...]/linux_games/
    type: disk
  mygpu:
    type: gpu
  root:
    path: /
    pool: one
    type: disk
  steamdata:
    path: /media/[...]/steam_stuff
    shift: "true"
    source: /media/[...]/steam_stuff
    type: disk
  winedata2:
    path: /media/[...]/winedata2/
    shift: "true"
    source: /media/[...]/winedata2/
    type: disk
  winedata3:
    path: /media/[...]/winedata3/
    shift: "true"
    source: /media/[...]/winedata3/
    type: disk
ephemeral: false
profiles:
- macvlan1
- gamingplus
stateful: false
description: ""

Ok, so looks like you can fix this with:

  • lxc config set gaming1 volatile.last_state.idmap '[{"Isuid":true,"Isgid":false,"Hostid":1065536,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1065536,"Nsid":0,"Maprange":65536}]'
  • lxc start gaming1

volatile.last_state.idmap represents the on-disk map, so making that line up with the state of the filesystem should make things behave.
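If you ever need to reconstruct that value for another isolated container, the JSON is mechanical. A quick sketch, assuming the simple two-entry map LXD uses here (one uid entry and one gid entry, same base and range):

```python
import json

def last_state_idmap(base, maprange=65536):
    """Build a volatile.last_state.idmap value for a given base host id."""
    entries = [
        {"Isuid": True,  "Isgid": False, "Hostid": base, "Nsid": 0, "Maprange": maprange},
        {"Isuid": False, "Isgid": True,  "Hostid": base, "Nsid": 0, "Maprange": maprange},
    ]
    return json.dumps(entries, separators=(",", ":"))

# Matches the value passed to `lxc config set` above
print(last_state_idmap(1065536))
```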


Yes, that worked, thank you.

Still, this situation looks strange to me. Any clue what might have happened?