Can not start nested LXD container

When I try to run nested LXD containers, some of them fail to start. The parent container is launched like this:

lxc launch ubuntu/20.04 -c security.nesting=true -c security.privileged=true

Inside it, I initialize LXD with lxd init --auto.
After that, basic containers can be launched like this:

lxc launch images:debian/buster/i386

But some containers have a non-standard idmap config, like this one:

"raw.idmap": "uid 0 0\ngid 0 0",

Starting this container leads to this error:

lxc start test-rico-fc9ea929-5a4b-44d8-91be-46d173d90954
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart test-rico-fc9ea929-5a4b-44d8-91be-46d173d90954 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/test-rico-fc9ea929-5a4b-44d8-91be-46d173d90954/lxc.conf: 
Try `lxc info --show-log test-rico-fc9ea929-5a4b-44d8-91be-46d173d90954` for more info

And the interesting logs are these:

lxc test-rico-fc9ea929-5a4b-44d8-91be-46d173d90954 20210615131528.154 ERROR    conf - conf.c:write_id_mapping:2917 - Invalid argument - Failed to write uid mapping to "/proc/9218/uid_map"
lxc test-rico-fc9ea929-5a4b-44d8-91be-46d173d90954 20210615131528.171 ERROR    conf - conf.c:lxc_map_ids:3092 - Failed to write mapping: 0 0 1
1 1000001 4999
5000 0 1
5001 1005001 999994999

lxc test-rico-fc9ea929-5a4b-44d8-91be-46d173d90954 20210615131528.171 ERROR    start - start.c:lxc_spawn:1783 - Failed to set up id mapping.
lxc test-rico-fc9ea929-5a4b-44d8-91be-46d173d90954 20210615131528.171 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:869 - Received container state "ABORTING" instead of "RUNNING"

Is a config like this possible?

Can you try without your raw.idmap?

Can you also show cat /proc/self/uid_map from within the parent container?

Yes, the idmap is the problematic config.

And this is the uid_map inside the parent container:

 cat /proc/self/uid_map
         0          0 4294967295
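
For readers unfamiliar with this file: per user_namespaces(7), each uid_map line holds three numbers, namely the first ID inside the namespace, the first ID outside it, and the length of the range. A single line of 0 0 4294967295 is what a process in the initial user namespace sees, which is consistent with the parent being a privileged container. A minimal sketch of reading such a map follows; the helper names parse_id_map and is_full_identity_map are hypothetical, not part of LXD or LXC.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (ns_start, host_start, count) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        ns_start, host_start, count = (int(field) for field in fields)
        entries.append((ns_start, host_start, count))
    return entries


def is_full_identity_map(entries):
    """True when the single entry maps ID 0 onto ID 0 across the full 32-bit range."""
    return entries == [(0, 0, 4294967295)]


# The output posted above, copied verbatim (column padding included).
sample = "         0          0 4294967295\n"
print(is_full_identity_map(parse_id_map(sample)))
```

On a live system the same function could be fed the contents of /proc/self/uid_map instead of the sample string.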

@brauner any ideas? Looks like it’s the kernel rejecting the map but as that’s done from within a privileged container, I’m confused as to why it would.

The information is not enough to go on. What distro, what kernel version is this? Can you please provide the output of lxc info?

Host config:

config:
  core.https_address: '[::]:8443'
  core.trust_password: true
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses:
  - 10.112.0.56:8443
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    -----END CERTIFICATE-----
  certificate_fingerprint: xxx
  driver: lxc | qemu
  driver_version: 4.0.9 | 5.2.0
  firewall: xtables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "false"
    seccomp_listener: "false"
    seccomp_listener_continue: "false"
    shiftfs: "false"
    uevent_injection: "false"
    unpriv_fscaps: "true"
  kernel_version: 4.15.0-143-generic
  lxc_features:
    cgroup2: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "false"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "18.04"
  project: default
  server: lxd
  server_clustered: false
  server_name: pcname
  server_pid: 2479
  server_version: "4.15"
  storage: dir
  storage_version: "1"

Privileged container config:

config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- resources_system
- usedby_consistency
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- storage_rsync_compression
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_state_vlan
- gpu_sriov
- migration_stateful
- disk_state_quota
- storage_ceph_features
- gpu_mig
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    -----END CERTIFICATE-----
  certificate_fingerprint: c5c078ec2c1be53d591b279e8e34bda650dba39d9a7b99f24ac7454c478a84f2
  driver: lxc | qemu
  driver_version: 4.0.9 | 5.2.0
  firewall: xtables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "false"
    seccomp_listener: "false"
    seccomp_listener_continue: "false"
    shiftfs: "false"
    uevent_injection: "false"
    unpriv_fscaps: "true"
  kernel_version: 4.15.0-143-generic
  lxc_features:
    cgroup2: "true"
    devpts_fd: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "20.04"
  project: default
  server: lxd
  server_clustered: false
  server_name: divine-hyena
  server_pid: 1042
  server_version: 4.0.6
  storage: dir
  storage_version: "1"

Please also show

lxc info <nested-container-name>

Sure:

lxc info test-rico-fc9ea929-5a4b-44d8-91be-46d173d90954
Name: test-rico-fc9ea929-5a4b-44d8-91be-46d173d90954
Location: none
Remote: unix://
Architecture: i686
Created: 2021/06/15 12:01 UTC
Status: Stopped
Type: container
Profiles: default

Oh sorry, I meant

lxc config show <nested-container-name>

Even better would be

lxc config show --expanded <nested-container> 

and ideally the lxc.conf from the container’s log folder in the lxd log directory.

The initial post from @l33tname shows the actual config and problem I think.

			"raw.idmap": "uid 0 0\ngid 0 0\nuid 0 5000\ngid 0 5000",

This shows a raw.idmap of:

uid 0 0
gid 0 0
uid 0 5000
gid 0 5000

Which is an incorrect map. You cannot map two container uids/gids to the same host ID; that's why the kernel rejects it.
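
A rough way to catch this before starting the container is to lint the raw.idmap value itself. The sketch below assumes the raw.idmap line format of type, host ID or range, container ID or range (for example both 1000 1000 or uid 50-60 500-510); the function names are hypothetical and this is not how LXD validates the key.

```python
from collections import defaultdict


def parse_range(token):
    """Turn '5000' or '50-60' into an inclusive (first, last) pair."""
    first, _, last = token.partition("-")
    return int(first), int(last or first)


def duplicate_host_ids(raw_idmap):
    """Return {(kind, host_id): [container ids]} for host IDs mapped more than once."""
    seen = defaultdict(list)
    for line in raw_idmap.splitlines():
        if not line.strip():
            continue
        kind, host, container = line.split()
        host_first, host_last = parse_range(host)
        container_first, _ = parse_range(container)
        kinds = ("uid", "gid") if kind == "both" else (kind,)
        for k in kinds:
            # Expanding ID by ID is fine for the small ranges typical of raw.idmap.
            for offset in range(host_last - host_first + 1):
                seen[(k, host_first + offset)].append(container_first + offset)
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


raw = "uid 0 0\ngid 0 0\nuid 0 5000\ngid 0 5000"
for (kind, host_id), container_ids in duplicate_host_ids(raw).items():
    print(f"host {kind} {host_id} is mapped to container {kind}s {container_ids}")
```

For the value shown above, this reports host uid 0 and host gid 0 as each being claimed by container IDs 0 and 5000.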

Yeah, that’s why I showed you on IRC. It would still be nice to see the config to figure out if there’s anything else that went wrong.

architecture: i686
config:
  image.architecture: i386
  image.build: "20210615_05:24"
  image.description: Debian buster i386 (20210615_05:24)
  image.distribution: debian
  image.name: debian-buster-i386-default-20210615_05:24
  image.os: debian
  image.release: buster
  image.serial: "20210615_05:24"
  image.variant: default
  limits.kernel.core: unlimited
  limits.kernel.memlock: unlimited
  limits.kernel.nice: "-20"
  limits.kernel.nofile: "65536"
  limits.kernel.rtprio: unlimited
  raw.idmap: |-
    uid 0 0
    gid 0 0
    uid 0 5000
    gid 0 5000
  raw.lxc: lxc.apparmor.profile=unconfined
  user.user-data: |
    #cloud-config
    apt_preserve_sources_list: true
    runcmd:
    - - systemctl
      - enable
      - net0-route-override.service
    - - systemctl
      - daemon-reload
    - - systemctl
      - start
      - net0-route-override.service
    users:
    - name: root
      shell: /bin/bash
      ssh-authorized-keys: []
      sudo:
      - ALL=(ALL) NOPASSWD:ALL
      uid: '5000'
    write_files:
    - content: |-
        [Unit]
        Description=net0 route override rule
        After=systemd-networkd.service
        Requires=systemd-networkd.service
        PartOf=systemd-networkd.service

        [Service]
        ExecStart=/bin/ip route del default via 162.132.242.252
        Type=oneshot
        [Install]
        WantedBy=multi-user.target
        Also=systemd-networkd.socket
      path: /lib/systemd/system/net0-route-override.service
  volatile.base_image: 9a1cca2266c0ae5f28b5e56f1bc2b69d2fd40b675edb1fdd9605a69efb786b0f
  volatile.eth0.host_name: vethf7c57199
  volatile.eth0.hwaddr: 00:16:3e:e2:9f:04
  volatile.eth10.host_name: veth8f96005d
  volatile.eth10.hwaddr: 00:16:3e:4c:ec:84
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":0,"Nsid":0,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1000001,"Nsid":1,"Maprange":4999},{"Isuid":true,"Isgid":false,"Hostid":0,"Nsid":5000,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1005001,"Nsid":5001,"Maprange":999994999},{"Isuid":false,"Isgid":true,"Hostid":0,"Nsid":0,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":1000001,"Nsid":1,"Maprange":4999},{"Isuid":false,"Isgid":true,"Hostid":0,"Nsid":5000,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":1005001,"Nsid":5001,"Maprange":999994999}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":0,"Nsid":0,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1000001,"Nsid":1,"Maprange":4999},{"Isuid":true,"Isgid":false,"Hostid":0,"Nsid":5000,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1005001,"Nsid":5001,"Maprange":999994999},{"Isuid":false,"Isgid":true,"Hostid":0,"Nsid":0,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":1000001,"Nsid":1,"Maprange":4999},{"Isuid":false,"Isgid":true,"Hostid":0,"Nsid":5000,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":1005001,"Nsid":5001,"Maprange":999994999}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":0,"Nsid":0,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1000001,"Nsid":1,"Maprange":4999},{"Isuid":true,"Isgid":false,"Hostid":0,"Nsid":5000,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1005001,"Nsid":5001,"Maprange":999994999},{"Isuid":false,"Isgid":true,"Hostid":0,"Nsid":0,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":1000001,"Nsid":1,"Maprange":4999},{"Isuid":false,"Isgid":true,"Hostid":0,"Nsid":5000,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":1005001,"Nsid":5001,"Maprange":999994999}]'
  volatile.last_state.power: STOPPED
  volatile.net0.host_name: vethb28fc644
  volatile.net0.hwaddr: 00:16:3e:e6:f5:1f
  volatile.uuid: 9bbb2ec2-57e6-4ace-a7dd-829bd04b1c78
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  eth10:
    name: eth10
    nictype: bridged
    parent: lxdbr0
    type: nic
  mount-results:
    path: results
    source: /tmp/pytest-of-root/pytest-0/test_rico_creates_logfiles0/results
    type: disk
  net0:
    name: net0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

Ah, you seem to be talking about an older version of this forum post. Thanks!

Yep, that’s the issue. You can’t map both container uid 5000 and container uid 0 to host uid 0. It needs to be isomorphic.
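
The same conclusion can be reached from the map in the error log. user_namespaces(7) states that the ranges in the lines of a uid_map may not overlap, and an invalid write fails with EINVAL, which matches the "Invalid argument" in the log. The following is a simplified sketch of such an overlap check applied to the four lines LXC tried to write; overlapping_entries is a hypothetical helper, not the kernel's actual code.

```python
def overlapping_entries(entries):
    """Return index pairs whose namespace-side or host-side ranges intersect.

    Each entry is (ns_start, host_start, count), as in a uid_map line.
    """
    def intersects(start_a, count_a, start_b, count_b):
        return start_a < start_b + count_b and start_b < start_a + count_a

    clashes = []
    for i, (ns_a, host_a, count_a) in enumerate(entries):
        for j in range(i + 1, len(entries)):
            ns_b, host_b, count_b = entries[j]
            if (intersects(ns_a, count_a, ns_b, count_b)
                    or intersects(host_a, count_a, host_b, count_b)):
                clashes.append((i, j))
    return clashes


# The mapping from the "Failed to write mapping" log entry.
failed_map = [
    (0, 0, 1),
    (1, 1000001, 4999),
    (5000, 0, 1),
    (5001, 1005001, 999994999),
]
print(overlapping_entries(failed_map))  # [(0, 2)]
```

Entries 0 and 2 clash because both point at host uid 0; removing either one leaves a map with no overlaps.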

Yeah, that makes sense. I add "raw.idmap": "uid 0 0\ngid 0 0" via the config, and I guess the other two entries mapping to 5000 are generated by LXD. Because it runs as root on some systems, it fails.

LXD doesn’t modify config keys for you. The config in post #14 of this thread (by l33tname) clearly shows that raw.idmap is set to map both 0 and 5000, which causes this issue.

I went through our code and found the issue. You were correct that LXD does not modify any keys.

Thanks for your help @stgraber and @brauner :tada:

Excellent :slight_smile:
