Some Questions about LXD

Hi.

While testing LXD, I came up with a few questions.

  1. What does the security.idmap.isolated option mean in an operating environment?
    After setting this option and /etc/sub{u,g}id, restarting the LXD daemon, and creating or starting containers and virtual machines, I can see that each container and virtual machine has a different owner.
    Please explain the security advantages of each container having a different owner.

  2. What are the advantages of using shiftfs besides saving boot time on container startup?

  3. As far as I know, shiftfs is supported on kernel version 5 or higher. The host is RHEL 8.2 and the kernel version is 4.x. Is there a way to use shiftfs on CentOS?
    (Even after upgrading the CentOS kernel to the latest 5.6, modprobe shiftfs does not load shiftfs.)

  4. Please tell me the minimum kernel version needed to use LXD’s system call interception feature.
    When I ran lxc info on CentOS 7, it showed that seccomp is not supported, whereas on kernel version 5.x it was supported.

Thank you for your response.

  1. This prevents any cross-container attacks, typically DoS based on kuid/kgid
  2. It allows for volume/path sharing between isolated containers
  3. You would need to build it using the dkms version, shiftfs isn’t in the mainline kernel
  4. 5.3 or so I believe, possibly 5.4 for all the bits needed by mount interception
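
As a rough illustration of the setup described in question 1, the sketch below shows how the option might be enabled on one container and how the resulting map could be inspected. The container name c1 and both function names are hypothetical, the 65536-ID map size is assumed to be the default for isolated containers, and the range size in the usage note is only an example value.

```shell
#!/bin/sh
# Sketch only: function names are made up for this example.

# How many isolated containers fit in a subordinate ID range,
# assuming each one takes a 65536-ID map (assumed default size).
isolated_maps_in_range() {
    range_size="$1"
    echo $(( range_size / 65536 ))
}

# Give one container its own uid/gid map and show the map it ended up with.
enable_isolation() {
    container="$1"
    lxc config set "$container" security.idmap.isolated true
    # The new map is applied the next time the container starts.
    lxc restart "$container"
    # Columns: ID inside the container, ID on the host, length.
    lxc exec "$container" -- cat /proc/self/uid_map
}
```

Running enable_isolation c1 and then the same call for a second container should show two maps with different host-side ranges, which is what keeps one container's kuid/kgid from colliding with another's. isolated_maps_in_range 1000000000 estimates how many such maps an /etc/sub{u,g}id range of that size could hold.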

Thank you Stephane Graber.

Let me ask you some questions.

  1. You answered that shiftfs can share volume / path between isolated containers.
    Is this the same concept as volume sharing in Docker containers?
    Or does it mean that 2 containers can see the same volume at the same time, as with GPFS?
    I would like to do a test, but please give me an approximate procedure to follow.

  2. If you have any data on how to build shiftfs using the dkms version, please share it.

  3. Can I use NPIV (HBA virtualization) in LXD?

  4. I checked that mount interception works well through testing.
    What are some useful syscall interceptions?
    Please give me some guidelines to test useful interception.

Thank you

  • lxc storage volume create default my-volume
  • lxc storage volume attach default my-volume my-container1 blah /blah
  • lxc storage volume attach default my-volume my-container2 blah /blah
  1. https://github.com/toby63/shiftfs-dkms

  2. I’m not familiar with this, but I suspect it would require namespacing of device mapper and block devices so that they can dynamically show up in a container, which isn’t something the kernel supports at this point. We’re doing some work on block devices in containers through upstream kernel work on loopfs, which may set a precedent for allowing more block handling, but we’re likely years away from having much more available there.

  3. setxattr/mknod are used primarily to run Docker containers; the main use of mount interception is to allow mounting trusted block devices or to set up redirection of mounts over to FUSE.
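
The interceptions mentioned above correspond to per-container configuration keys. A minimal sketch for testing them, assuming a kernel and LXD that report seccomp_listener support in lxc info, and using ext4 purely as an example of a filesystem to allow (the function name is hypothetical):

```shell
#!/bin/sh
# Sketch only: turn on the syscall interceptions discussed above
# for a single container, then restart it so they take effect.
enable_intercepts() {
    container="$1"
    # mknod and setxattr: mostly useful for running Docker inside the container.
    lxc config set "$container" security.syscalls.intercept.mknod true
    lxc config set "$container" security.syscalls.intercept.setxattr true
    # mount: allow the container to mount trusted block devices.
    lxc config set "$container" security.syscalls.intercept.mount true
    # Example value: only ext4 mounts are handled on the container's behalf.
    lxc config set "$container" security.syscalls.intercept.mount.allowed ext4
    lxc restart "$container"
}
```

After enable_intercepts c1, a simple test is to pass a block device into the container and try to mount it from inside; only do this with devices you trust, since the mount is performed on the host's side.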

Thank you for your answer.

After enabling lxd’s shiftfs function, I checked that the created volume is shared by 2 containers.

However, after disabling the shiftfs function of lxd, I did the same test.
The result is that the created volume is shared by 2 containers, regardless of whether shiftfs is enabled or not.

I created a file on the shared volume from one container, and deleted a file that had been created from the other container. It worked normally.

Am I misunderstanding your words, or did I do the wrong test?

Shiftfs is supported from the kernel version as shown below.

image

If your two containers are isolated then LXD would have refused to attach your volume because the uid/gid range wouldn’t have any overlap.
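
To make the overlap point concrete, here is a small sketch of the arithmetic involved. It is only an illustration, not LXD's actual check; the function name and the example ranges are hypothetical.

```shell
#!/bin/sh
# Illustration only, not LXD's implementation.
# Succeeds when the host ID ranges [base1, base1+size1) and
# [base2, base2+size2) have at least one ID in common.
ranges_overlap() {
    base1="$1"; size1="$2"; base2="$3"; size2="$4"
    [ "$base1" -lt $(( base2 + size2 )) ] && [ "$base2" -lt $(( base1 + size1 )) ]
}

# Two isolated containers with back-to-back 65536-ID maps (example values).
if ranges_overlap 1000000 65536 1065536 65536; then
    echo "maps overlap"
else
    echo "maps do not overlap"
fi
```

With these example values the script prints "maps do not overlap": no host uid is valid in both containers, so a file owned by one container's root would belong to an unmapped owner in the other.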

Thanks stgraber.

First look at the picture below.

  1. When privileged mode is true for both containers, sharing the volume between them succeeds regardless of whether the shiftfs function is enabled.

  2. If privileged mode is false for both containers while the shiftfs function is enabled, an error occurs as shown in the figure above.

Please explain this to me.

Privileged containers do not use uid/gid maps, that’s what makes them privileged, so shiftfs is irrelevant in that case.

Unprivileged containers can be isolated, making them use separate maps and indeed failing as above.

So it’s all correct so far. Now in the unprivileged case, if your two containers were created on a system where shiftfs is enabled (check with lxc info | grep shiftfs), then the sharing would be allowed too.
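
A small helper for that check might look like the sketch below. The function name is hypothetical, and it assumes lxc info prints the feature as a shiftfs: "true" line under kernel_features.

```shell
#!/bin/sh
# Sketch only: read `lxc info` output on stdin and print the value of the
# shiftfs kernel feature (true or false), with the YAML quotes removed.
shiftfs_state() {
    awk '$1 == "shiftfs:" { gsub(/"/, "", $2); print $2 }'
}
```

Used as lxc info | shiftfs_state, it prints nothing at all if the server does not report the feature.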

Thanks Stgraber.

Because I may have remembered incorrectly, I tried sharing the volume again after deleting the containers and re-creating 2 containers with shiftfs enabled.

It failed with the same error.

What does lxc info show you?

Thanks stgraber for your answer.

root@u2:~# lxc info

config:
  core.https_address: 192.168.122.202
  core.trust_password: true
api_extensions:

  • storage_zfs_remove_snapshots
  • container_host_shutdown_timeout
  • container_stop_priority
  • container_syscall_filtering
  • auth_pki
  • container_last_used_at
  • etag
  • patch
  • usb_devices
  • https_allowed_credentials
  • image_compression_algorithm
  • directory_manipulation
  • container_cpu_time
  • storage_zfs_use_refquota
  • storage_lvm_mount_options
  • network
  • profile_usedby
  • container_push
  • container_exec_recording
  • certificate_update
  • container_exec_signal_handling
  • gpu_devices
  • container_image_properties
  • migration_progress
  • id_map
  • network_firewall_filtering
  • network_routes
  • storage
  • file_delete
  • file_append
  • network_dhcp_expiry
  • storage_lvm_vg_rename
  • storage_lvm_thinpool_rename
  • network_vlan
  • image_create_aliases
  • container_stateless_copy
  • container_only_migration
  • storage_zfs_clone_copy
  • unix_device_rename
  • storage_lvm_use_thinpool
  • storage_rsync_bwlimit
  • network_vxlan_interface
  • storage_btrfs_mount_options
  • entity_description
  • image_force_refresh
  • storage_lvm_lv_resizing
  • id_map_base
  • file_symlinks
  • container_push_target
  • network_vlan_physical
  • storage_images_delete
  • container_edit_metadata
  • container_snapshot_stateful_migration
  • storage_driver_ceph
  • storage_ceph_user_name
  • resource_limits
  • storage_volatile_initial_source
  • storage_ceph_force_osd_reuse
  • storage_block_filesystem_btrfs
  • resources
  • kernel_limits
  • storage_api_volume_rename
  • macaroon_authentication
  • network_sriov
  • console
  • restrict_devlxd
  • migration_pre_copy
  • infiniband
  • maas_network
  • devlxd_events
  • proxy
  • network_dhcp_gateway
  • file_get_symlink
  • network_leases
  • unix_device_hotplug
  • storage_api_local_volume_handling
  • operation_description
  • clustering
  • event_lifecycle
  • storage_api_remote_volume_handling
  • nvidia_runtime
  • container_mount_propagation
  • container_backup
  • devlxd_images
  • container_local_cross_pool_handling
  • proxy_unix
  • proxy_udp
  • clustering_join
  • proxy_tcp_udp_multi_port_handling
  • network_state
  • proxy_unix_dac_properties
  • container_protection_delete
  • unix_priv_drop
  • pprof_http
  • proxy_haproxy_protocol
  • network_hwaddr
  • proxy_nat
  • network_nat_order
  • container_full
  • candid_authentication
  • backup_compression
  • candid_config
  • nvidia_runtime_config
  • storage_api_volume_snapshots
  • storage_unmapped
  • projects
  • candid_config_key
  • network_vxlan_ttl
  • container_incremental_copy
  • usb_optional_vendorid
  • snapshot_scheduling
  • container_copy_project
  • clustering_server_address
  • clustering_image_replication
  • container_protection_shift
  • snapshot_expiry
  • container_backup_override_pool
  • snapshot_expiry_creation
  • network_leases_location
  • resources_cpu_socket
  • resources_gpu
  • resources_numa
  • kernel_features
  • id_map_current
  • event_location
  • storage_api_remote_volume_snapshots
  • network_nat_address
  • container_nic_routes
  • rbac
  • cluster_internal_copy
  • seccomp_notify
  • lxc_features
  • container_nic_ipvlan
  • network_vlan_sriov
  • storage_cephfs
  • container_nic_ipfilter
  • resources_v2
  • container_exec_user_group_cwd
  • container_syscall_intercept
  • container_disk_shift
  • storage_shifted
  • resources_infiniband
  • daemon_storage
  • instances
  • image_types
  • resources_disk_sata
  • clustering_roles
  • images_expiry
  • resources_network_firmware
  • backup_compression_algorithm
  • ceph_data_pool_name
  • container_syscall_intercept_mount
  • compression_squashfs
  • container_raw_mount
  • container_nic_routed
  • container_syscall_intercept_mount_fuse
  • container_disk_ceph
  • virtual-machines
  • image_profiles
  • clustering_architecture
  • resources_disk_id
  • storage_lvm_stripes
  • vm_boot_priority
  • unix_hotplug_devices
  • api_filtering
  • instance_nic_network
  • clustering_sizing
  • firewall_driver
  • projects_limits
  • container_syscall_intercept_hugetlbfs
  • limits_hugepages
  • container_nic_routed_gateway
  • projects_restrictions
  • custom_volume_snapshot_expiry
  • volume_snapshot_scheduling
  • trust_ca_certificates
  • snapshot_disk_usage
  • clustering_edit_roles
  • container_nic_routed_host_address
  • container_nic_ipvlan_gateway
  • resources_usb_pci
  • resources_cpu_threads_numa
  • resources_cpu_core_die
  • api_os
  • resources_system
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
  • tls
environment:
  addresses:
  • 192.168.122.202:8443
  architectures:
  • x86_64
  • i686
  certificate: |
      -----BEGIN CERTIFICATE-----
      MIIB+TCCAX6gAwIBAgIRAJC5QhpZUl2npCeIL7zbe84wCgYIKoZIzj0EAwMwMDEc
      MBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzEQMA4GA1UEAwwHcm9vdEB1MjAe
      Fw0yMDA1MDMxNjM0MzRaFw0zMDA1MDExNjM0MzRaMDAxHDAaBgNVBAoTE2xpbnV4
      Y29udGFpbmVycy5vcmcxEDAOBgNVBAMMB3Jvb3RAdTIwdjAQBgcqhkjOPQIBBgUr
      gQQAIgNiAATZAPVllv6/03DAV6o25Hknd248PRVCjnjdQL2mlL+vVuQs+7tJBlkr
      7M+IYa6mPUyrxQTwWiYtXiCr+c/LuaN3+1HsTvvAGmYGRG4KqEZI7nubxRFnI4eg
      6yhDSe/9kJejXDBaMA4GA1UdDwEB/wQEAwIFoDATBgNVHSUEDDAKBggrBgEFBQcD
      ATAMBgNVHRMBAf8EAjAAMCUGA1UdEQQeMByCAnUyhwR/AAABhxAAAAAAAAAAAAAA
      AAAAAAABMAoGCCqGSM49BAMDA2kAMGYCMQDz+0K6qQHkXWRxXYTzndT+RZeK1rv5
      YCyhu8isYsds6WauWeSzQ+Ki9InaMqs80icCMQDfAWReukzSXXryPh611f78vrZg
      FrMGIiT8VLgxgUt2G3nZNxHFLGUKRUq4jz2nTj8=
      -----END CERTIFICATE-----
  certificate_fingerprint: a77cc96d49549568cf133eeae7179a8772e18a202971c116fdbd76c2f5c7fb25
  driver: lxc
  driver_version: 4.0.2
  firewall: xtables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "true"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.4.0-29-generic
  lxc_features:
    cgroup2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    seccomp_notify: "true"
  os_name: Ubuntu
  os_version: "20.04"
  project: default
  server: lxd
  server_clustered: false
  server_name: u2
  server_pid: 1043
  server_version: 4.0.1
  storage: btrfs
  storage_version: 4.15.1

Oh, I forgot, you’ll need to set security.shifted=true on the volume (I think that’s the config option name).
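
Putting that together with the attach commands from earlier in the thread, the whole procedure could be sketched as follows. This assumes the volume already exists, that security.shifted is indeed the key name on custom volumes, and that shiftfs is reported as enabled; the function name is hypothetical, while the device name blah and the path /blah are taken from the earlier example.

```shell
#!/bin/sh
# Sketch only: mark an existing custom volume as shifted, then attach it
# to every container named on the command line.
share_volume() {
    pool="$1"
    volume="$2"
    shift 2
    # Let the volume be mounted through shiftfs so isolated maps can share it.
    lxc storage volume set "$pool" "$volume" security.shifted true
    for container in "$@"; do
        # "blah" is the device name and /blah the mount path, as in the thread.
        lxc storage volume attach "$pool" "$volume" "$container" blah /blah
    done
}
```

For the containers used above, that would be share_volume default my-volume my-container1 my-container2. If the volume is already attached to one of them, the attach for that container is expected to fail and can be skipped.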

Thank you very much Stgraber.

shared volume working fine^^