Some Questions about LXD

lbg74 · May 3, 2020, 6:49am

Hi.

While testing LXD, I came up with a few questions.

1. What does the security.idmap.isolated option mean in an operating environment?
   After setting this option and /etc/sub{u,g}id, restarting the LXD daemon, and creating or starting containers and virtual machines, I can see that each container and virtual machine has a different owner.
   Please explain the security advantages of each container having a different owner.
2. What are the advantages of using shiftfs besides saving boot time on container startup?
3. As far as I know, shiftfs is supported on kernel version 5 or higher. The host is RHEL 8.2 and its kernel version is 4.x. Is there a way to use shiftfs on CentOS?
   (Even after upgrading the CentOS kernel to the latest 5.6, modprobe shiftfs does not load shiftfs.)
4. Please tell me the minimum kernel version needed to use LXD’s system call interception feature.
   When I ran lxc info on CentOS 7, it showed that seccomp is not supported, while on kernel version 5.x it was supported.

Thank you for your response.

stgraber · May 3, 2020, 11:07pm

1. This prevents any cross-container attacks, typically DoS based on kuid/kgid.
2. It allows for volume/path sharing between isolated containers.
3. You would need to build it using the dkms version; shiftfs isn’t in the mainline kernel.
4. 5.3 or so I believe, possibly 5.4 for all the bits needed by mount interception.
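
For anyone wanting a quick pre-check before trying syscall interception, here is a rough sketch that compares a kernel release string against a minimum major.minor version. The function name kernel_at_least is made up for illustration, and the 5.3 threshold is only the estimate given above, not a documented minimum. The seccomp_listener entry under kernel_features in lxc info is likely the more reliable indicator.

```shell
# Hypothetical helper: succeeds if a kernel release string such as
# "5.4.0-29-generic" is at least WANT_MAJOR.WANT_MINOR.
kernel_at_least() {
    release=$1
    want_major=$2
    want_minor=$3
    major=${release%%.*}        # text before the first dot
    rest=${release#*.}          # text after the first dot
    minor=${rest%%.*}           # text before the next dot
    minor=${minor%%[!0-9]*}     # drop any non-numeric suffix
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# 5.3 is the estimate from this thread, treated here as an assumption.
if kernel_at_least "$(uname -r)" 5 3; then
    echo "kernel is probably new enough for syscall interception"
else
    echo "kernel is probably too old for syscall interception"
fi
```

A version comparison like this only says the kernel is new enough in principle; a distribution kernel can still be built without the needed features.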

lbg74 · May 4, 2020, 12:15am

Thank you, Stephane Graber.

Let me ask some follow-up questions.

1. You answered that shiftfs allows volume/path sharing between isolated containers.
   Is this the same concept as volume sharing in Docker containers,
   or does it mean that two containers can see the same volume at the same time, like GPFS?
   I would like to test this, so please give me an approximate procedure to follow.
2. If you have any information on how to build shiftfs using the dkms version, please share it.
3. Can I use NPIV (HBA virtualization) in LXD?
4. I confirmed through testing that mount interception works well.
   What are some useful syscall interceptions?
   Please give me some guidelines for testing useful interceptions.

Thank you

stgraber · May 4, 2020, 1:46pm

1. lxc storage volume create default my-volume
   lxc storage volume attach default my-volume my-container1 blah /blah
   lxc storage volume attach default my-volume my-container2 blah /blah
2. https://github.com/toby63/shiftfs-dkms
3. I’m not familiar with this, but I suspect it would require namespacing of device mapper and block devices to be able to dynamically show up in a container, which isn’t a thing the kernel supports at this point. We’re doing some work on block devices in containers through upstream kernel work on loopfs at this point, which may set a precedent to allow more block handling, but we’re likely years away from having much more available there.
4. setxattr/mknod are used primarily to run Docker containers; the main use of mount interception is to allow mounting trusted block devices or to set up redirection of mount over to FUSE.
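
As a starting point for testing the interceptions mentioned above, the sketch below prints the lxc config set commands for one container instead of running them, so they can be reviewed first. The function name intercept_commands and the container name are placeholders. The security.syscalls.intercept.* key names are not quoted in this thread and are given here as my understanding of the LXD configuration keys, so check them against the documentation for your LXD version.

```shell
# Hypothetical helper: print (do not run) the commands that would turn on
# syscall interception for one container.
intercept_commands() {
    container=$1
    # Key names are an assumption; verify them for your LXD version.
    for syscall in mknod setxattr mount; do
        echo "lxc config set $container security.syscalls.intercept.$syscall true"
    done
}

# my-container1 is the container name used earlier in this thread.
intercept_commands my-container1
```

Printing the commands keeps the sketch safe to run on any machine. To apply them, pipe the output to sh on the LXD host.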

lbg74 · May 4, 2020, 2:37pm

Thank you for your answer.

After enabling LXD’s shiftfs support, I confirmed that the created volume is shared by the two containers.

However, I then ran the same test after disabling LXD’s shiftfs support.
The result is that the created volume is shared by the two containers regardless of whether shiftfs is enabled.

I created a file on the shared volume in one container and deleted it from the other container. It worked normally.

Am I misunderstanding your answer, or did I run the wrong test?

Shiftfs is supported from the kernel version as shown below.

stgraber · May 5, 2020, 1:01am

If your two containers are isolated then LXD would have refused to attach your volume because the uid/gid range wouldn’t have any overlap.
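
To make the overlap condition concrete, here is a small sketch that tests whether two uid/gid ranges share any ids. The function name ranges_overlap and the base/size numbers are invented for illustration. On a real system the ranges come from each container's idmap (I believe volatile.idmap.base holds the base, but treat that as an assumption).

```shell
# Hypothetical helper: succeeds if the half-open ranges
# [base_a, base_a + size_a) and [base_b, base_b + size_b) intersect.
ranges_overlap() {
    base_a=$1
    size_a=$2
    base_b=$3
    size_b=$4
    [ "$base_a" -lt $((base_b + size_b)) ] && [ "$base_b" -lt $((base_a + size_a)) ]
}

# Example values only: two isolated containers with back-to-back 65536-id ranges.
if ranges_overlap 1000000 65536 1065536 65536; then
    echo "ranges overlap"
else
    echo "no overlap"
fi
```

With the example values the second range starts exactly where the first one ends, so the script reports no overlap, which is the situation in which the attach is refused.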

lbg74 · May 5, 2020, 6:31am

Thanks stgraber.

First look at the picture below.

When both containers have privileged mode set to true, sharing the volume between them succeeds regardless of whether shiftfs is enabled.
When both containers have privileged mode set to false and shiftfs is enabled, an error occurs, as shown in the figure above.

Please explain this to me.

stgraber · May 5, 2020, 1:42pm

Privileged containers do not use uid/gid maps, that’s what makes them privileged, so shiftfs is irrelevant in that case.

Unprivileged containers can be isolated, making them use separate maps and indeed failing as above.

So it’s all correct so far. Now, in the unprivileged case, if your two containers were created on a system where shiftfs is enabled (check with lxc info | grep shiftfs), then the sharing would be allowed too.
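
A slightly stricter variant of the grep check, as a sketch: it reads lxc info output on stdin and succeeds only if the shiftfs kernel feature is reported as true. The function name shiftfs_enabled is made up for this example.

```shell
# Hypothetical helper: reads `lxc info` output on stdin and succeeds only if
# a line of the form   shiftfs: "true"   is present.
shiftfs_enabled() {
    grep -Eq '^[[:space:]]*shiftfs:[[:space:]]*"?true"?[[:space:]]*$'
}

# Only query LXD when the lxc client is actually installed.
if command -v lxc >/dev/null 2>&1; then
    if lxc info | shiftfs_enabled; then
        echo "shiftfs is enabled"
    else
        echo "shiftfs is not enabled"
    fi
fi
```

Matching the whole line avoids false positives from other lines that merely mention shiftfs, and the exit status makes it usable in scripts.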

lbg74 · May 5, 2020, 2:04pm

Thanks Stgraber.

Because I could not remember exactly how the containers had been created, I deleted them, re-created two containers with shiftfs enabled, and tried to share the volume again.

It failed with the same error.

stgraber · May 5, 2020, 10:55pm

What does lxc info show you?

lbg74 · May 6, 2020, 12:10am

Thanks stgraber for your answer.

root@u2:~# lxc info

config:
  core.https_address: 192.168.122.202
  core.trust_password: true
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- resources_system
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses:
  - 192.168.122.202:8443
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIIB+TCCAX6gAwIBAgIRAJC5QhpZUl2npCeIL7zbe84wCgYIKoZIzj0EAwMwMDEc
    MBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzEQMA4GA1UEAwwHcm9vdEB1MjAe
    Fw0yMDA1MDMxNjM0MzRaFw0zMDA1MDExNjM0MzRaMDAxHDAaBgNVBAoTE2xpbnV4
    Y29udGFpbmVycy5vcmcxEDAOBgNVBAMMB3Jvb3RAdTIwdjAQBgcqhkjOPQIBBgUr
    gQQAIgNiAATZAPVllv6/03DAV6o25Hknd248PRVCjnjdQL2mlL+vVuQs+7tJBlkr
    7M+IYa6mPUyrxQTwWiYtXiCr+c/LuaN3+1HsTvvAGmYGRG4KqEZI7nubxRFnI4eg
    6yhDSe/9kJejXDBaMA4GA1UdDwEB/wQEAwIFoDATBgNVHSUEDDAKBggrBgEFBQcD
    ATAMBgNVHRMBAf8EAjAAMCUGA1UdEQQeMByCAnUyhwR/AAABhxAAAAAAAAAAAAAA
    AAAAAAABMAoGCCqGSM49BAMDA2kAMGYCMQDz+0K6qQHkXWRxXYTzndT+RZeK1rv5
    YCyhu8isYsds6WauWeSzQ+Ki9InaMqs80icCMQDfAWReukzSXXryPh611f78vrZg
    FrMGIiT8VLgxgUt2G3nZNxHFLGUKRUq4jz2nTj8=
    -----END CERTIFICATE-----
  certificate_fingerprint: a77cc96d49549568cf133eeae7179a8772e18a202971c116fdbd76c2f5c7fb25
  driver: lxc
  driver_version: 4.0.2
  firewall: xtables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "true"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.4.0-29-generic
  lxc_features:
    cgroup2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    seccomp_notify: "true"
  os_name: Ubuntu
  os_version: "20.04"
  project: default
  server: lxd
  server_clustered: false
  server_name: u2
  server_pid: 1043
  server_version: 4.0.1
  storage: btrfs
  storage_version: 4.15.1

stgraber · May 6, 2020, 12:16am

Oh, I forgot, you’ll need to set security.shifted=true on the volume (I think that’s the config option name).
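
Putting the pieces of this thread together, the sequence might look like the sketch below. It prints the commands rather than running them. The function name shared_volume_commands is invented, the pool, volume, container, device and path names are the ones used earlier in the thread, and the security.shifted key is taken from the post above. Setting the key before attaching is my assumption about the safest order.

```shell
# Hypothetical helper: print (do not run) the commands to mark a custom volume
# as shifted and attach it to each listed container.
shared_volume_commands() {
    pool=$1
    volume=$2
    shift 2
    # Assumption: set the key before the volume is attached anywhere.
    echo "lxc storage volume set $pool $volume security.shifted true"
    for container in "$@"; do
        echo "lxc storage volume attach $pool $volume $container blah /blah"
    done
}

shared_volume_commands default my-volume my-container1 my-container2
```

Review the printed commands, then pipe them to sh on the LXD host if they look right.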

lbg74 · May 6, 2020, 12:37am

Thank you very much Stgraber.

shared volume working fine^^