LXD >=3.19 on CentOS 7 (via COPR)

For a few years now I’ve been packaging LXD and the latest version of LXC as RPMs for Fedora and CentOS via COPR (ganto/lxc3). While building LXD on Fedora has never been a big problem, CentOS 7 is reaching an age where it becomes more and more of an issue (e.g. #5993). The LXD developers have always seemed interested in such reports and have always found a way to fix things as they popped up, but lxd-3.19 now introduces new filesystem and seccomp code that can no longer be built against a stock CentOS 7 build root.

I still found a way to build it by patching out support for MS_LAZYTIME and using newer kernel headers provided by the ELRepo repository, but I’m unsure about the usability of the resulting binaries. I tested them superficially on a fresh CentOS 7 and couldn’t find anything broken at first sight, at least for containers. This might be different for virtual machines because qemu on CentOS 7 doesn’t understand the generated VM configuration, but I haven’t looked closely into that yet. Generally, I’m unsure about how to continue with LXD on CentOS 7. I’m personally not using it, but it was once a much requested feature.

Are there (still) users who are using my COPR repository for CentOS 7? Would you mind if I simply remove support for CentOS 7 from my COPR?

On the other hand, are the developers (@stgraber, @brauner) still interested in getting “bug” reports for CentOS 7? How much does your code depend on the features “missing” from a stock CentOS 7 kernel? Is there any release that you would say is still safe to run on CentOS 7, while anything newer is likely broken? By the way, here is the info of an LXD installed via RPM and running on CentOS 7 (using the stock kernel):

# lxc info          
config:                              
  images.auto_update_interval: "0"
api_extensions:               
- storage_zfs_remove_snapshots              
- container_host_shutdown_timeout  
- container_stop_priority
- container_syscall_filtering   
- auth_pki                                                                                                             
- container_last_used_at                                                                                               
- etag                                                                                                                 
- patch                                                                                                                
- usb_devices                                                                                                          
- https_allowed_credentials                                                                                            
- image_compression_algorithm                                                                                          
- directory_manipulation                                                                                               
- container_cpu_time                                                                                                   
- storage_zfs_use_refquota                                                                                             
- storage_lvm_mount_options                                                                                            
- network                   
- profile_usedby              
- container_push                                                                                                       
- container_exec_recording
- certificate_update                                       
- container_exec_signal_handling
- gpu_devices                                                                                                                                                                                                                                 
- container_image_properties                               
- migration_progress                   
- id_map                                 
- network_firewall_filtering          
- network_routes                     
- storage                         
- file_delete                 
- file_append                               
- network_dhcp_expiry              
- storage_lvm_vg_rename  
- storage_lvm_thinpool_rename   
- network_vlan                                                                                                         
- image_create_aliases                                                                                                 
- container_stateless_copy                                                                                             
- container_only_migration                                                                                             
- storage_zfs_clone_copy                                                                                               
- unix_device_rename                                                                                                   
- storage_lvm_use_thinpool                                                                                             
- storage_rsync_bwlimit                                                                                                
- network_vxlan_interface                                                                                              
- storage_btrfs_mount_options                                                                                          
- entity_description                                                                                                   
- image_force_refresh       
- storage_lvm_lv_resizing     
- id_map_base                                                                                                          
- file_symlinks
- container_push_target                                    
- network_vlan_physical         
- storage_images_delete                                                                                                                                                                                                                       
- container_edit_metadata                                  
- container_snapshot_stateful_migration
- storage_driver_ceph                    
- storage_ceph_user_name              
- resource_limits                    
- storage_volatile_initial_source 
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs            
- resources                        
- kernel_limits          
- storage_api_volume_rename     
- macaroon_authentication                                                                                              
- network_sriov                                                                                                        
- console                                                                                                              
- restrict_devlxd                                                                                                      
- migration_pre_copy                                                                                                   
- infiniband                                                                                                           
- maas_network                                                                                                         
- devlxd_events                                                                                                        
- proxy                                                                                                                
- network_dhcp_gateway                                                                                                 
- file_get_symlink                                                                                                     
- network_leases            
- unix_device_hotplug         
- storage_api_local_volume_handling                                                                                    
- operation_description
- clustering                                               
- event_lifecycle               
- storage_api_remote_volume_handling                                                                                                                                                                                                          
- nvidia_runtime                                           
- container_mount_propagation          
- container_backup                       
- devlxd_images                       
- container_local_cross_pool_handling
- proxy_unix                      
- proxy_udp                   
- clustering_join                           
- proxy_tcp_udp_multi_port_handling
- network_state          
- proxy_unix_dac_properties     
- container_protection_delete                                                                                          
- unix_priv_drop                                                                                                       
- pprof_http                                                                                                           
- proxy_haproxy_protocol                                                                                               
- network_hwaddr                                                                                                       
- proxy_nat                                                                                                            
- network_nat_order                                                                                                    
- container_full                                                                                                       
- candid_authentication                                                                                                
- backup_compression                                                                                                   
- candid_config                                                                                                        
- nvidia_runtime_config     
- storage_api_volume_snapshots
- storage_unmapped                                                                                                     
- projects
- candid_config_key                                        
- network_vxlan_ttl             
- container_incremental_copy                                                                                                                                                                                                                  
- usb_optional_vendorid                                    
- snapshot_scheduling                  
- container_copy_project                 
- clustering_server_address           
- clustering_image_replication       
- container_protection_shift      
- snapshot_expiry             
- container_backup_override_pool            
- snapshot_expiry_creation         
- network_leases_location
- resources_cpu_socket          
- resources_gpu                                                                                                        
- resources_numa                                                                                                       
- kernel_features                                                                                                      
- id_map_current                                                                                                       
- event_location                                                                                                       
- storage_api_remote_volume_snapshots                                                                                  
- network_nat_address                                                                                                  
- container_nic_routes                                                                                                 
- rbac                                                                                                                 
- cluster_internal_copy                                                                                                
- seccomp_notify                                                                                                       
- lxc_features              
- container_nic_ipvlan        
- network_vlan_sriov                                                                                                   
- storage_cephfs
- container_nic_ipfilter                                   
- resources_v2                  
- container_exec_user_group_cwd                                                                                                                                                                                                               
- container_syscall_intercept                              
- container_disk_shift                 
- storage_shifted                        
- resources_infiniband                
- daemon_storage                     
- instances                       
- image_types                 
- resources_disk_sata                       
- clustering_roles                 
- images_expiry          
- resources_network_firmware    
- backup_compression_algorithm                                                                                         
- ceph_data_pool_name                                                                                                  
- container_syscall_intercept_mount                                                                                    
- compression_squashfs                                                                                                 
- container_raw_mount                                                                                                  
- container_nic_routed                                                                                                 
- container_syscall_intercept_mount_fuse                                                                               
- container_disk_ceph                                                                                                  
- virtual-machines                                                                                                     
- image_profiles                                                                                                       
- clustering_architecture                                                                                              
- resources_disk_id         
- storage_lvm_stripes         
- vm_boot_priority                                                                                                     
- unix_hotplug_devices
- api_filtering                                            
api_status: stable              
api_version: "1.0"                                                                                                                                                                                                                            
auth: trusted                                              
public: false                          
auth_methods:                            
- tls                                 
environment:                         
  addresses: []                   
  architectures:              
  - x86_64                                  
  - i686                           
  certificate: | [...]
  certificate_fingerprint: [...]
  driver: lxc                                                                                                                                                                                                                                 
  driver_version: 3.2.1                                    
  kernel: Linux                 
  kernel_architecture: x86_64                                                                                                                                                                                                                 
  kernel_features:                                         
    netnsid_getifaddrs: "false"        
    seccomp_listener: "false"            
    seccomp_listener_continue: "false"
    shiftfs: "false"                 
    uevent_injection: "false"     
    unpriv_fscaps: "true"     
  kernel_version: 3.10.0-1062.9.1.el7.x86_64
  lxc_features:                    
    cgroup2: "false"     
    mount_injection_file: "true"
    network_gateway_device_route: "true"                                                                               
    network_ipvlan: "true"                                                                                             
    network_l2proxy: "true"                                                                                            
    network_phys_macvlan_mtu: "true"                                                                                   
    network_veth_router: "true"                                                                                        
    seccomp_notify: "true"                                                                                             
  project: default                                                                                                     
  server: lxd                                                                                                          
  server_clustered: false                                                                                              
  server_name: localhost.localdomain                                                                                   
  server_pid: 2042                                                                                                     
  server_version: "3.20"    
  storage: dir                
  storage_version: "1"

Thanks a lot for your feedback.
Cheers, ganto

This looks quite fixable.
Rather than patching out MS_LAZYTIME you can probably get away with just defining it in the C part with:

#ifndef MS_LAZYTIME
#define MS_LAZYTIME     (1<<25)
#endif

This would be suitable for inclusion in LXD for sure.
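To show where such a fallback would sit relative to the system header, here is a minimal sketch in plain C. The helper name with_lazytime is hypothetical and only there for illustration; it is not taken from the LXD sources.

```c
#include <sys/mount.h>

/* Older headers (such as those in a stock CentOS 7 build root) may not
 * define MS_LAZYTIME, so fall back to the value suggested above. */
#ifndef MS_LAZYTIME
#define MS_LAZYTIME (1 << 25)
#endif

/* Hypothetical helper: add the lazytime bit to a set of mount flags.
 * It only builds the flag word; it does not call mount(2). */
unsigned long with_lazytime(unsigned long flags)
{
    return flags | MS_LAZYTIME;
}
```

With the define in place, the same source builds against both old and new headers, and code that uses the flag does not need to be patched out.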

The go-md2man is probably best kept as a patch. We’ve run into that problem in other environments too but we’d rather not mangle the code coming from external repositories.

For the net_test.go part, we could add a check to see whether the name can be resolved on the system and skip if it can’t, that would take care of this issue.
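The actual test lives in Go, so the following is only the same idea sketched in C: ask the system resolver whether a name resolves at all, and let the caller skip the check when it does not. The function name name_resolves is an assumption made for this sketch.

```c
#define _POSIX_C_SOURCE 200112L

#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns 1 if the system resolver can resolve `name`, 0 otherwise.
 * getaddrinfo() follows the host's resolver configuration, which
 * normally includes /etc/hosts. */
int name_resolves(const char *name)
{
    struct addrinfo hints = {0};
    struct addrinfo *res = NULL;

    if (name == NULL)
        return 0;

    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 results */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(name, NULL, &hints, &res) != 0)
        return 0;

    freeaddrinfo(res);
    return 1;
}
```

A test would call name_resolves("ip6-localhost") first and skip the corresponding case when it returns 0, instead of failing on systems whose hosts file lacks that entry.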

If you can confirm that such fixes work for you, then we’d be quite happy to include them.

Thanks a lot for your fast response. I guess you’re still interested in feedback from CentOS 7 then:

  • Regarding the MS_LAZYTIME, I already sent a PR (#6825)
  • The go-md2man issue was annoying to figure out, but now that I have the patch no problem
  • I’m not sure what’s going on with the localhost lookup in net_test.go. I tried to debug it for a moment. At first I was unsure whether LookupHost() even considers /etc/hosts entries, but I couldn’t find any related DNS requests, so I guess it does. The Fedora/CentOS hosts file properly defines localhost (although not ip6-localhost), but the corresponding test is still failing. I’ll open a GitHub issue with the details. Maybe you have some ideas.
  • My biggest concern is actually that lxd/main_checkfeature.go includes linux/kcmp.h. The kernel-headers-3.10.0 package in CentOS 7 doesn’t ship this header, so I was building against the kernel-lt-headers-4.4 package from ELRepo. I understand your concept of gracefully disabling features and code paths in LXD when the necessary kernel support is missing. On the other hand, I once learned that it’s a bad idea to build against the “wrong” kernel headers. Do I need to force people to use (at least) kernel-4.4 to run this LXD binary on CentOS 7, or is it still safe to build it like this and run it on a stock kernel, with the missing features then being properly detected?
  • As said before, there are also some incompatibilities between qemu in CentOS 7 (Fedora works fine :smiley:) and the default configuration used by LXD. I still have to investigate whether this can be mitigated by a customized profile or whether it needs some code adjustments too. I’ll try that in the coming weeks.

For the checkfeature bit, we may be able to do something around that using some related defines and ifdefs to only include the header if it’s likely to exist. In general building with more recent kernel headers should be fine.
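One way such a guard could look is sketched below. This is an assumption about the general technique, not the fix that ended up in LXD. HAVE_LINUX_KCMP_H and the SKETCH_ names are hypothetical. Note that KCMP_FILE is an enum member rather than a macro, so a plain #ifndef KCMP_FILE cannot detect it, and that __has_include is only available on newer compilers (GCC 5 and later, clang), which is why a build-system-provided define is also honoured.

```c
/* Use the compiler's header check when it exists; otherwise rely on
 * the build system passing -DHAVE_LINUX_KCMP_H (hypothetical name). */
#if !defined(HAVE_LINUX_KCMP_H) && defined(__has_include)
#  if __has_include(<linux/kcmp.h>)
#    define HAVE_LINUX_KCMP_H 1
#  endif
#endif

#ifdef HAVE_LINUX_KCMP_H
#include <linux/kcmp.h>
#define SKETCH_KCMP_FILE KCMP_FILE
#else
/* KCMP_FILE is the first member of enum kcmp_type in the kernel
 * headers, so its value is 0. */
#define SKETCH_KCMP_FILE 0
#endif

/* Hypothetical accessor so the rest of the code never touches the
 * header directly. */
int sketch_kcmp_file_type(void)
{
    return SKETCH_KCMP_FILE;
}
```

Either branch yields the same constant, so the code builds with or without the header.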

We build the snap using either 4.4 or 4.15 headers and then run the result on anything from the stock CentOS 7 kernel all the way to the latest development versions of various distributions.
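Building against newer headers and running on an older kernel works as long as each feature is probed at runtime. The following is a rough sketch of such a probe for kcmp(2), not LXD's actual check. The function name kernel_has_kcmp is made up for this example, and the literal 1 stands for KCMP_VM so that linux/kcmp.h is not needed.

```c
#define _GNU_SOURCE

#include <sys/syscall.h>
#include <unistd.h>

/* Returns 1 if kcmp(2) works on the running kernel, 0 otherwise.
 * Any failure (ENOSYS on kernels without the syscall, EPERM under a
 * seccomp filter, ...) is treated as "feature unavailable". */
int kernel_has_kcmp(void)
{
#ifdef SYS_kcmp
    pid_t self = getpid();

    /* Type 1 is KCMP_VM: compare this process's address space with
     * itself, which yields 0 (equal) when the syscall is usable. */
    long ret = syscall(SYS_kcmp, self, self, 1, 0UL, 0UL);

    return ret == 0;
#else
    /* Headers too old to know the syscall number: assume unavailable. */
    return 0;
#endif
}
```

The caller would then disable the dependent code path when this returns 0, rather than assuming the feature exists because the build headers declared it.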

Thanks for this reply. It makes me feel more comfortable building RPMs that can be used by the public.

In the meantime I figured out and reported the localhost test issue under #6842.

The kcmp.h header issue is reported under #6843.

Thanks a lot for your support. I now released the lxd-3.20 RPMs (including the patches added upstream to fix the CentOS 7 build) on COPR so that everyone can give it a try.

@ganto, thanks for your work!

Is the LXD COPR package available and solid on CentOS 8?

Would you recommend this LXD deployment method on a production server?

I’m inclined to use an RPM to deploy LXD on CentOS 8, rather than snap - just to have fewer moving parts.

But what do you think would be the more stable option for a production server for a web application (deployed via LXD and with Ubuntu running inside container)?

(as you can see I am not familiar with COPR)

To be honest, in a production setup I would run LXD on Ubuntu. That’s what the developers tested and developed it for.

I’m only using LXD on Fedora for quickly trying out stuff in different distribution containers. The CentOS RPMs are built because people asked for them and because I can. I’ve never thoroughly tested LXD on CentOS, nor can I guarantee that there will ever be an update in the future. As long as I’m using it personally and as long as my spare time permits, I’m willing to invest the effort to package it. But in the more than three years I’ve been doing this, no one has ever volunteered to help me with it. That’s why I’m happy with COPR and have always rejected requests to move it to the official Fedora or EPEL repositories (see ganto/copr-lxd #6 and ganto/copr-lxc3 #14).

Generally I can say that LXD is super stable on the several single-node setups where I’m using it (I’ve never tried the cluster mode). And whenever there were issues with new releases, upstream was super helpful in ironing out the rough edges. But regarding, for example, the brand new virtual machine support, I already found a lot of incompatibilities on CentOS because QEMU is simply too old and/or not compiled with all the features required by LXD. Containers (also rootless), however, should work fine.

If I were you, I’d use Ubuntu if you can or, if you’re strictly CentOS oriented, have a look at Podman with host directory volumes for persistent data.

So one thing that’s worth noting here is that while we indeed develop and run most of our CI on Ubuntu, we actually have about as many users (actually slightly more) on non-Ubuntu than on Ubuntu. The bulk of that is Chromebooks which run ChromeOS and use a variant of the Gentoo native package of LXD on a custom kernel, but we also have a non-negligible user base on Debian, Arch, CentOS and Gentoo.

Making sure LXD works with a variety of Linux kernels is quite important to us and we make sure that LXD detects available features and degrades cleanly as needed.
