LXD daemon high CPU usage processing inotify watch on /dev/net/tun

Can you also look in /var/snap/lxd/common/lxd/logs/lxd.log for a message like "Failed to initialize fanotify, falling back on fsnotify"?

Because if you can make your system offer fanotify then you won’t have the same issue, as it doesn’t look for MODIFY events.

It’s Debian 11.5 (bullseye), kernel 5.10.0-19-amd64.

lxc info:

config:
  cluster.https_address: qbs01:8443
  core.https_address: qbs01:8443
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses:
  - qbs01:8443
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIICADCCAYegAwIBAgIRAKsjuP4UkKOM3yDYb//yNQkwCgYIKoZIzj0EAwMwMzEc
    MBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzETMBEGA1UEAwwKcm9vdEBxYnMw
    MTAeFw0yMjExMDMwODE5NDFaFw0zMjEwMzEwODE5NDFaMDMxHDAaBgNVBAoTE2xp
    bnV4Y29udGFpbmVycy5vcmcxEzARBgNVBAMMCnJvb3RAcWJzMDEwdjAQBgcqhkjO
    PQIBBgUrgQQAIgNiAAQxEUeY2Yyj4kPxZ8Lo95JhDE2DR+SHgx37o+S4UIKvlINV
    hfUdL3PH7pFOow2Gn5LC3R3tM9e3AwnaB1ahAF+yJIUDVh1TFuApFc3JtXeShXTL
    3oGunVbwYhaHIjcSSLWjXzBdMA4GA1UdDwEB/wQEAwIFoDATBgNVHSUEDDAKBggr
    BgEFBQcDATAMBgNVHRMBAf8EAjAAMCgGA1UdEQQhMB+CBXFiczAxhwR/AAABhxAA
    AAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMDA2cAMGQCMDbVCeEsIQAWpX7GCFIe
    IZ+840r/vapLX/WTZzj1+FaC4+dhDMOLa/4gce5ieQgtiwIwEW0RmGh8x3JaiPMT
    MznMy4s+bS6HSuwDWicVLY83jUtJZsMg8TWeN3EyI1fh2XuZ
    -----END CERTIFICATE-----
  certificate_fingerprint: 84f825d06b5db6935feab6edfdd8ccd1633839c0ed90cd30946459837b7a427e
  driver: lxc | qemu
  driver_version: 5.0.1 | 7.1.0
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "false"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.10.0-19-amd64
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Debian GNU/Linux
  os_version: "11"
  project: default
  server: lxd
  server_clustered: false
  server_event_mode: full-mesh
  server_name: qbs01
  server_pid: 4024973
  server_version: "5.7"
  storage: zfs
  storage_version: 2.0.3-9
  storage_supported_drivers:
  - name: btrfs
    version: 5.4.1
    remote: false
  - name: ceph
    version: 15.2.16
    remote: true
  - name: cephfs
    version: 15.2.16
    remote: true
  - name: cephobject
    version: 15.2.16
    remote: true
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.43.0
    remote: false
  - name: zfs
    version: 2.0.3-9
    remote: false

Indeed, there is an error message in the log:

$ sudo fgrep notify /var/snap/lxd/common/lxd/logs/lxd.log
time="2022-11-23T14:33:41+01:00" level=warning msg="Failed to initialize fanotify, falling back on fsnotify" err="Failed to watch directory \"/dev\": no such device"

But fanotify seems to be compiled into the kernel, so why is LXD complaining?

$ sudo cat /boot/config-$(uname -r) | grep FANOTIFY
CONFIG_FANOTIFY=y
CONFIG_FANOTIFY_ACCESS_PERMISSIONS=y

If /dev doesn’t exist, falling back to fsnotify will probably not help, will it?

I can recreate the problem inside a LXD VM using images:debian/11, and /dev does exist.

The error doesn’t mean /dev doesn’t exist; syscalls sometimes return error codes for things like “not found” that read as meaning something different when taken out of context.

FYI, I’m using Ubuntu on the host and in the containers and I used to get that error too:

root@log:~# grep 'Failed to initialize fanotify' /var/log/archives/2022/2022-*syslog
/var/log/archives/2022/2022-01-syslog:Jan  4 20:53:28 ocelot lxd.daemon[1603]: t=2022-01-04T20:53:28+0000 lvl=warn msg="Failed to initialize fanotify, falling back on fsnotify" err="Failed to watch directory \"/dev\": no such device"
/var/log/archives/2022/2022-01-syslog:Jan  4 22:45:09 ocelot lxd.daemon[1623]: t=2022-01-04T22:45:09+0000 lvl=warn msg="Failed to initialize fanotify, falling back on fsnotify" err="Failed to watch directory \"/dev\": no such device"
/var/log/archives/2022/2022-01-syslog:Jan  6 22:28:45 ocelot lxd.daemon[1626]: t=2022-01-06T22:28:45+0000 lvl=warn msg="Failed to initialize fanotify, falling back on fsnotify" err="Failed to watch directory \"/dev\": no such device"
/var/log/archives/2022/2022-01-syslog:Jan  9 20:05:47 xeon lxd.daemon[3892352]: t=2022-01-09T20:05:47+0000 lvl=warn msg="Failed to initialize fanotify, falling back on fsnotify" err="Failed to watch directory \"/dev\": no such device"
/var/log/archives/2022/2022-01-syslog:Jan  9 20:54:49 c2d lxd.daemon[541879]: t=2022-01-09T20:54:49+0000 lvl=warn msg="Failed to initialize fanotify, falling back on fsnotify" err="Failed to watch directory \"/dev\": no such device"
/var/log/archives/2022/2022-01-syslog:Jan  9 20:54:24 ocelot lxd.daemon[3789341]: t=2022-01-09T20:54:24+0000 lvl=warn msg="Failed to initialize fanotify, falling back on fsnotify" err="Failed to watch directory \"/dev\": no such device"
/var/log/archives/2022/2022-01-syslog:Jan 11 16:11:09 ocelot lxd.daemon[1613]: t=2022-01-11T16:11:09+0000 lvl=warn msg="Failed to initialize fanotify, falling back on fsnotify" err="Failed to watch directory \"/dev\": no such device"
/var/log/archives/2022/2022-01-syslog:Jan 13 17:42:13 ocelot lxd.daemon[1633]: t=2022-01-13T17:42:13+0000 lvl=warn msg="Failed to initialize fanotify, falling back on fsnotify" err="Failed to watch directory \"/dev\": no such device"

Those are from January and I was probably using 20.04 at that time, while I’m now using 22.04.

Yeah, so it seems like a kernel thing then. fanotify isn’t new, but LXD uses FAN_REPORT_DFID_NAME, which only became available in Linux 5.9. That explains Focal then.

https://man7.org/linux/man-pages/man2/fanotify_init.2.html

But not Debian, which is using 5.10, unless it didn’t get enabled/included.

Back in January, I was running 20.04 with the then-current HWE kernel 5.11.0 :confused:

I don’t know; it looks like we need something more recent.

@monstermunchkin may be able to shed some light on this as he added that functionality.

I’ll look into this. I don’t remember why FAN_REPORT_DFID_NAME had to be used.

Most likely ENODEV comes from here:

It’s only required when one of the FAN_REPORT_FID | FAN_REPORT_DFID_NAME bits is set.

It means that the filesystem mounted on /dev does not support exportfs operations (the open_by_handle_at/name_to_handle_at syscalls).

@Aleks, @sdeziel, could you show the output of mount | grep "/dev" from your systems?

Yeah, I was thinking that on IRC last night with @stgraber, as /dev is devtmpfs.

On my system /dev is also devtmpfs, but fanotify works fine.

The reason is that I have this patch in my kernel:
https://lore.kernel.org/all/20210322173944.449469-3-amir73il@gmail.com/

It enables shmem/tmpfs/devtmpfs filesystems to report statfs f_fsid appropriately.

Right, so that is why we need a recent kernel to support fanotify, thanks.

In your small test program above, what PATH were you using?

LXD appears to only add watches on directories, so /dev/net and not directly on /dev/net/tun.

So I’m wondering why a MODIFY watch on a directory also receives events when files inside that directory are written to.

Were you watching /dev/net/tun directly?

Can you try to recreate the issue using LXD rather than your own test program, please?
I’ve not got OpenVPN set up, but I have just tried writing to a block device under /dev with fsnotify and do not see the same issue.

Were you watching /dev/net/tun directly?

I’ve tried different options: /dev/net and /dev/net/tun. The same result.

Edit: /dev is not working with my simple program

Ah, I have logged the events and can see writes to /dev/tty now. So yes, I see the problem.
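
For reference, the behaviour is easy to reproduce with a minimal program. This is a sketch, not the actual test program from this thread: it uses Go's stdlib syscall package (Linux-only), puts a single inotify watch on a temporary directory, then writes to a file inside it and dumps the events the directory watch reports:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"syscall"
	"unsafe"
)

// demo watches a temporary directory (one inotify watch, on the
// directory itself), writes to a file inside it, and returns the
// events reported against that directory watch.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "inotify-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	fd, err := syscall.InotifyInit()
	if err != nil {
		return nil, err
	}
	defer syscall.Close(fd)

	// Watch the directory, not the file -- as LXD's fsnotify fallback does.
	if _, err := syscall.InotifyAddWatch(fd, dir, syscall.IN_CREATE|syscall.IN_MODIFY); err != nil {
		return nil, err
	}

	// Create and then modify a file inside the watched directory.
	name := filepath.Join(dir, "tun")
	if err := os.WriteFile(name, []byte("x"), 0o600); err != nil {
		return nil, err
	}
	if err := os.WriteFile(name, []byte("y"), 0o600); err != nil {
		return nil, err
	}

	// Read the queued events; they all arrive on the directory's watch
	// descriptor, with the file's name attached to each event.
	buf := make([]byte, 4096)
	n, err := syscall.Read(fd, buf)
	if err != nil {
		return nil, err
	}

	var events []string
	for off := 0; off < n; {
		ev := (*syscall.InotifyEvent)(unsafe.Pointer(&buf[off]))
		raw := buf[off+syscall.SizeofInotifyEvent : off+syscall.SizeofInotifyEvent+int(ev.Len)]
		evName := strings.TrimRight(string(raw), "\x00")
		if ev.Mask&syscall.IN_CREATE != 0 {
			events = append(events, "CREATE "+evName)
		}
		if ev.Mask&syscall.IN_MODIFY != 0 {
			events = append(events, "MODIFY "+evName)
		}
		off += syscall.SizeofInotifyEvent + int(ev.Len)
	}
	return events, nil
}

func main() {
	events, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, e := range events {
		fmt.Println(e)
	}
}
```

Even though only the directory is watched, this prints a CREATE event and one or more MODIFY events naming the file, which is exactly why writes to /dev/tty and /dev/net/tun show up on LXD's directory watches.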

So I’m wondering why a MODIFY watch on a directory also receives events when files inside that directory are written to.

As far as I understand, if an inotify watch is set on a directory, it receives events for all files/directories inside that directory, but only one level deep. So if the watch is set on /dev it will not see /dev/net/tun, but we are recursively adding watches for nested directories:
https://github.com/lxc/lxd/blob/master/lxd/fsmonitor/drivers/driver_fsnotify.go#L70