Setting up IncusOS cluster with a separate internal network

I am trying to configure an IncusOS cluster where the internal cluster address (`cluster.https_address`) is on a different network than the client-facing API address (`core.https_address`). I am following this tutorial, but I am running into x509 certificate validation problems. All servers are configured with one public IPv4 network (10.0.0.0/24) that is used by Incus clients and another private network (fd00:10::/64) that I want to use for internal cluster communication. (The server names and IP addresses have been modified.)

% incus cluster join my-cluster: server2:
What IP address or DNS name should be used to reach this server? [default=10.0.0.12]: fd00:10::12
What member name should be used to identify this server in the cluster? [default=4c4c4544-0043-5410-8033-c8c04f503034]: server2
All existing data is lost when joining a cluster, continue? (yes/no) [default=no] yes
Error connecting to existing cluster member "[fd00:10::11]:8443": Get "https://[fd00:10::11]:8443": Unable to connect to: [fd00:10::11]:8443 ([dial tcp [fd00:10::11]:8443: i/o timeout])
Error: Failed to join cluster: Failed to setup cluster trust: Failed to add server cert to cluster: Post "https://[fd00:10::11]:8443/1.0/certificates": tls: failed to verify certificate: x509: cannot validate certificate for fd00:10::11 because it doesn't contain any IP SANs

How do I resolve this problem? Do I need to provide certificates for the internal addresses with the installation seed?
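In case a custom certificate would help here, this is how I would generate one whose IP SANs cover both networks (purely a sketch with the addresses from my setup; I don't know whether the IncusOS seed actually accepts a custom server certificate):

```shell
# Self-signed server certificate with IP SANs for both the public and
# the internal address (requires OpenSSL 1.1.1+ for -addext):
openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
  -keyout server.key -out server.crt \
  -subj "/CN=server1" \
  -addext "subjectAltName=IP:10.0.0.11,IP:fd00:10::11"
```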

Initially, I tried to create a cluster using Operations Center, but this also fails when I use a network with role `cluster` that is different from the network with role `management`. I am not sure how else to specify which network should be used for `cluster.https_address`.

Since I cannot change `cluster.https_address` after the cluster is created, I need to provide the final settings during cluster creation.

So, on my first node server1 I changed `core.https_address` from the default value `:8443` to the node's IP, 10.0.0.11:8443, and now I get a different error when trying to join the cluster:

Error: Failed to join cluster: Failed to setup cluster trust: Failed to add server cert to cluster: Post "https://[fd00:10::11]:8443/1.0/certificates": Unable to connect to: [fd00:10::11]:8443 ([dial tcp [fd00:10::11]:8443: connect: connection refused])

OK, I restarted the incus application on the first node:

incus admin os application restart incus

and I now get the original error message:

tls: failed to verify certificate: x509: cannot validate certificate for fd00:10::11 because it doesn't contain any IP SANs

Can you run incus cluster list my-cluster:?

I’ve seen that join error before when the CLI doesn’t have a direct route to the joining server’s address, but it didn’t actually prevent the join for me.

I see. Does this mean that `cluster.https_address` of server1 must be reachable from the client from which I run the incus commands? In my case, the fd00:10::/64 network is completely isolated. (It is managed by a switch without external internet connectivity.)
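For reference, this is how I checked reachability from the client (addresses are the ones from my setup; any HTTP status code means the port answered, while a curl failure means the address is unreachable):

```shell
# Check from the client which of the two addresses answers on port 8443.
# curl prints the HTTP status code on success; on a timeout or refused
# connection it exits non-zero and the fallback message is printed.
for addr in "10.0.0.11:8443" "[fd00:10::11]:8443"; do
    curl -ks -m 5 -o /dev/null -w "%{http_code}\n" "https://$addr/1.0" \
        || echo "unreachable: $addr"
done
```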

I already wiped my cluster. I repeated the installation from scratch and I think I managed to get the cluster formed, despite the final error message:

% incus config set server1: cluster.https_address=[fd00:10::11]:8443 # internal network
% incus cluster enable server1: server1
Clustering enabled
% incus remote add my-cluster 10.0.0.11:8443 # use address reachable from the client
Certificate fingerprint: 198b620cb1b2f3b6aae5085c9e83bd8204ca110ab55091b9e496d55c32514866
ok (y/n/[fingerprint])? y
% incus remote rm server1
% incus cluster list my-cluster:
+---------+----------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| NAME    |            URL             |      ROLES      | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION | STATUS |      MESSAGE      |
+---------+----------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| server1 | https://[fd00:10::11]:8443 | database-leader | x86_64       | default        |             | ONLINE | Fully operational |
|         |                            | database        |              |                |             |        |                   |
+---------+----------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
% incus cluster join my-cluster: server2:
What IP address or DNS name should be used to reach this server? [default=10.0.0.12]: fd00:10::12 # !! This will be set as `cluster.https_address` of `server2` !! 
What member name should be used to identify this server in the cluster? [default=4c4c4544-0044-5410-8033-b2c04f503034]: server2
All existing data is lost when joining a cluster, continue? (yes/no) [default=no] yes
Error connecting to existing cluster member "[fd00:10::11]:8443": Get "https://[fd00:10::11]:8443": Unable to connect to: [fd00:10::11]:8443 ([dial tcp [fd00:10::11]:8443: i/o timeout])
% incus cluster list my-cluster:
+---------+----------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| NAME    |            URL             |      ROLES       | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION | STATUS |      MESSAGE      |
+---------+----------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| server1 | https://[fd00:10::11]:8443 | database-leader  | x86_64       | default        |             | ONLINE | Fully operational |
|         |                            | database         |              |                |             |        |                   |
+---------+----------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| server2 | https://[fd00:10::12]:8443 | database-standby | x86_64       | default        |             | ONLINE | Fully operational |
+---------+----------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
% incus cluster join my-cluster: server3:
# ...

Despite the error, the cluster appears to be operational: I was able to launch containers after adding the local ZFS pool and the bridge network.

Last time, I set `cluster.https_address` on server2 directly before joining the cluster, which probably caused the certificate problems.

Would it still be possible to make `incus cluster join` work without errors when `cluster.https_address` is not reachable from the incus client? Also, I guess setting up a cluster from Operations Center fails because this address is not reachable?

Yeah, that lines up with what I’ve seen before.

Basically, the CLI attempts to confirm the cluster is functional at the end, but it can’t connect to the new server anymore because its certificate has changed to the cluster-wide one.

But that happens after everything else has succeeded, so the cluster is still perfectly fine.

We don’t include any names or IP addresses in our certificates; we perform exact certificate matching instead and ignore all fields. You get that kind of weird error when the certificates aren’t a perfect match, which here is likely because the server you joined is now responding with the cluster-wide certificate.
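As a quick check, the fingerprint shown by `incus remote add` is the SHA-256 digest of the DER-encoded certificate, so you can compute it from any PEM file (`server.crt` is a placeholder path) and compare it with what the client recorded:

```shell
# SHA-256 fingerprint of a PEM certificate, in the same lowercase,
# colon-free form that `incus remote add` displays:
openssl x509 -in server.crt -outform der | sha256sum | cut -d' ' -f1
```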

I think that in my last setup the error is not due to the certificates, but occurs because the client tries to connect to server1 using its `cluster.https_address`, which is unreachable from the client since the internal network is isolated.

It is not clear to me why Incus is trying to contact individual cluster members directly after the cluster is formed. Wouldn’t it make more sense to confirm that the cluster is functional using the cluster remote address?
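For example, the final health check could go through the remote the client already uses, something like this (a sketch with the remote and member names from my setup; every request here targets the cluster remote's `core.https_address`, which the client can reach):

```shell
# Both commands talk only to the my-cluster remote, not to the members'
# cluster.https_address, so they work from an isolated client network:
incus cluster list my-cluster:
incus cluster show my-cluster: server2
```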

I think it’s basically a race condition. Part of joining the cluster is sending a request and then waiting for an operation to complete. If the request makes enough progress before we manage to attach to the operation, we get the connection error.

I’ve tried to reproduce it locally with some VMs and haven’t been able to, likely because network latency is low enough to hide the problem.

I repeated the steps a few times and the error is consistently triggered in my case, also when using VMs. Are you sure that in your test the internal network was not reachable from the client? If it is reachable, there is no error.

I described the detailed steps of my setup here:

Also, even if it is a race condition, I do not think the client should ever send a request to remote hosts using cluster.https_address.