Cephfs storage add gives Error: Failed to mount using "ceph": no route to host

Hello,
I’m trying, unsuccessfully, to add storage to Incus from an existing CephFS filesystem.

incus storage create home-data cephfs source=home-data
Error: Failed to mount "[fd92:69ee:d36f::c8]:3300,[fd92:69ee:d36f::c8]:6789,[fd92:69ee:d36f::c9]:3300,[fd92:69ee:d36f::c9]:6789,[fd92:69ee:d36f::ca]:3300,[fd92:69ee:d36f::ca]:6789:/" on "/tmp/incus_cephfs_4273653901/mount" using "ceph": no route to host

Any hint on what I am missing will be greatly appreciated.

Here are the context elements:

sudo ceph orch status
Backend: cephadm
Available: Yes
Paused: No
sudo ceph orch host ls
HOST   ADDR                LABELS      STATUS
home0  fd92:69ee:d36f::c8  _admin,rgw
home1  fd92:69ee:d36f::c9  rgw
home2  fd92:69ee:d36f::ca
3 hosts in cluster
sudo ceph fs ls
name: home-data, metadata pool: home-data.meta, data pools: [home-data.data ]
sudo ceph fs status
home-data - 0 clients
=========
RANK  STATE            MDS               ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  home-data.home0.ybgnuu  Reqs:    0 /s    10     13     12      0
     POOL         TYPE     USED  AVAIL
home-data.meta  metadata   152k   797G
home-data.data    data       0    797G
      STANDBY MDS
home-cephfs.home2.nljmjq
home-cephfs.home0.fqzvnz
home-cephfs.home1.klidjc
MDS version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)

The error message lists several IPv6 addresses and then says no route to host, so this is likely an IPv6 configuration issue.
You can test with ping6 from the Incus host to the Ceph servers using those IPv6 addresses.

So, first check that IPv6 connectivity works.
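
For example, something along these lines (a quick sketch only; it assumes iputils ping and the OpenBSD variant of netcat are installed, and uses the mon addresses and ports from the error above):

for addr in fd92:69ee:d36f::c8 fd92:69ee:d36f::c9 fd92:69ee:d36f::ca; do
    ping -6 -c 2 "$addr"            # ICMPv6 reachability
    nc -6 -zv -w 2 "$addr" 3300     # Ceph mon msgr2 port
    nc -6 -zv -w 2 "$addr" 6789     # Ceph mon msgr1 port
done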

No trouble on the routing side, apparently…
If there were a routing problem, ceph orch host ls would complain about host inaccessibility.
Moreover, processes are listening on ports 3300 and 6789 on all 3 hosts.

The no route to host error only shows up with the incus command. Should the network interfaces be managed by Incus before issuing incus storage?

for addr in fd92:69ee:d36f::c8 fd92:69ee:d36f::c9 fd92:69ee:d36f::ca
do
  ping -c2 ${addr}
done
PING fd92:69ee:d36f::c8(fd92:69ee:d36f::c8) 56 data bytes
64 bytes from fd92:69ee:d36f::c8: icmp_seq=1 ttl=64 time=0.102 ms
64 bytes from fd92:69ee:d36f::c8: icmp_seq=2 ttl=64 time=0.144 ms

--- fd92:69ee:d36f::c8 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1027ms
rtt min/avg/max/mdev = 0.102/0.123/0.144/0.021 ms
PING fd92:69ee:d36f::c9(fd92:69ee:d36f::c9) 56 data bytes
64 bytes from fd92:69ee:d36f::c9: icmp_seq=1 ttl=64 time=0.540 ms
64 bytes from fd92:69ee:d36f::c9: icmp_seq=2 ttl=64 time=0.865 ms

--- fd92:69ee:d36f::c9 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1015ms
rtt min/avg/max/mdev = 0.540/0.702/0.865/0.162 ms
PING fd92:69ee:d36f::ca(fd92:69ee:d36f::ca) 56 data bytes
64 bytes from fd92:69ee:d36f::ca: icmp_seq=1 ttl=64 time=3.73 ms
64 bytes from fd92:69ee:d36f::ca: icmp_seq=2 ttl=64 time=4.53 ms

--- fd92:69ee:d36f::ca ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 3.726/4.129/4.533/0.403 ms

There are also no filtering issues.

From home0:

telnet home1 3300
Trying fd92:69ee:d36f::c9...
Connected to home1.
Escape character is '^]'.
ceph v2
^]
telnet> quit
Connection closed.
telnet home2 6789
Trying fd92:69ee:d36f::ca...
Connected to home2.
Escape character is '^]'.
ceph v027
��i��o�
�j�i��o�^]
telnet> quit
Connection closed.

How’s Incus installed on that system?
Also, any error in dmesg?
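
For example:

dmesg | grep -i ceph
# or, following kernel messages live while re-running the create:
journalctl -k -f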

Incus is installed from the Zabbly repo on Debian 12 hosts.

incus --version
6.2

Something weird happened while running journalctl -f -k:
I got dozens of systemd-journald[317]: /dev/kmsg buffer overrun, some messages lost.
I decided to reboot, and I now have:

incus storage ls
+-----------+--------+-------------+-------------+---------+
|    NOM    | PILOTE | DESCRIPTION | UTILISÉ PAR |  ÉTAT   |
+-----------+--------+-------------+-------------+---------+
| home-data | cephfs |             | 0           | CREATED |
+-----------+--------+-------------+-------------+---------+
incus storage delete home-data
Error: Failed to open "/etc/ceph/.conf": open /etc/ceph/.conf: no such file or directory

The config file is `/etc/ceph/ceph.conf`.
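
For reference, cephadm writes a minimal config there; on this cluster it should look roughly like the following (the fsid and the exact mon_host formatting are only illustrative, the fsid being the one shown later in this thread):

cat /etc/ceph/ceph.conf
# minimal ceph.conf for 288688d4-2e66-11ef-ab02-1c697a6abd96
[global]
	fsid = 288688d4-2e66-11ef-ab02-1c697a6abd96
	mon_host = [v2:[fd92:69ee:d36f::c8]:3300/0,v1:[fd92:69ee:d36f::c8]:6789/0] [v2:[fd92:69ee:d36f::c9]:3300/0,v1:[fd92:69ee:d36f::c9]:6789/0] [v2:[fd92:69ee:d36f::ca]:3300/0,v1:[fd92:69ee:d36f::ca]:6789/0]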

I would like to restart the process:

  1. Delete the cephfs entry with the ceph command
  2. Purge the incus packages and reinstall
  3. Rerun the incus storage ... command (roughly as sketched below)
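
Something like this (a sketch only; the exact Ceph commands still need to be double-checked, and note that ceph fs rm does not remove the data/metadata pools):

# 1. Remove the CephFS filesystem on the Ceph side
sudo ceph fs fail home-data
sudo ceph fs rm home-data --yes-i-really-mean-it

# 2. Purge and reinstall the Incus packages
sudo apt purge incus
sudo apt install incus

# 3. Recreate the filesystem and retry the pool creation
sudo ceph fs volume create home-data
sudo incus storage create home-data cephfs source=home-data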

Should I add something?

Can you show the output of incus storage show home-data?

Here it is:

incus storage show home-data
config: {}
description: ""
name: home-data
driver: cephfs
used_by: []
status: Created
locations:
- home0

On the Ceph side:

sudo ceph fs get home-data
Filesystem 'home-data' (5)
fs_name	home-data
epoch	94
flags	12 joinable allow_snaps allow_multimds_snaps
created	2024-06-16T09:39:08.379891+0000
modified	2024-06-18T14:38:54.021949+0000
tableserver	0
root	0
session_timeout	60
session_autoclose	300
max_file_size	1099511627776
required_client_features	{}
last_failure	0
last_failure_osd_epoch	437
compat	compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds	1
in	0
up	{0=184173}
failed
damaged
stopped
data_pools	[30]
metadata_pool	29
inline_data	disabled
balancer
bal_rank_mask	-1
standby_count_wanted	1
[mds.home-data.home0.ybgnuu{0:184173} state up:active seq 6 join_fscid=5 addr [v2:[fd92:69ee:d36f::c8]:6804/3927649977,v1:[fd92:69ee:d36f::c8]:6805/3927649977,v2:0.0.0.0:6806/3927649977,v1:0.0.0.0:6807/3927649977] compat {c=[1],r=[1],i=[7ff]}]

Yeah, so that seems to be the problem. I’m not sure why some of the config keys are missing there…

Can you try incus storage set home-data cephfs.cluster_name=ceph cephfs.user.name=admin ?

I don’t recall if we allow you to change that after the fact. If not I’ll give you the DB surgery equivalent.

No error message this time:

incus storage set home-data cephfs.cluster_name=ceph cephfs.user.name=admin
incus storage show home-data
config:
  cephfs.cluster_name: ceph
  cephfs.user.name: admin
description: ""
name: home-data
driver: cephfs
used_by: []
status: Created
locations:
- home0

Good, so with that, the incus storage delete should now work properly I’d expect.

I purged and reinstalled the packages.
I am now able to reproduce the kernel messages after the incus storage create command.

sudo incus storage create home-data cephfs source=home-data cephfs.cluster_name=ceph cephfs.user.name=admin
juin 18 20:30:11 home0 kernel: libceph: osdc handle_map corrupt msg
juin 18 20:30:11 home0 kernel: header: 00000000: 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
juin 18 20:30:11 home0 kernel: header: 00000010: 29 00 c4 00 04 00 ee 2f 00 00 00 00 00 00 00 00  )....../........
juin 18 20:30:11 home0 kernel: header: 00000020: 00 00 00 00 01 01 00 00 00 00 00 00 00 03 00 00  ................
juin 18 20:30:11 home0 kernel: header: 00000030: 00 55 61 ca d1                                   .Ua..
juin 18 20:30:11 home0 kernel:  front: 00000000: 18 dc c4 bc 23 75 11 ef ab 02 1c 69 7a 6a bd 96  ....#u.....izj..
juin 18 20:30:11 home0 kernel:  front: 00000010: 00 00 00 00 01 00 00 00 b8 01 00 00 c2 2f 00 00  ............./..
juin 18 20:30:11 home0 kernel:  front: 00000020: 08 07 bc 2f 00 00 09 01 11 11 00 00 18 dc c4 bc  .../............
juin 18 20:30:11 home0 kernel:  front: 00000030: 23 75 11 ef ab 02 1c 69 7a 6a bd 96 b8 01 00 00  #u.....izj......
juin 18 20:30:11 home0 kernel:  front: 00000040: 03 c2 60 66 85 ee 2b 2d 07 c0 71 66 59 2e 10 2e  ..`f..+-..qfY...
juin 18 20:30:11 home0 kernel:  front: 00000050: 07 00 00 00 01 00 00 00 00 00 00 00 1d 05 7c 01  ..............|.
juin 18 20:30:11 home0 kernel:  front: 00000060: 00 00 01 03 00 02 01 00 00 00 01 00 00 00 00 00  ................
juin 18 20:30:11 home0 kernel:  front: 00000070: 00 00 00 00 00 00 76 00 00 00 00 00 00 00 00 00  ......v.........
juin 18 20:30:11 home0 kernel:  front: 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
juin 18 20:30:11 home0 kernel:  front: 00000090: 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00  ................
juin 18 20:30:11 home0 kernel:  front: 000000a0: 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
juin 18 20:30:11 home0 kernel:  front: 000000b0: 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff 00  ................
juin 18 20:30:11 home0 kernel:  front: 000000c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
juin 18 20:30:11 home0 kernel:  front: 000000d0: 00 00 00 00 01 01 01 00 00 00 00 00 00 00 00 00  ................
juin 18 20:30:11 home0 kernel:  front: 000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
juin 18 20:30:11 home0 kernel:  front: 000000f0: 00 00 00 00 00 00 00 80 1a 06 00 00 35 0c 00 00  ............5...
juin 18 20:30:11 home0 kernel:  front: 00000100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
juin 18 20:30:11 home0 kernel:  front: 00000110: 00 00 00 00 00 00 00 00 00 00 00 c0 27 09 00 00  ............'...
juin 18 20:30:11 home0 kernel:  front: 00000120: 00 00 00 01 00 00 00 00 00 00 00 00 00 02 01 24  ...............$
juin 18 20:30:11 home0 kernel:  front: 00000130: 00 00 00 02 00 00 00 0f 00 00 00 01 00 00 00 01  ................
juin 18 20:30:11 home0 kernel:  front: 00000140: 00 00 00 00 00 00 00 17 00 00 00 01 00 00 00 20  ...............
juin 18 20:30:11 home0 kernel:  front: 00000150: 00 00 00 00 00 00 00 00 00 00 00 02 00 00 00 03  ................
juin 18 20:30:11 home0 kernel:  front: 00000160: 00 00 00 6d 67 72 00 00 00 00 10 00 00 00 6d 67  ...mgr........mg
juin 18 20:30:11 home0 kernel:  front: 00000170: 72 5f 64 65 76 69 63 65 68 65 61 6c 74 68 00 00  r_devicehealth..
juin 18 20:30:11 home0 kernel:  front: 00000180: 00 00 7e a7 62 66 ae 44 53 37 01 00 00 00 01 00  ..~.bf.DS7......
juin 18 20:30:11 home0 kernel:  front: 00000190: 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
juin 18 20:30:11 home0 kernel:  front: 000001a0: 00 00 02 01 01 35 00 00 00 01 00 00 00 00 00 00  .....5..........

And

juin 18 20:34:30 home0 systemd-journald[320]: /dev/kmsg buffer overrun, some messages lost.
juin 18 20:34:30 home0 kernel: osdmap: 00001f90: 1c 00 00 00 0a 00 1a a0 00 00 00 00 fd 92 69 ee  ..............i.
juin 18 20:34:30 home0 kernel: osdmap: 00001fa0: d3 6f 00 00 00 00 00 00 00 00 00 c9 00 00 00 00  .o..............
juin 18 20:34:30 home0 kernel: osdmap: 00001fb0: 92 a9 72 66 78 7b e3 16 01 01 01 28 00 00 00 03  ..rfx{.....(....
juin 18 20:34:30 home0 kernel: osdmap: 00001fc0: 00 00 00 73 49 ab fc 1c 00 00 00 0a 00 1a a1 00  ...sI...........
juin 18 20:34:30 home0 kernel: osdmap: 00001fd0: 00 00 00 fd 92 69 ee d3 6f 00 00 00 00 00 00 00  .....i..o.......

On the Incus storage side, I get:

incus storage ls
+-----------+--------+-------------+-------------+---------+
|    NOM    | PILOTE | DESCRIPTION | UTILISÉ PAR |  ÉTAT   |
+-----------+--------+-------------+-------------+---------+
| home-data | cephfs |             | 0           | PENDING |
+-----------+--------+-------------+-------------+---------+
incus storage show home-data
config:
  cephfs.cluster_name: ceph
  cephfs.user.name: admin
  source: home-data
description: ""
name: home-data
driver: cephfs
used_by: []
status: Pending
locations:
- none

Does that incus storage create run fail with an error or hang?

It fails with an error, and the initial error message is reproducible.

The no route to host error only appears after quite a long time, and seems related to the Ceph connection crashing/restarting.

incus storage create home-data cephfs source=home-data cephfs.cluster_name=ceph cephfs.user.name=admin
Error: Failed to mount "[fd92:69ee:d36f::c8]:3300,[fd92:69ee:d36f::c8]:6789,[fd92:69ee:d36f::c9]:3300,[fd92:69ee:d36f::c9]:6789,[fd92:69ee:d36f::ca]:3300,[fd92:69ee:d36f::ca]:6789:/" on "/tmp/incus_cephfs_724568887/mount" using "ceph": no route to host

Here is what the kernel reports about the Ceph mon connections, from journalctl -k --grep mon:

juin 18 20:34:30 home0 kernel: libceph: mon3 (1)[fd92:69ee:d36f::c9]:6789 session established
juin 18 20:35:18 home0 kernel: libceph: mon3 (1)[fd92:69ee:d36f::c9]:6789 session established
juin 18 20:36:19 home0 kernel: libceph: mon1 (1)[fd92:69ee:d36f::c8]:6789 session established
juin 18 20:37:21 home0 kernel: libceph: mon5 (1)[fd92:69ee:d36f::ca]:6789 session established
juin 18 20:38:22 home0 kernel: libceph: mon0 (1)[fd92:69ee:d36f::c8]:3300 socket closed (con state V1_BANNER)
juin 18 20:38:22 home0 kernel: libceph: mon0 (1)[fd92:69ee:d36f::c8]:3300 socket closed (con state V1_BANNER)
juin 18 20:38:23 home0 kernel: libceph: mon0 (1)[fd92:69ee:d36f::c8]:3300 socket closed (con state V1_BANNER)
juin 18 20:38:24 home0 kernel: libceph: mon0 (1)[fd92:69ee:d36f::c8]:3300 socket closed (con state V1_BANNER)
juin 18 20:38:26 home0 kernel: libceph: mon3 (1)[fd92:69ee:d36f::c9]:6789 session established
juin 18 20:39:24 home0 kernel: libceph: mon3 (1)[fd92:69ee:d36f::c9]:6789 session established
juin 18 20:40:26 home0 kernel: libceph: mon0 (1)[fd92:69ee:d36f::c8]:3300 socket closed (con state V1_BANNER)
juin 18 20:40:26 home0 kernel: libceph: mon0 (1)[fd92:69ee:d36f::c8]:3300 socket closed (con state V1_BANNER)
juin 18 20:40:26 home0 kernel: libceph: mon0 (1)[fd92:69ee:d36f::c8]:3300 socket closed (con state V1_BANNER)
juin 18 20:40:27 home0 kernel: libceph: mon0 (1)[fd92:69ee:d36f::c8]:3300 socket closed (con state V1_BANNER)
juin 18 20:40:29 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 20:40:29 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 20:40:29 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 20:40:30 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 20:40:32 home0 kernel: libceph: mon5 (1)[fd92:69ee:d36f::ca]:6789 session established
juin 18 20:41:27 home0 kernel: libceph: mon1 (1)[fd92:69ee:d36f::c8]:6789 session established
juin 18 20:42:28 home0 kernel: libceph: mon5 (1)[fd92:69ee:d36f::ca]:6789 session established
juin 18 20:43:29 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 20:43:30 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 20:43:30 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 20:43:31 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 20:43:33 home0 kernel: libceph: mon0 (1)[fd92:69ee:d36f::c8]:3300 socket closed (con state V1_BANNER)
juin 18 20:43:33 home0 kernel: libceph: mon0 (1)[fd92:69ee:d36f::c8]:3300 socket closed (con state V1_BANNER)
juin 18 20:43:34 home0 kernel: libceph: mon0 (1)[fd92:69ee:d36f::c8]:3300 socket closed (con state V1_BANNER)
juin 18 20:43:35 home0 kernel: libceph: mon0 (1)[fd92:69ee:d36f::c8]:3300 socket closed (con state V1_BANNER)
juin 18 20:43:36 home0 kernel: libceph: mon3 (1)[fd92:69ee:d36f::c9]:6789 session established
juin 18 20:44:31 home0 kernel: libceph: mon1 (1)[fd92:69ee:d36f::c8]:6789 session established
juin 18 21:30:12 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 21:30:13 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 21:30:13 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 21:30:14 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 21:30:16 home0 kernel: libceph: mon1 (1)[fd92:69ee:d36f::c8]:6789 session established
juin 18 21:31:15 home0 kernel: libceph: mon3 (1)[fd92:69ee:d36f::c9]:6789 session established
juin 18 21:32:16 home0 kernel: libceph: mon3 (1)[fd92:69ee:d36f::c9]:6789 session established
juin 18 21:33:17 home0 kernel: libceph: mon3 (1)[fd92:69ee:d36f::c9]:6789 session established
juin 18 21:34:19 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 21:34:19 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 21:34:20 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 21:34:21 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 21:34:23 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 21:34:23 home0 kernel: libceph: mon0 (1)[fd92:69ee:d36f::c8]:3300 socket closed (con state V1_BANNER)
juin 18 21:34:23 home0 kernel: libceph: mon0 (1)[fd92:69ee:d36f::c8]:3300 socket closed (con state V1_BANNER)
juin 18 21:34:24 home0 kernel: libceph: mon0 (1)[fd92:69ee:d36f::c8]:3300 socket closed (con state V1_BANNER)
juin 18 21:34:25 home0 kernel: libceph: mon0 (1)[fd92:69ee:d36f::c8]:3300 socket closed (con state V1_BANNER)
juin 18 21:34:26 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 21:34:26 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 21:34:26 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 21:34:27 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 21:34:29 home0 kernel: libceph: mon5 (1)[fd92:69ee:d36f::ca]:6789 session established

And from the journalctl -k --grep ceph command:

juin 18 21:37:23 home0 kernel: ceph: No mds server is up or the cluster is laggy
juin 18 21:37:23 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 21:37:23 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 21:37:24 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 21:37:25 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 18 21:37:27 home0 kernel: libceph: mon4 (1)[fd92:69ee:d36f::ca]:3300 socket closed (con state V1_BANNER)
juin 18 21:37:27 home0 kernel: libceph: mon4 (1)[fd92:69ee:d36f::ca]:3300 socket closed (con state V1_BANNER)
juin 18 21:37:27 home0 kernel: libceph: mon4 (1)[fd92:69ee:d36f::ca]:3300 socket closed (con state V1_BANNER)
juin 18 21:37:28 home0 kernel: libceph: mon4 (1)[fd92:69ee:d36f::ca]:3300 socket closed (con state V1_BANNER)
juin 18 21:37:30 home0 kernel: libceph: mon3 (1)[fd92:69ee:d36f::c9]:6789 session established
juin 18 21:37:30 home0 kernel: libceph: another match of type 1 in addrvec
juin 18 21:37:30 home0 kernel: libceph: corrupt full osdmap (-22) epoch 441 off 2970 (000000004238c3e1 of 000000001841fb3c-000000008ba65e44)
juin 18 21:37:30 home0 kernel: libceph: osdc handle_map corrupt msg
juin 18 21:37:30 home0 kernel:  front: 00000840: 00 06 00 00 00 63 65 70 68 66 73 01 00 00 00 08  .....cephfs.....
juin 18 21:37:30 home0 kernel:  front: 000009b0: 00 00 00 63 65 70 68 66 73 01 00 00 00 04 00 00  ...cephfs.......

That’d definitely be a problem for cephfs.

Can you show ceph status?

Hello,

I went through a complete Ceph reinstall and got the same results. I was unable to mount the CephFS filesystem manually either, so this is not a problem coming from Incus. Sorry for the inconvenience.

sudo ceph status
  cluster:
    id:     288688d4-2e66-11ef-ab02-1c697a6abd96
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum home0,home1,home2 (age 30m)
    mgr: home0.emmngm(active, since 92m), standbys: home1.vluqya
    mds: 1/1 daemons up, 2 standby
    osd: 3 osds: 3 up (since 26m), 3 in (since 26m)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 145 pgs
    objects: 24 objects, 579 KiB
    usage:   82 MiB used, 2.5 TiB / 2.5 TiB avail
    pgs:     145 active+clean
sudo ceph fs status
cephfs - 0 clients
======
RANK  STATE         MDS           ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  fs.home1.qbapcf  Reqs:    0 /s    10     13     12      0
      POOL         TYPE     USED  AVAIL
cephfs_metadata  metadata  96.0k   797G
  cephfs_data      data       0    797G
  STANDBY MDS
fs.home0.emzeua
fs.home2.hhmvfi
MDS version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)

And from the kernel log:

juin 19 21:35:57 home0 kernel: libceph: another match of type 1 in addrvec
juin 19 21:35:57 home0 kernel: libceph: corrupt full osdmap (-22) epoch 38 off 1479 (00000000a64f7738 of 0000000072392dc9-000000002b4836f5)
juin 19 21:35:57 home0 kernel: osdmap: 00000290: 00 06 00 00 00 63 65 70 68 66 73 01 00 00 00 04  .....cephfs.....
juin 19 21:35:57 home0 kernel: osdmap: 000002a0: 00 00 00 64 61 74 61 06 00 00 00 63 65 70 68 66  ...data....cephf
juin 19 21:35:57 home0 kernel: osdmap: 00000420: 00 00 00 00 01 00 00 00 06 00 00 00 63 65 70 68  ............ceph
juin 19 21:35:57 home0 kernel: osdmap: 00000440: 74 61 06 00 00 00 63 65 70 68 66 73 47 2e 73 66  ta....cephfsG.sf
juin 19 21:35:57 home0 kernel: osdmap: 000004c0: 00 00 00 00 0b 00 00 00 63 65 70 68 66 73 5f 64  ........cephfs_d
juin 19 21:35:57 home0 kernel: libceph: osdc handle_map corrupt msg
juin 19 21:35:57 home0 kernel:  front: 000002b0: 00 06 00 00 00 63 65 70 68 66 73 01 00 00 00 04  .....cephfs.....
juin 19 21:35:57 home0 kernel:  front: 000002c0: 00 00 00 64 61 74 61 06 00 00 00 63 65 70 68 66  ...data....cephf
juin 19 21:35:57 home0 kernel:  front: 00000440: 00 00 00 00 01 00 00 00 06 00 00 00 63 65 70 68  ............ceph
juin 19 21:35:57 home0 kernel:  front: 00000460: 74 61 06 00 00 00 63 65 70 68 66 73 47 2e 73 66  ta....cephfsG.sf
juin 19 21:35:57 home0 kernel:  front: 000004e0: 00 00 00 00 0b 00 00 00 63 65 70 68 66 73 5f 64  ........cephfs_d
juin 19 21:36:58 home0 kernel: ceph: No mds server is up or the cluster is laggy
juin 19 21:36:58 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 19 21:36:58 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 19 21:36:59 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 19 21:37:00 home0 kernel: libceph: mon2 (1)[fd92:69ee:d36f::c9]:3300 socket closed (con state V1_BANNER)
juin 19 21:37:01 home0 kernel: libceph: mon3 (1)[fd92:69ee:d36f::c9]:6789 session established
juin 19 21:37:01 home0 kernel: libceph: another match of type 1 in addrvec
juin 19 21:37:01 home0 kernel: libceph: corrupt full osdmap (-22) epoch 38 off 1479 (00000000a64f7738 of 0000000072392dc9-000000002b4836f5)
juin 19 21:37:01 home0 kernel: osdmap: 00000290: 00 06 00 00 00 63 65 70 68 66 73 01 00 00 00 04  .....cephfs.....
juin 19 21:37:01 home0 kernel: osdmap: 000002a0: 00 00 00 64 61 74 61 06 00 00 00 63 65 70 68 66  ...data....cephf
juin 19 21:37:01 home0 kernel: libceph: osdc handle_map corrupt msg
juin 19 21:37:01 home0 kernel:  front: 000002b0: 00 06 00 00 00 63 65 70 68 66 73 01 00 00 00 04  .....cephfs.....
juin 19 21:37:01 home0 kernel:  front: 000002c0: 00 00 00 64 61 74 61 06 00 00 00 63 65 70 68 66  ...data....cephf
juin 19 21:37:01 home0 kernel:  front: 00000440: 00 00 00 00 01 00 00 00 06 00 00 00 63 65 70 68  ............ceph
juin 19 21:37:01 home0 kernel:  front: 00000460: 74 61 06 00 00 00 63 65 70 68 66 73 47 2e 73 66  ta....cephfsG.sf
juin 19 21:37:01 home0 kernel:  front: 000004e0: 00 00 00 00 0b 00 00 00 63 65 70 68 66 73 5f 64  ........cephfs_d
juin 19 21:37:01 home0 kernel: libceph: another match of type 1 in addrvec
juin 19 21:37:01 home0 kernel: ceph: corrupt mdsmap
juin 19 21:37:01 home0 kernel: mdsmap: 00000470: 00 00 00 63 65 70 68 66 73 00 00 00 00 00 00 00  ...cephfs.......
juin 19 21:37:01 home0 kernel: ceph: error decoding mdsmap -22. Shutting down mount.
juin 19 21:37:01 home0 kernel:  front: 00000480: 00 00 00 00 00 00 01 06 00 00 00 63 65 70 68 66  ...........cephf
juin 19 21:37:01 home0 kernel: libceph: another match of type 1 in addrvec
juin 19 21:37:01 home0 kernel: libceph: corrupt full osdmap (-22) epoch 38 off 1479 (00000000a64f7738 of 0000000072392dc9-000000002b4836f5)
juin 19 21:37:01 home0 kernel: osdmap: 00000290: 00 06 00 00 00 63 65 70 68 66 73 01 00 00 00 04  .....cephfs.....
juin 19 21:37:01 home0 kernel: osdmap: 000002a0: 00 00 00 64 61 74 61 06 00 00 00 63 65 70 68 66  ...data....cephf
juin 19 21:37:01 home0 kernel: osdmap: 00000420: 00 00 00 00 01 00 00 00 06 00 00 00 63 65 70 68  ............ceph
juin 19 21:37:01 home0 kernel: osdmap: 00000440: 74 61 06 00 00 00 63 65 70 68 66 73 47 2e 73 66  ta....cephfsG.sf
juin 19 21:37:01 home0 kernel: osdmap: 000004c0: 00 00 00 00 0b 00 00 00 63 65 70 68 66 73 5f 64  ........cephfs_d
juin 19 21:37:01 home0 kernel: libceph: osdc handle_map corrupt msg
juin 19 21:37:01 home0 kernel:  front: 000002b0: 00 06 00 00 00 63 65 70 68 66 73 01 00 00 00 04  .....cephfs.....
juin 19 21:37:01 home0 kernel:  front: 000002c0: 00 00 00 64 61 74 61 06 00 00 00 63 65 70 68 66  ...data....cephf
juin 19 21:37:01 home0 kernel:  front: 00000440: 00 00 00 00 01 00 00 00 06 00 00 00 63 65 70 68  ............ceph
juin 19 21:37:01 home0 kernel:  front: 00000460: 74 61 06 00 00 00 63 65 70 68 66 73 47 2e 73 66  ta....cephfsG.sf
juin 19 21:37:01 home0 kernel:  front: 000004e0: 00 00 00 00 0b 00 00 00 63 65 70 68 66 73 5f 64  ........cephfs_d

I’ve restarted the configuration from scratch with IPv4, and access to the Ceph cluster is now working as expected.
The Ceph ‘reef’ stable version is simply not IPv6 ready.

As someone who has been running several Ceph clusters in IPv6-only environments for a few years now across several major versions of Ceph, this hurts me.

I am currently debugging what exactly I have to provide to Incus to get it to actually mount CephFS, since things have changed significantly lately with the introduction of volumes and subvolumes, and in the middle of all that I have managed to hit this very same error:

mount: /mnt: mount(2) system call failed: No route to host.
       dmesg(1) may have more information after failed mount system call.

Despite every single daemon being perfectly reachable (heck, each of my daemons is on its own IP address, so I can even tell which address supposedly isn’t reachable). Usually the error is in client implementations (e.g. krbd really just dumps a stack trace into your dmesg if you try accessing an RBD where the osdmap contains both IPv4 and IPv6 addresses, while a single stack of either is fine).

Anyway, I get this error when I try to mount the filesystem with the same options that Incus uses internally in its mount syscall, i.e. a list of the mon addresses separated by commas. As far as I can tell, this mount-string format is obsolete/deprecated by now:

mount("[2001:41d0:700:2038::1:0]:3300,[2001:41d0:1004:1a22::1:1]:3300,[2001:41d0:602:2029::1:2]:3300:/", "/var/lib/incus/storage-pools/cephfs", "ceph", 0, "name=benaryorg-incus-cephfs,secret=[secret here],mds_namespace=") = -1 EINVAL (Invalid argument)

I actually do have both source and cephfs.path set on the storage pool, but those don’t show up in there. Then again, I did change them, and I’m now in a state that allows me neither to delete the pool nor to change it, because it is in state “pending”, so I can’t say for sure whether I could change that path; I haven’t had any luck trying to mount it with that pattern anyway.

The current way to mount properly, as used by mount.ceph from a reef client package, is as follows:

mount("benaryorg-incus-cephfs@94dec8c9-a487-42d7-9882-87d2b62549ac.cephfs=/volumes/benaryorg/incus/e7c5cd0c-10fa-42e2-9d48-902544f13d07", "/mnt", "ceph", 0, "name=benaryorg-incus-cephfs,ms_mode=prefer-crc,key=benaryorg-incus-cephfs,mon_addr=[2001:41d0:602:2029::1:2]:3300/[2001:41d0:700:2038::1:0]:3300/[2001:41d0:1004:1a22::1:1]:3300") = 0

Or the much shorter mount command I used for this (with most parameters drawn from /etc/ceph/ceph.conf, which Incus even uses to determine the mon addresses; it could just… not provide them itself, but I digress):

mount.ceph benaryorg-incus-cephfs@.cephfs=/volumes/benaryorg/incus/e7c5cd0c-10fa-42e2-9d48-902544f13d07 /mnt

The mount.ceph man-page documents the new synopsis, which should be used for mounting CephFS with newer versions.
I suspect that the old syntax somehow hits other (kernel-side) legacy code paths which trigger odd behaviour, including a No Route To Host somehow (I assume it’s trying to fall back onto a random IPv4 address, which wouldn’t work in an IPv6-only setup).
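
For comparison, roughly (assuming the mount.ceph helper from ceph-common is installed and /etc/ceph holds the cluster config and client keyring; per mount.ceph(8) the new device-string synopsis is <client_name>@<cluster_fsid>.<fs_name>=/<path>, and leaving the fsid empty lets the helper fill it in from ceph.conf):

# New-style device string (Quincy and later):
mount -t ceph 'benaryorg-incus-cephfs@.cephfs=/volumes/benaryorg/incus/e7c5cd0c-10fa-42e2-9d48-902544f13d07' /mnt

# Old-style device string (pre-Quincy), which is what Incus currently builds:
#   <mon_addr>[,<mon_addr>...]:/<path>   with -o name=...,secret=...,mds_namespace=...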

@stgraber should I open an issue for the new mount syntax? Or a new discuss thread since my issue may not actually be related and I’m still trying to figure out the exact parameters?

Probably worth opening an issue. Though it’s a bit concerning if they changed the mount syntax as that’s kernel API and it’s supposed to always be backwards compatible…

What version of Ceph are you running?

My own clusters are running reef (18.2.2) in an IPv6-only environment and that includes cephfs all working properly here.

Saw the notification in this thread the moment I submitted the issue ^^

Though it’s a bit concerning if they changed the mount syntax as that’s kernel API and it’s supposed to always be backwards compatible…

They already changed the preferred syntax back in Quincy (after Pacific, which has been EoL for a while).
The old syntax still works, but apparently not for newer features like the volume-based mounts (ceph fs volume […]).
I’m running 18.2.2 too, on two clusters. One still uses the old-style mounts, which continue to work; on the other one, which uses the volume-based mounts, I fail to convince Incus to mount the shares, and I get the No Route To Host errors when mounting manually with the old syntax (but seemingly only when I Ctrl+Z the mount process for a while? I’m still not sure how this connects to this thread).

With all the brackets and delimited values in that mount string, I would honestly not be surprised if something were trying to resolve part of it as an IP address by shoving it through the appropriate APIs when it shouldn’t, so I can see how the error could bubble up, but I have no idea where exactly that would happen.

Either way, I assume we should move further discussion to the issue. As an aside that feels too informal for the issue: do you happen to have volume-based mounts working? What do I have to tell Incus to use them? I am at a loss there. I tried something like:

incus storage create cephfs cephfs cephfs.cluster_name=ceph cephfs.user.name=benaryorg-incus-cephfs source=cephfs/volumes/benaryorg/incus/e7c5cd0c-10fa-42e2-9d48-902544f13d07

But that’s still failing with dmesg reporting libceph socket closed (con state V1_BANNER) from the mons.