Not sure why I'm having so much trouble, but I've purged and rebooted and I still can't get my cluster to work.
root@debian-dell:~# microcloud init
Please choose the address MicroCloud will be listening on [default=10.30.2.2]:
Scanning for eligible servers...
Press enter to end scanning for servers
Found "debian-dell2" at "10.30.2.3"
Found "debian-nuc" at "10.30.2.4"
Ending scan
Initializing a new cluster
Local MicroCloud is ready
Local MicroCeph is ready
Local LXD is ready
Awaiting cluster formation...
<< it just gets stuck here and then times out…
2023-02-17T10:15:56-08:00 microcloud.daemon[2956]: time="2023-02-17T10:15:56-08:00" level=warning msg="microcluster database is uninitialized"
2023-02-17T10:16:15-08:00 microcloud.daemon[2956]: time="2023-02-17T10:16:15-08:00" level=error msg="Failed to parse join token" error="Failed to parse token map: invalid character 'i' looking for beginning of value" name=debian-dell2
<< This is from another node on my network (same subnet)
Timed out waiting for a response from all cluster members
Cluster initialization is complete
Would you like to add additional local disks to MicroCeph? (yes/no) [default=yes]: Select from the available unpartitioned disks:
Space to select; Enter to confirm; Esc to exit; Type to filter results.
Up/Down to move; Right to select all; Left to select none.
+-------------+----------------+-----------+------+----------------------------------------+
| LOCATION | MODEL | CAPACITY | TYPE | PATH |
+-------------+----------------+-----------+------+----------------------------------------+
> [ ] | debian-dell | EDGE SE847 SSD | 465.76GiB | sata | /dev/disk/by-id/wwn-0x588891410006496d |
[ ] | debian-dell | EDGE SE847 SSD | 465.76GiB | sata | /dev/disk/by-id/wwn-0x5888914100071325 |
+-------------+----------------+-----------+------+----------------------------------------+
Error: Failed to confirm disk selection: Failed to confirm selection: interrupt
<< and this is from the node where I ran microcloud init… it's almost as if it can't reach the other nodes?
UPDATE:
For some reason, on Debian it is necessary to do a "snap restart microcloud" before running init. If you keep seeing a token-parsing error in "snap logs microcloud", just keep restarting the snap until the only debug message left is that the database has not been initialized… it works after that.
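A rough way to script that check on each node (plain snap commands; the grep pattern is just the error string from my logs, so adjust it if yours differs):

# Restart the microcloud snap until the token-parsing error stops showing
# up in its recent log output, then confirm that only the "database is
# uninitialized" warning remains.
while sudo snap logs microcloud -n=50 | grep -q "Failed to parse join token"; do
    sudo snap restart microcloud
    sleep 10
done
sudo snap logs microcloud -n=20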
I started over multiple times, and the same failure kept appearing randomly on the other nodes. The errors I got are related to the join tokens.
Feb 19 00:18:40 microcloud2 microcloud.daemon[1483]: time="2023-02-19T00:18:40Z" level=error msg="Failed to handle join token" error="Failed to join \"MicroCloud\" cluster: Failed to join cluster with the given join token" name=microcloud2
Feb 19 00:18:31 microcloud3 microcloud.daemon[2628]: time="2023-02-19T00:18:31Z" level=error msg="Failed to parse join token" error="Failed to parse token map: invalid character 'r' looking for beginning of value" name=microcloud3
Finally, the initialization completed successfully and now I have it up and running.
ahmad@microcloud1:~$ sudo -i
root@microcloud1:~# microcloud init
Please choose the address MicroCloud will be listening on [default=192.168.0.35]:
Scanning for eligible servers...
Press enter to end scanning for servers
Found "microcloud2" at "192.168.0.36"
Found "microcloud3" at "192.168.0.37"
Ending scan
Initializing a new cluster
Local MicroCloud is ready
Local MicroCeph is ready
Local LXD is ready
Awaiting cluster formation...
Peer "microcloud2" has joined the cluster
Peer "microcloud3" has joined the cluster
Cluster initialization is complete
Would you like to add additional local disks to MicroCeph? (yes/no) [default=yes]:
Select from the available unpartitioned disks:
Select which disks to wipe:
Adding 3 disks to MicroCeph
MicroCloud is ready
root@microcloud1:~#
Summary of issues and solutions:
1. If "microcloud init" fails with "Timed out waiting for a response from all cluster members" and one or more of the nodes fails to handle and/or parse the join token (see the errors I posted in my previous comment), cancel the current run, wipe all the snaps on all nodes, and run it again; eventually it will succeed. To do the wipe, I executed what nkrapf suggested (a rough sketch of a typical purge follows this list).
2. If you are using qemu-kvm VMs and the additional disks attached to the VMs use the virtio disk bus, then when adding the disks to Ceph they do not get a path under /dev/disk/by-id/, so nothing shows up to select. To overcome this, wipe all the snaps on all nodes as mentioned in point 1, re-add the disks to the VMs with SCSI as the bus type, and run microcloud init again; eventually it should succeed (see the virsh sketch after this list).
3. If you are experiencing timeouts waiting for the cluster to form, the key is to go to each node in the cluster and keep issuing "snap restart microcloud" until you no longer see the token-parsing error in the logs. Once no node shows that error any more, run "microcloud init".
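For reference, the wipe in point 1 usually comes down to purging and reinstalling the three snaps on every node. nkrapf's exact commands aren't quoted here, so treat this as my own assumption of the typical sequence and double-check it before running it (--purge also deletes the snaps' data):

# Remove the stack in reverse dependency order, then reinstall it
sudo snap remove --purge microcloud
sudo snap remove --purge microceph
sudo snap remove --purge lxd
sudo snap install lxd
sudo snap install microceph
sudo snap install microcloud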
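And for point 2, if the VMs are managed with libvirt, re-attaching a data disk on the SCSI bus can be done with virsh. The VM name, image path, target and serial below are placeholders, and the guest needs a (virtio-)SCSI controller defined:

# Detach the virtio disk, then re-attach it on the SCSI bus with a serial
# so it gets a stable entry under /dev/disk/by-id/ inside the guest.
virsh detach-disk microcloud2 vdb --persistent
virsh attach-disk microcloud2 /var/lib/libvirt/images/microcloud2-osd.qcow2 sda \
    --targetbus scsi --driver qemu --subdriver qcow2 --serial microcloud2-osd --persistent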
Please choose the address MicroCloud will be listening on [default=10.20.0.11]:
Scanning for eligible servers...
Press enter to end scanning for servers
Found "infra2" at "10.20.0.12"
Ending scan
Initializing a new cluster
Error: Failed to bootstrap local MicroCloud: Post "http://control.socket/cluster/control": dial unix /var/snap/microcloud/common/state/control.socket: connect: connection refused
mother@infra1:~$ sudo systemctl start snap.microcloud.daemon.service
mother@infra1:~$ systemctl status snap.microcloud.daemon.service
× snap.microcloud.daemon.service - Service for snap application microcloud.daemon
Loaded: loaded (/etc/systemd/system/snap.microcloud.daemon.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sat 2023-02-25 12:50:59 UTC; 1s ago
Process: 84222 ExecStart=/usr/bin/snap run microcloud.daemon (code=exited, status=1/FAILURE)
Main PID: 84222 (code=exited, status=1/FAILURE)
CPU: 113ms
Feb 25 12:50:59 infra1 systemd[1]: snap.microcloud.daemon.service: Scheduled restart job, restart counter is at 5.
Feb 25 12:50:59 infra1 systemd[1]: Stopped Service for snap application microcloud.daemon.
Feb 25 12:50:59 infra1 systemd[1]: snap.microcloud.daemon.service: Start request repeated too quickly.
Feb 25 12:50:59 infra1 systemd[1]: snap.microcloud.daemon.service: Failed with result 'exit-code'.
Feb 25 12:50:59 infra1 systemd[1]: Failed to start Service for snap application microcloud.daemon.
Any options here? Do I need to tweak the systemd service?
Thank you, that is what I did yesterday, and in the end I ended up just reinstalling microcloud since it still had the previous address that LXD was listening on configured.
I'm experiencing an issue where I am unable to start VMs that I've copied from an existing LXD host to my MicroCloud cluster. I get 'Error: Failed setting up disk device "root": Couldn't find a keyring entry' when I try to start them.
josh@lxd00:~$ lxc copy lxdtest1:homeassistant homeassistant -s remote
josh@lxd00:~$ lxc config device remove homeassistant eth0
Device eth0 removed from homeassistant
josh@lxd00:~$ lxc profile apply homeassistant lan2
Profiles lan2 applied to homeassistant
josh@lxd00:~$ lxc start homeassistant
Error: Failed setting up disk device "root": Couldn't find a keyring entry
Try `lxc info --show-log homeassistant` for more info
josh@lxd00:~$ lxc info --show-log homeassistant
Name: homeassistant
Status: STOPPED
Type: virtual-machine
Architecture: x86_64
Location: lxd02
Created: 2023/02/26 11:46 CST
Error: open /var/snap/lxd/common/lxd/logs/homeassistant/qemu.log: no such file or directory
josh@lxd00:~$
Basically MicroCeph uses ceph.keyring whereas LXD expects ceph.client.admin.keyring. Both are valid paths, so we're now expanding the LXD lookup logic to match that of Ceph itself.
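Until that lands, one possible workaround (my own sketch; the directory below is where the MicroCeph snap keeps its conf by default, so verify the paths on your nodes before copying anything) is to expose the same keyring under the name LXD currently looks for:

# Assumed default MicroCeph conf path; LXD currently looks for the
# ceph.client.admin.keyring name next to ceph.conf.
sudo cp /var/snap/microceph/current/conf/ceph.keyring \
        /var/snap/microceph/current/conf/ceph.client.admin.keyring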
Can I fool it by having partitions instead of disks? I have three Lenovo Thinkcentre machines, each with a 1TB NVMe, no space to add more disks to them.
Not currently; MicroCloud as it stands today looks for full disks and actively skips any partitioned ones.
But @masnax is working on quite a few improvements in that area, and one thing we're looking at doing is letting you add additional entries to what's auto-detected, which could then be used to force it to use partitions.
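In the meantime, a quick way to see which block devices count as whole, unpartitioned disks (and would therefore show up in the selector) is a plain lsblk listing:

# Devices of TYPE "disk" with no partition children underneath are the candidates
lsblk -o NAME,TYPE,SIZE,FSTYPE,MOUNTPOINT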
I'm experiencing an issue when I try to move a container instance from my local ZFS pool to the remote Ceph pool.
josh@lxd01:~$ lxc move motioneye -s remote
Error: Migration operation failure: Create instance from copy: Create instance volume from copy failed: [Rsync send failed: motioneye, /var/snap/lxd/common/lxd/storage-pools/local/backup.1302756649/: [exit status 11 read unix @lxd/cfa31ccf-c74e-4c78-8569-6f0df2743511->@: use of closed network connection] (rsync: write failed on "/var/snap/lxd/common/lxd/storage-pools/remote/containers/lxd-move-of-116d4b8f-15f0-4cac-b117-5985164bfae9/rootfs/var/log/journal/82c548939e714e58afcf80b967b33ddd/system@e258b1e601694d89a989f76224c056b1-0000000000d06b47-0005f3f71cf0ba7e.journal": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(374) [receiver=3.1.3]
) Rsync receive failed: /var/snap/lxd/common/lxd/storage-pools/remote/containers/lxd-move-of-116d4b8f-15f0-4cac-b117-5985164bfae9/: [exit status 11] ()]
josh@lxd01:~$
All my nodes have storage.images_volume and storage.backups_volume set to a volume on the Ceph storage. The Ceph cluster has ~100TB available. This container's root disk is <5GiB in size. The root disk of this node has ~40GiB available.
I'm not sure why this keeps failing with "No space left". I even tried setting storage.images_volume and storage.backups_volume to a volume on the local storage but got the same error.
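In case it helps narrow things down, these are the checks I would run to see which side is actually running out of space during the move (standard lxc commands; "remote" and "local" are the pool names from the output above):

# Space reported for the target Ceph pool from LXD's point of view
lxc storage info remote
# Default size cap applied to new volumes on the pool, if set
# (block-backed pools such as Ceph create new volumes at this size)
lxc storage get remote volume.size
# Same check for the source pool the temporary backup is staged on
lxc storage info local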