Error When Creating/Transferring Container to New Cluster with Ceph Storage Backend

I created a new LXD cluster with Ceph as the storage backend, and I'm trying to transfer a container from the existing cluster to the new one. On the existing cluster, I took a snapshot of the container and then published the snapshot as an image. The image is roughly 28 GB in size. I got errors when creating a container on the new cluster from that image.

Error when creating the container:

tar: rootfs/etc/ssl/certs/AffirmTrust_Premium_ECC.pem: Cannot create symlink to ‘/usr/share/ca-certificates/mozilla/AffirmTrust_Premium_ECC.crt’: No space left on device
tar: rootfs/etc/ssl/certs/AffirmTrust_Networking.pem: Cannot create symlink to ‘/usr/share/ca-certificates/mozilla/AffirmTrust_Networking.crt’: No space left on device
tar: rootfs/etc/ssl/certs/AffirmTrust_Commercial.pem: Cannot create symlink to ‘/usr/share/ca-certificates/mozilla/AffirmTrust_Commercial.crt’: No space left on device
tar: rootfs/etc/ssl/certs/Actalis_Authentication_Root_CA.pem: Cannot create symlink to ‘/usr/share/ca-certificates/mozilla/Actalis_Authentication_Root_CA.crt’: No space left on device
tar: Exiting with failure status due to previous errors.

I use the following profile:

ubuntu@node1:~$ lxc profile show ssd
config: {}
description: LXD SSD Ceph Storage
devices:
  eth0:
    name: eth0
    network: lxdfan0
    type: nic
  root:
    path: /
    pool: ssd
    size: 40GB
    type: disk
name: default
used_by: []

and this is the storage pool configuration:

ubuntu@node1:~$ lxc storage show ssd
config:
  ceph.cluster_name: ceph
  ceph.osd.pg_num: "32"
  ceph.osd.pool_name: lxd-ssd
  ceph.user.name: admin
  volatile.pool.pristine: "false"
description: ""
name: ssd
driver: ceph
used_by:
- /1.0/images/a8402324842148ccfcbacbc69bf251baa9703916593089f0609e8d45e3185bff
- /1.0/profiles/ssd
status: Created
locations:
- node1
- node2

Hi @muhfiasbin,
What command did you execute? Have you configured Ceph storage on each LXD cluster member?
Maybe this link can be helpful: https://discuss.linuxcontainers.org/t/need-some-guidance-to-setup-a-storage-pool-with-ceph/9001
Regards.

Hi @cemzafer,

During lxd init I didn't configure any storage, so I manually added the Ceph storage pool on each cluster member with these commands:

lxc storage create ssd ceph source=lxd-ssd --target node1
lxc storage create ssd ceph source=lxd-ssd --target node2
lxc storage create ssd ceph

Then I confirmed that the storage pool was created on each node.
On node1:

ubuntu@node1:~$ lxc storage list
+------+-------------+--------+---------+---------+
| NAME | DESCRIPTION | DRIVER |  STATE  | USED BY |
+------+-------------+--------+---------+---------+
| ssd  |             | ceph   | CREATED | 3       |
+------+-------------+--------+---------+---------+

On node2:

ubuntu@node2:~$ lxc storage list
+------+-------------+--------+---------+---------+
| NAME | DESCRIPTION | DRIVER |  STATE  | USED BY |
+------+-------------+--------+---------+---------+
| ssd  |             | ceph   | CREATED | 3       |
+------+-------------+--------+---------+---------+

Looks fine. What do you want to achieve? Can you also post the lxc cluster ls output?
Regards.

This is the lxc cluster ls output:

ubuntu@node1:~$ lxc cluster ls
+---------+----------------------------+----------+--------+-------------------+--------------+
|  NAME   |            URL             | DATABASE | STATE  |      MESSAGE      | ARCHITECTURE |
+---------+----------------------------+----------+--------+-------------------+--------------+
| node1   | https://192.168.120.1:8443 | YES      | ONLINE | Fully operational | x86_64       |
+---------+----------------------------+----------+--------+-------------------+--------------+
| node2   | https://192.168.120.2:8443 | YES      | ONLINE | Fully operational | x86_64       |
+---------+----------------------------+----------+--------+-------------------+--------------+

I'm trying to move a container from the existing cluster to this new cluster. I snapshotted the container and published the snapshot as an image on the existing cluster.

ubuntu@so5:~$ lxc snapshot ocs snap0
ubuntu@so5:~$ lxc publish ocs/snap0 --alias ocs description="OCS"
ubuntu@so5:~$ lxc image list ocs
+-------+--------------+--------+-------------+--------------+-----------+------------+------------------------------+
| ALIAS | FINGERPRINT  | PUBLIC | DESCRIPTION | ARCHITECTURE |   TYPE    |    SIZE    |         UPLOAD DATE          |
+-------+--------------+--------+-------------+--------------+-----------+------------+------------------------------+
| ocs   | 1e763a63ce8b | no     | OCS         | x86_64       | CONTAINER | 27983.63MB | Dec 24, 2021 at 5:19am (UTC) |
+-------+--------------+--------+-------------+--------------+-----------+------------+------------------------------+

I added so5 as a remote on node1.

ubuntu@node1:~$ lxc remote add so5 192.168.100.11
Certificate fingerprint: a3397b87d04765d6ba4c2d1f311b302927edb424b19820f668a42809176596b9
ok (y/n)? y
Admin password for so5: 
Client certificate now trusted by server: so5
ubuntu@node1:~$ lxc remote list
+--------------+------------------------------------------+---------------+-------------+--------+--------+
|       NAME   |                   URL                    |   PROTOCOL    |  AUTH TYPE  | PUBLIC | STATIC |
+--------------+------------------------------------------+---------------+-------------+--------+--------+
| images       | https://images.linuxcontainers.org       | simplestreams | none        | YES    | NO     |
+--------------+------------------------------------------+---------------+-------------+--------+--------+
| local        | unix://                                  | lxd           | file access | NO     | YES    |
+--------------+------------------------------------------+---------------+-------------+--------+--------+
| so5          | https://192.168.100.11:8443              | lxd           | tls         | NO     | NO     |
+--------------+------------------------------------------+---------------+-------------+--------+--------+
| ubuntu       | https://cloud-images.ubuntu.com/releases | simplestreams | none        | YES    | YES    |
+--------------+------------------------------------------+---------------+-------------+--------+--------+
| ubuntu-daily | https://cloud-images.ubuntu.com/daily    | simplestreams | none        | YES    | YES    |
+--------------+------------------------------------------+---------------+-------------+--------+--------+

Then I initialized a container from this image on the new cluster with the command lxc init so5:ocs ocs -p ssd. After it finished unpacking the image, the errors appeared.

Can you post the lxc profile ls output?

Perhaps that “-p” parameter should be “-s”.
Thanks.

Nope, I'm sure “-p” is the right parameter; I created a profile called ssd too.

ubuntu@node1:~$ lxc profile ls
+---------+---------+
|  NAME   | USED BY |
+---------+---------+
| default | 0       |
+---------+---------+
| ssd     | 0       |
+---------+---------+
ubuntu@node1:~$ lxc profile show ssd
config: {}
description: LXD SSD Ceph Storage
devices:
  eth0:
    name: eth0
    network: lxdfan0
    type: nic
  root:
    path: /
    pool: ssd
    size: 40GB
    type: disk
name: default
used_by: []

I have also tried creating a container from the ubuntu:20.04 image, and it worked.

ubuntu@node1:~$ lxc init ubuntu:20.04 -p ssd
Creating the instance
Instance name is: deciding-finch
ubuntu@node1:~$ lxc list
+----------------+---------+------+------+-----------+-----------+----------+
|      NAME      |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS | LOCATION |
+----------------+---------+------+------+-----------+-----------+----------+
| deciding-finch | STOPPED |      |      | CONTAINER | 0         | node1    |
+----------------+---------+------+------+-----------+-----------+----------+

Sorry for the delay. Can you enable debug mode and inspect the output for anything suspicious? Also look at the container's output with the --debug parameter: lxc start deciding-finch --debug
snap set lxd daemon.debug=true
systemctl reload snap.lxd.daemon
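
Once debug is enabled, the daemon log can be followed while reproducing the failure; a minimal sketch, assuming the snap packaging used above:

# Follow the LXD daemon log (snap packaging)
snap logs -f lxd

# Or stream LXD's log messages over the API
lxc monitor --type=logging --pretty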

Regards.

I enabled debug on the LXD daemon with snap set lxd daemon.debug=true, then reloaded the service. I tried again to create the container using the ocs image:

lxc init so5:ocs ocs -p ssd

and this is the error that appears:

t=2021-12-30T19:56:13+0700 lvl=dbug msg="Failed to run: tar -C /var/snap/lxd/common/lxd/storage-pools/ssd/images/1e763a63ce8b0cff6406d2b69f7aabaed9bf62817c82622af0ceb8b9cadbc7a8 --numeric-owner --xattrs-include=* -zxf -: tar: rootfs/srv/www/ocs/backup/ocs.20170918.tar.xz: Wrote only 9728 of 10240 bytes\ntar: rootfs/srv/www/ocs/backup/ocs.20180309.sql.gz: Cannot write: No space left on device

You have to check the storage system; the error message is pretty clear to me. You can post the storage configuration with lxc storage show ssd, and the Ceph side can also be checked directly (see the sketch below this post). Also, you posted the ssd profile output, but how is it possible that the name line says default?

ubuntu@node1:~$ lxc profile show ssd
config: {}
description: LXD SSD Ceph Storage
devices:
  eth0:
    name: eth0
    network: lxdfan0
    type: nic
  root:
    path: /
    pool: ssd
    size: 40GB
    type: disk
name: default
used_by: []

Regards.
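
A minimal diagnostic sketch for checking the Ceph side directly, assuming the ceph and rbd CLIs are available on a cluster node and using the lxd-ssd OSD pool from the storage configuration above:

# Overall and per-pool capacity as Ceph sees it
ceph df

# Provisioned vs. actually used size of each RBD image in the pool backing the ssd storage pool
rbd du -p lxd-ssd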

Sorry, my bad.

Recently I read the post “Ceph storage pool resize image root” and found that the storage pool has a volume.size parameter.

So I ran lxc storage set ssd volume.size 40GB and then lxc init so5:ocs ocs -p ssd, and the container was created successfully.
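
In other words, roughly this sequence (pool, image, and profile names as above):

# Raise the default size for new volumes on the Ceph-backed pool
lxc storage set ssd volume.size 40GB

# Re-create the container from the published image
lxc init so5:ocs ocs -p ssd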

Perfect @muhfiasbin, please mark as solved.
Regards.

Thank you very much for your responses.

But it leaves me with a question: how does the profile come into play when resizing the block device? :slight_smile:

@muhfiasbin, if you show the lxc storage show ssd output and the profile associated with that storage, we can say more.
Every container gets its parameters and characteristics from its profiles, and you can apply more than one profile as well. I have tested this in my Ceph environment: if you don't specify volume.size, the default value is 10GB.
Another option is to unset volume.size and set the root disk size in the profile instead, as sketched below.
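
A sketch of that alternative, assuming the ssd profile and pool names used in this thread (the device-set syntax may differ slightly between LXD versions):

# Drop the pool-wide default volume size...
lxc storage unset ssd volume.size

# ...and size the root disk through the profile instead
lxc profile device set ssd root size=40GB
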
Regards.