mar (Madeline Autumn-Rose) | February 12, 2020, 8:49pm | #1
[apologies if my markdown is wrong]
I am using lxd 3.20 from snap on ubuntu 19.10
I had Ceph working on a standalone node, but attempting to replicate the config on a cluster isn't working, even when specifying, via --target, the same node that worked standalone. I have copied the Ceph keys and config to all nodes (a rough sketch of that step is included after the commands below):
lxc storage create default ceph --target aa1-cptef101-n1 source=rbd-lxc-aa0.a1f
lxc storage create default ceph --target aa1-cptef101-n2 source=rbd-lxc-aa0.a1f
lxc storage create default ceph --target aa1-cptef101-n3 source=rbd-lxc-aa0.a1f
lxc storage create default ceph --target aa1-cptef101-n4 source=rbd-lxc-aa0.a1f
lxc storage create default ceph --target aa1-cptef102-n1 source=rbd-lxc-aa0.a1f
lxc storage create default ceph --target aa1-cptef102-n2 source=rbd-lxc-aa0.a1f
lxc storage create default ceph --target aa1-cptef102-n3 source=rbd-lxc-aa0.a1f
lxc storage create default ceph --target aa1-cptef102-n4 source=rbd-lxc-aa0.a1f
lxc storage create default ceph ceph.user.name=lxd ceph.cluster_name=ceph
lxc storage create default ceph ceph.user.name=lxd ceph.cluster_name=ceph ceph.osd.force_reuse=true
lxc profile device add default root disk path=/ pool=default
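(For reference, the key/config copy mentioned above looks roughly like the following; the /etc/ceph paths and the ceph.client.lxd.keyring filename are assumptions based on ceph.user.name=lxd, so adjust for your setup.)

# Rough sketch: push the cluster config and the LXD client keyring to every node
for node in aa1-cptef101-n1 aa1-cptef101-n2 aa1-cptef101-n3 aa1-cptef101-n4 \
            aa1-cptef102-n1 aa1-cptef102-n2 aa1-cptef102-n3 aa1-cptef102-n4; do
    scp /etc/ceph/ceph.conf "${node}:/etc/ceph/ceph.conf"
    scp /etc/ceph/ceph.client.lxd.keyring "${node}:/etc/ceph/ceph.client.lxd.keyring"
done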
And the storage config as it appears:
config:
  ceph.cluster_name: ceph
  ceph.osd.force_reuse: "true"
  ceph.osd.pg_num: "32"
  ceph.osd.pool_name: rbd-lxc-aa0.a1f
  ceph.user.name: lxd
  volatile.pool.pristine: "false"
description: ""
name: default
driver: ceph
used_by:
- /1.0/containers/lasting-bengal
- /1.0/profiles/default
status: Created
locations:
- aa1-cptef101-n2
- aa1-cptef101-n3
- aa1-cptef101-n4
- aa1-cptef102-n1
- aa1-cptef102-n2
- aa1-cptef102-n3
- aa1-cptef102-n4
- aa1-cptef101-n1
Creating a container will either hang forever or return Error: Failed instance creation: Create instance from image: No such object
Is there a way to get debug info on what it's doing to Ceph? lxd -d says LXD is already running, despite an lxd shutdown.
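(Side note: since this is the snap, debug logging can apparently also be enabled through snap options rather than running lxd -d in the foreground; the daemon.debug option and log path below are from the snap documentation, so double-check them against your version.)

sudo snap set lxd daemon.debug=true
sudo systemctl reload snap.lxd.daemon
tail -f /var/snap/lxd/common/lxd/logs/lxd.log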
stgraber (Stéphane Graber) | February 12, 2020, 9:41pm | #2
lxc monitor --type=logging --pretty
in a separate shell on the same system that you’re trying to create the container on should help.
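(In practice that means something like the following, with the watcher in one shell and the failing creation triggered from another; the instance name here is just a placeholder.)

# Shell 1: stream the daemon's log messages
lxc monitor --type=logging --pretty

# Shell 2: reproduce the failure
lxc launch ubuntu:18.04 debug-test --target aa1-cptef101-n1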
mar (Madeline Autumn-Rose) | February 12, 2020, 11:03pm | #3
DBUG[02-12|22:57:47] Handling ip=10.224.1.11:49072 method=GET url="/1.0/operations/b0f9931b-3c49-4c59-a98a-eceb703a184c?target=aa1-cptef101-n2" user=
DBUG[02-12|22:57:47] Image already exists in the db image=9e7158fc0683d41f7f692ce8b17598716d7eee925c6a593432df59488bf4131f
INFO[02-12|22:57:47] Creating container ephemeral=false name=fond-lioness project=default
INFO[02-12|22:57:47] Created container ephemeral=false name=fond-lioness project=default
DBUG[02-12|22:57:47] Creating RBD storage volume for container "fond-lioness" on storage pool "default"
INFO[02-12|22:57:47] Deleting container name=fond-lioness project=default used="1970-01-01 00:00:00 +0000 UTC" created="2020-02-12 22:57:47.593203723 +0000 UTC" ephemeral=false
DBUG[02-12|22:57:47] Failure for task operation: b0f9931b-3c49-4c59-a98a-eceb703a184c: Create instance from image: No such object
INFO[02-12|22:57:47] Deleted container project=default used="1970-01-01 00:00:00 +0000 UTC" created="2020-02-12 22:57:47.593203723 +0000 UTC" ephemeral=false name=fond-lioness
Hmm. Not as much debug info as I would’ve wanted.
stgraber (Stéphane Graber) | February 12, 2020, 11:49pm | #4
Yeah, this is a bit light on debug information
Can you show:
lxd sql global "SELECT * FROM nodes;"
lxd sql global "SELECT * FROM images;"
lxd sql global "SELECT * FROM images_nodes;"
lxd sql global "SELECT * FROM storage_volumes WHERE type=1;"
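(It may also be worth dumping the pool rows themselves; the table names below are a guess based on LXD's usual global schema, so adjust if they don't exist.)

lxd sql global "SELECT * FROM storage_pools;"
lxd sql global "SELECT * FROM storage_pools_config;"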
mar (Madeline Autumn-Rose) | February 13, 2020, 12:00am | #5
Absolutely! Happy to provide as much information as possible.
nodes:
+----+-----------------+-------------+------------------+--------+----------------+-------------------------------------+---------+------+
| id | name            | description | address          | schema | api_extensions | heartbeat                           | pending | arch |
+----+-----------------+-------------+------------------+--------+----------------+-------------------------------------+---------+------+
| 2  | aa1-cptef101-n2 |             | 10.224.1.12:8443 | 24     | 165            | 2020-02-12T15:55:40.063634275-08:00 | 0       | 2    |
| 3  | aa1-cptef101-n3 |             | 10.224.1.13:8443 | 24     | 165            | 2020-02-12T15:55:40.063043196-08:00 | 0       | 2    |
| 4  | aa1-cptef101-n4 |             | 10.224.1.14:8443 | 24     | 165            | 2020-02-12T15:55:40.063180046-08:00 | 0       | 2    |
| 5  | aa1-cptef102-n1 |             | 10.224.1.21:8443 | 24     | 165            | 2020-02-12T15:55:40.063274336-08:00 | 0       | 2    |
| 6  | aa1-cptef102-n2 |             | 10.224.1.22:8443 | 24     | 165            | 2020-02-12T15:55:40.063348426-08:00 | 0       | 2    |
| 7  | aa1-cptef102-n3 |             | 10.224.1.23:8443 | 24     | 165            | 2020-02-12T15:55:40.063420615-08:00 | 0       | 2    |
| 8  | aa1-cptef102-n4 |             | 10.224.1.24:8443 | 24     | 165            | 2020-02-12T15:55:40.063491265-08:00 | 0       | 2    |
| 9  | aa1-cptef101-n1 |             | 10.224.1.11:8443 | 24     | 165            | 2020-02-12T15:55:40.063562885-08:00 | 0       | 2    |
+----+-----------------+-------------+------------------+--------+----------------+-------------------------------------+---------+------+
images:
+----+------------------------------------------------------------------+-----------------------------------------------+----------------+--------+--------------+---------------------------+---------------------------+-------------------------------------+--------+-------------------------------------+-------------+------------+------+
| id | fingerprint                                                      | filename                                      | size           | public | architecture | creation_date             | expiry_date               | upload_date                         | cached | last_use_date                       | auto_update | project_id | type |
+----+------------------------------------------------------------------+-----------------------------------------------+----------------+--------+--------------+---------------------------+---------------------------+-------------------------------------+--------+-------------------------------------+-------------+------------+------+
| 1  | 9e7158fc0683d41f7f692ce8b17598716d7eee925c6a593432df59488bf4131f | ubuntu-18.04-server-cloudimg-amd64-lxd.tar.xz | 1.87413264e+08 | 0      | 2            | 2020-01-28T16:00:00-08:00 | 2023-04-25T17:00:00-07:00 | 2020-02-12T12:13:40.562369879-08:00 | 1      | 2020-02-12T15:05:22.031966555-08:00 | 1           | 1          | 0    |
+----+------------------------------------------------------------------+-----------------------------------------------+----------------+--------+--------------+---------------------------+---------------------------+-------------------------------------+--------+-------------------------------------+-------------+------------+------+
images_nodes:
+----+----------+---------+
| id | image_id | node_id |
+----+----------+---------+
| 1 | 1 | 2 |
| 2 | 1 | 3 |
| 3 | 1 | 4 |
| 4 | 1 | 9 |
+----+----------+---------+
storage_volumes: (I have a feeling this shouldn’t be blank?)
+----+------+-----------------+---------+------+-------------+----------+------------+
| id | name | storage_pool_id | node_id | type | description | snapshot | project_id |
+----+------+-----------------+---------+------+-------------+----------+------------+
+----+------+-----------------+---------+------+-------------+----------+------------+
stgraber (Stéphane Graber) | February 13, 2020, 12:10am | #6
Ok, so the above suggests that:
- You have a single image in the image store
- The image hasn't been loaded onto CEPH yet (rbd ls --pool RBD-POOL would confirm)
- The image is physically present (in /var/snap/lxd/common/lxd/images/) on 4 of the cluster nodes:
  - 101-n2
  - 101-n3
  - 101-n4
  - 101-n1
The error seems to suggest that LXD thinks the image is available on CEPH already, despite the database indicating that it shouldn't be there yet, so that's all a bit confusing.
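(One way to cross-check both points, reusing the pool name and image path from this thread:)

# On any Ceph client: is the image volume already in the pool?
rbd ls --pool rbd-lxc-aa0.a1f | grep 9e7158fc
# On each LXD node: is the image tarball present on disk?
ls -lh /var/snap/lxd/common/lxd/images/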
mar (Madeline Autumn-Rose) | February 13, 2020, 12:12am | #7
Here’s the weird bit:
root@ceph-operator-101:~# rbd ls --pool rbd-lxc-aa0.a1f
container_pure-malamute
image_9e7158fc0683d41f7f692ce8b17598716d7eee925c6a593432df59488bf4131f
lxd_rbd-lxc-aa0.a1f
It does appear to be there, unless I am reading the ID wrong
stgraber (Stéphane Graber) | February 13, 2020, 3:31am | #8
Indeed, it sure looks like it's there. This is a bit confusing. I wonder if that's part of the issue.
Can you try moving it aside, see if that helps?
rbd mv --pool rbd-lxc-aa0.a1f --image image_9e7158fc0683d41f7f692ce8b17598716d7eee925c6a593432df59488bf4131f image_9e7158fc0683d41f7f692ce8b17598716d7eee925c6a593432df59488bf4131f.bak
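(As far as I know, rbd mv is just an alias for rbd rename, so once testing is done the .bak image can either be renamed back or dropped; both commands below reuse the pool and image names from above.)

# Rename it back:
rbd rename --pool rbd-lxc-aa0.a1f image_9e7158fc0683d41f7f692ce8b17598716d7eee925c6a593432df59488bf4131f.bak image_9e7158fc0683d41f7f692ce8b17598716d7eee925c6a593432df59488bf4131f
# ...or remove the copy entirely once it is no longer needed:
rbd rm --pool rbd-lxc-aa0.a1f image_9e7158fc0683d41f7f692ce8b17598716d7eee925c6a593432df59488bf4131f.bak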
mar (Madeline Autumn-Rose) | February 13, 2020, 6:07pm | #9
Done - time to try launching again, or is there another intermediate thing to do for testing first?
stgraber (Stéphane Graber) | February 13, 2020, 9:37pm | #10
Nope, I’d just try again now and see if you get a different result
mar (Madeline Autumn-Rose) | February 13, 2020, 11:37pm | #11
No change; the command seems to hang. I let it run for about 30 minutes with no output past "Creating the instance".
HOWEVER, if I choose a different image:
root@aa1-cptef101-n4:/home/ubuntu# time lxc launch ubuntu:19.10
Creating the instance
Instance name is: viable-hermit
The instance you are starting doesn't have any network attached to it.
To create a new network, use: lxc network create
To attach a network to an instance, use: lxc network attach
Starting viable-hermit
real 0m30.300s
It works
mar (Madeline Autumn-Rose) | February 14, 2020, 12:10am | #12
Repeatable, too:
Deleting the image with lxc image rm fixes things, and I can launch 18.04 now. Weird.
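(For anyone landing here later, the workaround boils down to something like the following; the instance name and --target are placeholders, and lxc image delete is, as far as I know, the long form of the rm alias used above. LXD then re-downloads the image and unpacks a fresh image_<fingerprint> volume onto the Ceph pool itself.)

# Drop the stale cached image from LXD's image store
lxc image delete 9e7158fc0683d41f7f692ce8b17598716d7eee925c6a593432df59488bf4131f
# Launch again; the image is re-imported cleanly this time
lxc launch ubuntu:18.04 test-1804 --target aa1-cptef101-n1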