On LXD 2.0.10, running lxc publish container/name_backup --alias name_backup gets stuck without printing any error. It does create directories under /var/lib/lxd/images/ with random numeric suffixes, such as lxd_build_884612142, with a file inside like lxd_post_516577205.
The filesystem used is ZFS and there is enough free space in the zpool.
It can take a few minutes for lxc publish to complete. Did you wait long enough for it to finish?
I suppose that if you kill it with Ctrl+C, you will get remnants like those directories.
Running lxc publish test/test --alias test --debug doesn’t seem to give any error, but it never finishes:
DBUG[11-26|[some-time]] Raw response: {"type":"sync","status":"Success","status_code":200,"operation":"","error_code":0,"error":"","metadata":{"config":{"core.https_address":"0.0.0.0:8443","core.trust_password":true,"storage.zfs_pool_name":"lxd"},"api_extensions":["id_map"],"api_status":"stable","api_version":"1.0","auth":"trusted","public":false,"environment":{"addresses":
DBUG[11-26|[some-time]] POST {"properties":null,"public":false,"source": {"name":"test/test","type":"snapshot"}} to http://unix.socket/1.0/images
DBUG[11-26|[some-time]] Raw response: {"type":"async","status":"Operation created","status_code":100,"operation":"/1.0/operations/[some-id]","error_code":0,"error":"","metadata":{"id":"[some-id]","class":"task","created_at":"[some-time]","updated_at":"[some-time]","status":"Running","status_code":103,"resources":null,"metadata":null,"may_cancel":false,"err":""}}
It could be that your container contains a very large sparse file which would cause tar to take ages to create the tarball (and eat a lot of disk space).
du -x --apparent-size -sch /
Running this in the container you’re about to publish should tell you how large a temporary tarball you’ll end up with.
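To check for the sparse-file case specifically, it may help to compare the apparent size with the space actually used on disk. A minimal sketch, assuming GNU du; size_report is a hypothetical helper name, not part of LXD:

```shell
#!/bin/sh
# Print "<apparent KiB> <on-disk KiB>" for a directory tree,
# staying on one filesystem (-x), like the du command above.
size_report() {
    dir="$1"
    # Apparent size: what tar would have to read and store.
    apparent=$(du -x --apparent-size -sk "$dir" | cut -f1)
    # On-disk size: blocks actually allocated.
    ondisk=$(du -x -sk "$dir" | cut -f1)
    echo "$apparent $ondisk"
}
```

Inside the container, size_report / prints two numbers in KiB. A first number far larger than the second suggests sparse files that would inflate the temporary tarball.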
Thanks for the advice @stgraber, but as I said before, “lxc publish” doesn’t work even for a new test container, where du -x --apparent-size -sch / reports 371MB. It also fails for all the other containers that worked until now, whatever their size: from a few megabytes up to 10-12GB, “lxc publish” gets stuck.
What else could we check to determine why “lxc publish” stopped working?
We can’t find any way to make “lxc publish” work, and we badly need it because our backups depend on it. Is there a way to reproduce by hand what “lxc publish” does?
I see that the contents of an “lxc export file” are the same as those under /var/lib/lxd/containers/, so wouldn’t it work to just compress each container folder?
Or is there any other method to back up containers individually?
What we have done until now is snapshot > image publish > export. I know snapshots are good for a fast local restore, but we need remote backups. I also tested backing up the whole /var/lib/lxd plus lxd.db, but this doesn’t suit our need to recover containers individually. However, manually running tar on /var/lib/lxd/containers/ plus sqlite3 /var/lib/lxd/lxd.db .dump > lxdbak.db, and then restoring both on another host (without containers), looks like it works. Is this a correct workaround, or could it cause problems?
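For reference, the manual workaround described above could be scripted roughly as follows. This is only a sketch of those two steps, not a verified restore procedure: backup_container and dump_lxd_db are hypothetical helper names, the paths are the LXD 2.0 defaults mentioned in this thread, and it assumes the container is stopped and its filesystem is mounted under the containers directory.

```shell
#!/bin/sh
# Archive one container directory. --numeric-owner stores numeric
# uid/gid values instead of user/group names from the host.
backup_container() {
    containers_dir="$1"   # e.g. /var/lib/lxd/containers
    name="$2"             # container name
    dest="$3"             # output tarball (use an absolute path)
    tar --numeric-owner -czf "$dest" -C "$containers_dir/$name" .
}

# Dump the LXD database as SQL text (same command as in the post above).
dump_lxd_db() {
    db="${1:-/var/lib/lxd/lxd.db}"
    out="${2:-lxdbak.db}"
    sqlite3 "$db" .dump > "$out"
}
```

For example, backup_container /var/lib/lxd/containers test /backup/test.tar.gz followed by dump_lxd_db would produce one tarball per container plus a single database dump. Whether a restore from these is consistent is exactly the open question here.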
You may want to run lxc monitor at the same time as you run lxc publish, that may give you more details on what’s going on inside LXD. You could also look for temporary files in /var/lib/lxd/images as that’s where lxc publish will be writing during publish.
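One way to act on the second suggestion is to list the temporary build directories and their sizes a few times while lxc publish runs; if the size keeps growing, the publish is probably still progressing rather than hung. A minimal sketch, assuming the lxd_build_* naming seen earlier in this thread and GNU find/du; list_publish_temp is a hypothetical helper name:

```shell
#!/bin/sh
# List publish temp directories with their size in KiB.
# Prints nothing if no lxd_build_* directory exists.
list_publish_temp() {
    images_dir="${1:-/var/lib/lxd/images}"
    find "$images_dir" -maxdepth 1 -type d -name 'lxd_build_*' \
        -exec du -sk {} +
}
```

Running lxc monitor in a second terminal and calling list_publish_temp every few seconds in a third should show whether anything is moving.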
I don’t see any errors here. I ran lxc monitor while trying to publish a new, empty, stopped container named test; lxc monitor shows similar results if I try to publish an image from a snapshot.
Thanks so much for the suggestion. A few days ago I read your post and started digging into this tool. Great discovery; let’s see if it can determine something about this weird issue.
On the other hand, if we back up the full /var/lib/lxd, or each /var/lib/lxd/ individually, it is easy to end up with an inconsistent system on restore, right? Apart from lxc publish, which is the best approach for backing up LXD on top of ZFS?
Using csysdig -pc + lstrace gives no output for a new lxc publish process, and attaching gdb to it gives:
attaching to process 17550
[New LWP 17551]
[New LWP 17552]
[New LWP 17553]
[New LWP 17554]
[New LWP 17555]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00000000004a1db3 in ?? ()
The lxc publish test --alias test process has been hanging like this for the last few days; when I send kill -9 through csysdig, the process stops as expected. But this is not the case for the hanging processes from older lxc publish runs on production containers: when I send kill -9 to them, another process just starts.