Back up specific files in a container from the host

I’m wondering what the best approach is to regularly back up specific files from a container.

Intuitively, I would do something like a “container mount” so that I can directly back up the files needed. I’m not sure how to do this, though.

lxc file mount does not seem to be the solution, or at least I don’t know how to run it in a script. The same goes for mounting the lxd namespace using nsenter. In both cases, a separate shell is needed to access the files.
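For illustration (instance and paths hypothetical), the command blocks in the foreground until interrupted, so the files are only reachable from a second shell:

lxc file mount c1/var/x1 /mnt/c1-x1
# blocks here; /mnt/c1-x1 is only accessible from another shell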

So, I guess I could run lxc file pull as an alternative, but this generates quite a bit of overhead, because every file is copied twice: once for the pull, then again for the actual backup.
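For reference, a recursive pull would look something like this (paths hypothetical):

mkdir -p /tmp/c1-backup
lxc file pull --recursive c1/var/x1 /tmp/c1-backup/
# /tmp/c1-backup then has to be backed up and cleaned up afterwards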

Seeing as how I use lvm for my storage pool, the better option would probably be to just create a snapshot and then mount the snapshot volume to access the files this way. But that wouldn’t be a universal LXD solution.
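For what it’s worth, that would look roughly like this (VG and volume names are hypothetical and depend on the pool layout):

# Snapshot the container’s logical volume, mount it read-only, back it up, clean up
lvcreate --snapshot --name c1-backup --size 1G /dev/lxdvg/containers_c1
mount -o ro /dev/lxdvg/c1-backup /mnt/c1-backup
# ... run the backup against /mnt/c1-backup ...
umount /mnt/c1-backup
lvremove -y /dev/lxdvg/c1-backup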

What would you recommend?

Another way I just thought of is to mount a host volume/directory in the container, copy over the contents to be backed up, and then run the backup routine from the host. But that still means the additional step of copying the data once more than really needed, plus having to clean up after it.

So, I guess I could mount the directory in such a way that the data that I want to backup is actually held on the host volume all the time. This way, I don’t have the additional step of copying data and clean-up. But since the data is already present in the potential mount points, some data shifting will be needed, making this a bit more complex than I hoped it would be.

What is the reason you are trying to do this?

Not sure I understand the question but I’ll elaborate.

So I have a container that is fulfilling some function; for example, it’s a CRM or an ERP. There is a data part (the database and a file archive) as well as an applications part.

The applications part is pretty much static so there is no real need to back it up every day. The data part changes all the time, though, so I want to make sure I have it backed up several times per day.

I could do this from within the container, of course, but it would be much cleaner and easier if I can run the backup from the host. But running lxc export multiple times per day is just not efficient, and deduplication of the backup becomes more difficult, I believe, with a compressed file.

I hope this makes sense. So, I’m just trying to figure out what more experienced people than me are doing to back up data within a container. And if you believe I should configure my containers differently in the first place, I’m all ears for advice.

Thanks, this is useful to understand. We often get questions asking how to do something particular (in this case, back up selective files) without the why of the problem having been explained.

Now that you’ve explained why, it makes total sense, thanks.

What I would suggest is using a custom volume for the data part and then attaching that to your container, so it appears inside your container but is stored elsewhere.

This can be on the same storage pool as the container, or a different one, it doesn’t matter.

Once that is in place, you can then back up the storage volume separately from the container.
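A minimal sketch, assuming the default pool and hypothetical names:

# Create a custom volume and attach it inside the container
lxc storage volume create default c1-data
lxc storage volume attach default c1-data c1 /var/x1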

Thanks for your recommendation, @tomp.

I looked into the storage volume story yesterday but I’m not sure it solves my main issues.

  1. My understanding is that a volume of type filesystem can only be attached to one specific path in the container. If, for example, I have data in paths /var/x1 and /var/x2 in the container, I would have to work with two volumes. That will get messy very quickly. As an alternative, I guess I could work with a block volume and then, from within the container, mount folders on that volume to various paths within the system. The main downside here is that a block volume should only be mounted in one container, so I could not use the same volume to hold data for various containers at the same time.

  2. I again have the problem that I cannot mount the volume on the host, meaning I can’t access the directory structure directly. Instead, according to your link, I should run an export first and then back up the exported archive. Yes, this helps me avoid copying unnecessary application files, but I still have a copy process instead of simply having my backup software access the files directly. And because the export process creates a compressed archive, neither deduplication nor incremental backup will work properly, so I assume the backup will take up more space than necessary.
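The export step I’m referring to would be something like this (names hypothetical):

lxc storage volume export default c1-data /backups/c1-data.tar.gz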

Does that make sense or am I overlooking something?

In that case lxc file [pull|mount] will be the only “supported” way of accessing files.

See How to access files in an instance - LXD documentation

Because LXD provides its services over a (potentially remote) API, we do not offer any supported way of mounting the local volumes on the local system.

Of course you are free to do so anyway (with the caveat that the specific underlying volume names may change in the future, although that’s unlikely) :wink:

LXD does have the concept of doing a --refresh copy of an instance to another LXD server too.
Combined with snapshots, this can make periodically taking an off-machine backup more efficient, as only the differences will be copied.
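Roughly (remote name hypothetical):

# Initial full copy to another LXD server
lxc copy c1 backup-server:c1
# Subsequent runs only transfer the differences
lxc copy c1 backup-server:c1 --refresh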

That makes sense…

Is there any way you know of to use “lxc file mount” and then access the files right away, i.e. from a script? If yes, that would solve all my issues, I believe…

And yes, I guess that in my specific use case, I may end up doing just that. Not very “LXD-y” but it should work.

TBH, I haven’t worked with remote LXD servers yet. I agree it’s an elegant solution but more costly and complex.

By the way, any reason you’re not in favor of mounting local directories read-write to the container?

I don’t follow; what do you mean by that approach?

You can run that command in the background by starting it with & and capturing the pid with $!, then killing it when you’re done.

Something like:

#!/bin/bash
set -ex

# Mount the container's root filesystem in the background and record the PID
lxc file mount c1/ /home/user/foo &
pid=$!
# Wait for the mount to appear
sleep 1
ls /home/user/foo
# Unmount by stopping the background lxc process
kill $pid

I mean, instead of creating an LXD volume and mounting it in the container, mount a directory from the host, e.g.

lxc config device add c1 foo disk source=/home path=/home

The only thing here is that you have to make sure the uid and gid mappings work out…
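For reference, newer LXD versions can handle that shifting for you via the disk device’s shift option (a sketch; availability depends on kernel support):

lxc config device add c1 foo disk source=/home path=/home shift=true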

Sure you could do that too.
But that would get as unwieldy as custom volumes if you have lots of sub-directories (which it sounds like you do).

Oh, I believe that may be the solution here… I’ll try this out and report back, thanks!


Not sure it would. The way I look at this, I could create directories on the host for each container, so something like:

/cdata/c1
/cdata/c2
/cdata/c3

Then, within each of these directories, I’d prepare subdirectories that could be put in the right place, for example

/cdata/c1/data to be mounted in c1-rootfs/var/x1, and so on.

So, I may end up with multiple devices of type disk in the container, but they are all under /cdata on the host, so backing up /cdata covers the data of all my containers.
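For example (device names hypothetical):

lxc config device add c1 x1 disk source=/cdata/c1/data path=/var/x1
lxc config device add c1 x2 disk source=/cdata/c1/archive path=/var/x2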

Again, do let me know if you see problems in this approach. :slight_smile:

I’ve updated my post with an example script.


Yes, I see. You could achieve the same effect by using a dir storage pool for each container’s custom volumes with a custom source=/cdata/c1; all of the custom volumes would then be created inside that directory.
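Roughly (pool and volume names hypothetical):

# One dir pool per container, rooted under /cdata
lxc storage create c1-data dir source=/cdata/c1
lxc storage volume create c1-data x1
lxc storage volume attach c1-data x1 c1 /var/x1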


So, I changed my backup script, adding your lines to it, and it all works like a charm. :smiley: And it’s perfect for my use case, so much cleaner and easier than all the other options we discussed. Thanks a lot for your support, @tomp


Excellent! :slight_smile:

I realized one major downside of the lxc file mount solution: unless you shut down the container before running the command (which works fine, by the way), you may end up accessing data that is currently in use and being changed.

When I used to work directly at the storage-driver level, i.e. with lvm, I always worked with snapshots, so I could leave the instance running and still access every file safely. Quite convenient.

Seeing as lxc file mount works on a shut-down system, is there a way to get it to work on a snapshot? If not, is this a realistic feature request, or would that be too complex?