Restore container on a different host: uid and gid shifted to 1000000

Hello everyone, how are you?

I hope you can help me with this problem.
I’m an infrastructure analyst at a company that uses LXD/LXC on most of its machines, with many containers. I work in the backup/restore area, which is new to me, and I have only three years of experience as a Linux analyst.
We have two backup servers, one a mirror of the other, and since we don’t have a tape library it was decided to do the offline backup in the cloud. The script for this backup was created three years before I joined the company.

Until I joined, no one knew the condition of the backups, how to use them, or what procedures were needed to restore them.
I decided to take on this task, and for the local servers everything went smoothly.
The problem is with the cloud backup.

Note that I can download the filesystem files; however, the root ids are shifted to 1000000.
Company policy is to keep containers unprivileged for security, which is what causes this.
However, when this image is imported into LXC, most processes do not work: the host’s LXD/LXC does not recognize the id mapping, so inside the container the ids remain 1000000.
As a result, the container’s internal processes do not work properly.
I don’t know if it’s the appropriate method, but since the backup was made from the filesystem, the process I use to restore after the download is:

1- Once I have the backup.yaml and metadata.yaml, as well as the rootfs, I mount a .tar.gz file.
2- Using the import command, I import this container via the tar.gz into LXD/LXC, creating an image from it.
3- I launch this image as a container via lxc launch.

The container does start; however, all files have their uids and gids set to 1000000.

I’ve already tried this on different host nodes as well as on the same host node.

The result is always the same.

Below is the information I believe is necessary to help with troubleshooting.

# lxc version 
Client version: 5.0.3
Server version: 5.0.3

Lxc Container config

architecture: x86_64
config:
  image.architecture: amd64
  image.description: Ubuntu jammy amd64 (20230503_07:42)
  image.name: ubuntu-jammy-amd64-default-20230503_07:42
  image.os: ubuntu
  image.release: jammy
  image.serial: "20230503_07:42"
  image.variant: default
  security.privileged: "false"
  volatile.base_image: 156f8cf4f381f067fca3d29c766222b5c75680b89c6d73e5f5e70a6128df2e15
  volatile.cloud-init.instance-id: 268b2e3d-559a-4dec-a63f-538ad8ffa6d0
  volatile.eth0.host_name: vethf81e05d0
  volatile.eth0.hwaddr: 00:16:3e:53:55:fe
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,">
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Map>
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":>
  volatile.last_state.power: RUNNING
  volatile.uuid: 1cc88a9f-3efe-45c7-9ecf-1cbb3bc2a7df
  volatile.uuid.generation: 1cc88a9f-3efe-45c7-9ecf-1cbb3bc2a7df
devices: {}
ephemeral: false
profiles:
- default
stateful: false
description: ""
created_at: 2024-04-04T14:43:00.51204393Z
name: zabbix-restore-test00
status: Running
status_code: 103
last_used_at: 2024-05-23T14:07:02.87278368Z
location: none
type: container
project: default

Here is a screenshot from inside the container (2024-05-23_16-25).

And here are /etc/subuid and /etc/subgid from the filesystem downloaded from the cloud:

:~# cat /etc/subuid
ubuntu:100000:65536
linuxuser:165536:65536

:~# cat /etc/subgid
ubuntu:100000:65536
linuxuser:165536:65536

If you guys need more information let me know.

Hope you can help me with this.

Oh dear. A backup which is not test-restored is no backup at all :-)

Questions:

  1. What do you mean by “I mount a .tar.gz file”? How was this .tar.gz created in the first place?
  2. What does tar -tvzf blah.tar.gz show for the uids of the files which should be root owned?
  3. How are the backups being made? Are you just backing up the whole underlying server/VM, and yet trying to restore individual containers from that whole-machine backup?
  4. If you create a fresh unprivileged container on the restore machine, does it work? i.e. root appears as uid 0 inside the container, but uid 1000000 outside? (If uid mapping is broken on the restore machine, you need to fix that first)

Aside: if you had backed up containers with lxc export, then lxc import should work directly. Going via an intermediate step of creating an image then launching it is a bit painful.
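For reference, that round-trip is just the following (instance and file names hypothetical):

```shell
# export writes a tarball with uids unshifted, ready to move between hosts
lxc export mycontainer mycontainer-backup.tar.gz
# copy the tarball to the target host, then recreate the instance directly
lxc import mycontainer-backup.tar.gz
```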

Assuming uid mapping is working, an approach I know works is to restore the entire container directory (which includes rootfs, backup.yaml and metadata.yaml) directly into the container storage area on the filesystem, then use incus admin recover to discover it and recreate the instance.

I don’t use lxd any more; I migrated to incus, so I can’t say whether this will work with lxd (especially the old LTS version you are using).

If you can’t find a clean way to restore, you might need to mess with changing the uids of files. I have a python script somewhere that I used to fix up a broken container, I will dig it out if necessary.

hi Brian, Here are the answers for your question.

Questions:

Q- What do you mean by “I mount a .tar.gz file”? How was this .tar.gz created in the first place?
R- First I download the filesystem from the cloud to the server on which I intend to start the container, then I simply run tar -czvf .tar.gz * in the directory where the rootfs is.

Q-What does tar -tvzf blah.tar.gz show for the uids of the files which should be root owned?
R- I noticed that when I created the backup all the uids were already shifted to 1000000, so when I download and compress them the uids are 1000000.
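This part is easy to reproduce without root, since tar simply records whatever numeric ids are on disk. The --owner/--group flags below only simulate a file already shifted to 1000000:

```shell
mkdir -p demo && touch demo/etc-passwd
# store the entries as if owned by the shifted root (1000000)
tar -cf demo.tar --owner=1000000 --group=1000000 demo
tar -tvf demo.tar    # owner column shows 1000000/1000000
```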

Q- How are the backups being made? Are you just backing up the whole underlying server/VM, and yet trying to restore individual containers from that whole-machine backup?
R- Well, this is the backup process:
1- A snapshot is taken of the container’s dataset via ZFS snapshot.
2- This snapshot is mounted using zfs mount and accessed to make a copy of the filesystem.
3- A direct copy of the container’s filesystem is made to the cloud and stored there.
4- This one is not about backup but about restore: when you download what was copied to the cloud, it looks like the output below.

The main steps of this process are shown below.

Q- If you create a fresh unprivileged container on the restore machine, does it work? i.e. root appears as uid 0 inside the container, but uid 1000000 outside? (If uid mapping is broken on the restore machine, you need to fix that first)
R- LXD/LXC is working perfectly. I am able to create new containers, both privileged and unprivileged. I also did a test creating and restoring a privileged container, and it works (just for information). When I create a new container, root inside the container appears as root.

These are the steps for copying the files. Note that the container’s files show their uids and gids as root in the screenshot above.

root@sfosj0001-vm-d-backuptest-01:/mnt/zpool1# lxc snapshot ubuntu-caontainer-test test-curret
root@sfosj0001-vm-d-backuptest-01:/mnt/zpool1# mount -t zfs zpool1/containers/ubuntu-caontainer-test@snapshot-test-curret /mnt/
root@sfosj0001-vm-d-backuptest-01:/mnt/zpool1# cd /mnt/test01/
root@sfosj0001-vm-d-backuptest-01:/mnt/test01# ls -l 
total 8
-r--------  1 root    root    3639 May 24 11:17 backup.yaml
-rw-r--r--  1 root    root     295 May 14 02:33 metadata.yaml
drwxr-xr-x 18 1000000 1000000   24 May 14 02:16 rootfs
drwxr-xr-x  2 root    root       3 May 14 02:33 templates
root@sfosj0001-vm-d-backuptest-01:/mnt/test01# 

We use lxc import and export for the on-premises backups, and for that task they work perfectly.

I just need to understand the trick to make LXC see the new container and shift the uids back to root “internally” when it is imported on the new host node.

OK. So it looks to me like the problem is basically that you’re using different processes to back up and restore.

You backed up just by copying the underlying ZFS filesystem (where the uids are shifted for unprivileged containers), but you’re trying to restore using lxc import, which assumes the uids in the tarball are correct, as they would be if you’d used lxc export.

I think the simplest approach is:

  • Restore the filesystem into a new zfs dataset in the new location
  • Use whatever the lxd equivalent is to incus admin recover to pick it up

Find out what your storage layout is. For example, here is one running container I have:

# mount | grep nfsen
zfs/lxd/containers/nfsen on /var/lib/incus/storage-pools/default/containers/nfsen type zfs (rw,relatime,xattr,posixacl,casesensitive)
# ls /var/lib/incus/storage-pools/default/containers/nfsen
backup.yaml  metadata.yaml  rootfs  templates

To restore “foobar”, I would create a new dataset zfs/lxd/containers/foobar and underneath it I would put the same directory structure, with backup.yaml and metadata.yaml and rootfs. Note that the dataset doesn’t have to be mounted (in fact, probably shouldn’t be mounted)

Then, incus admin recover will scan the dataset, find this newly-created container and offer to restore it (i.e. add it to the database of known containers). Since the uids are already shifted, and the container is defined as an unprivileged one, everything should be correct.

I would guess something like this works for lxd too.
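For what it’s worth, lxd does appear to have an equivalent: lxd recover (introduced in 4.17, I believe, so it should be in your 5.0.3). A rough outline, with hypothetical pool/dataset names:

```shell
# on the target host: recreate the dataset and copy the backup contents in
zfs create zpool1/containers/foobar
zfs set mountpoint=/mnt/restore zpool1/containers/foobar
cp -a backup.yaml metadata.yaml rootfs templates /mnt/restore/
zfs inherit mountpoint zpool1/containers/foobar   # hand it back unmounted
lxd recover   # interactive: scans the pool and offers to re-add the instance
```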

Hello Brian.

First of all, thank you for your patience and quick and accurate responses.

After your last answer, where you made it clear that it is very difficult to back up one way and restore another, I realized I was overcomplicating the process. So I went back to the script that performed the backup tasks and desk-checked it, mapping each step of the process, and then applied the steps in reverse to complete the restore.
INCREDIBLE, it worked.
I did this test with two other containers that gave the same error, and I managed to get them working too.
Thanks again.
Now I have the restore process defined and working.
Hugs.