Why is the owner of most files in the container 1000000?

Hi, LXD is very useful, but I have run into some problems recently.
Sometimes LXD creates bad containers in which most files are owned by 1000000.
I also cannot exec commands from the host, such as lxc shell xxx.

In a normal container, most files are owned by root.

Has anyone had the same problem?

lxc shell is an alias; it is defined as exec @ARGS@ -- su -l. So lxc shell <instance> ls / translates to lxc exec <instance> ls / -- su -l.

Instead, try lxc exec <instance> -- ls /.
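For the curious, client-side aliases use the same expansion mechanism and can be listed or defined with the lxc alias subcommand. A small sketch (the alias name lsroot is made up purely for illustration):

lxc alias add lsroot "exec @ARGS@ -- ls /"   # @ARGS@ marks where the instance name is substituted
lxc lsroot <instance>                        # expands to: lxc exec <instance> -- ls /
lxc alias list                               # lists user-defined aliases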

Thank you for your reply.

My LXD cluster sometimes creates bad instances. I cannot open a shell on these instances with "lxc shell $instance_name". When I run "lxc shell $instance_name ls /", I see that most files in the container are owned by 1000000.
I cannot figure out why the file owner is 1000000.

Can you please try the same commands using exec instead of shell, e.g.

lxc exec shpc-1106-instance-idpgYnLq -- whoami
lxc exec shpc-1106-instance-idpgYnLq -- sh
lxc exec shpc-1106-instance-idpgYnLq -- ls /

and post the result. Thanks.

Sure. I can run those exec commands from the host successfully.

But when I log on to the instance as a regular user (not root), I get a warning because many important files are owned by 1000000. How can I fix this problem?

Thank you for your reply.

Hi @libinkai, LXD performs idmapping for security reasons, but it is not clear why these files are owned by 1000000. Your instance already contains a user; is this a custom image? If so, it is likely that this is due to a misconfiguration of the image rather than an issue with LXD itself.

Can you please post steps to reproduce this issue, including the version of LXD you are using, the image you are using here, and its configuration? Thank you.
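To see how the idmap is being applied, a couple of read-only checks from inside the instance may also help (a diagnostic sketch using the instance name from earlier in this thread):

lxc exec shpc-1106-instance-idpgYnLq -- stat -c '%u %g %n' / /bin /etc   # a healthy instance should report 0 0 (root) here
lxc exec shpc-1106-instance-idpgYnLq -- cat /proc/self/uid_map           # shows the UID range mapped into the container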

Please can you show the output of lxc config show <instance> --expanded?

Here is the output of lxc config show <instance> --expanded, thanks!

xiyou@lxd1:~$ lxc config show shpc-1106-instance-idpgYnLq --expanded
architecture: x86_64
config:
  cloud-init.network-config: |
    #cloud-config
    version: 2
    ethernets:
      eth0:
        dhcp4: 'no'
        addresses: [172.16.10.153/16]
        nameservers:
          addresses: [172.16.0.1, 223.5.5.5, 114.114.114.114, 8.8.8.8]
        gateway4: 172.16.0.1
  cloud-init.user-data: |
    #cloud-config
    runcmd:
      - [mkdir, -p, /home/txb/jupyter_home]
      - [chmod, '777', /home/txb/jupyter_home]
      - [docker, run, --name, base-jupyter-notebook, -d, --restart=always, -p, '8888:8888',
        -v, '/home/txb/jupyter_home:/home/jovyan', 'jupyter/base-notebook:ubuntu-22.04',
        start-notebook.sh, '--NotebookApp.password=''sha1:x9rQgX6qCAybC1ND:94333e0668c319ee8c13acf3feb8b231a116bfee''']
      - [sh, -c, echo $(openssl rand -hex 16) > /var/lib/rstudio-server/secure-cookie-key]
      - [sh, -c, service rstudio-server restart]
    apt:
      primary:
        - uri: https://mirrors.aliyun.com/ubuntu/
          arches: [default]
    timezone: Asia/Shanghai
    ssh_pwauth: 'yes'
    packages: [openssh-server]
    users:
      - lock_passwd: false
        shell: /bin/bash
        passwd: $6$FkVfhFgV6Ota$y.QcYb1Y7E4KBUAIeQoZPotYZTJT33GpgpyX6QNT09DfDYieru5omhvcHZjuHogszMgoJ2qwX3U1Hc.CVEz2N/
        name: txb
        groups: [sudo]
  image.architecture: amd64
  image.description: Ubuntu focal amd64 (cloud) (20221119_07:42)
  image.name: ubuntu-focal-amd64-cloud-20221119_07:42
  image.os: ubuntu
  image.release: focal
  image.serial: "20221119_07:42"
  image.variant: cloud
  limits.cpu: "20"
  limits.memory: 33GB
  security.nesting: "true"
  volatile.base_image: c92c964c1f4682ed741a8d669ff9987f3ffa5087f15cbbc2dde9836fa64bb2dc
  volatile.cloud-init.instance-id: 06bd3672-cfe9-4472-9f2a-7b5066c4ccfa
  volatile.eth0.host_name: veth6f48edd8
  volatile.eth0.hwaddr: 00:16:3e:47:78:8d
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING
  volatile.uuid: 6ea5cc39-77a8-4aad-89bd-015987965020
devices:
  eth0:
    ipv4.address: 172.16.10.153
    name: eth0
    nictype: bridged
    parent: bridge0
    security.ipv4_filtering: "true"
    security.mac_filtering: "true"
    security.port_isolation: "true"
    type: nic
  home:
    path: /home
    pool: remote
    source: custom-volume-of-1106-50-jRO8yaRf
    type: disk
  root:
    path: /
    pool: remote
    size: 69GB
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: Instance of txb

Yes, I use a custom image.
My LXD version is 5.10.
I built the custom image based on images:ubuntu/20.04/cloud: I installed docker and rstudio on a base instance and then published it as the base image (see the sketch below).
I created about 100 instances from this image in my LXD cluster and got about 10 bad instances. The others work very well. I cannot find steps to reproduce this issue, because I don't know when I will get a bad instance. :sob:
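For context, the publishing step was roughly like this (the instance and alias names here are placeholders, not the exact ones I used):

lxc launch images:ubuntu/20.04/cloud base-instance                                 # start from the cloud variant
lxc exec base-instance -- sh -c 'apt-get update && apt-get install -y docker.io'   # plus rstudio and the R packages
lxc stop base-instance
lxc publish base-instance --alias my-custom-image                                  # turn the instance into a reusable image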

I really need help with this issue :sob:

It looks like the shifting process got interrupted partway at instance launch time.
Did you have any indication of LXD crashing or getting restarted for some reason?
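You could check for signs of that with something like this (assuming LXD is installed from the snap; adjust the journal unit name for other install methods):

lxc info shpc-1106-instance-idpgYnLq --show-log                      # per-instance log kept by LXD
sudo journalctl -u snap.lxd.daemon | grep -iE 'shift|idmap|error'    # daemon-level messages on the host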

The LXD cluster was running smoothly when I created the instances.

My LXD cluster uses Ceph OSDs as the storage backend, and my custom image is 3.4 GB. Could that cause this kind of problem?

My custom image contains many R language packages, so it is larger than the default image.

What kernel version are you on?

The shifting logic shouldn't actually be used at all when on ceph, provided you're running something reasonably recent kernel-wise. So it could be that you can massively improve your launch time and reliability with a simple kernel update.
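A quick way to confirm the kernel version and whether LXD can use idmapped mounts instead of shifting (the kernel_features fields below are what LXD 5.x reports; treat the grep as a rough sketch):

uname -r                                 # running kernel version
lxc info | grep -A6 kernel_features      # look for idmapped_mounts: "true"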

My operating system version: Ubuntu 20.04.5 LTS
My kernel version: Linux 5.4.0-147-generic
Architecture: x86-64

I’d recommend you install linux-generic-hwe-20.04, which will get you the 5.15 kernel and remove the need for the slow shifting logic.
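For reference, the usual Ubuntu HWE install procedure (a reboot is needed for the new kernel to take effect):

sudo apt update
sudo apt install --install-recommends linux-generic-hwe-20.04
sudo reboot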

OK, I will try it. Thank you very very much!

Hi, today I scanned my LXD cluster with a script and found that about 10% of my instances are bad.
Is there any way to fix these errors? I cannot simply delete them, because some instances are already in use.

I also found that good instances (where the files under / are owned by root) can turn into bad instances at some point. Can I fix them manually?
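For reference, a minimal version of such a scan could look like this (a sketch; it assumes /bin is owned by UID 0 inside a healthy instance, and note that stopped instances will also get flagged):

#!/bin/sh
# Flag instances where /bin is not owned by root inside the container.
for i in $(lxc list -c n --format csv); do
    owner=$(lxc exec "$i" -- stat -c %u /bin 2>/dev/null)
    if [ "$owner" != "0" ]; then
        echo "possible bad instance: $i (/bin owned by UID ${owner:-unknown})"
    fi
done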

Hi, I installed linux-generic-hwe-20.04 and got the 5.15 kernel.
But my instances cannot use Docker after the kernel upgrade.