Lxc init failed - Error adding configuration item volatile.eth0.hwaddr

Hi,

I have a previously working system that is now unable to create new containers.

I get:

lxc init sdk-base
Creating the container
Error: open /tmp/lxd_config_408598002: no such file or directory

The lxd daemon log shows:

metadata:
  context:
    ephemeral: "false"
    name: knowing-octopus
  level: info
  message: Creating container
timestamp: "2018-11-08T13:01:45.561414035+01:00"
type: logging


metadata:
  context: {}
  level: dbug
  message: Error adding configuration item volatile.eth0.hwaddr = 00:16:3e:72:c2:c2
    to container 25
timestamp: "2018-11-08T13:01:45.586364211+01:00"
type: logging


metadata:
  context: {}
  level: dbug
  message: 'Database error: bindings.Error{Code:2067, Message:"UNIQUE constraint failed:
    containers_config.container_id, containers_config.key"}'
timestamp: "2018-11-08T13:01:45.586506707+01:00"
type: logging


metadata:
  context: {}
  level: dbug
  message: Error adding configuration item volatile.eth0.name = eth0 to container
    25
timestamp: "2018-11-08T13:01:45.586931371+01:00"
type: logging

[...]

metadata:
  class: task
  created_at: "2018-11-08T13:01:45.547178008+01:00"
  description: Creating container
  err: 'open /tmp/lxd_config_927979998: no such file or directory'
  id: 8cdaf52e-3786-446f-81b6-8894fd38866b
  may_cancel: false
  metadata: null
  resources:
    containers:
    - /1.0/containers/knowing-octopus
  status: Failure
  status_code: 400
  updated_at: "2018-11-08T13:01:45.547178008+01:00"
timestamp: "2018-11-08T13:01:46.05522656+01:00"
type: operation

So I checked the database, but there are no entries within containers_config causing the UNIQUE constraint to fail.

Any idea how to proceed further?

Thanks

Mike

Update: My statement above about the database is wrong. I initially checked the database via lxd sql global .dump, but that command does not show the containers_config entries. If I open a copy of the database with sqlitebrowser, I see entries in containers_config for containers that no longer exist.

It seems these entries are somehow not removed when a container is deleted.

If I restart the system, these entries are gone.

How can this happen?

Thanks

Mike

Hi!

What LXD version are you using? There has been quite a lot of recent work on DB handling in LXD (https://github.com/lxc/lxd/tree/master/lxd/db), so the answer might depend on this.

Also, can you show a full example that demonstrates the issue with the database? You mention lxd sql global .dump, is that a command?

The initial error about /tmp/... is usually explained by your /tmp on the host having been partly wiped, specifically the directory that’s exposed to LXD as its /tmp in the snap.

There isn’t really any good way to repair that kind of breakage on a running system. Your best bet is to reboot the system.

Specifically what was deleted from the host is a directory called /tmp/snap.0_lxd_XXXXXXX which is mapped to /tmp in the snap’s mount namespace. The no such file or directory error comes as a result of /tmp itself being invalid.

For the database error, could you run:

lxd sql global "SELECT * FROM containers_config WHERE container_id NOT IN (SELECT id FROM containers);"

That would let us check whether there’s indeed something wrong with foreign keys in your DB or if it’s something else.
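
To make it clearer what that query looks for, here is a small sketch in Python using an in-memory SQLite database. The two-table schema is a simplified assumption and not LXD's real schema, and the names find_orphaned_config, build_demo_db and try_duplicate_insert are made up for this illustration. It only shows how a leftover containers_config row is found by the query and how a second insert of the same (container_id, key) pair trips the UNIQUE constraint.

```python
import sqlite3


def find_orphaned_config(conn):
    """Return containers_config rows whose container_id has no container."""
    return conn.execute(
        "SELECT container_id, key, value FROM containers_config "
        "WHERE container_id NOT IN (SELECT id FROM containers) "
        "ORDER BY container_id, key"
    ).fetchall()


def build_demo_db():
    """Build a toy database (assumed, simplified schema) with one orphan row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE containers (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE containers_config (
            id INTEGER PRIMARY KEY,
            container_id INTEGER NOT NULL,
            key TEXT NOT NULL,
            value TEXT,
            UNIQUE (container_id, key)
        );
        """
    )
    # Container 24 exists and has a config row.
    conn.execute("INSERT INTO containers (id, name) VALUES (24, 'still-here')")
    conn.execute(
        "INSERT INTO containers_config (container_id, key, value) "
        "VALUES (24, 'volatile.eth0.name', 'eth0')"
    )
    # Config row left behind for container 25, which has no containers entry.
    conn.execute(
        "INSERT INTO containers_config (container_id, key, value) "
        "VALUES (25, 'volatile.eth0.hwaddr', '00:16:3e:72:c2:c2')"
    )
    conn.commit()
    return conn


def try_duplicate_insert(conn):
    """Insert the same (container_id, key) again; return the error text or None."""
    try:
        conn.execute(
            "INSERT INTO containers_config (container_id, key, value) "
            "VALUES (25, 'volatile.eth0.hwaddr', '00:16:3e:72:c2:c2')"
        )
    except sqlite3.IntegrityError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    db = build_demo_db()
    print(find_orphaned_config(db))
    print(try_duplicate_insert(db))
```

If the query returns rows on a real system, those are config entries without a matching container, which would explain the UNIQUE constraint error when a new container gets the same id.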

Thanks for the hint about the temp file. tmpreaper was installed on the build nodes.
After adding /tmp/snap.* to TMPREAPER_PROTECT_EXTRA, the error was gone.
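
For anyone wondering what the /tmp/snap.* pattern covers, here is a rough sketch in Python. It assumes the pattern behaves like an ordinary shell glob and uses Python's fnmatch for the matching, so tmpreaper's own matching may differ in details. The function name split_protected and the sample paths are made up for illustration, and /tmp/snap.0_lxd_XXXXXXX is just the placeholder name from the reply above.

```python
import fnmatch

# Pattern added to TMPREAPER_PROTECT_EXTRA in the post above.
PROTECT_PATTERN = "/tmp/snap.*"


def split_protected(paths, pattern=PROTECT_PATTERN):
    """Split paths into (protected, unprotected) using shell-style globbing."""
    protected = [p for p in paths if fnmatch.fnmatchcase(p, pattern)]
    unprotected = [p for p in paths if not fnmatch.fnmatchcase(p, pattern)]
    return protected, unprotected


if __name__ == "__main__":
    sample = [
        "/tmp/snap.0_lxd_XXXXXXX",  # placeholder name from the thread
        "/tmp/build-cache",         # hypothetical unrelated entry
    ]
    kept, candidates = split_protected(sample)
    print("protected:", kept)
    print("still eligible for cleanup:", candidates)
```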

Regarding the database: I had 3.0.x installed on the build nodes and upgraded to 3.7. After the upgrade, the occasional database issues seem to be gone.

Thanks a lot for the help.