After shutting down a container, backup.yaml appears in the mount directory

So, this has me scratching my head…

I have a few containers that I use for development, etc., to keep environments isolated. For one of them, called “applications”, something weird happens when I stop the container: a file called backup.yaml is placed in the mount directory. So when I go to start it back up again, I get this message:

Error: Common start logic: Failed to run: zfs mount default/containers/applications: cannot mount '/var/snap/lxd/common/lxd/storage-pools/default/containers/applications': directory is not empty
Try lxc info --show-log applications for more info

If I go look at /var/snap/lxd/common/lxd/storage-pools/default/containers/applications, why, sure enough, it’s not empty: backup.yaml is there. If I delete backup.yaml, then I can start the container normally and it’s all good.
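In other words, the workaround I’ve been using looks like this (pool name and paths taken from the error above):

sudo rm /var/snap/lxd/common/lxd/storage-pools/default/containers/applications/backup.yaml
lxc start applications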

Note: this behavior does not seem to be consistent; the backup.yaml file doesn’t always appear. When it does, though, it’s only ever for this one container. The container didn’t always do this, either; it started a few weeks ago. I was thinking about just re-creating the container, but it would be nice to understand what is happening.

Any ideas?

lxc info output please

Thanks @stgraber… you are always super awesome for helping out.

lxc info applications --show-log

Location: none
Remote: unix://
Architecture: x86_64
Created: 2020/02/10 11:08 UTC
Status: Running
Type: container
Profiles: default
Pid: 6223
Ips:
  eth0:	inet	10.60.163.126	vethde4fcc74
  eth0:	inet6	fd42:d9c2:c4ab:e0ac:216:3eff:fe13:c226	vethde4fcc74
  eth0:	inet6	fe80::216:3eff:fe13:c226	vethde4fcc74
  lo:	inet	127.0.0.1
  lo:	inet6	::1
Resources:
  Processes: 693
  Disk usage:
    root: 8.16GB
  CPU usage:
    CPU usage (in seconds): 7142
  Memory usage:
    Memory (current): 4.48GB
  Network usage:
    eth0:
      Bytes received: 160.32MB
      Bytes sent: 100.69MB
      Packets received: 319709
      Packets sent: 299948
    lo:
      Bytes received: 1.71MB
      Bytes sent: 1.71MB
      Packets received: 19156
      Packets sent: 19156
Snapshots:
  update (taken at 2020/04/06 02:12 UTC) (stateless)

Log:

lxc applications 20200406021826.923 ERROR    cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1143 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset//lxc.monitor.applications"
lxc applications 20200406021826.924 ERROR    cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1143 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset//lxc.payload.applications"
lxc applications 20200406021826.924 ERROR    utils - utils.c:lxc_can_use_pidfd:1855 - Invalid argument - Kernel does not support waiting on processes through pidfds
lxc applications 20200406021826.927 WARN     cgfsng - cgroups/cgfsng.c:fchowmodat:1455 - No such file or directory - Failed to fchownat(17, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )

Also including the output of lxc config show applications --expanded in case it helps:

architecture: x86_64
config:
  image.architecture: x86_64
  image.description: Ubuntu 19.04 server (20190724)
  image.os: ubuntu
  image.release: disco
  limits.cpu: "2"
  limits.memory: 10GB
  raw.idmap: both 1000 1000
  security.nesting: "true"
  volatile.base_image: 1169dae22e484d31cd4056803f56d4aa0c80272a4d776085b67aeb7054aff738
  volatile.eth0.host_name: vethde4fcc74
  volatile.eth0.hwaddr: 00:16:3e:13:c2:26
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000},{"Isuid":true,"Isgid":true,"Hostid":1000,"Nsid":1000,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1001001,"Nsid":1001,"Maprange":999998999},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000},{"Isuid":true,"Isgid":true,"Hostid":1000,"Nsid":1000,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":1001001,"Nsid":1001,"Maprange":999998999}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000},{"Isuid":true,"Isgid":true,"Hostid":1000,"Nsid":1000,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1001001,"Nsid":1001,"Maprange":999998999},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000},{"Isuid":true,"Isgid":true,"Hostid":1000,"Nsid":1000,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":1001001,"Nsid":1001,"Maprange":999998999}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000},{"Isuid":true,"Isgid":true,"Hostid":1000,"Nsid":1000,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1001001,"Nsid":1001,"Maprange":999998999},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000},{"Isuid":true,"Isgid":true,"Hostid":1000,"Nsid":1000,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":1001001,"Nsid":1001,"Maprange":999998999}]'
  volatile.last_state.power: RUNNING
devices:
  PACookie_toochevere:
    path: /home/toochevere/.config/pulse/cookie
    source: /home/toochevere/.config/pulse/cookie
    type: disk
  Xauthority_toochevere:
    path: /home/toochevere/.Xauthority
    source: /run/user/1000/gdm/Xauthority
    type: disk
  Xdisplay:
    path: /tmp/.X11-unix/X0
    source: /tmp/.X11-unix/X1
    type: disk
  commontools:
    path: /common-tools
    source: /common-tools
    type: disk
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  home-Documents:
    path: /home/toochevere/Documents
    source: /home/toochevere/Documents
    type: disk
  home-Downloads:
    path: /home/toochevere/Downloads
    source: /home/toochevere/Downloads
    type: disk
  home-Music:
    path: /home/toochevere/Music
    source: /home/toochevere/Music
    type: disk
  home-Pictures:
    path: /home/toochevere/Pictures
    source: /home/toochevere/Pictures
    type: disk
  home-Public:
    path: /home/toochevere/Public
    source: /home/toochevere/Public
    type: disk
  home-Templates:
    path: /home/toochevere/Templates
    source: /home/toochevere/Templates
    type: disk
  home-Videos:
    path: /home/toochevere/Videos
    source: /home/toochevere/Videos
    type: disk
  hostgpu:
    gid: "1000"
    type: gpu
    uid: "1000"
  root:
    path: /
    pool: default
    type: disk
  video0:
    path: /dev/video0
    source: /dev/video0
    type: disk
  video1:
    path: /dev/video1
    source: /dev/video1
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

Just the lxc info output please, no arguments passed to it.

Sorry! I totally misunderstood your clear directions.

config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIICBDCCAYqgAwIBAgIRAMLRiN03egvrlTnIBkuStjAwCgYIKoZIzj0EAwMwNDEc
    MBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzEUMBIGA1UEAwwLcm9vdEBjb2Jh
    bHQwHhcNMjAwMjEwMjM1NzM2WhcNMzAwMjA3MjM1NzM2WjA0MRwwGgYDVQQKExNs
    aW51eGNvbnRhaW5lcnMub3JnMRQwEgYDVQQDDAtyb290QGNvYmFsdDB2MBAGByqG
    SM49AgEGBSuBBAAiA2IABMgB0TAVI2n/bBDs9i4/EWVqd+kAVh48nAokdFvSv3nG
    PwrfQapQdMoFySYeOoQ7ZSGS7L5kvO7qHd44THBn1rXiIr74rCkKitOSkxGj21cK
    eN/i04YlNq7Bs6Bvz/VGu6NgMF4wDgYDVR0PAQH/BAQDAgWgMBMGA1UdJQQMMAoG
    CCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwKQYDVR0RBCIwIIIGY29iYWx0hwR/AAAB
    hxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMDA2gAMGUCMFVWNqXJlM21d4jt
    ZKwoPz5TYhhaXEk0kwBfphRPHHVJcipSXxT8V4jKO8cuG0PfAAIxAM0q6eEQVJO9
    +qB9zoygF/cGxHC2BX2ymG3Z4jJOwcKzcH5okniydXVPThMhgkA3gQ==
    -----END CERTIFICATE-----
  certificate_fingerprint: 877ad7b541c8fb69e0daf413cc4d91dc39245f80545d3b48232dabc4f34c5edd
  driver: lxc
  driver_version: 4.0.0
  firewall: xtables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.3.0-7629-generic
  lxc_features:
    cgroup2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    seccomp_notify: "true"
  project: default
  server: lxd
  server_clustered: false
  server_name: cobalt
  server_pid: 5019
  server_version: 4.0.0
  storage: zfs
  storage_version: 0.8.1-1ubuntu14.3

Thanks.

So I’m not sure why it happens to that one container and not others; that’s a bit confusing.

@tomp can we tweak the backup.yaml logic further so this can’t happen?

I will take a look at this now.

Please can you enable LXD debug mode in the snap:

sudo snap set lxd daemon.debug=true
sudo systemctl reload snap.lxd.daemon
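
You can confirm the option took effect by reading it back (this only shows the stored snap setting; the reload is still needed for LXD to pick it up):

sudo snap get lxd daemon.debug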

Then look in the logs when starting and stopping the problem container until you hit the issue again.

I’d like to see what the logs say during both startup and shutdown of the container.

I would expect to see entries like this:

DBUG[04-07|13:58:04] MountInstance started                    driver=zfs pool=default project=test instance=c1
DBUG[04-07|13:58:04] Mounted ZFS dataset                      driver=zfs pool=default dev=default/containers/test_c1 path=/var/lib/lxd/storage-pools/default/containers/test_c1
DBUG[04-07|13:58:04] MountInstance finished                   driver=zfs pool=default project=test instance=c1
DBUG[04-07|13:58:04] UpdateInstanceBackupFile started         driver=zfs pool=default project=test instance=c1
DBUG[04-07|13:58:04] UpdateInstanceBackupFile finished        driver=zfs pool=default project=test instance=c1

@stgraber as far as I can see, all occurrences of writing the backup.yaml happen inside a MountTask(), so the volume should be mounted. I’d like to see the system logs when it happens, to check whether there is some sort of ordering or race-condition issue.

Give me the command(s) you’d like me to run and I’ll post the output (just to be sure I hit the right things).

Normally it’s sudo journalctl -f to tail the logs; then we just need the bits that are generated when you start and stop the container, before and as the issue occurs, once debug mode is enabled.
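
If the full journal is too noisy, you can restrict it to the LXD unit, for example:

sudo journalctl -f -u snap.lxd.daemon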

Forgive the incredibly stupid question… how do I enable debug mode?

I posted the commands to do that earlier in this thread.

@tomp I followed the directions, but my confusion came from the fact that they didn’t seem to produce the debug log entries. I’ve seen other cases where setting snap options appeared to have no effect; not sure if that’s happening here too.

In any case, I realized that, contrary to what I originally stated, it’s not on shutdown that this is happening, but rather on startup. Once it happens and I delete the backup.yaml file, the next attempt to start the container always seems to succeed.

Here is the log output for startup. I edited out some obviously unrelated entries, but left the rest. Like I said, I don’t see any “DBUG” items. I verified that daemon.debug is true for lxd and I rebooted after setting it to be sure. I can provide the stop logs next if you still want them. If you want to continue to chase this, I’ll stay with it. On the other hand, I can also recreate the container since it seems that I am the only one seeing it.

Apr 10 22:13:26 cobalt audit[16703]: AVC apparmor="DENIED" operation="open" profile="/snap/core/8935/usr/lib/snapd/snap-confine" name="/etc/pop-os/os-release" pid=16703 comm="snap-confine" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Apr 10 22:13:26 cobalt kernel: audit: type=1400 audit(1586571206.355:264): apparmor="DENIED" operation="open" profile="/snap/core/8935/usr/lib/snapd/snap-confine" name="/etc/pop-os/os-release" pid=16703 comm="snap-confine" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Apr 10 22:13:26 cobalt audit[16703]: AVC apparmor="DENIED" operation="open" profile="snap.lxd.lxc" name="/var/lib/snapd/hostfs/etc/pop-os/os-release" pid=16703 comm="snap-exec" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
Apr 10 22:13:26 cobalt kernel: audit: type=1400 audit(1586571206.359:265): apparmor="DENIED" operation="open" profile="snap.lxd.lxc" name="/var/lib/snapd/hostfs/etc/pop-os/os-release" pid=16703 comm="snap-exec" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
Apr 10 22:13:26 cobalt NetworkManager[1574]: <info>  [1586571206.4169] manager: (vethdf47ddd8): new Veth device (/org/freedesktop/NetworkManager/Devices/19)
Apr 10 22:13:26 cobalt systemd-udevd[16747]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Apr 10 22:13:26 cobalt systemd-udevd[16747]: Using default interface naming scheme 'v240'.
Apr 10 22:13:26 cobalt systemd-udevd[16747]: vethdf47ddd8: Could not generate persistent MAC: No data available
Apr 10 22:13:26 cobalt systemd-udevd[16749]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Apr 10 22:13:26 cobalt systemd-udevd[16749]: Using default interface naming scheme 'v240'.
Apr 10 22:13:26 cobalt systemd-udevd[16749]: vethe4372744: Could not generate persistent MAC: No data available
Apr 10 22:13:26 cobalt NetworkManager[1574]: <info>  [1586571206.4183] manager: (vethe4372744): new Veth device (/org/freedesktop/NetworkManager/Devices/20)
Apr 10 22:13:26 cobalt kernel: lxdbr0: port 1(vethe4372744) entered blocking state
Apr 10 22:13:26 cobalt kernel: lxdbr0: port 1(vethe4372744) entered disabled state
Apr 10 22:13:26 cobalt kernel: device vethe4372744 entered promiscuous mode
Apr 10 22:13:26 cobalt audit[16800]: AVC apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxd-applications_</var/snap/lxd/common/lxd>" pid=16800 comm="apparmor_parser"
Apr 10 22:13:26 cobalt kernel: audit: type=1400 audit(1586571206.495:266): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxd-applications_</var/snap/lxd/common/lxd>" pid=16800 comm="apparmor_parser"
Apr 10 22:13:26 cobalt kernel: eth0: renamed from vethdf47ddd8
Apr 10 22:13:26 cobalt kernel: lxdbr0: port 1(vethe4372744) entered disabled state
Apr 10 22:13:26 cobalt kernel: device vethe4372744 left promiscuous mode
Apr 10 22:13:26 cobalt kernel: lxdbr0: port 1(vethe4372744) entered disabled state
Apr 10 22:13:26 cobalt NetworkManager[1574]: <info>  [1586571206.6282] device (vethe4372744): released from master device lxdbr0
Apr 10 22:13:26 cobalt lxd.daemon[27943]: t=2020-04-10T22:13:26-0400 lvl=eror msg="Failed starting container" action=start created=2020-02-10T08:08:03-0300 ephemeral=false name=applications project=default stateful=false used=2020-04-08T08:17:19-0400
Apr 10 22:13:27 cobalt lxd.daemon[27943]: t=2020-04-10T22:13:27-0400 lvl=eror msg="The stop hook failed" container=applications err="Failed to run: zfs unmount default/containers/applications: umount: /home/snap/lxd/common/lxd/storage-pools/default/containers/applications: not mounted\ncannot unmount '/home/snap/lxd/common/lxd/storage-pools/default/containers/applications': umount failed"
Apr 10 22:13:34 cobalt audit[16864]: AVC apparmor="DENIED" operation="open" profile="/snap/core/8935/usr/lib/snapd/snap-confine" name="/etc/pop-os/os-release" pid=16864 comm="snap-confine" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Apr 10 22:13:34 cobalt kernel: audit: type=1400 audit(1586571214.695:267): apparmor="DENIED" operation="open" profile="/snap/core/8935/usr/lib/snapd/snap-confine" name="/etc/pop-os/os-release" pid=16864 comm="snap-confine" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Apr 10 22:13:34 cobalt audit[16864]: AVC apparmor="DENIED" operation="open" profile="snap.lxd.lxc" name="/var/lib/snapd/hostfs/etc/pop-os/os-release" pid=16864 comm="snap-exec" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
Apr 10 22:13:34 cobalt kernel: audit: type=1400 audit(1586571214.699:268): apparmor="DENIED" operation="open" profile="snap.lxd.lxc" name="/var/lib/snapd/hostfs/etc/pop-os/os-release" pid=16864 comm="snap-exec" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
Apr 10 22:13:34 cobalt audit[16910]: AVC apparmor="DENIED" operation="open" profile="/snap/core/8935/usr/lib/snapd/snap-confine" name="/etc/pop-os/os-release" pid=16910 comm="snap-confine" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Apr 10 22:13:34 cobalt audit[16910]: AVC apparmor="DENIED" operation="open" profile="snap.lxd.lxc" name="/var/lib/snapd/hostfs/etc/pop-os/os-release" pid=16910 comm="snap-exec" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
Apr 10 22:13:34 cobalt kernel: audit: type=1400 audit(1586571214.751:269): apparmor="DENIED" operation="open" profile="/snap/core/8935/usr/lib/snapd/snap-confine" name="/etc/pop-os/os-release" pid=16910 comm="snap-confine" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Apr 10 22:13:34 cobalt kernel: audit: type=1400 audit(1586571214.751:270): apparmor="DENIED" operation="open" profile="snap.lxd.lxc" name="/var/lib/snapd/hostfs/etc/pop-os/os-release" pid=16910 comm="snap-exec" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
Apr 10 22:13:34 cobalt NetworkManager[1574]: <info>  [1586571214.8229] manager: (veth46590781): new Veth device (/org/freedesktop/NetworkManager/Devices/21)
Apr 10 22:13:34 cobalt systemd-udevd[16954]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Apr 10 22:13:34 cobalt systemd-udevd[16954]: Using default interface naming scheme 'v240'.
Apr 10 22:13:34 cobalt systemd-udevd[16954]: veth46590781: Could not generate persistent MAC: No data available
Apr 10 22:13:34 cobalt systemd-udevd[16956]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Apr 10 22:13:34 cobalt systemd-udevd[16956]: Using default interface naming scheme 'v240'.
Apr 10 22:13:34 cobalt systemd-udevd[16956]: vethbc785f11: Could not generate persistent MAC: No data available
Apr 10 22:13:34 cobalt NetworkManager[1574]: <info>  [1586571214.8240] manager: (vethbc785f11): new Veth device (/org/freedesktop/NetworkManager/Devices/22)
Apr 10 22:13:34 cobalt kernel: lxdbr0: port 1(vethbc785f11) entered blocking state
Apr 10 22:13:34 cobalt kernel: lxdbr0: port 1(vethbc785f11) entered disabled state
Apr 10 22:13:34 cobalt kernel: device vethbc785f11 entered promiscuous mode
Apr 10 22:13:34 cobalt audit[17009]: AVC apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxd-applications_</var/snap/lxd/common/lxd>" pid=17009 comm="apparmor_parser"
Apr 10 22:13:34 cobalt kernel: audit: type=1400 audit(1586571214.903:271): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxd-applications_</var/snap/lxd/common/lxd>" pid=17009 comm="apparmor_parser"
Apr 10 22:13:34 cobalt kernel: eth0: renamed from veth46590781
Apr 10 22:13:35 cobalt kernel: lxdbr0: port 1(vethbc785f11) entered disabled state
Apr 10 22:13:35 cobalt kernel: device vethbc785f11 left promiscuous mode
Apr 10 22:13:35 cobalt kernel: lxdbr0: port 1(vethbc785f11) entered disabled state
Apr 10 22:13:35 cobalt NetworkManager[1574]: <info>  [1586571215.0966] device (vethbc785f11): released from master device lxdbr0
Apr 10 22:13:35 cobalt lxd.daemon[27943]: t=2020-04-10T22:13:35-0400 lvl=eror msg="Failed starting container" action=start created=2020-02-10T08:08:03-0300 ephemeral=false name=applications project=default stateful=false used=2020-04-10T22:13:26-0400
Apr 10 22:13:35 cobalt audit[17049]: AVC apparmor="DENIED" operation="open" profile="/snap/core/8935/usr/lib/snapd/snap-confine" name="/etc/pop-os/os-release" pid=17049 comm="snap-confine" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Apr 10 22:13:35 cobalt kernel: audit: type=1400 audit(1586571215.227:272): apparmor="DENIED" operation="open" profile="/snap/core/8935/usr/lib/snapd/snap-confine" name="/etc/pop-os/os-release" pid=17049 comm="snap-confine" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Apr 10 22:13:35 cobalt audit[17049]: AVC apparmor="DENIED" operation="open" profile="snap.lxd.lxc" name="/var/lib/snapd/hostfs/etc/pop-os/os-release" pid=17049 comm="snap-exec" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
Apr 10 22:13:35 cobalt kernel: audit: type=1400 audit(1586571215.231:273): apparmor="DENIED" operation="open" profile="snap.lxd.lxc" name="/var/lib/snapd/hostfs/etc/pop-os/os-release" pid=17049 comm="snap-exec" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
Apr 10 22:13:35 cobalt lxd.daemon[27943]: t=2020-04-10T22:13:35-0400 lvl=eror msg="The stop hook failed" container=applications err="Failed to run: zfs unmount default/containers/applications: umount: /home/snap/lxd/common/lxd/storage-pools/default/containers/applications: not mounted\ncannot unmount '/home/snap/lxd/common/lxd/storage-pools/default/containers/applications': umount failed"

Hrm, weird that you are seeing some LXD logs but not the debug ones.

That unmount error at the bottom looks interesting though; do you see it only when the error occurs?

Can you also look in /var/snap/lxd/common/lxd/logs/ and see if there is a log file for your container?
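
Something like this should show it, assuming the per-container log layout the snap normally uses:

ls /var/snap/lxd/common/lxd/logs/applications/
cat /var/snap/lxd/common/lxd/logs/applications/lxc.log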

There’s a good chance this PR will fix the issue (it may have been that issuing a restart command was what was causing this problem):

But this issue was about containers, not VMs.

Mmm true, I’d forgotten that. I’ve not been able to reproduce it.

So, some more information: interestingly enough, another container suddenly started manifesting the same problem. What is different, though, is that once I see the problem (I can’t start the container and the backup.yaml file gets created), it stays consistent: the container will not start. After a reboot of the system, however, I am once again able to start it.

Looking at those logs I didn’t find anything that looked particularly interesting. Here is some of what I did find:

t=2020-04-13T08:26:30-0400 lvl=warn msg="Detected poll(POLLNVAL) event." 
t=2020-04-13T08:26:34-0400 lvl=eror msg="The stop hook failed" container=applications err="Failed to run: zfs unmount default/containers/applications: umount: /home/snap/lxd/common/lxd/storage-pools/default/containers/applications: not mounted\ncannot unmount '/home/snap/lxd/common/lxd/storage-pools/default/containers/applications': umount failed"
t=2020-04-13T08:26:34-0400 lvl=info msg="Shut down container" action=shutdown created=2020-02-10T08:08:03-0300 ephemeral=false name=applications project=default timeout=15s used=2020-04-10T22:23:28-0400
t=2020-04-13T08:26:49-0400 lvl=info msg="Starting container" action=start created=2020-02-10T08:08:03-0300 ephemeral=false name=applications project=default stateful=false used=2020-04-10T22:23:28-0400
t=2020-04-13T08:26:49-0400 lvl=eror msg="Failed starting container" action=start created=2020-02-10T08:08:03-0300 ephemeral=false name=applications project=default stateful=false used=2020-04-10T22:23:28-0400

At the time of this occurring, this is my lxd snap version info:

lxd 4.0.1 14804 latest/stable canonical✓ -
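
Next time it’s in that state, I’ll also check whether ZFS actually thinks the dataset is mounted, e.g.:

zfs get mounted default/containers/applications
grep applications /proc/self/mountinfo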

Can you confirm you have done:

sudo snap set lxd daemon.debug=true
sudo systemctl reload snap.lxd.daemon

As your system is apparently still not producing debug-level logs.

Can you run journalctl -u snap.lxd.daemon -n 300 the next time it happens, please?