After the LXD 3.15 upgrade propagated through our 7-node cluster, we can no longer start containers, even newly created ones (e.g. lxc launch ubuntu:18.04 wdelgenio-test0).
The forkstart fails and I don’t know how to get more information.
CentOS Linux release 7.6.1810 (Core)
3.10.0-957.el7.x86_64
lxc wdelgenio-test0 20190724035743.889 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:864 - No such file or directory - Failed to receive the container state
lxc 20190724035743.889 WARN commands - commands.c:lxc_cmd_rsp_recv:135 - Connection reset by peer - Failed to receive response for command "get_state"
Can you show lxc config show --expanded NAME for an affected container?
Can you run lxc monitor --type=logging --pretty in a terminal on the same node the container is on, then run lxc start NAME? This should provide much more detail on the LXD side.
Can you create a new container, then do lxc config set NAME raw.lxc lxc.log.level=trace, then run lxc start NAME and lxc info --show-log NAME and provide the hopefully much more detailed log output?
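For anyone following along, the three steps in that last request could be wrapped in a small helper. This is only a sketch: the function name collect_trace_log and the LXC_BIN override are hypothetical, added so the sequence can be dry-run with a stub. The lxc subcommands themselves are the ones quoted above.

```shell
#!/usr/bin/env bash
# LXC_BIN is a hypothetical override (not an lxc feature) so the
# sequence can be dry-run with a stub such as "echo".
LXC_BIN="${LXC_BIN:-lxc}"

# collect_trace_log NAME
# Raises the liblxc log level for NAME, attempts a start, then prints
# the container log.
collect_trace_log() {
    local name="$1"
    # Ask liblxc to log at trace level for this container.
    "$LXC_BIN" config set "$name" raw.lxc lxc.log.level=trace || return 1
    # The start is expected to fail on an affected host, so keep going.
    "$LXC_BIN" start "$name" || true
    # Show the container info together with its liblxc log.
    "$LXC_BIN" info --show-log "$name"
}
```

Calling collect_trace_log NAME on the node that hosts the container, while lxc monitor --type=logging --pretty runs in a second terminal, should capture both sides of the failure.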
FYI, if I remove the cgroup directories with find /sys/fs/cgroup/*/lxc*/wdelgenio-test1 -type d | tac | xargs rmdir, the container still fails to start.
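The one-liner above relies on tac to reverse find's output so that child directories are removed before their parents. A rough equivalent, assuming a find that supports -depth, gets the same ordering without the pipe. The function name and the root argument are hypothetical (the root would be /sys/fs/cgroup on a host like this one), and as noted, removing the directories did not make the container start.

```shell
#!/usr/bin/env bash
# remove_container_cgroups ROOT NAME
# Removes leftover cgroup directories for container NAME under every
# hierarchy in ROOT, children first. Hypothetical helper for illustration.
remove_container_cgroups() {
    local root="$1" name="$2" dir
    for dir in "$root"/*/lxc*/"$name"; do
        # Skip the literal pattern when the glob matched nothing.
        [ -d "$dir" ] || continue
        # -depth visits a directory's contents before the directory itself,
        # which is the ordering the "| tac" step provided.
        find "$dir" -depth -type d -exec rmdir {} \;
    done
}
```

For example: remove_container_cgroups /sys/fs/cgroup wdelgenio-test1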
[root@team-dev2 user]# lxc info --show-log wdelgenio-test1
Name: wdelgenio-test1
Location: team-dev2
Remote: unix://
Architecture: x86_64
Created: 2019/07/24 14:03 UTC
Status: Stopped
Type: persistent
Profiles: default
Log:
lxc wdelgenio-test1 20190724143740.776 TRACE commands - commands.c:lxc_cmd:303 - Connection refused - Command "get_state" failed to connect command socket
lxc wdelgenio-test1 20190724143740.776 TRACE start - start.c:lxc_init_handler:774 - Created anonymous pair {3,5} of unix sockets
lxc wdelgenio-test1 20190724143740.776 TRACE commands - commands.c:lxc_cmd_init:1310 - Created abstract unix socket "/var/snap/lxd/common/lxd/containers/wdelgenio-test1/command"
lxc wdelgenio-test1 20190724143740.776 TRACE start - start.c:lxc_init_handler:786 - Unix domain socket 7 for command server is ready
lxc wdelgenio-test1 20190724143740.777 INFO lxccontainer - lxccontainer.c:do_lxcapi_start:993 - Set process title to [lxc monitor] /var/snap/lxd/common/lxd/containers wdelgenio-test1
lxc wdelgenio-test1 20190724143740.778 INFO start - start.c:lxc_check_inherited:311 - Closed inherited fd 4
lxc wdelgenio-test1 20190724143740.783 TRACE start - start.c:lxc_start:2145 - Doing lxc_start
lxc wdelgenio-test1 20190724143740.783 INFO lsm - lsm/lsm.c:lsm_init:50 - LSM security driver nop
lxc wdelgenio-test1 20190724143740.783 TRACE start - start.c:lxc_init:805 - Initialized LSM
lxc wdelgenio-test1 20190724143740.783 TRACE seccomp - seccomp.c:get_new_ctx:488 - Added arch 2 to main seccomp context
lxc wdelgenio-test1 20190724143740.783 TRACE seccomp - seccomp.c:get_new_ctx:496 - Removed native arch from main seccomp context
lxc wdelgenio-test1 20190724143740.783 TRACE seccomp - seccomp.c:get_new_ctx:488 - Added arch 3 to main seccomp context
lxc wdelgenio-test1 20190724143740.783 TRACE seccomp - seccomp.c:get_new_ctx:496 - Removed native arch from main seccomp context
lxc wdelgenio-test1 20190724143740.783 TRACE seccomp - seccomp.c:get_new_ctx:501 - Arch 4 already present in main seccomp context
lxc wdelgenio-test1 20190724143740.783 INFO seccomp - seccomp.c:parse_config_v2:789 - Processing "reject_force_umount # comment this to allow umount -f; not recommended"
lxc wdelgenio-test1 20190724143740.783 INFO seccomp - seccomp.c:do_resolve_add_rule:535 - Set seccomp rule to reject force umounts
lxc wdelgenio-test1 20190724143740.783 INFO seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for reject_force_umount action 0(kill)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:do_resolve_add_rule:535 - Set seccomp rule to reject force umounts
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for reject_force_umount action 0(kill)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:do_resolve_add_rule:535 - Set seccomp rule to reject force umounts
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for reject_force_umount action 0(kill)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:do_resolve_add_rule:535 - Set seccomp rule to reject force umounts
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for reject_force_umount action 0(kill)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:789 - Processing "[all]"
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:789 - Processing "kexec_load errno 38"
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for kexec_load action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for kexec_load action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for kexec_load action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for kexec_load action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:789 - Processing "open_by_handle_at errno 38"
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for open_by_handle_at action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for open_by_handle_at action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for open_by_handle_at action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for open_by_handle_at action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:789 - Processing "init_module errno 38"
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for init_module action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for init_module action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for init_module action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for init_module action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:789 - Processing "finit_module errno 38"
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for finit_module action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for finit_module action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for finit_module action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for finit_module action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:789 - Processing "delete_module errno 38"
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for delete_module action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for delete_module action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for delete_module action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for delete_module action 327718(errno)
lxc wdelgenio-test1 20190724143740.784 INFO seccomp - seccomp.c:parse_config_v2:1008 - Merging compat seccomp contexts into main context
lxc wdelgenio-test1 20190724143740.784 TRACE seccomp - seccomp.c:parse_config_v2:1018 - Merged first compat seccomp context into main context
lxc wdelgenio-test1 20190724143740.784 TRACE seccomp - seccomp.c:parse_config_v2:1034 - Merged second compat seccomp context into main context
lxc wdelgenio-test1 20190724143740.784 TRACE start - start.c:lxc_init:812 - Read seccomp policy
lxc wdelgenio-test1 20190724143740.784 TRACE start - start.c:lxc_serve_state_clients:474 - Set container state to STARTING
lxc wdelgenio-test1 20190724143740.784 TRACE start - start.c:lxc_serve_state_clients:477 - No state clients registered
lxc wdelgenio-test1 20190724143740.784 TRACE start - start.c:lxc_init:820 - Set container state to "STARTING"
lxc wdelgenio-test1 20190724143740.784 TRACE start - start.c:lxc_init:883 - Set environment variables
lxc wdelgenio-test1 20190724143740.784 INFO conf - conf.c:run_script_argv:374 - Executing script "/proc/439929/exe callhook /var/snap/lxd/common/lxd 118 start" for container "wdelgenio-test1"
lxc wdelgenio-test1 20190724143740.784 TRACE conf - conf.c:run_script_argv:421 - Set environment variable: LXC_HOOK_TYPE=pre-start
lxc wdelgenio-test1 20190724143740.784 TRACE conf - conf.c:run_script_argv:429 - Set environment variable: LXC_HOOK_SECTION=lxc
lxc wdelgenio-test1 20190724143740.846 TRACE start - start.c:lxc_init:890 - Ran pre-start hooks
lxc wdelgenio-test1 20190724143740.846 TRACE start - start.c:setup_signal_fd:356 - Created signal file descriptor 4
lxc wdelgenio-test1 20190724143740.846 TRACE start - start.c:lxc_init:901 - Set up signal fd
lxc wdelgenio-test1 20190724143740.847 DEBUG terminal - terminal.c:lxc_terminal_peer_default:676 - No such device - The process does not have a controlling terminal
lxc wdelgenio-test1 20190724143740.847 DEBUG terminal - terminal.c:lxc_terminal_create_log_file:848 - Using "/var/snap/lxd/common/lxd/logs/wdelgenio-test1/console.log" as terminal log file
lxc wdelgenio-test1 20190724143740.847 TRACE terminal - terminal.c:lxc_terminal_create_ringbuf:829 - Allocated 131072 byte terminal ringbuffer
lxc wdelgenio-test1 20190724143740.847 TRACE start - start.c:lxc_init:909 - Created console
lxc wdelgenio-test1 20190724143740.847 TRACE terminal - terminal.c:lxc_terminal_map_ids:1192 - Chowned terminal "/dev/pts/0"
lxc wdelgenio-test1 20190724143740.847 TRACE start - start.c:lxc_init:916 - Chowned console
lxc wdelgenio-test1 20190724143740.847 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_basecg_debuginfo:1018 - basecginfo is:
lxc wdelgenio-test1 20190724143740.847 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_basecg_debuginfo:1019 - 11:blkio:/
10:hugetlb:/
9:memory:/
8:cpuacct,cpu:/
7:net_prio,net_cls:/
6:pids:/
5:devices:/
4:cpuset:/
3:freezer:/
2:perf_event:/
1:name=systemd:/
lxc wdelgenio-test1 20190724143740.847 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_basecg_debuginfo:1022 - kernel subsystem 0: blkio
lxc wdelgenio-test1 20190724143740.847 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_basecg_debuginfo:1022 - kernel subsystem 1: hugetlb
lxc wdelgenio-test1 20190724143740.847 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_basecg_debuginfo:1022 - kernel subsystem 2: memory
lxc wdelgenio-test1 20190724143740.847 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_basecg_debuginfo:1022 - kernel subsystem 3: cpuacct
lxc wdelgenio-test1 20190724143740.847 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_basecg_debuginfo:1022 - kernel subsystem 4: cpu
lxc wdelgenio-test1 20190724143740.847 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_basecg_debuginfo:1022 - kernel subsystem 5: net_prio
lxc wdelgenio-test1 20190724143740.847 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_basecg_debuginfo:1022 - kernel subsystem 6: net_cls
lxc wdelgenio-test1 20190724143740.847 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_basecg_debuginfo:1022 - kernel subsystem 7: pids
lxc wdelgenio-test1 20190724143740.847 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_basecg_debuginfo:1022 - kernel subsystem 8: devices
lxc wdelgenio-test1 20190724143740.847 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_basecg_debuginfo:1022 - kernel subsystem 9: cpuset
lxc wdelgenio-test1 20190724143740.847 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_basecg_debuginfo:1022 - kernel subsystem 10: freezer
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_basecg_debuginfo:1022 - kernel subsystem 11: perf_event
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_basecg_debuginfo:1025 - named subsystem 0: name=systemd
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:cg_hybrid_init:2589 - Writable cgroup hierarchies:
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:999 - Hierarchies:
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1004 - 0: base_cgroup: /
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1005 - mountpoint: /sys/fs/cgroup/systemd
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1006 - controllers:
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1008 - 0: name=systemd
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1004 - 1: base_cgroup: /
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1005 - mountpoint: /sys/fs/cgroup/perf_event
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1006 - controllers:
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1008 - 0: perf_event
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1004 - 2: base_cgroup: /
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1005 - mountpoint: /sys/fs/cgroup/freezer
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1006 - controllers:
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1008 - 0: freezer
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1004 - 3: base_cgroup: /
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1005 - mountpoint: /sys/fs/cgroup/cpuset
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1006 - controllers:
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1008 - 0: cpuset
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1004 - 4: base_cgroup: /
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1005 - mountpoint: /sys/fs/cgroup/devices
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1006 - controllers:
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1008 - 0: devices
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1004 - 5: base_cgroup: /
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1005 - mountpoint: /sys/fs/cgroup/pids
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1006 - controllers:
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1008 - 0: pids
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1004 - 6: base_cgroup: /
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1005 - mountpoint: /sys/fs/cgroup/net_cls,net_prio
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1006 - controllers:
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1008 - 0: net_cls
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1008 - 1: net_prio
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1004 - 7: base_cgroup: /
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1005 - mountpoint: /sys/fs/cgroup/cpu,cpuacct
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1006 - controllers:
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1008 - 0: cpu
lxc wdelgenio-test1 20190724143740.848 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1008 - 1: cpuacct
lxc wdelgenio-test1 20190724143740.849 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1004 - 8: base_cgroup: /
lxc wdelgenio-test1 20190724143740.849 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1005 - mountpoint: /sys/fs/cgroup/memory
lxc wdelgenio-test1 20190724143740.849 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1006 - controllers:
lxc wdelgenio-test1 20190724143740.849 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1008 - 0: memory
lxc wdelgenio-test1 20190724143740.849 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1004 - 9: base_cgroup: /
lxc wdelgenio-test1 20190724143740.849 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1005 - mountpoint: /sys/fs/cgroup/hugetlb
lxc wdelgenio-test1 20190724143740.849 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1006 - controllers:
lxc wdelgenio-test1 20190724143740.849 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1008 - 0: hugetlb
lxc wdelgenio-test1 20190724143740.849 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1004 - 10: base_cgroup: /
lxc wdelgenio-test1 20190724143740.849 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1005 - mountpoint: /sys/fs/cgroup/blkio
lxc wdelgenio-test1 20190724143740.849 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1006 - controllers:
lxc wdelgenio-test1 20190724143740.849 TRACE cgfsng - cgroups/cgfsng.c:lxc_cgfsng_print_hierarchies:1008 - 0: blkio
lxc wdelgenio-test1 20190724143740.849 TRACE cgroup - cgroups/cgroup.c:cgroup_init:61 - Initialized cgroup driver cgfsng
lxc wdelgenio-test1 20190724143740.849 TRACE cgroup - cgroups/cgroup.c:cgroup_init:64 - Running with legacy cgroup layout
lxc wdelgenio-test1 20190724143740.849 TRACE start - start.c:lxc_init:923 - Initialized cgroup driver
lxc wdelgenio-test1 20190724143740.849 TRACE start - start.c:lxc_init:930 - Initialized LSM
lxc wdelgenio-test1 20190724143740.849 INFO start - start.c:lxc_init:932 - Container "wdelgenio-test1" is initialized
lxc wdelgenio-test1 20190724143740.849 DEBUG cgfsng - cgroups/cgfsng.c:cg_legacy_filter_and_set_cpus:502 - Removed isolated or offline cpus from cpuset
lxc wdelgenio-test1 20190724143740.849 TRACE cgfsng - cgroups/cgfsng.c:cg_legacy_handle_cpuset_hierarchy:616 - "cgroup.clone_children" was already set to "1"
lxc wdelgenio-test1 20190724143740.864 INFO cgfsng - cgroups/cgfsng.c:cgfsng_monitor_create:1405 - The monitor process uses "lxc.monitor/wdelgenio-test1" as cgroup
lxc wdelgenio-test1 20190724143740.864 DEBUG storage - storage/storage.c:get_storage_by_name:232 - Detected rootfs type "dir"
lxc wdelgenio-test1 20190724143740.865 DEBUG cgfsng - cgroups/cgfsng.c:cg_legacy_filter_and_set_cpus:502 - Removed isolated or offline cpus from cpuset
lxc wdelgenio-test1 20190724143740.871 DEBUG lxccontainer - lxccontainer.c:wait_on_daemonized_start:861 - First child 440282 exited
lxc wdelgenio-test1 20190724143740.871 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:864 - No such file or directory - Failed to receive the container state
lxc 20190724143740.871 WARN commands - commands.c:lxc_cmd_rsp_recv:135 - Connection reset by peer - Failed to receive response for command "get_state"
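At trace level the log is long, so it can help to pull out just the ERROR and WARN entries. A minimal sketch, assuming the layout seen in the log above, where the level is the fourth field (or the third when the container name is absent). The function name is made up for illustration.

```shell
#!/usr/bin/env bash
# filter_lxc_errors
# Reads a liblxc log on stdin and prints only ERROR and WARN entries.
# Assumes lines look like "lxc NAME TIMESTAMP LEVEL ..." or
# "lxc TIMESTAMP LEVEL ..." as in the log above.
filter_lxc_errors() {
    awk '$3 == "ERROR" || $3 == "WARN" || $4 == "ERROR" || $4 == "WARN"'
}
```

For example: lxc info --show-log wdelgenio-test1 | filter_lxc_errors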
We’re also apparently having issues with lxcfs’s proc mounts on all containers now.
We’re seeing errors like:
Error: /proc must be mounted
To mount /proc at boot you need an /etc/fstab line like:
proc /proc proc defaults
In the meantime, run "mount proc /proc -t proc"
Thanks Ron, unfortunately due to the other issue above I cannot restart any of the containers. snap refresh --list indicates all the snaps are the latest stable version.
The weird thing is that I don’t see any segfaults in abrt-cli list and all of the container host machines have running lxcfs processes:
Are you able to copy the containers to a new server to get them back online? This is what we do when we run into these sorts of issues. Kind of a PIA, but it works. We always have a “migration” server ready in case unexpected issues happen.
Also, not to sound harsh, but this is why we don’t run LXD clusters. Too many dependencies on all the nodes working exactly 100% correct for the cluster to work well. We have seen one bad server cause lots of issues for the whole cluster.
This is what we will likely end up doing.
I installed a non-clustered zfs backed lxd on a migration server now.
It still cannot start even a stock Ubuntu container.
We tried moving to candidate version and it still cannot start a container.
edit: edge doesn’t work either
I have done the following without success, somehow:
Uninstall lxd
reboot
snap install lxd --channel=3.0/stable
lxd init (non-clustered, no networking, dir storage pool)
lxc launch ubuntu:18.04
Starting working-gopher
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart working-gopher /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/working-gopher/lxc.conf:
Try `lxc info --show-log local:working-gopher` for more info
lxc working-gopher 20190724155154.320 ERROR start - start.c:lxc_spawn:1737 - Invalid argument - Failed to clone a new set of namespaces
lxc working-gopher 20190724155154.322 ERROR start - start.c:__lxc_start:2019 - Failed to spawn container "working-gopher"
lxc working-gopher 20190724155154.323 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:851 - Received container state "ABORTING" instead of "RUNNING"
lxc working-gopher 20190724155154.754 ERROR conf - conf.c:userns_exec_1:4311 - Failed to clone process in new user namespace
lxc working-gopher 20190724155154.755 WARN cgfsng - cgroups/cgfsng.c:cgfsng_payload_destroy:1108 - Failed to destroy cgroups
lxc 20190724155154.105 WARN commands - commands.c:lxc_cmd_rsp_recv:135 - Connection reset by peer - Failed to receive response for command "get_state"
FYI - we are running 4.18.0-25-generic on U18.10. This seems to be a very stable release.
Also, in case you don’t know, I believe netplan is the new network config tool for Ubuntu 18. Make sure you know how to get it configured and working before starting any container migrations. The syntax is different and is very sensitive to spaces, tabs, etc. For us, we removed netplan and moved back to ifupdown to keep life simple (and more consistent with our CentOS installs).
Thanks Ron, you’ve been a good help. I think we’ll stick with netplan as our networking configuration is simple enough and I’m already somewhat familiar with it.
Hmm, so the errors would seem to indicate some kind of problem with liblxc 3.2.1 combined with the CentOS kernel. What’s odd is that we have automated CI on CentOS 7 and didn’t run into this problem.
I’m investigating what kernel is running on that machine now.
Confirmed that our test system perfectly matches your kernel and CentOS release, yet I’m not having any problems with a clean install of the lxd stable snap using the zfs backend…
It seems likely to be a cgroup handling issue in liblxc, but it’s unclear how you ran into this on a clean system and ours won’t fail.
@stgraber Thanks for the reply. I can’t do any further tests on that configuration as I just finished putting Ubuntu on all of our hosts; they’re too important to not have fully operational. We did not have SELinux enabled. I also saw the problem with the ‘dir’ backend.
Indeed, the storage backend is extremely unlikely to matter in this case.
This sounds like some cgroup issue got introduced in liblxc 3.2.1 but I can’t figure out why our test systems wouldn’t be impacted then given they’re running the exact same kernel and OS you were…
I don’t suppose you can easily set up a similarly broken test machine/VM somewhere?