For the last nine hours or so I’ve had some unusual issues when building clusters from scratch. For one thing, about ten hours ago the nodes were installing 5.1, but a few hours later they were back to installing 5.0, so I had some quick work to do to get clusters to form while it was a mystery whether 5.0 or 5.1 would be installed. At the moment 5.0 is being installed by default.
Then Orabuntu-LXC started getting this error about nine hours ago (and it is still happening) when trying to create the container. It succeeds eventually, but not before several retries:
==============================================
Launch Oracle LXD Seed Container...
==============================================
==============================================
(takes a minute or two ... patience) ...
==============================================
Creating oel84d10
Error: Failed instance creation: Post "http://unix.socket/1.0/instances": net/http: timeout awaiting response headers
Launch of LXD container encountered an unexpected issue. Retrying launch ...
Creating oel84d10
Error: Failed instance creation: Post "http://unix.socket/1.0/instances": net/http: timeout awaiting response headers
Launch of LXD container encountered an unexpected issue. Retrying launch ...
Creating oel84d10
Error: Failed instance creation: Post "http://unix.socket/1.0/instances": net/http: timeout awaiting response headers
Launch of LXD container encountered an unexpected issue. Retrying launch ...
Creating oel84d11
Error: Failed instance creation: Post "http://unix.socket/1.0/instances": net/http: timeout awaiting response headers
Launch of LXD container encountered an unexpected issue. Retrying launch ...
Creating oel84d12
The Orabuntu-LXC software keeps retrying for a while to see if it can get a successful result, and it also increments the container name as part of that retry. As you can see, it tries the same container name three times, then starts incrementing the name to see if that helps. I have not seen this fail-safe code activate for quite a while, so I am wondering if there is something going on with container launch as well. The output is cut off in the copy-and-paste above, but the oel84d12 launch did succeed on the first try without any further drama.
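For anyone curious what that fail-safe looks like, here is a minimal sketch of the retry-and-increment behavior described above. The names, retry counts, and the LAUNCH_CMD hook are illustrative assumptions, not Orabuntu-LXC's actual code:

```shell
#!/bin/bash
# Sketch of a launch fail-safe: retry the same container name a few times,
# then start incrementing the name suffix. Counts and the image used by
# default_launch are assumptions for illustration.

# Command used to launch a container; overridable for testing.
LAUNCH_CMD=${LAUNCH_CMD:-"default_launch"}
default_launch() { lxc launch images:oracle/8 "$1"; }

launch_with_retry() {
  local base="$1"          # name stem, e.g. "oel84d"
  local num="$2"           # starting suffix, e.g. 10
  local tries_per_name=3   # attempts before giving up on a name
  local max_names=3        # how many incremented names to attempt

  local n try name
  for (( n = 0; n < max_names; n++ )); do
    name="${base}$(( num + n ))"
    for (( try = 1; try <= tries_per_name; try++ )); do
      echo "Creating ${name}"
      if "$LAUNCH_CMD" "$name"; then
        return 0
      fi
      echo "Launch of LXD container encountered an unexpected issue. Retrying launch ..."
    done
  done
  return 1
}

# Example: launch_with_retry oel84d 10
```

The interesting design choice is that incrementing the name sidesteps any half-created instance record left behind by a timed-out POST to the LXD API, which may be why the later name eventually succeeds.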
Thank you. Here is everything in there with a “log” extension:
[root@r8sv1 ~]# cd /var/snap/lxd/common/lxd/logs
[root@r8sv1 logs]# ls -lrt
total 8
-rw-------. 1 root root 1268 May 3 08:18 lxd.log.1
-rw-r--r--. 1 root root 0 May 3 08:18 dnsmasq.lxdbr0.log
-rw-------. 1 root root 704 May 3 08:18 lxd.log
drwxr-xr-x. 2 root root 21 May 3 08:22 oel84d10
drwxr-xr-x. 2 root root 21 May 3 08:22 oel84d11
drwxr-xr-x. 2 root root 116 May 3 08:59 maestro
drwxr-xr-x. 2 root root 116 May 3 08:59 violin1
drwxr-xr-x. 2 root root 116 May 3 09:00 violin2
drwxr-xr-x. 2 root root 116 May 3 09:02 ora84d10
drwxr-xr-x. 2 root root 116 May 3 09:03 ora84d11
drwxr-xr-x. 2 root root 116 May 3 09:03 oel84d12
[root@r8sv1 logs]# cat lxd.log
time="2022-05-03T08:18:43-05:00" level=warning msg="AppArmor support has been disabled because of lack of kernel support"
time="2022-05-03T08:18:43-05:00" level=warning msg=" - AppArmor support has been disabled, Disabled because of lack of kernel support"
time="2022-05-03T08:18:43-05:00" level=warning msg=" - Couldn't find the CGroup blkio.weight, disk priority will be ignored"
time="2022-05-03T08:18:43-05:00" level=warning msg="Instance type not operational" driver=qemu err="KVM support is missing (no /dev/kvm)" type=virtual-machine
time="2022-05-03T08:18:44-05:00" level=warning msg="Failed to initialize fanotify, falling back on fsnotify" err="Failed to initialize fanotify: invalid argument"
[root@r8sv1 logs]# date
Tue May 3 09:48:32 CDT 2022
[root@r8sv1 logs]# cat lxd.log.1
time="2022-05-03T08:17:39-05:00" level=warning msg="AppArmor support has been disabled because of lack of kernel support"
time="2022-05-03T08:17:39-05:00" level=warning msg=" - AppArmor support has been disabled, Disabled because of lack of kernel support"
time="2022-05-03T08:17:39-05:00" level=warning msg=" - Couldn't find the CGroup blkio.weight, disk priority will be ignored"
time="2022-05-03T08:17:39-05:00" level=warning msg="Instance type not operational" driver=qemu err="KVM support is missing (no /dev/kvm)" type=virtual-machine
time="2022-05-03T08:17:40-05:00" level=warning msg="Failed to initialize fanotify, falling back on fsnotify" err="Failed to initialize fanotify: invalid argument"
time="2022-05-03T08:17:43-05:00" level=warning msg="Failed adding member event listener client" err="Unable to connect to: 0.0.0.0:443 ([dial tcp 0.0.0.0:443: connect: connection refused])" local="10.209.53.1:8443" remote=0.0.0.0
time="2022-05-03T08:18:40-05:00" level=warning msg="Failed to update instance types: Get \"https://us.lxd.images.canonical.com/meta/instance-types/aws.yaml\": context canceled"
time="2022-05-03T08:18:40-05:00" level=warning msg="Could not handover member's responsibilities" err="Failed to transfer leadership: No online voter found"
[root@r8sv1 logs]# cat dnsmasq.lxdbr0.log
[root@r8sv1 logs]#
This was the install of the first node in the cluster. As mentioned in the original post, I did run into issues with some of the nodes pulling down LXD 5.0 and some pulling 5.1, which was odd because the first node pulled 5.1 and the second node pulled 5.0, when intuition would sort of suggest the other way round. Anyway, yes, I’ve been watching that carefully since 5.1 came out, because it did cause some hiccups in cluster deployments.
Yes, I’ve been checking snap info lxd, and Orabuntu-LXC runs that command automatically as part of the cluster deployments. And yes, it has been a bit unpredictable. All these odd little issues that cropped up in the last 24 hours or so may just go away by themselves once 5.1 “settles in”, so to speak. Right now Orabuntu-LXC is deploying the second node on Red Hat 8, so we’ll see how that goes. I’m just doing testing for the upcoming release of Orabuntu-LXC v7.0.0-alpha ELENA, and the testing consists of having Orabuntu-LXC auto-deploy 5-node LXD clusters.
Orabuntu-LXC retries the cluster formation using a preseed several times, so I can work around the phased-rollout issue by upgrading a wayward node to 5.1 on the fly (the preseed retry waits two minutes before each attempt), after which the preseed of course succeeds.
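For reference, the on-the-fly upgrade of a wayward node is just a snap channel switch; something along these lines works (channel names are the LXD snap's tracks, so verify them with snap info lxd on your system):

```shell
# See which channel this node is tracking and what versions each track offers:
snap info lxd

# Move a node that pulled 5.0 onto the track that carries 5.1 so all
# cluster members match before the preseed retry fires:
sudo snap refresh lxd --channel=latest/stable
```

Pinning every node to the same explicit channel up front also prevents a phased rollout from handing different versions to different nodes mid-deploy.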
It was more the failure of the container launch that I found surprising. Other container launches later in the Orabuntu-LXC flow were working fine; it was just that first one after the cluster had formed that kept consistently running into issues.