LXD 3.23 - Cluster setup, lxc exec has different behavior for containers and VMs

This is probably expected behavior (given the current state of VM support), but I wanted to confirm that I was not doing anything wrong …

I have an LXD 3.23 cluster (installed via snap), on which I have launched a number of containers and VMs. The following is the behavior I see when running the lxc exec command against containers and VMs on the local host and on other hosts in the cluster:

  • lxc exec <container> works as expected, irrespective of whether the container is running on the local host or another host in the cluster
  • lxc exec <vm> works for VMs running on the local host
  • lxc exec <vm> against a VM running on a non-local host in the cluster returns Error: not found
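For illustration, the pattern looks roughly like this (instance names are placeholders):

lxc exec somecontainer -- hostname   # works, whichever cluster member hosts the container
lxc exec localvm -- hostname         # works when the VM is on the member I am running lxc from
lxc exec remotevm -- hostname        # Error: not found (the VM lives on another cluster member)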

Hi!

When you refer to a remote LXD server, don’t you use something like

lxc exec myremote1:mycontainer -- /bin/sh

That is, you specify the remote plus the container name.
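For reference, such a remote would normally be registered first with something along the lines of (name and address are placeholders):

lxc remote add myremote1 192.0.2.10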

No, I am finding that in cluster mode, I don’t have to specify the remote explicitly for any of the containers (local or non-local).

I was assuming that in cluster mode, lxc/lxd was implicitly setting up remotes for all of the cluster hosts …

Okay, I did not notice the part about the cluster.

You do not get the error Error: Failed to connect to lxd-agent, so you have successfully configured the LXD agent on all cluster members.

Oh, that would be a bug. I’ll reproduce the issue here and look into it.

lxc exec is certainly meant to work across the cluster; I suspect it’s something pretty obvious that we may have forgotten to put in the API.

Thanks for looking into this/fixing it promptly!

Another issue, which may or may not be related (looking at your pull request, it feels as if the previously container-only code paths now need to handle both container and VM instances, and that change may have been missed in some paths):

  • From within a container running on any of the hosts in the cluster, I am able to resolve the names of other containers, whether local or remote (the .lxd domain is implicitly appended, and whatever resolver setup is in use does the right thing).
  • From within a container, I am not able to resolve the names of VMs, either local or remote.
  • From within a VM, I am able to resolve the names of containers, both local and remote.
  • From within a VM, I am not able to resolve the names of other VMs, either local or remote.
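A quick way to reproduce this from inside any instance is to query the names directly (these names are placeholders; my real instance names appear in the nslookup output further below):

getent hosts somecontainer.lxd   # returns an address
getent hosts somevm.lxd          # returns nothing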

Hmm, that part should be handled the same, as it’s not in any container- or VM-specific logic.

What does lxc network list-leases lxdbr0 show you?

@tomp

I am using fan networking for the cluster, so I replaced lxdbr0 with lxdfan0 in the output below …

akriadmin@c4akri01:~/scripts$ lxc network list
+---------+----------+---------+-------------+---------+---------+
|  NAME   |   TYPE   | MANAGED | DESCRIPTION | USED BY |  STATE  |
+---------+----------+---------+-------------+---------+---------+
| eno1    | physical | NO      |             | 0       |         |
+---------+----------+---------+-------------+---------+---------+
| eno2    | physical | NO      |             | 0       |         |
+---------+----------+---------+-------------+---------+---------+
| eno3    | physical | NO      |             | 0       |         |
+---------+----------+---------+-------------+---------+---------+
| eno4    | physical | NO      |             | 0       |         |
+---------+----------+---------+-------------+---------+---------+
| lxdfan0 | bridge   | YES     |             | 16      | CREATED |
+---------+----------+---------+-------------+---------+---------+
akriadmin@c4akri01:~/scripts$ lxc network list-leases lxdfan0
+----------+-------------+------------+------+----------+
| HOSTNAME | MAC ADDRESS | IP ADDRESS | TYPE | LOCATION |
+----------+-------------+------------+------+----------+

To be explicit, the list-leases call returned no output.

I’ll take a look at this.

The network leases list is likely empty, even for containers, because they are newly created and use the network property on their bridged NIC device, and the leases list view has not been updated to handle that new config key.
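Concretely, a NIC attached via the new key looks roughly like this in its device entry (an illustrative sketch; check with lxc config device show <instance>, or lxc profile device show default if the NIC comes from the profile):

  network: lxdfan0
  type: nic

whereas the older style used

  nictype: bridged
  parent: lxdfan0
  type: nic

and the leases view was only matching instances configured the older way.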

This PR fixes it:

Will look at DNS issue next.

Hi,

So I’ve set up a test cluster locally, running a container on one host and a VM on another host.

I can ping their hostnames from each other without a problem.

Can you confirm that the dnsmasq and forkdns processes are running on each of your cluster hosts, please?

Please can you send the outputs of:

ps aux | grep forkdns

ps aux | grep dnsmasq

Thanks for looking into this.

My setup is a 7-host cluster with fan networking.
I have 6 containers and 10 VMs deployed on this cluster. In case it matters, the containers are ubuntu:18.04 images, and the VMs are split between images:ubuntu/18.04 and images:ubuntu/19.10.

akriadmin@c4akri01:~/scripts$ lxc cluster list
+------------+---------------------------+----------+--------+-------------------+--------------+
|    NAME    |            URL            | DATABASE | STATE  |      MESSAGE      | ARCHITECTURE |
+------------+---------------------------+----------+--------+-------------------+--------------+
| c4akri01   | https://10.30.30.204:8443 | YES      | ONLINE | fully operational | x86_64       |
+------------+---------------------------+----------+--------+-------------------+--------------+
| c4akri02   | https://10.30.30.197:8443 | YES      | ONLINE | fully operational | x86_64       |
+------------+---------------------------+----------+--------+-------------------+--------------+
| c4akri03   | https://10.30.30.215:8443 | YES      | ONLINE | fully operational | x86_64       |
+------------+---------------------------+----------+--------+-------------------+--------------+
| c4akri04   | https://10.30.30.205:8443 | NO       | ONLINE | fully operational | x86_64       |
+------------+---------------------------+----------+--------+-------------------+--------------+
| c4astore01 | https://10.30.30.221:8443 | NO       | ONLINE | fully operational | x86_64       |
+------------+---------------------------+----------+--------+-------------------+--------------+
| c4astore02 | https://10.30.30.222:8443 | NO       | ONLINE | fully operational | x86_64       |
+------------+---------------------------+----------+--------+-------------------+--------------+
| c4astore03 | https://10.30.30.223:8443 | NO       | ONLINE | fully operational | x86_64       |
+------------+---------------------------+----------+--------+-------------------+--------------+

forkdns and dnsmasq are running on all of the hosts (run-physical-hosts.sh is a wrapper script that SSHes into each host and runs a command; a rough sketch of it follows the output below):

akriadmin@c4akri01:~/scripts$ ./run-physical-hosts.sh "ps aux | egrep '(forkdns | dnsmasq)'"
[c4akri01]:
akriadm+ 26364  0.0  0.0  11596  3108 pts/0    S+   16:04   0:00 /bin/bash ./run-physical-hosts.sh ps aux | egrep '(forkdns | dnsmasq)'
akriadm+ 26366  0.0  0.0  46844  5792 pts/0    S+   16:04   0:00 ssh c4akri01 ps aux | egrep '(forkdns | dnsmasq)'
akriadm+ 26450  0.0  0.0  11592  3196 ?        Ss   16:04   0:00 bash -c ps aux | egrep '(forkdns | dnsmasq)'
akriadm+ 26452  0.0  0.0  13136  1004 ?        S    16:04   0:00 grep -E (forkdns | dnsmasq)
lxd      36494  0.0  0.0  49984  3692 ?        Ss   04:59   0:00 dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo --no-ping --interface=lxdfan0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=240.204.0.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.hosts --dhcp-range 240.204.0.2,240.204.0.254,1h -s lxd -S /lxd/240.204.0.1#1053 --rev-server=240.0.0.0/8,240.204.0.1#1053 --conf-file=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.raw -u lxd
root     36495  0.0  0.0 2019848 35412 ?       Ssl  04:59   0:08 /snap/lxd/current/bin/lxd forkdns 240.204.0.1:1053 lxd lxdfan0
[c4akri02]:
akriadm+ 10373  0.0  0.0  11592  3196 ?        Ss   16:04   0:00 bash -c ps aux | egrep '(forkdns | dnsmasq)'
akriadm+ 10375  0.0  0.0  13136  1044 ?        S    16:04   0:00 grep -E (forkdns | dnsmasq)
lxd      39993  0.0  0.0  49984  3708 ?        Ss   02:27   0:00 dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo --no-ping --interface=lxdfan0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=240.197.0.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.hosts --dhcp-range 240.197.0.2,240.197.0.254,1h -s lxd -S /lxd/240.197.0.1#1053 --rev-server=240.0.0.0/8,240.197.0.1#1053 --conf-file=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.raw -u lxd
root     39994  0.0  0.0 2209028 34172 ?       Ssl  02:27   0:11 /snap/lxd/current/bin/lxd forkdns 240.197.0.1:1053 lxd lxdfan0
[c4akri03]:
lxd       5018  0.0  0.0  49984  3536 ?        Ss   00:34   0:00 dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo --no-ping --interface=lxdfan0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=240.215.0.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.hosts --dhcp-range 240.215.0.2,240.215.0.254,1h -s lxd -S /lxd/240.215.0.1#1053 --rev-server=240.0.0.0/8,240.215.0.1#1053 --conf-file=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.raw -u lxd
root      5019  0.0  0.0 2085896 33520 ?       Ssl  00:34   0:13 /snap/lxd/current/bin/lxd forkdns 240.215.0.1:1053 lxd lxdfan0
akriadm+ 35439  0.0  0.0  11592  2984 ?        Ss   16:04   0:00 bash -c ps aux | egrep '(forkdns | dnsmasq)'
akriadm+ 35441  0.0  0.0  13136  1000 ?        S    16:04   0:00 grep -E (forkdns | dnsmasq)
[c4akri04]:
akriadm+ 23152  0.0  0.0  11592  3184 ?        Ss   16:04   0:00 bash -c ps aux | egrep '(forkdns | dnsmasq)'
akriadm+ 23154  0.0  0.0  13136  1036 ?        S    16:04   0:00 grep -E (forkdns | dnsmasq)
lxd      38010  0.0  0.0  49984  3596 ?        Ss   Mar30   0:00 dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo --no-ping --interface=lxdfan0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=240.205.0.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.hosts --dhcp-range 240.205.0.2,240.205.0.254,1h -s lxd -S /lxd/240.205.0.1#1053 --rev-server=240.0.0.0/8,240.205.0.1#1053 --conf-file=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.raw -u lxd
root     38011  0.0  0.0 2152008 34932 ?       Ssl  Mar30   0:14 /snap/lxd/current/bin/lxd forkdns 240.205.0.1:1053 lxd lxdfan0
[c4astore01]:
lxd       8342  0.0  0.0  49984  3552 ?        Ss   Mar30   0:00 dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo --no-ping --interface=lxdfan0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=240.221.0.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.hosts --dhcp-range 240.221.0.2,240.221.0.254,1h -s lxd -S /lxd/240.221.0.1#1053 --rev-server=240.0.0.0/8,240.221.0.1#1053 --conf-file=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.raw -u lxd
root      8343  0.0  0.1 1921784 34308 ?       Ssl  Mar30   0:12 /snap/lxd/current/bin/lxd forkdns 240.221.0.1:1053 lxd lxdfan0
akriadm+ 13900  0.0  0.0  11592  3156 ?        Ss   16:04   0:00 bash -c ps aux | egrep '(forkdns | dnsmasq)'
akriadm+ 13902  0.0  0.0  13136  1104 ?        S    16:04   0:00 grep -E (forkdns | dnsmasq)
[c4astore02]:
akriadm+ 21060  0.0  0.0  11592  3256 ?        Ss   16:04   0:00 bash -c ps aux | egrep '(forkdns | dnsmasq)'
akriadm+ 21062  0.0  0.0  13136  1008 ?        S    16:04   0:00 grep -E (forkdns | dnsmasq)
lxd      31602  0.0  0.0  49984  3696 ?        Ss   Mar30   0:00 dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo --no-ping --interface=lxdfan0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=240.222.0.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.hosts --dhcp-range 240.222.0.2,240.222.0.254,1h -s lxd -S /lxd/240.222.0.1#1053 --rev-server=240.0.0.0/8,240.222.0.1#1053 --conf-file=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.raw -u lxd
root     31603  0.0  0.0 1371340 30020 ?       Ssl  Mar30   0:08 /snap/lxd/current/bin/lxd forkdns 240.222.0.1:1053 lxd lxdfan0
[c4astore03]:
lxd       5908  0.0  0.0  49984  3528 ?        Ss   Mar30   0:00 dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo --no-ping --interface=lxdfan0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=240.223.0.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.hosts --dhcp-range 240.223.0.2,240.223.0.254,1h -s lxd -S /lxd/240.223.0.1#1053 --rev-server=240.0.0.0/8,240.223.0.1#1053 --conf-file=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.raw -u lxd
root      5909  0.0  0.0 1436620 32092 ?       Ssl  Mar30   0:10 /snap/lxd/current/bin/lxd forkdns 240.223.0.1:1053 lxd lxdfan0
akriadm+ 11131  0.0  0.0  11592  3120 ?        Ss   16:04   0:00 bash -c ps aux | egrep '(forkdns | dnsmasq)'
akriadm+ 11133  0.0  0.0  13136  1084 ?        S    16:04   0:00 grep -E (forkdns | dnsmasq)
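
For reference, run-physical-hosts.sh is roughly equivalent to the following sketch (the real script may differ):

#!/bin/bash
# Run the supplied command on every physical host in the cluster over ssh,
# printing a "[hostname]:" header before each host's output.
for h in c4akri01 c4akri02 c4akri03 c4akri04 c4astore01 c4astore02 c4astore03; do
    echo "[$h]:"
    ssh "$h" "$1"
done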

To illustrate the DNS resolution issue, the following shows nslookup responses for local and remote containers and VMs from one of the container instances (ctrlr running on host c4akri01, and assigned an IP address of 240.204.0.186):

  • k8s-master1 is a local container instance
  • k8s-worker1 is a local VM instance
  • k8s-lb is a remote container running on a different host (c4akri04)
  • k8s-worker4 is a remote VM instance

ubuntu@ctrlr:~$ nslookup k8s-master1
Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
Name:   k8s-master1.lxd
Address: 240.204.0.82

ubuntu@ctrlr:~$ nslookup k8s-worker1
Server:         127.0.0.53
Address:        127.0.0.53#53

** server can't find k8s-worker1: SERVFAIL

ubuntu@ctrlr:~$ nslookup k8s-lb
Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
Name:   k8s-lb.lxd
Address: 240.205.0.177
** server can't find k8s-lb.lxd: NXDOMAIN

ubuntu@ctrlr:~$ nslookup k8s-worker4
Server:         127.0.0.53
Address:        127.0.0.53#53

** server can't find k8s-worker4: SERVFAIL

ubuntu@ctrlr:~$

Adding to the previous note, I added the .lxd domain suffix to the VM names to see if that makes a difference …

ubuntu@ctrlr:~$ nslookup k8s-worker1.lxd
Server:         127.0.0.53
Address:        127.0.0.53#53

** server can't find k8s-worker1.lxd: NXDOMAIN

ubuntu@ctrlr:~$ nslookup k8s-worker4.lxd
Server:         127.0.0.53
Address:        127.0.0.53#53

** server can't find k8s-worker4.lxd: NXDOMAIN

Can you paste the contents of the file /var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.leases from each host, please.

Also, can you use the dig tool rather than nslookup, and try querying the LXD host directly rather than the local caching resolver?

e.g.

Run
ip r

Get the default route IP for that container.

Then

dig @<default route ip> k8s-worker1.lxd
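
Combining the two steps into one command should also work (a sketch; it assumes the instance’s default route points at the host’s fan bridge address, as it does for DHCP-configured instances):

dig @$(ip -4 route show default | awk '{print $3}') k8s-worker1.lxd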

dnsmasq.leases output below … I don’t see entries for the VMs, only for the containers.

akriadmin@c4akri01:~/scripts$ ./run-physical-hosts.sh "sudo cat /var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.leases"
[c4akri01]:
1585677285 00:16:3e:e4:d9:96 240.204.0.142 distrobuilder-099a8a43-b9b3-4906-8643-df5603417eb6 ff:49:72:1f:47:00:02:00:00:ab:11:db:34:d4:5a:56:81:92:38
1585677352 00:16:3e:77:41:e1 240.204.0.82 k8s-master1 ff:88:5a:06:e6:00:02:00:00:ab:11:02:65:52:86:70:87:f5:4c
1585677067 00:16:3e:e6:95:e4 240.204.0.186 ctrlr ff:82:3c:be:6d:00:02:00:00:ab:11:54:56:f7:f0:1b:3e:f1:a3
[c4akri02]:
1585677644 00:16:3e:58:5f:c9 240.197.0.215 distrobuilder-099a8a43-b9b3-4906-8643-df5603417eb6 ff:49:72:1f:47:00:02:00:00:ab:11:db:34:d4:5a:56:81:92:38
1585677301 00:16:3e:1a:ef:88 240.197.0.18 k8s-master2 ff:9d:73:a8:25:00:02:00:00:ab:11:2a:1d:38:85:12:70:4c:96
[c4akri03]:
1585676600 00:16:3e:25:2f:7a 240.215.0.200 distrobuilder-099a8a43-b9b3-4906-8643-df5603417eb6 ff:49:72:1f:47:00:02:00:00:ab:11:db:34:d4:5a:56:81:92:38
1585676377 00:16:3e:9e:14:52 240.215.0.110 k8s-master3 ff:ce:98:af:ae:00:02:00:00:ab:11:ff:24:c7:81:ac:99:b6:69
[c4akri04]:
1585677638 00:16:3e:28:03:ec 240.205.0.196 distrobuilder-099a8a43-b9b3-4906-8643-df5603417eb6 ff:49:72:1f:47:00:02:00:00:ab:11:db:34:d4:5a:56:81:92:38
1585677307 00:16:3e:5e:78:f6 240.205.0.177 k8s-lb ff:08:e8:02:b3:00:02:00:00:ab:11:6a:7b:43:9a:50:46:31:ba
[c4astore01]:
1585677304 00:16:3e:71:f0:fb 240.221.0.117 distrobuilder-3956aeca-2342-4fb5-9ff2-d9e31ffd2a21 ff:49:72:1f:47:00:02:00:00:ab:11:3f:10:62:e1:3b:46:c9:cd
1585676824 00:16:3e:bc:c8:db 240.221.0.52 distrobuilder-099a8a43-b9b3-4906-8643-df5603417eb6 ff:49:72:1f:47:00:02:00:00:ab:11:db:34:d4:5a:56:81:92:38
1585676735 00:16:3e:3b:71:3c 240.221.0.237 hdfs-namenode ff:68:37:aa:04:00:02:00:00:ab:11:4c:09:fe:13:38:bb:ce:43
[c4astore02]:
1585676927 00:16:3e:cb:4c:d3 240.222.0.137 distrobuilder-a76bda2c-9e96-4822-9b6f-0ec7f07e5957 ff:49:72:1f:47:00:02:00:00:ab:11:71:34:76:98:f7:4c:f5:a0
1585677515 00:16:3e:84:ae:7f 240.222.0.12 distrobuilder-76b76cba-efdf-43cc-8bbb-d9461156de33 ff:49:72:1f:47:00:02:00:00:ab:11:40:04:3b:f2:aa:44:c8:e6
[c4astore03]:
1585676510 00:16:3e:90:26:83 240.223.0.187 distrobuilder-3956aeca-2342-4fb5-9ff2-d9e31ffd2a21 ff:49:72:1f:47:00:02:00:00:ab:11:3f:10:62:e1:3b:46:c9:cd
1585677261 00:16:3e:96:82:14 240.223.0.249 distrobuilder-099a8a43-b9b3-4906-8643-df5603417eb6 ff:49:72:1f:47:00:02:00:00:ab:11:db:34:d4:5a:56:81:92:38
akriadmin@c4akri01:~/scripts$

Also can you paste the contents of

/var/snap/lxd/common/lxd/networks/lxdfan0/forkdns.servers/servers.conf
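e.g. using your wrapper script:

./run-physical-hosts.sh "sudo cat /var/snap/lxd/common/lxd/networks/lxdfan0/forkdns.servers/servers.conf"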

And finally, do you have any firewall rules running on the LXD host machines that could be blocking port 1053 UDP over the fan network?
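
For example, something like this on each host would surface any iptables rules mentioning the forkdns port (an nftables-based setup would need nft list ruleset instead):

sudo iptables -S | grep 1053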