Concerns about proxy

geodb27 · April 7, 2020, 2:54pm

People : hi !
I’ve run into a serious problem with one of my lxd clusters. In short: all three machines ran out of free disk space in /var and all three lxd instances failed. This is kind of my fault, and I’m not here to get help on how to monitor my machines; that’s not the purpose of my post.
Despite this, I was able to copy my containers to a fourth machine and re-install a brand new lxd cluster on these three machines.
This part was not without trouble for me, and I think some things are not clear in the install process. Let me explain.
On the three machines, at the first install attempt, I had variables such as http_proxy, https_proxy, ftp_proxy and no_proxy set in /etc/environment. And, since these machines can’t reach the outside without a proxy, I also had the first machine set up with lxd’s core config: core.proxy_http, core.proxy_https and core.proxy_ignore_hosts. This led to trouble, at least with a brand new lxd-3.19 installation.
So I ended up throwing all of this away: no proxy at all. The three machines came back online and I just had to get my containers back on the cluster.

That said, I also have two other clusters which were installed more than a year ago.
The fact is that I had entries in my proxy log file complaining that the machines of these two clusters were trying to talk to each other via the proxy. That shouldn’t happen. So, I checked the config variables.
http_proxy and https_proxy (both in the shell env), core.proxy_http and core.proxy_https are set up the way they should be, and the env var no_proxy is set to "localhost, 127.0.0.1, .mydomain.com, 10.0.0.0/24", as is lxd’s core.proxy_ignore_hosts config key.
Since the 6 machines are on network 10.0.0.0/24, I expected this to work out of the box: the machine with ip 10.0.0.1 should talk directly to the machine with ip 10.0.0.2 without going via the proxy. It does not, according to the logs on my proxy machine and the tcpdump I took on each machine. Why is that ?
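As a sanity check on the expectation above, here is a minimal sketch that tests whether an address falls inside a CIDR block such as the one in no_proxy. The function names ip_to_int and ip_in_cidr are made up for illustration. It only checks the arithmetic; whether a given client actually honors CIDR entries in no_proxy depends on that client and is worth verifying separately.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 if address $1 lies inside CIDR block $2 (e.g. 10.0.0.0/24).
ip_in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}
```

For example, ip_in_cidr 10.0.0.2 10.0.0.0/24 succeeds, so if the proxy is still used the cause is more likely in how no_proxy is parsed or in which environment the daemon really sees.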
Since I couldn’t work out what was wrong, I commented out the lines in /etc/environment, unset lxd’s config keys and restarted all the lxd daemons, with no success.
The core.* keys no longer exist, as reported by “lxc config show” and lxd sql global “select * from config”.
So, there might be something left somewhere, but I couldn’t figure out clearly what it is.
I went to /var/snap/lxd/common/lxd/databases/global and found in there a db.bin file.
With a sqlite3 db.bin select * from config I found out that the core.* config keys I mentioned earlier are still in there. Can I safely remove these entries in this file ? Should I stop lxd before and restart it afterward ?
I don’t want to break my two clusters, so I’d like not to break anything.

Any help/explanations would be much appreciated !
Thanks in advance.

stgraber · April 7, 2020, 4:20pm

Modifying db.bin will not work, it’s just a temporary view of what the DB looks like, it gets re-generated during LXD runtime.

Instead early DB changes can be done with a patch.global.sql file in the database directory. This is covered here: https://linuxcontainers.org/lxd/docs/master/database#running-custom-queries-at-lxd-daemon-startup
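A minimal sketch of what creating such a file could look like, in case it helps. The function name write_proxy_patch is made up for illustration, the key names are the LXD proxy keys discussed in this thread, and the directory you pass in is an assumption you should check against your own install. If the keys can still be removed with lxc config unset, that is the safer route.

```shell
#!/bin/sh
# Write a patch.global.sql that deletes leftover proxy keys from the
# global config table. LXD reads this file early at daemon startup.
# $1: LXD database directory (an assumption; verify it on your system).
write_proxy_patch() {
    cat > "$1/patch.global.sql" <<'EOF'
DELETE FROM config WHERE key IN ('core.proxy_http', 'core.proxy_https', 'core.proxy_ignore_hosts');
EOF
}
```

On a snap install the call would presumably be write_proxy_patch /var/snap/lxd/common/lxd/database, followed by a restart of the daemon. Back up the database directory first.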

geodb27 · April 7, 2020, 4:56pm

Thank you so much for your quick answer, stgraber.
I’ll try what you suggest tomorrow: create a patch.global.sql file containing something like “delete from config where …” (I have no lxd daemon to access right now, but you understand my meaning). The idea would then be to delete the core.proxy_http, core.proxy_https and core.proxy_ignore_hosts keys.
You said that db.bin is a reflection of the running database, which I can understand. However, how can one explain that the three keys above are in this db.bin file but are not shown at all by lxd sql global “select * from config” ?
I just try to understand how it works.

In my first post I forgot to mention that when answering the questions asked by lxd init right after the snap install, near the end, it asks whether we would like to keep images up to date. I supposed that it was a good idea, so I answered “yes”. However, these three lxd clusters should never do that; they are fed containers from another stand-alone machine. Is there a way to revert the choice I made at init time ?

stgraber · April 7, 2020, 4:57pm

lxd sql global should show you the current state much more accurately than db.bin, so if it’s not in there, there’s a good chance it’s just db.bin being a bit outdated.

geodb27 · April 7, 2020, 5:03pm

I understand. However, how can I explain that machine 1 keeps trying to talk to machine 2 (or machine 3) via the proxy when there are no longer any variables for this proxy set, either in the environment or in lxd’s config ? The processes were restarted once from a shell without these values on the last day of 3.19’s life, and they were automatically restarted when the 4.0.0 snap was installed.
I clearly see requests from 10.0.0.1 to 10.0.0.2:8443 in my squid access log (the two machines are in the same lxd cluster).
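For reference, this is roughly how such entries can be pulled out of the log. It is a sketch that assumes squid’s native access log format, where field 3 is the client address, field 6 the method and field 7 the target (host:port for CONNECT). The function name cluster_requests_via_proxy and the prefix argument are made up for illustration.

```shell
#!/bin/sh
# List log entries whose client and CONNECT target both start with the
# given address prefix, i.e. cluster members talking through the proxy.
# $1: squid access log file
# $2: address prefix of the cluster network, e.g. "10.0.0."
cluster_requests_via_proxy() {
    awk -v p="$2" \
        'index($3, p) == 1 && index($7, p) == 1 { print $3, $6, $7 }' "$1"
}
```

Calling it with the access log path and 10.0.0. as the prefix prints one line per matching request, with the client, the method and the target.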

stgraber · April 7, 2020, 6:27pm

Have you checked that the lxd’s process environment is clean?

cat /proc/$(cat /var/snap/lxd/common/lxd.pid)/environ | tr '\0' '\n' would show you what’s in there, check that there is no proxy env variables left.
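If you want to run the same check on every cluster member, the command above can be wrapped in a small function that prints only the proxy-related entries. This is a sketch; the function name proxy_vars_in_environ is made up for illustration.

```shell
#!/bin/sh
# Print variables whose name contains "proxy" (any case) from a
# NUL-separated environ file. Returns non-zero when none are found.
# $1: path to the file, e.g. /proc/<pid>/environ
proxy_vars_in_environ() {
    tr '\0' '\n' < "$1" | grep -i -E '^[^=]*proxy[^=]*='
}
```

Usage would be proxy_vars_in_environ "/proc/$(cat /var/snap/lxd/common/lxd.pid)/environ". No output and a non-zero exit status means the daemon’s environment is clean.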