Apt/curl stopped working in all containers, anybody else?

lxd
networking

(Jan Alleman) #1

Hi everybody,

I’ve been using LXD for quite a while now and am usually very happy with it. I think my setup with the default bridge and ZFS storage is pretty standard. However, yesterday I tried to install Apache in a container (bionic) and noticed that apt no longer works. I’m just getting connection timeouts.

So far I checked the following things:

  • Rebooting the host/containers does not solve the problem (but everything starts up fine)
  • apt/curl on the host (bionic) works fine
  • iptables rules of the host seem fine
  • apt/curl stopped working on all containers (all bionic), not only the one mentioned above
  • Firewall rules in the containers seem fine, but it also does not work when the firewall is deactivated
  • Launching a new container with a different distribution (xenial) has the same connection problems
  • ping works from and to every container
  • Running host inside the containers correctly resolves the internal IPs of the LXD network, and resolution also works for the rest of the web
  • curl from a container to google.com returns ‘network is unreachable’; curling the IP directly fails with ‘connection timed out’
  • Disabling IPv6 in /etc/gai.conf does not solve the problem.
  • The lxd logs do not look suspicious (at least not to my eyes)
  • Pinging the bridge (IPv4) works from the host, but not from the container

At first I thought I had traced the error to a problem with the bridge, but the funny thing is: NAT-forwarded connections to the reverse proxy container still work fine, and the corresponding websites display correctly in the other containers.

I don’t know what to try next. Anybody else with the same problem, or a suggestion on how to debug this?

Greetings,
Jan


#2

Hi!

So the networking of the containers got messed up, and you are using the default bridge that you get in LXD.
LXD manages the lxdbr0 bridge and spawns a dnsmasq process that handles the autoconfiguration of the containers. dnsmasq acts as a DHCP server and provides each container with its IP address, default route and DNS server IP (itself).

  1. Can you check with lxc list on the host whether your containers manage to get IP addresses?
    If they do not get IP addresses, then something is wrong with dnsmasq.
    If your containers do get IP addresses, then the mystery gets bigger.

  2. You can check the default profile with lxc profile show default and it should show the default network settings. Then, you can run lxc network show lxdbr0 and it should show the details of the lxdbr0 network.
    It would be good to post these here (of course remove first any personal info from the output).

  3. If you run ps ax | grep dnsmasq | grep lxd, it will show you the full command line of LXD’s dnsmasq that is running.
    If you do not get a dnsmasq process, it is likely that you have installed bind or another DNS server on the host, and that service managed to bind to port 53 (DNS) first. In that case LXD’s dnsmasq could not initialize and will not run at all. If so, you would need to edit the configuration of the other DNS server so that it does not bind to lxdbr0 (leaving LXD’s dnsmasq free to do it instead).
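A small sketch of check 3 above. The sample ps line is made up for illustration; on a real host you would inspect the live output of `ps ax` instead:

```shell
# The [d] trick keeps the grep process itself out of the match,
# so no extra filtering step is needed.
filter() { grep '[d]nsmasq' | grep lxd; }

# Roughly what LXD's dnsmasq looks like in `ps ax` output
# (flags and PID are invented for this example):
sample='1234 ?  Ss  0:00 dnsmasq --strict-order --interface=lxdbr0 --dhcp-range 10.200.100.2,10.200.100.254 -u lxd'

if echo "$sample" | filter >/dev/null; then
  echo "found an LXD dnsmasq"
else
  echo "no LXD dnsmasq - check what is bound to port 53"
fi
```

On a real host, replace `echo "$sample"` with `ps ax` to run the same check against the live process list.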


(Jan Alleman) #3

Hi Simos, thanks for your reply!

Every container has its own IP address from the correct range. Looks as usual.

lxc profile show default

config: {}
description: Default LXD profile
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: lxd
    type: disk
name: default
used_by:
- /1.0/containers/containerOne
and so on... for all containers/snapshots

lxc network show lxdbr0

config:
  ipv4.address: 10.200.100.1/24
  ipv4.nat: "true"
  ipv6.address: none
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/containers/containerOne
  and so on... for all containers
managed: true
status: Created
locations:
- none

I get output for the ps ax command, with arguments referring to the lxdbr0 interface, the dhcp-range, etc., and no errors or warnings.

Everything seems pretty normal and I cannot remember changing anything on the host, except creating new containers and forwarding IPs to the reverse proxy container. The host itself runs a bare minimum of services.


#4

How do you forward the connections to the reverse proxy?

Normally, you would use one iptables rule per port. Something similar to:

PORT=80 PUBLIC_IP=your_server_ip CONTAINER_IP=your_container_ip \
sudo -E bash -c 'iptables -t nat -I PREROUTING -i eth0 -p TCP -d $PUBLIC_IP --dport $PORT -j DNAT --to-destination $CONTAINER_IP:$PORT -m comment --comment "forward to the Nginx container"'

I cannot think of a cause other than an iptables rule. If you do not see anything weird in the output of iptables -L -n -t nat, then you can try tcpdump or tshark to check on which interface your connections get blocked.
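A sketch of the tcpdump approach (lxdbr0 and eth0 are the usual interface names on such a setup; adjust to yours). The commands are printed rather than executed here, since tcpdump needs root and a live interface:

```shell
# Print a tcpdump command for each interface along the path;
# run them in separate terminals and watch where the packets stop.
for IFACE in lxdbr0 eth0; do
  CMD="tcpdump -ni $IFACE 'tcp port 80 or tcp port 443'"
  echo "$CMD"
done
```

If packets leave lxdbr0 but never appear on eth0 (or come back to the wrong place), the problem is in the host's routing or NAT rules rather than in the containers.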

As a side note, in LXD 3.3 it is also possible to use the proxy device (so no iptables rules are needed).
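For example, a sketch of the proxy-device alternative, assuming a container named "webproxy" (name and ports are placeholders). The command is printed rather than executed so it can be reviewed first; drop the echo to apply it on a host running LXD 3.3 or later:

```shell
# Forward host port 80 into the container through LXD itself,
# with no iptables involved.
CONTAINER=webproxy
CMD="lxc config device add $CONTAINER http proxy listen=tcp:0.0.0.0:80 connect=tcp:127.0.0.1:80"
echo "$CMD"
```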


(Jan Alleman) #5

tcpdump revealed the culprit: all outgoing connections were being routed to the reverse proxy container.

This was caused by invalid iptables forwarding rules, which did not include the public IP of the server. With your command (which is the one I thought I had used, but apparently had not) everything works fine. Correct and invalid rules look quite similar in the iptables listing. Nevertheless, I’m sorry for not noticing this in the first place.
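For anyone hitting the same thing, a small sketch of how similar the two rules look (the IP addresses are placeholders):

```shell
# Placeholders standing in for the real addresses:
PORT=80 PUBLIC_IP=203.0.113.10 CONTAINER_IP=10.200.100.20

# Correct: only traffic arriving on eth0 for the public IP is DNATed.
good="iptables -t nat -I PREROUTING -i eth0 -p TCP -d $PUBLIC_IP --dport $PORT -j DNAT --to-destination $CONTAINER_IP:$PORT"

# Broken: without "-i eth0 -d PUBLIC_IP", EVERY port-80 connection
# passing through PREROUTING is redirected to the container --
# including the containers' own outgoing traffic.
bad="iptables -t nat -I PREROUTING -p TCP --dport $PORT -j DNAT --to-destination $CONTAINER_IP:$PORT"

echo "correct: $good"
echo "broken:  $bad"
```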

Thank you very much for your support, simos!