Can't access container from the internet with an OVN network attached

Hi,

As suggested in this post, I still have a problem with OVN and LXD. Maybe somebody can help or has had a similar problem.

The configuration is the same as described in the previous post.

What’s curious:

It isn’t possible to access the server, e.g. an nginx site, from the internet, but it works fine from any local network like 192.1.2.x or 192.1.3.x. The problem only exists when an OVN network is added to the container; it doesn’t exist for a container with a normal LXD bridge network. The connections just run into a timeout with no error message. It’s the same for HTTP and HTTPS.

Configuration of the container:

  • Debian 11 container
  • nginx installed
  • No firewall or anything special installed

What I tried:

  1. Checking the connection from outside with tcpdump -vvvv -i eth1 port 80 shows the following (a complementary capture is sketched after this list):
20:06:38.826459 IP (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto TCP (6), length 64)
    external_address.49398 > internal_address.http: Flags [S], cksum 0xa580 (correct), seq 2377559074, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 2357842636 ecr 0,sackOK,eol], length 0

...

20:26:56.502962 IP (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto TCP (6), length 48)
    external_address.49620 > internal_address.http: Flags [S], cksum 0x8138 (correct), seq 3326100349, win 65535, options [mss 1460,sackOK,eol], length 0
    
...

10 packets captured
10 packets received by filter
0 packets dropped by kernel
  2. Checking nginx's error.log after adding debug to the error_log config: tail -f /var/log/nginx/error.log doesn't show any errors.
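
As a complementary check (just a sketch, reusing the interface names from this setup), one could watch both interfaces inside the container at the same time while connecting from outside, to see where the replies go:

tcpdump -ni eth1 port 80    # the incoming SYN should appear here
tcpdump -ni eth0 port 80    # if the SYN-ACK (flags [S.]) shows up here instead, the replies are leaving via the other network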

So it's hard for me to isolate the symptom, because I don't really have an approach.

Does anybody have an idea?

Thanks in advance.

Can you explain the problem and what you’re doing to reproduce it in more detail, please?

Hi Thomas,

maybe I described my problem too briefly; it was late and my English isn’t that good.

I have an LXD cluster and implemented OVN, as described here, to have a network connection between containers on different cluster members.

One container operates as a reverse proxy for all other containers and has two network interfaces:

  • eth0 (lxdbr0, internal lxd network bridge)
  • eth1 (br0, physical network bridge, local network)

In this setup everything works fine.

To test the OVN network, I set up a test container with the same two network interfaces, except that eth0 is connected to the OVN network. The container just has nginx installed, serving the standard “Welcome” page.
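
For reference, a sketch of how such a test container could be created and wired up (the instance name is illustrative; the network names lxdovn0 and br0 are the ones that appear in the configuration further down):

lxc init images:debian/11/cloud test-container
lxc config device add test-container eth0 nic network=lxdovn0 name=eth0
lxc config device add test-container eth1 nic nictype=bridged parent=br0 name=eth1
lxc start test-container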

The LXD cluster is in a “DMZ”.

Internet → Router → DMZ (LXD cluster) → Local network

The nginx page on the test container is accessible from inside the “DMZ” and from the local network behind it, but not from outside (the internet). It just times out, with no error message in nginx, syslog, etc.

As I described in the first post, there is traffic inside the container when connecting from outside, but no ACK is sent back. In nmap the nginx port shows as filtered from outside and open from inside. There is no firewall installed.

As soon as I deactivate eth0, it is possible to access the nginx page on the test container from outside. I checked that on several containers. It’s curious.

Hopefully that makes it more understandable.

Thanks.

Thanks for the explanation. It’s getting clearer, but I still have questions.

You said “I have an LXD cluster and implemented OVN, as described here, to have a network connection between containers on different cluster members.” but then went on to say:

"One container operates as a reverse proxy for all other containers and has two network interfaces:

eth0 (lxdbr0, internal lxd network bridge)
eth1 (br0, physical network bridge, local network)

"

But what confuses me is that neither of these interfaces is connected to an OVN network.

To clarify your actual setup, please can you show the output of:

  • lxc network show <OVN network>
  • lxc network show <uplink network for OVN network>
  • lxc config show <nginx instance> --expanded
  • lxc config show <reverse proxy instance> --expanded
  • ip a and ip r from the LXD host(s) and from inside both instances.

Thanks for your help. Sorry, maybe my English is confusing.

I just want to make clear that in this setup the reverse proxy is working as expected. After that passage I wrote:

In this setup everything works fine.

And then:

To test the OVN network, I set up a test container …, except that eth0 is connected to the OVN network

So in the test container an OVN network is indeed connected to eth0.

Currently I’m not able to send you the configuration. I’ll post it as soon as possible.

Sorry for the late reply. Here is the output:

  • lxc network show <OVN network>
config:
  bridge.mtu: "1442"
  ipv4.address: 10.206.12.1/24
  ipv4.nat: "true"
  ipv6.address: fd42:99f8:d3cc:80e1::1/64
  ipv6.nat: "true"
  network: ovn-uplink
  volatile.network.ipv4.address: 192.168.17.201
description: ""
name: lxdovn0
type: ovn
used_by:
- /1.0/instances/container1
- /1.0/instances/container2
managed: true
status: Created
locations:
- node1
- node2
  • lxc network show <uplink network for OVN network>
config:
  dns.nameservers: 192.168.17.36
  ipv4.gateway: 192.168.17.1/24
  ipv4.ovn.ranges: 192.168.17.200-192.168.17.211
  volatile.last_state.created: "false"
description: ""
name: ovn-uplink
type: physical
used_by:
- /1.0/networks/lxdovn0
managed: true
status: Created
locations:
- node1
- node2
  • lxc config show <nginx instance> --expanded
architecture: aarch64
config:
  image.architecture: arm64
  image.description: Debian bullseye arm64 (20211225_07:49)
  image.os: Debian
  image.release: bullseye
  image.serial: "20211328_07:49"
  image.type: squashfs
  image.variant: cloud
  volatile.base_image: 36fe6744706815a37f63d943bc813bfa728c53276c9f27e740afbf5f3f4ffc3
  volatile.cloud-init.instance-id: 8fc69d2d-bedb-4b79-9863-34a29a01b718
  volatile.eth0.host_name: veth7394c989
  volatile.eth0.hwaddr: 00:16:3e:bc:1a:34
  volatile.eth0.name: eth0
  volatile.eth1.host_name: vethfb5856ef
  volatile.eth1.hwaddr: 00:16:3e:ed:6d:21
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING
  volatile.uuid: 48221a04-36f0-4906-9870-df49ea083420
devices:
  eth0:
    ipv4.address: 10.206.12.2
    network: lxdovn0
    type: nic
  eth1:
    name: eth1
    nictype: bridged
    parent: br0
    type: nic
  root:
    path: /
    pool: pool01
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
  • lxc config show <reverse proxy instance> --expanded
    At the moment, for testing, the reverse proxy just acts as a normal nginx endpoint and serves a simple HTML site. The reverse proxy functionality is not the cause of the problem.

  • ip a and ip r from the LXD host(s) and from inside both instances.
    Just for container1:

  • ip a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
182: eth0@if183: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1442 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:bc:1a:66 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.206.12.2/24 brd 10.206.12.255 scope global dynamic eth0
       valid_lft 2588sec preferred_lft 2588sec
184: eth1@if185: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:ed:6d:89 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.17.37/24 brd 192.168.17.255 scope global dynamic eth1
       valid_lft 826676sec preferred_lft 826676sec
  • ip r
default via 10.206.12.1 dev eth0 
10.206.12.0/24 dev eth0 proto kernel scope link src 10.206.12.2 
192.168.17.0/24 dev eth1 proto kernel scope link src 192.168.17.37 

Hopefully that helps.

Thanks!

If the ip r output you showed above is from the container in question, then I can see the problem.

The container’s default gateway is via 10.206.12.1 which is the default gateway on the lxdovn0 network.

So if external ingress packets arrive at eth1 in the container, then any response packets generated by the container will be sent out of eth0 to 10.206.12.1, which won’t know about the associated inbound packets and will drop them.

So you should ensure that the default gateway in the container points to the gateway on the eth1/br0 network.
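
A quick way to confirm this from inside the affected container (a sketch; 203.0.113.10 is just a placeholder for an external client address):

ip route get 203.0.113.10
# with the routing table shown above, this resolves via 10.206.12.1 dev eth0,
# i.e. the reply would leave through the OVN network rather than eth1/br0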

Hi Thomas, that’s it. It seems to work. I feel a little bit stupid for not noticing that issue. Maybe because everything works fine for the local network (192.178.x.x), which is also connected via eth1/br0.

I deleted the route with ip route del default and added a new one with ip route add default via 192.168.x.x dev eth1.
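
One thing to keep in mind: a route changed with ip route does not survive a reboot. A sketch of one way to make the change persistent, assuming eth0 is managed by systemd-networkd (Debian cloud images may instead use ifupdown or netplan via cloud-init), is to keep DHCP on the OVN interface but not accept the default route it offers:

# /etc/systemd/network/eth0.network (illustrative; only applies if systemd-networkd manages eth0)
[Match]
Name=eth0

[Network]
DHCP=ipv4

[DHCPv4]
# keep the DHCP-assigned address from the OVN network,
# but do not install the default route it offers
UseGateway=false
UseRoutes=false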

Thanks a lot.
