LXD proxy device (TCP) not responding

I have a basic container with lighttpd. After adding the proxy device, it does not respond to any requests on the host (checked with curl -I localhost).

The container has lighttpd installed and is running on port 80.

I can reproduce it with

lxc launch ubuntu:bionic webserver
lxc config device add webserver http proxy listen=tcp:0.0.0.0:80 connect=tcp:localhost:80
lxc exec webserver -- apt-get update
lxc exec webserver -- apt-get -y install lighttpd

I have run tcpdump: on the host I can see the request being received, but it never arrives in the container.

kernel: 4.19.20
lxd --version: 3.11
lxc --version: 3.11

I have checked that forkproxy is running, and the log (/var/snap/lxd/common/lxd/logs/webserver/proxy.http.log) is empty.

curl shows the following and never exits:

:~$ curl -I -v localhost
* Rebuilt URL to: localhost/
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 80 (#0)
> HEAD / HTTP/1.1
> Host: localhost
> User-Agent: curl/7.58.0
> Accept: */*
> 

Curl is resolving localhost to the IPv6 address (::1). Can you try with 127.0.0.1?

Unfortunately that returns the same result.

:~$ curl -I -v 127.0.0.1
* Rebuilt URL to: 127.0.0.1/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> HEAD / HTTP/1.1
> Host: 127.0.0.1
> User-Agent: curl/7.58.0
> Accept: */*
> 

I have tried your detailed steps and it works for me (with both nginx and lighttpd). I used the same LXD version, but in my case the kernel is the stock Ubuntu 18.04 kernel.

Since the forkproxy process is running and curl manages to open a connection,
you should get some error in /var/snap/lxd/common/lxd/logs/webserver/proxy.http.log.
Something like

Error: Failed to connect to target: dial tcp 127.0.0.1:80: connect: connection refused
Failed to prepare new listener instance: dial tcp 127.0.0.1:80: connect: connect

If you do not get any errors like that in the log, then it means that a consumer (in the container) accepted the connection and simply closed it without returning something back.
In your case, I would strace the lighttpd process in the container to check if it is indeed receiving the connection. Then, do the same with the forkproxy process. It’s an interesting puzzle and I would like to know the answer.

Something I hadn’t thought about is the kernel: I am running Armbian on an ARM64 device, so it could be that the kernel configuration is missing a module.

I will try strace, and then look into the configuration as well.

It’s probably a kernel configuration issue; it works outside of Snap.
The strace of lighttpd showed that it was not receiving the request.
To check forkproxy I had to attach strace to its PID, and it didn’t show any request.

With the snap, I could curl the container’s lxdbr0 IP address directly, and it would receive the page fine.

Now to figure out why the snap LXD is not receiving the connection.

Can you give some info like which dev board you have and which kernel it runs? Would be useful for future reference.

Board: Libre Computer Le Potato based on S905X-CC AKA meson64
Rootfs: Ubuntu Bionic from debootstrap
Kernel: 4.19.20 from kernel 4.19.y with patches.

I checked the Armbian build system for kernel source and patches.

I tried installing the lxd snap in devmode, however it still didn’t work. I am not sure how to determine what the cause is.

Try with

lxd.check-kernel

It is a utility provided by the snap. It uses the .config of your Linux kernel.

Also, if the Linux kernel .config is somewhere available online, you can even run the following and get a nice report of what’s in there and what’s missing.

CONFIG=mykernel.config lxd.check-kernel
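The core idea of checking a kernel .config can be sketched in Go. This is not lxd.check-kernel's own code, whose internals are not shown here. For each option, the sketch reports whether it is built in (=y), built as a module (=m), or missing. The function checkConfig is hypothetical, and the option names in the test are the MASQUERADE ones from the report in this thread.

```go
package main

import (
	"bufio"
	"strings"
)

// checkConfig reports, for each wanted option, whether the kernel .config
// text enables it as "built-in" (=y), as a "module" (=m), or not at all
// ("missing"). Hypothetical helper, not the lxd.check-kernel script.
func checkConfig(config string, wanted []string) map[string]string {
	set := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(config))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Enabled options look like "CONFIG_FOO=y"; disabled ones are
		// comments of the form "# CONFIG_FOO is not set" and are skipped.
		if !strings.HasPrefix(line, "CONFIG_") {
			continue
		}
		if name, val, ok := strings.Cut(line, "="); ok {
			set[name] = val
		}
	}
	report := make(map[string]string, len(wanted))
	for _, opt := range wanted {
		switch set[opt] {
		case "y":
			report[opt] = "built-in"
		case "m":
			report[opt] = "module"
		default:
			report[opt] = "missing"
		}
	}
	return report
}
```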

There are no disabled modules, only these that are not loaded:

--- Namespaces ---
newuidmap is not installed
newgidmap is not installed

--- Misc ---
Macvlan: enabled, not loaded
Vlan: enabled, not loaded
Advanced netfilter: enabled, not loaded
CONFIG_IP_NF_TARGET_MASQUERADE: enabled, not loaded
CONFIG_IP6_NF_TARGET_MASQUERADE: enabled, not loaded

Compared to a fresh Ubuntu 18.04 x86_64 install it looks similar.

I would suggest loading these modules (obviously on the host) and then restarting the LXD snap.
As far as I understand, when you connect from the host to the container using the proxy device, it does not make use of netfilter/masquerade. But try them anyway for completeness.

The warnings under the Namespaces section should not be relevant for this LXD snap package.
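You can check whether a module-built (=m) option is actually loaded against /proc/modules, where the first field on each line is the module name. Below is a small Go sketch with a hypothetical notLoaded helper. The module names in the test (8021q, macvlan) are assumptions about what the Vlan and Macvlan entries correspond to. Built-in (=y) options never appear in /proc/modules, so this only makes sense for =m options.

```go
package main

import (
	"bufio"
	"strings"
)

// notLoaded returns the names in wanted that do not appear as the first
// field of any line in procModules (the text of /proc/modules).
// Hypothetical helper; built-in (=y) options are never listed there.
func notLoaded(procModules string, wanted []string) []string {
	loaded := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(procModules))
	for sc.Scan() {
		if fields := strings.Fields(sc.Text()); len(fields) > 0 {
			loaded[fields[0]] = true
		}
	}
	var missing []string
	for _, name := range wanted {
		if !loaded[name] {
			missing = append(missing, name)
		}
	}
	return missing
}
```

On a real host you would pass it the contents of os.ReadFile("/proc/modules"), then load whatever it reports with modprobe before restarting the LXD snap.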

I loaded all the modules which were not loaded, and it did not improve.

I can’t think of what would cause it to work fine without snap but not with snap, even in devmode, which as far as I know doesn’t have any protection.

I also rebuilt the kernel with recent changes that enabled netfilter modules.

Now that I have a fresh Ubuntu 18.04 x86_64 install (in a VM), I decided to strace a few things to see if there was any difference.

On Ubuntu, strace -f -p {forkproxy_pid} causes forkproxy to exit, which didn’t happen when I ran strace on Armbian.

On Armbian it shows many calls of
epoll_pwait(8, [{EPOLLIN, {u32=0, u64=0}}], 10, -1, NULL, 8) = 1
and
nanosleep({tv_sec=1, tv_nsec=10000000}, NULL) = 1
Originally I thought this was just a loop waiting for requests; however, after filtering those out, this line became of interest:
<futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
Could this be the culprit? Rather than simply waiting for a request, is it having difficulty establishing a connection to the container?

Indeed, if you try to strace the forkproxy process (from the LXD snap package), the process dies. You may need to restart the container that has the proxy device in order for LXD to start it again.
I assume that it’s a feature, a security feature :-). But I do not think that forkproxy checks for ptrace(2).
I do not think it’s a general snap feature because you can strace other snap processes like lxd itself.

@brauner, should there be new issues filed for these on github?

  1. That the forkproxy process dies when you try to strace -f -p. Workaround: restart the container.
  2. That forkproxy (TCP to TCP) does not seem to work on mainline (arm64). That’s from the LXD snap package, while it works with the LXD deb package.

@nomi: How did you install snapd on Armbian? Which repository did it come from?

apt-cache policy snapd shows http://ports.ubuntu.com bionic-updates/main arm64 Packages

I added a bunch of printf calls to forkproxy for debugging. It waits on line 673 until I run curl -I -v 127.0.0.1 on the host; then it starts looping, no longer waiting on that line, and always continues on line 683.

Looking into this further: on line 590, f.Fd() returns 4, but on line 680, where the value is read back, the key being used is 0. On Ubuntu x86_64 this value is 4.

This is good analysis and it’s helpful for the bug report on github.

I was under the impression that you had this issue only with the snap package of LXD, but not with the deb package.
Looking back through this thread, I cannot find any such reference; most likely I mixed it up with something from a different thread on this forum.
Can you verify that there is no difference in this issue on ARM64, whether LXD is a snap package or is compiled (or deb package)?

On ARM64 it only has the issue with the snap package. The deb package (v3.0.3) works fine, which just made me realize that forkproxy changed between 3.0.3 and 3.11; sorry I missed this.

However, the LXD that I compiled was only used to debug forkproxy, attaching the listener to the snap LXD container; that worked on Ubuntu x86_64, but not on ARM64.

As an experiment I hardcoded the curFd value on line 680 to 4 and it worked fine.

So this is indeed the problem. It seems to set the listenerMap variable correctly on line 663, but on line 680 the key is 0.

After a good nap I have to correct myself. It should be happening with both the snap package and the deb package; it worked with the repository deb package (version 3.0.3) because forkproxy has changed since 3.0.3, and those changes added the epoll calls.

After looking into the problem: this issue happens on LXD 3.11 with both the snap and the deb/compiled builds.

Somewhere between the epoll_ctl call and the epoll_wait call it appears to be returning the wrong value; I haven’t had time to look further into this, however.

Well, I managed to fix it by changing two lines. Changing only one of them does not work.

Change line 663 to *(*C.int)(unsafe.Pointer(&ev.data)) = C.int(f.Fd())
and
Change line 680 to curFd := *(*C.int)(unsafe.Pointer(&events[i].data))

This works on both ARM64 and x86_64, tested with my http proxy example.

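The layout of struct epoll_event { uint32_t events; epoll_data_t data; } does not match the offset computed with unsafe.Sizeof(ev.events). On ARM64, data starts 8 bytes after the start of the struct; on x86_64 it starts 4 bytes after. The likely reason is that the Linux headers declare struct epoll_event as packed only on x86_64, so on ARM64 the 8-byte epoll_data_t union is naturally aligned and 4 bytes of padding follow events.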

On ARM64

unsafe.Pointer(&ev) = 0x44202fa4d0
unsafe.Sizeof(ev.events) = 4
# correct, 4 bytes after the pointer
unsafe.Pointer(uintptr(unsafe.Pointer(&ev)) + unsafe.Sizeof(ev.events)) = 0x44202fa4d4
# 8 bytes after the pointer?
unsafe.Pointer(&ev.data) = 0x44202fa4d8

On x86_64

unsafe.Pointer(&ev) = 0x4202ec000
unsafe.Sizeof(ev.events) = 4
# 4 bytes after the pointer
unsafe.Pointer(uintptr(unsafe.Pointer(&ev)) + unsafe.Sizeof(ev.events)) = 0x4202ec004
# same
unsafe.Pointer(&ev.data) = 0x4202ec004
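Outside of cgo, Go's standard syscall package already carries per-architecture definitions of this struct. As far as I know, on linux/amd64 syscall.EpollEvent is {Events uint32; Fd int32; Pad int32}, while on linux/arm64 it has an extra 4-byte blank field before Fd. That mirrors the 4-vs-8 offsets above. Below is a sketch of that approach; it is not the fix that went into LXD. It registers a descriptor with its fd in Fd and reads it back from the returned event, using only field names so the padding never has to be computed by hand. The helper roundTripFd is hypothetical (Linux only).

```go
package main

import "syscall"

// roundTripFd registers the read end of a pipe with epoll, makes it readable,
// and returns both the fd stored at registration and the fd epoll hands back.
// Hypothetical helper for illustration.
func roundTripFd() (int32, int32, error) {
	var p [2]int
	if err := syscall.Pipe(p[:]); err != nil {
		return 0, 0, err
	}
	defer syscall.Close(p[0])
	defer syscall.Close(p[1])

	epfd, err := syscall.EpollCreate1(0)
	if err != nil {
		return 0, 0, err
	}
	defer syscall.Close(epfd)

	// Field names, not offsets: the compiler applies the per-arch layout.
	ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(p[0])}
	if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, p[0], &ev); err != nil {
		return 0, 0, err
	}
	if _, err := syscall.Write(p[1], []byte{'x'}); err != nil {
		return 0, 0, err
	}

	events := make([]syscall.EpollEvent, 1)
	for {
		n, err := syscall.EpollWait(epfd, events, 1000)
		if err == syscall.EINTR {
			continue // interrupted by a signal; retry
		}
		if err != nil {
			return 0, 0, err
		}
		if n == 0 {
			return 0, 0, syscall.ETIMEDOUT // should not happen after the write
		}
		return int32(p[0]), events[0].Fd, nil
	}
}
```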

Note: This is as far as I will go, I will let you take it from here. I really enjoyed the ride! :smile:

@brauner can you look at this and send a PR with the right fix?