curl/git don't work properly inside an LXD container on a WSL/Ubuntu instance and disconnect frequently

I raised this as a bug (curl/git don't work properly inside an lxd container on wsl/ubuntu instance and disconnects frequently · Issue #11246 · lxc/lxd · GitHub) but was asked to discuss it here. The issue is that using curl inside the LXD container to download a large network resource results in peer disconnects, whereas running it outside the LXD container works and curl finishes without errors.

Required information

Ubuntu 22.10
Kernel (5.15.79.1)
Snapcraft (7.2.9.post31+git31a3990c)
LXD (5.9-76c110d)
curl (7.81.0)

Issue description

The issue is intermittent, but when run inside the LXD container, curl and git terminate the download with a peer-disconnected error. The behavior seems to depend on the internet connection: on some wireless networks the problem doesn’t occur at all, on others it always occurs.

Steps to reproduce

Use snapcraft (Snapcraft.yaml reference | Snapcraft documentation) and run one or more commands inside the LXD container that fetch a large resource from the internet, such as git clone or flutter precache (which invokes curl to download the Flutter SDK and Dart SDK). For more details see: ubuntu - How to fix unstable network for lxc/lxd container that causes curl/git commands to terminate only in the container? - Stack Overflow

For transparency, my Ubuntu installation runs on Windows (WSL). The Ubuntu installation itself seems to behave fine and has no problem with downloads in general. It’s only inside the LXD container that it misbehaves.

This could be an environmental MTU issue.
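
One way to test the MTU hypothesis is to send pings with the don't-fragment bit set and see which sizes get through. The following is a minimal sketch, assuming iputils ping (its -M do option) and IPv4. The function names and the target host are illustrative and not from this thread.

```shell
#!/bin/sh
# An IPv4 ICMP echo carries 20 bytes of IP header plus 8 bytes of ICMP header,
# so the largest payload that fits in one packet is MTU - 28.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# probe_mtu HOST MTU: succeed if a DF-flagged ping sized for MTU reaches HOST.
# Assumes iputils ping; "-M do" forbids fragmentation along the path.
probe_mtu() {
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$2")" "$1" >/dev/null 2>&1
}

# Example (the target host is an assumption; it must answer ICMP echo):
#   for mtu in 1500 1400 1300; do
#       probe_mtu storage.googleapis.com "$mtu" && echo "$mtu ok" || echo "$mtu blocked"
#   done
```

Running the same loop on the host and inside the container would show whether the largest size that gets through differs between the two.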

What is the output of ip a and ip r on the host and inside the container when connected to a problematic wifi network?

What does lxc network show <network> and lxc config show <instance> --expanded show?

> ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 12:19:2d:82:86:59 brd ff:ff:ff:ff:ff:ff
3: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 9e:bc:7b:f4:33:79 brd ff:ff:ff:ff:ff:ff
4: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
5: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:15:5d:02:96:ff brd ff:ff:ff:ff:ff:ff
    inet 172.24.126.152/20 brd 172.24.127.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::215:5dff:fe02:96ff/64 scope link
       valid_lft forever preferred_lft forever
7: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:6e:6a:18 brd ff:ff:ff:ff:ff:ff
    inet 10.141.116.1/24 scope global lxdbr0
       valid_lft forever preferred_lft forever
    inet6 fd42:5bc8:65c1:cfc6::1/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fe6e:6a18/64 scope link
       valid_lft forever preferred_lft forever
23: veth37ea9c21@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master lxdbr0 state UP group default qlen 1000
    link/ether 16:5c:de:94:45:68 brd ff:ff:ff:ff:ff:ff link-netnsid 1

> ip r
default via 172.24.112.1 dev eth0 proto kernel
10.141.116.0/24 dev lxdbr0 proto kernel scope link src 10.141.116.1
172.24.112.0/20 dev eth0 proto kernel scope link src 172.24.126.152

ip a & ip r on ubuntu machine

> ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
22: eth0@if23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:1a:0e:05 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.141.116.123/24 metric 100 brd 10.141.116.255 scope global dynamic eth0
       valid_lft 2089sec preferred_lft 2089sec
    inet6 fd42:5bc8:65c1:cfc6:216:3eff:fe1a:e05/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 3408sec preferred_lft 3408sec
    inet6 fe80::216:3eff:fe1a:e05/64 scope link
       valid_lft forever preferred_lft forever

> ip r
default via 10.141.116.1 dev eth0 proto dhcp src 10.141.116.123 metric 100
10.141.116.0/24 dev eth0 proto kernel scope link src 10.141.116.123 metric 100
10.141.116.1 dev eth0 proto dhcp scope link src 10.141.116.123 metric 100

ip a & ip r on lxd container (via sudo snapcraft pull source --use-lxd --verbosity=verbose --shell)

> sudo lxc --project snapcraft network list
+--------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
|  NAME  |   TYPE   | MANAGED |      IPV4       |           IPV6            | DESCRIPTION | USED BY |  STATE  |
+--------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
| bond0  | bond     | NO      |                 |                           |             | 0       |         |
+--------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
| eth0   | physical | NO      |                 |                           |             | 0       |         |
+--------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
| lxdbr0 | bridge   | YES     | 10.141.116.1/24 | fd42:5bc8:65c1:cfc6::1/64 |             | 4       | CREATED |
+--------+----------+---------+-----------------+---------------------------+-------------+---------+---------+

lxc network list

> sudo lxc --project snapcraft config show local:snapcraft-liquid-pos-on-amd64-for-amd64-562949953981096 --expanded
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Ubuntu buildd jammy amd64
  image.os: Ubuntu
  image.series: jammy
  raw.idmap: both 1000 0
  security.syscalls.intercept.mknod: "true"
  volatile.base_image: fd60987aa0ac97cbe2027d3a9e3b75dc1bc87f9f278096a99a106fb9e594af39
  volatile.cloud-init.instance-id: f3fb1346-78b5-45ee-a2a7-1758acc9f23d
  volatile.eth0.hwaddr: 00:16:3e:1a:0e:05
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":true,"Hostid":1000,"Nsid":0,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1000001,"Nsid":1,"Maprange":999999999},{"Isuid":true,"Isgid":true,"Hostid":1000,"Nsid":0,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":1000001,"Nsid":1,"Maprange":999999999}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":true,"Hostid":1000,"Nsid":0,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1000001,"Nsid":1,"Maprange":999999999},{"Isuid":true,"Isgid":true,"Hostid":1000,"Nsid":0,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":1000001,"Nsid":1,"Maprange":999999999}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: STOPPED
  volatile.last_state.ready: "false"
  volatile.uuid: 5326a8a2-43a9-4d79-831a-10aab480e9ad
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

lxc config show --expanded

> sudo lxc --project snapcraft network show lxdbr0
config:
  ipv4.address: 10.141.116.1/24
  ipv4.nat: "true"
  ipv6.address: fd42:5bc8:65c1:cfc6::1/64
  ipv6.nat: "true"
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/instances/snapcraft-liquid-pos-on-amd64-for-amd64-562949953981096?project=snapcraft
- /1.0/instances/snapcraft-liquid-pos-on-amd64-for-amd64-8725724278304174?project=snapcraft
- /1.0/profiles/default
- /1.0/profiles/default?project=snapcraft
managed: true
status: Created
locations:
- none

lxc network show <network>

@tomp I captured a tcpdump while replicating the download (see info.md · GitHub). The thing that stands out to me is that between ubuntu → lxc the number of packets almost doubles.

Can you try running lxc network set lxdbr0 bridge.mtu=1400, then restarting the problem containers, and see if that helps?
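
The suggestion above can be sketched as one helper that sets the bridge MTU, restarts an instance, and prints the MTU the instance ends up with. This is only a sketch: the function name is made up, it assumes the instance's NIC is called eth0, and for snapcraft-managed instances each lxc call would additionally need --project snapcraft.

```shell
#!/bin/sh
# set_bridge_mtu BRIDGE MTU INSTANCE
# Applies MTU to the LXD bridge, restarts INSTANCE so its NIC is recreated,
# then prints the MTU of eth0 as seen inside the instance.
set_bridge_mtu() {
    bridge=$1
    mtu=$2
    instance=$3
    lxc network set "$bridge" bridge.mtu="$mtu" || return 1
    lxc restart "$instance" || return 1
    # sysfs exposes the interface MTU as a plain number
    lxc exec "$instance" -- cat /sys/class/net/eth0/mtu
}

# Example:
#   set_bridge_mtu lxdbr0 1400 wired-bluejay
```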

It didn’t work. With MTU 1500 the download mostly terminates around 6% and occasionally reaches 20% or 60%. With MTU 1400 it reached 17% the first time and 37% the second time, after which curl kept getting stuck: it looks like it’s still transferring, but it’s not receiving any data?

And you confirmed that the interface inside the container got the 1400 MTU by running ip l and checking?

Well, I ran ifconfig inside the container, but yes:

Launching shell on build environment...
snapcraft-liquid-pos-on-amd64-for-amd64-8725724278304174 ../project# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1400
        inet 10.141.116.194  netmask 255.255.255.0  broadcast 10.141.116.255
        inet6 fe80::216:3eff:feae:64aa  prefixlen 64  scopeid 0x20<link>
        inet6 fd42:5bc8:65c1:cfc6:216:3eff:feae:64aa  prefixlen 64  scopeid 0x0<global>
        ether 00:16:3e:ae:64:aa  txqueuelen 1000  (Ethernet)
        RX packets 59  bytes 21259 (21.2 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 69  bytes 8876 (8.8 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Hrm yeah, I’m not sure then.

If it only happens on some wifi connections at some locations, it suggests some sort of external issue with the upstream ISP.

Does it work OK on the same ISP but with ethernet?

Unfortunately I don’t have any cables here to check directly over ethernet, but I did make some tcpdumps earlier (see my gist above). They very distinctly show an increase in retransmitted packets between ubuntu and lxc, from 2-3% to 22-23%.
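
For anyone who wants to compare captures the same way, a retransmission percentage like the ones quoted above could be derived roughly as follows. This is a sketch, not the method used for the gist: it assumes tshark is installed, the function names are made up, and the capture file names in the example are hypothetical.

```shell
#!/bin/sh
# retrans_pct RETRANSMITTED TOTAL: print the retransmission share as a percentage.
retrans_pct() {
    awk -v r="$1" -v t="$2" 'BEGIN {
        if (t == 0) { print "n/a"; exit }
        printf "%.1f\n", 100 * r / t
    }'
}

# capture_retrans_pct FILE: count TCP packets in a capture with tshark.
# tcp.analysis.retransmission is the Wireshark display filter for segments
# it classifies as retransmissions.
capture_retrans_pct() {
    total=$(tshark -r "$1" -Y tcp | wc -l)
    retrans=$(tshark -r "$1" -Y tcp.analysis.retransmission | wc -l)
    retrans_pct "$retrans" "$total"
}

# Example (hypothetical capture files):
#   capture_retrans_pct host.pcap
#   capture_retrans_pct container.pcap
```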

One thing I’m going to try is replacing lxdbr0 with a macvlan, so that traffic flows (router → windows → lxc) instead of (router → windows → ubuntu → lxc).

Also, FYI, I have three locations in total: two of them are problematic and one works. All three use the same provider.

@tomp I didn’t have any luck getting macvlan working. Afterwards I had to wipe my Ubuntu installation, and now snapcraft is getting peer-disconnected downloads, so I’ve reproduced the problem with this minimal example:

echo -e "[boot]\nsystemd=true" | sudo tee /etc/wsl.conf > /dev/null
sudo apt-get update && sudo apt-get upgrade
wsl.exe --shutdown
sudo snap install lxd
sudo lxd init
sudo lxc launch images:ubuntu/focal wired-bluejay
sudo lxc exec wired-bluejay -- curl -o dart-sdk-linux-x64.zip https://storage.googleapis.com/flutter_infra_release/flutter/472e34cbbcd461c748973e7e735558ab200d4f5e/dart-sdk-linux-x64.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  8  246M    8 20.8M    0     0  2780k      0  0:01:30  0:00:07  0:01:23 2951k
curl: (56) OpenSSL SSL_read: Connection reset by peer, errno 104
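
Because the failure is intermittent, it may help to repeat the download a few times and count how often it fails, both inside and outside the container. A minimal sketch of such a counter follows. The helper name is made up, and the example reuses the instance from the steps above with a placeholder for the download URL.

```shell
#!/bin/sh
# count_failures N CMD...: run CMD N times and print how many runs exited non-zero.
count_failures() {
    n=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # Output is discarded; only the exit status matters here.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Example (replace <url> with the resource being downloaded):
#   count_failures 5 sudo lxc exec wired-bluejay -- curl -o /dev/null <url>
#   count_failures 5 curl -o /dev/null <url>
```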

I also set up a VirtualBox installation of Ubuntu 22.10. While connected to the troublesome WLAN connection, it /works/ out of the box. What can I try in order to diagnose this?

So you are saying that virtualbox with LXD container works OK, but LXD container inside WSL has problems?

It certainly looks that way. Same machine, same wifi network. It’s the only machine I have for testing, though, so I am limited in what information I can provide.

phr34k@LAPTOP-9M3JQBHP:~$ sudo lxc exec wired-bluejay -- curl -o dart-sdk-linux-x64.zip https://storage.googleapis.com/flutter_infra_release/flutter/472e34cbbcd461c748973e7e735558ab200d4f5e/dart-sdk-linux-x64.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 10  246M   10 25.8M    0     0  1859k      0  0:02:15  0:00:14  0:02:01 2100k
curl: (56) OpenSSL SSL_read: Connection reset by peer, errno 104
phr34k@LAPTOP-9M3JQBHP:~$ uname -a
Linux LAPTOP-9M3JQBHP 5.15.79.1-microsoft-standard-WSL2 #1 SMP Wed Nov 23 01:01:46 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
phr34k@ubnt:~$ sudo lxc exec wired-bluejay -- curl -o dart-sdk-linux-x64.zip https://storage.googleapis.com/flutter_infra_release/flutter/472e34cbbcd461c748973e7e735558ab200d4f5e/dart-sdk-linux-x64.zip
[sudo] password for phr34k:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  246M  100  246M    0     0  1828k      0  0:02:18  0:02:18 --:--:-- 1721k
phr34k@ubnt:~$ uname -a
Linux ubnt 5.19.0-26-generic #27-Ubuntu SMP PREEMPT_DYNAMIC Wed Nov 23 20:44:15 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

@tomp To revisit this: I believe it is also reproducible with a vanilla Ubuntu installation. The other day I was experimenting with Ubuntu installed on VirtualBox and assigned a bridged network to the virtual machine. Inside the LXD containers my curl downloads started to fail in an identical fashion, whereas the ones done outside LXD worked fine. Is this something you might be able to take a second pass at?

Did you try lowering your MTU to 1300 on the bridge and in the container?

I did play around with that, but it didn’t really seem to make a difference. Maybe you’ll have some luck reproducing the behavior on your end with those new insights.

I occasionally get somewhat similar behaviour - particularly in big apt installs, it will fail after a couple of dozen file requests and I have to redo it, or it will take a long time to pull all the headers. This is on a wired connection but the internet connection itself is wireless (5G).

I had automatically assumed it was my fault in my own environment and was bracing myself to have to troubleshoot, but maybe it’s a bug after all? It happens under Ubuntu 22.10 (since that’s all I use now), but it used to happen occasionally under Debian 11 as well.

@hereisjames The behavior is consistent for me and I can reproduce it easily. I don’t know if it’s an OS/kernel issue or something specific to LXD.

My Linux experience is limited. I’ve reported it here for LXD, on the WSL issue tracker, and on Ubuntu’s Launchpad.

I spent a fair amount of time collecting network traces to diagnose this. It would be great if somebody had some luck reproducing all of this!