Current status of Docker Swarm in LXD: We can't get it to work

Hello

Running a simple Docker Swarm on LXD 5.0 on Ubuntu Focal (kernel 5.13.0-40-generic), inside an Ubuntu Focal LXD container, we see that the service containers we spin up are unreachable on the network. “Normal” Docker containers work fine, and port forwarding works fine in those cases, but Docker Swarm is not functional: we cannot reach the containers via localhost or via the LXD container’s public IP.

We have trawled through ancient posts on this subject (4+ years old) and have tried literally everything suggested:

  • Made sure br_netfilter is loaded; as far as we can tell, the old issues with that module were already solved in kernel 5.3
  • Enabled IPv4 forwarding as suggested in one post
  • Tried running the LXD container privileged (the host-side commands for all of the above are sketched below)
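
Roughly what we ran on the host / against the container (a sketch; “swarm1” is a placeholder for our container’s name):

modprobe br_netfilter
lsmod | grep br_netfilter
sysctl -w net.ipv4.ip_forward=1
lxc config set swarm1 security.privileged=true
lxc restart swarm1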

There are no obvious errors in the Docker logs other than warnings about the ip_vs kernel module not being available, but as far as I can gather that should not be the cause here (I may be wrong).

Given the lack of posts on this topic for a few years now, it almost seems this is already a “solved issue”, in which case we may be missing something obvious.

Does anybody out there have Docker Swarm working in LXD, and if so, what is the magic sauce? :slight_smile:

Please ask if you want/need any additional details

Same problem. Also can’t find a solution…

@Webdock.io try this on the host:

# cat /etc/modules-load.d/modules.conf
overlay
bonding
br_netfilter
iptable_mangle
iptable_nat
ip_vs
ip_vs_dh
ip_vs_ftp
ip_vs_lblc
ip_vs_lblcr
ip_vs_lc
ip_vs_nq
ip_vs_rr
ip_vs_sed
ip_vs_sh
ip_vs_wlc
ip_vs_wrr
nf_nat
xfrm_user
xt_conntrack
xt_MASQUERADE

sysctl net.ipv4.ip_forward=1
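
These only take effect at boot; to load everything right away without a reboot (a sketch, assuming the modules.conf above is in place):

while read -r mod; do modprobe "$mod"; done < /etc/modules-load.d/modules.conf
sysctl net.ipv4.ip_forward=1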

Hello

Did enabling all these modules on the host solve the issue? Which modules did you expose to the container (or are you running it privileged?)

Thanks

I too have this issue. Did any of you get it working?

@stgraber If you don’t mind, can you please have a look into this?

Any update on this?

@rnz Have you found a solution to this? Did you get it working?

No idea if this will help or not, but here is the profile I use for running Kubernetes in LXD containers. Maybe it will be of help; I haven’t tried Swarm in LXD yet. You will likely have to modify it somewhat, since it uses “sw1a”, which is an Open vSwitch bridge; change that to your own bridge, e.g. “lxdbr0”.

config:
  limits.cpu: "4"
  limits.memory: 8GB
  limits.memory.swap: "false"
  linux.kernel_modules: ip_tables,ip6_tables,nf_nat,overlay,br_netfilter
  raw.lxc: "lxc.apparmor.profile=unconfined\nlxc.cap.drop= \nlxc.cgroup.devices.allow=a\nlxc.mount.auto=proc:rw sys:rw\nlxc.mount.entry = /dev/kmsg dev/kmsg none defaults,bind,create=file"
  security.nesting: "true"
  security.privileged: "true"
description: Kubernetes LXD WeaveNet
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: sw1a
    type: nic
  root:
    path: /
    pool: local
    type: disk
name: k8s-weavenet
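
If it helps, this is roughly how I create and use the profile (a sketch; the image and container name are just examples):

lxc profile create k8s-weavenet
lxc profile edit k8s-weavenet < k8s-weavenet.yaml   # the YAML above, saved to a file
lxc launch ubuntu:20.04 k8s-node1 --profile k8s-weavenet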

@Gilbert_Standen Hmm. Okay. I see. Thanks for your input.

By the way, do you think Kubernetes will work in an “unprivileged” container? If not, since I see you are running yours privileged, are there any workarounds?

Not that this helps… I didn’t want to run as privileged but did the rest the same as you folks. In the end I saw a log message from Docker itself indicating that some of the kernel modules weren’t being passed through to the container, and read elsewhere in this forum that they need to be reviewed due to security concerns, which is understandable.

For us, Docker Swarm will have to live in LXD VMs for now; that still keeps us at the same layer of abstraction as vSphere and the others.

@mratt

Can you please share the list of kernel modules Docker depended on (the ones you saw in the log) when you tried to run your container unprivileged?

I’m also trying to run Kubernetes in an LXD container.

Hi all,

I have been following this discussion and I was a bit irritated by the post from @rnz. Everything started to make sense after I read this article on the Kubernetes blog about the kernel modules used for virtual IP management.

I finalised my LXD config and now have a fully working Docker Swarm with 4 nodes running on a small cluster of 3 boxes. None of the LXC containers (not VMs) runs privileged, which is a big plus.

Below you’ll find the relevant profile section from my lxd preseed configuration.

profiles: 
- name: docker
  config:
    # the security settings are needed for docker
    security.nesting: true 
    security.syscalls.intercept.mknod: true 
    security.syscalls.intercept.setxattr: true

    linux.kernel_modules: bridge,ip_tables,ip6_tables,iptable_nat,iptable_mangle,netlink_diag,nf_nat,overlay,br_netfilter,bonding,ip_vs,ip_vs_dh,ip_vs_ftp,ip_vs_lblc,ip_vs_lblcr,ip_vs_lc,ip_vs_nq,ip_vs_rr,ip_vs_sed,ip_vs_sh,ip_vs_wlc,ip_vs_wrr,xfrm_user,xt_conntrack,xt_MASQUERADE

    # containers don't like swap ;)
    limits.memory.swap: false

    # limit the memory and cpu resources
    #   use the limits suited for your environment
    limits.memory: 16GB  
    limits.cpu: 2 
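
For completeness, this is roughly how the nodes pick the profile up (a sketch; file, image and container names are just examples):

lxd init --preseed < preseed.yaml
lxc launch ubuntu:jammy swarm-node1 --profile default --profile docker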

have fun

@phish108 Which image are you using? Focal or Jammy?

@capriciousduck I have a fresh installation and use Ubuntu Jammy across the board.

Oh, I am using Jammy too but I can’t get it to work. I’m using the same config you posted here.

Can you please have a look at these logs:

May 26 08:13:11 testswarm dockerd[2012]: time="2022-05-26T08:13:11.807139983Z" level=error msg="Failed to delete firewall mark rule in sbox ingress (ingress): reexec failed: exit status 8"
May 26 08:13:11 testswarm dockerd[2012]: time="2022-05-26T08:13:11.980604635Z" level=warning msg="rmServiceBinding 8b4d8b4de33fc81610b11ef2a2ee7cded5d4921090c413e3b18e6b938b0b372d possible transient state ok:false entries:0 set:false "
May 26 08:13:11 testswarm dockerd[2012]: time="2022-05-26T08:13:11.980967837Z" level=error msg="Failed to delete real server 10.0.1.3 for vip 10.0.1.2 fwmark 257 in sbox lb_jv51 (lb-payf): operation not permitted"
May 26 08:13:11 testswarm dockerd[2012]: time="2022-05-26T08:13:11.981008803Z" level=error msg="Failed to delete service for vip 10.0.1.2 fwmark 257 in sbox lb_jv51 (lb-payf): operation not permitted"
May 26 08:13:12 testswarm dockerd[3207]: time="2022-05-26T08:13:12Z" level=error msg="set up rule failed, [-t mangle -D INPUT -d 10.0.1.2/32 -j MARK --set-mark 257]:  (iptables failed: iptables --wait -t mangle -D INPUT -d 10.0.1.2/32 -j MARK --set-mark 257: iptables: Bad rule (does a matching rule exist in that chain?).\n (exit status 1))"
May 26 08:13:12 testswarm dockerd[2012]: time="2022-05-26T08:13:12.064951917Z" level=error msg="Failed to delete firewall mark rule in sbox lb_jv51 (lb-payf): reexec failed: exit status 8"
May 26 08:13:13 testswarm dockerd[2012]: time="2022-05-26T08:13:13.009389653Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint jv51cr359vj2ydxwtvgn8tdwn da2a89b9a193cf9c2c6eb5ed4017c3c59fd9e354afa5ac827a92e989d6856a0b], retrying...."
May 26 08:13:35 testswarm dockerd[2012]: time="2022-05-26T08:13:35.535501937Z" level=error msg="Not continuing with pull after error: errors:\ndenied: requested access to the resource is denied\nunauthorized: authentication required\n"
May 26 08:13:35 testswarm dockerd[2012]: time="2022-05-26T08:13:35.535562511Z" level=info msg="Ignoring extra error returned from registry: unauthorized: authentication required"
May 26 08:13:58 testswarm dockerd[2012]: time="2022-05-26T08:13:58.045445286Z" level=info msg="ignoring event" container=6b44e473f16ca563953b458d4cc8f345a6b955971067ba3580353c90b88aa278 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

And this is the stack I’m trying to deploy:

version: "3.9"

services:
  proxy:
    container_name: proxy
    image: nginx:stable-alpine
    ports:
      - "80:80"
      - "443:443"
    networks:
      intranet:
       

networks:
  intranet:
    driver: overlay
    attachable: false
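
I deploy it with (a sketch; the stack file name and stack name are placeholders):

docker stack deploy -c proxy.yml proxy
docker service ls
docker service ps proxy_proxy --no-trunc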

I’m not sure what I’m doing wrong.

@capriciousduck, I’m not sure what is going on with your system. From the logs I get the impression that your container is not directly connected to the network but bridged. In that setup the host appears to refuse to alter the firewall rules on the way out.

In my setup all LXC containers are connected to the outside world via OVS. This is quite different from the simpler, default bridged network. In this setup each LXC container has its own independent NIC, with all the bells and whistles controlled by the container, not by the host.
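
For reference, the NIC device on my containers looks roughly like this (a sketch; “ovs-br0” stands in for my actual OVS bridge name):

devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: ovs-br0
    type: nic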

I also started by deploying a service manually before automating the deployment via a stack. This makes debugging much easier. Check whether your service works with --endpoint-mode dnsrr (sketched below). If it does, then virtual IPs are not exposed to your LXC containers (yet). If you use the same profile as I do, try restarting your LXC nodes to pick up the updated profile.
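
Something along these lines (a sketch; the service name and image are placeholders):

docker service create --name web --replicas 2 --endpoint-mode dnsrr nginx:stable-alpine
docker service ps web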

Hello @phish108

Sorry for the delay.

Things don’t work with DNS round robin either. Below are the logs.

Jun 27 09:55:35 testswarm dockerd[1211901]: time="2022-06-27T09:55:35.660584536Z" level=error msg="fatal task error" error="No such container: nginx.1.w07r4w164pc759uf4z325byuv" module=node/agent/taskmanager node.id=5cko4ckidad20i49igeym71go service.id=ntt82bbm3o1x6pm4ly43tizzp task.id=w07r4w164pc759uf4z325byuv
Jun 27 09:55:35 testswarm dockerd[1211901]: time="2022-06-27T09:55:35.660038617Z" level=error msg="fatal task error" error="No such container: nginx.2.ypsjex8ak8858j3b2yxhpum2d" module=node/agent/taskmanager node.id=5cko4ckidad20i49igeym71go service.id=ntt82bbm3o1x6pm4ly43tizzp task.id=ypsjex8ak8858j3b2yxhpum2d
Jun 27 09:55:35 testswarm dockerd[1211901]: time="2022-06-27T09:55:35.661176867Z" level=error msg="error reading the kernel parameter net.ipv4.neigh.default.gc_thresh1" error="open /proc/sys/net/ipv4/neigh/default/gc_thresh1: no such file or directory"
Jun 27 09:55:35 testswarm dockerd[1211901]: time="2022-06-27T09:55:35.661221715Z" level=error msg="error reading the kernel parameter net.ipv4.neigh.default.gc_thresh2" error="open /proc/sys/net/ipv4/neigh/default/gc_thresh2: no such file or directory"
Jun 27 09:55:35 testswarm dockerd[1211901]: time="2022-06-27T09:55:35.661256123Z" level=error msg="error reading the kernel parameter net.ipv4.neigh.default.gc_thresh3" error="open /proc/sys/net/ipv4/neigh/default/gc_thresh3: no such file or directory"
Jun 27 09:55:35 testswarm systemd[1]: Started Docker Application Container Engine.
Jun 27 09:55:35 testswarm dockerd[1211901]: time="2022-06-27T09:55:35.685344941Z" level=info msg="API listen on /run/docker.sock"
Jun 27 09:55:35 testswarm dockerd[1211901]: time="2022-06-27T09:55:35.793911931Z" level=error msg="error reading the kernel parameter net.ipv4.vs.conn_reuse_mode" error="open /proc/sys/net/ipv4/vs/conn_reuse_mode: no such file or directory"
Jun 27 09:55:35 testswarm dockerd[1211901]: time="2022-06-27T09:55:35.794003500Z" level=error msg="error reading the kernel parameter net.ipv4.vs.expire_nodest_conn" error="open /proc/sys/net/ipv4/vs/expire_nodest_conn: no such file or directory"
Jun 27 09:55:35 testswarm dockerd[1211901]: time="2022-06-27T09:55:35.794035896Z" level=error msg="error reading the kernel parameter net.ipv4.vs.expire_quiescent_template" error="open /proc/sys/net/ipv4/vs/expire_quiescent_template: no such file or directory"
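
The errors about the missing net.ipv4.vs.* parameters make me think ip_vs is not visible inside the container. This is how I checked (a sketch; “testswarm” is the container from the logs above):

lsmod | grep ip_vs                              # on the LXD host
lxc exec testswarm -- ls /proc/sys/net/ipv4/vs  # inside the container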

Hey @phish108 - were you successful in publishing the ports to each of the nodes with the routing mesh?

It seems to be possible only with privileged containers, possibly due to this kernel restriction?