Socket Overload?


#1

Hi all,

I have an LXD 3.0.2 single server running ~250 containers.

I am seeing some very slow lxc commands, which sometimes completely freeze. When they freeze, I usually get “Error: Failed to fetch http://unix.socket/1.0: 500 Internal Server Error”, which requires me to restart lxd to keep talking to the API.

Any ideas?


#2

The LXD logs (lxd.log) should have some indication as to what could be the problem.

How do you manage 250 containers in terms of memory? Do they all fit in the memory of your server?


#3

Hi Simos,

LXD logs don’t show anything other than restarting LXD. Sometimes it is restarted cleanly, other time I need to restart it to be able to run lxc command again.

Here it is:

    lvl=info msg="LXD 3.0.2 is starting in normal mode" path=/var/lib/lxd t=2018-11-20T14:28:18+0000
    lvl=info msg="Kernel uid/gid map:" t=2018-11-20T14:28:18+0000
    lvl=info msg=" - u 0 0 4294967295" t=2018-11-20T14:28:18+0000
    lvl=info msg=" - g 0 0 4294967295" t=2018-11-20T14:28:18+0000
    lvl=info msg="Configured LXD uid/gid map:" t=2018-11-20T14:28:18+0000
    lvl=info msg=" - u 0 100000 65536" t=2018-11-20T14:28:18+0000
    lvl=info msg=" - g 0 100000 65536" t=2018-11-20T14:28:18+0000
    lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored." t=2018-11-20T14:28:18+0000
    lvl=info msg="Initializing local database" t=2018-11-20T14:28:18+0000
    lvl=info msg="Initializing database gateway" t=2018-11-20T14:28:18+0000
    address= id=1 lvl=info msg="Start database node" t=2018-11-20T14:28:18+0000
    lvl=info msg="Raft: Restored from snapshot 1-6639-1542707987637" t=2018-11-20T14:28:18+0000
    lvl=info msg="Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}]" t=2018-11-20T14:28:18+0000
    lvl=info msg="Raft: Node at 0 [Leader] entering Leader state" t=2018-11-20T14:28:18+0000
    lvl=info msg="Dqlite: starting event loop" t=2018-11-20T14:28:18+0000
    lvl=info msg="LXD is socket activated" t=2018-11-20T14:28:18+0000
    lvl=info msg="Starting /dev/lxd handler:" t=2018-11-20T14:28:18+0000
    lvl=info msg=" - binding devlxd socket" socket=/var/lib/lxd/devlxd/sock t=2018-11-20T14:28:18+0000
    lvl=info msg="REST API daemon:" t=2018-11-20T14:28:18+0000
    lvl=info msg=" - binding Unix socket" socket=/var/lib/lxd/unix.socket t=2018-11-20T14:28:18+0000
    lvl=info msg="Initializing global database" t=2018-11-20T14:28:18+0000
    lvl=info msg="Dqlite: handling new connection (fd=21)" t=2018-11-20T14:28:18+0000
    lvl=info msg="Dqlite: connected address=0 attempt=0" t=2018-11-20T14:28:18+0000
    lvl=info msg="Initializing storage pools" t=2018-11-20T14:28:18+0000
    lvl=info msg="Initializing networks" t=2018-11-20T14:28:18+0000
    lvl=info msg="Pruning leftover image files" t=2018-11-20T14:28:18+0000
    lvl=info msg="Done pruning leftover image files" t=2018-11-20T14:28:18+0000
    lvl=info msg="Loading configuration" t=2018-11-20T14:28:18+0000
    lvl=info msg="Connected to MAAS controller" t=2018-11-20T14:28:18+0000
    lvl=info msg="Pruning expired images" t=2018-11-20T14:28:18+0000
    lvl=info msg="Done pruning expired images" t=2018-11-20T14:28:18+0000
    lvl=info msg="Updating instance types" t=2018-11-20T14:28:18+0000
    lvl=info msg="Expiring log files" t=2018-11-20T14:28:18+0000
    lvl=info msg="Done expiring log files" t=2018-11-20T14:28:18+0000
    lvl=info msg="Done updating instance types" t=2018-11-20T14:28:21+0000

Memory is plentiful and I am still playing with the limits I will set to containers, but usually I have around 20GB of free RAM.


#4

The logs show the launching of LXD at 14:28 and all lines are related to 14:28.
Is that the initial start of LXD or an unexpected restart?


#5

That is an “unexpected” start. When the socket stops responding, I get a restart like that without anything else in the log before that. If I have lxc monitor running, it exits with EOF.

Error: websocket: close 1006 (abnormal closure): unexpected EOF

An easy way to cause this, is to run lxc list one or two times.


(Stéphane Graber) #6

Any sign of a crash in journalctl -u lxd?


#7

I ran lxc list twice, and it crashed half way through the second run. This is what I get on crash time:

Nov 21 06:56:30 Hub lxd[69218]: fatal error: unexpected signal during runtime execution
Nov 21 06:56:30 Hub lxd[69218]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7f1d92760503]
Nov 21 06:56:30 Hub lxd[69218]: runtime stack:
Nov 21 06:56:30 Hub lxd[69218]: runtime.throw(0xffd87f, 0x2a)
Nov 21 06:56:30 Hub lxd[69218]:         /usr/lib/go-1.10/src/runtime/panic.go:616 +0x81
Nov 21 06:56:30 Hub lxd[69218]: runtime.sigpanic()
Nov 21 06:56:30 Hub lxd[69218]:         /usr/lib/go-1.10/src/runtime/signal_unix.go:372 +0x28e
Nov 21 06:56:30 Hub lxd[69218]: goroutine 37645 [syscall]:
Nov 21 06:56:30 Hub lxd[69218]: runtime.cgocall(0xdabfb2, 0xc420a07708, 0xc420a07710)
Nov 21 06:56:30 Hub lxd[69218]:         /usr/lib/go-1.10/src/runtime/cgocall.go:128 +0x64 fp=0xc420a076d8 sp=0xc420a076a0 pc=0x40c4f4
Nov 21 06:56:30 Hub lxd[69218]: gopkg.in/lxc/go-lxc%2ev2._Cfunc_go_lxc_get_cgroup_item(0x7f1d400114a0, 0x7f1d1c0150b0, 0x0)
Nov 21 06:56:30 Hub lxd[69218]:         _cgo_gotypes.go:638 +0x4e fp=0xc420a07708 sp=0xc420a076d8 pc=0x8f134e
Nov 21 06:56:30 Hub lxd[69218]: gopkg.in/lxc/go-lxc%2ev2.(*Container).cgroupItem.func2(0x7f1d400114a0, 0x7f1d1c0150b0, 0x7f1d1c0150b0)
Nov 21 06:56:30 Hub lxd[69218]:         /build/lxd-Rxo8GA/lxd-3.0.2/obj-x86_64-linux-gnu/src/gopkg.in/lxc/go-lxc.v2/container.go:780 +0x60 fp=0xc420a07740 sp=0xc420a07708 pc=0x901650
Nov 21 06:56:30 Hub lxd[69218]: gopkg.in/lxc/go-lxc%2ev2.(*Container).cgroupItem(0xc4206499e0, 0xfd43e9, 0xc, 0x0, 0x0, 0x0)
Nov 21 06:56:30 Hub lxd[69218]:         /build/lxd-Rxo8GA/lxd-3.0.2/obj-x86_64-linux-gnu/src/gopkg.in/lxc/go-lxc.v2/container.go:780 +0x93 fp=0xc420a07798 sp=0xc420a07740 pc=0x8f7d23
Nov 21 06:56:30 Hub lxd[69218]: gopkg.in/lxc/go-lxc%2ev2.(*Container).CgroupItem(0xc4206499e0, 0xfd43e9, 0xc, 0x0, 0x0, 0x0)
Nov 21 06:56:30 Hub lxd[69218]:         /build/lxd-Rxo8GA/lxd-3.0.2/obj-x86_64-linux-gnu/src/gopkg.in/lxc/go-lxc.v2/container.go:805 +0x8a fp=0xc420a077d8 sp=0xc420a07798 pc=0x8f7fda
Nov 21 06:56:30 Hub lxd[69218]: main.(*containerLXC).CGroupGet(0xc4200d8200, 0xfd43e9, 0xc, 0x0, 0x5dc, 0xc42047bed0, 0x2)
Nov 21 06:56:30 Hub lxd[69218]:         /build/lxd-Rxo8GA/lxd-3.0.2/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/container_lxc.go:3528 +0x8b fp=0xc420a07820 sp=0xc420a077d8 pc=0xbdfb4b
Nov 21 06:56:30 Hub lxd[69218]: main.(*containerLXC).processesState(0xc4200d8200, 0xc420a18390)
Nov 21 06:56:30 Hub lxd[69218]:         /build/lxd-Rxo8GA/lxd-3.0.2/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/container_lxc.go:5976 +0x39a fp=0xc420a078f8 sp=0xc420a07820 pc=0xbfb5ba
Nov 21 06:56:30 Hub lxd[69218]: main.(*containerLXC).RenderState(0xc4200d8200, 0xc4209c2254, 0xf, 0x10f9e60)
Nov 21 06:56:30 Hub lxd[69218]:         /build/lxd-Rxo8GA/lxd-3.0.2/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/container_lxc.go:3081 +0x91d fp=0xc420a07a20 sp=0xc420a078f8 pc=0xbdafbd
Nov 21 06:56:30 Hub lxd[69218]: main.containerState(0xc420304f70, 0xc420182800, 0xfc8d73, 0x3)
Nov 21 06:56:30 Hub lxd[69218]:         /build/lxd-Rxo8GA/lxd-3.0.2/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/container_state.go:32 +0x1a7 fp=0xc420a07ab0 sp=0xc420a07a20 pc=0xc21847
Nov 21 06:56:30 Hub lxd[69218]: main.(*Daemon).createCmd.func1(0x10e58a0, 0xc420514700, 0xc420182800)
Nov 21 06:56:30 Hub lxd[69218]:         /build/lxd-Rxo8GA/lxd-3.0.2/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/daemon.go:290 +0x78a fp=0xc420a07bd8 sp=0xc420a07ab0 pc=0xd8509a
Nov 21 06:56:30 Hub lxd[69218]: net/http.HandlerFunc.ServeHTTP(0xc4201c04b0, 0x10e58a0, 0xc420514700, 0xc420182800)
Nov 21 06:56:30 Hub lxd[69218]:         /usr/lib/go-1.10/src/net/http/server.go:1947 +0x44 fp=0xc420a07c00 sp=0xc420a07bd8 pc=0x695dc4
Nov 21 06:56:30 Hub lxd[69218]: github.com/gorilla/mux.(*Router).ServeHTTP(0xc420246000, 0x10e58a0, 0xc420514700, 0xc420182800)
Nov 21 06:56:30 Hub lxd[69218]:         /build/lxd-Rxo8GA/lxd-3.0.2/obj-x86_64-linux-gnu/src/github.com/gorilla/mux/mux.go:162 +0xed fp=0xc420a07d00 sp=0xc420a07c00 pc=0x6fe5ed
Nov 21 06:56:30 Hub lxd[69218]: main.(*lxdHttpServer).ServeHTTP(0xc4200adfd0, 0x10e58a0, 0xc420514700, 0xc420414300)
Nov 21 06:56:30 Hub lxd[69218]:         /build/lxd-Rxo8GA/lxd-3.0.2/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/api.go:76 +0xa4 fp=0xc420a07d58 sp=0xc420a07d00 pc=0xb91be4
Nov 21 06:56:30 Hub lxd[69218]: net/http.serverHandler.ServeHTTP(0xc4201bc1a0, 0x10e58a0, 0xc420514700, 0xc420414300)
Nov 21 06:56:30 Hub lxd[69218]:         /usr/lib/go-1.10/src/net/http/server.go:2697 +0xbc fp=0xc420a07d88 sp=0xc420a07d58 pc=0x698afc
Nov 21 06:56:30 Hub lxd[69218]: net/http.(*conn).serve(0xc42047e320, 0x10e8120, 0xc420576d80)
Nov 21 06:56:30 Hub lxd[69218]:         /usr/lib/go-1.10/src/net/http/server.go:1830 +0x651 fp=0xc420a07fc8 sp=0xc420a07d88 pc=0x694de1
Nov 21 06:56:30 Hub lxd[69218]: runtime.goexit()
Nov 21 06:56:30 Hub lxd[69218]:         /usr/lib/go-1.10/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc420a07fd0 sp=0xc420a07fc8 pc=0x463091
Nov 21 06:56:30 Hub lxd[69218]: created by net/http.(*Server).Serve
Nov 21 06:56:30 Hub lxd[69218]:         /usr/lib/go-1.10/src/net/http/server.go:2798 +0x27b

(Stéphane Graber) #8

Ok, I think we found a fix for that particular one a while back (some very obscure Go threading issue) and that will be in the soon to be released 3.0.3.


#9

Thanks for that. I will patiently wait for 3.0.3 to make its way to xenial-backports.