Adding/removing unix-char devices from container occasionally hangs lxd, requires restart

I’m looking for some guidance in debugging an issue I’m seeing across all of my deployments.

As a general background, I have a daemon written in python3 that runs on ubuntu 20.04 and 22.04 (so lxd versions 5.0.1 and 4.0.9). It monitors udev for add and remove events, and adds/removes the device (typically a unix-char device) for a particular container.

Occasionally, an lxd device add/remove will block forever. When this state is encountered, any further device add/remove to any container will similarly hang.

At that point, I need to restart lxd (I do “sudo systemctl restart snap.lxd.daemon”) to get lxd to be responsive again.

The python3 daemon uses pylxd to interact with lxd, although I have also tried doing the device add/remove using the python subprocess module and lxc command-line. The same issue occurs in either case, so I don’t think it makes a difference.

I’m not sure how to debug this. I’m on ubuntu 22.04 and 20.04. The problem occurs in both environments. I’ve looked at journalctl and didn’t see anything particularly interesting. Is there another log happening anywhere that I can take a look at? Any other guidance or theories or things to try to further debug this issue? I’ll try anything at this point.

Hmm, that’s a bit of an odd one.

The way I’d try to debug this is:

This dumps all ongoing goroutines in LXD, the curl can be done whenever you get a hang.

1 Like

Thanks for the logging tip, and thanks for helping!

Got the hang and grabbed the log. Would you mind taking a look to see if there’s anything interesting in there?

Debug Log
goroutine 3608158 [running]:
runtime/pprof.writeGoroutineStacks({0x1be3b40, 0xc00025a000})
	/snap/go/9952/src/runtime/pprof/pprof.go:694 +0x70
runtime/pprof.writeGoroutine({0x1be3b40?, 0xc00025a000?}, 0x0?)
	/snap/go/9952/src/runtime/pprof/pprof.go:683 +0x2b
runtime/pprof.(*Profile).WriteTo(0x17d0e00?, {0x1be3b40?, 0xc00025a000?}, 0xc?)
	/snap/go/9952/src/runtime/pprof/pprof.go:332 +0x14b
net/http/pprof.handler.ServeHTTP({0xc0010440a1, 0x9}, {0x1bebf60, 0xc00025a000}, 0x2e302e302e373231?)
	/snap/go/9952/src/net/http/pprof/pprof.go:253 +0x4a5
net/http/pprof.Index({0x1bebf60?, 0xc00025a000}, 0xc000194700)
	/snap/go/9952/src/net/http/pprof/pprof.go:371 +0x13e
net/http.HandlerFunc.ServeHTTP(0x7fce3b9ff3c0?, {0x1bebf60?, 0xc00025a000?}, 0x418b85?)
	/snap/go/9952/src/net/http/server.go:2084 +0x2f
net/http.(*ServeMux).ServeHTTP(0xc0010440ab?, {0x1bebf60, 0xc00025a000}, 0xc000194700)
	/snap/go/9952/src/net/http/server.go:2462 +0x149
net/http.serverHandler.ServeHTTP({0xc000b6e270?}, {0x1bebf60, 0xc00025a000}, 0xc000194700)
	/snap/go/9952/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc0006afea0, {0x1becc60, 0xc0001cd440})
	/snap/go/9952/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
	/snap/go/9952/src/net/http/server.go:3071 +0x4db

goroutine 1 [select, 969 minutes]:
main.(*cmdDaemon).Run(0xc00011ae70, 0x0?, {0xc000121800, 0x0, 0x0?})
	/build/lxd/parts/lxd/src/lxd/main_daemon.go:83 +0x63f
github.com/spf13/cobra.(*Command).execute(0xc000179680, {0xc00003c0b0, 0x4, 0x4})
	/build/lxd/parts/lxd/src/.go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:872 +0x694
github.com/spf13/cobra.(*Command).ExecuteC(0xc000179680)
	/build/lxd/parts/lxd/src/.go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:990 +0x3b4
github.com/spf13/cobra.(*Command).Execute(...)
	/build/lxd/parts/lxd/src/.go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:918
main.main()
	/build/lxd/parts/lxd/src/lxd/main.go:220 +0x1a49

goroutine 26 [IO wait, 969 minutes]:
internal/poll.runtime_pollWait(0x7fce3b9ff870, 0x72)
	/snap/go/9952/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc000188b00?, 0x2?, 0x0)
	/snap/go/9952/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/snap/go/9952/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).Accept(0xc000188b00)
	/snap/go/9952/src/internal/poll/fd_unix.go:614 +0x22c
net.(*netFD).accept(0xc000188b00)
	/snap/go/9952/src/net/fd_unix.go:172 +0x35
net.(*UnixListener).accept(0x4a2fa6?)
	/snap/go/9952/src/net/unixsock_posix.go:166 +0x1c
net.(*UnixListener).Accept(0xc0001a85d0)
	/snap/go/9952/src/net/unixsock.go:260 +0x3d
net/http.(*Server).Serve(0xc0002a01c0, {0x1bebd80, 0xc0001a85d0})
	/snap/go/9952/src/net/http/server.go:3039 +0x385
github.com/lxc/lxd/lxd/endpoints.(*Endpoints).serve.func1()
	/build/lxd/parts/lxd/src/lxd/endpoints/endpoints.go:429 +0x25
gopkg.in/tomb%2ev2.(*Tomb).run(0xc00064e960, 0xc0001302d0?)
	/build/lxd/parts/lxd/src/.go/pkg/mod/gopkg.in/tomb.v2@v2.0.0-20161208151619-d5d1b5820637/tomb.go:163 +0x36
created by gopkg.in/tomb%2ev2.(*Tomb).Go
	/build/lxd/parts/lxd/src/.go/pkg/mod/gopkg.in/tomb.v2@v2.0.0-20161208151619-d5d1b5820637/tomb.go:159 +0xee

goroutine 25 [IO wait, 969 minutes]:
internal/poll.runtime_pollWait(0x7fce3b9ffa50, 0x72)
	/snap/go/9952/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc0003159e0?, 0xce62ff2878?, 0x1)
	/snap/go/9952/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/snap/go/9952/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).RawRead(0xc0003159e0, 0xc000088de0)
	/snap/go/9952/src/internal/poll/fd_unix.go:766 +0x145
os.(*rawConn).Read(0xc00011ed20, 0x203001?)
	/snap/go/9952/src/os/rawconn.go:31 +0x56
github.com/mdlayher/socket.(*Conn).read(0xc0001a8600, {0x18c6b6f?, 0x203000?}, 0xc0002af170)
	/build/lxd/parts/lxd/src/.go/pkg/mod/github.com/mdlayher/socket@v0.2.3/conn.go:576 +0xea
github.com/mdlayher/socket.(*Conn).Accept(0xc0001a8600, 0x0)
	/build/lxd/parts/lxd/src/.go/pkg/mod/github.com/mdlayher/socket@v0.2.3/conn.go:313 +0xe5
github.com/mdlayher/vsock.(*listener).Accept(0xc00045d300)
	/build/lxd/parts/lxd/src/.go/pkg/mod/github.com/mdlayher/vsock@v1.1.1/listener_linux.go:32 +0x2a
github.com/mdlayher/vsock.(*Listener).Accept(0xc00011ed28)
	/build/lxd/parts/lxd/src/.go/pkg/mod/github.com/mdlayher/vsock@v1.1.1/vsock.go:133 +0x25
crypto/tls.(*listener).Accept(0xc00000edc8)
	/snap/go/9952/src/crypto/tls/tls.go:66 +0x2d
net/http.(*Server).Serve(0xc0002a0380, {0x1be9d40, 0xc00000edc8})
	/snap/go/9952/src/net/http/server.go:3039 +0x385
github.com/lxc/lxd/lxd/endpoints.(*Endpoints).serve.func1()
	/build/lxd/parts/lxd/src/lxd/endpoints/endpoints.go:429 +0x25
gopkg.in/tomb%2ev2.(*Tomb).run(0xc00064e960, 0xc00018b7a0?)
	/build/lxd/parts/lxd/src/.go/pkg/mod/gopkg.in/tomb.v2@v2.0.0-20161208151619-d5d1b5820637/tomb.go:163 +0x36
created by gopkg.in/tomb%2ev2.(*Tomb).Go
	/build/lxd/parts/lxd/src/.go/pkg/mod/gopkg.in/tomb.v2@v2.0.0-20161208151619-d5d1b5820637/tomb.go:159 +0xee

goroutine 36 [select, 969 minutes]:
database/sql.(*DB).connectionOpener(0xc00039ca90, {0x1becbb8, 0xc00038cfc0})
	/snap/go/9952/src/database/sql/sql.go:1226 +0x8d
created by database/sql.OpenDB
	/snap/go/9952/src/database/sql/sql.go:794 +0x18d

goroutine 50 [syscall, 969 minutes]:
os/signal.signal_recv()
	/snap/go/9952/src/runtime/sigqueue.go:151 +0x2f
os/signal.loop()
	/snap/go/9952/src/os/signal/signal_unix.go:23 +0x19
created by os/signal.Notify.func1.1
	/snap/go/9952/src/os/signal/signal.go:151 +0x2a

goroutine 27 [IO wait, 2 minutes]:
internal/poll.runtime_pollWait(0x7fce3b9ffb40, 0x72)
	/snap/go/9952/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc000188a00?, 0xa6?, 0x0)
	/snap/go/9952/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/snap/go/9952/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).Accept(0xc000188a00)
	/snap/go/9952/src/internal/poll/fd_unix.go:614 +0x22c
net.(*netFD).accept(0xc000188a00)
	/snap/go/9952/src/net/fd_unix.go:172 +0x35
net.(*UnixListener).accept(0xc0012b5200?)
	/snap/go/9952/src/net/unixsock_posix.go:166 +0x1c
net.(*UnixListener).Accept(0xc0000b1ec0)
	/snap/go/9952/src/net/unixsock.go:260 +0x3d
github.com/lxc/lxd/lxd/endpoints/listeners.(*StarttlsListener).Accept(0xc0000b1f20)
	/build/lxd/parts/lxd/src/lxd/endpoints/listeners/starttls.go:36 +0x64
net/http.(*Server).Serve(0xc0002a00e0, {0x1beaaf0, 0xc0000b1f20})
	/snap/go/9952/src/net/http/server.go:3039 +0x385
github.com/lxc/lxd/lxd/endpoints.(*Endpoints).serve.func1()
	/build/lxd/parts/lxd/src/lxd/endpoints/endpoints.go:429 +0x25
gopkg.in/tomb%2ev2.(*Tomb).run(0xc00064e960, 0x0?)
	/build/lxd/parts/lxd/src/.go/pkg/mod/gopkg.in/tomb.v2@v2.0.0-20161208151619-d5d1b5820637/tomb.go:163 +0x36
created by gopkg.in/tomb%2ev2.(*Tomb).Go
	/build/lxd/parts/lxd/src/.go/pkg/mod/gopkg.in/tomb.v2@v2.0.0-20161208151619-d5d1b5820637/tomb.go:159 +0xee

goroutine 28 [select, 969 minutes]:
database/sql.(*DB).connectionOpener(0xc0000b25b0, {0x1becbb8, 0xc00031ff80})
	/snap/go/9952/src/database/sql/sql.go:1226 +0x8d
created by database/sql.OpenDB
	/snap/go/9952/src/database/sql/sql.go:794 +0x18d

goroutine 393 [select, 969 minutes]:
github.com/lxc/lxd/lxd/task.(*Task).loop(0xc00038b0c8, {0x1becbb8, 0xc000b007c0})
	/build/lxd/parts/lxd/src/lxd/task/task.go:68 +0x15f
github.com/lxc/lxd/lxd/task.(*Group).Start.func1(0xc000b00740?)
	/build/lxd/parts/lxd/src/lxd/task/group.go:59 +0x3d
created by github.com/lxc/lxd/lxd/task.(*Group).Start
	/build/lxd/parts/lxd/src/lxd/task/group.go:58 +0x2f3

goroutine 224 [syscall]:
syscall.Syscall(0x0, 0x1d, 0xc000599000, 0x1000)
	/snap/go/9952/src/syscall/asm_linux_amd64.s:20 +0x5
golang.org/x/sys/unix.read(0x16a43c0?, {0xc000599000?, 0x18c27cf?, 0x6?})
	/build/lxd/parts/lxd/src/.go/pkg/mod/golang.org/x/sys@v0.0.0-20220722155257-8c9f86f7a55f/unix/zsyscall_linux.go:1366 +0x4d
golang.org/x/sys/unix.Read(...)
	/build/lxd/parts/lxd/src/.go/pkg/mod/golang.org/x/sys@v0.0.0-20220722155257-8c9f86f7a55f/unix/syscall_unix.go:157
main.deviceNetlinkListener.func1(0xc0005aa0c0?, 0xc0005069c0?, 0xc000506a20?, 0xc000506a80?)
	/build/lxd/parts/lxd/src/lxd/devices.go:100 +0xa6
created by main.deviceNetlinkListener
	/build/lxd/parts/lxd/src/lxd/devices.go:97 +0x1ca

goroutine 325 [semacquire, 2 minutes]:
sync.runtime_SemacquireMutex(0x7fce000a3370?, 0xd8?, 0x0?)
	/snap/go/9952/src/runtime/sema.go:71 +0x25
sync.(*Mutex).lockSlow(0x2bb8218)
	/snap/go/9952/src/sync/mutex.go:162 +0x165
sync.(*Mutex).Lock(...)
	/snap/go/9952/src/sync/mutex.go:81
github.com/lxc/lxd/lxd/device.unixRunHandlers(0xc00100add8?, 0xc000c49c08)
	/build/lxd/parts/lxd/src/lxd/device/device_utils_unix_events.go:94 +0x73
github.com/lxc/lxd/lxd/device.unixRegisterHandler.func1({0xc0013a8b90?, 0xc000c49df0?}, {0x18c48c9?, 0x10?})
	/build/lxd/parts/lxd/src/lxd/device/device_utils_unix_events.go:55 +0x91
github.com/lxc/lxd/lxd/fsmonitor/drivers.(*fanotify).getEvents(0xc0008cb9c0, 0xc0009ba300?)
	/build/lxd/parts/lxd/src/lxd/fsmonitor/drivers/driver_fanotify.go:200 +0xff3
created by github.com/lxc/lxd/lxd/fsmonitor/drivers.(*fanotify).load
	/build/lxd/parts/lxd/src/lxd/fsmonitor/drivers/driver_fanotify.go:67 +0x313

goroutine 2917 [syscall, 968 minutes]:
syscall.Syscall6(0xf7, 0x1, 0x12e7, 0xc000340670, 0x1000004, 0x0, 0x0)
	/snap/go/9952/src/syscall/asm_linux_amd64.s:43 +0x5
os.(*Process).blockUntilWaitable(0xc000643b00)
	/snap/go/9952/src/os/wait_waitid.go:32 +0x9c
os.(*Process).wait(0xc000643b00)
	/snap/go/9952/src/os/exec_unix.go:22 +0x28
os.(*Process).Wait(...)
	/snap/go/9952/src/os/exec.go:132
github.com/lxc/lxd/shared/subprocess.(*Process).start.func3()
	/build/lxd/parts/lxd/src/shared/subprocess/proc.go:173 +0x6c
created by github.com/lxc/lxd/shared/subprocess.(*Process).start
	/build/lxd/parts/lxd/src/shared/subprocess/proc.go:170 +0x699

goroutine 11811 [IO wait]:
internal/poll.runtime_pollWait(0x7fce3b9ff1e0, 0x72)
	/snap/go/9952/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc000815100?, 0xc000056a00?, 0x0)
	/snap/go/9952/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/snap/go/9952/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).Accept(0xc000815100)
	/snap/go/9952/src/internal/poll/fd_unix.go:614 +0x22c
net.(*netFD).accept(0xc000815100)
	/snap/go/9952/src/net/fd_unix.go:172 +0x35
net.(*TCPListener).accept(0xc000b99a58)
	/snap/go/9952/src/net/tcpsock_posix.go:139 +0x28
net.(*TCPListener).Accept(0xc000b99a58)
	/snap/go/9952/src/net/tcpsock.go:288 +0x3d
net/http.(*Server).Serve(0xc0002a0460, {0x1bebd20, 0xc000b99a58})
	/snap/go/9952/src/net/http/server.go:3039 +0x385
github.com/lxc/lxd/lxd/endpoints.(*Endpoints).serve.func1()
	/build/lxd/parts/lxd/src/lxd/endpoints/endpoints.go:429 +0x25
gopkg.in/tomb%2ev2.(*Tomb).run(0xc00064e960, 0xc000c003f0?)
	/build/lxd/parts/lxd/src/.go/pkg/mod/gopkg.in/tomb.v2@v2.0.0-20161208151619-d5d1b5820637/tomb.go:163 +0x36
created by gopkg.in/tomb%2ev2.(*Tomb).Go
	/build/lxd/parts/lxd/src/.go/pkg/mod/gopkg.in/tomb.v2@v2.0.0-20161208151619-d5d1b5820637/tomb.go:159 +0xee

goroutine 323 [select]:
main.deviceEventListener(0xc0008a82c0)
	/build/lxd/parts/lxd/src/lxd/devices.go:538 +0x212
created by main.(*Daemon).init
	/build/lxd/parts/lxd/src/lxd/daemon.go:1428 +0x3ddf

goroutine 281 [syscall, 969 minutes]:
syscall.Syscall6(0xf7, 0x1, 0x96c, 0xc000343e70, 0x1000004, 0x0, 0x0)
	/snap/go/9952/src/syscall/asm_linux_amd64.s:43 +0x5
os.(*Process).blockUntilWaitable(0xc000604ae0)
	/snap/go/9952/src/os/wait_waitid.go:32 +0x9c
os.(*Process).wait(0xc000604ae0)
	/snap/go/9952/src/os/exec_unix.go:22 +0x28
os.(*Process).Wait(...)
	/snap/go/9952/src/os/exec.go:132
github.com/lxc/lxd/shared/subprocess.(*Process).start.func3()
	/build/lxd/parts/lxd/src/shared/subprocess/proc.go:173 +0x6c
created by github.com/lxc/lxd/shared/subprocess.(*Process).start
	/build/lxd/parts/lxd/src/shared/subprocess/proc.go:170 +0x699

goroutine 324 [chan receive, 969 minutes]:
github.com/lxc/lxd/lxd/fsmonitor/drivers.(*fanotify).load.func1()
	/build/lxd/parts/lxd/src/lxd/fsmonitor/drivers/driver_fanotify.go:62 +0x32
created by github.com/lxc/lxd/lxd/fsmonitor/drivers.(*fanotify).load
	/build/lxd/parts/lxd/src/lxd/fsmonitor/drivers/driver_fanotify.go:61 +0x2c5

goroutine 394 [select, 969 minutes]:
github.com/lxc/lxd/lxd/task.(*Task).loop(0xc00038b0f8, {0x1becbb8, 0xc000b007c0})
	/build/lxd/parts/lxd/src/lxd/task/task.go:68 +0x15f
github.com/lxc/lxd/lxd/task.(*Group).Start.func1(0x0?)
	/build/lxd/parts/lxd/src/lxd/task/group.go:59 +0x3d
created by github.com/lxc/lxd/lxd/task.(*Group).Start
	/build/lxd/parts/lxd/src/lxd/task/group.go:58 +0x2f3

goroutine 348 [IO wait, 969 minutes]:
internal/poll.runtime_pollWait(0x7fce3b9ff4b0, 0x72)
	/snap/go/9952/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc000922600?, 0x0?, 0x0)
	/snap/go/9952/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/snap/go/9952/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).Accept(0xc000922600)
	/snap/go/9952/src/internal/poll/fd_unix.go:614 +0x22c
net.(*netFD).accept(0xc000922600)
	/snap/go/9952/src/net/fd_unix.go:172 +0x35
net.(*UnixListener).accept(0x410dbd?)
	/snap/go/9952/src/net/unixsock_posix.go:166 +0x1c
net.(*UnixListener).Accept(0xc000b0a5a0)
	/snap/go/9952/src/net/unixsock.go:260 +0x3d
github.com/lxc/lxd/lxd/seccomp.NewSeccompServer.func1()
	/build/lxd/parts/lxd/src/lxd/seccomp/seccomp.go:1052 +0x55
created by github.com/lxc/lxd/lxd/seccomp.NewSeccompServer
	/build/lxd/parts/lxd/src/lxd/seccomp/seccomp.go:1050 +0x20a

goroutine 395 [select, 9 minutes]:
github.com/lxc/lxd/lxd/task.(*Task).loop(0xc00038b128, {0x1becbb8, 0xc000b007c0})
	/build/lxd/parts/lxd/src/lxd/task/task.go:68 +0x15f
github.com/lxc/lxd/lxd/task.(*Group).Start.func1(0x0?)
	/build/lxd/parts/lxd/src/lxd/task/group.go:59 +0x3d
created by github.com/lxc/lxd/lxd/task.(*Group).Start
	/build/lxd/parts/lxd/src/lxd/task/group.go:58 +0x2f3

goroutine 396 [select, 969 minutes]:
github.com/lxc/lxd/lxd/task.(*Task).loop(0xc00038b158, {0x1becbb8, 0xc000b007c0})
	/build/lxd/parts/lxd/src/lxd/task/task.go:68 +0x15f
github.com/lxc/lxd/lxd/task.(*Group).Start.func1(0xc000b1a120?)
	/build/lxd/parts/lxd/src/lxd/task/group.go:59 +0x3d
created by github.com/lxc/lxd/lxd/task.(*Group).Start
	/build/lxd/parts/lxd/src/lxd/task/group.go:58 +0x2f3

goroutine 397 [select, 9 minutes]:
github.com/lxc/lxd/lxd/task.(*Task).loop(0xc00038b188, {0x1becbb8, 0xc000b007c0})
	/build/lxd/parts/lxd/src/lxd/task/task.go:68 +0x15f
github.com/lxc/lxd/lxd/task.(*Group).Start.func1(0x0?)
	/build/lxd/parts/lxd/src/lxd/task/group.go:59 +0x3d
created by github.com/lxc/lxd/lxd/task.(*Group).Start
	/build/lxd/parts/lxd/src/lxd/task/group.go:58 +0x2f3

goroutine 398 [select]:
github.com/lxc/lxd/lxd/task.(*Task).loop(0xc00038b1b8, {0x1becbb8, 0xc000b007c0})
	/build/lxd/parts/lxd/src/lxd/task/task.go:68 +0x15f
github.com/lxc/lxd/lxd/task.(*Group).Start.func1(0x0?)
	/build/lxd/parts/lxd/src/lxd/task/group.go:59 +0x3d
created by github.com/lxc/lxd/lxd/task.(*Group).Start
	/build/lxd/parts/lxd/src/lxd/task/group.go:58 +0x2f3

goroutine 399 [select]:
github.com/lxc/lxd/lxd/task.(*Task).loop(0xc00038b1e8, {0x1becbb8, 0xc000b007c0})
	/build/lxd/parts/lxd/src/lxd/task/task.go:68 +0x15f
github.com/lxc/lxd/lxd/task.(*Group).Start.func1(0x0?)
	/build/lxd/parts/lxd/src/lxd/task/group.go:59 +0x3d
created by github.com/lxc/lxd/lxd/task.(*Group).Start
	/build/lxd/parts/lxd/src/lxd/task/group.go:58 +0x2f3

goroutine 400 [select]:
github.com/lxc/lxd/lxd/task.(*Task).loop(0xc00038b218, {0x1becbb8, 0xc000b007c0})
	/build/lxd/parts/lxd/src/lxd/task/task.go:68 +0x15f
github.com/lxc/lxd/lxd/task.(*Group).Start.func1(0x0?)
	/build/lxd/parts/lxd/src/lxd/task/group.go:59 +0x3d
created by github.com/lxc/lxd/lxd/task.(*Group).Start
	/build/lxd/parts/lxd/src/lxd/task/group.go:58 +0x2f3

goroutine 401 [select]:
github.com/lxc/lxd/lxd/task.(*Task).loop(0xc00038b248, {0x1becbb8, 0xc000b007c0})
	/build/lxd/parts/lxd/src/lxd/task/task.go:68 +0x15f
github.com/lxc/lxd/lxd/task.(*Group).Start.func1(0x0?)
	/build/lxd/parts/lxd/src/lxd/task/group.go:59 +0x3d
created by github.com/lxc/lxd/lxd/task.(*Group).Start
	/build/lxd/parts/lxd/src/lxd/task/group.go:58 +0x2f3

goroutine 418 [select, 969 minutes]:
github.com/lxc/lxd/lxd/task.(*Task).loop(0xc00038b278, {0x1becbb8, 0xc000b007c0})
	/build/lxd/parts/lxd/src/lxd/task/task.go:68 +0x15f
github.com/lxc/lxd/lxd/task.(*Group).Start.func1(0x0?)
	/build/lxd/parts/lxd/src/lxd/task/group.go:59 +0x3d
created by github.com/lxc/lxd/lxd/task.(*Group).Start
	/build/lxd/parts/lxd/src/lxd/task/group.go:58 +0x2f3

goroutine 3430 [syscall, 968 minutes]:
syscall.Syscall6(0xf7, 0x1, 0x140f, 0xc000074e70, 0x1000004, 0x0, 0x0)
	/snap/go/9952/src/syscall/asm_linux_amd64.s:43 +0x5
os.(*Process).blockUntilWaitable(0xc0003997a0)
	/snap/go/9952/src/os/wait_waitid.go:32 +0x9c
os.(*Process).wait(0xc0003997a0)
	/snap/go/9952/src/os/exec_unix.go:22 +0x28
os.(*Process).Wait(...)
	/snap/go/9952/src/os/exec.go:132
github.com/lxc/lxd/shared/subprocess.(*Process).start.func3()
	/build/lxd/parts/lxd/src/shared/subprocess/proc.go:173 +0x6c
created by github.com/lxc/lxd/shared/subprocess.(*Process).start
	/build/lxd/parts/lxd/src/shared/subprocess/proc.go:170 +0x699

goroutine 3607904 [IO wait, 2 minutes]:
internal/poll.runtime_pollWait(0x7fce3b9ff0f0, 0x72)
	/snap/go/9952/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc000a45000?, 0xc000a47000?, 0x0)
	/snap/go/9952/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/snap/go/9952/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).Read(0xc000a45000, {0xc000a47000, 0x1000, 0x1000})
	/snap/go/9952/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc000a45000, {0xc000a47000?, 0xc000ac5db0?, 0x0?})
	/snap/go/9952/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc00045fd58, {0xc000a47000?, 0xc000c0ef98?, 0x8000dd?})
	/snap/go/9952/src/net/net.go:183 +0x45
bufio.(*Reader).Read(0xc000ab4780, {0xc000bea1c1, 0x1, 0x1000000010000?})
	/snap/go/9952/src/bufio/bufio.go:236 +0x1b4
github.com/lxc/lxd/lxd/endpoints/listeners.BufferedUnixConn.Read(...)
	/build/lxd/parts/lxd/src/lxd/endpoints/listeners/starttls.go:94
net/http.(*connReader).backgroundRead(0xc000bea1b0)
	/snap/go/9952/src/net/http/server.go:672 +0x3f
created by net/http.(*connReader).startBackgroundRead
	/snap/go/9952/src/net/http/server.go:668 +0xca

goroutine 3608159 [IO wait]:
internal/poll.runtime_pollWait(0x7fce3b9ff3c0, 0x72)
	/snap/go/9952/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc000a30a80?, 0xc000b6e281?, 0x0)
	/snap/go/9952/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/snap/go/9952/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).Read(0xc000a30a80, {0xc000b6e281, 0x1, 0x1})
	/snap/go/9952/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc000a30a80, {0xc000b6e281?, 0xc0000880e0?, 0x0?})
	/snap/go/9952/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc0006b2010, {0xc000b6e281?, 0xc00004a0c0?, 0x0?})
	/snap/go/9952/src/net/net.go:183 +0x45
net/http.(*connReader).backgroundRead(0xc000b6e270)
	/snap/go/9952/src/net/http/server.go:672 +0x3f
created by net/http.(*connReader).startBackgroundRead
	/snap/go/9952/src/net/http/server.go:668 +0xca

goroutine 3607915 [semacquire, 2 minutes]:
sync.runtime_SemacquireMutex(0xc0002d0460?, 0x4?, 0xc001258010?)
	/snap/go/9952/src/runtime/sema.go:71 +0x25
sync.(*Mutex).lockSlow(0xc0008cb9d0)
	/snap/go/9952/src/sync/mutex.go:162 +0x165
sync.(*Mutex).Lock(...)
	/snap/go/9952/src/sync/mutex.go:81
github.com/lxc/lxd/lxd/fsmonitor/drivers.(*common).Unwatch(0xc0008cb9c0, {0xc000bfed98, 0x14}, {0xc00057e018, 0x17})
	/build/lxd/parts/lxd/src/lxd/fsmonitor/drivers/common.go:61 +0x76
github.com/lxc/lxd/lxd/fsmonitor.(*fsMonitor).Unwatch(0xc000794640, {0xc000bfed98, 0x14}, {0xc00057e018, 0x17})
	/build/lxd/parts/lxd/src/lxd/fsmonitor/fsmonitor.go:31 +0x119
github.com/lxc/lxd/lxd/device.unixUnregisterHandler(0xc0008ee790, {0x1c04d88, 0xc0005e8b40}, {0xc00047c540, 0x14})
	/build/lxd/parts/lxd/src/lxd/device/device_utils_unix_events.go:84 +0x2fb
github.com/lxc/lxd/lxd/device.(*unixCommon).Stop(0xc000c7f270)
	/build/lxd/parts/lxd/src/lxd/device/unix_common.go:205 +0x4d
github.com/lxc/lxd/lxd/instance/drivers.(*lxc).deviceStop(0xc0005e8b40, {0x7fce391827e0, 0xc000c7f270}, 0x1, {0x0, 0x0})
	/build/lxd/parts/lxd/src/lxd/instance/drivers/driver_lxc.go:1500 +0x1c5
github.com/lxc/lxd/lxd/instance/drivers.(*common).devicesUpdate(0xc0005e8b40, {0x1c04d88, 0xc0005e8b40}, 0x4?, 0xc000ac1a40?, 0x18c4f01?, 0x6?, 0x1, 0x1)
	/build/lxd/parts/lxd/src/lxd/instance/drivers/driver_common.go:1272 +0x458
github.com/lxc/lxd/lxd/instance/drivers.(*lxc).Update(_, {0x0, {0x0, 0x0}, 0x0, 0x0, {0xc000b50ef8, 0x7}, {0x0, 0x0}, ...}, ...)
	/build/lxd/parts/lxd/src/lxd/instance/drivers/driver_lxc.go:4199 +0x1932
main.doProfileUpdateInstance(_, {_, _}, {_, {_, _}, _}, {_, _}, {0xe, ...})
	/build/lxd/parts/lxd/src/lxd/profiles_utils.go:229 +0x4c2
main.doProfileUpdate(0xc0004d6000, {0x18c7317, 0x7}, {0xc00004ce52, 0x13}, 0x7856e2?, 0xc00003d450, {0xc00154ba70, {0x0, 0x0}, ...})
	/build/lxd/parts/lxd/src/lxd/profiles_utils.go:147 +0x8a6
main.profilePut(0xc0004d6000, 0xc000194500)
	/build/lxd/parts/lxd/src/lxd/profiles.go:501 +0x686
main.(*Daemon).createCmd.func1.3({0x19de768?, 0xc00011abe8?, 0x0?})
	/build/lxd/parts/lxd/src/lxd/daemon.go:679 +0xef
main.(*Daemon).createCmd.func1({0x1bebf60, 0xc0003bc0e0}, 0xc000194500)
	/build/lxd/parts/lxd/src/lxd/daemon.go:688 +0x15b9
net/http.HandlerFunc.ServeHTTP(0xc000194200?, {0x1bebf60?, 0xc0003bc0e0?}, 0x18bd0bc?)
	/snap/go/9952/src/net/http/server.go:2084 +0x2f
github.com/gorilla/mux.(*Router).ServeHTTP(0xc0005a6240, {0x1bebf60, 0xc0003bc0e0}, 0xc000194000)
	/build/lxd/parts/lxd/src/.go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210 +0x1cf
main.(*lxdHttpServer).ServeHTTP(0xc00045ca40, {0x1bebf60, 0xc0003bc0e0}, 0xc000194000)
	/build/lxd/parts/lxd/src/lxd/api.go:170 +0xdc
net/http.serverHandler.ServeHTTP({0x1be8bd8?}, {0x1bebf60, 0xc0003bc0e0}, 0xc000194000)
	/snap/go/9952/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc000a365a0, {0x1becc60, 0xc001463fb0})
	/snap/go/9952/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
	/snap/go/9952/src/net/http/server.go:3071 +0x4db

So a couple of things:

  1. The unix-char and unix-block device types already monitors for devices appearing/disapearing on the host system and adds/removes the device automatically if used with the required: false setting. See https://linuxcontainers.org/lxd/docs/master/reference/devices_unix_char/#hotplugging . I’m not sure whether this is in LXD 4.0 LTS, but it is in LXD 5.0 LTS series. So perhaps you don’t need your additional daemon at all on those systems.
  2. Further more that device monitoring subsystem was recently reworked to be more efficient and correct (https://github.com/lxc/lxd/pull/11152, ), and there was a very recent bug fix to prevent deadlocks (https://github.com/lxc/lxd/pull/11235).

These changes will be in the soon to be released LXD 5.0.2 LTS, so you may find that the issue is fixed then.

From your stack trace I can see some go routines blocking on the mutex fixed in that deadlock PR, so its likely it will fix the issue.

But once you’re running LXD 5.0.2 please let us know if it still occurs. Thanks.

Thanks, @tomp !

I do use required: false for unix-char and unix-block devices, but I still need the daemon. The devices that I’m managing for my containers move around nondeterministically.

How straightforward or worthwhile would it be for me to try to get a later (soon to be 5.0.2) lxd going on my ubuntu 22.04 to test out the fix in advance?

Also, will these fixes be rolled out onto ubuntu 20.04?

LXD isn’t part of any Ubuntu release, so no, it won’t be part of Ubuntu 20.04 (or 22.04 for that matter).

If you know the source file exists at the time you add the device you could try setting required: true to avoid LXD trying to manage the dynamic add/remove. That may work around your issue now.

However LXD 5.0.2 should be out this week, and that can be installed on Ubuntu Jammy or Focal.
But I don’t think the fixes will make it into the LXD 4.0 LTS series as that is only getting critical bugs and security updates now.

Thank you again, @tomp, for the guidance!

Setting required: true indeed works around my issue, and I don’t think I actually need LXD’s dynamic add/remove management for this particular use case, so this is a great solution for me!

When LXD 5.0.2 is available, I’ll be happy to retest with required: false, just to add another datapoint to whether the deadlock issue has been resolved, since I have a scenario that repros the issue quite readily.

1 Like