As discussed here, I’ve been unable to run LXD for the past couple of weeks, perhaps coinciding with the update to LXD 3.15 (though I’m not sure). Any lxc command I run on the command line simply hangs and never executes. My cluster has three nodes, and all appear to be affected by the same issue. Looking at the active connections on each machine, every node has a massive number of established connections on port 8443, apparently to the other nodes. For instance:
**aaron@codewerks-alpha**:$ sudo netstat -antpl | grep 8443 | grep ESTABLISHED | grep lxd | wc -l
9297

**aaron@codewerks-baker**:$ sudo netstat -antpl | grep 8443 | grep ESTABLISHED | grep lxd | wc -l
8815

**aaron@codewerks-charlie**:$ sudo netstat -antpl | grep 8443 | grep ESTABLISHED | grep lxd | wc -l
9643
These numbers can climb past 100K when left long enough. On the advice of a helpful fellow on that GitHub issue, I wrote a script to restart LXD and start fresh. This does clear out the connections, but the count begins exploding again, and I’m still never able to run any lxc command.
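In case it helps with diagnosis, here is a rough sketch for breaking those counts down per remote peer rather than as a single total. The function name `count_lxd_peers` is made up for illustration, and it assumes the usual net-tools `netstat -antp` column layout (local address in column 4, foreign address in column 5, state in column 6, PID/program in column 7) and that the process shows up as `lxd`.

```shell
# Hypothetical helper: reads `netstat -antp` output on stdin and prints
# "<count> <peer address>" for ESTABLISHED lxd connections involving port 8443,
# highest count first. Column positions are an assumption about netstat's layout.
count_lxd_peers() {
    awk '
        $6 == "ESTABLISHED" && $7 ~ /\/lxd$/ && ($4 ~ /:8443$/ || $5 ~ /:8443$/) {
            peer = $5
            sub(/:[0-9]+$/, "", peer)   # strip the port, keep the address
            count[peer]++
        }
        END { for (p in count) print count[p], p }
    ' | sort -rn
}

# Usage (on each node):
#   sudo netstat -antp | count_lxd_peers
```

If one peer dominates the list on every node, that would at least narrow down which member the connections are piling up against.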
I can see from other posts here that 3.15 seems to be particularly problematic, but I’m at my wits’ end. My cluster isn’t in production yet, but this is killing my ability to build my application, and I have no idea what to do to fix it. Any suggestions for troubleshooting steps would be quite welcome!