I have an LXD cluster of 32 nodes, joined with the LXD clustering feature.
As far as I remember, it was working fine.
I didn't touch it for two days, and today I ran
lxc ls
and got
Error: disk I/O error
Then I checked the journal and found this (really long, about 100 of these lines):
May 23 19:05:12 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T19:05:12+0000
May 23 19:05:13 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T19:05:13+0000
May 23 19:05:13 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T19:05:13+0000
May 23 19:05:14 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T19:05:14+0000
May 23 19:05:15 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T19:05:15+0000
May 23 19:05:15 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T19:05:15+0000
May 23 19:05:16 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T19:05:16+0000
May 23 19:05:16 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T19:05:16+0000
May 23 19:05:17 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T19:05:17+0000
May 23 19:05:18 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T19:05:18+0000
May 23 19:05:18 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T19:05:18+0000
May 23 19:05:19 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T19:05:19+0000
May 23 19:05:20 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T19:05:20+0000
May 23 19:05:20 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T19:05:20+0000
May 23 19:05:21 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T19:05:21+0000
May 23 19:05:22 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T19:05:22+0000
May 23 19:05:22 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T19:05:22+0000
May 23 19:05:23 node00 lxd[12790]: lvl=warn msg="Failed to get current cluster nodes: failed to fecth nodes: disk I/O error" t=2018-05-23T19:05:23+0000
What can cause this disk I/O error?
I ran a S.M.A.R.T. disk test and it did not report any errors.
Also, sudo zpool status
does not show any problems either.
The log below is from the very start of the problem:
May 22 18:25:12 node00 lxd[12790]: lvl=warn msg="Raft: Failed to contact 2 in 1.51905941s" t=2018-05-22T18:25:12+0000
May 23 18:06:26 node00 lxd[12790]: lvl=warn msg="Raft: Failed to contact 2 in 1.500121466s" t=2018-05-23T18:06:26+0000
May 23 18:06:26 node00 lxd[12790]: lvl=warn msg="Raft: Failed to contact 2 in 1.560505846s" t=2018-05-23T18:06:26+0000
May 23 18:06:26 node00 lxd[12790]: lvl=warn msg="Raft: Failed to contact 3 in 1.500327995s" t=2018-05-23T18:06:26+0000
May 23 18:06:26 node00 lxd[12790]: lvl=warn msg="Raft: Failed to contact quorum of nodes, stepping down" t=2018-05-23T18:06:26+0000
May 23 18:06:32 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T18:06:32+0000
May 23 18:06:32 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T18:06:32+0000
May 23 18:06:33 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T18:06:33+0000
May 23 18:06:33 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T18:06:33+0000
May 23 18:06:34 node00 lxd[12790]: lvl=warn msg="failed to rollback transaction after error (failed to fecth nodes: disk I/O error): cannot rollback - no transaction is active" t=2018-05-23T18:06:34+0000
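Since the journal is mostly the same warning repeated, here is a rough sketch of how it could be condensed so that the first occurrence of each message stands out. It assumes every line carries the lvl=... msg="..." t=... fields shown above; the function name summarize is made up for illustration and is not part of LXD.

```python
import re
from collections import OrderedDict

# Assumed line shape, taken from the journal output above:
#   ... lvl=warn msg="<text>" t=<timestamp>
LINE_RE = re.compile(r'lvl=(?P<lvl>\w+) msg="(?P<msg>.*)" t=(?P<t>\S+)')


def summarize(lines):
    """Group journal lines by message text.

    Returns a list of (message, count, first_timestamp, last_timestamp)
    tuples, in the order each message was first seen.
    """
    groups = OrderedDict()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that do not have the expected fields
        msg, stamp = match.group("msg"), match.group("t")
        if msg in groups:
            groups[msg][0] += 1      # one more occurrence
            groups[msg][2] = stamp   # update the last-seen timestamp
        else:
            groups[msg] = [1, stamp, stamp]
    return [(msg, count, first, last)
            for msg, (count, first, last) in groups.items()]
```

To use it, save the journal to a file (for example with journalctl -u lxd, where the unit name lxd is an assumption and may differ on a snap install), open the file, and pass it to summarize. The Raft warnings each contain a different duration, so they stay as separate entries, while the repeated rollback warning collapses into one entry with a count.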
Help me!