nordex1
(David)
July 7, 2021, 3:23pm
1
I’ve been googling this error for a while, and there are indeed some topics that describe it, but after reading all of them I am even more confused.
When I try to start any of the three containers that I have, I get the following error:
root@brix:/# lxc start kubemaster
Error: Failed to create file "/var/snap/lxd/common/lxd/virtual-machines/kubemaster/backup.yaml": open /var/snap/lxd/common/lxd/virtual-machines/kubemaster/backup.yaml: read-only file system
Try `lxc info --show-log kubemaster` for more info
The output of lxc info --show-log kubemaster really gives me no clue about what is happening, but as someone pointed out in some of the posts here, the problem might be free space, and indeed the loop devices are 100% full:
root@brix:/# df -H
Filesystem Size Used Avail Use% Mounted on
udev 4.1G 0 4.1G 0% /dev
tmpfs 815M 1.4M 814M 1% /run
/dev/sda4 112G 27G 79G 26% /
tmpfs 4.1G 0 4.1G 0% /dev/shm
tmpfs 5.3M 0 5.3M 0% /run/lock
tmpfs 4.1G 0 4.1G 0% /sys/fs/cgroup
/dev/sda1 536M 8.3M 528M 2% /boot/efi
/dev/sda2 5.3G 22M 5.0G 1% /home
/dev/loop1 71M 71M 0 100% /snap/lxd/20326
/dev/loop2 74M 74M 0 100% /snap/lxd/19647
/dev/loop0 59M 59M 0 100% /snap/core18/2066
/dev/loop3 59M 59M 0 100% /snap/core18/2074
/dev/loop4 34M 34M 0 100% /snap/snapd/12398
/dev/loop5 34M 34M 0 100% /snap/snapd/12159
tmpfs 1.1M 0 1.1M 0% /var/snap/lxd/common/ns
tmpfs 815M 0 815M 0% /run/user/1000
lsblk gives the following output:
root@brix:/# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 55.4M 1 loop /snap/core18/2066
loop1 7:1 0 67.6M 1 loop /snap/lxd/20326
loop2 7:2 0 70.4M 1 loop /snap/lxd/19647
loop3 7:3 0 55.5M 1 loop /snap/core18/2074
loop4 7:4 0 32.3M 1 loop /snap/snapd/12398
loop5 7:5 0 32.3M 1 loop /snap/snapd/12159
loop6 7:6 0 18.6G 0 loop
├─lvm_default-LXDThinPool_tmeta 253:0 0 1G 0 lvm
│ └─lvm_default-LXDThinPool-tpool 253:2 0 16.6G 0 lvm
│ ├─lvm_default-LXDThinPool 253:3 0 16.6G 1 lvm
│ ├─lvm_default-virtual--machines_kubemaster.block 253:4 0 18.6G 0 lvm
│ ├─lvm_default-virtual--machines_kubemaster 253:5 0 96M 0 lvm
│ ├─lvm_default-virtual--machines_node1.block 253:6 0 37.3G 0 lvm
│ ├─lvm_default-virtual--machines_node2.block 253:7 0 37.3G 0 lvm
│ ├─lvm_default-virtual--machines_node1 253:8 0 96M 0 lvm
│ └─lvm_default-virtual--machines_node2 253:9 0 96M 0 lvm
└─lvm_default-LXDThinPool_tdata 253:1 0 16.6G 0 lvm
└─lvm_default-LXDThinPool-tpool 253:2 0 16.6G 0 lvm
├─lvm_default-LXDThinPool 253:3 0 16.6G 1 lvm
├─lvm_default-virtual--machines_kubemaster.block 253:4 0 18.6G 0 lvm
├─lvm_default-virtual--machines_kubemaster 253:5 0 96M 0 lvm
├─lvm_default-virtual--machines_node1.block 253:6 0 37.3G 0 lvm
├─lvm_default-virtual--machines_node2.block 253:7 0 37.3G 0 lvm
├─lvm_default-virtual--machines_node1 253:8 0 96M 0 lvm
└─lvm_default-virtual--machines_node2 253:9 0 96M 0 lvm
sda 8:0 0 119.2G 0 disk
├─sda1 8:1 0 512M 0 part /boot/efi
├─sda2 8:2 0 5G 0 part /home
├─sda3 8:3 0 8G 0 part [SWAP]
└─sda4 8:4 0 105.8G 0 part /
Questions:
Why are the loop devices 1-5 so small, only 32-70 megabytes?
Is the space on loop6 dynamic?
If the space on loop6 is dynamic, why are loops 1-5 not dynamic as well?
I currently have 2 snapshots for each container, so 6 snapshots altogether; is this problem related to the number of snapshots?
What is the best way to approach and solve this problem, and get the containers started?
tomp
(Thomas Parrott)
July 7, 2021, 3:42pm
2
Can you show the output of sudo lvs and sudo vgs please?
nordex1
(David)
July 7, 2021, 7:34pm
3
Sure:
root@brix:/# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
LXDThinPool lvm_default twi-aotz-- 16.62g 98.83 2.52
virtual-machines_kubemaster lvm_default Vwi-aotz-k 96.00m LXDThinPool 9.11
virtual-machines_kubemaster-kubemaster1 lvm_default Vri---tz-k 96.00m LXDThinPool virtual-machines_kubemaster
virtual-machines_kubemaster-kubemaster1.block lvm_default Vri---tz-k <18.63g LXDThinPool
virtual-machines_kubemaster-kubemaster2 lvm_default Vri---tz-k 96.00m LXDThinPool virtual-machines_kubemaster
virtual-machines_kubemaster-kubemaster2.block lvm_default Vri---tz-k <18.63g LXDThinPool virtual-machines_kubemaster.block
virtual-machines_kubemaster.block lvm_default Vwi-aotz-k <18.63g LXDThinPool virtual-machines_kubemaster-kubemaster1.block 25.14
virtual-machines_node1 lvm_default Vwi-aotz-k 96.00m LXDThinPool 9.05
virtual-machines_node1-node1 lvm_default Vri---tz-k 96.00m LXDThinPool virtual-machines_node1
virtual-machines_node1-node1--2 lvm_default Vri---tz-k 96.00m LXDThinPool virtual-machines_node1
virtual-machines_node1-node1--2.block lvm_default Vri---tz-k 37.25g LXDThinPool virtual-machines_node1.block
virtual-machines_node1-node1.block lvm_default Vri---tz-k 37.25g LXDThinPool
virtual-machines_node1.block lvm_default Vwi-aotz-k 37.25g LXDThinPool virtual-machines_node1-node1.block 12.02
virtual-machines_node2 lvm_default Vwi-aotz-k 96.00m LXDThinPool 8.98
virtual-machines_node2-node2 lvm_default Vri---tz-k 96.00m LXDThinPool virtual-machines_node2
virtual-machines_node2-node2--2 lvm_default Vri---tz-k 96.00m LXDThinPool virtual-machines_node2
virtual-machines_node2-node2--2.block lvm_default Vri---tz-k 37.25g LXDThinPool virtual-machines_node2.block
virtual-machines_node2-node2.block lvm_default Vri---tz-k 37.25g LXDThinPool
virtual-machines_node2.block lvm_default Vwi-aotz-k 37.25g LXDThinPool virtual-machines_node2-node2.block 12.70
root@brix:/# vgs
VG #PV #LV #SN Attr VSize VFree
lvm_default 1 19 0 wz--n- 18.62g 0
root@brix:/#
stgraber
(Stéphane Graber)
July 8, 2021, 12:45am
4
The loop devices are the LXD snap itself; they are read-only volumes, so it’s correct that they report 100% usage. It’d be a bug if they weren’t.
The unexpected read-only file system error could be coming from a kernel error. What’s the output of dmesg?
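For example, something along these lines should confirm that the snap mounts are read-only squashfs images and surface any storage errors in the kernel log (exact options may vary):
mount | grep '/snap/lxd'                         # expect squashfs, mounted ro
dmesg --level=err,warn | grep -iE 'i/o error|read-only'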
nordex1
(David)
July 9, 2021, 7:13am
5
Confirmed disk failure.
Somehow I subconsciously decided to omit this fact, although I did check dmesg, because the disk is relatively new.
Hopefully I can (learn how to) export the VMs and import them onto the new disk.
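If I understand the documentation correctly, the workflow would be roughly this (file name is just an example):
lxc stop kubemaster
lxc export kubemaster kubemaster-backup.tar.gz
# copy the tarball somewhere safe, set up LXD on the new disk, then:
lxc import kubemaster-backup.tar.gz
lxc start kubemaster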
dmesg errors:
Jul 1 13:19:16 brix kernel: [66671.001588] Buffer I/O error on device dm-9, logical block 10241
Jul 1 13:19:22 brix kernel: [66676.868411] JBD2: Detected IO errors while flushing file data on dm-5-8
Jul 1 16:14:33 brix kernel: [ 9762.421975] Buffer I/O error on dev dm-5, logical block 539, lost async page write
Jul 2 15:44:49 brix kernel: [94377.289883] Buffer I/O error on dev dm-8, logical block 539, lost async page write
These dm devices are indeed related to the LXD containers:
root@brix:/# dmsetup deps -o devname /dev/dm-*
/dev/dm-0: 1 dependencies : (loop6)
/dev/dm-1: 1 dependencies : (loop6)
/dev/dm-2: 2 dependencies : (lvm_default-LXDThinPool_tdata) (lvm_default-LXDThinPool_tmeta)
/dev/dm-3: 1 dependencies : (lvm_default-LXDThinPool-tpool)
/dev/dm-4: 1 dependencies : (lvm_default-LXDThinPool-tpool)
/dev/dm-5: 1 dependencies : (lvm_default-LXDThinPool-tpool)
/dev/dm-6: 1 dependencies : (lvm_default-LXDThinPool-tpool)
/dev/dm-7: 1 dependencies : (lvm_default-LXDThinPool-tpool)
/dev/dm-8: 1 dependencies : (lvm_default-LXDThinPool-tpool)
/dev/dm-9: 1 dependencies : (lvm_default-LXDThinPool-tpool)
root@brix:/#
After the disk check, I got another error:
root@brix:/# lxc start kubemaster
Error: virtiofsd failed to bind socket within 10s
Try `lxc info --show-log kubemaster` for more info
root@brix:/# lxc info --show-log kubemaster
Name: kubemaster
Location: none
Remote: unix://
Architecture: x86_64
Created: 2021/06/22 10:03 UTC
Status: Stopped
Type: virtual-machine
Profiles: default
Pid: 2802
Resources:
Processes: 0
Disk usage:
root: 4.29GB
Snapshots:
kubemaster1 (taken at 2021/06/24 18:48 UTC) (stateless)
kubemaster2 (taken at 2021/06/30 18:26 UTC) (stateless)
Error: open /var/snap/lxd/common/lxd/logs/kubemaster/qemu.log: no such file or directory
root@brix:/#
Here is some additional information from SMART:
root@brix:/# smartctl -A /dev/sda
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-77-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0032 100 100 --- Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 --- Old_age Always - 9412
12 Power_Cycle_Count 0x0032 100 100 --- Old_age Always - 2046
170 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 0
171 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 0
172 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 0
173 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 24
174 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 117
178 Used_Rsvd_Blk_Cnt_Chip 0x0032 100 100 --- Old_age Always - 0
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033 100 100 010 Pre-fail Always - 100
184 End-to-End_Error 0x0033 100 100 097 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 --- Old_age Always - 0
194 Temperature_Celsius 0x0022 063 063 --- Old_age Always - 37 (Min/Max 21/63)
199 UDMA_CRC_Error_Count 0x0032 100 100 --- Old_age Always - 0
233 Media_Wearout_Indicator 0x0033 094 100 001 Pre-fail Always - 15725736
234 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 8014
241 Total_LBAs_Written 0x0030 253 253 --- Old_age Offline - 5376
242 Total_LBAs_Read 0x0030 253 253 --- Old_age Offline - 3945
249 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 2904
root@brix:/#
Thank you for your support. Keep up the good work!