I want to be able to track in Grafana how close an instance is to filling its assigned disk quota.
On a previous cluster (running LXD), I could easily track this with the metrics:
This system was using LVM for its storage pools, and the metric for each instance accurately reflected how much space was remaining on the instance, making it simple to find out if an instance had run out of its assigned quota.
We now have a new cluster running Incus, and this time the storage backend is BTRFS. Running the same query as above (with the lxd_ → incus_ substitution), the graph is wildly off: it seems to show that each instance can consume the entire disk, and it appears to sum disk usage across all instances. I can confirm that disk quotas are working, as several instances have hit our lower limits and needed increases, but unfortunately we cannot track or alert on this as we could previously.
Is there a better way to track the disk usage and available quota per instance in the metrics? And is this just a limitation of BTRFS, or does it also apply to any of the other storage backends?
Context:
incus v6.0
default storage pool is local disk on each cluster member
If you have BTRFS, Incus is using BTRFS subvolumes. So you can monitor these subvolumes directly with the BTRFS tools, without involving Incus at all.
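As a rough illustration of what "monitoring the subvolumes directly" could start from: the level-0 qgroup of a subvolume has the form 0/<subvolume ID>, so the output of btrfs subvolume list can be turned into a qgroup-to-path lookup table. The function name below is made up for this sketch, and it assumes the usual "ID … gen … top level … path …" line format.

```shell
#!/bin/bash
# Sketch only: map `btrfs subvolume list <path>` output (read from stdin)
# to "<qgroupid> <subvolume path>" pairs.
# subvol_to_qgroup is a hypothetical helper name, not part of btrfs or Incus.
subvol_to_qgroup() {
    awk '
        # Expected line shape: ID 256 gen 123 top level 5 path some/subvolume
        $1 == "ID" && $8 == "path" {
            id = $2
            # Strip everything up to and including "path " to keep the full path.
            sub(/^ID [0-9]+ gen [0-9]+ top level [0-9]+ path /, "")
            print "0/" id, $0
        }'
}
```

On a real host this would be fed with something like sudo btrfs subvolume list /path/to/pool | subvol_to_qgroup, where the pool path depends on your setup.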
Do you have any advice or suggestions for tools that could help with that?
Ideally I want something that can export this information to our monitoring system, so we can track the usage across our instances and clusters.
A quick search online tells me that I can run btrfs qgroup show -reF /path/to/Subvolume on a machine to see the current usage and limit of a subvolume (the command is listed on the ArchWiki Btrfs page for checking the quota of a subvolume).
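For what it's worth, the same command becomes easier to process if you add --raw so the sizes are printed in bytes. The sketch below reads that output from stdin and prints the usage as a percentage of the referenced limit. The function name is made up, and it assumes the column order produced by -re (qgroupid, referenced, exclusive, max referenced, max exclusive).

```shell
#!/bin/bash
# Sketch only: summarise `btrfs qgroup show -re --raw <path>` output from stdin.
# Prints: <qgroupid> <referenced_bytes> <max_referenced_bytes|none> <percent|->
# qgroup_usage is a hypothetical helper name.
qgroup_usage() {
    awk '
        # Data rows start with a qgroup id such as 0/257; header rows do not.
        $1 ~ /^[0-9]+\/[0-9]+$/ {
            rfer = $2
            max_rfer = $4
            if (max_rfer ~ /^[0-9]+$/ && max_rfer > 0)
                printf "%s %s %s %.1f\n", $1, rfer, max_rfer, 100 * rfer / max_rfer
            else
                # No referenced limit set: the column reads "none".
                printf "%s %s none -\n", $1, rfer
        }'
}
```

Usage would look like sudo btrfs qgroup show -reF --raw /path/to/Subvolume | qgroup_usage.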
But that doesn’t integrate nicely into any metrics system:
I’d have to add custom metrics exporters to every cluster member with a BTRFS storage pool.
I’d have to adjust all dashboards to determine whether an instance is using a storage backend with viable disk usage reporting in the Incus metrics, or whether it’s using BTRFS and thus needs data from another source.
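To give an idea of how small such a custom exporter could be, here is a minimal sketch that converts the qgroup output into Prometheus text format, for example for the node_exporter textfile collector. The metric names and the function name are invented for this example, and the path label relies on newer btrfs-progs printing the subvolume path as the last column (it is left empty otherwise). Paths containing spaces are not handled.

```shell
#!/bin/bash
# Sketch only: convert `btrfs qgroup show -re --raw <path>` output (stdin)
# to Prometheus text format.
# qgroup_to_prom and both metric names are hypothetical.
qgroup_to_prom() {
    awk '
        $1 ~ /^[0-9]+\/[0-9]+$/ {
            # Assumption: with -re the path, when printed, is the 6th column.
            path = (NF >= 6) ? $6 : ""
            labels = sprintf("{qgroup=\"%s\",path=\"%s\"}", $1, path)
            printf "btrfs_qgroup_referenced_bytes%s %s\n", labels, $2
            # Only emit a limit when one is set ("none" otherwise).
            if ($4 ~ /^[0-9]+$/)
                printf "btrfs_qgroup_max_referenced_bytes%s %s\n", labels, $4
        }'
}
```

A cron job or systemd timer could pipe sudo btrfs qgroup show -re --raw /path/to/pool through qgroup_to_prom into a .prom file in the directory given to node_exporter via --collector.textfile.directory. Writing to a temporary file and renaming it avoids the collector reading a half-written file.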
I have similar requirements and I don’t think such a thing exists.
I am using container snapshots, which are backed by BTRFS subvolumes. When showing disk usage, I would like to group a running container and its snapshots into one qgroup. However, Incus does not help with that, and BTRFS has no direct knowledge of which subvolumes belong to the same instance. It might be worth writing some scripts for that.
Edit: Spent a few hours with ChatGPT to come up with this:
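The grouping itself boils down to three btrfs commands: create a parent qgroup, assign the instance and snapshot qgroups to it, and rescan. The sketch below only prints those commands so they can be reviewed first. The function name and the example qgroup IDs are placeholders.

```shell
#!/bin/bash
# Sketch only: print (not run) the btrfs commands that would group an
# instance subvolume and its snapshots under one parent qgroup.
# plan_instance_qgroup is a hypothetical helper name.
# Usage: plan_instance_qgroup <pool_path> <parent_qgroup> <child_qgroup>...
plan_instance_qgroup() {
    local pool_path=$1 parent=$2
    shift 2
    printf 'btrfs qgroup create %s %s\n' "$parent" "$pool_path"
    local child
    for child in "$@"; do
        printf 'btrfs qgroup assign --no-rescan %s %s %s\n' "$child" "$parent" "$pool_path"
    done
    # One rescan at the end refreshes the accounting for the new parent.
    printf 'btrfs quota rescan %s\n' "$pool_path"
}
```

For example, plan_instance_qgroup /path/to/pool 1/257 0/257 0/300 prints one create, two assign, and one rescan command. After running them, the referenced size of the parent qgroup covers the instance and its snapshots together.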
incus_qgroup.sh
#!/bin/bash
# Group each Incus instance subvolume and its snapshots under one BTRFS qgroup.

if [ "$#" -lt 1 ]; then
    echo "Usage: $0 <btrfs_path> [--debug]"
    exit 1
fi

declare -A instances_map        # instance name -> space-separated "id:path:rfer:excl" entries
declare -A current_qgroup_map   # existing N/0 qgroup -> its current children
declare -A targeted_qgroup_map  # wanted N/0 qgroup -> the children it should have
declare -A qgroup_size_map      # existing N/0 qgroup -> referenced size

btrfs_path="$1"
debug=false
if [ "$#" -eq 2 ] && [ "$2" == "--debug" ]; then
    debug=true
fi

output=$(sudo btrfs qgroup show -c "$btrfs_path")
lines=()
while IFS= read -r line; do
    lines+=("$line")
done <<< "$output"

for line in "${lines[@]}"; do
    read -r group_id referenced_size exclusive_size children path <<< "$line"

    # Check Incus containers, virtual machines, or custom volumes
    if [[ $path == @incus-*/containers* ]] || [[ $path == @incus-*/virtual-machines* ]] || [[ $path == @incus-*/custom* ]]; then
        IFS='/' read -r -a path_segments <<< "$path"
        instance_name=${path_segments[2]}
        if [[ -n ${instances_map[$instance_name]} ]]; then
            IFS=' ' read -r -a entries <<< "${instances_map[$instance_name]}"
        else
            entries=()
        fi
        entry="$group_id:$path:$referenced_size:$exclusive_size"
        # Put instance subvolume at the beginning of array
        if [[ ${path_segments[1]} == *-snapshots ]]; then
            entries+=("$entry")
        else
            entries=("$entry" "${entries[@]}")
        fi
        instances_map[$instance_name]="${entries[*]}"
    fi

    # Check Incus qgroup (the per-instance parent qgroups use IDs of the form N/0)
    if [[ $group_id == */0 ]]; then
        IFS=',' read -r -a groups <<< "$children"
        current_qgroup_map[$group_id]="${groups[*]}"
        qgroup_size_map[$group_id]="$referenced_size"
    fi
done

for instance in "${!instances_map[@]}"; do
    children=()
    IFS=' ' read -r -a entries <<< "${instances_map[$instance]}"
    for entry in "${entries[@]}"; do
        # Only the qgroup id is needed here; the rest of the entry is ignored
        IFS=':' read -r group_id _ <<< "$entry"
        children+=("$group_id")
    done
    first_child="${children[0]}"
    qgroup_id="${first_child#0/}/0" # Convert 0/number to number/0
    qgroup_size="${qgroup_size_map[$qgroup_id]}"
    targeted_qgroup_map[$qgroup_id]="${children[*]}"
    echo "Instance: $instance [$qgroup_size]"
    for entry in "${entries[@]}"; do
        IFS=':' read -r group_id path referenced_size exclusive_size <<< "$entry"
        echo " $group_id: $path [$referenced_size, $exclusive_size]"
    done
done

if [ "$debug" = true ]; then
    echo ""
    echo "Current Qgroup Map:"
    for qgroup in "${!current_qgroup_map[@]}"; do
        IFS=' ' read -r -a children <<< "${current_qgroup_map[$qgroup]}"
        echo " $qgroup: ${children[*]}"
    done
    echo ""
    echo "Targeted Qgroup Map:"
    for qgroup in "${!targeted_qgroup_map[@]}"; do
        IFS=' ' read -r -a children <<< "${targeted_qgroup_map[$qgroup]}"
        echo " $qgroup: ${children[*]}"
    done
    echo ""
    echo "BTRFS Commands:"
fi

btrfs_changed=false

# Remove N/0 qgroups that no longer correspond to an instance
for qgroup in "${!current_qgroup_map[@]}"; do
    if [[ -z ${targeted_qgroup_map["$qgroup"]} ]]; then
        IFS=' ' read -r -a children <<< "${current_qgroup_map[$qgroup]}"
        for child in "${children[@]}"; do
            sudo btrfs qgroup remove --no-rescan "$child" "$qgroup" "$btrfs_path"
            $debug && echo " btrfs qgroup remove $child $qgroup $btrfs_path"
        done
        sudo btrfs qgroup destroy "$qgroup" "$btrfs_path" 2> /dev/null
        $debug && echo " btrfs qgroup destroy $qgroup $btrfs_path"
        btrfs_changed=true
    fi
done

# Create missing qgroups and assign any children that are not yet members
for qgroup in "${!targeted_qgroup_map[@]}"; do
    if [[ -z ${current_qgroup_map[$qgroup]} ]]; then
        sudo btrfs qgroup create "$qgroup" "$btrfs_path"
        $debug && echo " btrfs qgroup create $qgroup $btrfs_path"
    fi
    IFS=' ' read -r -a current_children <<< "${current_qgroup_map[$qgroup]}"
    IFS=' ' read -r -a targeted_children <<< "${targeted_qgroup_map[$qgroup]}"
    for child in "${targeted_children[@]}"; do
        if [[ ! " ${current_children[*]} " =~ " ${child} " ]]; then
            sudo btrfs qgroup assign --no-rescan "$child" "$qgroup" "$btrfs_path" 2> /dev/null
            $debug && echo " btrfs qgroup assign $child $qgroup $btrfs_path"
            btrfs_changed=true
        fi
    done
done

# Rescan once at the end so the new grouping is reflected in the sizes
if [ "$btrfs_changed" = true ]; then
    sudo btrfs quota rescan "$btrfs_path"
    $debug && echo " btrfs quota rescan $btrfs_path"
else
    $debug && echo " (No commands executed)"
fi