LXC export backups: bad deduplication

Tests with Borg backup on two fresh, empty containers quickly filled up Hetzner's backup storage space, even though I thought Borg's deduplication would take care of it.
Both test containers are freshly launched Ubuntu 22.04 instances, and both are empty.

I read that creating a tarball with --sort=name could help Borg's dedup. Is the lxc export function doing this?
Forgive me if I sound stupid. I only recently started collecting some environmental data that I would not like to lose, and manual backups do not cut it anymore, hence learning Borg.

What I read about Borg is that it does file chunking before doing deduplication. According to man 1 tar:

Using --sort=name ensures the member ordering in the created archive is uniform and reproducible.

Sounds like something that could help with dedup, indeed. However, this sorting is done per directory, so if one has a huge directory (think a few hundred thousand files), I would worry about the performance hit during the lxc export.

That said, it might be interesting to check how much space you can save by manually redoing those exports and having them sorted by name. If you get very good numbers, maybe tar could be made configurable for exports.
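For anyone who wants to see the effect of --sort=name in isolation before touching real exports, here is a minimal sketch. It assumes GNU tar 1.28 or newer and sha256sum from coreutils. The helper name sorted_tar and the scratch directories are made up for the demo, and the --mtime/--owner/--group flags are my own addition so that member order is the only thing left that could differ between the two archives.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
mkdir "$work/a" "$work/b"

# Same two files in both directories, created in opposite order,
# so the on-disk directory order is allowed to differ.
printf 'one\n' > "$work/a/alpha.txt"
printf 'two\n' > "$work/a/beta.txt"
printf 'two\n' > "$work/b/beta.txt"
printf 'one\n' > "$work/b/alpha.txt"

# Hypothetical helper: archive directory $1 into tarball $2 with
# name-sorted members and fixed timestamps and ownership.
sorted_tar() {
    tar --sort=name --mtime='@0' --owner=0 --group=0 --numeric-owner \
        -cf "$2" -C "$1" .
}

sorted_tar "$work/a" "$work/a.tar"
sorted_tar "$work/b" "$work/b.tar"

sum_a=$(sha256sum < "$work/a.tar")
sum_b=$(sha256sum < "$work/b.tar")

if [ "$sum_a" = "$sum_b" ]; then
    echo "archives are byte-identical"
else
    echo "archives differ"
fi

rm -rf "$work"
```

On a real container export the timestamps and ownership must of course be kept, so only the --sort=name part carries over. The point is just that a stable member order gives Borg's chunker a better chance of seeing the same byte stream twice.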

I ran a few more tests and get close to 9% better deduplication when sorting.
Having an option for exports could be useful.

I lack a lot of knowledge and understanding, but one of the things I have now learned is that --optimized-storage creates exports faster. The tradeoff is that it needs a lot more Borg backup space, which I did not realize when I made this post.

It also baffled me a bit to see these two fresh Ubuntu 22.04 containers, which are doing nothing, go from taking up 1.01 GB after the first cycle to 1.38 GB after 10 backup cycles. Almost no data changed in between, apart from the shutdowns before each export and the restarts afterwards, yet Borg needs that much more space over time.

The part of the script that does the sorting looks like this, shown without the variables that distinguish the backups taken a minute apart:

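# Export uncompressed, unpack the export, and drop the original tarball.
lxc export c1 /backups/c1.tar --compression=none
tar -xpf /backups/c1.tar -C /backups
rm /backups/c1.tar
# List regular files, symlinks and directories relative to /backups, then re-pack them.
# The type tests are grouped and placed before -printf so that they actually filter the output.
find /backups/ -mindepth 1 \( -type f -o -type l -o -type d \) -printf "%P\n" | tar -cpf "/backups/c1-sorted.tar" --sort=name --no-recursion -C /backups/ -T -
borg create --compression lz4  blah blah blah /backups/
borg prune blah blah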

I hope I am doing the right thing in my script. After all, I was able to rsync an extracted container to another machine and restore it there, and it works as far as I can tell.

Tests follow:

lxc export c1 /backups/c1.tar --compression=none
Stats with --sort=name and without the --optimized-storage option

First container,

Number of files: 1
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.70 GB            868.01 MB            859.66 MB
All archives:                1.70 GB            868.01 MB            859.69 MB

                       Unique chunks         Total chunks
Chunk index:                     619                  628
------------------------------------------------------------------------------

After second container,

Number of files: 1
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.66 GB            856.63 MB            150.38 MB
All archives:                3.36 GB              1.72 GB              1.01 GB

                       Unique chunks         Total chunks
Chunk index:                     798                 1239
------------------------------------------------------------------------------


After 10 cycles of sorted name backups,

------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.67 GB            859.63 MB             15.97 MB
All archives:               33.73 GB             17.28 GB              1.38 GB

                       Unique chunks         Total chunks
Chunk index:                    1352                12382
------------------------------------------------------------------------------

lxc export c1 /backups/c1.tar --compression=none
Results without --sort=name and without the --optimized-storage option

One container.

------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.72 GB            868.29 MB            859.60 MB
All archives:                1.72 GB            868.29 MB            859.63 MB

                       Unique chunks         Total chunks
Chunk index:                     618                  628
------------------------------------------------------------------------------

Second container,

------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.66 GB            856.90 MB            134.91 MB
All archives:                3.38 GB              1.73 GB            994.57 MB

                       Unique chunks         Total chunks
Chunk index:                     792                 1235
------------------------------------------------------------------------------

After 10 cycles,

------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.67 GB            859.90 MB             23.62 MB
All archives:               33.95 GB             17.29 GB              1.51 GB

                       Unique chunks         Total chunks
Chunk index:                    1479                12394
------------------------------------------------------------------------------
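To put a number on the difference between the two 10-cycle runs above, here is a small sketch of the arithmetic using the "All archives" deduplicated sizes (1.38 GB sorted, 1.51 GB unsorted). The variable names are mine, and I am only assuming this is the comparison behind the "close to 9%" figure mentioned earlier.

```shell
#!/bin/sh
# "All archives" deduplicated size after 10 cycles, in GB, from the stats above.
sorted_gb=1.38
unsorted_gb=1.51

# Relative saving of the sorted run compared with the unsorted run, in percent.
saving=$(LC_ALL=C awk -v s="$sorted_gb" -v u="$unsorted_gb" \
    'BEGIN { printf "%.1f", (u - s) / u * 100 }')

echo "sorting saved ${saving}% of repository space"
```

That works out to 8.6%, so roughly 130 MB less on a repository of about 1.5 GB for these two idle containers.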

lxc export c1 /backups/c1.tar --compression=none --optimized-storage
(Cycle of 10 runs, from zfs to ext4)

------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.79 GB            873.84 MB            873.84 MB
All archives:               36.09 GB             17.57 GB             17.57 GB

                       Unique chunks         Total chunks
Chunk index:                   12924                12933
------------------------------------------------------------------------------

lxc export c1 /tank/test/c1.tar --compression=none --optimized-storage
(Cycle of 10 runs, on zfs)

------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.17 GB            641.93 MB            641.93 MB
All archives:               30.08 GB             15.28 GB             15.28 GB

                       Unique chunks         Total chunks
Chunk index:                   10703                10712
------------------------------------------------------------------------------
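To compare the four 10-cycle runs side by side, the sketch below works out how much of the compressed data Borg actually had to store, using the "All archives" rows above (deduplicated size divided by compressed size). The helper name stored_pct is made up for this example, and the figures are copied by hand from the stats.

```shell
#!/bin/sh
# Hypothetical helper: percentage of the compressed data that ended up
# stored in the repository. $1 = compressed size, $2 = deduplicated size (GB).
stored_pct() {
    LC_ALL=C awk -v c="$1" -v d="$2" 'BEGIN { printf "%.1f", d / c * 100 }'
}

echo "sorted, no --optimized-storage:     $(stored_pct 17.28 1.38)%"
echo "unsorted, no --optimized-storage:   $(stored_pct 17.29 1.51)%"
echo "--optimized-storage, zfs to ext4:   $(stored_pct 17.57 17.57)%"
echo "--optimized-storage, on zfs:        $(stored_pct 15.28 15.28)%"
```

The plain tar exports end up storing under 9% of what was sent, while both --optimized-storage runs store 100%, meaning Borg found essentially nothing to deduplicate between cycles in those exports.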