Where's my data?
Finding out how much disk space something is using on Btrfs is always an interesting task. Unfortunately, it isn't as easy as counting the extents (data blocks) that a file consists of.
Consider the following scenario with a SQL dump of this Wiki site. The dump is 11MiB, which we can see from the output of ls:
# ls -lh wiki.sql
-rw-r--r-- 1 root root 11M Oct 25 17:46 wiki.sql
ls doesn't tell us the disk space required to store this file. Btrfs supports transparent compression, so the file might be smaller on-disk. One of the tools that can calculate the actual usage of a file is compsize. This extremely useful tool can calculate how much of a file is compressed, and with what compression algorithm, as well as how much is shared and referenced.
# compsize wiki.sql
Processed 1 file, 88 regular extents (88 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       19%      2.1M          10M          10M
zstd        19%      2.1M          10M          10M
SQL files compress really well. compsize reports 10MiB of uncompressed data (ls -h rounds the size up to 11M), of which only 2.1MiB is actually stored on disk.
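To turn the compsize table into a single number, the TOTAL row can be parsed with awk. The following is only a minimal sketch and not part of compsize: the function names compsize_total and compression_saved are made up for this example, and it assumes compsize is run with -b so that sizes are printed as raw bytes in the same column order as in the table above.

```shell
# Read `compsize -b` output on stdin and print the TOTAL row as
# "disk_bytes uncompressed_bytes referenced_bytes".
# Assumption: the columns are Type, Perc, Disk Usage, Uncompressed, Referenced.
compsize_total() {
    awk '$1 == "TOTAL" { print $3, $4, $5 }'
}

# Read `compsize -b` output on stdin and print how many bytes
# compression saved (uncompressed size minus on-disk size).
compression_saved() {
    compsize_total | awk '{ printf "%.0f\n", $2 - $1 }'
}
```

It could be used as compsize -b wiki.sql | compression_saved. Like compsize itself, the result only reflects the files that were passed in.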
Can we now determine the full disk usage of this file? Unfortunately not. Remember that Btrfs also supports snapshots. This complicates the calculation quite a bit because a file, or a part of a file, can be shared across several files.
Let's make a copy of the SQL file using cp --reflink wiki.sql wiki.bak.
# ls -lh
total 21M
-rw-r--r-- 1 root root 11M Oct 25 18:06 wiki.bak
-rw-r--r-- 1 root root 11M Oct 25 17:46 wiki.sql
The two files are of course identical. Because we used a reflink copy, the second file shares all its data with the first. This is not the same as a hard link: thanks to the Copy-on-Write (CoW) principle of Btrfs, future writes to either file affect only that file.
With compsize, we can see that the disk usage remains unchanged, but the Referenced data has doubled.
# compsize wiki.*
Processed 2 files, 88 regular extents (176 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       19%      2.1M          10M          20M
zstd        19%      2.1M          10M          20M
compsize calculates disk usage only from the viewpoint of the files it is given. If files elsewhere in the filesystem share data with the selected files, they are not considered in the calculation.
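The difference between a reflink copy and a hard link described above can be checked with a small experiment. This is only a sketch: the function name demo_reflink_vs_hardlink is made up, it assumes a cp that understands --reflink=auto (which falls back to a regular copy on filesystems without reflink support), and it works in a throwaway temporary directory.

```shell
# Show that a write to the original is visible through a hard link
# but not through a reflink copy.
demo_reflink_vs_hardlink() {
    dir=$(mktemp -d) || return 1
    printf 'original\n' > "$dir/a"

    # Shares extents with a on Btrfs; a plain copy on other filesystems.
    cp --reflink=auto "$dir/a" "$dir/reflinked" || { rm -rf "$dir"; return 1; }
    # A second name for the very same inode.
    ln "$dir/a" "$dir/hardlinked" || { rm -rf "$dir"; return 1; }

    # Modify the original after both copies exist.
    printf 'appended\n' >> "$dir/a"

    r=$(( $(wc -l < "$dir/reflinked") ))
    h=$(( $(wc -l < "$dir/hardlinked") ))
    echo "reflink=$r hardlink=$h"
    rm -rf "$dir"
}
```

The reflink copy keeps its single line while the hard link shows both, so the function prints reflink=1 hardlink=2.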
Bookend extents
With Btrfs, a file's data is stored in blocks called extents. An extent can be between 4KiB and 128MiB in size and is immutable: once an extent is written, it cannot be altered by future writes. Thanks to the Copy-on-Write (CoW) feature of Btrfs, any changes to a file will be written to a new extent, leaving the previous extents intact.
It is this property that helps make Btrfs resilient to power loss/crashes as partial writes won't damage the existing extents and the filesystem will recover all data as it was before the interrupted write.
Because extents are immutable, some workloads can lead to an excessive amount of unusable disk space.
Let's consider the following file which consists of one extent of 95MiB.
# compsize file
Processed 1 file, 1 regular extents (1 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL      100%       95M          95M          95M
none       100%       95M          95M          95M
Now we alter a large part of the file, but not all of it. Because the extent is immutable, the changes will be stored as a new extent.
# compsize file
Processed 1 file, 2 regular extents (2 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL      100%      181M         181M          95M
none       100%      181M         181M          95M
What happens is that the original 95MiB extent remains on disk in full, even though only 9MiB of it is still referenced by the file. A new extent of 86MiB is created to hold the changed data. The end result is that 181MiB of actual disk space is used for a file that is only 95MiB.
Once an extent no longer has any referenced data, it will automatically be released back into usable space.
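The arithmetic behind the example can be written down as a tiny model. This is a sketch under simplifying assumptions: the file starts as a single uncompressed extent, one contiguous range is overwritten and lands in a single new extent, and nothing else (snapshots, reflinks) references the data. The function name bookend_usage is made up for illustration.

```shell
# bookend_usage ORIG_MIB REWRITTEN_MIB
# Prints "disk referenced unreachable" in MiB after overwriting
# REWRITTEN_MIB of a file stored as one ORIG_MIB extent.
bookend_usage() {
    orig=$1
    rewritten=$2
    if [ "$rewritten" -ge "$orig" ]; then
        # Nothing of the old extent is referenced any more, so it is released.
        disk=$orig
    else
        # The old extent stays on disk in full next to the new one.
        disk=$((orig + rewritten))
    fi
    echo "$disk $orig $((disk - orig))"
}
```

bookend_usage 95 86 prints 181 95 86, matching the compsize output above: 181MiB on disk, 95MiB referenced and 86MiB that is no longer reachable. bookend_usage 95 95 prints 95 95 0, the case where the old extent is released.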
btdu - sampling disk usage profiler for btrfs
Analysing the whole filesystem is a difficult task, especially with Btrfs's unique features such as snapshots, subvolumes, reflinks and compression.
On a large filesystem it is far too slow to enumerate every single extent and cross-reference them. Instead, btdu uses a random sampling pattern and continuously updates the results. This makes it very fast to get a rough idea of how the filesystem is used, while the resolution increases gradually.
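The general idea of estimating usage by random sampling can be illustrated with a toy model. This is not how btdu is implemented: it is only a sketch that picks random offsets in a made-up list of "name size" entries and counts which entry each offset falls into. The function name sample_usage and its input format are assumptions for this example.

```shell
# sample_usage SAMPLES SEED
# stdin: lines of "name size". Prints "name estimated_size" per entry,
# estimated from SAMPLES random offsets instead of the sizes themselves.
sample_usage() {
    awk -v n="$1" -v seed="$2" '
        # Lay the entries out back to back and remember where each one ends.
        { name[NR] = $1; upper[NR] = total + $2; total += $2 }
        END {
            srand(seed)
            for (i = 0; i < n; i++) {
                pos = rand() * total              # random offset in [0, total)
                for (k = 1; k <= NR; k++)         # find the entry owning it
                    if (pos < upper[k]) { hits[k]++; break }
            }
            # The share of hits approximates the share of space.
            for (k = 1; k <= NR; k++)
                printf "%s %.0f\n", name[k], total * hits[k] / n
        }'
}
```

With few samples the estimates are coarse, and they converge towards the real sizes as the sample count grows, which mirrors the gradually increasing resolution described above.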
Btdu is a great tool to find out where disk space is used, and by what. In the bookend extent example above, we could see the unusable space of individual files using compsize. With btdu it is easy to analyse the whole filesystem and see all files that contribute bookend extents. Btdu calls this
By using the arrow keys it's easy to drill down to find the files that contribute the most.
In the following screenshot we can see how much actual disk space various snapshots of a Linux Mint system take.
Recovering unusable space
Btrfs releases an extent only when all references to it have been removed. To free a bookend extent, the data that still references it therefore has to be fully rewritten.
The easiest way to achieve this is by making a full copy of the file using cp --reflink=never and then replacing the original file with the copy.
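The copy-and-replace approach could be wrapped in a small helper like the one below. This is a sketch, not a hardened tool: the function name rewrite_file and the temporary file name are made up, it assumes a cp that supports --reflink=never, and it assumes nothing is writing to the file while it runs. Extents that are still referenced by snapshots or other reflink copies stay allocated regardless.

```shell
# rewrite_file FILE
# Replace FILE with a full, non-reflinked copy of itself so that the
# file no longer references any of its old extents.
rewrite_file() {
    file=$1
    tmp="$file.rewrite.$$"      # hypothetical temporary name next to FILE

    # -p keeps mode, ownership and timestamps; --reflink=never forces
    # the data to be written out again instead of being shared.
    cp -p --reflink=never -- "$file" "$tmp" || { rm -f -- "$tmp"; return 1; }

    # Rename the copy over the original within the same directory.
    mv -- "$tmp" "$file"
}
```

Running compsize on the file before and after shows whether the unreferenced space was actually released.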
Another way is to use btrfs filesystem defragment. However, it is not guaranteed to rewrite the entire file, which may leave some unreachable parts or even increase disk space usage. Read more about defragmenting a Btrfs filesystem on the dedicated page Btrfs/Defrag.
A third way is to use btrfs-extent-same from the duperemove package. The idea is to make a full copy of the original file and then reflink back all of its data. This should remove any bookend extents.
- First, make a full copy of the file:
# cp --reflink=never file newfile
- Next, use btrfs-extent-same <len> <file1> <offset_file1> <file2> <offset_file2>:
# btrfs-extent-same 100000000 newfile 0 file 0
Deduping 2 total files
(0, 100000000): newfile
(0, 100000000): file
1 files asked to be deduped
i: 0, status: 0, bytes_deduped: 100000000
100000000 total bytes deduped in this operation
# compsize file
Processed 1 file, 1 regular extents (6 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL      100%       95M          95M          95M
none       100%       95M          95M          95M
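The steps above can be prepared for an arbitrary file with a small dry-run helper. This is a sketch that only prints the commands instead of running them. The function name bookend_cleanup_commands is made up, and it assumes that <len> is the full length of the file in bytes, as in the example above.

```shell
# bookend_cleanup_commands FILE COPY
# Print the commands of the copy-then-dedupe procedure for FILE,
# using COPY as the name of the temporary full copy.
bookend_cleanup_commands() {
    file=$1
    copy=$2
    [ -f "$file" ] || return 1

    # File length in bytes; the arithmetic expansion strips any
    # padding that some wc implementations add.
    len=$(( $(wc -c < "$file") ))

    printf 'cp --reflink=never %s %s\n' "$file" "$copy"
    printf 'btrfs-extent-same %s %s 0 %s 0\n' "$len" "$copy" "$file"
}
```

The printed commands can be reviewed first and then run as root. File names containing spaces would need quoting before the output is pasted into a shell.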