Btrfs/btdu

From Forza's ramblings

Where's my data?

Finding out how much disk space something is using on Btrfs is always an interesting task. Unfortunately, it isn't as easy as counting the extents (data blocks) that a file consists of.

Consider the following scenario with a SQL dump of this Wiki site. The dump is 11MiB, which we can see from the ls output.

# ls -lh wiki.sql
-rw-r--r-- 1 root root 11M Oct 25 17:46 wiki.sql

However, ls doesn't tell us the disk space required to store this file. Btrfs supports transparent compression, so the file might be smaller on-disk. One of the tools that can calculate the actual usage of a file is compsize. This extremely useful tool can calculate how much of a file is compressed, and with what compression algorithm, as well as how much is shared and referenced.

# compsize wiki.sql
Processed 1 file, 88 regular extents (88 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       19%      2.1M          10M          10M
zstd        19%      2.1M          10M          10M

SQL files compress really well. Compsize reports about 10MiB of uncompressed data (ls -h rounds sizes up, hence its 11M), and this takes only 2.1MiB on disk.

Can we now determine the full disk usage of this file? Unfortunately not. Remember that Btrfs also supports reflinks and snapshots. This complicates the calculation quite a bit because a file, or a part of a file, can be shared across several files.

Let's make a copy of the SQL file using cp --reflink wiki.sql wiki.bak.

# ls -lh
total 21M
-rw-r--r-- 1 root root 11M Oct 25 18:06 wiki.bak
-rw-r--r-- 1 root root 11M Oct 25 17:46 wiki.sql

The two files are of course identical. Because we used a reflink copy, the second file shares all its data with the first. This is not the same as a hard link: thanks to the Copy-on-Write (CoW) principle of Btrfs, future writes to either file affect only the file being written.

Using compsize, we can see that the disk usage remains unchanged, but the Referenced data has doubled.

# compsize wiki.*
Processed 2 files, 88 regular extents (176 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       19%      2.1M          10M          20M
zstd        19%      2.1M          10M          20M

Compsize calculates disk usage only from the viewpoint of the files it is given. If files elsewhere in the filesystem share data with the selected files, they are not considered in the calculation.
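As a rough illustration of reading these numbers, the following sketch pulls the Referenced and Uncompressed columns from the TOTAL row and prints their ratio. The function name ref_ratio is made up for this example, and it assumes the column layout shown above with both values using the same unit suffix. A ratio above 1 suggests that data is referenced more than once among the checked files.

```shell
# Print Referenced / Uncompressed from compsize output read on stdin.
# Assumption: the TOTAL row has the five columns shown above and both
# sizes carry the same unit suffix (for example "10M" and "20M").
ref_ratio() {
    awk '$1 == "TOTAL" {
        unc = $4 + 0    # numeric prefix of the Uncompressed column
        ref = $5 + 0    # numeric prefix of the Referenced column
        printf "%.1f\n", ref / unc
    }'
}
```

It could be used as compsize wiki.* | ref_ratio.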

Bookend extents

With Btrfs, a file's data is stored in blocks called extents. An extent can be between 4KiB and 128MiB, and is immutable. This means that once the extent is written, it cannot be altered by future writes. Thanks to the Copy-on-Write (CoW) feature of Btrfs, any changes to a file are written to a new extent, leaving the previous extents intact.

It is this property that helps make Btrfs resilient to power loss/crashes as partial writes won't damage the existing extents and the filesystem will recover all data as it was before the interrupted write.

Because extents are immutable, some workloads can leave a large amount of disk space allocated but unusable.

Let's consider the following file which consists of one extent of 95MiB.

# compsize file
Processed 1 file, 1 regular extents (1 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL      100%       95M          95M          95M
none       100%       95M          95M          95M

Now we alter a large part of the file, but not all of it. Because the extent is immutable, the changes will be stored as a new extent.

# compsize file
Processed 1 file, 2 regular extents (2 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL      100%      181M         181M          95M
none       100%      181M         181M          95M

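What happens is that the original 95MiB extent remains, although only 9MiB of it is still referenced by the file. A new 86MiB extent is created to hold the changed data. Because an extent cannot be partially freed, the overwritten 86MiB of the original extent stays allocated but unreachable. An extent that is only partly referenced like this is known as a bookend extent. The end result is that 181MiB of actual disk space is used for a file that is only 95MiB.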

Once an extent no longer has any referenced data, it will automatically be released back into usable space.
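The experiment above can be approximated with dd. This is only a sketch: the function name make_bookend and its arguments are invented for this example, and whether the first write ends up as a single extent depends on the filesystem state, so compsize may report different numbers than shown here.

```shell
# make_bookend FILE TOTAL_MIB REWRITE_MIB   (hypothetical helper)
# Write TOTAL_MIB of random data, flush it to disk, then overwrite the
# first REWRITE_MIB in place so the old extent is only partly referenced.
make_bookend() {
    file=$1
    total=$2
    rewrite=$3
    dd if=/dev/urandom of="$file" bs=1M count="$total" status=none
    sync    # make sure the first extent is actually written out
    # conv=notrunc keeps the file length; only the first part is rewritten
    dd if=/dev/urandom of="$file" bs=1M count="$rewrite" conv=notrunc status=none
    sync
}
```

Running make_bookend file 95 86 followed by compsize file on a Btrfs filesystem should show the disk usage growing beyond the referenced size.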

btdu - sampling disk usage profiler for btrfs

Analysing the whole filesystem is a difficult task, especially with Btrfs's unique features such as snapshots, subvolumes, reflinks and compression.

Btdu is a Btrfs-specific disk space analyser. It presents all space usage as an intuitive tree-like structure, split into the different types of data and allocation.

[Screenshot: btdu user interface in its entry point/root view.]

On a large filesystem it is far too slow to enumerate every single extent and cross-reference them. Instead, btdu samples random locations and continuously updates the results. This makes it very fast to get a rough idea of how the filesystem is used, while the resolution increases gradually.
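To see why sampling gives a usable answer quickly, here is a toy simulation. It is not btdu's code, only an illustration of the general idea: an imaginary disk of 1000 blocks where the first 300 belong to one directory, and a function (estimate_share, a name made up for this example) that estimates that directory's share from random samples.

```shell
# estimate_share SAMPLES SEED   (toy model, not how btdu is implemented)
# Pick SAMPLES random blocks on an imaginary 1000-block disk and report
# the fraction that falls in the first 300 blocks (true share: 0.300).
estimate_share() {
    awk -v n="$1" -v seed="$2" 'BEGIN {
        srand(seed)
        hits = 0
        for (i = 0; i < n; i++) {
            block = int(rand() * 1000)   # random block number 0..999
            if (block < 300) hits++      # block belongs to the directory
        }
        printf "%.3f\n", hits / n
    }'
}
```

With a few hundred samples the estimate is already in the right area, and with more samples it settles closer to 0.300, which mirrors how the resolution in btdu improves the longer it runs.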

Btdu is a great tool to find out where disk space is used, and by what. In the example above with bookend extents we could see the unusable space of individual files using compsize. With btdu it is easy to analyse the whole filesystem and see all files that contribute bookend extents. Btdu calls this UNREACHABLE data.

[Screenshot: btdu showing that the sum of all unreachable (bookend) extents can become quite large.]

By using the arrow keys it's easy to drill down to find the files that contribute the most.

[Screenshot: an InnoDB log file accounts for more than 50% of the unreachable space on this filesystem.]

In the following screenshot we can see how much actual disk space various snapshots of a Linux Mint system take.

[Screenshot: disk usage distribution between snapshots.]

Recovering unusable space

Btrfs releases an extent only when all references to it have been removed. To free a bookend extent, the data still referenced in it has to be fully rewritten.

The easiest way to achieve this is by making a full copy of the file using cp --reflink=never and then replacing the original file with the copy.
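The copy-and-replace approach could be wrapped in a small helper like the one below. This is a minimal sketch under some assumptions: the function name rewrite_file and the temporary file name are invented for this example, the file is not being written to by another process at the same time, and it has no hard links that need to be kept.

```shell
# rewrite_file FILE   (hypothetical helper)
# Replace FILE with a freshly written, unshared copy of itself.
rewrite_file() {
    src=$1
    tmp="$src.rewrite.$$"    # temporary name in the same directory
    # -a keeps mode, ownership and timestamps;
    # --reflink=never forces a real copy of the data
    cp -a --reflink=never -- "$src" "$tmp" || { rm -f -- "$tmp"; return 1; }
    # rename the copy over the original
    mv -f -- "$tmp" "$src"
}
```

Keeping the temporary file in the same directory means the final mv is a rename within one filesystem rather than a second copy.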

Another way is to use btrfs filesystem defragment. However, it is not guaranteed to rewrite the entire file, so it may leave some unreachable parts behind or even increase disk space usage. Read more about defragmenting a Btrfs filesystem on the dedicated page Btrfs/Defrag.

A third way is to use btrfs-extent-same from the duperemove package. The idea is to make a full copy of the original file and then reflink all of the copy's data back into the original file. This should remove any bookend extents.

  • First, make a full copy of the file:
# cp --reflink=never file newfile
  • Next, use btrfs-extent-same <len> <file1> <offset_file1> <file2> <offset_file2>, where <len> is the number of bytes to deduplicate (here the full length of the file):
# btrfs-extent-same 100000000 newfile 0 file 0
Deduping 2 total files
(0, 100000000): newfile
(0, 100000000): file
1 files asked to be deduped
i: 0, status: 0, bytes_deduped: 100000000
100000000 total bytes deduped in this operation
# compsize file
Processed 1 file, 1 regular extents (6 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL      100%       95M          95M          95M
none       100%       95M          95M          95M
NOTE: Rewriting and defragmenting files will unshare data with other files and snapshots, potentially increasing disk usage.