Blog/Btrfs Deduplication

From Forza's ramblings

2020-09-12: Deduplication With Btrfs

Bee hotel tubes opened to show nests of the European orchard bee at different developmental stages.

A rather unique feature of Btrfs is the concept of cloning files, or parts of files. We usually refer to this as reflinking.

Reflinking allows a user to make an instant copy of a file. It is similar to a hard link, with one big difference: when either the original file or the copy is modified, Copy-on-Write (CoW) ensures that the change affects only that file, so the two remain independent of each other.

In other words, if you reflink a to b and then write new data to a, b keeps its original contents. If you had instead made a hard link with ln a b, any write to a would also show up in b, because they are in fact the same file.

The easiest way to clone a file (reflink, copy-on-write) is with cp --reflink <source file> <destination file>
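The difference between a reflink and a hard link can be seen in a small shell sketch. This assumes GNU coreutils; the file names a, b and c and the temporary directory are just placeholders for the illustration. It uses --reflink=auto rather than plain --reflink so that cp falls back to an ordinary copy on filesystems without reflink support. The contents end up the same either way, only the on-disk sharing differs.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory (placeholder location for this sketch).
workdir=$(mktemp -d)
cd "$workdir"

printf 'original\n' > a

# b is a clone of a. On Btrfs the data is shared until one side is modified;
# with --reflink=auto, other filesystems get a normal copy instead.
cp --reflink=auto a b

# c is a hard link: a second name for the very same file as a.
ln a c

# Overwrite a in place. The shell truncates and rewrites the existing file.
printf 'modified\n' > a

b_content=$(cat b)
c_content=$(cat c)

echo "b (reflink copy): $b_content"
echo "c (hard link):    $c_content"

# Clean up the throwaway directory.
cd /
rm -rf "$workdir"
```

The reflinked copy b still holds the original text, while the hard link c follows a. On Btrfs, plain cp --reflink (equivalent to --reflink=always) does the same thing, but it fails instead of silently copying when a clone is not possible.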

I wrote a guide today on how to use Bees to automatically deduplicate your filesystem. Head over to Btrfs/Deduplication/Bees! There are several other tools that also support deduplication. Take a look at Btrfs/Deduplication for a short list.