Btrfs/Deduplication/Bees

Deduplication with Bees

Bees is a deduplication tool made specifically for Btrfs filesystems; it does not work with any other filesystem. It differs from other deduplication tools in several ways:

It uses a very lightweight database (a hash table) to keep track of deduplication progress. You can stop and restart Bees, and it continues from where it left off.
Memory usage is fixed: it never grows beyond the database size plus a small amount for the Bees runtime, whereas most other dedup tools can grow to multiple gigabytes of RAM.
It runs continuously in the background, deduplicating any newly written data.
It runs only on whole filesystems, not on a limited set of files.
Bees can split extents to achieve better deduplication results.

Make sure you have a recent kernel. Some kernel versions have bugs that are triggered by Bees, so please check https://github.com/Zygo/bees/blob/master/docs/btrfs-kernel.md before continuing.

Installation

Installation instructions can be found in the Bees GitHub repository: https://github.com/Zygo/bees/blob/master/docs/install.md.

Please note that at the time of writing, the latest stable version (v0.8) has some issues. Before compiling the program, please read the Known issues section.

Configuration

First you need to determine the UUID of the filesystem you want to run Bees on. Use btrfs filesystem show to list all Btrfs filesystems with their UUIDs.

# btrfs filesystem show

Label: 'btrfs-root'  uuid: 446d32cb-a6da-45f0-9246-1483ad3420e0
        Total devices 1 FS bytes used 35.04GiB
        devid    1 size 229.47GiB used 79.03GiB path /dev/sda3

Label: '6TB'  uuid: fe0a1142-51ab-4181-b635-adbf9f4ea6e6
        Total devices 2 FS bytes used 3.48TiB
        devid    2 size 2.72TiB used 2.21TiB path /dev/sdc2
        devid    3 size 1.82TiB used 1.30TiB path /dev/sdb2

Label: 'btrfs-boot'  uuid: 1128e72e-b00f-4c2a-a1e1-afa89f3c11cc
        Total devices 1 FS bytes used 70.55MiB
        devid    1 size 1.00GiB used 256.00MiB path /dev/sda2

Label: 'usb-backup'  uuid: df68a30d-d26e-4b9c-9606-a130e66ce63d
        Total devices 1 FS bytes used 581.47GiB
        devid    1 size 927.51GiB used 591.02GiB path /dev/sdd1

We'll use the filesystem labelled '6TB', with UUID fe0a1142-51ab-4181-b635-adbf9f4ea6e6, in this guide.
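If you would rather not copy the UUID by hand, it can be pulled out of the same output with a small helper. This is only a sketch: the function name uuid_for_label is made up for this example, and it assumes the label contains no spaces and that the output has the "Label: '...'  uuid: ..." layout shown above.

```shell
# Print the UUID that belongs to a given Btrfs label.
# Reads `btrfs filesystem show` style output from stdin.
# Assumption: the label contains no whitespace.
uuid_for_label() {
    awk -v label="$1" '
        # \047 is a single quote; labels are printed as '\''name'\'' in the output
        $1 == "Label:" && $2 == "\047" label "\047" { print $4; exit }
    '
}

# Example (run as root):
# btrfs filesystem show | uuid_for_label 6TB
```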

Now we can create the Bees configuration file /etc/bees/6TB.conf directly, or copy the sample configuration file located at /etc/bees/beesd.conf.sample and edit it. The file can have any name, as long as it ends in .conf.

File: /etc/bees/6TB.conf

# UUID of the filesystem 
UUID=fe0a1142-51ab-4181-b635-adbf9f4ea6e6

# Specify the bees database size. It has to be a multiple of 128KiB
DB_SIZE=$((256*1024*1024)) # 256MiB in bytes

The database size determines how efficient Bees will be on your filesystem. The database has to fit in RAM, so make sure you have enough RAM available to run Bees with the selected database size.
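Both constraints (a multiple of 128KiB, and small enough to fit in RAM) can be checked before starting Bees. The following is a minimal sketch, not part of Bees itself: the function names are hypothetical, and reading MemAvailable from /proc/meminfo is just one reasonable way to estimate free memory on Linux.

```shell
# Succeed if the database size (bytes) is a multiple of 128 KiB
# and does not exceed the given amount of available memory (bytes).
db_size_ok() {
    db_size=$1
    mem_avail=$2
    [ $((db_size % (128 * 1024))) -eq 0 ] && [ "$db_size" -le "$mem_avail" ]
}

# Available memory in bytes, taken from MemAvailable in /proc/meminfo (Linux only).
mem_available_bytes() {
    kib=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
    echo $((kib * 1024))
}

# Example:
# db_size_ok $((256*1024*1024)) "$(mem_available_bytes)" && echo "DB_SIZE looks fine"
```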

Note: the optimal database size for a compressed filesystem is 2-4x larger than for the same data on an uncompressed filesystem.

Unique data size	Database size	Average dedupe extent size
1TiB	4GiB	4KiB
1TiB	1GiB	16KiB
1TiB	256MiB	64KiB
1TiB	128MiB	128KiB <- recommended
1TiB	16MiB	1024KiB
64TiB	1GiB	1024KiB
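Every row of the table is consistent with 16 bytes of database per dedupe extent (for example, 1TiB / 128KiB = 8388608 extents, times 16 bytes = 128MiB). Assuming that ratio holds, a DB_SIZE value for other data sizes can be estimated as sketched below. The function name bees_db_size is made up for this example, and the 16-byte figure is inferred from the table rather than taken from the Bees source.

```shell
# Estimate the database size in bytes.
#   $1: unique data size in bytes
#   $2: target average dedupe extent size in bytes
# Assumption: 16 bytes of database per extent, as implied by the table.
bees_db_size() {
    entries=$(( $1 / $2 ))
    size=$(( entries * 16 ))
    block=$(( 128 * 1024 ))
    # Round up to the next multiple of 128 KiB, as DB_SIZE requires.
    echo $(( (size + block - 1) / block * block ))
}

TiB=$(( 1024 * 1024 * 1024 * 1024 ))
KiB=1024
bees_db_size "$TiB" $(( 128 * KiB ))   # prints 134217728 (128MiB)
```

For a compressed filesystem, the note above suggests multiplying the result by 2 to 4.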

Running Bees

Once the configuration is set up, we can run the Bees daemon with beesd <uuid>:

# beesd fe0a1142-51ab-4181-b635-adbf9f4ea6e6

I recommend running Bees inside a screen session or as a service. Bees also comes with a systemd unit file: https://github.com/Zygo/bees/tree/master/scripts

# systemctl enable --now beesd@fe0a1142-51ab-4181-b635-adbf9f4ea6e6.service

Created symlink /etc/systemd/system/basic.target.wants/beesd@fe0a1142-51ab-4181-b635-adbf9f4ea6e6.service → /lib/systemd/system/beesd@.service.
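With more than one filesystem, the unit name can be derived from each configuration file instead of being typed by hand. This is a rough sketch that assumes the unit is a systemd template instantiated with the filesystem UUID (beesd@<uuid>.service) and that each config file contains a plain UUID=<uuid> line; the function name unit_for_conf is hypothetical.

```shell
# Print the systemd unit name for a Bees config file.
# Assumption: the file contains a line of the form UUID=<uuid>.
unit_for_conf() {
    uuid=$(sed -n 's/^UUID=//p' "$1" | head -n 1)
    [ -n "$uuid" ] && echo "beesd@${uuid}.service"
}

# Example (run as root): enable Bees for every configured filesystem.
# for conf in /etc/bees/*.conf; do
#     systemctl enable --now "$(unit_for_conf "$conf")"
# done
```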

Bees Stats

Bees will store various statistics and other files:

Path	Contents
/run/bees/uuid.status	Current running stats of bees.
/run/bees/mnt/uuid/.beeshome/beescrawl.dat	Bees crawler stats
/run/bees/mnt/uuid/.beeshome/beeshash.dat	Bees database
/run/bees/mnt/uuid/.beeshome/beesstats.txt	Bees statistics, database usage.
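In these paths, uuid stands for the UUID of the filesystem. As a small convenience, the status file path can be built from the UUID as sketched here; the helper name bees_status_path is made up for this example.

```shell
# Print the path of the status file for a filesystem UUID,
# following the /run/bees/uuid.status pattern from the table.
bees_status_path() {
    echo "/run/bees/$1.status"
}

# Example: show the current stats for the filesystem used in this guide.
# cat "$(bees_status_path fe0a1142-51ab-4181-b635-adbf9f4ea6e6)"
```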

Known issues

Bees version not shown

The package for the latest stable version (v0.8) does not pick up its version number during compilation, so the resulting binary returns an error instead of its version.

You can fix this by setting the version manually before compiling:

# sed -i 's/BEES_VERSION ?=.*/BEES_VERSION ?= v0.8/' ./Makefile
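To see what the substitution does without touching the real Makefile, it can be tried on a single line first. The sample line below is a hypothetical stand-in; the actual BEES_VERSION line in the Makefile may look different, but anything after "BEES_VERSION ?=" is replaced in the same way.

```shell
# Hypothetical stand-in for the BEES_VERSION line in the Makefile.
line='BEES_VERSION ?= $(shell git describe --always --dirty || echo UNKNOWN)'

# Same expression as in the sed command above, applied to the sample line.
# In a basic regular expression the "?" is a literal character.
result=$(printf '%s\n' "$line" | sed 's/BEES_VERSION ?=.*/BEES_VERSION ?= v0.8/')

echo "$result"   # prints: BEES_VERSION ?= v0.8
```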

Too much information is logged

By default, beesd logs at its most verbose level, which is not ideal in some situations. This can be changed by launching beesd with a lower log level such as -v 6, or by setting the option in the config file.

File: /etc/bees/6TB.conf

# UUID of the filesystem 
UUID=fe0a1142-51ab-4181-b635-adbf9f4ea6e6

# Specify the bees database size. It has to be a multiple of 128KiB
DB_SIZE=$((256*1024*1024)) # 256MiB in bytes

## Options to apply, see `beesd --help` for details
OPTIONS="-P -v 6"
