Blog/Duperemove
2021-07-13: Duperemove with Btrfs[edit | edit source]
I've just updated my Deduplication guide with Duperemove. It is a tool that can deduplicate blocks of data between files on Linux filesystems that supports the Linux kernel fideduperange (previously Btrfs specific extent-same ioctl). Currently Btrfs and XFS supports this.
Head over to the articles over at https://wiki.tnonline.net/w/Btrfs/Deduplication and https://wiki.tnonline.net/w/Btrfs/Deduplication/Duperemove
The following example is what I often use myself:
chrt -i 0 duperemove -A -h -d -r -v -b4k --dedupe-options=noblock,same --lookup-extents=yes --io-threads=1
Adjust the block size to suit your needs. It is good to start with 64-128k.
chrt is used to change the scheduling class of a process. It is similar to nice and renice, but changes the scheduling class instead of priority within a class. I am using chrt --idle 0
to let duperemove only using spare cpu cycles so that it doesn't slow down the system too much.
chrt command line options
-o, --other Set scheduling policy to SCHED_OTHER (time-sharing scheduling). This is the default Linux scheduling policy. -f, --fifo Set scheduling policy to SCHED_FIFO (first in-first out). -r, --rr Set scheduling policy to SCHED_RR (round-robin scheduling). When no policy is defined, the SCHED_RR is used as the default. -b, --batch Set scheduling policy to SCHED_BATCH (scheduling batch processes). The priority argument has to be set to zero. -i, --idle Set scheduling policy to SCHED_IDLE (scheduling very low priority jobs). The priority argument has to be set to zero. -d, --deadline Set scheduling policy to SCHED_DEADLINE (sporadic task model deadline scheduling). The priority argument has to be set to zero. See also --sched-runtime, --sched-deadline and --sched-period. The relation between the options required by the kernel is runtime ⇐ deadline ⇐ period. chrt copies period to deadline if --sched-deadline is not specified and deadline to runtime if --sched-runtime is not specified. It means that at least --sched-period has to be specified. See sched(7) for more details.