Linux/dm-cache

From Forza's ramblings

dm-cache


Device Mapper Cache (dm-cache) is a Linux kernel feature that enhances storage performance by adding a block-level cache on a separate, faster device. This setup can significantly boost random read/write performance on slower primary storage devices (such as HDDs) by caching frequently accessed blocks on a fast SSD or NVMe device.

The dm-cache tool is a collection of scripts designed to make setting up a Device Mapper Cache device straightforward. Unlike alternatives such as LVM cache and Bcache, dm-cache can be configured on devices with pre-existing filesystems, making it a safer and more convenient choice for systems with existing data.

Setup

dm-cache can be configured either with the dmcache.sh script or through OpenRC init scripts for a persistent setup.

The source code is available at https://git.tnonline.net/Forza/dm-cache and is released under the GPLv3 license.

Requirements

dm-cache uses the dmsetup utility, which can usually be found in the lvm2 or device-mapper packages.

Three devices are required to set up a cache.

  • origin: The slow device.
  • cache: A fast SSD or NVMe device, which can vary in size.
  • meta: A small device that stores dm-cache metadata.

The metadata device size depends on how many cache blocks fit on the cache device. With the default setting, it should be at least 0.01% of the cache device size. For a 50GiB cache device with a cache block size of 128KiB, a metadata device of 5MiB is enough. Smaller block sizes require more metadata and memory, while larger block sizes may reduce dm-cache's effectiveness.
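The arithmetic behind that example can be sketched in shell. This is only an illustration of the 0.01% rule of thumb stated above, assuming a shell with 64-bit integer arithmetic; the variable names are made up for this example.

```shell
#!/bin/sh
# Sizing sketch using the figures from the text above.
cache_bytes=$((50 * 1024 * 1024 * 1024))   # 50 GiB cache device
block_bytes=$((128 * 1024))                # 128 KiB cache block

# Number of cache blocks that dm-cache has to track.
cache_blocks=$((cache_bytes / block_bytes))

# 0.01% of the cache device size, truncated to whole MiB.
mib=$((1024 * 1024))
meta_bytes=$((cache_bytes / 10000))
meta_mib=$((meta_bytes / mib))

echo "cache blocks: $cache_blocks"
echo "metadata device: roughly ${meta_mib} MiB"
```

Halving the block size doubles the number of cache blocks, which is why smaller blocks need more metadata and memory.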

It is important to mount the filesystem using the /dev/mapper/dm-name path and not with the filesystem UUID as is commonly done. This is because the kernel might associate the UUID with the origin device instead of the dm-cache device, and this can cause data loss!

# dmesg
BTRFS warning: duplicate device /dev/sdj1 devid 1 generation 182261 scanned by mount (13706)

Use the provided 90-dmcache.rules udev rule, which prevents this issue by removing the /dev/disk/by-uuid/ symlink to the origin device.

Configuration

The following options are available:

  • dmname: Choose a new name for the assembled dm-cache. It will be exposed as a block device at `/dev/mapper/dmname`.
  • origindev: Path to the slow device that should be accelerated with dm-cache. Use a stable device ID, not the FS UUID.
  • cachedev: The fast cache device, usually an SSD or NVMe disk.
  • metadev: A small device to hold cache metadata.
  • cachemode: Choose writethrough or writeback cache.
    • writethrough cache (default): Write through caching prohibits cachedev content from being different from origindev content. This mode only accelerates reads, but should allow the origin device to be used without the cache dev after a crash.
    • writeback cache: When writeback caching is used, writes go to the cachedev first and are synced to the origin device in the background. If the system crashes, the dm-cache must be assembled again before use to avoid serious filesystem damage. If the cachedev fails, the filesystem can be irrevocably damaged!
  • cacheblock: The size of cache blocks in sectors. dm-cache promotes and demotes only whole blocks. A block size that is too large wastes cache space, reducing its effectiveness, while one that is too small has more memory and metadata overhead.
  • cachepolicy: Cache policy affects how dm-cache promotes and demotes data from the cachedev. This is an advanced option. Leave it as default.
  • readahead: Linux block device read-ahead value in sectors. The kernel calculates a suitable default if this is unset.

The Linux kernel documentation has more details on possible configuration options.
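As a rough illustration of how these options relate to the kernel's cache target, the sketch below builds a device-mapper table line from them. It is not taken from dmcache.sh; the function name `build_cache_table` and the device paths are hypothetical, and the field order follows the cache target format described in the kernel documentation.

```shell
#!/bin/sh
# Build a device-mapper table line for the kernel's "cache" target.
# Arguments: origin size in 512-byte sectors, metadev, cachedev, origindev,
#            cacheblock (sectors), cachemode, cachepolicy.
build_cache_table() {
    # One feature argument (the cache mode) and no policy arguments.
    printf '0 %s cache %s %s %s %s 1 %s %s 0\n' "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

# Example with hypothetical device paths; 256 sectors = 128 KiB cache blocks.
build_cache_table 41943040 /dev/vg/meta /dev/vg/cache /dev/sdj1 256 writethrough default

# As root, such a line could be handed to dmsetup, for example:
#   dmsetup create data1 --table "<table line>"
# The origin size in sectors can be read with: blockdev --getsz /dev/sdj1
```

The function only prints text, so it can be used to inspect a table line before anything is created.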

WARNING: Always mount the filesystem on the dm-cache device using the /dev/mapper/dm-name path. Using the filesystem UUID, as is commonly done, can result in the kernel seeing the UUID from the origin device, potentially leading to data loss.

udev rules

To avoid risk of accessing the filesystem via the origin device instead of via the dm-cache device, the following udev rule can be used. It removes the UUID symlink pointing to the origin device.

File: /etc/udev/rules.d/90-dmcache.rules
ENV{ID_FS_UUID_ENC}=="df68a30d-d26e-4b9c-9606-a130e66ce63d", KERNEL=="sd*", SUBSYSTEM=="block", ACTION=="add|change", SYMLINK-="disk/by-uuid/$env{ID_FS_UUID_ENC}"
  • ID_FS_UUID_ENC means the filesystem's UUID.
  • sd* means the rule should match any /dev/sd* devices. Adjust if you use other names such as vd*, nvme*, etc.

The filesystem UUID can be found using blkid /dev/origindev.

# blkid /dev/sdj1
/dev/sdj1:
LABEL="usb-backup"
UUID="df68a30d-d26e-4b9c-9606-a130e66ce63d"
UUID_SUB="254fe753-d4d6-4ad1-9cc3-cd9f4c1bfa67"
BLOCK_SIZE="4096"
TYPE="btrfs"
PARTLABEL="Basic data partition"
PARTUUID="ac0ae9b1-8e32-4e33-b641-998bc0298d14"
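When several origin devices need a rule, the line can be generated from the UUID. The following is a small sketch, not part of the dm-cache scripts; `make_dmcache_rule` is a hypothetical helper name, and it simply reproduces the rule format shown above.

```shell
#!/bin/sh
# Print a udev rule that drops the by-uuid symlink for one filesystem UUID.
# $1: filesystem UUID, $2: kernel name pattern (defaults to sd*).
make_dmcache_rule() {
    printf 'ENV{ID_FS_UUID_ENC}=="%s", KERNEL=="%s", SUBSYSTEM=="block", ACTION=="add|change", SYMLINK-="disk/by-uuid/$env{ID_FS_UUID_ENC}"\n' "$1" "${2:-sd*}"
}

# Example using the UUID from the blkid output above.
make_dmcache_rule df68a30d-d26e-4b9c-9606-a130e66ce63d

# With util-linux blkid, the UUID alone can be read with:
#   blkid -s UUID -o value /dev/sdj1
# After editing the rules file, udev can be told to pick it up (as root):
#   udevadm control --reload
#   udevadm trigger --subsystem-match=block
```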

mdev rules

Alpine Linux uses mdev instead of udev by default. The setup with mdev is slightly more complicated because it does not support removing existing symlinks. A workaround is using a shell script hook in /etc/mdev.conf.

  • install dmcache.mdev to /lib/mdev/dmcache. Make sure it has the executable bit set.
  • install dmcache-uuids to /etc/dmcache-uuids.
  • add the dmcache hook /lib/mdev/dmcache to mdev.conf at the persistent storage section.
File: /etc/mdev.conf
# persistent storage
dasd.*      root:disk 0660 */lib/mdev/persistent-storage
mmcblk.*    root:disk 0660 */lib/mdev/persistent-storage
nbd.*       root:disk 0660 */lib/mdev/persistent-storage
nvme.*      root:disk 0660 */lib/mdev/persistent-storage
sd[a-z].*   root:disk 0660 */lib/mdev/persistent-storage; /lib/mdev/dmcache
sr[0-9]+    root:cdrom 0660 */lib/mdev/persistent-storage
vd[a-z].*   root:disk 0660 */lib/mdev/persistent-storage
xvd[a-z].*  root:disk 0660 */lib/mdev/persistent-storage
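To give an idea of what such a hook has to do, here is a hypothetical sketch. It is not the shipped dmcache.mdev script, and it assumes that the UUID list file holds one filesystem UUID per line, which may not match the real /etc/dmcache-uuids format. The function names are made up for this example.

```shell
#!/bin/sh
# Hypothetical hook logic, not the shipped dmcache.mdev.
# Assumes the UUID list file holds one filesystem UUID per line.

# Succeeds when UUID $1 is listed in file $2.
uuid_is_listed() {
    [ -n "$1" ] && grep -qxF -- "$1" "$2"
}

# Remove the symlink named after UUID $1 from directory $3,
# but only if the UUID is listed in file $2.
hide_origin_uuid() {
    if uuid_is_listed "$1" "$2" && [ -L "$3/$1" ]; then
        rm -f -- "$3/$1"
    fi
}

# Inside an mdev hook this might be called as follows (mdev exports $MDEV;
# the blkid options are from util-linux and may be missing in BusyBox blkid):
#   uuid=$(blkid -s UUID -o value "/dev/$MDEV")
#   hide_origin_uuid "$uuid" /etc/dmcache-uuids /dev/disk/by-uuid
```

Because the hook runs after persistent-storage in the mdev.conf line above, the symlink already exists when the hook removes it.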

Using OpenRC

The OpenRC init script can automate setting up and stopping dm-cache during boot.

  • Install conf.d/dmcache and init.d/dmcache
  • Modify conf.d/dmcache to suit your setup
  • Add a udev rule to block FS UUID device symlinks
  • Add dmcache to boot runlevel: rc-update add dmcache boot

Multiple devices

If you have several devices, copy the conf.d file and symlink the init.d script under a new name. The filenames in init.d and conf.d must be the same.

  • cp /etc/conf.d/dmcache /etc/conf.d/dmcache.new
  • ln -s /etc/init.d/dmcache /etc/init.d/dmcache.new
  • update /etc/conf.d/dmcache.new
  • update udev rules
  • rc-service dmcache.new start
  • rc-update add dmcache.new boot
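The file steps above can be wrapped in a small helper. This is a sketch only: `add_dmcache_instance` is a hypothetical name, the optional root prefix exists just so the steps can be tried in a scratch directory, and the symlink uses a relative target instead of the absolute path shown in the list.

```shell
#!/bin/sh
# Create a second dmcache service instance named dmcache.<suffix>.
# $1: instance suffix, $2: optional root prefix (hypothetical, for dry runs).
add_dmcache_instance() {
    etc="${2:-}/etc"
    # Each instance gets its own copy of the configuration ...
    cp "$etc/conf.d/dmcache" "$etc/conf.d/dmcache.$1" || return 1
    # ... while the init script is shared through a symlink.
    ln -s dmcache "$etc/init.d/dmcache.$1"
}

# Example (as root):
#   add_dmcache_instance new
#   (edit /etc/conf.d/dmcache.new and update the udev rules)
#   rc-service dmcache.new start
#   rc-update add dmcache.new boot
```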

Using dmcache.sh

Edit dmcache.sh and add the devices and configuration options you need.

After starting dm-cache, remove the UUID symlink in /dev/disk/by-uuid/ that points to your origin device before attempting to mount the filesystem. Use the udev rule to remove the symlink persistently.
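A cautious way to do the manual removal is to delete the symlink only when it really resolves to the origin device. The helper below is a sketch under that assumption; `drop_origin_uuid_link` is a hypothetical name and is not part of dmcache.sh.

```shell
#!/bin/sh
# Remove a by-uuid symlink only if it currently resolves to the origin device.
# $1: symlink path, $2: origin device path.
drop_origin_uuid_link() {
    # Nothing to do if the symlink does not exist.
    [ -L "$1" ] || return 0
    if [ "$(readlink -f "$1")" = "$(readlink -f "$2")" ]; then
        rm -f -- "$1"
    fi
}

# Example (as root), using the UUID and origin device shown on this page:
#   drop_origin_uuid_link /dev/disk/by-uuid/df68a30d-d26e-4b9c-9606-a130e66ce63d /dev/sdj1
```

If the symlink already points at the dm-cache device, it is left in place.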

The dm-cache mapping created by dmcache.sh is not persistent. After a reboot, the dm-cache must be assembled again before the filesystem can safely be mounted.

Manually stopping dm-cache is done using dmsetup remove <dmname>.

Cache Statistics

Use cachestats.sh to get some statistics on the dm-cache performance.

# cachestats.sh --help
Usage: cachestats [-v|--verbose] [DEVICE_NAME or PATH] [DEVICE_NAME or PATH] ...
Options:
  -h, --help      Display this help message
  -v, --verbose   Display detailed information
# cachestats.sh -v data2
DEVICE
========
Device-mapper name:       /dev/mapper/data2
Origin size:              9 TiB
Discards:                 no_discard_passdown

CACHE
========
Size / Usage:             100 GiB / 100 GiB (100 %)
Read Hit Rate:            335116714 / 520317915 (64 %)
Write Hit Rate:           24739679 / 31858340 (77 %)
Dirty:                    0 bytes
Block Size:               128 KiB
Promotions / Demotions:   646797 / 646796
Migration Threshold:      1 MiB
Read-Write mode:          rw
Type:                     writeback
Policy:                   smq
Status:                   OK

METADATA
========
Size / Usage:             256 MiB / 10 MiB (3 %)

NOTE: You can also use wildcards, e.g. cachestats.sh data*
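The hit rates can also be derived directly from dmsetup status. The sketch below is not how cachestats.sh is implemented; it assumes the field order of the kernel's cache target status line, with the read and write hit and miss counters in columns 8 to 11 after the leading "<start> <length> cache" columns. The function name is hypothetical.

```shell
#!/bin/sh
# Read one "dmsetup status <name>" line for a cache target on stdin
# and print the read and write hit rates as truncated percentages.
cache_hit_rates() {
    awk '$3 == "cache" {
        rhit = $8; rmiss = $9; whit = $10; wmiss = $11
        r = (rhit + rmiss) ? int(rhit * 100 / (rhit + rmiss)) : 0
        w = (whit + wmiss) ? int(whit * 100 / (whit + wmiss)) : 0
        printf "read %d%% write %d%%\n", r, w
    }'
}

# Example (as root):
#   dmsetup status data2 | cache_hit_rates
```

Here the hit rate is hits divided by hits plus misses, which matches the ratios in the verbose output above (for example 335116714 / 520317915 for reads).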

List all cache devices

The dmsetup utility can be used to list all known device-mapper devices.

List only cache devices.

# dmsetup ls --target cache
3t_backup   (254, 23)
data1       (254, 24)
data2       (254, 25)
data3       (254, 26)
usb_backup  (254, 27)
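For scripting, only the names in the first column are usually needed. A minimal sketch, assuming the two-column output format shown above; `cache_names` is a hypothetical helper name.

```shell
#!/bin/sh
# Print only the device names from "dmsetup ls --target cache" output on stdin.
cache_names() {
    # Lines look like "data1  (254, 24)"; anything else, such as a
    # "No devices found" message, is skipped.
    awk '$2 ~ /^\(/ { print $1 }'
}

# Example (as root), printing statistics for every cache device:
#   for name in $(dmsetup ls --target cache | cache_names); do
#       cachestats.sh "$name"
#   done
```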

List all device-mapper devices and their relationships in a tree layout.

# dmsetup ls --tree
3t_backup (254:23)
 ├─ (8:114)
 ├─vg_800g-lv_cache_3t_backup_cache (254:3)
 │  └─ (8:0)
 └─vg_800g-lv_cache_3t_backup_meta (254:2)
    └─ (8:16)
data1 (254:24)
 ├─ (8:98)
 ├─vg_800g-lv_cache_data1_cache (254:6)
 │  └─ (8:0)
 └─vg_800g-lv_cache_data1_meta (254:4)
    └─ (8:16)
data2 (254:25)
 ├─ (8:130)
 ├─vg_800g-lv_cache_data2_cache (254:7)
 │  └─ (8:16)
 └─vg_800g-lv_cache_data2_meta (254:5)
    └─ (8:0)
data3 (254:26)
 ├─ (8:49)
 ├─vg_800g-lv_cache_data3_cache (254:14)
 │  └─ (8:16)
 └─vg_800g-lv_cache_data3_meta (254:15)
    └─ (8:0)
usb_backup (254:27)
 ├─ (8:145)
 ├─vg_800g-lv_cache_usb_backup_cache (254:1)
 │  └─ (8:0)
 └─vg_800g-lv_cache_usb_backup_meta (254:0)
    └─ (8:16)
vg_800g-3TB_meta1 (254:10)
 └─ (8:0)
vg_800g-3TB_meta2 (254:11)
 └─ (8:16)
vg_800g-6TB_meta1 (254:8)
 └─ (8:0)
vg_800g-6TB_meta2 (254:9)
 └─ (8:16)
vg_800g-virtiofs_meta1 (254:12)
 └─ (8:0)
vg_800g-virtiofs_meta2 (254:13)
 └─ (8:16)

dmsetup info can be used to display additional information for a cache device.

# dmsetup info /dev/mapper/data1
Name:              data1
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      4480
Major, minor:      254, 24
Number of targets: 1