Linux/dm-cache

From Forza's ramblings

dm-cache

Device Mapper Cache (dm-cache) is a Linux kernel feature designed to enhance storage performance by implementing a block-level cache on a separate cache device. The dm-cache tool described here helps the user set up a cache device.

The goal of dm-cache is to improve the random read/write performance of a slow HDD by using a small but fast SSD or NVMe device.

The main advantage of dm-cache over LVM and Bcache is that it can be set up on devices that already have a filesystem with data on them. Both LVM and Bcache require unformatted, empty devices (there are ways around this, but they can be risky).

Setup

dm-cache can be set up using dmcache.sh or with OpenRC init scripts for persistent configuration.

Source code can be downloaded from https://git.tnonline.net/Forza/dm-cache and is released under the GPLv3 license.

Requirements

dm-cache utilises the dmsetup utility, which can usually be found in the lvm2 or device-mapper packages.

dm-cache requires three devices:

  • origin: The slow device.
  • cache: A fast SSD or NVMe device. Can be of any size.
  • meta: A small device that holds dm-cache metadata.

The metadata device size depends on how many cache blocks fit on the cache device. With the default settings it should be at least 0.01% of the cache device size. If the cache device is 50GiB with a cache block size of 128KiB, a metadata device of 5MiB is enough. Smaller block sizes require more metadata and memory, while larger block sizes may reduce the effectiveness of the cache by storing cold data.
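
The sizing rule can be checked with a little shell arithmetic. This is only a sketch of the 0.01% rule of thumb stated above, not the exact formula used by the kernel; the function names are made up for illustration.

```shell
#!/bin/sh
# Number of cache blocks for a cache of $1 GiB using blocks of $2 KiB.
cache_block_count() {
    echo $(( $1 * 1024 * 1024 / $2 ))
}

# Rule-of-thumb minimum metadata size in KiB: 0.01 % of a cache of $1 GiB.
meta_min_kib() {
    echo $(( $1 * 1024 * 1024 / 10000 ))
}

# Cache block size converted from KiB to 512-byte sectors.
kib_to_sectors() {
    echo $(( $1 * 2 ))
}

cache_block_count 50 128   # prints 409600
meta_min_kib 50            # prints 5242 (about 5 MiB)
kib_to_sectors 128         # prints 256
```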

It is important to mount the filesystem on the dm-cache using the /dev/mapper/dmname path and not with the filesystem UUID as is commonly done. This is because the kernel might still see the UUID from the origin device, and this can cause data loss!

If you're using Btrfs, you may see the following message in the kernel log:

# dmesg
BTRFS warning: duplicate device /dev/sdj1 devid 1 generation 182261 scanned by mount (13706)

There is a udev rule that prevents this issue by removing the /dev/disk/by-uuid/ symlink to the origin device.

Configuration

The following options are available:

  • dmname: Choose a new name for the assembled dm-cache. It will be exposed as the block device /dev/mapper/dmname.
  • origindev: Path to the slow device that should be accelerated with dm-cache. Use a stable device ID, not the FS UUID.
  • cachedev: The fast cache device, usually an SSD or NVMe disk.
  • metadev: A small device to hold cache metadata.
  • cachemode: Choose writethrough or writeback cache.
    • writethrough cache (default): Write-through caching prohibits cachedev content from being different from origindev content. This mode only accelerates reads, but should allow the origin device to be used without the cachedev after a crash.
    • writeback cache: When write-back caching is used, writes go to the cachedev first and are synced to the origindev in the background. If the system crashes, the dm-cache must be assembled again before use to avoid serious filesystem damage. If the cachedev fails, the filesystem can be irrevocably damaged!
  • cacheblock: The size of cache blocks in sectors. dm-cache promotes and demotes only whole blocks. Too large a block size wastes cache space, reducing its effectiveness, while too small a block size has more memory and metadata overhead.
  • cachepolicy: Cache policy affects how dm-cache promotes and demotes data from the cachedev. This is an advanced option. Leave it as default.
  • readahead: Linux block device read-ahead value in sectors. The kernel calculates a suitable default if this is unset.

The Linux kernel documentation has more details on possible configuration options.
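
To show how these options fit together, the sketch below builds the device-mapper table line for the kernel's cache target. The argument order follows the kernel documentation. The function name, the 256-sector default (128KiB) and the device paths in the example are assumptions for illustration, and this is not necessarily how dmcache.sh assembles its table.

```shell
#!/bin/sh
# Build a device-mapper table line for the "cache" target:
#   <start> <length> cache <metadev> <cachedev> <origindev> <block size>
#   <#feature args> <feature args> <policy> <#policy args>
cache_table() {
    origin_sectors=$1             # size of origindev in 512-byte sectors
    metadev=$2
    cachedev=$3
    origindev=$4
    cacheblock=${5:-256}          # assumed default: 256 sectors = 128 KiB
    cachemode=${6:-writethrough}  # writethrough or writeback
    cachepolicy=${7:-default}     # "default" lets the kernel pick the policy
    printf '0 %s cache %s %s %s %s 1 %s %s 0\n' \
        "$origin_sectors" "$metadev" "$cachedev" "$origindev" \
        "$cacheblock" "$cachemode" "$cachepolicy"
}

# Example (as root; the device paths are placeholders):
#   sectors=$(blockdev --getsz /dev/sdj1)
#   dmsetup create usb_backup \
#       --table "$(cache_table "$sectors" /dev/vg/meta /dev/vg/cache /dev/sdj1)"
```

The origin size is passed in as an argument so the table line can be inspected before anything is handed to dmsetup.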

udev rules

To avoid risk of accessing the filesystem via the origin device instead of via the dm-cache device, the following udev rule can be used. It removes the UUID symlink pointing to the origin device.

File: /etc/udev/rules.d/90-dmcache.rules
ENV{ID_FS_UUID_ENC}=="df68a30d-d26e-4b9c-9606-a130e66ce63d", KERNEL=="sd*", SUBSYSTEM=="block", ACTION=="add|change", SYMLINK-="disk/by-uuid/$env{ID_FS_UUID_ENC}"
  • ID_FS_UUID_ENC is the filesystem's UUID.
  • sd* means the rule should match any /dev/sd* devices. Adjust if you use other names such as vd*, nvme*, etc.

The filesystem UUID can be found using blkid /dev/origindev.

# blkid /dev/sdj1
/dev/sdj1:
LABEL="usb-backup"
UUID="df68a30d-d26e-4b9c-9606-a130e66ce63d"
UUID_SUB="254fe753-d4d6-4ad1-9cc3-cd9f4c1bfa67"
BLOCK_SIZE="4096"
TYPE="btrfs"
PARTLABEL="Basic data partition"
PARTUUID="ac0ae9b1-8e32-4e33-b641-998bc0298d14"
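
When several origin devices are involved, the rule can be generated from the UUID instead of being typed by hand. The helper below is a hypothetical convenience, not part of the dm-cache scripts; it only prints a rule in the same form as the one shown above.

```shell
#!/bin/sh
# Print a udev rule that drops the by-uuid symlink for one filesystem UUID.
#   $1: filesystem UUID of the origin device
#   $2: kernel name pattern of the origin device (defaults to sd*)
dmcache_udev_rule() {
    printf 'ENV{ID_FS_UUID_ENC}=="%s", KERNEL=="%s", SUBSYSTEM=="block", ACTION=="add|change", SYMLINK-="disk/by-uuid/$env{ID_FS_UUID_ENC}"\n' \
        "$1" "${2:-sd*}"
}

# Example (as root):
#   uuid=$(blkid -s UUID -o value /dev/sdj1)
#   dmcache_udev_rule "$uuid" >> /etc/udev/rules.d/90-dmcache.rules
#   udevadm control --reload && udevadm trigger --subsystem-match=block
```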

mdev rules

Alpine Linux uses mdev instead of udev by default. The setup with mdev is slightly more complicated because it does not support removing existing symlinks. A workaround is using a shell script hook in /etc/mdev.conf.

  • Install dmcache.mdev to /lib/mdev/dmcache. Make sure it has the executable bit set.
  • Install dmcache-uuids to /etc/dmcache-uuids.
  • Add the dmcache hook /lib/mdev/dmcache to mdev.conf in the persistent storage section.
File: /etc/mdev.conf
# persistent storage
dasd.*      root:disk 0660 */lib/mdev/persistent-storage
mmcblk.*    root:disk 0660 */lib/mdev/persistent-storage
nbd.*       root:disk 0660 */lib/mdev/persistent-storage
nvme.*      root:disk 0660 */lib/mdev/persistent-storage
sd[a-z].*   root:disk 0660 */lib/mdev/persistent-storage; /lib/mdev/dmcache
sr[0-9]+    root:cdrom 0660 */lib/mdev/persistent-storage
vd[a-z].*   root:disk 0660 */lib/mdev/persistent-storage
xvd[a-z].*  root:disk 0660 */lib/mdev/persistent-storage

Using OpenRC

The OpenRC init script can automate setting up and stopping dm-cache during boot.

  • Install conf.d/dmcache and init.d/dmcache
  • Modify conf.d/dmcache to suit your setup
  • Add a udev rule to block FS UUID device symlinks
  • Add dmcache to boot runlevel: rc-update add dmcache boot

Multiple devices

If you have several devices, you can copy the conf.d file and symlink the init.d script under a new name. The filenames in init.d and conf.d must be the same.

  • cp /etc/conf.d/dmcache /etc/conf.d/dmcache.new
  • ln -s /etc/init.d/dmcache /etc/init.d/dmcache.new
  • update /etc/conf.d/dmcache.new
  • update udev rules
  • rc-service dmcache.new start
  • rc-update add dmcache.new boot
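
The first two steps above can be wrapped in a small helper when many instances are needed. This is a minimal sketch: the function name and the optional root prefix (for trying it in a scratch directory) are assumptions and not part of the dm-cache scripts.

```shell
#!/bin/sh
# Create an additional service instance named dmcache.<suffix>.
#   $1: instance suffix, for example "new"
#   $2: optional root prefix for dry runs; empty means the real /etc
add_dmcache_instance() {
    suffix=$1
    root=${2:-}
    # Each instance gets its own copy of the configuration ...
    cp "$root/etc/conf.d/dmcache" "$root/etc/conf.d/dmcache.$suffix" || return 1
    # ... while the init script is shared through a symlink of the same name.
    ln -s "$root/etc/init.d/dmcache" "$root/etc/init.d/dmcache.$suffix" || return 1
    echo "dmcache.$suffix"
}

# Example (as root): add_dmcache_instance new
# Then edit /etc/conf.d/dmcache.new and the udev rules before starting it.
```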

Using dmcache.sh

Edit dmcache.sh and add the devices and configuration options you need.

After starting dm-cache, you should remove the UUID symlink from /dev/disk/by-uuid/ which is pointing to your origin device. The udev rule can also be used to achieve this.

The dm-cache mapping is not persistent. After a reboot, the dm-cache must be assembled before the filesystem can be safely mounted.

Manually stopping dm-cache is done with dmsetup remove <dmname>.
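
Before mounting, it can be worth checking that the UUID symlink no longer resolves to the origin device. The helper below is a hypothetical sketch that relies on readlink -f (available in GNU coreutils and BusyBox); the directory argument exists only so the check can be tried outside /dev.

```shell
#!/bin/sh
# Return 0 when the by-uuid symlink for a UUID is absent or resolves to the
# same device node as the dm-cache device, and non-zero otherwise.
#   $1: filesystem UUID
#   $2: dm-cache device, for example /dev/mapper/data1
#   $3: optional symlink directory (defaults to /dev/disk/by-uuid)
uuid_link_is_safe() {
    uuid=$1
    dmdev=$2
    byuuid_dir=${3:-/dev/disk/by-uuid}
    link="$byuuid_dir/$uuid"
    # No symlink at all: nothing can reach the origin device by UUID.
    [ -L "$link" ] || return 0
    [ "$(readlink -f "$link")" = "$(readlink -f "$dmdev")" ]
}

# Example:
#   uuid_link_is_safe df68a30d-d26e-4b9c-9606-a130e66ce63d /dev/mapper/usb_backup \
#       || echo "UUID symlink still points at the origin device" >&2
```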

Cache Statistics

Use cachestats.sh to get some statistics on the dm-cache performance.

# cachestats.sh --help
Usage: cachestats.sh [DEVICE_NAME or PATH] [DEVICE_NAME or PATH] ...
  -h, --help  Display this help message
# cachestats.sh data2
DEVICE
========
Device-mapper name:       /dev/mapper/data2
Origin size:              9 TiB
Discards:                 no_discard_passdown

CACHE
========
Size / Usage:             100 GiB / 100 GiB (100 %)
Read Hit Rate:            335116714 / 520317915 (64 %)
Write Hit Rate:           24739679 / 31858340 (77 %)
Dirty:                    0 bytes
Block Size:               128 KiB
Promotions / Demotions:   646797 / 646796
Migration Threshold:      1 MiB
Read-Write mode:          rw
Type:                     writeback
Policy:                   smq
Status:                   OK

METADATA
========
Size / Usage:             256 MiB / 10 MiB (3 %)
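
The hit rates shown above can also be derived by hand from dmsetup status. The sketch below assumes the status field order given in the kernel's cache target documentation (read hits, read misses, write hits and write misses in fields 8 to 11 of the dmsetup output); it is an illustration and not necessarily how cachestats.sh computes its figures.

```shell
#!/bin/sh
# Read "dmsetup status" lines on stdin and print the read and write hit
# rates of each cache target as whole percentages (rounded down).
cache_hit_rates() {
    awk '$3 == "cache" {
        rhit = $8; rmiss = $9; whit = $10; wmiss = $11
        # Guard against division by zero on a freshly created cache.
        r = (rhit + rmiss) ? int(100 * rhit / (rhit + rmiss)) : 0
        w = (whit + wmiss) ? int(100 * whit / (whit + wmiss)) : 0
        printf "read=%d%% write=%d%%\n", r, w
    }'
}

# Example (as root): dmsetup status data2 | cache_hit_rates
```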

List all cache devices

The dmsetup utility can be used to list all known device-mapper devices.

List only cache devices.

# dmsetup ls --target cache
3t_backup   (254, 23)
data1       (254, 24)
data2       (254, 25)
data3       (254, 26)
usb_backup  (254, 27)

List all device-mapper devices and their relationships in a tree layout.

# dmsetup ls --tree
3t_backup (254:23)
 ├─ (8:114)
 ├─vg_800g-lv_cache_3t_backup_cache (254:3)
 │  └─ (8:0)
 └─vg_800g-lv_cache_3t_backup_meta (254:2)
    └─ (8:16)
data1 (254:24)
 ├─ (8:98)
 ├─vg_800g-lv_cache_data1_cache (254:6)
 │  └─ (8:0)
 └─vg_800g-lv_cache_data1_meta (254:4)
    └─ (8:16)
data2 (254:25)
 ├─ (8:130)
 ├─vg_800g-lv_cache_data2_cache (254:7)
 │  └─ (8:16)
 └─vg_800g-lv_cache_data2_meta (254:5)
    └─ (8:0)
data3 (254:26)
 ├─ (8:49)
 ├─vg_800g-lv_cache_data3_cache (254:14)
 │  └─ (8:16)
 └─vg_800g-lv_cache_data3_meta (254:15)
    └─ (8:0)
usb_backup (254:27)
 ├─ (8:145)
 ├─vg_800g-lv_cache_usb_backup_cache (254:1)
 │  └─ (8:0)
 └─vg_800g-lv_cache_usb_backup_meta (254:0)
    └─ (8:16)
vg_800g-3TB_meta1 (254:10)
 └─ (8:0)
vg_800g-3TB_meta2 (254:11)
 └─ (8:16)
vg_800g-6TB_meta1 (254:8)
 └─ (8:0)
vg_800g-6TB_meta2 (254:9)
 └─ (8:16)
vg_800g-virtiofs_meta1 (254:12)
 └─ (8:0)
vg_800g-virtiofs_meta2 (254:13)
 └─ (8:16)

dmsetup info can be used to display additional information for a cache device.

# dmsetup info /dev/mapper/data1
Name:              data1
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      4480
Major, minor:      254, 24
Number of targets: 1