Btrfs/Parent Transid Verify Failed

From Forza's ramblings

Parent Transid Verify Failed


Parent Transid Verify Failed is perhaps the most dreaded error in Btrfs. It is often impossible to repair, so you should make sure your backups are in order.

[ 4007.489730] BTRFS error (device vdb): parent transid verify failed on 30736384 wanted 10 found 8
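A small shell filter over the kernel log can find these messages and pull out the numbers. This is a minimal sketch. summarise_transid_errors is a hypothetical helper, not part of btrfs-progs, and it assumes the message format shown above.

In the message, on is the logical address of the tree block, wanted is the generation its parent expects, and found is the generation stored in the block that was actually read. A found value lower than wanted means the block on disk is older than expected, which fits a lost write.

```shell
#!/bin/sh
# summarise_transid_errors (hypothetical helper): read kernel log lines on stdin and
# summarise every "parent transid verify failed" message.
#   on     = logical address (bytenr) of the tree block
#   wanted = generation recorded for the block in its parent
#   found  = generation stored in the block that was actually read
summarise_transid_errors() {
    grep -o 'parent transid verify failed on [0-9]* wanted [0-9]* found [0-9]*' |
    while read -r _ _ _ _ _ bytenr _ wanted _ found; do
        if [ "$found" -lt "$wanted" ]; then
            age="older than expected (consistent with a lost write)"
        else
            age="newer than expected"
        fi
        echo "block $bytenr: wanted $wanted, found $found, on-disk block is $age"
    done
}
```

Usage: `dmesg | summarise_transid_errors` or `journalctl -k | summarise_transid_errors`. Depending on the kernel.dmesg_restrict setting, dmesg may need root. For the example line above, it prints: block 30736384: wanted 10, found 8, on-disk block is older than expected (consistent with a lost write).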

Why do transid errors happen?

Btrfs stores its metadata in tree structures called B-trees. There are several of them: the FS trees (one per subvolume), the Extent tree, the Root tree, the Chunk tree, the Checksum tree and the Device tree.

[Figure: the relationships between the different sets of keys (grouped here by inode number) in a subvolume]

Copy-on-Write (CoW) means that a changed block of data is written to a new extent, and only then is the metadata updated to point to the new extent. This way, if there is a crash or power outage, either all of the new data is available or the old data remains intact.

If we imagine an FS tree as a real tree with a root, branches and leaves, it is easier to understand how tree updates happen:

  • When some data is updated, CoW ensures that the update is written to a new extent, leaving the old data intact.
  • A new leaf is created, pointing to the new data.
  • A new branch is created, pointing to the new leaf.
  • The root is updated to point to the new branch.

If there is a crash or power-outage before the root is updated, the root will still point to the old branch and the old leaves, and your old data remain intact.

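Every tree block records the transaction ID (also called the generation) in which it was written. Every pointer from a parent node to a child also stores the generation that the child is expected to have. Parent Transid Verify Failed means that a parent points to a child node or leaf whose transaction ID does not match. In the log message above, wanted is the generation recorded in the parent and found is the generation in the block that was actually read from disk. This should never happen, and when it does, it means that a fundamental guarantee of Btrfs has been broken.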
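The CoW update and the parent check can be illustrated with a toy model in shell. This is only a sketch under simplifying assumptions, not how Btrfs is implemented:

  • Each block is a file named after its address.
  • A block stores its own generation and, for parent nodes, the child's address and the generation it expects that child to have.
  • All function names, addresses and generations are made up.

As in Btrfs itself, the root is also written to a new location and a superblock pointer is updated last. The second commit simulates a drive that acknowledges the leaf write but never stores it, so the stale block at that address still carries generation 8.

```shell
#!/bin/sh
# Toy model of CoW tree updates and the parent transid check. This is NOT Btrfs code.
# Each "block" is a file named after its address. A block stores its own generation
# and, for parent nodes, the child's address and the generation it expects the child to have.
disk=$(mktemp -d)
trap 'rm -rf "$disk"' EXIT

write_block() {  # write_block ADDR GEN [CHILD_ADDR CHILD_GEN]
    printf '%s %s %s\n' "$2" "${3:--}" "${4:--}" > "$disk/$1"
}

# cow_commit GEN LEAF BRANCH ROOT [drop-leaf]
# Writes a new leaf, a new branch pointing to it and a new root pointing to the branch,
# all stamped with transaction GEN, then updates the superblock last. With "drop-leaf"
# the leaf write is silently lost, as if the drive acknowledged it but never stored it.
cow_commit() {
    [ "$5" = drop-leaf ] || write_block "$2" "$1"
    write_block "$3" "$1" "$2" "$1"
    write_block "$4" "$1" "$3" "$1"
    echo "$4" > "$disk/super"
}

# verify_parent ADDR: compare the generation stored in the parent with the child's own
verify_parent() {
    read -r _ child wanted < "$disk/$1"
    read -r found _ _ < "$disk/$child"
    if [ "$found" != "$wanted" ]; then
        echo "parent transid verify failed on $child wanted $wanted found $found"
        return 1
    fi
    echo "ok: block $child generation $found"
}

# verify_tree: follow superblock -> root -> branch -> leaf, checking each pointer
verify_tree() {
    root=$(cat "$disk/super")
    verify_parent "$root" || return 1
    read -r _ branch _ < "$disk/$root"
    verify_parent "$branch"
}

cow_commit 8 100 200 300                 # transaction 8 reaches the disk completely
verify_tree                              # both pointers check out
write_block 600 8                        # stale generation-8 content left at address 600
cow_commit 10 600 700 800 drop-leaf      # transaction 10, but the leaf write is lost
verify_tree || echo "tree is inconsistent"
```

Running it prints two successful checks for transaction 8. For transaction 10, the branch checks out but the leaf fails with parent transid verify failed on 600 wanted 10 found 8, mirroring the real kernel message.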

Linux uses cache flush and FUA (Force Unit Access) requests, previously known as barriers, to make sure that data has actually reached stable storage before the filesystem continues. This is very important because storage devices can reorder writes in their write cache to optimise performance. Btrfs depends on Flush/FUA to get its new tree blocks onto the disk before the superblock that points to them. That ordering keeps the on-disk state consistent, even if there is a power loss.

So, when there is a Parent Transid Verify Failed, it often means that the storage did not honour Flush/FUA as it should have! In other cases, a bus or device reset can drop in-flight data or data held in the drive cache, which can also lead to an inconsistent filesystem.
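To see what the kernel believes about a device's cache, you can read the queue attributes in sysfs:

  • write_cache reports write back when the kernel thinks the device has a volatile cache (and therefore sends flushes), and write through when it does not.
  • fua reports 1 when the driver supports FUA writes.

This is a minimal sketch. report_write_cache is a hypothetical helper, and its directory argument exists only to make it testable. Keep in mind that these attributes only reflect what the device claims, so a device that lies about flushing looks exactly the same here.

```shell
#!/bin/sh
# report_write_cache (hypothetical helper): print the kernel's view of each block
# device's volatile write cache and FUA support. The directory defaults to /sys/block.
report_write_cache() {
    dir=${1:-/sys/block}
    for q in "$dir"/*/queue; do
        [ -r "$q/write_cache" ] || continue                # skip entries without the attribute
        dev=$(basename "$(dirname "$q")")
        wc=$(cat "$q/write_cache")                         # "write back" or "write through"
        fua=$(cat "$q/fua" 2>/dev/null || echo unknown)    # 1 = FUA supported, 0 = not
        echo "$dev: write_cache=$wc fua=$fua"
    done
}
```

Running report_write_cache without arguments prints one line per device in the form name: write_cache=... fua=.... Writing to write_cache only changes how the kernel treats the device, not the drive's own cache, so it is not a way to disable the cache.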

Some other situations that can lead to transid errors are:

  • Logical bugs: the filesystem structures have not been updated and stored correctly.
  • Misdirected writes: the underlying storage does not store the data at the exact address expected and overwrites some other data instead.
  • The block storage device (hardware or emulated) does not properly flush and persist data between transactions, so writes from different transactions get mixed up.
  • Lost writes without proper error handling: writing the block appeared to succeed from the filesystem's point of view, but a problem in the lower layers was not propagated upwards.
  • Suspend sometimes confuses the kernel, the drive cache, etc., leading to data loss that causes the same type of errors.

Repairing the filesystem

Do not attempt to repair the filesystem on your own. Seek advice first from the developers in the #btrfs IRC channel or on the Btrfs mailing list. There are advanced mount options, such as -o ro,rescue=all and -o ro,usebackuproot, that can assist in recovery, but do not use them without first consulting experienced admins.

If the filesystem had already turned read-only before the parent transid verify failed errors appeared, they may be a false alarm. In that case, unmount it and try mounting it again.

It is usually not possible to make repairs and get back to a fully working state. If you can mount the filesystem read-only, use this opportunity to back up your files. If it cannot be mounted even read-only, btrfs restore is the last resort: it can ignore errors and copy files from the unmounted filesystem to another device.
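A btrfs restore session might follow the order in this sketch: first a dry run to see what can be found, then the actual copy. Some assumptions to keep in mind:

  • rescue_copy is a hypothetical wrapper.
  • The mounted check against /proc/mounts is rough, because device names may appear differently there (for example via /dev/mapper).
  • The target directory must be on a different, healthy filesystem.

The flags used (-D, -v, -i, -m, -x, -S) are regular btrfs restore options, but check its man page before relying on them.

```shell
#!/bin/sh
# rescue_copy DEVICE TARGET_DIR [dry-run|copy]   (hypothetical wrapper)
# DEVICE is the damaged, unmounted filesystem; TARGET_DIR must be on another filesystem.
rescue_copy() {
    dev=$1 target=$2 mode=${3:-dry-run}
    # Rough check: refuse to run if the device is listed as mounted
    if grep -q "^$dev " /proc/mounts 2>/dev/null; then
        echo "$dev is mounted; btrfs restore expects an unmounted filesystem" >&2
        return 1
    fi
    mkdir -p "$target" || return 1
    case "$mode" in
        # -D: dry run, only list what would be restored; -v: verbose
        dry-run) btrfs restore -D -v "$dev" "$target" ;;
        # -i: keep going past errors; -m: restore owner, mode and times;
        # -x: restore extended attributes; -S: restore symlinks
        copy)    btrfs restore -i -v -m -x -S "$dev" "$target" ;;
        *)       echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

For example, run rescue_copy /dev/vdb /mnt/rescue first, look through the list, and only then run rescue_copy /dev/vdb /mnt/rescue copy.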

Preventing Parent Transid Verify Failed

If the device lies about its support for Flush/FUA (barriers), there is no way for Linux to know when the data is actually on disk. It is therefore very hard to fully mitigate the risk.

If your device is in a USB enclosure, try switching to another enclosure or connect the drive directly to a SATA port. USB-SATA bridges used in enclosures relatively often fail to implement all ATA features, have poor error handling or simply lie about the device's capabilities. In one case I experienced, a USB enclosure did not implement the USB Attached SCSI (UAS) protocol properly and lost writes under heavy load. Disabling the UAS kernel module, so that the older usb-storage driver was used instead, helped in that case.
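The usb-storage quirks parameter can disable UAS for one specific enclosure instead of for every device. This is an alternative to disabling the uas module entirely and has not been tested here. Each entry has the form VID:PID:flags, where the u flag means "ignore UAS". The sketch below builds that value from lsusb output. uas_quirk_from_lsusb is a hypothetical helper, and picking the right lsusb line for your enclosure is up to you.

```shell
#!/bin/sh
# uas_quirk_from_lsusb (hypothetical helper): read `lsusb` lines on stdin and print a
# value for the usb-storage "quirks" parameter. The "u" flag stops the uas driver from
# binding to that VID:PID, so the device falls back to the usb-storage driver.
uas_quirk_from_lsusb() {
    # Extract "vvvv:pppp" after " ID ", append ":u", join multiple entries with commas
    sed -n 's/.* ID \([0-9a-f]\{4\}:[0-9a-f]\{4\}\).*/\1:u/p' | paste -sd, -
}
```

lsusb -t shows Driver=uas or Driver=usb-storage for each device, so you can confirm which driver is in use. The resulting value, for example 152d:0578:u (an example ID only), can be set in one of two places:

  • On the kernel command line, as usb-storage.quirks=152d:0578:u.
  • When usb-storage is built as a module, as options usb-storage quirks=152d:0578:u in a file under /etc/modprobe.d/.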

Disabling the drive's write cache is one possible mitigation. It costs some write performance, and how much depends on the drive, but it should prevent problems caused by writes being reordered in the cache. On SATA drives, hdparm can be used to disable the write cache.

hdparm -i shows the drive's identification information, including the WriteCache state:

# hdparm -i /dev/sda
/dev/sda:

 Model=SAMSUNG SSD 830 Series, FwRev=CXM03B1Q, SerialNo=S0Z4NEAC325687
 Config={ Fixed }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=500118192
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: unknown:  ATA/ATAPI-2,3,4,5,6,7

 * signifies the current active mode

hdparm -W 0 disables the write cache and hdparm -W 1 enables it again:

# hdparm -W 0 /dev/sda
/dev/sda:
 setting drive write-caching to 0 (off)
 write-caching =  0 (off)
# hdparm -i /dev/sda
/dev/sda:

 Model=SAMSUNG SSD 830 Series, FwRev=CXM03B1Q, SerialNo=S0Z4NEAC325687
 Config={ Fixed }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=500118192
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=no WriteCache=disabled
 Drive conforms to: unknown:  ATA/ATAPI-2,3,4,5,6,7

 * signifies the current active mode
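Because the setting can be lost, it can be worth checking it from a script. Running hdparm -W with no value only reads the current setting. The sketch below parses that output. wcache_state and check_drives are hypothetical helpers, and the parsing assumes the write-caching = 0 (off) line format shown above.

```shell
#!/bin/sh
# wcache_state DEV (hypothetical helper): print on, off or unknown for a SATA drive,
# based on the " write-caching =  N (...)" line printed by `hdparm -W DEV`.
wcache_state() {
    out=$(hdparm -W "$1" 2>/dev/null) || { echo unknown; return 0; }
    if printf '%s\n' "$out" | grep -q 'write-caching *= *0'; then
        echo off
    elif printf '%s\n' "$out" | grep -q 'write-caching *= *1'; then
        echo on
    else
        echo unknown
    fi
}

# check_drives DEV... (hypothetical helper): print each drive's state and return
# non-zero if any write cache is not confirmed to be off.
check_drives() {
    rc=0
    for dev in "$@"; do
        state=$(wcache_state "$dev")
        echo "$dev: $state"
        [ "$state" = off ] || rc=1
    done
    return "$rc"
}
```

For example, check_drives /dev/sda /dev/sdb fails if either cache is still on or its state cannot be read, for instance on a device that hdparm cannot query.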

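On NVMe drives, the volatile write cache can be disabled with nvme set-feature /dev/nvme0 -f 6 -v 0 -s. Feature 6 is the Volatile Write Cache, and -s asks the drive to save the setting across power cycles, which not all drives support. The current setting can be read with nvme get-feature /dev/nvme0 -f 6.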

Another option, which works for both ATA and SCSI drives, is smartctl --set wcache,off /dev/sda.

These settings are usually not persistent, so the write cache has to be disabled on each boot and again after each resume from suspend or hibernation.
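For the resume case, one possibility is a systemd-sleep hook; the original write-up does not describe this approach. systemd runs the executables in /usr/lib/systemd/system-sleep/ with pre or post as the first argument and the sleep type as the second. This is a minimal sketch under these assumptions:

  • The file name, WCACHE_DEVICES, the default device list and the helper functions are all hypothetical.
  • Not all NVMe drives let you turn the volatile write cache off.

```shell
#!/bin/sh
# Hypothetical systemd-sleep hook, e.g. saved as /usr/lib/systemd/system-sleep/disable-wcache
# and made executable. systemd calls it with $1 = pre|post and $2 = the sleep type.
# WCACHE_DEVICES is an assumption: list your own drives here.
WCACHE_DEVICES="${WCACHE_DEVICES:-/dev/sda /dev/nvme0}"

disable_wcache() {  # disable_wcache DEV
    case "$1" in
        /dev/nvme*) nvme set-feature "$1" -f 6 -v 0 ;;  # feature 6 = Volatile Write Cache
        *)          hdparm -W 0 "$1" ;;                 # SATA drives
    esac
}

on_sleep_event() {  # on_sleep_event pre|post SLEEP_TYPE
    [ "$1" = post ] || return 0    # only act after resume
    for dev in $WCACHE_DEVICES; do
        disable_wcache "$dev" || echo "could not disable write cache on $dev" >&2
    done
}

on_sleep_event "$@"
```

Nothing happens before suspend. After resume, each listed drive gets the same command you would otherwise type by hand.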

The Arch Wiki has a good write-up on how to use hdparm and make its settings persistent across reboots.