brett

The Stages of ZFS Data Loss Grief

hard drives stacked in a small computer

I use a widely known, inexpensive way to add SATA storage: the Dell PERC H310. I found this old Host Bus Adapter (HBA) a long while back. It can be flashed to IT mode by taping over a couple of PCI pins, which bypasses the hardware RAID so I can use software RAID instead. [1]

Since moving my home servers to Proxmox to manage virtualization, I set up disk passthrough to a VM managing my ZFS array. What could go wrong?

$ qm set 100 -scsi1 /dev/disk/by-id/…
$ qm set 100 -scsi2 /dev/disk/by-id/…
$ …

I’m sure this is fine.

Hours later, after a seemingly innocent reboot…

On the guest:

$ zpool list
no pools available

On the host:

$ zpool list
no pools available

Hmm.

1. Denial

$ zpool import tank
cannot import 'tank': I/O error
	Destroy and re-create the pool from
	a backup source.

Uh oh.

$ zpool import -F tank
cannot import 'tank': one or more devices is currently unavailable

This is not good.

2. Anger

My restic backups are stale because of some issues with my homelab. 🤦‍♂️

$ zpool import -N -o readonly=on -f tank
cannot import 'tank': I/O error
	Destroy and re-create the pool from
	a backup source.

Destroy and re-create the pool from a backup source.

At this point, most forums appear to suggest that the pool is lost forever.

3. Bargaining

Readonly should have worked 🤔

$ zpool import -N -o readonly=on -f -R tank
   pool: tank
     id: …
  state: ONLINE
status: Some supported features are not enabled on the pool.
	(Note that they may be intentionally disabled if the
	'compatibility' property is set.)
 action: The pool can be imported using its name or numeric identifier, though
	some features will not be available without an explicit 'zpool upgrade'.
 config:

	tank                        ONLINE
	  raidz2-0                  ONLINE
	    …                       ONLINE
	    …                       ONLINE
	    …                       ONLINE
	    …                       ONLINE
$ zpool import -F

Same output as above.

Online seems good, right?

$ zpool status
no pools available
$ zpool import -F -m tank
cannot import 'tank': one or more devices is currently unavailable

Well, here we go. Let’s find the txg to use for a rollback.

$ zpool import -FX tank
# seemingly hanging for a while…
^C^C^C^C

That option must not work the way I expected (forgive my impatience, dear reader).
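For the "find the txg" step, one possible approach is to read the uberblocks stored in each disk's label and note the newest transaction group. The sketch below is not what I ran at the time. It assumes `zdb -lu <device>` prints uberblock fields as `txg = <number>` lines, and `newest_txg` is a helper name made up for this example.

```shell
#!/bin/sh
# Sketch: print the newest txg recorded in each disk's ZFS label.
# Assumption: `zdb -lu <device>` prints uberblock fields as "txg = <number>".

# Read zdb-style text on stdin and print the highest txg seen (nothing if none).
newest_txg() {
    awk '$1 == "txg" && $2 == "=" && $3 + 0 > max { max = $3 + 0 }
         END { if (max > 0) print max }'
}

# Usage: sh newest-txg.sh /dev/disk/by-id/… [more devices]
# With no arguments the loop does nothing.
for dev in "$@"; do
    printf '%s: %s\n' "$dev" "$(zdb -lu "$dev" | newest_txg)"
done
```

If the disks disagree on their newest txg, that hints at how far back a rewind has to go. Recent OpenZFS man pages also describe a `-T txg` option for `zpool import`, marked as hazardous, so check the man page for your version before reaching for it.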

4. Depression

At this point I pull down the latest snapshot from Backblaze and assess the damage.

$ zdb tank
zdb: can't open 'tank': No such file or directory

ZFS_DBGMSG(zdb) START:
ZFS_DBGMSG(zdb) END

What have I done to myself.

5. Acceptance

$ restic snapshots
repository … opened (version 2, compression level auto)
ID        Time                 Host           Tags                   Paths
--------------------------------------------------------------------------
20ee6d7b  …                    restic-remote  restic                 /data

Deep breath.

$ restic restore 20ee6d7b --target ./data

6. ?

Ok, wait a minute. Let’s try that mysterious -X flag again, but with more patience.

$ zpool import -FX tank
# … waiting … staring … go get dinner … waiting … put baby to bed …

Exit 0! It worked!

$ ls /mnt/tank
files files files!

Immediately:

$ rsync -ahP /mnt/tank elsewhere:/mnt/pond/tank

I later learned from some folks on a ZFS forum that it's better to avoid disk passthrough for ZFS pools in VMs, though this may depend on the HBA controller.

Now, I pass the entire HBA controller to guest VMs instead of individual disks when using ZFS. Lesson learned.
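For reference, whole-controller passthrough on Proxmox can be sketched roughly as follows. The VM ID and PCI address here are placeholders, not my real values (look up the HBA's address with `lspci -nn` on the host), and this assumes IOMMU is already enabled. The script only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: pass a whole HBA to a Proxmox guest instead of individual disks.
# VMID and PCI_ADDR are hypothetical placeholders; adjust for your host.
VMID="${VMID:-100}"
PCI_ADDR="${PCI_ADDR:-0000:01:00.0}"
DRY_RUN="${DRY_RUN:-1}"

# Build the `qm set` command that attaches the PCI device to the VM.
hba_passthrough_cmd() {
    # $1 = VM id, $2 = PCI address of the HBA
    printf 'qm set %s -hostpci0 %s\n' "$1" "$2"
}

cmd="$(hba_passthrough_cmd "$VMID" "$PCI_ADDR")"
if [ "$DRY_RUN" = "1" ]; then
    # Default: show what would run without touching the VM config.
    echo "$cmd"
else
    $cmd
fi
```

The old per-disk `scsiN` entries would need to be removed from the VM config first, so the guest only sees the disks through the HBA.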

Thank you to the FreeBSD, TrueNAS, and r/zfs communities, and to the datahoarders.


1: Several guides cover flashing the Dell PERC H310 HBA to IT mode: a video walkthrough, a ServeTheHome post, and a TrueNAS forum thread.