ZFS: Woke up to drive pool not found this morning. At a loss as to what to do. Any suggestions would be welcomed!

I was having issues keeping the system running, so I turned the system off, ran a memory test, and replaced the memory. When I rebooted, the pool is unknown, and this is in the log file. Could booting from a CD and running a memory test mess with the headers on each drive? How is that possible?
Code:
Jul  7 05:59:04 CSS-NAS-12TB Trying to mount root from zfs:freenas-boot/ROOT/11.3-U5 []...
Jul  7 05:59:04 CSS-NAS-12TB GEOM: ada1: the secondary GPT header is not in the last LBA.
Jul  7 05:59:04 CSS-NAS-12TB GEOM_PART: integrity check failed (ada1, GPT)
Jul  7 05:59:04 CSS-NAS-12TB GEOM: ada2: the secondary GPT header is not in the last LBA.
Jul  7 05:59:04 CSS-NAS-12TB GEOM_PART: integrity check failed (ada2, GPT)
Jul  7 05:59:04 CSS-NAS-12TB GEOM: ada3: the secondary GPT header is not in the last LBA.
Jul  7 05:59:04 CSS-NAS-12TB GEOM_PART: integrity check failed (ada3, GPT)
Jul  7 05:59:04 CSS-NAS-12TB GEOM: ada4: the secondary GPT header is not in the last LBA.
Jul  7 05:59:04 CSS-NAS-12TB GEOM_PART: integrity check failed (ada4, GPT)
I have attached the log file. Yes, I know there are some bad sectors on ada3, but it has been that way since I built the system, and it has worked fine for over 11 months.
 

Attachments

  • Server log file.txt
    41.9 KB
I don't think that the memtest you did destroyed the gpart "headers" on the drives. More likely, the drives have somehow become inaccessible. It's probably not that the drives have self-destructed, since it hit all four of them. Suggestion: look at whatever connects to the drives (cabling, power supplies, HBAs).

When I say "headers", that's sort of a joke: there are two copies of the GPT partition information on the drive; the primary is at the beginning of the disk, the secondary at the end, so the secondary should really be called a "tailer". My suggestion: boot from something, see whether the drives show up at all as /dev/adaxxx, check their identity and size, and use gpart manually to examine what is wrong.
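For example, from a FreeBSD/FreeNAS shell, something along these lines should tell you whether the disks are visible and what gpart thinks of them (the device name ada1 is just an example; substitute whatever your disks actually show up as):
Code:
# List the disks the kernel currently sees
camcontrol devlist

# Check the identity and reported size of one disk (repeat for ada2..ada4)
diskinfo -v ada1

# Show the partition table as gpart sees it; with only the secondary
# GPT damaged, the table is usually flagged as CORRUPT but still readable
gpart show ada1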
 
Code:
GEOM: ada1: the secondary GPT header is not in the last LBA.
That suggests the drive changed its size at some point. It, somehow, became larger than when the original partition table was created.
 
Code:
GEOM: ada1: the secondary GPT header is not in the last LBA.
That suggests the drive changed its size at some point. It, somehow, became larger than when the original partition table was created.
That's still not even strictly an error. FreeBSD works fine with just the start-of-disk GPT label intact, and the gpart recover command exists to make the layout compliant with the GPT standard again. :)
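If that turns out to be the situation here, the fix is a one-liner per disk (sketch only, on an example device; run it only after gpart show confirms the primary table looks sane):
Code:
# Rewrite the secondary GPT at the current last LBA of the disk,
# rebuilt from the intact primary table at the start of the disk
gpart recover ada1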
 
That suggests the drive changed its size at some point. It, somehow, became larger than when the original partition table was created.
Sensible theory. Alas, the current size of the four disk drives can be found in the log file, and it is 5,860,533,168 sectors, which is very accurately 3TB (if you use decimal TB), so the current size of the disks is fine.
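For reference, the arithmetic (assuming the usual 512-byte logical sectors these drives report):
Code:
5,860,533,168 sectors * 512 bytes/sector = 3,000,592,982,016 bytes ~ 3.0 TB (decimal)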

Here is a nasty theory: In the bad old days, when UEFI and GPT first showed up, there were "helpful" BIOSes that would check the GPT partition tables, and if the backup at the end of the disk didn't match the copy at the beginning of the disk, they would automatically overwrite the backup copy, even if the thing at the beginning of the disk was not a valid partition table. Which is obviously insane and stupid, but that didn't stop them from "trying to be helpful".

We ran into that because our on-disk format didn't use partition tables (since we typically used many hundreds to tens of thousands of disk drives, partitioning them wasn't useful), and that aforementioned BIOS was clobbering actual data stored near the end of the disk. So we put protective GPTs on all our disks, in case other damn (dumb?) tools were also trying to be "helpful". This is when we learned that for the GPT mechanism to work correctly, the knowledge of the size of the disk needs to be perfect, otherwise the backup copy of the GPT is put in a slightly wrong place. And getting the size of the disk perfect is actually not completely trivial, for example on 512e drives with sector offsets, or with RAID controllers that reserve a few sectors at the end of the disk for themselves to store metadata.
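If you want to see where the backup header actually ended up, a quick read-only check is to dump the last LBA and look for the "EFI PART" signature. The device name and sector count below come from the OP's log, so adjust both for your own disk:
Code:
# The log reports 5,860,533,168 sectors, so the last LBA is 5,860,533,167.
# A valid secondary GPT header there starts with the "EFI PART" signature.
dd if=/dev/ada1 bs=512 skip=5860533167 count=1 | hexdump -C | head -4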

Add to that: the OP is running FreeNAS, their disks are in a SES enclosure, so it is possible that something in the software stack is messing with the size of the disk, or the backup GPT copy.

I like chungy's recommendation: Boot, figure out what's going on, try to use gpart commands (for example recover) to fix things.
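And once the partition tables look sane again, a rough sketch of the next step ("tank" is only a placeholder pool name; on FreeNAS you would normally do the import from the GUI instead):
Code:
# List pools that are visible but not yet imported (read-only, safe to run)
zpool import

# If the pool shows up cleanly, import it under /mnt
zpool import -R /mnt tank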
 
Just a thought: are the Western Digital WD30EZRX drives you are using in your ZFS NAS Shingled Magnetic Recording (SMR) drives?

I couldn't find any info myself on the Western Digital Green WD30EZRX, but the Red WD30EFAX is SMR. Since the more expensive Red drive is SMR, the Green version could be SMR as well; it would be unlikely for the cheaper drive to be CMR at the same capacity.
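Either way, it's worth confirming the exact model and firmware of what is actually in the machine. Drive-managed SMR disks generally don't advertise themselves as such, but at least you'd know precisely which model you are dealing with. A quick check from the shell (example device name):
Code:
# Print the model, serial and firmware of one disk (repeat for the others)
smartctl -i /dev/ada1

# Or, using only the base system:
camcontrol identify ada1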
 