Solved: Migrate disks to new server

Running 11.0 with a stock install of ZFS mirroring 2 disks. I want to move the physical disks to another server. When I shut the old server down and moved the disks, the kernel booted but I got GPT errors and it died at the mountroot> prompt.

The disks run just fine when put back in the old server.

It looks like I need to do a zpool export, but even with -f that will not work while / is running on ZFS, which is what the default install created. I tried booting from a USB stick; it sees the pool as ONLINE but will not export it.

What's the correct method to move mirrored ZFS disks to another physical server? There appears to be no documentation regarding this in the ZFS docs.
 
With / on zfs you should be able to move the disks without booting problems.
With the pool showing ONLINE when booting from USB, you should first check your boot partition with gpart show and see whether you boot via EFI or legacy BIOS.
The new server might default to booting in UEFI mode while your old server boots in legacy mode. If that's the case, disable UEFI booting on the new server.
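
For reference, roughly what to look for in that output (a sketch only; device names and partition layout will differ on your system):

Code:
# Run on the old server, booted normally or from rescue media:
gpart show ada0
# Then look at the partition types in the output:
#   an "efi" partition           -> the install boots via UEFI
#   a "freebsd-boot" partition   -> legacy BIOS boot code (gptzfsboot)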
 
No luck. The old server is an Asus with a P5MT-R board. The new server is a Dell SC1435 with the 2.2.5 BIOS (latest available, I believe). I do not think the Dell supports UEFI.

gpart show on the Asus shows efi.
Dell will not boot via USB (crashes to mountroot>).
Booting the Dell from CDROM with the drives installed, gpart show shows no pools.
Booting the Dell from the disks gives a GPT error/corrupt message.
 
Dell will not boot via USB (crashes to mountroot>).
Strange. That should work. It worries me that your computer can't do a normal USB boot. Are you sure you are using the correct FreeBSD USB media?

Booting the Dell from CDROM with the drives installed, gpart show shows no pools.
gpart show doesn't show pools, it shows partitions. So please give us more detail: If you boot the Dell from CDROM, with the two disks attached:
  • Do you see the two disk drives? Look in the output of dmesg or in /var/log/messages for the lines where the two disk drives are being identified. The best way to do this: on the old machine, look for lines like the following (a quick grep for these is sketched after this list):
    ada3 at ata1 bus 0 scbus4 target 0 lun 0
    ada3: <Hitachi HDS5C3030ALA630 MEAOAA10> ATA8-ACS SATA 3.x device
    ada3: Serial Number MJ0351YNG9RZ6A
    Write down the manufacturer, model, and serial number of the drives, move them to the Dell machine, and make sure similar lines (same manufacturer, same model number, perhaps a different /dev/adaX device) show up.
  • If that succeeds, and the Dell actually sees the disk drive hardware (for example at adaX), then please post the output of gpart show adaX, perhaps even with explicit partition names if you have them using gpart show -l ...
That will help verify that the disks are actually accessible hardware wise, and what the GPT on them really contains.
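
A quick way to pull those identification lines out on either machine (a sketch; the exact device names and message wording vary):

Code:
# From the current boot messages:
dmesg | grep -E '^(ada|da)[0-9]'
# Or, if the boot messages have already scrolled away:
grep -E '(ada|da)[0-9]+: <' /var/log/messages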

Booting the Dell from the disks gives a GPT error/corrupt message.
Who gives that error message? FreeBSD or the BIOS? When does it happen, what messages come right before and after it? Can you post the exact message?

By the way, my #1 complaint about people asking for help is that they don't give the details of error messages (and I've spent a lot of time assisting with technical support for large commercial systems installed at very sophisticated customer sites, which have the same problem). In order not to come across as obnoxious, I'll turn it into a joke: If you tell me that your file system isn't working and is giving "some error message about something", I can't help you, because I don't know how to fix "something". If you cut and paste the exact error message: "Error: connection to disk /dev/ada9 has been disrupted, a purple elephant has yanked out the SAS cable and stuck his nose into the SAS connector instead.", then I can give you technical advice: Send the elephant back to the zoo, reconnect the disk, check whether it also stepped on some power cables, and try again.
 
Strange. That should work. It worries me that your computer can't do a normal USB boot. Are you sure you are using the correct FreeBSD USB media?

Yes, correct media. Known problem. Something horrible was done to the loader after FreeBSD 10.3. Prior releases work fine; from that point on, USB booting is utterly broken on older hardware and you have to use CD/DVD. I can dig up details but do not want to clutter this post.


gpart show doesn't show pools, it shows partitions. So please give us more detail: If you boot the Dell from CDROM, with the two disks attached:
  • Do you see the two disk drives? Look in the output of dmesg or look in /var/log/messages, and look for the lines where the two disk drives are being identified. The best way to do this would be: Look on the old machine for lines that look like:
    ada3 at ata1 bus 0 scbus4 target 0 lun 0
    ada3: <Hitachi HDS5C3030ALA630 MEAOAA10> ATA8-ACS SATA 3.x device
    ada3: Serial Number MJ0351YNG9RZ6A
    Write down the manufacturer, model, and serial number of the drives, move them to the Dell machine, and make sure similar lines (same manufacturer, same model number, perhaps a different /dev/adaX device) show up.
Sorry, I usually shoot pix but my phone broke (been a bad couple of weeks). Got a cheap replacement today. Here:

IMG_20170622_001036[1].jpg

IMG_20170622_001053[1].jpg
  • If that succeeds, and the Dell actually sees the disk drive hardware (for example at adaX), then please post the output of gpart show adaX, perhaps even with explicit partition names if you have them using gpart show -l ...
That will help verify that the disks are actually accessible hardware wise, and what the GPT on them really contains.

No luck:
IMG_20170622_001843[1].jpg



Who gives that error message? FreeBSD or the BIOS? When does it happen, what messages come right before and after it? Can you post the exact message?

FreeBSD:
IMG_20170622_002554[1].jpg



By the way, my #1 complaint about people asking for help is that they don't give the details of error messages

Understood. Though you can be a lot more obnoxious if you solve the problem :)
 
You said ZFS root on a mirror... have you tried swapping the boot order of the disks (or swapping cables) in the new system? (Long shot.)

Anything else interesting about the setup? Encryption? Custom kernel? (Guessing not, you said stock.)
 
First, the USB boot problem: I didn't know that it has become broken. That's an unpleasant surprise. Sorry for harassing you about that.

Second, the GPT/GEOM problem: Very weird. To begin with, your da[01] device driver lines look pretty normal; both disks da0 and da1 are being recognized correctly. They are connected to an LSI Logic SAS HBA (= disk controller); that comes from the "mpt0" being mentioned. That explains why their device names are "da" and not "ada": they are seen as SCSI devices by the FreeBSD kernel, with the LSI card providing a SCSI emulation of the native SATA disks.

OK, and this immediately brings up a mystery: If you look at the disks' model numbers, you see they are Hitachi HDN724040AL, which is a fine 4TB drive, a SATA desktop NAS model. It's trivial to find the spec for the drive on Hitachi's web site. Observe two important things: The disk is native SATA, and it has 4TB capacity. But: The OS doesn't know its size! Or to be more accurate: If you look at the capacity line, it shows 0xFFFFFFFF = 4.2 billion sectors, or a capacity of 2TB. That immediately explains why GEOM complains: It tries to read the first copy of the GPT table (which is at the beginning of the disk), then it tries to read the second or backup copy (which is at the end). The "end", as far as the OS can tell, is only about 2TB into a 4TB disk. My educated guess is that when GEOM tries to read the second copy, it gets invalid content (something that is not formatted as a GPT), and prints a generic "corrupt or invalid" message. EDIT: I made a mistake in the original post, and calculated the reported capacity as 2PB, way too high.
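
To spell out the arithmetic (assuming 512-byte logical sectors, which is what these drives present to the OS):

Code:
0xFFFFFFFF                 =  4,294,967,295 sectors
4,294,967,295 * 512 bytes  ~= 2.2 * 10^12 bytes (just under 2 TiB)
a real 4TB drive           ~= 7.8 * 10^9 512-byte sectors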

So what do we learn from that: There is something wrong with your storage stack, somewhere between the disk and the FreeBSD kernel (including both ends), which makes it impossible to read the disk capacity correctly. That's disturbing, because these are good quality disks, the LSI mpt is a fine controller, and FreeBSD isn't completely broken. Who is the culprit? Don't know, but my first guess is the LSI controller and its SCSI emulation.

Suggestions? Not many practical ones.
  • On the Dell machine, do you happen to have any other ports you can plug these disks in? Maybe some SATA ports directly on the motherboard? Do you have a different SAS or SATA controller card that you can use instead? Instead of fixing the problem, just circumvent it.
  • Check the firmware version of the drive and of your LSI card. On the drive, it is version A5E0; to find out whether that's current, you probably have to contact Hitachi tech support. Next question: Look at the "mpt" lines in your dmesg file, and find out the firmware version on your LSI card. If the FreeBSD mpt driver doesn't report the firmware version, then boot again, and go into the BIOS setup screen of the LSI controller. Then check whether it's up to date enough; that probably requires contacting LSI (now known as Broadcom, formerly as Avago) tech support. It might be a good idea to do firmware upgrades on the disk and the controller, but that's a very tall order for someone who doesn't have the tools ready.
  • If the first two suggestions have failed (they probably will, as they are impractical): Try the following commands, to see what else works and doesn't with these disk drives. First camcontrol identify and ... readcap, to see that a SCSI inquiry command makes it through to the drive unmolested, and to see what the storage stack thinks the capacity of the drive is. Then try reading the first little bit of the disk, which contains the first copy of the GPT, with dd if=/dev/daX of=/tmp/save.the.gpt bs=4096 count=64. (Concrete commands are sketched right after this list.)
  • The only real hope is that someone who is familiar with the FreeBSD mpt stack sees this discussion and comes up with a sensible suggestion.
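
A rough sketch of those drive-level checks (daX is whatever device name the Dell assigns to the drives):

Code:
camcontrol identify daX      # drive's own identify data; if that fails, try
camcontrol inquiry daX       #   the SCSI path, since the HBA presents the disks as SCSI
camcontrol readcap daX       # capacity as seen through the HBA / storage stack
# Save the first 256 KiB of the disk (contains the primary GPT) for inspection:
dd if=/dev/daX of=/tmp/save.the.gpt bs=4096 count=64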
 
Will try swapping disks; I did not think of that. Given Ralph's post, it probably won't help, though.

You said ZFS root on a mirror... have you tried swapping the boot order of the disks (or swapping cables) in the new system? (Long shot.)

Anything else interesting about the setup? Encryption? Custom kernel? (Guessing not, you said stock.)

Nothing unusual about the setup.
 
Seems like the Dell SC1435 is limited to 2GB disk drives. Never even considered that would be an issue.

I don't think there is a workaround. Thanks for the comments; at least they led to an answer.
 
Seems like the Dell SC1435 is limited to 2GB disk drives.
No, that makes NO sense. A quick look at Dell's website shows that this is a dual-socket server with one or two AMD Opteron 2200 chips. All Opteron are 64-bit chips. It has a handful of PCIe slots. This is a relatively modern machine, long past the 2GB limitation (which was overcome pretty universally in the late 90s).

And once the OS runs, the machine itself is out of the picture: FreeBSD talks to its mpt(4) device driver, which talks to the LSI logic hardware and firmware, which talks to the disk. The only thing that matters is the OS, the firmware, and the CPU (which is a modern Opteron). The BIOS is not involved.

Also, the disks are not being reported as 2GB ... they are being reported as 2TB (a thousand times larger), while in reality they are 4TB. And the disk size reporting is simply broken: their size is being reported as "-1" sectors (0xFFFFFFFF sectors). EDIT: The original post here said "reported as 2PB", not 2TB. Oops, my fault.

Question: Exactly what model LSI SCSI host adapter do you have in that server? All LSI adapters that are younger than about 10 years can use disks greater than 2TB. Please search your dmesg output for lines that talk about "mpt". Don't give up hope yet (but if you have more important things to work on, ignore me).
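
In case it helps, a sketch of how to pull the controller identification out (the exact output format is from memory):

Code:
# Driver probe lines usually name the chip and firmware:
dmesg | grep -i '^mpt'
# The PCI vendor/device strings also identify the card:
pciconf -lv | grep -B 3 -A 4 -i 'mpt'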
 
No, that makes NO sense. A quick look at Dell's website shows that this is a dual-socket server with one or two AMD Opteron 2200 chips. All Opteron are 64-bit chips. It has a handful of PCIe slots. This is a relatively modern machine, long past the 2GB limitation (which was overcome pretty universally in the late 90s).

I think he meant 2TB, not 2GB (which was an artificial limitation on RAM size, enforced by crippling PAE on some variants of Windows). The disks are also reported as ~2TB (2,097,151 MB).
The disk size limitation would be completely unrelated to the CPU but inflicted by the storage controller (for the same underlying reason: 32-bit address space).
This ancient machine only has an embedded SATA2 controller - lots of them had disk size limitations of ~2TB. I still have some 3ware and Adaptec SATA2 HBAs/RAID controllers from the mid-"00s" somewhere in the basement which were replaced exactly for that reason. The behaviour of these controllers when faced with bigger disks ranged from reporting only the supported size, to not recognizing the disks at all, to completely locking up the system at boot. With one controller that only reported (IIRC) 2.3TB, data was corrupted after a few GB of writes. I think this is what spec sheets actually mean by "undefined behaviour" ;)

If this dinosaur really has to be the new server, try using a newer HBA - older LSI SAS2008-based cards like the IBM M1015 are available very cheaply nowadays and still offer reasonable performance and reliability for a home server. PCIe is backward-compatible, so these HBAs *should* work in that server even if it only has PCIe 1.0 slots.
OTOH: This machine might draw so much power that a newer platform could pay for itself within a few months... The next limitations you might run into with this Dell server are the 32GB RAM maximum and the fact that it only has one PCIe and one PCI-X slot...
 
sko: You're probably right. I had my mind on the wrong track, by thinking that the "mpt" controller in use here must be a PCIe card. In reality, it's probably the built-in controller on the motherboard, and as such it is probably from 2003. That controller could have the SCSI/SAS 2TB limitation. And: In the posts above, I calculated the reported capacity wrong: 0xFFFFFFFF sectors is indeed 2TB, not 2PB. My bad!
 
Yes, I meant 2TB, as reported by the BIOS. There is an actual card, and the boot messages report it as SAS5ira and LSILogic SAS1068-IR. I'll just keep the drives in this thing under 2 TB.

My plan was to use this server to replace a box that has been running since 1998. Guess I'll need to look for something a little newer.
 
I found a datasheet for that chipset from Supermicro: http://www.supermicro.com/manuals/other/LSI MegaRAID_Configuration_for_the_LSI_1068_Controller.pdf

This controller definitely has a 2TB limitation, so I'd be _very_ cautious about using larger disks on it. I'd also recommend you flash the IT firmware to get the HBA into passthrough mode (that's what I have done with all SAS2008/2308/3008 controllers I've deployed for ZFS). Older controllers/firmware are notorious for their weird proprietary on-disk formats when using any RAID mode or JBOD, and even if it "works" you might not be able to access those disks on another controller.

ZFS is very resilient, but wonky firmware that may even fail to address the drive correctly (possibly with an overflowing/wrapping address pointer) is a master recipe for disaster. So at least get a more modern HBA (as said: LSI SAS2008 HBAs are selling very cheaply nowadays) or, if some budget is available, maybe even a newer system.
Lots of used Xeon E3-12xx v3 systems are currently hitting the market as they are old enough to fall into the regular replacement cycle of many companies. For a small home storage system even an Atom C2550 is easily sufficient, and new boards are available for under $300.

I also recycled lots of (VERY) old hardware for my home servers until a few years ago, when I realized that the power consumption and the constant upgrading and replacement of dying parts were more expensive in the long run than just buying some new(er) hardware and only replacing/upgrading the hard drives for the next few years...
 
mpt(4) is limited to 2 TB drives, regardless of what firmware you load onto it. It's a hardware limitation. We ran into this bug a while back on one of our ZFS servers. SAS1028/SAS1038 is the chip version.

mps(4) is the newer version of the LSI driver, which supports drives over 2 TB. Requires different hardware, though. SAS2xxx or newer.
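
If it helps, a quick way to see which of the two drivers has claimed a given controller (a sketch; the chip lists are abbreviated):

Code:
dmesg | grep -E '^(mpt|mps)[0-9]'
# mptX lines -> mpt(4) hardware (SAS1064/SAS1068/..., 2 TB per-drive limit)
# mpsX lines -> mps(4) hardware (SAS2004/SAS2008/SAS2308/..., handles >2 TB drives)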
 
mpt(4) is limited to 2 TB drives, regardless of what firmware you load onto it. It's a hardware limitation. We ran into this bug awhile back on one of our ZFS servers. SAS1038 is the chip version.

If this limitation is present on all devices/controllers that use the mpt driver, it might be sensible to add a comment (e.g. a "known issues" section) about this to the man page?
At least mpilib/mpi_log_sas.h mentions this:
Code:
mpilib/mpi_log_sas.h:/* Compatibility Error : IME size limited to < 2TB */
But there seem to be no further notes on any 2TB limitation in other files (or my grep-fu was not strong enough :oops:)
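
For the record, the kind of search I mean (a sketch; assuming a source tree installed under /usr/src):

Code:
grep -rni '2tb' /usr/src/sys/dev/mpt/
grep -rni 'capacity' /usr/src/sys/dev/mpt/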
 
mps(4) is the newer version of the LSI driver, which supports drives over 2 TB. Requires different hardware, though. SAS2xxx or newer.
A few months ago I bought an LSI SAS9207-8i to replace a broken (and crappy) Promise SATA-300 card. Best money I ever spent.

Code:
root@molly:~ # mpsutil show adapter
mps0 Adapter:
       Board Name: SAS9207-8i
   Board Assembly: H3-25412-00K
        Chip Name: LSISAS2308
    Chip Revision: ALL
    BIOS Revision: 7.39.00.00
Firmware Revision: 20.00.02.00
  Integrated RAID: no

PhyNum  CtlrHandle  DevHandle  Disabled  Speed   Min    Max    Device
0       0001        0009       N         6.0     1.5    6.0    SAS Initiator
1       0002        000a       N         6.0     1.5    6.0    SAS Initiator
2       0003        000b       N         6.0     1.5    6.0    SAS Initiator
3       0004        000c       N         6.0     1.5    6.0    SAS Initiator
4       0005        000d       N         6.0     1.5    6.0    SAS Initiator
5       0006        000e       N         6.0     1.5    6.0    SAS Initiator
6       0007        000f       N         6.0     1.5    6.0    SAS Initiator
7       0008        0010       N         6.0     1.5    6.0    SAS Initiator

Works fine with the 4 x 3TB disks currently attached. Planning to attach a couple more disks (the pool is nearly full).
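
If anyone wants to double-check that the full capacity is visible through a given controller, a quick sketch (the device name is just an example):

Code:
diskinfo -v /dev/da0   # mediasize should match the drive's labeled capacity
gpart show da0         # and the partition table should span the whole disk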
 