ZFS replay transaction error 2

Has anyone seen this one before?

Code:
Solaris: WARNING: ZFS replay transaction error 2, dataset zfs002/blm15, seq 0x5130, txtype 8

This is on an 8.2-RELEASE system. The hardware is a Supermicro X9SCM-F motherboard with two LSI 9750-4i4e controllers. The first controller has 12x Hitachi 0F12456 3TB Enterprise drives; the other was added yesterday with 24 more of the same. The motherboard sports 4 SSDs for boot, ZIL, and L2ARC. A single zpool is made up of 17 mirror pairs and two spares.

We upgraded yesterday to add the 2nd LSI card and relocated 20 drives from the original controller to it. Then we added 4 more drives. The system ran for about two hours and then started locking up.

It appears that IO to/from the second card freezes, though tw_cli shows it's up and functioning properly. There were no error messages of any kind during these lockups.

This morning we swapped in a spare LSI card in case the new card was the source of the problem. It may have been. While we still get a lockup, we now get the above error message that we didn't see before. Any ideas on what's going on?

Thanks!
Gary
 
gaulfinger said:
Code:
Solaris: WARNING: ZFS replay transaction error 2, dataset zfs002/blm15, seq 0x5130, txtype 8

This morning we swapped in a spare LSI card in case the new card was the source of the problem. It may have been. While we still get a lockup, we now get the above error message that we didn't see before. Any ideas on what's going on?
The "error #" part reports a Unix errno of 2, ENOENT - No such file or directory. The rest of the message indicates where in the pool it thinks the error is.

The message comes from src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c.
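
If you want to double-check the errno mapping yourself, the numbers are defined in the system headers; on a stock FreeBSD install something like this will show it:

Code:
# errno 2 is ENOENT on FreeBSD
grep -w ENOENT /usr/include/sys/errno.h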

Before proceeding further, do you have a tested backup of the data somewhere else, or some other way to recreate the data if you lose the pool?
 
Terry,

Thanks for the info. There is recourse for the data. Much of it is itself backup snapshots, though there's a lot of history in it.

My main concern is making sure there isn't a ZFS-level problem. From your post, it sounds like a Unix-level response passing through from an underlying problem. If that's the case, then we should be okay if we can get the lower levels (SAS controller, BIOS, etc.) running smoothly. We have a ticket open with LSI that may shed light on the lower levels.
 
gaulfinger said:
There is recourse for the data. Much of it is itself backup snapshots, though there's a lot of history in it.
Ok, see below for some ideas.

My main concern is making sure there isn't a ZFS-level problem. From your post, it sounds like a Unix-level response passing through from an underlying problem. If that's the case, then we should be okay if we can get the lower levels (SAS controller, BIOS, etc.) running smoothly. We have a ticket open with LSI that may shed light on the lower levels.
The first thing to do is to make sure there isn't a 3Ware firmware or driver-level issue that's mangling the data. It looks like the latest from 3Ware is from September 14th of this year (link), so it is unlikely your card shipped with the latest firmware.
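
As a quick check (the exact output format may vary between tw_cli versions, so treat this as a sketch), you can read the running firmware off each controller with something like:

Code:
# list controllers, then show controller 0's details and pick out the firmware line
tw_cli show
tw_cli /c0 show all | grep -i firm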

It is unlikely that there is a physical error, as I'd expect that to produce an EIO error (5), and you didn't report any console messages about command timeouts, disk errors, and so on.

The next thing I'd suggest would be a # zpool scrub. Hopefully, that will complete and identify the file(s) which have corrupted data. You can almost certainly accomplish that using the FreeBSD install you're currently running.
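
Going by your error message, the pool looks to be named zfs002, so (assuming that's right) the commands would be along these lines:

Code:
zpool scrub zfs002
# watch progress and see whether any files get flagged as damaged
zpool status -v zfs002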

If that doesn't clear the error, you'll need to move on to more complex things with a corresponding risk to data (or at least needing to upgrade your operating system - read on).

You can download one of the mfsBSD ISO images and boot from it, either the 8.2 + ZFS v28 one or the 9.0 Release Candidate one. Repeat the scrub but DO NOT use the zfs / zpool "upgrade" commands - those would make your pool un-importable on 8.2-RELEASE. Scrubbing using newer code (with many ZFS fixes) may correct the error. You will probably need to # zpool import the pool for the CD to operate on it.
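
A rough sketch of the import step from the mfsBSD environment, again assuming the pool is named zfs002 (the -f is typically needed because the pool was last in use on a different system/hostid):

Code:
# see which pools the live environment can find, then force-import without upgrading
zpool import
zpool import -f zfs002
zpool scrub zfs002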

The last thing to try would be to use the zfs / zpool upgrade commands (both) to upgrade the pool from version 4/15 to version 5/28. That will let you detach the ZIL device from the pool. [Your error is coming from an attempt by the ZIL to apply some pending changes.] At that point you can try the scrub again. Note that any pending changes in the ZIL will be discarded. If you upgrade the pool, you will no longer be able to mount it in 8.2-RELEASE since that version doesn't know about the newer ZFS components. You can upgrade to 8-STABLE, which does have the new ZFS code.
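
For reference, a sketch of that sequence, assuming the pool is zfs002 and using gpt/zil0 as a stand-in for whatever device zpool status lists under "logs":

Code:
zpool upgrade zfs002           # pool version 15 -> 28
zfs upgrade -r zfs002          # file system versions
zpool status zfs002            # note the device shown under "logs"
zpool remove zfs002 gpt/zil0   # detach the separate log device (needs pool v19 or later)
zpool scrub zfs002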

Note that there is a chance of getting a FreeBSD panic when scrubbing or otherwise operating on broken pools. If you have the ability to log the console (serial port, remote console, etc.) it would be handy to have that going, since otherwise the underlying fault may scroll off the screen.
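
If you have a serial port free (or IPMI serial-over-LAN), a minimal way to get the kernel console onto it is something like this in /boot/loader.conf:

Code:
# use the first serial port as the console (vidconsole keeps the screen active too)
console="comconsole vidconsole"
comconsole_speed="115200"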

You might also want to wait for a ZFS developer to reply before trying any of my suggestions - I'm just another user (with some large - 32TB - pools) with a bit of experience with ZFS corruption.
 
Firmware was recent, at 5.12.00.007, on the first controller and the same on the card added 10/30. After the problems started with the second controller, it was upgraded to the latest, 5.12.00.013. If anything, the hangs happened more rapidly afterward.

When we suspected that the 2nd controller could be faulty, its replacement was back on 5.12.00.007, again matching the functioning controller. So right now both controllers are on 5.12.00.007. One thing we haven't tried is upgrading both to *13... Might be worth trying.

The zpool scrub is one of the "heavy loads" that locks it up, typically with <1% scanned. No errors reported from any scrub attempt. Best scan ran about 13% with no errors before it locked. Progress is different every time, which makes me think it's not tripping on any single drive failure, but something more "random." Until they locked, the scrubs were running well, at about 900 MB/s and 6,500 IOPS.

Thanks for the tip on mfsBSD. I'll look into that. I was planning to migrate to 9.0-RELEASE after it's generally available. I wouldn't want to upgrade so soon because of this :) But if we do have undetected errors, that could be the way to find them.

Thank you again!
Gary
 
gaulfinger said:
The `zpool scrub` is one of the "heavy loads" that locks it up, typically with <1% scanned. No errors reported from any scrub attempt. Best scan ran about 13% with no errors before it locked.
When you say it "locks up", can you provide more details? In particular, on the console, can you switch virtual consoles with Alt-Fn? If you type a character, does it echo? Does the scrub process respond to Control-T?
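
If the box still takes input at that point, it may also be worth seeing where the hung process is sleeping in the kernel; something like the following (assuming procstat's -kk kernel-stack option is available on your 8.2 install, and substituting the real PID):

Code:
# find the stuck zpool/scrub process and its wait channel, then dump its kernel stack
ps -axl | grep zpool
procstat -kk <pid>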

Thanks for the tip on mfsBSD. I'll look into that. I was planning to migrate to 9.0-RELEASE after it's generally available. I wouldn't want to upgrade so soon because of this :) But if we do have undetected errors, that could be the way to find them.
9.0-RELEASE should have about the same ZFS code as 8-STABLE, so you could upgrade to that and then go to 9 once there's some field experience with it.
 
The OS runs and the console is fine. I can ssh in, and I can use my IPMI web/Java console. It's just storage on that second controller that is locked, which in turn locks all ZFS IO. Only disk-related commands hang: for example, zpool scrub hangs, as does any ls -lR or cat FILE... run after the moment it locks. So far a simple ls has not hung - probably cached metadata supports that without additional reads. Hung processes pile up and can only be cleared by a reboot.

One test I ran to explore the extent of storage responsiveness was fdisk /dev/daX for each drive number. I could get partition info for every device before a lockup. Then after we observe IO stopping (zpool iostat goes to zero on all measures), any drive on controller 2 hangs the fdisk command, but controller 1 responds as expected.

I also did the same with a dd if=/dev/daX of=tmp count=1 when I was checking drives for valid headers. There were appropriate labels on all drives, except that the controller 2 drives would hang the dd after a while.
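
In case anyone wants to repeat those probes, they amount to roughly this kind of loop (assuming 36 da devices; adjust the count for your layout, and expect the loop itself to hang once it reaches a drive behind the wedged controller):

Code:
# probe the partition table and the first sector of each drive in turn
for n in $(jot 36 0); do
    echo "== /dev/da$n"
    fdisk /dev/da$n > /dev/null
    dd if=/dev/da$n of=/dev/null bs=512 count=1
done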
 
Both controllers have been upgraded to the latest firmware release. Still having the problem. We're building a boot USB to run offline diagnostics.
 
Well, things got worse. Mid-boot into the diagnostics, the system would lock up with a "tws_map_request" message and go no further.

LSI was still working the problem, probably contention between the two cards. During our booting and testing, we actually saw the problem transfer from the 2nd card to the first!

We cut the diagnostics short and solved the problem by migrating drives from a pair of 9750 controllers to a single 9211-8i HBA. The new card with its new mpslsi.ko driver rocks! Just a simple zpool import and everything is running smoothly. We're running a regular 8.2 scrub now.
 
gaulfinger said:
LSI was still working the problem, probably contention between the two cards. During our booting and testing, we actually saw the problem transfer from the 2nd card to the first!
Support for the 3Ware cards is very good: I found a bug in the 9650 (twa) driver which only showed up under ZFS stress testing; they had a patch for me within a week, and an update was committed to CVS within a month.
 