ZFS FreeBSD won't reboot after a huge copy

Hi,

First, I don't know if I'm posting in the right place, so please excuse me both for the possible mistake and for my average English.

I've been using FreeBSD 10-RELEASE for a while now and I'm used to automatic root-on-ZFS setups. I recently ran into a really big issue: FreeBSD fails to boot after copying a huge (~1.6 TB) set of data to a ZFS dataset on the root pool. I only get a blinking cursor on the screen, so I suppose it's either storage- or bootloader-related.

I'll give you the full story: I ordered a Dell R220 server with 8 GB of ECC memory and two 3 TB drives. The idea was to use it as a replacement for an aging NAS showing strange behavior. The FreeBSD setup went fine, as did the few changes needed to get an NFS server running. To keep the data separate from the system and keep the /etc/exports file simple, I created a new dataset on the pool that I called "medias".
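Roughly, the dataset and the export were no more than this (the network in the exports line is a placeholder, not my real one):
Code:
# create the dataset and mount it at /medias
zfs create -o mountpoint=/medias gaiafs/medias

# the single line added to /etc/exports (placeholder network)
/medias -network 192.168.1.0/24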

Then, since the data is available over the network, I mounted the old NAS over smbfs and started the huge copy from a freshly installed tmux session. I did no more than:
Code:
mount -t smbfs //user@address/of/the/share /mnt/share
cp -a /mnt/share/* /medias/

The copy went fine until, after ~20 hours, I had to put the new server in its final location, so I halted it. At the next boot after installation, FreeBSD wouldn't boot, showing only a | and a blinking cursor. I tried both FreeBSD 10 and FreeBSD 9.3-RELEASE, and tried changing the dataset name and compression options. Nothing changed.

Is there something I'm missing? Has anyone ever had this kind of issue? I'm a bit worried, to be honest.
 
Superficially, this makes no sense. Copying data in and out of a file system, in particular from a user (not root) account, should not be able to damage the system in any fashion. Even if you perform this copy as root, and run the root file system out of space, the machine should be able to reboot (although there will probably be quite a few error messages when booting). Sadly, we are missing a lot of details.

Did you ever successfully reboot after the first install? What happens if you try to install with the system removed from the network? Do you even get the boot menu? Have you tried disconnecting your disks, and booting from install media, to verify that the hardware is working? If that works, what happens if you boot from install media or a rescue disk but with your regular disks attached? Can you see the content of the disk (in the sense of verifying their partition tables, file system root blocks, and perhaps zpools)? If you boot into the BIOS's setup screen, can you see all the disks you expect?

My first hunch would be: hardware problem. It seems that the boot loader can't even get to your disks any more, and attempts to access them hang.
 
Well, I agree: if I happened to read this, I would suspect a hardware problem myself. However, this happened at work and I have a lot of hardware to test with. The first thing I tried was changing the hardware: we ordered more than one R220, since we are going to replace several aging servers with these. Unfortunately, it happened again on another Dell PowerEdge R220.

The first reboot after installation was fine. The system was installed from a freshly downloaded FreeBSD image dumped to a USB key. Since I have an mfsBSD image available for network booting, I managed to boot the system with mfsBSD and import the ZFS pool without problems. That's when I really started to worry: zpool status reported that everything was fine.

What shocked me even more was that on the third try, I dumped random data (from /dev/random) onto the ZFS pool and was still able to reboot after ~100 GB had been written. I then started the huge copy again, only to find the next day that the server was once again unable to reboot.
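For the record, that random-data test was nothing more sophisticated than something like this (the file name and block size are from memory):
Code:
# write ~100 GB of random data onto the pool
dd if=/dev/random of=/medias/random-test.bin bs=1m count=102400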

If you need any further information, I can make a fourth try and report back anything helpful.
 
This is really wild. To begin with, you are using good-quality, stable hardware, and a file system and OS that are "known to be not totally broken". Since you rebooted at least once, your install was successful (I've seen systems get installed with most of the data still in the memory cache, run for a week, and then never be able to reboot).

Is the following assumption correct: the two drives are boring, normal internal drives, hooked up via SATA directly to the motherboard? Nothing exotic, like the two 3 TB drives being an iSCSI logical unit served via TCP/IP transported over InfiniBand from a NetApp server, or Shark arrays connected via an old Brocade you found in the trash bin? You are not using strange or unusual disk adapters or RAID controllers?

My only idea: After it crashes, try to boot from recovery media. Then use that to examine the status of the two big drives. Are their partition tables OK (check with gpart)? What happens if you examine the two drives with zpool?
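Concretely, something like this from the rescue environment should be enough for a first look, assuming the drives show up as ada0 and ada1:
Code:
gpart show ada0 ada1   # partition tables of both drives
zpool import           # lists importable pools and their health without importing anything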

I think wasting another day on a fourth try is probably not very useful; there is probably more to be learned from doing an autopsy on the dead body first. But if you want to try something different: why don't you install WITHOUT using ZFS-as-root? Make a small partition on one of the drives, 64 GB or so, and do a traditional UFS-based install on that. Once that works, add the remaining disk space on the two drives as a big ZFS file system. Then repeat your test. That might distinguish whether the problem is the hardware (if you just run the system long enough, disk writes go to the wrong place) or the software (ZFS self-destructs after a while).
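A rough sketch of that partitioning, assuming the drives are ada0/ada1 and you still want the data part mirrored; sizes, labels and the pool name are only examples:
Code:
# on each drive (shown for ada0; repeat for ada1 with the labels ending in 1)
gpart create -s gpt ada0
gpart add -t freebsd-boot -s 512k -l gptboot0 ada0
gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 ada0
gpart add -t freebsd-ufs  -s 64g -l ufsroot0 ada0
gpart add -t freebsd-swap -s 2g  -l swap0 ada0
gpart add -t freebsd-zfs         -l zfs0 ada0
newfs -U /dev/gpt/ufsroot0

# then one mirrored pool on the two big partitions
zpool create datapool mirror gpt/zfs0 gpt/zfs1

The base system install itself still has to go onto the small UFS partition; only the partitioning that differs from the ZFS-on-root layout is sketched here.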
 
Well, the first time it happened I thought it was an installation or copy problem; for all the following tries I carefully watched the reboot sequences.
The two drives are consumer-grade 3 TB SATA drives (Western Digital Red 3 TB) connected directly to the motherboard and installed in the unit.
I'm going to dig into what happened today and will post what I find later in the day. Initially I was thinking about building this setup with NanoBSD if root-on-ZFS was impossible, but UFS partitions sound like a faster alternative.
 
Okay, I booted the dead system through mfsBSD again. First I looked at a gpart list; to me it looked perfectly fine, but since my self-confidence has taken a hit, here are the results:

gpart list ada0
Code:
root@mfsbsd:~ # gpart list ada0
Geom name: ada0
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 5860533134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: ada0p1
  Mediasize: 524288 (512K)
  Sectorsize: 512
  Stripesize: 4096
  Stripeoffset: 0
  Mode: r0w0e0
  rawuuid: 0bd69a9b-6a4a-11e4-bf56-549f3505fe00
  rawtype: 83bd6b9d-7f41-11dc-be0b-001560b84f0f
  label: gptboot0
  length: 524288
  offset: 20480
  type: freebsd-boot
  index: 1
  end: 1063
  start: 40
2. Name: ada0p2
  Mediasize: 2147483648 (2.0G)
  Sectorsize: 512
  Stripesize: 4096
  Stripeoffset: 0
  Mode: r0w0e0
  rawuuid: 0bf31e8f-6a4a-11e4-bf56-549f3505fe00
  rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
  label: swap0
  length: 2147483648
  offset: 544768
  type: freebsd-swap
  index: 2
  end: 4195367
  start: 1064
3. Name: ada0p3
  Mediasize: 2998444933120 (2.7T)
  Sectorsize: 512
  Stripesize: 4096
  Stripeoffset: 0
  Mode: r0w0e0
  rawuuid: 0c09bb77-6a4a-11e4-bf56-549f3505fe00
  rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
  label: zfs0
  length: 2998444933120
  offset: 2148028416
  type: freebsd-zfs
  index: 3
  end: 5860533127
  start: 4195368
Consumers:
1. Name: ada0
  Mediasize: 3000592982016 (2.7T)
  Sectorsize: 512
  Stripesize: 4096
  Stripeoffset: 0
  Mode: r0w0e0

gpart list ada1
Code:
root@mfsbsd:~ # gpart list ada1
Geom name: ada1
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 5860533134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: ada1p1
  Mediasize: 524288 (512K)
  Sectorsize: 512
  Stripesize: 4096
  Stripeoffset: 0
  Mode: r0w0e0
  rawuuid: 0c63db51-6a4a-11e4-bf56-549f3505fe00
  rawtype: 83bd6b9d-7f41-11dc-be0b-001560b84f0f
  label: gptboot1
  length: 524288
  offset: 20480
  type: freebsd-boot
  index: 1
  end: 1063
  start: 40
2. Name: ada1p2
  Mediasize: 2147483648 (2.0G)
  Sectorsize: 512
  Stripesize: 4096
  Stripeoffset: 0
  Mode: r0w0e0
  rawuuid: 0c7f3495-6a4a-11e4-bf56-549f3505fe00
  rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
  label: swap1
  length: 2147483648
  offset: 544768
  type: freebsd-swap
  index: 2
  end: 4195367
  start: 1064
3. Name: ada1p3
  Mediasize: 2998444933120 (2.7T)
  Sectorsize: 512
  Stripesize: 4096
  Stripeoffset: 0
  Mode: r0w0e0
  rawuuid: 0c938aec-6a4a-11e4-bf56-549f3505fe00
  rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
  label: zfs1
  length: 2998444933120
  offset: 2148028416
  type: freebsd-zfs
  index: 3
  end: 5860533127
  start: 4195368
Consumers:
1. Name: ada1
  Mediasize: 3000592982016 (2.7T)
  Sectorsize: 512
  Stripesize: 4096
  Stripeoffset: 0
  Mode: r0w0e0

Then I listed the available pools using zpool import and found my pool in a decent state:
Code:
root@mfsbsd:~ # zpool import
  pool: gaiafs
  id: 16740313239116738871
  state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

  gaiafs                                          ONLINE
    mirror-0                                      ONLINE
      gptid/0c09bb77-6a4a-11e4-bf56-549f3505fe00  ONLINE
      gptid/0c938aec-6a4a-11e4-bf56-549f3505fe00  ONLINE

So, since it's a root-on-ZFS configuration, I carefully imported it at an alternate location using zpool import -o altroot=/mnt/ gaiafs, and the system mounted it successfully.

zpool status
Code:
root@mfsbsd:~ # zpool status
  pool: gaiafs
state: ONLINE
  scan: none requested
config:

  NAME                                            STATE     READ WRITE CKSUM
  gaiafs                                          ONLINE       0     0     0
    mirror-0                                      ONLINE       0     0     0
      gptid/0c09bb77-6a4a-11e4-bf56-549f3505fe00  ONLINE       0     0     0
      gptid/0c938aec-6a4a-11e4-bf56-549f3505fe00  ONLINE       0     0     0

errors: No known data errors

By the way, my apologies if this post is too long. Meanwhile, I have started a scrub; maybe something will show up.
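For completeness, kicking off the scrub is just the usual command:
Code:
root@mfsbsd:~ # zpool scrub gaiafs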
 
I didn't check all the numbers in your GPT listing for accuracy, but as you say, it looks completely reasonable. And the fact that zpool status looks good means that at least the volume-level ZFS metadata is OK. A ZFS scrub will show us whether the media itself is damaged (seems highly unlikely), or whether the ZFS file system metadata is damaged (possible but not likely). A less insane suspicion is that ZFS managed to damage the file system content, for example something in /boot or in the root file system, badly enough that rebooting is impossible. Perhaps an errant write overwrote the boot loader? Unfortunately, I have no good idea for how to check that the boot loader on disk, or the content of the root file system, is correct.
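One thing that can at least be done blindly from the rescue environment is to rewrite the GPT boot code on both drives. On a root-on-ZFS layout like the one shown above, and assuming the rescue image ships the standard /boot/pmbr and /boot/gptzfsboot files, that would be something like:
Code:
# reinstall the protective MBR and the ZFS-aware boot stage into the freebsd-boot partitions
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1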
 
I will have the zpool scrub results tomorrow, since there is a huge amount of data to check: ~1.7 TB. To me, what happened sounds crazy. I've been trusting FreeBSD + ZFS setups for some years now and I even run a similar server at home. I've never run into any problem.

If we don't find anything, the setup will be rebuilt with a UFS system and the ZFS data stored on separate disk partitions.
 
Well, I just checked the zpool scrub results: no known data errors / 0 errors repaired. That was somewhat expected, but ZFS seems not to have had any problem writing the data. Yet the system is damaged and won't boot. I was just wondering: the FreeBSD bootloader is not physically written to the ZFS pool, right?
 
Hi, back with more news. I finally finished bringing up a basic UFS setup on a 32 GB partition on the first drive (I did the same on the second so I could set up a UFS mirror in case this worked). The ZFS pool was then built on the remaining space on both drives (~2.7 TB), and I started that huge file copy again using the same cp -a as before. It finished this morning, and the server was able to reboot and still works fine. I'm currently upgrading it to FreeBSD 10.1, and I think this will do the job until I find a way to get back to a root-on-ZFS setup. The issue is therefore related either to the bootloader or to ZFS breaking something when the array fills up with data.

Should I get in contact with someone involved in bug tracking or FreeBSD development?
 