[Solved] Nuked disk with "gpart bootcode"

Howdy - yes, there should be backups, and my excuse is that it's a home server and I very often start some project on it and then get called off to finish some actual paying project. So somewhere between "let's move everything that's not media or backups of other hosts to this new SSD boot drive that's just standard MBR and UFS2, including this billing stuff for a side project - after all, this billing jail with pgsql will be much faster on the SSD" and "what good tools are there to help me manage zfs snapshots", I obviously got distracted and never set up a cron job to dump the actual boot drive contents into the zfs pool, where it would get backed up and rotated off as needed.

So - power outage today, and the UPS shut the host down. When I powered it back up, I got a message that my zfs pool was version "5000" but my loader was "28". Right there I should have stopped: I boot off UFS2 on this little SSD drive, so ZFS is not in the picture at boot time. This HP server can be weird about finding the USB-connected boot drive, and another SSD that did have ZFS and was just there for testing was on another USB port.

But I don't stop and think - I google around for how to update boot blocks, forgetting that: a) there should not be ZFS bootblocks, since this is not root-on-ZFS anymore; b) I should really run gpart show da0 first to see what the hell I'm writing to; and c) to boot off the USB port on this SATA<>USB bridge, GPT is not an option, so why am I even screwing with gpart?

So I ran something like gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0 and rebooted with the extra drive unplugged. And rebooted again, and then realized the box is not booting off this disk. I get on the remote console, feed it an 11.1 ISO, and drop to the shell. Oh. I slowly realize what has happened. Where previously I think I had at least a /, /var and /usr/local partition, gpart show da0 now lists only two partitions - one "freebsd-ufs", one "swap". Also, after importing the zfs pools, it hits me that I have zero backups, and stuff I was thinking was on the zfs pools is not - it's on the boot drive.

So, currently waiting on another run of scan_ffs against the drive I want to recover, after doing a clean install onto a larger drive. Then I'm going to look at how I can image the damaged drive off to this new drive to have a backup. While I do that, any ideas on recovery? I mean, I know everything I need is on there; it's just a totally trashed partition table. But this is not my core skillset. Maybe in the early 2000s I'd have been keen on the math and all, but these days I'm spread way too thin. Open to any ideas...
 
In addition to what SirDice said: what kind of ZFS pool are you working with? If this uses some kind of RAID setup then there is a good chance that you might be able to pick up the pool from another disk. Of course this heavily depends on the setup, and you haven't really given us much to go on there.
 
Please post the gpart output so we also know what we're dealing with.

Sure:

Code:
[spork@media ~]$ gpart list da1
Geom name: da1
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 234441647
first: 0
entries: 8
scheme: BSD
Providers:
1. Name: da1a
   Mediasize: 115964116992 (108G)
   Sectorsize: 512
   Mode: r0w0e0
   rawtype: 7
   length: 115964116992
   offset: 0
   type: freebsd-ufs
   index: 1
   end: 226492415
   start: 0
2. Name: da1b
   Mediasize: 4070006272 (3.8G)
   Sectorsize: 512
   Mode: r0w0e0
   rawtype: 1
   length: 4070006272
   offset: 115964116992
   type: freebsd-swap
   index: 2
   end: 234441646
   start: 226492416
Consumers:
1. Name: da1
   Mediasize: 120034123776 (112G)
   Sectorsize: 512
   Mode: r0w0e0

Also, scan_ffs has not found anything. I've also dd'd the entire disk to a file, so I have that available... Looking at testdisk later. Also going to dig through that dd image and see if I can find the fstab, so I can get a real partition count.
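For reference, the imaging side was nothing fancy - roughly this, with illustrative device name and paths rather than my exact invocation:

Code:
# image the whole raw disk to a file; conv=noerror,sync carries on past read errors
dd if=/dev/da1 of=/mnt/External/backups/ssdboot-backup.img bs=1m conv=noerror,sync
# the fstab hunt is then just string-searching the raw image
strings /mnt/External/backups/ssdboot-backup.img | grep -B1 -A4 Mountpoint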

If you look closely, you can see something is up - note the "entries: 8" line...

Also, this is NOT a zfs disk - the zfs pool is all fine. My stupidity was in not moving/backing up important things FROM this disk TO the zfs pool...
 
Now I get it: you've overwritten your UFS filesystem with /boot/gptzfsboot. I'm not familiar with scan_ffs, but a UFS filesystem usually keeps several backups of the main superblock (which determines the location of all the files on the filesystem), and those should be traceable using fsck_ufs(8) (or fsck_ffs). Have you tried that yet?
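For example, something along these lines - purely a sketch; -n keeps fsck from writing anything, and the right -b value depends on how the filesystem was created (newfs -N against an identically-sized partition will print the candidate locations):

Code:
# read-only probe using an alternate superblock; 32 is the classic
# first backup location, but the real value varies with fs parameters
fsck_ffs -n -b 32 /dev/da1a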
 
Exactly! Extreme foot-shooting.

scan_ffs is just a utility ported from OpenBSD that tries to find lost partitions. testdisk is similar, but more complex.

fsck gets lost; I believe it's not even able to see the partitions. One of my problems is that I pretty much forgot all the fdisk/disklabel stuff once FreeBSD moved to gpart/GPT.

I have testdisk running right now doing a "deep scan", and it looks like it might be finding stuff. This is within the first "partition" (da1a), which I believe would have been called a slice in the old terminology:

Code:
TestDisk 7.1-WIP, Data Recovery Utility, March 2018
Christophe GRENIER <grenier@cgsecurity.org>
https://www.cgsecurity.org

Disk /dev/da1a - 115 GB / 108 GiB - CHS 14098 255 63
Analyse cylinder  10617/14097: 75%


  No partition          1676  34 23 15774 162  4  226492416
  No partition          4230  60 55 18328 188 36  226492416
  No partition          6784  87 24 20882 215  5  226492416
  No partition          9338 113 56 23436 241 37  226492416

I suspect that if I can find offsets, I can dd out chunks of the drive that represent each partition, mount them with md, fsck them, and then dump them back to the original drive. But testdisk might just be teasing me - not sure. I wish those columns had labels...
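My best guess is those columns are start/end CHS coordinates plus a size in sectors - and that trailing 226492416 matches the size of the old ufs partition from gpart list, so maybe it's not all teasing. If an offset pans out, the carving would look roughly like this (numbers invented for illustration):

Code:
# carve a suspected filesystem out of the raw disk at the guessed offset
dd if=/dev/da1 of=/mnt/External/backups/part-a.img bs=512 skip=0 count=226492416
# attach the chunk as a memory disk, then probe it read-only
mdconfig /mnt/External/backups/part-a.img   # prints e.g. md1
fsck_ffs -n /dev/md1
mount -o ro /dev/md1 /mnt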
 
fsck gets lost; I believe it's not even able to see the partitions. One of my problems is that I pretty much forgot all the fdisk/disklabel stuff once FreeBSD moved to gpart/GPT.
My bad. Sorry, totally overlooked that part; ignore what I wrote above (I'm having problems with the heat here, which doesn't help my concentration).

Still, not all hope is lost. A boot sector / partition table is about 512 bytes, while /boot/gptzfsboot is 87kB - so the write went past the boot sector, but only barely, relative to the size of the disk. Still a problem, and it still trashed your disk, but most of the data is still present. It's basically only your partition table and the first 87kB or so that are affected.

As bizarre as this may sound: you could try using gpart to set up the exact same partition scheme you had before. Be precise. The fun part is that gpart will only edit your partition table; it will not touch the rest of the disk. That allows you to fix your partition table manually, provided you use the exact same data as before.

If you can do that, you should be able to access your UFS partition without any issues.

Of course, the trick is to provide the correct data. If you do that, then I also urge you to mount your setup read-only (# mount -o ro /dev/....) to prevent any further problems. In theory this should work.
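As a sketch only - sizes invented, and assuming the old layout really was a BSD scheme with one ufs partition followed by swap - the repair would look something like:

Code:
# recreate the old table; scheme, start offsets and sizes must match exactly
gpart create -s bsd da1
gpart add -t freebsd-ufs -b 0 -s 226492416 da1
gpart add -t freebsd-swap -b 226492416 -s 7949231 da1
# then mount read-only first, so nothing further gets written
mount -o ro /dev/da1a /mnt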
 
Wow, this is crazy. So in 2016 I somehow managed to set up a "dangerously dedicated" disk, which is something I remember from the '90s. I guessed at that by searching through the raw disk image for something resembling my /etc/fstab, and in there I found this:

Code:
Device  Mountpoint      FStype  Options Dump    Pass#
/dev/da0a       /               ufs     rw      1       1
/dev/da0b       none            swap    sw      0       0

I went with your general idea of repartitioning the drive and then attempting an fsck with backup superblocks, while I was waiting for the fstab search to finish. Instead of working on the actual device, I worked on a copy of the image that I had dd'd off the disk, using mdconfig to pretend it was a disk:

Code:
[root@media //mnt/External/backups]# mdconfig /tmp/ssdboot-backup-working.img
md0
[root@media //mnt/External/backups]# gpart show md0

=>        0  234441648  md0  BSD  (112G)
          0  226492416    1  freebsd-ufs  (108G)
  226492416    7949231    2  freebsd-swap  (3.8G)
  234441647          1       - free -  (512B)

What I thought was the most likely setup (/dev/da0s1, /dev/da0s1a, /dev/da0s1b) just wasn't working. dumpfs found some info, but even the superblock location it provided didn't get me anywhere.

After I found that fstab (and after some time convincing myself it was somehow possible to do that in 2016), I did the following to recreate the layout:

Code:
[root@media //mnt/External/backups]# gpart create -s bsd md0
md0 created
[root@media //mnt/External/backups]# gpart show md0
=>        0  234441648  md0  BSD  (112G)
          0  234441648       - free -  (112G)
[root@media //mnt/External/backups]# gpart add -t freebsd-ufs md0
md0a added
[root@media //mnt/External/backups]# gpart show md0
=>        0  234441648  md0  BSD  (112G)
          0  234441648    1  freebsd-ufs  (112G)

fsck_ffs failed on /dev/md0a (no superblock found), and dumpfs had no info for me on /dev/md0a (which I found surprising), so I ran newfs -N /dev/md0a, which suggested the first backup superblock would be at 192. That was the ticket:

Code:
[root@media //mnt/External/backups]# fsck_ffs -b 192 /dev/md0a
Alternate super block location: 192
** /dev/md0a
** Last Mounted on
** Phase 1 - Check Blocks and Sizes
PARTIALLY TRUNCATED INODE I=727443
SALVAGE? [yn] y

PARTIALLY ALLOCATED INODE I=12451892
UNEXPECTED SOFT UPDATE INCONSISTENCY

CLEAR? [yn] y

PARTIALLY ALLOCATED INODE I=12451893
UNEXPECTED SOFT UPDATE INCONSISTENCY

CLEAR? [yn] y

PARTIALLY ALLOCATED INODE I=12451894
UNEXPECTED SOFT UPDATE INCONSISTENCY

CLEAR? [yn] y

PARTIALLY ALLOCATED INODE I=12451895
UNEXPECTED SOFT UPDATE INCONSISTENCY

CLEAR? [yn] y

** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
SUMMARY BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? [yn] y

1661624 files, 19154645 used, 8266218 free (204754 frags, 1007683 blocks, 0.7% fragmentation)

UPDATE STANDARD SUPERBLOCK? [yn] y


***** FILE SYSTEM IS CLEAN *****

***** FILE SYSTEM WAS MODIFIED *****

Surprisingly few fixes in that run. The disk image is mounted and things seem to be there. I'm really torn between just doing a dump/restore back to the drive, doing the same fixup steps on the drive itself, or doing a reinstall and selectively copying things back...
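If I go the dump/restore route, it'd be the classic pipeline - roughly this, with a hypothetical target partition on the replacement drive:

Code:
# lay down a fresh filesystem on the new drive (target device hypothetical)
newfs /dev/da0a
mkdir -p /mnt/new && mount /dev/da0a /mnt/new
cd /mnt/new
# level-0 dump of the recovered image, piped straight into restore
# (unmount the image first, or add -L to dump a live filesystem)
dump -0 -a -f - /dev/md0a | restore -r -f -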
 
Just to wrap this up: I ended up nuking the disk, doing a fresh install of 11.1, and then selectively copying things back over. So much cruft removed! And there is now a daily job that backs up the data from this disk to the backup pool. Lesson learned. :)
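The job itself is conceptually just a one-liner - something like this in /etc/crontab (pool path and schedule invented for the example; -L has dump work from a snapshot, so the live filesystem can be dumped while mounted):

Code:
# nightly level-0 dump of / into the pool, rotated by day of week
# (percent signs must be escaped inside a crontab)
0 3 * * * root dump -0 -a -L -f /tank/backups/boot-`date +\%u`.dump /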
 
...and the boot disk failed yesterday. Scrounged up a temporary disk and restored from my happy backups.
 