UFS secondary GPT header is not in the last LBA - mount error 5

Hi all,

Weird situation at work today. I couldn't get into one of the servers anymore: my SSH key was refused. I logged on via the console (VNC) as root and found that my authorized_keys file was full of garbage. Without even trying to figure out why this had happened, I tried to edit the file and the machine rebooted.

Since then, at boot it gets stuck on the mountroot prompt.

Screen Shot 2017-03-31 at 15.48.38.png


As you can see, it's complaining about the secondary GPT header not being in the last LBA. In all honesty, I'm not sure what that means at all. The disk is partitioned like this (output from another, identically set up server):


gpart show
=>          3  1073742069  vtbd0  GPT  (512G)
            3         125      1  freebsd-boot  (63K)
          128     2097152      2  freebsd-swap  (1.0G)
      2097280  1071644792      3  freebsd-ufs  (511G)


Being completely new at the mountroot prompt, I tried the question mark first which gave me this output:

Screen Shot 2017-03-31 at 15.42.27.png


I don't know what errors 19 and 22 mean. My Google-fu let me down: people talk about them but don't explain what they really are.

Then I tried to mount /dev/vtbd0p3 explicitly, which gave me what I guess is the most verbose message:
Screen Shot 2017-03-31 at 15.45.04.png


So here I am, knowing there's something wrong with my filesystem but with no clue what I can do. I tried to boot in single-user mode, but that also needs to mount the root filesystem, so not much progress or luck there.

The server is a virtual machine running in an OpenStack environment. It has worked fine for several months, until this happened today.

I'd really appreciate any pointers on:
- what each of these errors means
- what I can do, if anything, to recover from this
- anything that helps me understand what happened, so I can avoid this issue on the other machines that are similarly provisioned (a bit nervous now, as I'm the only guy pushing for BSD in the company for consistency and reliability... *heh* :) )

Appreciate all your inputs, ready to learn more from you guys!
 
You get similar errors and issues when a physical disk fails. So the obvious conclusion would be that the disk image itself is corrupted somehow. Maybe somebody tried to resize it? Or maybe it's been accidentally connected to another VM while the original was still actively using it. Or maybe the underlying storage has issues (think network issues on iSCSI for example).
 
As you can see, it's complaining about the secondary GPT header not being in the last LBA. In all honesty, I'm not sure what that means at all.
GPT keeps copies of its "stuff" at the beginning and end of the disk drive (first and last sector, give or take): a primary header near the start and a secondary (backup) header in the very last LBA, i.e. the last sector. This allows it to recover from problems where one copy or the other is damaged or erased. The message means the secondary header is not where the last sector is at the disk's current size. The older MBR scheme kept the boot code and the partition table together in the first sector, with the table at a fixed offset after the boot code, and was susceptible to having it erased. The classic problem is that something goes wrong and instead of writing data at the desired sector, it starts writing at the beginning of the disk. This was quite common in the days of the PC/XT and DOS, but somewhat less common now. A Google search for "clobbered boot sector" shows the extent of the problem with MBR.
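To make the check behind that message concrete, here is a minimal Python sketch (not the kernel's actual code). It reads the primary GPT header from a disk image file and compares the backup location recorded there with the real last sector. The function name `check_backup_location` and the 512-byte sector size are assumptions for illustration; the header layout follows the published GPT format.

```python
import struct

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors

# First 92 bytes of a GPT header, little-endian: signature, revision,
# header size, header CRC32, reserved, my LBA, alternate (backup) LBA,
# first usable LBA, last usable LBA, disk GUID, partition entry LBA,
# number of entries, entry size, entry array CRC32.
GPT_HEADER = struct.Struct("<8sIIIIQQQQ16sQIII")


def check_backup_location(image_path, sector_size=SECTOR_SIZE):
    """Return (alternate_lba, last_lba, ok) for the primary GPT header."""
    with open(image_path, "rb") as disk:
        disk.seek(0, 2)                       # seek to the end to learn the size
        last_lba = disk.tell() // sector_size - 1
        disk.seek(1 * sector_size)            # the primary header lives in LBA 1
        raw = disk.read(GPT_HEADER.size)

    if len(raw) < GPT_HEADER.size:
        raise ValueError("image too small to hold a GPT header")
    fields = GPT_HEADER.unpack(raw)
    signature, alternate_lba = fields[0], fields[6]
    if signature != b"EFI PART":
        raise ValueError("no GPT signature in LBA 1")
    # If the disk was resized after the GPT was written, these two disagree.
    return alternate_lba, last_lba, alternate_lba == last_lba
```

Run against a dd copy of a healthy disk, the third value should come back true; on an image that has been shrunk or grown since it was partitioned, the recorded backup LBA and the real last LBA will differ.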
I don't know what errors 19 and 22 mean. My Google-fu let me down: people talk about them but don't explain what they really are.
The definitive (but terse) answers can be found in /usr/include/sys/errno.h:
Code:
#define ENODEV          19              /* Operation not supported by device */
#define EINVAL          22              /* Invalid argument */
As SirDice said, disks generally don't change size. Sorry I can't help with the "why", but I hope I explained the "what".
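If the header file isn't handy, the same lookup can be sketched in Python, whose standard `errno` module maps numbers to the symbolic names. The helper name `describe` is made up for this example, and the description text comes from whatever OS runs the script, so the wording may not match FreeBSD's errno.h exactly.

```python
import errno
import os


def describe(code):
    """Return (symbolic name, local description) for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# 19 and 22 are from the mountroot output; 5 is the one in the thread title.
for code in (5, 19, 22):
    name, text = describe(code)
    print(code, name, text)
```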
 
In addition to the gpart / mount problems, you also have a real IO error: READ(offset=2..., length=16384) error 5, and 5 is EIO, meaning a hard read error.

Your underlying storage is toast; it might be completely burned to a crisp, or slightly too dark (a little scraping with a knife, put extra jam on it, and nobody will know there was a problem there).

In addition to the advice to try gdisk, here is another hint: dd the first 64 KB and the last 64 KB of the afflicted disk into a file (in /tmp), and then do the same for a known-good volume. Then compare the two (for me it's easiest to use hexdump -C to display them), while reading the documented format for partition tables (the Wikipedia entry for "GUID partition table" has a very clear layout). You will probably quickly find that one or both partition tables on the toasted drive are ... burned (that's a technical term for when you leave bread in the oven too long). Once you find out what happened to the raw disk, you can start attempting a root cause analysis of who might have done that.
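If eyeballing two hexdumps gets tedious, the comparison can be roughed out in Python. This is only a sketch and assumes you are working on image files (for example dd copies) rather than live devices. The helper names `head_and_tail` and `differing_sectors` and the file names in the comments are hypothetical.

```python
CHUNK = 64 * 1024  # 64 KB from each end, as suggested above


def head_and_tail(path, chunk=CHUNK):
    """Return the first and last `chunk` bytes of a file."""
    with open(path, "rb") as f:
        head = f.read(chunk)
        f.seek(0, 2)                      # seek to the end to learn the size
        size = f.tell()
        f.seek(max(size - chunk, 0))
        tail = f.read(chunk)
    return head, tail


def differing_sectors(a, b, sector_size=512):
    """List sector indexes (relative to the chunk) where a and b differ."""
    count = (max(len(a), len(b)) + sector_size - 1) // sector_size
    return [
        i for i in range(count)
        if a[i * sector_size:(i + 1) * sector_size]
        != b[i * sector_size:(i + 1) * sector_size]
    ]


# Hypothetical usage with two image files:
#   good_head, good_tail = head_and_tail("/tmp/good.img")
#   bad_head, bad_tail = head_and_tail("/tmp/bad.img")
#   print(differing_sectors(good_head, bad_head))
#   print(differing_sectors(good_tail, bad_tail))
```

Expect a few differences even between two healthy disks, since each GPT header carries its own disk GUID and checksums. What you are looking for is a header or table that is missing, zeroed, or obviously not GPT data at all.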
 
Thanks guys, appreciate the responses.
I spoke with the guy maintaining the OpenStack environment, and he saw in the virsh interface that our VM's disk allocation had changed. It was expanded on initial deployment but has now reverted to its small size. So yes, that did go bad indeed.

Question out of curiosity: how can I do the gpart stuff when I'm stuck at the mountroot prompt? I guess the answer is to use some kind of live CD and operate from there, but I wouldn't be surprised if there are better and easier ways in BSD that I'm not aware of yet. :)
 