Hard drive with GPT detected but not booting on Thinkpad.

Tracker · Mar 21, 2023

smithi said:
Why would it offer 'Windows Boot Manager' if the disk had no Windows?

I'm not sure actually - the BIOS gives that option in the boot priority section as well - but I'm not trying or intending to use Windows.

T-Daemon said:
The system looks like a 12.0 version.

Code:

Mounting from zfs:backuppool/ROOT/12.0-p11

12.0 has been End of Life since February 29, 2020 (it was released on December 11, 2018) and is unsupported.

The installation was a 12.0 but it was upgraded to a 12.3/4 iirc - maybe it's still booting from the original BE?

T-Daemon said:
Here in the forums it's normally not accepted to give support for EoL versions. But on rare occasions we do, especially when it looks like a hardware issue.

Totally understand. I actually have another disk that's 13.1 and was built from this current disk - I will eventually be using that (it has a similar boot/partition layout), so these instructions will work for that too, I think.

T-Daemon said:
The messages in the first image look like a disk hardware problem.

When booting the installer media, did you notice similar error messages? If you don't know, plug it in, boot it up and look for them. Drop to "Shell" and execute dmesg | less.

I did try to open the back cover in an attempt to replace the disk that I mentioned, but was unable to do so because the cover didn't seem to be coming off easily. Maybe I'll try to do it again now. You suspect it might have undone the connector?

Tried to boot from bootable USB - it works normally just fine.

smithi · Mar 21, 2023

Tracker said:
I'm not sure actually - the BIOS gives that option in the boot priority section as well - but I'm not trying or intending to use Windows.

I don't get your reluctance to say which Thinkpad model you have? (posts #17, #23)

Without, I can't try to help.

No shame in old Thinkpads; I only retired my then 20 y.o. T23 2 years ago, powerboard failure after a great innings.

Tracker · Mar 21, 2023

smithi said:
I don't get your reluctance to say which Thinkpad model you have? (posts #17, #23)

Haha I just got a bit spooked by the guess of the laptop make from my logs last time - it's not shame, just a bit of paranoia.

But anyway, I have now opened the laptop cover as T-Daemon suggested and I believe the connector is fine - I'm probably not great at dealing with hardware, so I might have caused the revolving pin to come out from one side. But the connection didn't seem to be loose (why else would it show the geli boot password input, right?). The error 5 of Solaris persists, and it gets stuck on the mountroot prompt as described before. Not sure how to proceed.

Tracker · Mar 22, 2023

smithi said:
Without, I can't try to help.

You have already helped - just a little more?

Seems like something has corrupted the ZFS from the steps I took earlier? Doubt the model of the laptop has much to do with the issue at this stage.

Now I'm just worried about how to access the data again.

Tracker · Mar 22, 2023

Seems like ALL of the data is corrupted now?

This is what I get trying to import the pool from a bootable USB

T-Daemon · Mar 22, 2023

Tracker said:
Seems like something has corrupted the ZFS from the steps I took earlier?

That's not possible.

1. As you stated, the system booted properly after the partition table modification procedure as it should, once. If it were damaged or the pool data corrupted, it wouldn't have booted at all.

2. Another indication that the partition table is intact is that you were able to geli attach the provider from the FreeBSD installer USB. Second, zpool import -f shows a pool to import. Neither action would have been possible if the partition table were damaged.

As I said in post # 25, this is a disk hardware problem.

The error messages in the first image in your post # 24 show typical hard drive access error messages.

Also the zpool import -f <pool id> command attempt from the FreeBSD installer USB in the image in your post # 30 prints I/O (input/output) error.

Those are strong indications that the hard drive itself has issues, not the software.

To make sure, I would run sysutils/smartmontools on the disk to check the hard disk's health.

It can be installed on the FreeBSD USB installer. Plug the USB stick into a machine with internet access (a network card supported by FreeBSD and a DHCP server on the LAN).

Procedure: Boot into single-user mode, mount the system read/write, edit /etc/fstab, change the ro (read-only) option to rw, exit single-user mode, run dhclient <network interface, e.g. em0>, then pkg install smartmontools.

Afterwards, edit /etc/fstab, change rw to ro, power off the system, boot on the affected machine.
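
Editor's note: the procedure above can be sketched as a short script. This is only a sketch under stated assumptions: the interface name em0 comes from the post, while the disk name /dev/ada0 and the dry-run wrapper are hypothetical additions. By default it only prints the commands, so it can be read through safely before anything is executed.

```shell
#!/bin/sh
# Sketch of the procedure described above.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 on the
# installer stick to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
IFACE="${IFACE:-em0}"       # assumption: replace with your interface (see ifconfig)
DISK="${DISK:-/dev/ada0}"   # assumption: replace with the affected disk

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# In single-user mode on the installer stick:
run mount -u -o rw /            # remount the root file system read/write
# (edit /etc/fstab by hand here: change "ro" to "rw", then exit single-user mode)

# With the stick booted on a machine that has network access:
run dhclient "$IFACE"           # obtain an address from the DHCP server
run pkg install smartmontools   # installs smartctl
# (change "rw" back to "ro" in /etc/fstab, power off, move the stick)

# Later, on the affected machine:
run smartctl -t long "$DISK"    # start the extended self-test
run smartctl -a "$DISK"         # print attributes and the self-test log
```

The two fstab edits stay manual on purpose; they are one-word changes and easier to verify by eye than to script.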

Tracker · Mar 22, 2023

Hey thanks a lot for getting back! Almost thought I'd be left stranded here

T-Daemon said:
That's not possible.

Hmm.

T-Daemon said:
1. As you stated, the system booted properly after the partition table modification procedure as it should, once. If it were damaged or the pool data corrupted, it wouldn't have booted at all.

Yes it did, once, as I reported earlier.

T-Daemon said:
2. Another indication that the partition table is intact is that you were able to geli attach the provider from the FreeBSD installer USB. Second, zpool import -f shows a pool to import. Neither action would have been possible if the partition table were damaged.

Do you suspect not being able to complete the swap partition step from your instructions might have played some kind of role?

T-Daemon said:
As I said in post # 25, this is a disk hardware problem.

The error messages in the first image in your post # 24 show typical hard drive access error messages.

UPDATE: I took out the hard disk (since you suspected/implied it could be a connector issue). Connected the same hard drive via a USB connector - same result! It gives the same Solaris error 5

T-Daemon said:
To make sure, I would run sysutils/smartmontools on the disk to check the hard disk's health.

It can be installed on the FreeBSD USB installer. Plug the USB stick into a machine with internet access (a network card supported by FreeBSD and a DHCP server on the LAN).

I'll try to do this as well - just going to be a pain to connect internet to the installer USB.

Do you have any other suggestions to maybe double check things? This has been quite an exasperating experience, despite the kind help, to just get this to run.

ALSO - THINGS I TRIED: I tried to follow this post about a similar opensolaris error 5 (https://forums.freebsd.org/threads/...ting-after-upgrading-from-10-1-to-10-2.55338/). In the bootloader console I ran unload, load /boot/kernel/kernel and load /boot/kernel/opensolaris, which led to the label not being recognized, as in this post (https://forums.freebsd.org/threads/error-in-boot-loader-conf-gives-me-a-mountroot-prompt.74583/). But that seemed like a dead end to me, given my lack of experience, because that person had changed their loader.conf file and in my case I haven't. Do you have any opinion on whether I may have missed something obvious, or done something incorrectly, that might lead to a solution?

T-Daemon · Mar 22, 2023

Tracker said:
Hey thanks a lot for getting back! Almost thought I'd be left stranded here

Don't worry, I'm not used to leaving questions from users I've helped unanswered or with their problems alone, unless I am prevented. It can happen that I overlook a post, but that happens rarely.

Tracker said:
Do you suspect not being able to complete the swap partition step from your instructions might have played some kind of role?

I'm sure it didn't play a role at all. I'm almost certain the problem is a hardware issue, not a software one.

Tracker said:
UPDATE: I took out the hard disk (since you suspected/implied it could be a connector issue). Connected the same hard drive via a USB connector - same result! It gives the same Solaris error 5

Have you tried to connect the disk to another machine, to the other laptop perhaps?

Besides "Solaris error 5" were there error messages like in the first image of your post # 24 (CAM Status, SCSI:Status error)?

But any attempts to somehow import the pool might be useless. It is very likely that the drive is about to die and the ZFS pool is already affected.

Running sysutils/smartmontools would bring us certainty.

Tracker · Mar 22, 2023

T-Daemon said:
To make sure, I would run sysutils/smartmontools on the disk to check the hard disk's health.

T-Daemon said:
It can be installed on the FreeBSD USB installer.

Had an Ubuntu stick - hope the smartctl test will suffice? It was just a bit difficult to get the wifi running.

The short test didn't seem to indicate any errors - please check image below.

Have put it on the long test as well - but that seems likely to take a couple of hours.

T-Daemon said:
Don't worry, I'm not used to leaving questions from users I've helped unanswered or with their problems alone, unless I am prevented. It can happen that I overlook a post, but that happens rarely.

Thank you so much

I was seriously considering not using FreeBSD based on this exasperating experience, but people like you keep the faith alive!

T-Daemon said:
I'm sure it didn't play a role at all. I'm almost certain the problem is a hardware issue, not a software one.

T-Daemon said:
Have you tried to connect the disk to another machine, to the other laptop perhaps?

So I tried to boot from the hard drive on the original laptop - and the result was the same, unfortunately. (There was also this line about loading opensolaris, after kernel, after booting which I somehow don't particularly recall seeing earlier, if that's of any help - there's also nothing of that sort in my loader.conf)

At this point I'm tempted to try undoing the changes of EFI and reverting back to original - would that be something that I should explore?

T-Daemon said:
Running sysutils/smartmontools would bring us certainty.

Does the image from the test help?

astyle · Mar 22, 2023

I realize that learning to properly troubleshoot issues is part of the experience of using FreeBSD... And done right, can be a productive lesson. But at some point, I think we gotta be able to say, "You know what, I'm gonna wipe the disk clean and start all over again". There's no shame in that - I've had plenty of times when I just restarted a FreeBSD installation all over again, and wiped my own SSDs clean. (Well, in my case, that's what drove me to ZFS, I can do post-install adjustments as needed, which is something I can't do with UFS).

I haven't played with GELI myself, but following this conversation from the sidelines, I'm thinking to myself, "Yeah, fun thing to try, but I personally don't have a real need for GELI just yet. If I realize I do need it, it looks like I'm gonna be at least somewhat informed about it."

Tracker · Mar 22, 2023

astyle said:
But at some point, I think we gotta be able to say, "You know what, I'm gonna wipe the disk clean and start all over again". There's no shame in that - I've had plenty of times when I just restarted a FreeBSD installation all over again, and wiped my own SSDs clean

Yes I would have given up had I not had data to care about. Plus hopefully if this is solved someone else might have an easier time.

astyle said:
I haven't played with GELI myself, but following this conversation from the sidelines

GELI isn't the issue here, I think - something else.

smithi · Mar 22, 2023

Tracker said:
You have already helped - just a little more?

I didn't say I won't help without knowing what model Thinkpad you're talking about, but that I can't.

I have zero ZFS, UEFI, or GPT experience beyond familiarity by reading.

All I have pertaining to your issue is general systems hardware and particularly thinkpad experience.

Tracker said:
Seems like something has corrupted the ZFS from the steps I took earlier? Doubt the model of the laptop has much to do with the issue at this stage.

Without knowing whether yours is 3, 13 or 23 years old, perhaps indicating why it would mention Windows in BIOS boot selection, I'm flying blind for no apparent reason other than "none of my business"?

Good luck, sincerely.

Tracker · Mar 23, 2023

smithi said:
I didn't say I won't help without knowing what model Thinkpad you're talking about, but that I can't.

I have zero ZFS, UEFI, or GPT experience beyond familiarity by reading.

All I have pertaining to your issue is general systems hardware and particularly thinkpad experience.

I don't think the issue pertains particularly to the hardware or the model of the Thinkpad - seems like a booting/disk issue. You're pretty sharp even if you just have familiarity from reading. If you must know, it's not a very old machine (the Thinkpad).

smithi said:
Without knowing whether yours is 3, 13 or 23 years old, perhaps indicating why it would mention Windows in BIOS boot selection

If you insist - it's a model released within the past 6 years. Not an ancient laptop. And I honestly have no idea what the Windows entry in the boot section is doing there - that's how it was when I received it.

T-Daemon - so I made a couple of observations
1) Sometimes the hard drive will go into GELI and sometimes it won't - I'm not sure how it's happening
2) If I leave it on the GELI prompt and don't do anything - it sometimes restarts on its own
3) The bootable FreeBSD stick that I have (and used for this) has a 13.1 release while this particular hard drive has 12.x installed - does that make some kind of difference?
4) Just reading around a bit led me to this post - Post in thread 'GPT to UEFI?' https://forums.freebsd.org/threads/gpt-to-uefi.78086/post-487275 - which seemed to suggest removing freebsd-boot in favor of EFI - which has a very small size compared to our EFI of 260M? When I do gpart show I'm shown freebsd-boot (512K), followed by free (492K), followed by EFI (260M) - wondering why it's so big.

Trying to recap, for my understanding of the issue: I have a GPT based hard drive which (assuming) worked fine on an older laptop but wouldn't boot on this Thinkpad machine. So we basically deleted swap and adjusted an EFI file onto it for UEFI to recognize it - but so far it hasn't worked. The fact that it's going to GELI prompt suggests it's reading from the loader.conf file on the hard drive but somehow ZFS won't boot and runs into error 5 of Solaris, alongside SCSI issue. Is my summary somewhat correct?

T-Daemon · Mar 23, 2023

Tracker said:
Had an Ubuntu stick - hope the smartctl test will suffice?

That will do just fine.

Tracker said:
The short test didn't seem to indicate any errors - please check image below.

Have put it on the long test as well - but that seems likely to take a couple of hours.

You should also run a long test.

Tracker said:
Thank you so much I was seriously considering not using FreeBSD based on this exasperating experience, but people like you keep the faith alive!

Thanks for the confidence, but the chances are not good to recover that pool.

To make it clear, the investigation of the disk health with smartmontools won't help us to recover the pool. It will only confirm (or disprove) whether the disk is dying.

Tracker said:
So I tried to boot from the hard drive on the original laptop - and the result was the same, unfortunately. (There was also this line about loading opensolaris, after kernel, after booting which I somehow don't particularly recall seeing earlier, if that's of any help - there's also nothing of that sort in my loader.conf)

The opensolaris kernel module is a dependency of the zfs module on 12.0.

Tracker said:
At this point I'm tempted to try undoing the changes of EFI and reverting back to original - would that be something that I should explore?

That wouldn't make any difference. The modifications made earlier (deleting the swap partition and creating the efi partition) are not related to the current problem. It's just coincidence.

Tracker said:
Does the image from the test help?

Unfortunately no. A long test will give more data to evaluate the disk health.

Tracker said:
I would have given up had I not had data to care about.

If the pool holds data you care about, then no recovery attempts should be made on the original disk and pool. There are some methods to try to import the pool, but those are invasive.

If unsuccessful, they can destroy the pool and data for good, so they should only be applied to a cloned image of the freebsd-zfs partition, which requires a spare disk, at least the same size as the partition, to copy to.

If the data is important to you, then you should have made backups. Sorry to be blunt.
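
Editor's note: a minimal sketch of the cloning step mentioned above, using dd. The device names in the comment (/dev/ada0p4 for the freebsd-zfs partition, /dev/da1 for the spare disk) are hypothetical and must be checked with gpart show before use. The demonstration at the bottom runs on ordinary temporary files so it is safe to try anywhere.

```shell
#!/bin/sh
# clone_partition SRC DST
# Raw copy of SRC to DST that keeps going past read errors.
# conv=noerror,sync replaces each unreadable block with zeros so that all
# later data stays at the same offset in the copy. Note that one bad
# sector costs the whole 64 KiB block it sits in.
clone_partition() {
    dd if="$1" of="$2" bs=65536 conv=noerror,sync
}

# Hypothetical use on real hardware (names are assumptions, verify first):
#   clone_partition /dev/ada0p4 /dev/da1

# Safe demonstration on temporary files:
tmp=$(mktemp -d)
head -c 262144 /dev/urandom > "$tmp/source.img"   # 4 blocks of stand-in data
clone_partition "$tmp/source.img" "$tmp/clone.img" 2>/dev/null
if cmp -s "$tmp/source.img" "$tmp/clone.img"; then
    echo "clone matches source"
fi
rm -rf "$tmp"
```

Because the partition is copied raw, the clone is still GELI-encrypted; any import attempt would then be made against the copy, leaving the original untouched.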

Tracker said:
GELI isn't the issue here, I think - something else.

GELI isn't involved in the issue. The strongest indication of what's happening is the CAM status: SCSI status errors, which indicate a disk hardware problem.

T-Daemon · Mar 23, 2023

Tracker said:
1) Sometimes the hard drive will go into GELI and sometimes it won't - I'm not sure how it's happening

It could be another indication of disk failure.

Tracker said:
2) If I leave it on the GELI prompt and don't do anything - it sometimes restarts on its own

I can't give an explanation, never observed or heard of it either. Disk failure?

Tracker said:
3) The bootable FreeBSD stick that I have (and used for this) has a 13.1 release while this particular hard drive has 12.x installed - does that make some kind of difference?

Boot loaders are backward compatible. But it wouldn't hurt to try the 12.0 FreeBSD installation media.

Have you installed the loader.efi from 13.1?

Tracker said:
4) Just reading around a bit led me to this post - Post in thread 'GPT to UEFI?' https://forums.freebsd.org/threads/gpt-to-uefi.78086/post-487275 - which seemed to suggest removing freebsd-boot in favor of EFI - which has a very small size compared to our EFI of 260M? When I do gpart show I'm shown freebsd-boot (512K), followed by free (492K), followed by EFI (260M) - wondering why it's so big.

Standard ESP partition size. There should be some sources coming up when web searching for "standard "ESP" minimum size".

Tracker said:
Trying to recap, for my understanding of the issue: I have a GPT based hard drive which (assuming) worked fine on an older laptop but wouldn't boot on this Thinkpad machine. So we basically deleted swap and adjusted an EFI file onto it for UEFI to recognize it - but so far it hasn't worked.

Not true, it booted once properly.

Tracker said:
The fact that it's going to GELI prompt suggests it's reading from the loader.conf file on the hard drive but somehow ZFS won't boot and runs into error 5 of Solaris, alongside SCSI issue. Is my summary somewhat correct?

No, it's not reading from /boot/loader.conf on the ZFS, inside the GELI provider.

The GELI provider is locked until it's unlocked with the passphrase (or keyfile, or both). It wouldn't deserve to be called an encrypted container if the data inside were somehow readable without unlocking it.

When the GELI provider is initialized, all the information regarding the encrypted partition or disk is stored as metadata in the last sector of the encrypted partition/disk.

The FreeBSD loader reads that metadata in the last sector of the partition and presents a prompt, if the provider was initialized to do so. See geli(8) init subcommand and -b, -g options for details.

Tracker · Mar 23, 2023

T-Daemon said:
Thanks for the confidence, but the chances are not good to recover that pool.

To make it clear, the investigation of the disk health with smartmontools won't help us to recover the pool. It will only confirm (or disprove) whether the disk is dying.

T-Daemon said:
The opensolaris kernel module is a dependency of the zfs module on 12.0.

Ok. Understood.

T-Daemon said:
Unfortunately no. A long test will give more data to evaluate the disk health.

I did try to run the longer test but it seemed to say that the test was being aborted by the user after 24 mins or so - tried a couple of times, will try again.

T-Daemon said:
If the data is important to you, then you should have made backups. Sorry to be blunt.

So I have another disk that's built off this disk and has more data on it - same partitioning. But now I'm mortally scared to try similar steps on it for risk of losing it all! (I don't have an extra disk atm, so I need to establish whether the current disk is dead or alive with the testing you suggested.)

T-Daemon said:
GELI isn't involved in the issue. The strongest indication of what's happening is the CAM status: SCSI status errors, which indicate a disk hardware problem.

I'm guessing those SCSI errors aren't borne out of software issues?

T-Daemon said:
When the GELI provider is initialized, all the information regarding the encrypted partition or disk is stored as metadata in the last sector of the encrypted partition/disk.

The FreeBSD loader reads that metadata in the last sector of the partition and presents a prompt, if the provider was initialized to do so. See geli(8) init subcommand and -b, -g options for details.

T-Daemon said:
No, it's not reading from /boot/loader.conf on the ZFS, inside the GELI provider.

Gotcha. So another interesting point - when I booted into the Ubuntu stick and tried to open gparted, it said: GPT backup is corrupted but primary appears to be ok.

I'm not sure if that has an effect on things.

T-Daemon said:
Have you installed the loader.efi from 13.1?

Yes, if I recall correctly. Does that change something?

T-Daemon said:
Not true, it booted once properly.

Haha fair enough - that's the most confusing part - it ran almost fine the first time upon restart/reboot, but after that something happened and it just didn't work.

I'll try to run the hard disk tests and update here. Thanks again for all the help.

Tracker · Mar 23, 2023

UPDATE - hard disk seems to NOT be the issue. Here are the results of the long test - do they look normal, T-Daemon?

Code:

ubuntu@ubuntu:~$ sudo smartctl -a /dev/sdc
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.10.0-28-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST320LT022-1AE142
Serial Number:    W040Y20Q
LU WWN Device Id: 5 000c50 0392f4f43
Firmware Version: 0001EXM1
User Capacity:    320,072,933,376 bytes [320 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Thu Mar 23 09:37:36 2023 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x73) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (  87) minutes.
Conveyance self-test routine
recommended polling time:      (   3) minutes.
SCT capabilities:            (0x303f)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   120   099   006    Pre-fail  Always       -       241041192
  3 Spin_Up_Time            0x0003   099   098   085    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   094   094   020    Old_age   Always       -       6397
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       4794495055
  9 Power_On_Hours          0x0032   083   083   000    Old_age   Always       -       15000
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   094   094   020    Old_age   Always       -       6243
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   097   097   000    Old_age   Always       -       4295034104
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   053   041   045    Old_age   Always   In_the_past 47 (Min/Max 32/49 #434)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       1757
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       715
193 Load_Cycle_Count        0x0032   041   041   000    Old_age   Always       -       119373
194 Temperature_Celsius     0x0022   047   059   000    Old_age   Always       -       47 (0 6 0 0 0)
195 Hardware_ECC_Recovered  0x001a   046   036   000    Old_age   Always       -       241041192
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       14125 (243 253 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       743134222
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       733660868
250 Read_Error_Retry_Rate   0x0000   100   001   000    Old_age   Offline      -       126
251 Unknown_Attribute       0x0000   100   001   000    Old_age   Offline      -       2
252 Unknown_Attribute       0x0000   100   001   000    Old_age   Offline      -       0
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     15000         -
# 2  Extended offline    Aborted by host               80%     14997         -
# 3  Extended offline    Aborted by host               80%     14980         -
# 4  Extended offline    Aborted by host               90%     14980         -
# 5  Short offline       Completed without error       00%     14980         -
# 6  Short offline       Completed without error       00%     14980         -
# 7  Short offline       Interrupted (host reset)      00%      9789         -
# 8  Short offline       Completed without error       00%      9750         -
# 9  Short offline       Completed without error       00%      3665         -
#10  Short offline       Completed without error       00%      3656         -
#11  Short offline       Completed without error       00%      3654         -
#12  Short offline       Completed without error       00%      3269         -
#13  Short offline       Completed without error       00%      3084         -
#14  Short offline       Completed without error       00%      3082         -
#15  Short offline       Completed without error       00%      3078         -
#16  Short offline       Completed without error       00%        68         -
#17  Short offline       Completed without error       00%        59         -
#18  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

What should I try next?
Also, I think there might be some entries in fstab on the hard drive - could that potentially be causing issues?

Update: Out of sheer desperation I tried the -X option with zpool import (zpool import -fFX backuppool). The disk is just constantly flickering its light. Guess it's nuked now.

That might leave me with only 1 backup, which is on a bigger disk that has the same partition setup (I'll access it from the original laptop, where it works only in safe mode). And I somehow need to figure out how to back up a larger disk onto the smaller one - it has less data than the smaller one, iirc.

astyle · Mar 23, 2023

Seriously, don't be afraid of a reinstall.

If the disk boots fine in machine a, then boot it in machine a, and use the opportunity to back up the data you care about. Yeah, that takes time, but it is an important step.

THEN move the disk to machine b, and mess around with it.

Tracker · Mar 24, 2023

astyle said:
If the disk boots fine in machine a, then boot it in machine a, and use the opportunity to back up the data you care about. Yeah, that takes time, but it is an important step.

It's not booting on machine A either now.

What complicates it is that I can't use dd, because the disk is smaller than the other one. So I need to figure out how to back up, and which files/config I need - if I go the reinstall route.

T-Daemon · Mar 24, 2023

Tracker said:
UPDATE - hard disk seems to NOT be the issue.

Not necessarily. The TYPE column prints a lot of "Old_age" and "Pre-fail". Even if no error is reported by smartmontools, the disk can be in a state just before complete failure.

I have two Samsung 840 PRO SSDs. One of them is working fine with macOS on it. On the other one I had FreeBSD Root-on-ZFS (not GELI encrypted) on a desktop machine. One day it wouldn't boot, stopping at the mountroot prompt.

When I tried to import the pool, it reported the file system corrupted. I installed the system again, restored the data from a backup. After a while the same happened again.

Installed the system again, this time with a UFS file system. The system worked fine, until the disk wasn't recognized by the machine's BIOS anymore. Tried attaching the disk as a USB device to a laptop. Nothing, the disk was dead.

smartmontools didn't report any errors before either.
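
Editor's note: as far as the smartctl documentation describes it, the TYPE column is a fixed classification of each attribute (what it would mean if that attribute failed), while the WHEN_FAILED column shows whether an attribute has actually dropped to its threshold, now or in the past. The sketch below filters a report for rows with a non-empty WHEN_FAILED; the function name smart_failed_attrs is a hypothetical helper, and the three sample rows are copied from the report posted earlier in this thread.

```shell
#!/bin/sh
# smart_failed_attrs: read smartctl attribute output on stdin and print
# ID, name and WHEN_FAILED for every attribute whose WHEN_FAILED is not "-".
# Attribute rows are recognised by a numeric ID in column 1 and a hex FLAG
# in column 3; WHEN_FAILED is column 9.
smart_failed_attrs() {
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /^0x/ && $9 != "-" { print $1, $2, $9 }'
}

# Three rows from the posted report:
smart_failed_attrs <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
190 Airflow_Temperature_Cel 0x0022   053   041   045    Old_age   Always   In_the_past 47 (Min/Max 32/49 #434)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
EOF
# prints: 190 Airflow_Temperature_Cel In_the_past
```

On a live system the same filter could be fed with smartctl -A /dev/sdc | smart_failed_attrs. An empty result only means no attribute has crossed its threshold; as the anecdote above shows, it is not proof that the disk is healthy.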

Tracker said:
Update: Out of sheer desperation I tried the -X option with zpool import (zpool import -fFX backuppool). The disk is just constantly flickering its light. Guess it's nuked now.

I told you so:

T-Daemon said:
If the pool holds data you care about, then no recovery attempts should be made on the original disk and pool. There are some methods to try to import the pool, but those are invasive.

If unsuccessful, they can destroy the pool and data for good, so they should only be applied to a cloned image of the freebsd-zfs partition, which requires a spare disk, at least the same size as the partition, to copy to.

Tracker said:
That might leave me with only one backup, which is on a bigger disk with the same partition setup (I'll access it from the original laptop, which only works in safe mode). And I somehow need to figure out how to back up a larger disk onto the smaller one (it has less data than the smaller one, iirc).

Don't trust the affected disk anymore. Buy a new disk ASAP and back up the important data there. If the "good" disk is as old as the affected one, buy two. Or buy the second one later, if now is not a good time.
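The "work on a clone, not the original" advice quoted above could be sketched roughly as follows. This is a dry run that only prints commands; the partition name, image path, and `md0` unit are assumptions for illustration, and the `geli attach` step applies only if the partition is GELI-encrypted:

```sh
#!/bin/sh
# Dry run: prints the commands for review; nothing is executed.
clone_plan() {
    part=$1   # freebsd-zfs partition on the suspect disk (hypothetical name)
    img=$2    # image file on a spare disk with enough room (hypothetical path)
    pool=$3   # pool name
    # Copy the raw partition, continuing past read errors.
    echo "dd if=${part} of=${img} bs=1m conv=noerror,sync"
    # Expose the image as a memory disk; mdconfig prints the unit it created.
    echo "mdconfig -a -t vnode -f ${img}"
    # Assumes the unit came up as md0; skip if the pool is not on GELI.
    echo "geli attach /dev/md0"
    # Read-only import without mounting, so nothing is written to the copy.
    echo "zpool import -o readonly=on -N -R /mnt ${pool}"
}

clone_plan /dev/ada0p3 /spare/zfs-part.img backuppool
```

A read-only import is a gentler first attempt than rewind options such as -F or -X, and any invasive attempt would then touch only the image.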

Tracker · Mar 25, 2023

T-Daemon said:
Not necessarily. The TYPE column prints a lot of "Old-age" and "Pre-fail". Even if no error is reported by smartmontools, the disk can be in a state just before a complete failure.

Just curious - what is a good test for a hard drive if smartmontools can't be relied upon? This hard drive hasn't been used so much.

T-Daemon said:
Don't trust the affected disk anymore. Buy a new disk ASAP and back up the important data there. If the "good" disk is as old as the affected one, buy two. Or buy the second one later, if now is not a good time.

Got it - will try to back up onto a new one soon. The "good" hard drive is newer - I doubt it will fail soon - although it has been used a bit with an 8 GB swap partition on it - not sure how that affects its life.

T-Daemon · Mar 26, 2023

Tracker said:
Just curious - what is a good test for a hard drive if smartmontools can't be relied upon?

I know only smartmontools. It's also the recommended tool here in forums.

Tracker said:
This hard drive hasn't been used so much.

Usage is not a reliable measure of a disk's health. Factory-new drives have been seen to fail after a short time of use.

Tracker said:
Got it - will try to back up onto a new one soon. The "good" hard drive is newer - I doubt it will fail soon

Nonetheless, better to get a new drive sooner rather than later and make the backup. Better safe than sorry.
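On the smartmontools question above, one thing worth knowing when reading `smartctl -A` output is that the TYPE column only classifies each attribute (Pre-fail or Old_age); whether an attribute has actually crossed its threshold is shown in the WHEN_FAILED column. A small sketch of a filter for that, assuming the usual ten-column attribute table; the function name is made up for illustration:

```sh
#!/bin/sh
# Reads `smartctl -A` output on stdin and prints only the attributes whose
# WHEN_FAILED column (field 9) is not "-", i.e. ones that have crossed
# their threshold now or in the past.
smart_failed_attrs() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && $9 != "-" { print $1, $2, $7, $9 }'
}

# Possible use (device name is an assumption):
#   smartctl -t long /dev/ada0     # start an extended self-test
#   smartctl -l selftest /dev/ada0 # read the self-test log afterwards
#   smartctl -A /dev/ada0 | smart_failed_attrs
```

Empty output only means no attribute has tripped its threshold. As the anecdote earlier in the thread shows, a disk can still die with a clean report.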

Tracker · Mar 27, 2023

T-Daemon said:
I know only smartmontools. It's also the recommended tool here in forums.

T-Daemon said:
Usage is not a reliable measure of a disk's health. Factory-new drives have been seen to fail after a short time of use.

So effectively there isn't a *definitive* diagnosis for the hard drive being faulty in this case - it's what we are *assuming* since we have run out of other explanations - is that a correct summary of my case?

PS: I think those SCSI errors were probably due to the USB connector I was using... Will have to double check.

Tracker · Mar 27, 2023

Update: Here's something weird that happened

So I followed the same instructions from T-Daemon on the main "good" SSD - it booted perfectly fine. The swap was also showing as ~7.75 GB (it was 8 GB earlier, but we deleted the swap partition, put in 260 MB of EFI and recreated swap in the remaining space - so all good).

Here's the mysterious part - the first time, swap in htop says ~7.75 GB - perfectly normal.

Upon the next reboot the swap somehow became 8 GB in htop. I noticed it but ignored it. Same thing across the next reboot as well.

All of a sudden I do another reboot (maybe the 3rd or 4th) and the GELI prompt goes missing and this "good" disk doesn't boot!! Again! (It goes back to being recognized as in my original post - but doesn't boot in either UEFI-first or legacy-first mode.)

Seriously - what's going on? I almost thought I had solved the issue, but it's back to square one, mysteriously.
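The swap figures in this post can be sanity-checked with a little arithmetic. A minimal sketch, assuming binary units (1 GiB = 1024 MiB) and that the EFI partition was carved entirely out of the old swap space; the function name is made up for illustration:

```sh
#!/bin/sh
# Swap size left after taking an EFI partition out of the old swap space.
swap_after_efi() {
    old_swap_gib=$1   # original swap size in GiB (8 in the post)
    efi_mib=$2        # EFI partition size in MiB (260 in the post)
    awk -v g="$old_swap_gib" -v e="$efi_mib" \
        'BEGIN { printf "%.2f\n", (g * 1024 - e) / 1024 }'
}

swap_after_efi 8 260   # prints 7.75
```

So ~7.75 GB matches the resized layout, while a reading of 8 GB does not. Comparing `gpart show` with `swapinfo -h` across reboots might show which partition is actually being used as swap.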

astyle · Mar 27, 2023

Yeah, this is why I back up at the first opportunity after a scare...
