ZFS zohome - My Pool With All My Stuff - Disappeared

I don't know if this applies to the OP's situation, but I've been doing this as my default. Solaris, I think, wanted the "whole device", but on almost everything else partitions were better.

Solaris used its own partition table format, handled by the Solaris format(8) command. The format is similar to, but incompatible with, bsdlabel. The OBP understood the format.
 
I've noticed that when a disk looks faulty to ZFS for any reason, the system ignores it completely, as if nothing were there. Neither gpart show nor zpool import shows anything, even though the disk is OK.
The only solution I've found is to boot from a FreeBSD install disk, go to the shell or the live system, and try zpool import. The disk is then marked as clean.
Then, when you boot normally, you should hopefully be able to import the zpool again.
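
For anyone wanting to try the steps above, here is a rough sketch of what they could look like from the install media's shell. The pool name zohome is taken from the thread title and /mnt is just an assumed temporary root, so adjust both. By default the script only prints the commands; it is an illustration, not a tested recovery procedure.

```sh
#!/bin/sh
# Sketch only: "zohome" comes from the thread title, /mnt is an assumed altroot.
# Commands are printed by default; set DRY_RUN=0 to run them from the live shell.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

rescue_import() {
    pool=$1
    altroot=$2
    # List pools that are visible but not yet imported.
    run zpool import
    # Force the import under a temporary root so nothing mounts over the live system.
    run zpool import -f -R "$altroot" "$pool" || return 1
    # Check device state and any reported errors.
    run zpool status -v "$pool"
    # Export cleanly before rebooting into the normal system.
    run zpool export "$pool"
}

rescue_import zohome /mnt
```

If the first zpool import lists nothing at all, the pool is not visible to ZFS from the live system either, and the later steps will not help.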
 
Almost a week later ...

The only reasonable NVMes I could get are two Kioxia Exceria Pro drives. They are not data centre drives. Does anyone know if they are any good or not? How much life should I expect to get out of them?

I could have waited and got two more Samsung data centre NVMes, but I am highly pissed off with Samsung. Even though I have backups, I did not expect two data centre drives (expensive ones, too) in a ZFS mirror to fail at the same time.

Unfortunately, with the world situation, there is not the same free flow of goods, and I cannot get the same choice as I could four years ago.

Previously, I have had a hardware raid card fail, and so all the data on four drives was gone, so I thought that mirroring with zfs would be safer.

I took the Samsung NVMes out. There was a thin layer of some sort of oil on the drives, beneath the silicone pads in the heatsinks. What is it? How could it have got there? Could that have caused the drives to fail?

Putting in the Kioxia drives was not simple. As they are 2280 form factor, they are shorter than the Samsung drives, and I had to take out the motherboard to put in more standoffs.

Anyway, they are in. I did not go dangerously dedicated this time, although I am sure that was not the problem. I used GPT partitions and left a bit of space free at the end. I created a zpool mirror and restored from backup, and I am broadly up and running.
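
A sketch of what that partition and mirror setup could look like, in case it helps someone. The device names (nda0, nda1), the GPT labels, the disk size, and the 1 GiB left free at the end are all assumptions for illustration, not the values actually used here. It only prints the commands unless DRY_RUN=0 is set.

```sh
#!/bin/sh
# Sketch only: nda0/nda1, the zohome0/zohome1 labels, the disk size and the
# 1 GiB reserve are assumptions. Commands are printed by default; setting
# DRY_RUN=0 runs them for real and DESTROYS existing data on the disks.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Partition size in MiB that leaves reserve_mib unallocated at the end.
part_size_mib() {
    echo $(( $1 - $2 ))
}

# Create a GPT scheme and one labelled, 1 MiB aligned freebsd-zfs partition.
# Arguments: device, label, size in MiB.
setup_disk() {
    run gpart create -s gpt "$1" || return 1
    run gpart add -t freebsd-zfs -a 1m -s "${3}m" -l "$2" "$1"
}

disk_mib=1907729                        # assumed size; check with diskinfo -v
size=$(part_size_mib "$disk_mib" 1024)  # leave about 1 GiB free at the end

setup_disk nda0 zohome0 "$size"
setup_disk nda1 zohome1 "$size"
run zpool create zohome mirror gpt/zohome0 gpt/zohome1
```

Leaving some space unallocated means a replacement drive that is a few sectors smaller can still be partitioned to the same size and attached to the mirror.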

Found a Windows machine that I could put the Samsungs in and downloaded Samsung's 'Magician', which is supposed to solve all problems. It did not. It just told me Errormod, 1GB, and that nothing could be done to fix them.

I am not going to mark this thread as 'Solved'. It is fixed, but far from solved.
 
I took the Samsung NVMes out. There was a thin layer of some sort of oil on the drives, beneath the silicone pads in the heatsinks. What is it? How could it have got there? Could that have caused the drives to fail?
I've seen that oil/liquid on other stuff with thermal pads, but haven't seen it be a problem.

If you're brave: I took some NVMes and ran them in a 5-minute ultrasonic bath of 90%+ isopropyl alcohol and they still worked :p (it cleaned the oil and thermal paste off the chips)
 
I have seen SSD drives crash their internal firmware under weird (but apparently reproducible) conditions.

The cells contain different things:
  • user data (visible)
  • overprovisioning (invisible)
  • controller firmware (invisible)
  • runtime configuration (invisible)
The runtime configuration describes the size and layout of the cells, timing parameters, usage information and whatever else, and it can be written during operation, which means it can be destroyed by a malfunction. In that case the controller no longer knows how and where the cells are to be accessed, and consequently the data is gone.
Typically in that case the controller shows only a very small amount of accessible storage and allows the initial factory configuration to be loaded. This is not something that can simply be done; maybe specialized disk recovery shops are able to do it and recover (some of) the data.

Bottom line: mirroring is of limited usefulness with SSDs. Mirroring can only protect against individual defects (from old age or manufacturing flaws); it cannot protect against systemic malfunction (i.e. controller bugs) if the controller firmware is the same in all the mirrors.

Recommendation: mirror to different brands of devices, and (if possible at all) make sure they use different controller series.
 
I took the Samsung NVMes out. There was a thin layer of some sort of oil on the drives, beneath the silicone pads in the heatsinks. What is it? How could it have got there? Could that have caused the drives to fail?

If it's just cooling compound touching the heatsinks only, that's okay. But normally the silicone pads are the non-liquid cooling compound between the electronics and the heatsink (mostly aluminium, sometimes copper).
Maybe you can post a close-up photo.

But if it's on the PCB, touching electronic parts, that's not good.
And yes, it's highly probable that this was the reason your drives failed. (It's uncommon for both drives to fail at the same time; maybe you didn't notice the first failure until the second drive failed, too.)
You need to check what this is and where it came from.

Maybe your water cooling (if you have any) is leaking (water cooling does not use plain water, but a coolant consisting of purified water and several additives, for example to suppress algae and reduce corrosion). Maybe a capacitor is broken and leaking electrolyte (a good magnifying glass and a lot of light are very helpful for the search). Maybe you accidentally put a hairline crack into one of the mainboard's heatpipes. Or cooling compound (e.g. from your CPU) is dripping (most people use way too much of that stuff; less is better, not more), or it's decomposing. Or you are unlucky and moisture from the surrounding air is condensing (if you live in the jungle, very moist air can be a source)... I don't know. There could be many causes or origins.
But you'd better find out.
Keep the machine completely powered off (pull the cord!) until you have found it.

Any conductive (non-insulating) stuff touching electronics causes short circuits. It doesn't have to be liquid: even very small amounts, even very viscous, pasty stuff, and anything unclean, e.g. just damp dust, has to be suspected of conducting. In very rare cases a short only disturbs operation, but almost always it simply kills the electronics.
In short: anything not completely dry is dangerous.

So you'd better wipe that stuff off your drives thoroughly. There are special cleaning fluids for electronics; isopropyl alcohol will also do (depending on the stuff you need to wipe off; you can get it at your pharmacy). NO water! NO acetone! The latter will dissolve all paint. Remove it until the drives are spotless. Then test whether the drives work again. But don't get your hopes up too much. Be prepared for your drives to be toast. Sorry, buddy.

But you must find the source of this liquid.
Otherwise you risk even more damage (mainboard, GPU, ...).
 
Could it be oil bleeding from the thermal pad?
msi-rtx-3080-top.jpg
msi-rtx-3080.jpg
 
I think it was oil from the thermal pads. The thermal pads came with the heatsinks on the Supermicro board, so even Supermicro might skimp on quality here or there.
 