Other Where to talk low-level SCSI hacking? Trying to replace EMC with Dell firmware

I realize this is the wrong place to ask this. Hence the title.
In a different thread, "mrsas and mystery of lost speed", I briefly mentioned how my controller's BIOS capped the negotiated drive speed at 6Gbps even though these HGST drives are SAS-3 12Gbps capable. Now it turns out they were pulled out of an EMC Unity cluster, which can't go faster than that and capped them at the firmware level. Funny story: there exists a Dell firmware update package for drives of the exact same model, and as it happens I am trying to have these drives go fast on a Dell PowerEdge R730, where every relevant part of the system is 12Gbps.

So, how does one go about it? One asks Dell's Community Outreach, but meanwhile one finds that firmware and attempts to apply it by booting into a Fedora Live USB ... sorry FreeBSD, no love for you there. Whereupon I discover that Dell's validator kicks in and attempts to validate against a known DeviceId or ComponentId, which it expects to find somewhere in the drive's firmware and (these being EMC Unity drives) can't find. That doesn't stop us, so we hex-edit the firmware file and observe in the logs how it passes validation, the header gets stripped, and the updater at least attempts to push the firmware onto the drive. But in response it gets some SCSI error (I believe 0xA) and SCSI sense data which I don't think sg_decode_sense interprets correctly (wrong endianness as printed, or something?). Here's a sample of the debug.log and my attempt to make sense of it:

Code:
<03/31/21, 10:27:08 AM>doProcessLibCommand: After Calling ProcessLibCommand
<03/31/21, 10:27:08 AM>WriteBuffer: Return code from ProcessLibCommand = 2d
<03/31/21, 10:27:08 AM>SASHardDriveDUPDevice::writeBuffer : Return code from storelib =2d
<03/31/21, 10:27:08 AM>SASHardDriveDUPDevice::writeBuffer : SCSIStatus=^@
<03/31/21, 10:27:08 AM>SASHardDriveDUPDevice::writeBuffer : SenseData=
<03/31/21, 10:27:08 AM>0x70 0x0 0x5 0x0 0x0 0x0 0x0 0x18 0x0 0x0 0x0 0x0 0x26 0x2 0x0 0x0 0x0 0x0 0x0 0x0 0xf1 0x30 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
<03/31/21, 10:27:08 AM>Got an error from writebuffer
<03/31/21, 10:27:08 AM>Finished writeBuffer

See that funny ^@ symbol there. It's unprintable, so I just looked at that log in a hex editor and I think it's 0xA, so it could be a copy error.
Here's what I managed to get from sg_decode:
Code:
$ sg_decode_sense -e 10
Copy aborted


$ sg_decode_sense 70 00 50 00 00 00 00 18 00 00 00 00 26 20 00 00 00 00 00 00 f1 30 00 00 00 00 00 00 00 00 00 00
Fixed format, current; Sense key: No Sense
<<<Sense data overflow (SDAT_OVFL)>>>
ASC=26, ASCQ=20 (hex)
 EOM

I am way out of my depth here. So. Any low level SCSI hackers around that would be interested in hacking together?
Or more likely anyone can recommend a better venue to discuss?

Thank you
 
A thought: instead of trying to apply somebody else's binary blob, why not minimally reverse and patch the EMC firmware already on the drive? Just enough for it to negotiate and allow a 12Gbps link. Reverse engineering is not something I expected to find myself doing, and yet here I am entertaining the idea. I mean, that drive has to talk to whatever controller it happens to be attached to, and somewhere in that exchange they negotiate link speed. Surely there must be a way to:
- "grep" that f/w for some well known SCSI protocol bits,
- "patch" just the link speed limit?

I really need a venue where I can have these ideas shot down or encouraged with pointers to specific tools and techniques, sigh. Reddit r/ReverseEngineering looks dull
 
I am way out of my depth here
I don't think people are going to work on your firmware remotely. If you can't find anyone already working on this specific problem, and if you really are out of your depth as you state above, you should probably just get another controller. Otherwise you're getting into "what is this microcontroller, what are its instructions, and how does the firmware get executed on it."

Have you tried taking the backplane out of the equation?
 
Summary so far: You have HGST drives (SSDs), which are loaded with EMC firmware. That firmware caps their speed at 6 Gbit/s, and you want them to go 12 Gbit/s instead. This immediately brings up questions: Do these disks identify themselves as vendor = HGST and model = an HGST model number in response to the inquiry command? And: Is the 6 Gbit/s speed really a bottleneck? I see that they are running at ~350 MB/s, nowhere near the 600 MB/s limit of the interface.

Anyway: Your problem is: You're trying to download Dell-branded firmware into the drives, using some sort of tool that runs on Linux. Which doesn't work, and the root cause seems to be that the drive refuses the firmware, and returns a SCSI ASC/ASCQ that seems impossible to decode above.

What happens if we decode it? The SCSI status is ^@, which is a zero byte printed as an ASCII character (control-@ is 0x00 and comes right before control-A, which is 0x01). I have no idea what exactly "SCSI status" means in that context.

The SCSI sense data is that long string starting with 0x70. Decoding it against the fixed-format sense data layout gives:
- Response code 0x70 = fixed-format sense data for the current command
- Segment number 0x00
- Sense key = 0x05 (the other bits in byte 2 are not set), which means "ILLEGAL REQUEST"
- Information (4 bytes): all 0x00
- Additional sense length = 0x18 = 24 bytes in decimal
- Command-specific information (4 bytes): all 0x00
- And most importantly: ASC and ASCQ are 0x26 and 0x02

Look up those values in the tables of ASC/ASCQ codes (they are all over the web), and 26h/02h means "PARAMETER VALUE INVALID". At this point, we can skip the rest: The drive says "I don't like the command you sent". Specifically, it was an illegal request. Why was it illegal? One of the parameters was invalid. Educated guess: The firmware that was being downloaded is considered invalid. Or to put it bluntly: the drive is refusing the firmware.
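To make that decoding reproducible, here is a minimal Python sketch of a fixed-format sense decoder. It only covers the fields discussed above; the lookup tables are deliberately partial and the function names are my own, not part of any library. It also shows why the sg_decode_sense attempt earlier in the thread reported "No Sense": the log prints bytes without zero padding, so 0x5 and 0x2 are 05 and 02, not 50 and 20.

```python
# Partial tables: just enough for the values seen in this thread.
SENSE_KEYS = {0x0: "NO SENSE", 0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {
    (0x26, 0x00): "INVALID FIELD IN PARAMETER LIST",
    (0x26, 0x02): "PARAMETER VALUE INVALID",
}


def parse_hex_bytes(text):
    """Turn '0x70 0x0 0x5 ...' or '70 00 05 ...' into a bytes object."""
    return bytes(int(tok, 16) for tok in text.split())


def decode_fixed_sense(data):
    """Decode the main fields of fixed-format sense data (response code 70h/71h)."""
    if len(data) < 14:
        raise ValueError("need at least 14 bytes to reach ASC/ASCQ")
    response_code = data[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = data[2] & 0x0F
    asc, ascq = data[12], data[13]
    return {
        "current": response_code == 0x70,
        "sense_key": key,
        "sense_key_name": SENSE_KEYS.get(key, "key 0x%X" % key),
        "eom": bool(data[2] & 0x40),        # end-of-medium flag
        "sdat_ovfl": bool(data[2] & 0x10),  # sense data overflow flag
        "additional_length": data[7],
        "asc": asc,
        "ascq": ascq,
        "meaning": ASC_ASCQ.get((asc, ascq), "not in this sketch's table"),
    }


# Sense data exactly as printed in debug.log.
LOG_LINE = (
    "0x70 0x0 0x5 0x0 0x0 0x0 0x0 0x18 0x0 0x0 0x0 0x0 0x26 0x2 "
    "0x0 0x0 0x0 0x0 0x0 0x0 0xf1 0x30 "
    "0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0"
)
# The same data as it was typed into sg_decode_sense (0x5 -> 50, 0x2 -> 20).
MISTYPED = (
    "70 00 50 00 00 00 00 18 00 00 00 00 26 20 "
    "00 00 00 00 00 00 f1 30 00 00 00 00 00 00 00 00 00 00"
)

if __name__ == "__main__":
    print(decode_fixed_sense(parse_hex_bytes(LOG_LINE)))
    print(decode_fixed_sense(parse_hex_bytes(MISTYPED)))
```

With the bytes taken from the log, byte 2 is 0x05, so the sense key is ILLEGAL REQUEST with ASC/ASCQ 26h/02h. With the mistyped bytes, byte 2 is 0x50, which has a sense key of 0 and sets the EOM and SDAT_OVFL bits, matching the confusing output above.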

Honestly, what did you expect? Why would EMC put any effort into allowing their drives to be turned into Dell drives? On the contrary: If I was an EMC customer, I would insist that the hardware I buy from EMC refuses clearly incorrect firmware files, because on a production system, only a saboteur would load non-EMC firmware onto these drives.

Instead of doing crazy hacking, try this: Download the SCSI command manual for the specific model of disk from HGST. Read exactly how to download firmware into the drives. Typically, this is done with the SCSI "write buffer" command, and usually requires a complicated dance: first use that command many times to load the firmware into the drive (little pieces at a time, at the correct offsets), then use it one more time to save the downloaded firmware from RAM to some sort of permanent ROM, and then activate it (which is done either with another write buffer command, or with a reset command). Then write a script or program that performs these steps, so you can see exactly which one fails.
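As a rough illustration of that dance, here is a Python sketch that only builds the 10-byte WRITE BUFFER command descriptor blocks; it does not talk to any device. It assumes the drive uses the standard SPC download-microcode modes 0Eh (download with offsets, save, and defer activate) and 0Fh (activate deferred microcode). Which mode, buffer ID, and chunk size this particular drive wants is something only its manual can tell you, so those values and the function names here are hypothetical.

```python
WRITE_BUFFER_OPCODE = 0x3B
MODE_OFFSETS_SAVE_DEFER = 0x0E  # download with offsets, save, defer activate
MODE_ACTIVATE_DEFERRED = 0x0F   # activate previously downloaded microcode


def write_buffer_cdb(mode, buffer_id, offset, length):
    """Build a WRITE BUFFER CDB: opcode, mode, buffer ID, 3-byte offset, 3-byte length, control."""
    if not 0 <= mode <= 0x1F:
        raise ValueError("mode is a 5-bit field")
    if not 0 <= buffer_id <= 0xFF:
        raise ValueError("buffer ID is one byte")
    if not (0 <= offset < 1 << 24 and 0 <= length < 1 << 24):
        raise ValueError("offset and length are 24-bit fields")
    return (
        bytes([WRITE_BUFFER_OPCODE, mode, buffer_id])
        + offset.to_bytes(3, "big")
        + length.to_bytes(3, "big")
        + b"\x00"  # control byte
    )


def plan_download(image_len, chunk=0x8000, buffer_id=0):
    """Return the CDB sequence: one per chunk at increasing offsets, then an activate."""
    if image_len <= 0 or chunk <= 0:
        raise ValueError("image length and chunk size must be positive")
    cdbs = [
        write_buffer_cdb(MODE_OFFSETS_SAVE_DEFER, buffer_id, off, min(chunk, image_len - off))
        for off in range(0, image_len, chunk)
    ]
    cdbs.append(write_buffer_cdb(MODE_ACTIVATE_DEFERRED, buffer_id, 0, 0))
    return cdbs


if __name__ == "__main__":
    # Hypothetical image size; print each step so a failing one can be identified.
    for step, cdb in enumerate(plan_download(0x10100)):
        print(step, cdb.hex(" "))
```

Sending each CDB one at a time (for example with a pass-through tool) and checking the sense data after every step would show whether the drive rejects the very first chunk, a later one, or only the activation.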

My educated guess: The drive will only accept firmware downloads that are "signed", meaning contain some form of authentication and checksum. For simple security that is vital, otherwise black hats would be destroying disk drives whenever they become root. And I think an EMC drive will only accept firmware that is cryptographically signed with EMC's key, everything else would be dumb.

This is also the reason why hacking the firmware file itself is nearly impossible. Firmware files need to come from the drive vendor or authorized parties. A disk drive is not sold as a general-purpose computing device to execute arbitrary code, on the contrary.

Suggestion: Be happy that you got this hardware for cheap, and use it at 6 Gbit/s.
 
My educated guess: The drive will only accept firmware downloads that are "signed", meaning contain some form of authentication and checksum. For simple security that is vital, otherwise black hats would be destroying disk drives whenever they become root. And I think an EMC drive will only accept firmware that is cryptographically signed with EMC's key, everything else would be dumb.

I think this is why WD tried to sweep their buggy Raptor drives under the rug.
A firmware updater for them was available on their website for a short time, but it was removed very soon.
I guess that was for exactly the reasons you mentioned.
 