The root cause of all these problems is that "firmware" is not perfect. By firmware, I mean everything below the OS: the code that runs in the disk interfaces (SATA interfaces on the motherboard, SAS HBAs), whatever sits in the storage IO path like SAS expanders or SATA multiplexers, and the disk drives themselves. Ideally, they would all have relatively short timeouts (30 or 90 seconds), after which any uncompleted IO request gets correctly aborted. One has to include the lowest levels of the kernel in the term "firmware", because the kernel drivers for certain hardware (like SCSI HBAs) need to participate in keeping track of the state of pending IOs and in clearing aborted or stuck IOs. In practice, all of that code has bugs. Usually they cause one of the parties to forget an IO (typically after an error), so the other party waits "forever" for the IO to complete.
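For the OS end of that chain, Linux at least exposes the per-device SCSI command timeout through sysfs. The sketch below (Python; it assumes the standard /sys/block/&lt;dev&gt;/device/timeout attribute and treats the 30/90-second values as illustrative) just reads those timeouts and optionally tightens them. It does nothing, of course, about the firmware below the kernel keeping its own, possibly broken, state.

```python
#!/usr/bin/env python3
"""Inspect (and optionally change) the kernel-level SCSI command timeout.

This only covers the OS side of the IO path: the timeout after which the
Linux SCSI midlayer starts trying to abort a command that never completed.
HBAs, expanders, and the drives themselves keep their own state and can
still lose track of an IO regardless of what is set here.
"""
import sys
from pathlib import Path
from typing import Optional


def show_timeouts(set_to: Optional[int] = None) -> None:
    # /sys/block/<dev>/device/timeout holds the per-device SCSI command
    # timeout in seconds (commonly 30 by default on Linux).
    for timeout_file in sorted(Path("/sys/block").glob("sd*/device/timeout")):
        dev = timeout_file.parts[3]  # e.g. "sda"
        current = int(timeout_file.read_text())
        print(f"{dev}: timeout={current}s")
        if set_to is not None and current != set_to:
            # Writing requires root; the kernel applies the new value
            # to subsequent commands on that device.
            timeout_file.write_text(str(set_to))
            print(f"{dev}: timeout set to {set_to}s")


if __name__ == "__main__":
    new_value = int(sys.argv[1]) if len(sys.argv) > 1 else None
    show_timeouts(new_value)
```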
I'm sorry if the following makes you even more depressed ... but reality can be ugly.
Usually a hardware reset fixes these problems, because it is sent not only to the motherboard but also to the disk interfaces. Typically, that clears whatever stuck software state is causing the hang. But I have seen individual disk drives that are so broken that they will hang the bus on the next reboot and make booting impossible. I had one at home (a SATA disk attached directly to a low-cost motherboard), and I had one at work (a high-quality SAS disk with a verified firmware version, installed in the most expensive disk enclosure ever built, and connected to a very high-quality computer). We ended up having to do a binary search among several hundred disks to find out which disk needed to be physically removed to allow the machine to reboot. No amount of IPMI or remote management/reboot would have helped against cases like that.
This just means that going home at lunch to power-cycle a hung machine may sometimes be a necessary reality.