Other SATA standby on SAS HBA can hang the system

There are a few ways to put a disk into standby. Most of them don't work, or work in an unintelligible fashion. And each device behaves differently.

There are EPC (Extended Power Conditions) and APM (Advanced Power Management). EPC is nice, because it can put a drive into low-rpm standby, while APM is just an 8-bit number, and the drive may do anything with that number (lower numbers mean earlier standby, but everything else is quite obscure).

Another possibility is camcontrol [idle|standby] -t <seconds>. This should configure an internal timer within the device, which would put the drive into standby automatically after so many seconds of inactivity. Most devices seem to ignore this configuration, but some actually adhere to it. And then there may be a problem with SAS attachment.
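For illustration, here is a minimal sketch of setting that timer and then checking what the drive reports. The function names, the CAMCONTROL override and the device name da0 are my own placeholders, not anything standard, and whether a given drive honours the timer is exactly the open question described above.

```shell
#!/bin/sh
# Sketch only: thin wrappers around camcontrol(8).
# CAMCONTROL can be overridden (e.g. CAMCONTROL=echo) for a dry run.
: "${CAMCONTROL:=camcontrol}"

# set_standby_timer DEVICE SECONDS
# Asks the drive to enter standby on its own after SECONDS of inactivity.
set_standby_timer() {
    dev=${1:-}
    secs=${2:-}
    case $secs in
        ''|*[!0-9]*)
            echo "usage: set_standby_timer device seconds" >&2
            return 64
            ;;
    esac
    if [ -z "$dev" ]; then
        echo "usage: set_standby_timer device seconds" >&2
        return 64
    fi
    $CAMCONTROL standby "$dev" -t "$secs"
}

# show_power_mode DEVICE
# Reports the power mode the drive currently claims to be in.
show_power_mode() {
    $CAMCONTROL powermode "${1:-}"
}
```

Usage would be something like `set_standby_timer da0 600` followed later by `show_power_mode da0`. Given the rest of this post, I would only try the timer on a disk attached to an AHCI port, or on a test machine.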

SAS works with the SCSI protocol, and in SCSI standby is handled differently. A SATA disk in standby will just wait for the next command and then go back into operation implicitly. A SCSI disk in standby, however, when it receives a command, will send an error sense message saying that it is stopped, and then expect the host to send a SCSI START UNIT command in return. FreeBSD will do as requested, and retry the command afterwards.

When a SATA disk is attached to a SAS controller, it is the job of the controller to emulate SCSI behaviour to the host. So, when the disk is in standby, the controller does not simply send a command and expect the disk to start, but sends a SCSI sense error back to the host. Then FreeBSD sends the START UNIT command, and things can proceed. This does work if we have put the disk into standby explicitly. It apparently does not work when the disk has put itself into standby due to the internal timer configuration.

I've been in the habit of spinning down my disks for a long time already, and I have added a message to the kernel so I can see when the kernel stops and starts a disk. Thanks to that I could immediately notice what was going on, and it looks like this:

Code:
Sep 25 02:42:01 <kern.crit> edge kernel: [4118] SCSI START unit (fg) <2:1:0>
Sep 25 02:42:02 <kern.crit> edge syslogd: last message repeated 370 times
Sep 25 02:42:02 <kern.crit> edge kernel: [4119] SCSI START unit (fg) <2:1:0>
Sep 25 02:42:03 <kern.crit> edge syslogd: last message repeated 415 times
Sep 25 02:42:03 <kern.crit> edge kernel: [4120] SCSI START unit (fg) <2:1:0>
Sep 25 02:42:04 <kern.crit> edge syslogd: last message repeated 418 times

This goes on forever, and it locks some other things, and finally the system will freeze or crash. Apparently the kernel tries to start the device, but that does not work and the sense error seems to come back again and again.
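To put a number on such a storm, a small awk filter can add up the "last message repeated" counts that follow each START line. This is only a sketch: it assumes the log format shown above (which comes from my own kernel patch, not from a stock kernel), and the function name is made up.

```shell
#!/bin/sh
# Sketch only: count START UNIT attempts per device in a syslog excerpt.
# Reads the named files, or stdin when no file is given.
count_start_storm() {
    awk '
        # A START line: the device id is the last field, e.g. <2:1:0>
        /SCSI START unit/ { dev = $NF; n[dev]++; last = dev; next }
        # syslogd folds identical lines; add the repeat count to that device
        /last message repeated [0-9]+ times/ && last != "" {
            for (i = 1; i <= NF; i++)
                if ($i == "repeated") n[last] += $(i + 1)
            next
        }
        # Any other line ends the run, so later repeats are not miscounted
        { last = "" }
        END { for (d in n) print d, n[d] }
    ' "$@"
}
```

Fed the six lines above, it prints `<2:1:0> 1206`, i.e. more than a thousand start attempts in about three seconds.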

There is no such problem when attaching the same device to an AHCI port: it will go to standby, and then just accept the next command and come back to life.

In short: the camcontrol [idle|standby] -t feature can be harmful on a SAS HBA. I recommend using sysutils/gstopd instead, which achieves the same thing from the host.
 
Is the following a good compromise?

hint.ahcich.X.pm_level=5
hint.ata.X.pm_level=1

That's what I have, too, but that's something different again.
ata is probably obsolete and replaced by ahci, and the ahcich pm_level concerns the power on the SATA signal wire. The SATA cable has a sender and receiver circuit on both ends, and obviously some current must flow through the cable in order to transport the data, and this can be powered down when there is no access to the disk. I think this is more or less similar to the "SATA Link Power Management" found in the mainboard BIOS. But I have no idea how much wattage this is about.

What I was talking about is disk standby. With a mechanical disk, that means either that the heads go off the platter and into park, so the air cushion for the heads goes away, which might save about one watt, or that the spindle stops entirely and the drive no longer consumes noticeable energy. So this makes a real difference, and if an application is not used during the night, it makes a lot of sense to stop the disks concerned. (If somebody does show up and use them, there is a delay of some 10 seconds at first.)
SSD devices may also do standby, and may then shut down some components, but, as usual, nobody knows the details.
 
I see.

I haven't looked too much into parking disks. The thought of some badly behaving subsystem erroneously parking and waking the heads in a constant loop creeps me out. With an SSD, however, if it does go into a low power state and (worst case scenario) keeps waking up, the risk seems smaller, so I might spend some time exploring it again.
 