camcontrol hdd smart status

How can I get the SMART status of a hard drive?
Two hard drives are configured in RAID1; I think they are on an HP RAID controller.
I can get the results below, but cannot read the SMART data.
Code:
%camcontrol devlist
<COMPAQ RAID 1  VOLUME OK>         at scbus0 target 0 lun 0 (pass0,da0)
<COMPAQ RAID 1  VOLUME OK>         at scbus0 target 1 lun 0 (pass1,da1)
When I try ident or identify, it says:
Code:
option identify (or ident) not found
 
For reading SMART data: What SirDice said. Install smartmontools, and then run "smartctl -a /dev/daXXX".

BUT: This is unlikely to give you the actual SMART data of the physical disk drives. Here's why: You are using hardware RAID, with a RAID controller taking two drives and making them into virtual volumes. What your operating system sees as "disks" (the block devices /dev/da0 and /dev/da1) are not the two physical disks, but two virtual volumes created using RAID-1 out of those disks. To get the actual SMART data from the real disk drives, you need to ask your RAID controller to perform those commands for you.

Also: The two virtual disk drives /dev/daX are SCSI devices. The command camcontrol identify won't work on them; that command is for ATA disks. You should use camcontrol inquiry and get better results. But again, those commands will tell you about the virtual disks, not the physical ones.
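
As a rough illustration of the ATA/SCSI split described above, a small sh helper could pick the camcontrol subcommand from the device name. The function name cam_query_cmd is made up for this sketch, and it assumes the usual FreeBSD naming where adaN is an ATA/SATA disk and daN is a SCSI/SAS device (or a RAID volume presented as one).

```sh
#!/bin/sh
# Hypothetical helper: print the camcontrol command suited to a device name.
# Assumption: adaN devices are ATA/SATA, daN devices are SCSI/SAS or RAID volumes.
cam_query_cmd() {
    case "$1" in
        ada[0-9]*) echo "camcontrol identify $1" ;;  # ATA IDENTIFY
        da[0-9]*)  echo "camcontrol inquiry $1" ;;   # SCSI INQUIRY
        *)         echo "unknown device type: $1" >&2; return 1 ;;
    esac
}
```

Calling cam_query_cmd da0 prints "camcontrol inquiry da0". On the HP box that command still describes the virtual volume, not the physical disks behind the controller.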
 
You got me on this one. I had to go and check the smartd.conf which comes with the OS. Glad to see ondra_knezour already posted the answer. I think this is the relevant snippet of the sample file.
Code:
# Monitor 2 ATA disks connected to a 3ware 9000 controller which
# uses the 3w-9xxx driver (Linux, FreeBSD). Start long self-tests Tuesdays
# between 1-2 and 3-4 am.
#/dev/twa0 -d 3ware,0 -a -s L/../../2/01
#/dev/twa0 -d 3ware,1 -a -s L/../../2/03

# Monitor 2 SATA (not SAS) disks connected to a 3ware 9000 controller which
# uses the 3w-sas driver (Linux). Start long self-tests Tuesdays
# between 1-2 and 3-4 am.
# On FreeBSD /dev/tws0 should be used instead
#/dev/twl0 -d 3ware,0 -a -s L/../../2/01
#/dev/twl0 -d 3ware,1 -a -s L/../../2/03
 
Running smartctl -a -d cciss,0 /dev/ciss0 gives only the drive temperature; there are no reallocated sectors, CRC errors, or any other attributes.
On another computer, a SuperMicro, running camcontrol devlist gives these results:
Code:
<ST1000NM0033-9ZM173 SN04>         at scbus0 target 0 lun 0 (ada0,pass0)
<ST1000NM0033-9ZM173 SN04>         at scbus1 target 0 lun 0 (ada1,pass1)
<AHCI SGPIO Enclosure 1.00 0001>   at scbus6 target 0 lun 0 (ses0,pass2)
Also, when I run smartctl -a /dev/ada1 or /dev/ada0, I can see all SMART data.

Can you give me some more advice? I would like to make a shell script which reads the RAID status and SMART data, then sends it via email
with the subject "raid ok" or "raid failure". I made this script, which collects the data. Here is an example:


Code:
less hddreport.sh
#!/bin/sh
PATH=/etc:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/sbin
# Append the device list and the SMART data of both disks, in order, to one dated report.
report="/storage/hddreport/$(date +%Y-%m-%d)-hdd.txt"
{ camcontrol devlist; smartctl -a /dev/ada0; smartctl -a /dev/ada1; } >> "$report"

How do I set the subject, change the "From" address, and attach the report to the mail?
Now I send mail using this line:
Code:
cat /storage/hddreport/`date +\%Y-\%m-\%d`-hdd.txt | mail -v -s "hdd report" my@mail.com
 
I prefer using a shell script. Keeping records of the HDD SMART status can help me in some scenarios.
I just need some more help with the mail and its subject.
 
See mail(1):
Code:
     -s subject
             Specify subject on command line.  (Only the first argument after
             the -s flag is used as a subject; be careful to quote subjects
             containing spaces.)
The From: address is always the user that's sending the mail.
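
To tie this back to the "raid ok or raid failure" subject asked about earlier, here is a minimal sketch of how the subject could be derived from camcontrol devlist output. The function name raid_subject is hypothetical, and the check rests on one assumption taken from the HP listing in this thread: a healthy volume line contains "VOLUME OK". What a degraded volume actually prints is not shown in the thread, so any RAID line without that text is simply treated as a failure.

```sh
#!/bin/sh
# Hypothetical helper: read "camcontrol devlist" output on stdin, print a mail subject.
# Assumption: every healthy RAID volume line contains "VOLUME OK".
raid_subject() {
    # Keep only RAID volume lines, then look for any that is not "VOLUME OK".
    if grep 'RAID' | grep -qv 'VOLUME OK'; then
        echo "raid failure"
    else
        # Also reached when there are no RAID lines at all.
        echo "raid ok"
    fi
}
```

It could then be combined with the -s flag, for example: mail -s "$(camcontrol devlist | raid_subject)" my@mail.com < report.txt, where report.txt stands for whatever report file you generate.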
 
It depends on what the drive supports. See the following two different drives on the same controller:
Code:
smartctl -d cciss,5 /dev/sg0 -a
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.4.0+2] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/sg0 [cciss_disk_05] [SCSI]: Device open changed type from 'sat,auto+cciss' to 'cciss'
=== START OF INFORMATION SECTION ===
Vendor:               HP
Product:              EG0146FARTR
Revision:             HPDA
User Capacity:        146,815,737,856 bytes [146 GB]
Logical block size:   512 bytes
Rotation Rate:        10025 rpm
Form Factor:          2.5 inches
Logical Unit id:      0x500000e1183d4ac0
Serial number:        D0A1PB80JG4P1133
Device type:          disk
Transport protocol:   SAS
Local Time is:        Tue Mar 27 20:40:18 2018 CEST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     37 C
Drive Trip Temperature:        65 C

Manufactured in week 33 of year 2011
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  51
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0     365331.205           0
write:         0        0         0         0          0      28527.670           0
verify:        0        0         0         0          0        147.378           0

Non-medium error count:      163

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   51136                 - [-   -    -]
# 2  Background short  Completed                   -   51112                 - [-   -    -]
# 3  Background long   Completed                   -   51067                 - [-   -    -]
# 4  Background short  Completed                   -   51065                 - [-   -    -]
# 5  Background short  Completed                   -   51041                 - [-   -    -]
# 6  Background short  Completed                   -   51017                 - [-   -    -]
# 7  Background short  Completed                   -   50993                 - [-   -    -]
# 8  Background short  Completed                   -   50969                 - [-   -    -]
# 9  Background short  Completed                   -   50945                 - [-   -    -]
#10  Background short  Completed                   -   50922                 - [-   -    -]
#11  Background long   Completed                   -   50899                 - [-   -    -]
#12  Background short  Completed                   -   50898                 - [-   -    -]
#13  Background short  Completed                   -   50874                 - [-   -    -]
#14  Background short  Completed                   -   50850                 - [-   -    -]
#15  Background short  Completed                   -   50826                 - [-   -    -]
#16  Background short  Completed                   -   50802                 - [-   -    -]
#17  Background short  Completed                   -   50778                 - [-   -    -]
#18  Background short  Completed                   -   50754                 - [-   -    -]
#19  Background long   Completed                   -   50732                 - [-   -    -]
#20  Background short  Completed                   -   50730                 - [-   -    -]
Long (extended) Self Test duration: 1722 seconds [28.7 minutes]
Code:
smartctl -d cciss,6 /dev/sg0 -a
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.4.0+2] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/sg0 [cciss_disk_06] [SAT]: Device open changed type from 'sat,auto+cciss' to 'sat'
=== START OF INFORMATION SECTION ===
Model Family:     Intel 320 Series SSDs
Device Model:     INTEL SSDSA2CW160G3
Serial Number:    CVPR1283000P160DGN
LU WWN Device Id: 5 001517 9595d7565
Firmware Version: 4PC10302
User Capacity:    160,041,885,696 bytes [160 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Mar 27 20:40:26 2018 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART STATUS RETURN: incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    1) seconds.
Offline data collection
capabilities:                    (0x71) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   1) minutes.
Conveyance self-test routine
recommended polling time:        (   1) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0020   100   100   000    Old_age   Offline      -       0
  4 Start_Stop_Count        0x0030   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       50428
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       50
170 Reserve_Block_Count     0x0033   100   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       49
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       970525
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       5197
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       0
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       3025704
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   095   095   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       970525
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       3660128

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
SMART is defined completely differently for SCSI (SAS) and SATA. The output of your HP SAS drive above is typical for a SCSI drive. The single most important thing in SCSI SMART is that drives can self-identify as "failure predicted", which gives a concise alert that the drive wants to be replaced. Interestingly, I don't see it in your SMART summary above, but that may be because there are two ways to report this information (the other is via sense data).

In addition, SATA SMART is not very standardized: the set of reported parameters and their meaning varies from manufacturer to manufacturer, and sometimes from model to model. Working with SMART is a big hairy mess.
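
Because the two outputs above word their overall verdict differently, a script has to handle both forms. A minimal sketch, assuming only the two health lines visible in the outputs in this thread ("SMART Health Status:" for the SAS drive and "SMART overall-health self-assessment test result:" for the SATA SSD); the function name smart_health is made up:

```sh
#!/bin/sh
# Hypothetical helper: read "smartctl -a" output on stdin and print the health verdict.
# Handles the SCSI/SAS wording and the ATA/SATA wording seen in this thread.
smart_health() {
    awk -F': *' '
        /^SMART Health Status:/ { print $2; found = 1 }
        /^SMART overall-health self-assessment test result:/ { print $2; found = 1 }
        END { exit found ? 0 : 1 }   # non-zero exit when no verdict line was seen
    '
}
```

For example, smartctl -a -d cciss,5 /dev/sg0 | smart_health would print the verdict text for that drive. Individual attributes such as Reallocated_Sector_Ct would still need per-vendor handling.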
 
It depends on what drive supports, see following two different drives on same controller
Yes, two drives on one controller in RAID1 mode.

So is there some way to collect all the information (SCSI, RAID, and hard drive SMART data) into one file and then send it by mail?
I would prefer to collect the hard drives' SMART status.
Can smartd.conf do this, and how can I test it?
 
I wasn't aware of that. I'm definitely interested in that too. I have a couple of scripts for Zabbix to get some SMART parameters, this could be very useful. Thanks for the pointer!
 
You don't need to monitor the S.M.A.R.T. status of the hard disks behind the RAID controller unless you want to gather statistics. The RAID controller monitors the pre-fail status of all disks attached to it.
For monitoring the RAID volume you have several options, depending on your needs.

1. No monitoring at all
2. Local periodic checks (logs, dmesg, status commands, etc.)
3. Periodic reports by mail (using cron(8) and a periodic(8) script)
4. Instant report by mail when the event occurs (using syslogd(8) with a logging subprocess, or by monitoring the log file for changes using tail(1))

Here is an example script for periodic daily reports. Save the script under /usr/local/etc/periodic/daily/
Bash:
#!/bin/sh
#

# If there is a global system configuration file, suck it in.
#
if [ -r /etc/defaults/periodic.conf ]
then
    . /etc/defaults/periodic.conf
    source_periodic_confs
fi

case "$daily_status_camcontrol_enable" in
    [Yy][Ee][Ss])
        echo
        echo 'Checking status of camcontrol(8) devices:'

        if camcontrol devlist; then
                # Any line without "OK" marks a device that needs attention.
                components="$(camcontrol devlist | grep -v OK)"
                if [ "${components}" ]; then
                        rc=3
                else
                        rc=0
                fi
        else
                rc=2
        fi
        ;;

    *)  rc=0;;
esac

exit $rc

Edit /etc/periodic.conf.local and add:
daily_status_camcontrol_enable="YES"

If you want to be notified immediately when the event occurs, you need to pipe the output from syslogd, which reads the kernel messages from /dev/klog and writes them to the corresponding file configured in /etc/syslog.conf. You can pipe this output into a shell script that filters it and then sends it by mail. Be careful: if the filter is not configured correctly, this can flood your mailbox with messages.

Here is an example script/configuration; it can be written as a single line in syslog.conf or as a shell script.

This script filters the kernel messages for keywords that the ciss driver uses for RAID volume status and, if there is a match, sends the message by mail to root. The same technique may be used for other messages if you know the statuses that the RAID driver sends.

/usr/local/etc/syslog_mail.sh
Bash:
#!/bin/sh

# Mail every syslog line that matches one of the RAID status keywords to root.
while IFS= read -r line; do
        if printf '%s\n' "$line" | tr '[:upper:]' '[:lower:]' | grep -Eq 'rea|int|exp|rec|fai'; then
                printf '%s\n' "$line" | /usr/bin/mail -s SYSLOG root
        fi
done

We can grab all ciss driver messages with
Bash:
#!/bin/sh

# Mail every syslog line that mentions the ciss driver to root.
while IFS= read -r line; do
        if printf '%s\n' "$line" | tr '[:upper:]' '[:lower:]' | grep -q 'ciss'; then
                printf '%s\n' "$line" | /usr/bin/mail -s SYSLOG root
        fi
done



It depends on which facility the RAID driver reports its status under. If it's kern.warning or *.emerg, we can add a line to /etc/syslog.conf that pipes the messages to the filter script:
/etc/syslog.conf
Code:
*.err;kern.warning;*.emerg     | /usr/local/etc/syslog_mail.sh
Or we can write the filter as a single line in /etc/syslog.conf:
Code:
*.err;kern.warning;*.emerg        | while read -r log; do if echo "$log" | tr '[:upper:]' '[:lower:]' | grep -Eq 'rea|int|exp|rec|fai'; then echo "$log" | /usr/bin/mail -s SYSLOG root; fi; done
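
One possible guard against the mail flooding mentioned above is a simple time-based limiter. This is only a sketch: the function name should_send and the stamp-file approach are assumptions for illustration, and messages that arrive inside the quiet interval are dropped, not queued.

```sh
#!/bin/sh
# Hypothetical rate limiter: succeed at most once per interval.
# $1 = stamp file (any writable path), $2 = minimum seconds between mails.
should_send() {
    stamp=$1
    interval=$2
    now=$(date +%s)
    last=0
    # A missing or empty stamp file counts as "never sent".
    if [ -s "$stamp" ]; then
        last=$(cat "$stamp")
    fi
    if [ $((now - last)) -ge "$interval" ]; then
        echo "$now" > "$stamp"   # remember when we last allowed a mail
        return 0
    fi
    return 1
}
```

Inside the filter loop the mail call could then be wrapped as: should_send /var/run/syslog_mail.stamp 600 && echo "$line" | /usr/bin/mail -s SYSLOG root. The stamp path and the 600-second interval are placeholders.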
 
On the HP computer, "camcontrol devlist" shows the status "OK" with this result:
Code:
<COMPAQ RAID 1  VOLUME OK>         at scbus0 target 0 lun 0 (pass0,da0)
<COMPAQ RAID 1  VOLUME OK>         at scbus0 target 1 lun 0 (pass1,da1)
What should I use in smartd.conf to run this check?

On the Supermicro, "camcontrol devlist" gives this, without a status:
Code:
<ST1000NM0033-9ZM173 SN04>         at scbus0 target 0 lun 0 (ada0,pass0)
<ST1000NM0033-9ZM173 SN04>         at scbus1 target 0 lun 0 (ada1,pass1)
<AHCI SGPIO Enclosure 1.00 0001>   at scbus6 target 0 lun 0 (ses0,pass2)
How do I read the RAID status here and also run it in smartd?
 
One uses a RAID card and the other doesn't? Different controllers show different information? There's no real "standard" here; every manufacturer uses something else, and there are even differences between card models from the same manufacturer.
 
It depends on which software RAID you are using on the Supermicro server. If it's GEOM_RAID (Intel onboard), you can use graid status.
 
On the SuperMicro, you are using software RAID, because the OS sees two real disk drives. How do I know that? Because the model number "ST100..." is the model number of a real Seagate drive. The drives are connected via SATA (because the OS sees them as "ada..."). In this situation, you can (and should) check the SMART status of the individual disks by setting up smartd in the normal way.

In addition, you obviously need to look at the RAID status by using "graid status" like VladiBG said, or whatever is appropriate for your particular RAID solution. On ZFS, I happen to use "zpool status".
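
For the ZFS case, a scripted check might look like the sketch below. It relies on zpool status -x printing the single line "all pools are healthy" when nothing is wrong; the function name pools_healthy is made up, and the thread does not say whether the SuperMicro uses ZFS at all, so treat this purely as an illustration.

```sh
#!/bin/sh
# Hypothetical helper: read "zpool status -x" output on stdin.
# Succeeds only when ZFS reports that every pool is healthy.
pools_healthy() {
    [ "$(cat)" = "all pools are healthy" ]
}
```

Used as zpool status -x | pools_healthy || echo "raid failure", the exit status can drive a mail subject or a periodic(8) return code. A graid setup would need its own check against the graid status output.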

For the HP, you need to take the advice from ondra_knezour above and configure smartd to use the "-d cciss..." option. If you look at "man smartd.conf", it is very obvious how to translate that into a device configuration line; just look for the use of the "-d" option in there.
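
As a sketch of that translation, the helper below prints one smartd.conf line per physical drive behind the controller. The function name cciss_smartd_lines is hypothetical. The device /dev/ciss0 and the "-d cciss,N" form come from commands earlier in this thread, while the drive numbering starting at 0 and the "-m root" recipient are assumptions (one of the outputs above used cciss,5 and cciss,6, so check which numbers your controller uses).

```sh
#!/bin/sh
# Hypothetical generator for smartd.conf lines behind an HP Smart Array controller.
# $1 = controller device, $2 = number of physical drives, $3 = mail recipient.
cciss_smartd_lines() {
    ctrl=$1
    count=$2
    mailto=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # -d cciss,N selects physical drive N; -a enables the default checks;
        # -m sets the address that smartd mails warnings to.
        echo "$ctrl -d cciss,$i -a -m $mailto"
        i=$((i + 1))
    done
}
```

Running cciss_smartd_lines /dev/ciss0 2 root prints two lines that could be added to smartd.conf. To check that mail delivery works, the smartd.conf man page describes a "-M test" directive that sends a test message when smartd starts.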
 