Solved Random "slow" disk I/O on 10.1-RELEASE

I'm new to FreeBSD. Please pardon me if this is frequently asked or a known issue. I tried hard searching this forum and the web but was not able to find any clues.

I installed FreeBSD on an old Zotac AMD E-350 with a new WD 1TB 2.5" HDD. The OS was installed with standard UFS partitions for / and /home.

The problem I have is that disk I/O is "randomly" / "intermittently" slow. Viewing or saving a random text file with cat or vi may take about 1-2s. It is not consistently reproducible. I could be saving something in rc.conf with no issues, but vi /etc/fstab may take 2s to show up. This also happens with files in the /home partition.

I confirmed that this isn't a disk or hardware issue because I've tried the following to troubleshoot:

1. Re-installed with different partition sizes for / and /home
2. Used a single / partition for the entire disk
3. Enabled "noatime" on every partition
4. Installed Ubuntu on the hard disk and didn't reproduce the issue
5. Bonnie test doesn't show slowness
6. Rsync write speed into the hard disk is about 310Mbps

In short, the hard disk on FreeBSD performs normally for large sequential reads and writes. But random sysadmin work like reading or writing small files will intermittently pause for 1-2s. It is as if the disk intermittently has very slow seek times.

There are no HDD sleep tools installed. It happens randomly when I'm messing around and navigating around the OS. Can you provide clues as to where else I can look to find out why FreeBSD is behaving this way?
 
Some useful commands for troubleshooting performance issues:
http://www.brendangregg.com/USEmethod/use-freebsd.html

I didn't see gstat in there, but kicking that off in the background and watching it as you go about your business is another one. Lastly, consider installing sysutils/smartmontools and checking the drive with smartctl -a /dev/ada0, where ada0 is your disk. Remember that the hard drive may have been kicked or tossed around during shipping, and just because it's brand new doesn't prevent it from having issues.
 
Thanks for the pointers. The gstat(8) tool logged the following when vi .cshrc in my home directory took 2s to appear. It confirms some kind of read slowness:
Code:
dT: 1.022s  w: 1.000s
L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      1      1      4   2321      0      0    0.0  227.1| ada0
    0      0      0      0    0.0      0      0    0.0    0.0| ada0p1
    0      0      0      0    0.0      0      0    0.0    0.0| ada0p2
    0      1      1      4   2321      0      0    0.0  227.1| ada0p3
    0      0      0      0    0.0      0      0    0.0    0.0| ada0p4
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/3c2862b7-b455-11e4-ac4d-00012e3a49bb
Seems like the drive is taking a long time to read occasionally. This is what I also captured with iostat -xz:
Code:
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       1.0   0.0     3.9     0.0    0 385.3  38

device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       1.0   0.0     3.8     0.0    0 2340.4 224
Are the numbers above "normal"?
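
For catching these stalls while working, here is a rough sketch that filters iostat -x output down to the samples where svc_t goes over a limit. The function name slow_io_filter and the 500 ms default are my own picks, not anything standard. It finds the svc_t column from the header line, so it assumes the header has a column with that name, as in the output above.

Code:
```sh
#!/bin/sh
# Hypothetical helper: read `iostat -x` output on stdin and print only the
# device lines whose svc_t (average service time in ms) exceeds a limit.
# The svc_t column is located by name from the header line.
slow_io_filter() {
    awk -v limit="${1:-500}" '
        $1 == "device" {
            # Header line: remember which column holds svc_t.
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "svc_t") col = i
            next
        }
        # Data line: print it if the service time is above the limit.
        col && NF >= col && $col + 0 > limit + 0 { print }
    '
}
```
Something like iostat -xz -w 1 ada0 | slow_io_filter 500 should then show only the slow samples, such as the 2340.4 ms one above.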

I'm also pretty sure that the HDD is fine because I ran Ubuntu for a few days without similar intermittent slowness. Short and extended SMART tests passed:
Code:
# smartctl -a /dev/ada0
smartctl 6.3 2014-07-26 r3976 [FreeBSD 10.1-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Blue Mobile
Device Model:     WDC WD10JPVX-22JC3T0
Serial Number:    WD-WX11AC43V9K5
LU WWN Device Id: 5 0014ee 6054bdf8d
Firmware Version: 01.01A01
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Feb 15 01:39:48 2015 SGT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (19200) seconds.
Offline data collection
capabilities:             (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     ( 214) minutes.
Conveyance self-test routine
recommended polling time:     (   5) minutes.
SCT capabilities:           (0x7035)    SCT Status supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   183   182   021    Pre-fail  Always       -       1841
  4 Start_Stop_Count        0x0032   096   096   000    Old_age   Always       -       4178
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       135
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       19
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   198   198   000    Old_age   Always       -       6449
194 Temperature_Celsius     0x0022   102   089   000    Old_age   Always       -       45
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        23         -
# 2  Short offline       Completed without error       00%        18         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
  If Selective self-test is pending on power-up, resume after 0 minute delay.
 
Code:
Model Family:     Western Digital Blue Mobile
...
  4 Start_Stop_Count        0x0032   096   096   000    Old_age   Always       -       4178
193 Load_Cycle_Count        0x0032   198   198   000    Old_age   Always       -       6449
There's your problem. Your drive is continually spinning up and down due to WD's misguided attempts at energy conservation. Download the WDIDLE3 utility zipfile from here and extract it onto bootable DOS media (floppy, USB stick, CD-ROM). Boot that media and issue the command wdidle3 /D to disable the 8-second spindown timer. If WDIDLE3 doesn't work for you, see this WD forum topic.

Similar utilities are available under Linux, but you want to make sure you get one that can make the change permanent and not just until power is cycled.
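
To put the counters quoted above in perspective, they can be turned into a rate. A minimal sketch, assuming the attribute table has the layout shown earlier in this thread (attribute name in the second column, plain integer raw value in the last column). load_cycles_per_hour is just a name made up for illustration.

Code:
```sh
#!/bin/sh
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# number of head load cycles per power-on hour.
# Assumes the raw values are plain integers in the last column.
load_cycles_per_hour() {
    awk '
        $2 == "Power_On_Hours"   { hours  = $NF + 0 }
        $2 == "Load_Cycle_Count" { cycles = $NF + 0 }
        END {
            # Without a usable hour count there is nothing to divide by.
            if (hours <= 0) exit 1
            printf "%.1f\n", cycles / hours
        }
    '
}
```
Usage would be smartctl -A /dev/ada0 | load_cycles_per_hour. With the values posted above (6449 load cycles in 135 power-on hours) that works out to roughly 48 load cycles per hour.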
 
Thanks for the tip. WDIDLE3 didn't solve the issue but it pointed me in the right direction. APM on the hard disk was the culprit and turning it off resolved the problem:

Code:
# smartctl -g apm /dev/ada0
smartctl 6.3 2014-07-26 r3976 [FreeBSD 10.1-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

APM level is:     96 (intermediate level with standby)

# smartctl -s apm,off /dev/ada0
smartctl 6.3 2014-07-26 r3976 [FreeBSD 10.1-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
APM disabled

APM can be turned off permanently with smartd:

Code:
# cat /usr/local/etc/smartd.conf
# Turn off APM
/dev/ada0 -e apm,off

# grep smartd /etc/rc.conf
smartd_enable="YES"
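
To check that the setting actually sticks (for example after a reboot), the level can be pulled out of the smartctl -g apm output. A small sketch: apm_level is a hypothetical helper name, and it relies only on the "APM level is:" line shown above, treating any other output as APM not being active.

Code:
```sh
#!/bin/sh
# Hypothetical helper: read `smartctl -g apm` output on stdin.
# Prints the numeric APM level and returns 0 if an "APM level is:" line
# is present; prints nothing and returns non-zero otherwise.
apm_level() {
    awk '
        /^APM level is:/ { print $4; found = 1 }
        END { exit !found }
    '
}
```
For example, smartctl -g apm /dev/ada0 | apm_level || echo "APM not active" would print 96 for the drive state shown before the change.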
 
To configure smartctl to rerun the command on resume after a suspend, add the following line to /etc/rc.resume:
Code:
/usr/local/sbin/smartctl -s apm,off /dev/ada0
and make /etc/rc.resume executable:
Code:
# chmod +x /etc/rc.resume
 