UFS: Extremely low write speeds on an SSD (UFS with soft updates, no journaling)

Hi,

I recently installed FreeBSD 11.2 on a PC that was transported with a Transcend SSD. The disk uses UFS with soft updates enabled (without journaling). I see very low write speeds on it, and the PC becomes unresponsive whenever a disk-I/O-intensive task is running. The disk was also occasionally not detected during boot, causing the BIOS to show a "No bootable media" message. I removed it and reseated the cables, and now it seems to be detected. I am posting from that PC.

Following is the output while I was copying a 3.9 GB GNU/Linux image to this disk from a USB drive:

Bash:
                    /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
     Load Average   ||

          /0%  /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
cpu  user|
     nice|
   system|
interrupt|
     idle|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

          /0%  /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
md0   MB/s
      tps|
ada0  MB/sXXXX
      tps|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
da0   MB/sXXX
      tps|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX127.45


As you can see above, the effective speed is less than 10 MB/s, slower than a 100 Mbit/s connection.

Output from diskinfo:

Bash:
diskinfo -t /dev/ada0
/dev/ada0
    512             # sectorsize
    128035676160    # mediasize in bytes (119G)
    250069680       # mediasize in sectors
    0               # stripesize
    0               # stripeoffset
    248085          # Cylinders according to firmware.
    16              # Heads according to firmware.
    63              # Sectors according to firmware.
    TS128GSSD370    # Disk descr.
    B552621885      # Disk ident.
    Yes             # TRIM/UNMAP support
    0               # Rotation rate in RPM
    Not_Zoned       # Zone Mode

Seek times:
    Full stroke:      250 iter in   0.814388 sec =    3.258 msec
    Half stroke:      250 iter in   0.985269 sec =    3.941 msec
    Quarter stroke:      500 iter in   1.347342 sec =    2.695 msec
    Short forward:      400 iter in   1.169079 sec =    2.923 msec
    Short backward:      400 iter in   0.606855 sec =    1.517 msec
    Seq outer:     2048 iter in   4.678100 sec =    2.284 msec
    Seq inner:     2048 iter in   3.010615 sec =    1.470 msec

Transfer rates:
    outside:       102400 kbytes in   9.028492 sec =    11342 kbytes/sec
    middle:        102400 kbytes in   4.794113 sec =    21360 kbytes/sec
    inside:        102400 kbytes in   8.194049 sec =    12497 kbytes/sec
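
For comparison, here is a minimal sketch (not part of the original post) that converts the diskinfo transfer rates above into MB/s and sets them against the payload ceiling of a 100 Mbit/s link. It assumes that diskinfo's "kbytes" are 1024-byte units and ignores network protocol overhead. The function and constant names are made up for illustration.

```python
# Transfer rates reported by `diskinfo -t` above, in kbytes/sec.
rates_kbytes = {"outside": 11342, "middle": 21360, "inside": 12497}

# Assumption: diskinfo's "kbytes" are 1024-byte units.
BYTES_PER_KBYTE = 1024

# Raw ceiling of a 100 Mbit/s link in MB/s, ignoring protocol overhead.
LINK_MB_PER_SEC = 100_000_000 / 8 / 1_000_000  # 12.5


def to_mb_per_sec(kbytes_per_sec):
    """Convert a kbytes/sec figure to decimal megabytes per second."""
    return kbytes_per_sec * BYTES_PER_KBYTE / 1_000_000


for zone, kb in rates_kbytes.items():
    mb = to_mb_per_sec(kb)
    verdict = "below" if mb < LINK_MB_PER_SEC else "above"
    print(f"{zone:8s} {mb:6.2f} MB/s ({verdict} a 100 Mbit/s link)")
```

Under these assumptions, only the "outside" figure falls below the 12.5 MB/s link ceiling, though all three are far lower than a SATA SSD would normally deliver.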


I verified that AHCI is enabled and the driver is loaded, and I noticed some timeouts in the output:

Code:
# dmesg | grep -i ahci

ahci0: <Intel Panther Point AHCI SATA controller> port 0xf070-0xf077,0xf060-0xf063,0xf050-0xf057,0xf040-0xf043,0xf020-0xf03f mem 0xf7216000-0xf72167ff irq 19 at device 31.2 on pci0
ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahciem0: <AHCI enclosure management bridge> on ahci0
ses0 at ahciem0 bus 0 scbus1 target 0 lun 0
ses0: <AHCI SGPIO Enclosure 1.00 0001> SEMB S-E-S 2.00 device
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ahcich0: Timeout on slot 31 port 0
ahcich0: is 00000000 cs 80000000 ss 00000000 rs 80000000 tfd c0 serr 00000000 cmd 0000df17
(ada0:ahcich0:0:0:0): DSM TRIM. ACB: 06 01 00 00 00 40 00 00 00 00 01 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
ahcich0: Timeout on slot 22 port 0
ahcich0: is 00000000 cs 00000000 ss effbffff rs effbffff tfd 40 serr 00000000 cmd 0000db17
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 a8 94 90 40 00 00 00 01 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
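
As a rough illustration (not from the original post), the sketch below tallies which ATA commands timed out in the dmesg excerpt above. It assumes the CAM line format shown there, where the line naming the command precedes the "CAM status" line. The regular expression and function name are hypothetical helpers.

```python
import re
from collections import Counter

# Excerpt of the dmesg lines shown above.
DMESG = """\
ahcich0: Timeout on slot 31 port 0
(ada0:ahcich0:0:0:0): DSM TRIM. ACB: 06 01 00 00 00 40 00 00 00 00 01 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
ahcich0: Timeout on slot 22 port 0
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 a8 94 90 40 00 00 00 01 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
"""

# A line of the form "(dev:chan:...): COMMAND. ACB: ..." names the ATA command.
COMMAND_RE = re.compile(r"^\((?P<dev>\w+):[^)]*\): (?P<cmd>.+?)\. ACB:")


def timed_out_commands(text):
    """Count commands that are followed by a 'Command timeout' status line."""
    counts = Counter()
    pending = None
    for line in text.splitlines():
        match = COMMAND_RE.match(line)
        if match:
            pending = match.group("cmd")
        elif "CAM status: Command timeout" in line and pending:
            counts[pending] += 1
            pending = None
    return counts


for command, count in timed_out_commands(DMESG).items():
    print(f"{command}: {count} timeout(s)")
```

Both commands in this excerpt (TRIM and an NCQ write) are on the write path; neither is a read.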
 
Have you tried this SSD in another computer, or a different SSD in the same PC? That would determine whether the SSD or the motherboard is itself faulty before digging deeper. Also, try changing the SATA cable.
 
How old is your SSD? In my experience SSDs tend to become slow after about 2 years of use; it happened to me recently too. I am currently using a Western Digital and it's working fine. I would certainly be happy if you find a solution to your problem here.
 
aht0, unfortunately I can't access my other PC. This PC was working fine and used to run 10.x before, though it was using UEFI at that time; I have now switched to legacy BIOS on the same machine.

Boone, this SSD is around 3 years old, but it was rarely in use (maybe once or twice a year), so I had it transported to where I currently live.

Just to ensure that the disk alignment is proper:

Code:
# gpart show
=>       34  250069613  ada0  GPT  (119G)
         34          6        - free -  (3.0K)
         40       1024     1  freebsd-boot  (512K)
       1064  245366784     2  freebsd-ufs  (117G)
  245367848    4701798     3  freebsd-swap  (2.2G)
  250069646          1        - free -  (512B)
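
A minimal sketch (not from the original post) of how the start sectors in the gpart output above can be checked for alignment. It assumes the 512-byte logical sector size reported by diskinfo. The 4 KiB and 1 MiB boundaries are chosen here only because they are common alignment targets, not because the thread specifies them.

```python
SECTOR_BYTES = 512  # logical sector size reported by diskinfo

# (index, start sector, type) taken from the gpart output above.
partitions = [
    (1, 40, "freebsd-boot"),
    (2, 1064, "freebsd-ufs"),
    (3, 245367848, "freebsd-swap"),
]


def is_aligned(start_sector, boundary_bytes, sector_bytes=SECTOR_BYTES):
    """Return True if the partition's byte offset is a multiple of the boundary."""
    return (start_sector * sector_bytes) % boundary_bytes == 0


for index, start, ptype in partitions:
    on_4k = is_aligned(start, 4096)
    on_1m = is_aligned(start, 1024 * 1024)
    print(f"p{index} {ptype:13s} start={start:<10d} 4KiB={on_4k} 1MiB={on_1m}")
```

By this check all three partitions start on a 4 KiB boundary, but none starts on a 1 MiB boundary.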

lebarondemerde, thanks, I did not know about these. I ran the command suggested in the post-install message. The disk seems to be healthy.

Code:
% doas smartctl -a /dev/ada0

smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-RELEASE-p1 amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SiliconMotion based SSDs
Device Model:     TS128GSSD370
Serial Number:    B552621885
Firmware Version: 20140516
User Capacity:    128,035,676,160 bytes [128 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Aug 15 23:21:14 2018 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x71) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0002)    Does not save SMART data before
                    entering power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (  10) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0000   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0000   100   100   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0000   100   100   000    Old_age   Offline      -       59
 12 Power_Cycle_Count       0x0000   100   100   000    Old_age   Offline      -       244
192 Power-Off_Retract_Count 0x0000   100   100   000    Old_age   Offline      -       11
194 Temperature_Celsius     0x0000   100   100   000    Old_age   Offline      -       38
195 Hardware_ECC_Recovered  0x0000   100   100   000    Old_age   Offline      -       24563
196 Reallocated_Event_Count 0x0000   100   100   016    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0000   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0000   100   100   050    Old_age   Offline      -       0
160 Uncorrectable_Error_Cnt 0x0000   100   100   000    Old_age   Offline      -       0
161 Valid_Spare_Block_Cnt   0x0000   100   100   000    Old_age   Offline      -       44
163 Initial_Bad_Block_Count 0x0000   100   100   000    Old_age   Offline      -       26
164 Total_Erase_Count       0x0000   100   100   000    Old_age   Offline      -       6795
165 Max_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       28
166 Min_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       0
167 Average_Erase_Count     0x0000   100   100   000    Old_age   Offline      -       6
168 Max_Erase_Count_of_Spec 0x0000   100   100   000    Old_age   Offline      -       3000
169 Remaining_Lifetime_Perc 0x0000   100   100   000    Old_age   Offline      -       100
232 Available_Reservd_Space 0x0000   100   100   000    Old_age   Offline      -       100
177 Wear_Leveling_Count     0x0000   100   100   050    Old_age   Offline      -       0
181 Program_Fail_Cnt_Total  0x0000   100   100   000    Old_age   Offline      -       0
182 Erase_Fail_Count_Total  0x0000   100   100   000    Old_age   Offline      -       0
241 Host_Writes_32MiB       0x0000   100   100   000    Old_age   Offline      -       11681
242 Host_Reads_32MiB        0x0000   100   100   000    Old_age   Offline      -       5584

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
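
To put the raw SMART values above in perspective, here is a small sketch (not from the original post). It assumes that the `Host_Writes_32MiB` and `Host_Reads_32MiB` raw values count 32 MiB units, as the attribute names suggest, and that `Max_Erase_Count_of_Spec` is the rated number of erase cycles per block. Both interpretations are assumptions.

```python
# Raw values copied from the smartctl output above.
smart_raw = {
    "Host_Writes_32MiB": 11681,
    "Host_Reads_32MiB": 5584,
    "Average_Erase_Count": 6,
    "Max_Erase_Count_of_Spec": 3000,
}


def units_32mib_to_gib(units):
    """Convert a count of 32 MiB units to GiB (assumed meaning of the raw value)."""
    return units * 32 / 1024


def wear_fraction(average_erase, rated_erase):
    """Fraction of the assumed rated erase cycles already used."""
    return average_erase / rated_erase


written = units_32mib_to_gib(smart_raw["Host_Writes_32MiB"])
read = units_32mib_to_gib(smart_raw["Host_Reads_32MiB"])
wear = wear_fraction(
    smart_raw["Average_Erase_Count"], smart_raw["Max_Erase_Count_of_Spec"]
)

print(f"host writes: {written:.1f} GiB")
print(f"host reads:  {read:.1f} GiB")
print(f"wear used:   {wear:.1%}")
```

Read this way, the drive has seen only a few hundred GiB of writes and a tiny fraction of its rated erase cycles, which fits the `Remaining_Lifetime_Perc` value of 100.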
 
OK, I will try it tomorrow or at the weekend (once I get some time :)); it is pretty late at night here.
 
The power_cycle_count seems to be quite high compared to the power_on_hours. I mean I have an SSD with 41000+ power_on_hours that only has a power_cycle_count of 39.
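
The comparison can be made concrete with a quick calculation (a sketch, not from the original post), using the Power_On_Hours and Power_Cycle_Count values from the smartctl output and the figures quoted in this reply. The value 41000 stands in for "41000+", so it is a lower bound.

```python
def hours_per_cycle(power_on_hours, power_cycles):
    """Average powered-on time per power cycle, in hours."""
    return power_on_hours / power_cycles


# Values from the smartctl output of the Transcend SSD.
transcend = hours_per_cycle(59, 244)

# Values quoted for the comparison SSD ("41000+" hours, 39 cycles).
comparison = hours_per_cycle(41000, 39)

print(f"Transcend SSD:  {transcend * 60:.1f} minutes per power cycle")
print(f"Comparison SSD: {comparison:.0f} hours per power cycle")
```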
 
@ValdiBG, I tried swapping the cable for another unused one, but got the same results, with an increase of about 2 Mbps. The other thing I could do is borrow cables, but that is not possible at the moment.

SirDice, I actually read the stats now that you pointed them out; I could not read them properly yesterday because I was drowsy.
Well, this is a desktop that gets switched off when not in use, and we have regular power cuts here, so yes, this is expected. It had a small backup UPS (giving 10 minutes of backup) at my permanent residence.
Another reason is that I dabbled a bit with a TP-Link wireless card on 10.x-RELEASE (to get it working), which caused kernel panics every now and then. Most recently, installing emulators/virtualbox-ose and x11/nvidia-driver from pkg, which were compiled on 11.1-RELEASE, caused more panics and reboots. :)


Also, an update: the timeouts were seen mostly while writes were happening; when I left the machine idle I did not see them. Reads are still excellent: I copied the same (3.9 GB) file back to a USB 3.0 drive without trouble.
The SSD on MBR (or the BIOS with the legacy-only option) might also have some quirk that causes it to go undetected sometimes. I strongly suspect I need to reformat and force 4K sectors instead of 512-byte ones first, then try UEFI with GPT-EFI on the SSD instead of the legacy-only setting in the BIOS.
I am taking this direction because wblock@ made a similar observation in a post, stating that writes take a hit with unaligned 512-byte sectors.
 
OK, I spoke with a colleague who is ready to do this on his Windows laptop with a USB adapter.

I ran the long SMART test today and it reported no errors (ping lebarondemerde).

Code:
 % doas smartctl -a /dev/ada0
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-RELEASE-p1 amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SiliconMotion based SSDs
Device Model:     TS128GSSD370
Serial Number:    B552621885
Firmware Version: 20140516
User Capacity:    128,035,676,160 bytes [128 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Aug 17 07:43:33 2018 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x71) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0002)    Does not save SMART data before
                    entering power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (  10) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0000   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0000   100   100   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0000   100   100   000    Old_age   Offline      -       60
 12 Power_Cycle_Count       0x0000   100   100   000    Old_age   Offline      -       245
192 Power-Off_Retract_Count 0x0000   100   100   000    Old_age   Offline      -       11
194 Temperature_Celsius     0x0000   100   100   000    Old_age   Offline      -       35
195 Hardware_ECC_Recovered  0x0000   100   100   000    Old_age   Offline      -       24563
196 Reallocated_Event_Count 0x0000   100   100   016    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0000   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0000   100   100   050    Old_age   Offline      -       0
160 Uncorrectable_Error_Cnt 0x0000   100   100   000    Old_age   Offline      -       0
161 Valid_Spare_Block_Cnt   0x0000   100   100   000    Old_age   Offline      -       44
163 Initial_Bad_Block_Count 0x0000   100   100   000    Old_age   Offline      -       26
164 Total_Erase_Count       0x0000   100   100   000    Old_age   Offline      -       6795
165 Max_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       28
166 Min_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       0
167 Average_Erase_Count     0x0000   100   100   000    Old_age   Offline      -       6
168 Max_Erase_Count_of_Spec 0x0000   100   100   000    Old_age   Offline      -       3000
169 Remaining_Lifetime_Perc 0x0000   100   100   000    Old_age   Offline      -       100
232 Available_Reservd_Space 0x0000   100   100   000    Old_age   Offline      -       100
177 Wear_Leveling_Count     0x0000   100   100   050    Old_age   Offline      -       0
181 Program_Fail_Cnt_Total  0x0000   100   100   000    Old_age   Offline      -       0
182 Erase_Fail_Count_Total  0x0000   100   100   000    Old_age   Offline      -       0
241 Host_Writes_32MiB       0x0000   100   100   000    Old_age   Offline      -       11710
242 Host_Reads_32MiB        0x0000   100   100   000    Old_age   Offline      -       5628

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        60         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
OK, it looks like I can conclude this. It took some time because it involved some logistical issues.

I tried borrowed SATA cables (which looked new or unused) and different ports on the motherboard. Finally we connected a USB 2.0 adapter and ran the diagnostic software. The result in every case was the same: low speeds.

By the way, I swapped the SSD into a laptop and it seemed to work well, which confused me further. It seemed fast enough there: the seek times were around 0.0xx msec and the rates around 4xx MB/sec when I ran diskinfo -t /dev/ada0.
That made me think that maybe the cables had gone bad, or that the SATA III port on the motherboard was faulty, etc.

But on other hardware it is back to low speeds.
If I get any other lead in the future I will update this thread. I might try again with the laptop to see whether I made a mistake reading the values off the screen.

Thanks everyone who helped me with this!
 
OK, I cannot post from the laptop, which is running a FreeBSD 11.2 live USB, but the SSD readings there are indeed correct; the seek times are in the 0.0xx ms range.

Transfer rates:
outside: 102400 kbytes in 0.213434 sec = 479774 kbytes/sec
middle: 102400 kbytes in 0.210541 sec = 486366 kbytes/sec
inside: 102400 kbytes in 0.211868 sec = 483320 kbytes/sec
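
To quantify the gap between the two machines, the sketch below (not from the original post) parses the "Transfer rates" lines of both diskinfo runs and prints the laptop-to-desktop ratio per zone. The regular expression and function name are hypothetical and assume the line format shown in this thread.

```python
import re

# "Transfer rates" lines from the desktop run earlier in the thread.
DESKTOP = """\
    outside:       102400 kbytes in   9.028492 sec =    11342 kbytes/sec
    middle:        102400 kbytes in   4.794113 sec =    21360 kbytes/sec
    inside:        102400 kbytes in   8.194049 sec =    12497 kbytes/sec
"""

# The same lines from the laptop run above.
LAPTOP = """\
outside: 102400 kbytes in 0.213434 sec = 479774 kbytes/sec
middle: 102400 kbytes in 0.210541 sec = 486366 kbytes/sec
inside: 102400 kbytes in 0.211868 sec = 483320 kbytes/sec
"""

# Captures the zone name and the final kbytes/sec figure.
RATE_RE = re.compile(
    r"^\s*(\w+):\s+\d+ kbytes in\s+[\d.]+ sec =\s+(\d+) kbytes/sec"
)


def parse_rates(text):
    """Return {zone: kbytes_per_sec} for each transfer-rate line found."""
    rates = {}
    for line in text.splitlines():
        match = RATE_RE.match(line)
        if match:
            rates[match.group(1)] = int(match.group(2))
    return rates


desktop = parse_rates(DESKTOP)
laptop = parse_rates(LAPTOP)
for zone in desktop:
    ratio = laptop[zone] / desktop[zone]
    print(f"{zone:8s} laptop is {ratio:.1f}x faster than desktop")
```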

Boone, the SSD seems fine in a Lenovo ThinkPad, but that makes it even more confusing to me now. :(

Just to see how the desktop behaves, I moved the magnetic disk that I had removed from the laptop into the desktop where the SSD showed low performance, and it shows proper transfer rates of ~1xx MB/s.

Could it be that diskinfo shows incorrect output sometimes?
 