Disk errors but system functions OK

Can anyone explain what these errors mean?

Code:
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 4f dc 74 40 03 00 00 01 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada0:ahcich0:0:0:0): RES: 41 40 cb dc 74 40 03 00 00 00 01
(ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 40 0f aa 6c 40 09 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada0:ahcich0:0:0:0): RES: 41 40 44 aa 6c 40 09 00 00 40 00
(ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain
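The ACB and RES hex dumps can be decoded by hand. The sketch below assumes the byte order FreeBSD's CAM layer uses when it prints them. For ACB that is command, features, LBA low/mid/high, device, LBA low/mid/high (exp), features (exp), count, count (exp). For RES it is status, error, the same six LBA bytes with device, then count and count (exp). The function names are illustrative, not part of any FreeBSD tool.

```python
def lba48(low, mid, high, low_exp, mid_exp, high_exp):
    """Assemble a 48-bit LBA from the six LBA register bytes."""
    return ((high_exp << 40) | (mid_exp << 32) | (low_exp << 24)
            | (high << 16) | (mid << 8) | low)


def parse_bytes(hexdump, expected):
    """Turn a string like '60 00 4f ...' into a list of ints."""
    values = [int(tok, 16) for tok in hexdump.split()]
    if len(values) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(values)}")
    return values


def decode_acb(hexdump):
    """Decode a 12-byte ACB dump (assumed CAM field order)."""
    (cmd, feat, lo, mid, hi, _dev, lo_x, mid_x, hi_x,
     feat_x, _count, _count_x) = parse_bytes(hexdump, 12)
    # For READ FPDMA QUEUED (0x60) the sector count is carried in the
    # FEATURES registers; the COUNT register carries the NCQ tag instead.
    sectors = (feat_x << 8) | feat
    return {
        "command": cmd,
        "lba": lba48(lo, mid, hi, lo_x, mid_x, hi_x),
        "sectors": sectors or 65536,  # a count of 0 means 65536 in 48-bit commands
    }


def decode_res(hexdump):
    """Decode an 11-byte RES dump (assumed CAM field order)."""
    (status, error, lo, mid, hi, _dev, lo_x, mid_x, hi_x,
     _count, _count_x) = parse_bytes(hexdump, 11)
    return {
        "drdy": bool(status & 0x40),  # status bit 6: device ready
        "err": bool(status & 0x01),   # status bit 0: see error register
        "unc": bool(error & 0x40),    # error bit 6: uncorrectable data error
        "lba": lba48(lo, mid, hi, lo_x, mid_x, hi_x),
    }


# The two failures from the log above.
errors = [
    ("60 00 4f dc 74 40 03 00 00 01 00 00", "41 40 cb dc 74 40 03 00 00 00 01"),
    ("60 40 0f aa 6c 40 09 00 00 00 00 00", "41 40 44 aa 6c 40 09 00 00 40 00"),
]
for acb_dump, res_dump in errors:
    acb, res = decode_acb(acb_dump), decode_res(res_dump)
    print(f"cmd 0x{acb['command']:02x}: {acb['sectors']} sectors from LBA "
          f"{acb['lba']}; UNC={res['unc']} reported at LBA {res['lba']}")
# cmd 0x60: 256 sectors from LBA 57990223; UNC=True reported at LBA 57990347
# cmd 0x60: 64 sectors from LBA 158116367; UNC=True reported at LBA 158116420
```

Under that assumption, each RES LBA falls inside the range its read requested (124 and 53 sectors in, respectively), and UNC is the drive itself reporting that it could not read the data at that address.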

I can log in to the system remotely and don't experience any problems. The disk is a 150GB Intel SSD.
Here is part of the output from smartctl:


Code:
=== START OF INFORMATION SECTION ===
Model Family:     Intel X18-M/X25-M G1 SSDs
Device Model:     INTEL SSDSA2MH160G1GN
Serial Number:    CVEM943500RE160PGN
LU WWN Device Id: 5 001517 9590d0654
Firmware Version: 045C8820
User Capacity:    160,041,885,696 bytes [160 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Unavailable
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 T13/1532D revision 1
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Sat Mar 25 10:26:03 2023 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
...

Code:
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
...
Code:
SMART Error Log Version: 1
Warning: ATA error count 11 inconsistent with error log pointer 5
...
Code:
Error 11 occurred at disk power-on lifetime: 52785 hours (2199 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.
Similar msgs occur showing Error 7,8,9,10.

Any enlightenment would be appreciated.
 
Assuming it's the same problem I had, try changing the SATA cable or using a different SATA port on the motherboard.
To reproduce the error and see whether anything changes, run make installworld a few times. For me it usually crashed on the first or second try.
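Another way to put read load on the whole disk is a plain sequential read of the raw device. This is a hypothetical sketch: `scan`, the 1 MiB chunk size and the error cap are illustrative choices, not something from this thread. It only reads and never writes. It records each chunk that fails with an I/O error as a range of 512-byte LBAs so the result can be compared with the LBAs in the kernel messages. Reading /dev/ada0 directly normally requires root.

```python
import os

SECTOR = 512              # logical sector size reported for this drive
CHUNK = 1024 * 1024       # 1 MiB per read; a multiple of SECTOR


def scan(path, chunk=CHUNK, max_errors=100):
    """Read `path` sequentially; return (first_lba, last_lba) for each failed chunk.

    Scanning stops at end of data or after `max_errors` failed chunks.
    """
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while len(bad) < max_errors:
            try:
                data = os.pread(fd, chunk, offset)
            except OSError:
                # Typically EIO once the kernel has given up retrying.
                bad.append((offset // SECTOR, (offset + chunk) // SECTOR - 1))
                offset += chunk
                continue
            if not data:
                break  # end of device or file
            offset += len(data)
    finally:
        os.close(fd)
    return bad

# Example (as root): print(scan("/dev/ada0"))
```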
 
You don't show the POH numbers.
But with this drive being so old I bet she is nearing death.
Intel X18-M/X25-M G1 SSDs: 15 years is a respectable life.

Similar msgs occur showing Error 7,8,9,10.
Look for the POH or 'Power On Hours' for the drive in smartctl.
It passed the SMART self-assessment, but if the mileage is high I would be worried.

Make backups and run it into the ground.
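To pull Power_On_Hours and the other counters out of `smartctl -A /dev/ada0` output without eyeballing the table, a small parser is enough. This sketch assumes the ten-column layout shown later in this thread (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) and keeps only RAW_VALUEs that are plain integers. The function name is illustrative.

```python
def smart_raw_values(text):
    """Map attribute name -> RAW_VALUE (int) from `smartctl -A` table rows.

    Rows whose name repeats (e.g. Intel_Internal) keep the last value seen.
    """
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            if raw.isdigit():
                values[fields[1]] = int(raw)
    return values


sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0002   100   100   000    Old_age   Always       -       52794
 12 Power_Cycle_Count       0x0002   100   100   000    Old_age   Always       -       346
"""
raw = smart_raw_values(sample)
print(raw["Power_On_Hours"], raw["Power_Cycle_Count"])  # 52794 346
```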
 
Could you provide some context so these problems can be diagnosed?

Did you recently move this drive into a new box and the errors started showing, or have you been running FreeBSD on this hardware for a while and the errors are only now starting? What is the context?

Cables can definitely cause those types of errors, but so can a disk controller. Context would help here.
 
Phishfry, I much appreciate the insight. Here is some further info:

Code:
SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0000   100   000   000    Old_age   Offline      -       0
  4 Start_Stop_Count        0x0000   100   000   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0002   001   001   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0002   100   100   000    Old_age   Always       -       52794
 12 Power_Cycle_Count       0x0002   100   100   000    Old_age   Always       -       346
192 Unsafe_Shutdown_Count   0x0002   100   100   000    Old_age   Always       -       147
232 Available_Reservd_Space 0x0003   099   099   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0002   041   041   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0000   199   199   000    Old_age   Offline      -       270947
226 Intel_Internal          0x0002   255   000   000    Old_age   Always       -       0
227 Intel_Internal          0x0002   000   000   000    Old_age   Always       -       0
228 Intel_Internal          0x0002   000   000   000    Old_age   Always       -       0

SMART Error Log Version: 1
Warning: ATA error count 11 inconsistent with error log pointer 5

I have just moved this drive from a ThinkPad X61 to a ThinkCentre M92. I rarely need to access the system locally. After rebooting just now, dmesg shows the following messages:

Code:
Root mount waiting for: usbus1 usbus2 CAM
uhub3: 6 ports with 6 removable, self powered
uhub4: 8 ports with 8 removable, self powered
Root mount waiting for: usbus2 CAM
ugen2.3: <vendor 0x0489 product 0xe04e> at usbus2
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
ses0 at ahciem0 bus 0 scbus1 target 0 lun 0
ses0: <AHCI SGPIO Enclosure 2.00 0001> SEMB S-E-S 2.00 device
ses0: SEMB SES Device
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <INTEL SSDSA2MH160G1GN 045C8820> ATA-7 SATA 2.x device
ada0: Serial Number CVEM943500RE160PGN
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 152627MB (312581808 512 byte sectors)
ada0: quirks=0x1<4K>
ses0: ada0 in 'Slot 00', SATA Slot: scbus0 target 0
mountroot: waiting for device /dev/ufsid/576d01474908d044...
random: unblocking device.
 
I rarely need to access the system locally.
OK, this makes me wonder: is the local console full of these messages, or are they sporadic?

I have seen these errors flood the console so badly that you can't get anything done.
Over SSH you do not see them, because they are printed on the console.
Is that what you are seeing?

The disk doesn't look bad for its age:
roughly 6 years of continuous service with 346 power cycles.
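The "roughly 6 years" figure follows directly from the Power_On_Hours raw value. A quick check, assuming 365.25-day years, with the average run length per power cycle added for comparison:

```python
HOURS_PER_YEAR = 24 * 365.25  # average year length, including leap days

power_on_hours = 52794   # RAW_VALUE of attribute 9 Power_On_Hours
power_cycles = 346       # RAW_VALUE of attribute 12 Power_Cycle_Count

years = power_on_hours / HOURS_PER_YEAR
hours_per_cycle = power_on_hours / power_cycles

print(f"{years:.1f} years powered on")                      # 6.0 years powered on
print(f"{hours_per_cycle:.0f} hours per power cycle (avg)")  # 153 hours per power cycle (avg)
```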

Does the drive behave correctly in other computers? Maybe it is just a hardware incompatibility here.
 
It would really behoove you to check whether TRIM is available and enabled. If it is not, enable it.
tunefs -p /dev/ada0p2 (or whatever your root partition is)

tunefs -t enable /dev/ada0p2
 
The disk is about 6 years old by power-on hours, but total write traffic is only about 8.3 TiB (270947 × 32 MiB), which is only about 57 full overwrites of the drive's 160 GB capacity. The drive is probably fine. The errors are likely just communication errors, so check all the cables to the drive.
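The write-volume arithmetic, using the Host_Writes_32MiB raw value (attribute 225, which counts in 32 MiB units as its name says) and the User Capacity reported by smartctl:

```python
MIB = 1024 ** 2
TIB = 1024 ** 4

host_writes_units = 270947          # RAW_VALUE of attribute 225 Host_Writes_32MiB
capacity_bytes = 160_041_885_696    # "User Capacity" from smartctl

written = host_writes_units * 32 * MIB
print(f"{written / TIB:.2f} TiB written")                   # 8.27 TiB written
print(f"{written / capacity_bytes:.0f} full-drive writes")  # 57 full-drive writes
```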

And obviously, don't trust that drive for any long-term storage, but it probably makes a very fine boot drive.

The boot SSD in my server is a little bit older (7.4 years), and has been overwritten 743 times. Fortunately, I have a spare complete copy on a slightly younger drive.
 
Most modern mainboards allow the firmware to be updated from within the BIOS/UEFI itself. You just need to have a USB stick (usually FAT32 or exFAT formatted) with the correct files on it.
 