Other ZFS, USB: interface CRC errors

grahamperrin

Daemon

Reaction score: 531
Messages: 1,721

I'll discard this mobile hard disk drive, but for now:
  • are interface CRC errors more likely to occur with USB than with IDE?
I suspect so …

Attached screenshots: 2021-07-24 02.30.png, 2021-07-24 02.31.png

Code:
root@mowa219-gjp4-freebsd-d31121-mobile:~ # uptime ; uname -KUv
 3:32AM  up  1:22, 5 users, load averages: 0.07, 0.12, 0.15
FreeBSD 14.0-CURRENT #0 main-f4e67f18b-dirty: Fri Jul 23 23:23:04 BST 2021     root@mowa219-gjp4-freebsd-d31121-diff92312-mobile:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG  1400026 1400026
root@mowa219-gjp4-freebsd-d31121-mobile:~ # zpool status -v
  pool: d31121
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:17:58 with 0 errors on Fri Jul 23 21:20:49 2021
config:

        NAME        STATE     READ WRITE CKSUM
        d31121      ONLINE       0     0     0
          da0p4     ONLINE       0    35     0

errors: No known data errors
root@mowa219-gjp4-freebsd-d31121-mobile:~ # lsblk
DEVICE         MAJ:MIN SIZE TYPE                              LABEL MOUNT
ada0             0:110 466G GPT                                   - -
  <FREE>         -:-   1.0M -                                     - -
  ada0p1         0:112 466G freebsd-ufs           gpt/FreeBSD%20UFS -
cd0              0:114   0B -                                     - -
da0              0:134 233G GPT                                   - -
  da0p1          0:135 260M efi                   gpt/FreeBSD%20UFS -
  da0p2          0:136 512K freebsd-boot               gpt/gptboot0 -
  <FREE>         -:-   492K -                                     - -
  da0p3          0:137  16G freebsd-swap                  gpt/swap0 SWAP
  da0p4          0:140 217G freebsd-zfs                    gpt/zfs0 <ZFS>
  <FREE>         -:-   164K -                                     - -
root@mowa219-gjp4-freebsd-d31121-mobile:~ # smartctl -a /dev/da0
smartctl 7.2 2020-12-30 r5155 [FreeBSD 14.0-CURRENT amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG HM251JX
Serial Number:    29251B141A0T0Z
LU WWN Device Id: 5 0f0000 01411040a
Firmware Version: 2AF00_01
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS, ATA/ATAPI-7 T13/1532D revision 0
SATA Version is:  SATA 2.5, 1.5 Gb/s
Local Time is:    Sat Jul 24 03:33:04 2021 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 245) Self-test routine in progress...
                                        50% of test remaining.
Total time to complete Offline
data collection:                (  102) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 102) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0007   252   252   025    Pre-fail  Always       -       2750
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       880
  5 Reallocated_Sector_Ct   0x0033   091   091   010    Pre-fail  Always       -       92
  7 Seek_Error_Rate         0x000e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   252   252   000    Old_age   Always       -       8
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       298
191 G-Sense_Error_Rate      0x0032   252   252   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       222
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       46 (Min/Max 13/46)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       2
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       7
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       95
201 Soft_Read_Error_Rate    0x0032   252   252   000    Old_age   Always       -       0
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       13
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       2947

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         6         -
# 2  Extended offline    Aborted by host               90%         1         -

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Self_test_in_progress [50% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@mowa219-gjp4-freebsd-d31121-mobile:~ # grep vdev /var/log/messages
Jul 23 19:33:55 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[2062]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=9150365696 size=61440 error=5
Jul 23 19:33:55 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[2066]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=4396314624 size=1024 error=5
Jul 23 19:33:55 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[2070]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=9150369792 size=1024 error=5
Jul 23 19:33:55 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[2074]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=9150382080 size=4096 error=5
Jul 23 19:33:55 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[2078]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=9150373888 size=4096 error=5
Jul 23 19:33:55 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[2082]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=9150398464 size=28672 error=5
Jul 23 19:33:55 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[2086]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=9150386176 size=12288 error=5
Jul 23 19:33:55 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[2090]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=9150361600 size=512 error=5
Jul 23 19:33:55 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[2094]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=9150365696 size=1024 error=5
Jul 23 19:33:55 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[2098]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=9150377984 size=4096 error=5
Jul 23 19:50:35 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[1426]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=34365239296 size=4096 error=5
Jul 23 20:03:42 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[1408]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=21480337408 size=4096 error=5
Jul 24 01:43:20 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[1315]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=38660968448 size=4096 error=5
Jul 24 01:43:20 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[1319]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=36514074624 size=4096 error=5
Jul 24 01:43:20 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[1323]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=38660931584 size=12288 error=5
Jul 24 01:43:20 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[1327]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=38660976640 size=4096 error=5
Jul 24 01:43:20 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[1331]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=21485654016 size=4096 error=5
Jul 24 01:43:20 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[1335]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=17203744768 size=4096 error=5
Jul 24 01:43:20 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[1339]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=38660939776 size=4096 error=5
Jul 24 01:43:20 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[1343]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=34368606208 size=4096 error=5
Jul 24 01:43:20 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[1347]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=8894406656 size=4096 error=5
Jul 24 01:43:20 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[1351]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=27980627968 size=4096 error=5
Jul 24 01:43:20 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[1355]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=38660931584 size=4096 error=5
Jul 24 01:43:20 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[1359]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=30071943168 size=4096 error=5
Jul 24 01:43:20 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[1363]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=38660956160 size=4096 error=5
Jul 24 01:43:20 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[1367]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=30071918592 size=4096 error=5
Jul 24 01:43:20 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[1371]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=38660935680 size=4096 error=5
Jul 24 02:16:59 mowa219-gjp4-freebsd-d31121-mobile ZFS[1630]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=232605097984 size=8192 error=5
Jul 24 02:16:59 mowa219-gjp4-freebsd-d31121-mobile ZFS[1634]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=232605360128 size=8192 error=5
Jul 24 02:16:59 mowa219-gjp4-freebsd-d31121-mobile ZFS[1638]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=270336 size=8192 error=5
Jul 24 02:16:59 mowa219-gjp4-freebsd-d31121-mobile ZFS[1642]: vdev probe failure, zpool=d31121 path=/dev/da0p4
Jul 24 02:19:09 mowa219-gjp4-freebsd-d31121-mobile ZFS[1674]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=4404297728 size=4096 error=5
Jul 24 02:26:26 mowa219-gjp4-freebsd-d31121-mobile ZFS[1724]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=34395947008 size=8192 error=5
Jul 24 02:26:26 mowa219-gjp4-freebsd-d31121-mobile ZFS[1728]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=34395758592 size=4096 error=5
Jul 24 02:26:26 mowa219-gjp4-freebsd-d31121-mobile ZFS[1732]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=34395947008 size=4096 error=5
Jul 24 02:26:26 mowa219-gjp4-freebsd-d31121-mobile ZFS[1736]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=4404719616 size=4096 error=5
Jul 24 02:26:26 mowa219-gjp4-freebsd-d31121-mobile ZFS[1740]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=34395951104 size=4096 error=5
Jul 24 02:33:00 mowa219-gjp4-freebsd-d31121-mobile ZFS[1800]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=53693538304 size=8192 error=5
Jul 24 02:39:03 mowa219-gjp4-freebsd-d31121-mobile ZFS[1815]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=38664474624 size=512 error=5
Jul 24 02:39:03 mowa219-gjp4-freebsd-d31121-mobile ZFS[1819]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=36516069376 size=4096 error=5
Jul 24 02:39:03 mowa219-gjp4-freebsd-d31121-mobile ZFS[1823]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=34423980032 size=4096 error=5
Jul 24 02:44:00 mowa219-gjp4-freebsd-d31121-mobile ZFS[1840]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=53693669376 size=8192 error=5
Jul 24 02:48:02 mowa219-gjp4-freebsd-d31121-mobile ZFS[1848]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=38664863744 size=512 error=5
Jul 24 02:48:02 mowa219-gjp4-freebsd-d31121-mobile ZFS[1852]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=36516237312 size=4096 error=5
Jul 24 02:48:02 mowa219-gjp4-freebsd-d31121-mobile ZFS[1856]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=34426204160 size=4096 error=5
Jul 24 02:53:59 mowa219-gjp4-freebsd-d31121-mobile ZFS[1864]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=36516352000 size=4096 error=5
Jul 24 02:53:59 mowa219-gjp4-freebsd-d31121-mobile ZFS[1868]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=38665248768 size=512 error=5
Jul 24 02:53:59 mowa219-gjp4-freebsd-d31121-mobile ZFS[1872]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=34428325888 size=4096 error=5
Jul 24 02:57:02 mowa219-gjp4-freebsd-d31121-mobile ZFS[1887]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=38665433088 size=512 error=5
Jul 24 02:57:02 mowa219-gjp4-freebsd-d31121-mobile ZFS[1891]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=36516433920 size=4096 error=5
Jul 24 02:57:02 mowa219-gjp4-freebsd-d31121-mobile ZFS[1895]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=34441207808 size=4096 error=5
Jul 24 03:18:00 mowa219-gjp4-freebsd-d31121-mobile ZFS[2688]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=38667341824 size=512 error=5
Jul 24 03:18:00 mowa219-gjp4-freebsd-d31121-mobile ZFS[2692]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=34479443968 size=4096 error=5
Jul 24 03:18:00 mowa219-gjp4-freebsd-d31121-mobile ZFS[2696]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=36570902528 size=4096 error=5
Jul 24 03:24:00 mowa219-gjp4-freebsd-d31121-mobile ZFS[2984]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=38667726848 size=512 error=5
Jul 24 03:24:00 mowa219-gjp4-freebsd-d31121-mobile ZFS[2988]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=36580663296 size=4096 error=5
Jul 24 03:24:00 mowa219-gjp4-freebsd-d31121-mobile ZFS[2992]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=34482216960 size=4096 error=5
Jul 24 03:27:02 mowa219-gjp4-freebsd-d31121-mobile ZFS[2999]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=36581105664 size=4096 error=5
Jul 24 03:27:02 mowa219-gjp4-freebsd-d31121-mobile ZFS[3003]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=38667890688 size=512 error=5
Jul 24 03:27:02 mowa219-gjp4-freebsd-d31121-mobile ZFS[3007]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=34482962432 size=4096 error=5
Jul 24 03:33:00 mowa219-gjp4-freebsd-d31121-mobile ZFS[3028]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=53692710912 size=8192 error=5
root@mowa219-gjp4-freebsd-d31121-mobile:~ #
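
The grep output above is easier to read once the failures are grouped. The following is a minimal sketch, not part of the original post, that counts the "vdev I/O failure" lines per timestamp and per error number. The regular expression and the names `LINE_RE`, `summarise` and `SAMPLE` are illustrative assumptions based only on the line format shown above; the sample lines are copied from that output.

```python
import re
from collections import Counter

# Assumed line format, taken from the /var/log/messages excerpt above:
# "<Mon DD HH:MM:SS> <host> ZFS[pid]: vdev I/O failure, zpool=... path=... offset=... size=... error=..."
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) .*"
    r"vdev I/O failure, zpool=(?P<pool>\S+) path=(?P<path>\S+) "
    r"offset=(?P<offset>\d+) size=(?P<size>\d+) error=(?P<error>\d+)"
)


def summarise(lines):
    """Return (failures per timestamp, failures per error number)."""
    per_stamp = Counter()
    per_errno = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # e.g. "vdev probe failure" lines, which carry no offset
        per_stamp[m["stamp"]] += 1
        per_errno[int(m["error"])] += 1
    return per_stamp, per_errno


# Four lines copied from the log excerpt above (three I/O failures, one probe failure).
SAMPLE = [
    "Jul 23 19:33:55 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[2062]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=9150365696 size=61440 error=5",
    "Jul 23 19:33:55 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[2066]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=4396314624 size=1024 error=5",
    "Jul 23 19:50:35 mowa219-gjp4-freebsd-d31121-diff92312-mobile ZFS[1426]: vdev I/O failure, zpool=d31121 path=/dev/da0p4 offset=34365239296 size=4096 error=5",
    "Jul 24 02:16:59 mowa219-gjp4-freebsd-d31121-mobile ZFS[1642]: vdev probe failure, zpool=d31121 path=/dev/da0p4",
]

if __name__ == "__main__":
    per_stamp, per_errno = summarise(SAMPLE)
    for stamp, count in per_stamp.items():
        print(stamp, count)
    print("by error number:", dict(per_errno))
```

Every failure in the excerpt carries error=5, which is EIO (input/output error) on FreeBSD. Grouping by timestamp makes the bursts visible, for example the ten failures logged within the same second at 19:33:55.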
 


ralphbsz

Son of Beastie

Reaction score: 2,301
Messages: 3,212

Personal experience: this is based solely on having ONE backup disk that is connected via USB to a FreeBSD machine and forms a ZFS pool. The original version used a Seagate 1TB disk with USB 2.0, and was so unreliable that it would crash and burn regularly: a few I/O errors per day, and full hangs (requiring power cycling of disk + server) perhaps every week or month. Not production quality. Strangely, the exact same disk connected to a Mac works flawlessly. So I switched to eSATA for my external backup disk (I need something with about 6 ft = 2 m of cable). That worked pretty well, but was painful: the external eSATA enclosure is big, it needs a bigger power cable, and it was still not 100% reliable. But it was workable. Eventually, the big hypermarket chain "Costco" had a 2.5" Seagate backup disk for sale (I think 4TB for $60 or so, probably shingled), which has a USB 3.0 interface and needs no external power. I connected it via USB 3.0, and it has worked flawlessly. No hassle, no maintenance problems. Small and convenient.

Professional experience: In enterprise computing, nobody would ever consider connecting a disk drive via USB. It's just an insanely bad idea. Way too unreliable, hard to debug. SATA or SAS (or various modern interfaces for flash), or go home.
 
OP
grahamperrin


… USB 2.0, and was so unreliable, … same disk connected to a Mac works flawlessly. …

Generally: I do sense that Mac OS X is more reliable than FreeBSD for hard disk drives on USB 2.0.
 

hardworkingnewbie

Active Member

Reaction score: 150
Messages: 153

People who are dead serious about data safety don't use external USB drives. Period. Some dedicated RAID distributions out there, e.g. OpenMediaVault, will even flatly refuse to create any RAID on USB HDDs.

The reasons are quite simple:
a) USB is quite unreliable, and some people don't even use an externally powered USB hub or an HDD case with its own power supply. Even aside from that, it is considered unreliable and will bite you in the ass when you least expect it.
b) USB-to-SATA bridges vary widely in quality, normally from mediocre to absolutely shit, and, even more importantly, they often offer only a subset of SATA functionality. Nothing you should trust important data with in the first place.
c) the HDDs built into USB enclosures are normally very cheaply built, and therefore of poor quality as well.

If you care about data safety, you had better invest in proper hardware that can deliver it.
 

hardworkingnewbie


Generally: I do sense that Mac OS X is more reliable than FreeBSD for hard disk drives on USB 2.0.
That's quite a bold claim to make. I consider it wrong.

In my opinion it's not macOS, but the limited range of hardware found in Apple products. Apple tends to put quite good and reliable peripheral parts into its computers. Ordinary mainboards, on the other hand, cover the whole range from shit through average up to very good. So it really depends on which type of mainboard you run FreeBSD on.
 
OP
grahamperrin


… which type …

For the case in the opening post:
My current everyday computer:
– for the mobile hard disk drive with ZFS that I use for my VirtualBox data, I nearly always make a direct connection to one of the four USB-only ports in the 8570p (rarely or never the eSATA/USB 2.0 combo port, never the dock).

Others used include:
 
OP
grahamperrin


I'll discard this mobile hard disk drive, …

With the disk still at the USB 2.0 port on the left of the ZBook, I decided to push things to the limit with StressDisk targeting /var/tmp. Observations during this period, via ssh (before things were pushed too hard):
  • no vdev I/O failures
  • numerous "CAM status: CCB request completed with an error" messages, but no recorded showstopper
  • StressDisk wrote as much as possible, then removed the incomplete file, with its error count still at 0
  • the lowest free space reported by zpool iostat 60 was 6.69G; this probably coincided with the incomplete file
  • things ground to a halt around six minutes later, whilst StressDisk read its test files
  • around twenty minutes before things ground to a halt, the pool was error-free.
I'll force off the computer then review the state of the disk.

Code:
root@mowa219-gjp4-freebsd-d31121-mobile:~ # date ; uptime ; zpool status -v
Sat Jul 24 09:14:48 BST 2021
 9:14AM  up  7:04, 5 users, load averages: 0.06, 0.08, 0.08
  pool: d31121
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:17:58 with 0 errors on Fri Jul 23 21:20:49 2021
config:

        NAME        STATE     READ WRITE CKSUM
        d31121      ONLINE       0     0     0
          da0p4     ONLINE       0   200     0

errors: No known data errors
root@mowa219-gjp4-freebsd-d31121-mobile:~ # zpool clear d31121
root@mowa219-gjp4-freebsd-d31121-mobile:~ # zpool status
  pool: d31121
 state: ONLINE
  scan: scrub repaired 0B in 00:17:58 with 0 errors on Fri Jul 23 21:20:49 2021
config:

        NAME        STATE     READ WRITE CKSUM
        d31121      ONLINE       0     0     0
          da0p4     ONLINE       0     0     0

errors: No known data errors
root@mowa219-gjp4-freebsd-d31121-mobile:~ # stressdisk run /var/tmp
2021/07/24 09:15:29 loaded statsfile "stressdisk_stats.json"
2021/07/24 09:15:29
Bytes read:       4149588 MByte ( 934.58 MByte/s)
Bytes written:    2666492 MByte ( 339.89 MByte/s)
Errors:                 0
Elapsed time:  124.305µs

2021/07/24 09:15:29 No check files - generating
2021/07/24 09:15:29 Writing file "/var/tmp/TST_0000" size 1000000000
2021/07/24 09:15:37 Writing file "/var/tmp/TST_0001" size 1000000000
2021/07/24 09:16:29
Bytes read:       4149588 MByte ( 934.58 MByte/s)
Bytes written:    2668304 MByte ( 337.54 MByte/s)
Errors:                 0
Elapsed time:  1m0.025315424s

2021/07/24 09:16:32 Writing file "/var/tmp/TST_0002" size 1000000000
2021/07/24 09:17:02 Writing file "/var/tmp/TST_0003" size 1000000000
2021/07/24 09:17:29
Bytes read:       4149588 MByte ( 934.58 MByte/s)
Bytes written:    2670212 MByte ( 335.24 MByte/s)
Errors:                 0
Elapsed time:  2m0.003464367s

…

2021/07/24 11:12:34 Writing file "/var/tmp/TST_0200" size 1000000000
2021/07/24 11:13:16 Writing file "/var/tmp/TST_0201" size 1000000000
2021/07/24 11:13:29
Bytes read:       4149588 MByte ( 934.58 MByte/s)
Bytes written:    2858580 MByte ( 192.26 MByte/s)
Errors:                 0
Elapsed time:  1h58m0.085253107s

2021/07/24 11:13:53 Writing file "/var/tmp/TST_0202" size 1000000000
2021/07/24 11:14:26 Writing file "/var/tmp/TST_0203" size 1000000000
2021/07/24 11:14:29
Bytes read:       4149588 MByte ( 934.58 MByte/s)
Bytes written:    2860192 MByte ( 191.60 MByte/s)
Errors:                 0
Elapsed time:  1h59m0.000715228s

2021/07/24 11:15:29
Bytes read:       4149588 MByte ( 934.58 MByte/s)
Bytes written:    2860644 MByte ( 190.86 MByte/s)
Errors:                 0
Elapsed time:  2h0m0.001666767s

2021/07/24 11:15:54 Writing file "/var/tmp/TST_0204" size 1000000000
2021/07/24 11:16:29
Bytes read:       4149588 MByte ( 934.58 MByte/s)
Bytes written:    2861854 MByte ( 190.18 MByte/s)
Errors:                 0
Elapsed time:  2h1m0.097566383s

2021/07/24 11:16:43 Writing file "/var/tmp/TST_0205" size 1000000000
2021/07/24 11:17:29
Bytes read:       4149588 MByte ( 934.58 MByte/s)
Bytes written:    2862732 MByte ( 189.52 MByte/s)
Errors:                 0
Elapsed time:  2h2m0.168441916s

2021/07/24 11:18:29
Bytes read:       4149588 MByte ( 934.58 MByte/s)
Bytes written:    2862856 MByte ( 188.78 MByte/s)
Errors:                 0
Elapsed time:  2h3m0.004272241s

2021/07/24 11:19:23 Error while writing "/var/tmp/TST_0205"
2021/07/24 11:19:23 Removing incomplete file "/var/tmp/TST_0205"
2021/07/24 11:19:29
Bytes read:       4149588 MByte ( 934.58 MByte/s)
Bytes written:    2862874 MByte ( 188.04 MByte/s)
Errors:                 0
Elapsed time:  2h4m0.007930008s

2021/07/24 11:19:40 Starting round 1
2021/07/24 11:19:40 Reading file "/var/tmp/TST_0048", "/var/tmp/TST_0001"
2021/07/24 11:20:29
Bytes read:       4150326 MByte ( 924.39 MByte/s)
Bytes written:    2862874 MByte ( 187.91 MByte/s)
Errors:                 0
Elapsed time:  2h5m0.001268944s

2021/07/24 11:21:15 Reading file "/var/tmp/TST_0082", "/var/tmp/TST_0052"
2021/07/24 11:21:29
Bytes read:       4151866 MByte ( 912.54 MByte/s)
Bytes written:    2862874 MByte ( 187.91 MByte/s)
Errors:                 0
Elapsed time:  2h6m0.002999842s

2021/07/24 11:22:27 Reading file "/var/tmp/TST_0197", "/var/tmp/TST_0083"
2021/07/24 11:22:29
Bytes read:       4153452 MByte ( 901.02 MByte/s)
Bytes written:    2862874 MByte ( 187.91 MByte/s)
Errors:                 0
Elapsed time:  2h7m0.001315757s

2021/07/24 11:23:29
Bytes read:       4155022 MByte ( 889.77 MByte/s)
Bytes written:    2862874 MByte ( 187.91 MByte/s)
Errors:                 0
Elapsed time:  2h8m0.013232815s

2021/07/24 11:23:40 Reading file "/var/tmp/TST_0091", "/var/tmp/TST_0150"
2021/07/24 11:24:29
Bytes read:       4156560 MByte ( 878.81 MByte/s)
Bytes written:    2862874 MByte ( 187.91 MByte/s)
Errors:                 0
Elapsed time:  2h9m0.02157078s

2021/07/24 11:25:29
Bytes read:       4156980 MByte ( 867.89 MByte/s)
Bytes written:    2862874 MByte ( 187.91 MByte/s)
Errors:                 0
Elapsed time:  2h10m0.008553071s

^C  
load: 0.00  cmd: stressdisk 4970 [uwait] 16764.34r 347.58u 88.65s 0% 27696k
mi_switch+0xc1 sleepq_catch_signals+0x31a sleepq_wait_sig+0x9 _sleep+0x1be umtxq_sleep+0x143 do_wait+0x48c __umtx_op_wait_uint_private+0x54 sys__umtx_op+0x7a amd64_syscall+0x10c fast_syscall_common+0xf8
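
A note on reading the throughput figures above: the counters are already non-zero when the run starts (a stats file is loaded, and "Bytes written" shows 2666492 MByte at an elapsed time of 124 µs), so the MByte/s values in each status block appear to be cumulative across earlier runs rather than specific to this one. Assuming that reading is right, the write rate for this run alone can be estimated from the difference between the first and last counters. The sketch below is not from the original post; the function name `run_write_rate` is hypothetical and the figures are copied from the transcript.

```python
def run_write_rate(start_mbyte, end_mbyte, elapsed_seconds):
    """Average write rate in MByte/s for one run, from cumulative counters."""
    return (end_mbyte - start_mbyte) / elapsed_seconds


# Figures copied from the stressdisk transcript above.
written_at_start = 2666492   # MByte, status block at 09:15:29
written_at_end = 2862874     # MByte, status block at 11:19:29
elapsed = 2 * 3600 + 4 * 60  # 2h4m in seconds, ignoring the sub-second part

rate = run_write_rate(written_at_start, written_at_end, elapsed)

if __name__ == "__main__":
    written = written_at_end - written_at_start
    print(f"{written} MByte in {elapsed} s = {rate:.1f} MByte/s")
```

This works out to roughly 26 MByte/s, which is in line with the write bandwidth that zpool iostat 60 reports for the same period (mostly between 17M and 31M per second) and is plausible for a disk on a USB 2.0 port.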

Code:
root@mowa219-gjp4-freebsd-d31121-mobile:~ # date ; zpool status -v
Sat Jul 24 11:04:33 BST 2021
  pool: d31121
 state: ONLINE
  scan: scrub repaired 0B in 00:17:58 with 0 errors on Fri Jul 23 21:20:49 2021
config:

        NAME        STATE     READ WRITE CKSUM
        d31121      ONLINE       0     0     0
          da0p4     ONLINE       0     0     0

errors: No known data errors
root@mowa219-gjp4-freebsd-d31121-mobile:~ # zpool iostat 60
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
d31121       191G  24.7G      9     12   150K  5.63M
d31121       193G  22.9G      0     75  4.86K  30.3M
d31121       195G  21.0G      0     76  3.27K  30.6M
d31121       196G  20.1G      0     36  1.60K  17.4M
d31121       198G  18.2G      0     64  1.13K  31.0M
d31121       199G  17.1G      0     32  1.26K  17.6M
d31121       200G  15.7G      0     40  3.53K  25.3M
d31121       202G  13.9G      0     44  1.93K  29.7M
d31121       203G  12.5G      0     38  3.87K  23.9M
d31121       205G  11.0G      0     42  2.46K  24.7M
d31121       206G  9.55G      0     58  2.40K  24.6M
d31121       207G  8.96G      0    133  1.67K  10.1M
d31121       208G  7.68G      0    109  3.33K  22.0M
d31121       209G  6.78G      1    137  14.3K  15.9M
d31121       209G  6.70G     14    153   476K  2.21M
d31121       209G  6.69G     32    154   425K  1.23M
d31121       208G  7.56G     27      5  15.4M  43.0K
d31121       208G  7.56G     29      4  25.9M  24.3K
d31121       208G  7.56G     28      3  26.5M  19.7K
d31121       208G  7.56G     27      1  26.2M  9.07K
d31121       208G  7.56G     27      2  25.5M  14.0K
^C  
load: 0.11  cmd: zpool 5123 [spa->spa_errlog_lock] 5487.03r 0.00u 0.01s 0% 6708k
mi_switch+0xc1 _sx_xlock_hard+0x3e1 spa_get_errlog_size+0x15c spa_get_stats+0xd6 zfs_ioc_pool_stats+0x22 zfsdev_ioctl_common+0x4e3 zfsdev_ioctl+0x143 devfs_ioctl+0xc6 vn_ioctl+0x1a4 devfs_ioctl_f+0x1e kern_ioctl+0x25b sys_ioctl+0xf1 amd64_syscall+0x10c fast_syscall_common+0xf8
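
To pick out the tightest moment from a long zpool iostat capture without reading it by eye, something like the following could be used. It is a minimal sketch, not part of the original post: the helper names `to_bytes` and `lowest_free` are hypothetical, the column layout is assumed to match the output above (pool, alloc, free, then four more columns), and the suffixes are treated as powers of 1024.

```python
# Multipliers for the unit suffixes used in zpool output (assumed binary units).
SUFFIXES = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def to_bytes(text):
    """Convert a figure such as '6.69G' or '24.7G' to a number of bytes."""
    if text[-1] in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1]]
    return float(text)


def lowest_free(lines, pool):
    """Return the (alloc, free) strings of the sample with the least free space."""
    rows = [line.split() for line in lines]
    # Keep only data rows for the named pool; header and separator rows are skipped.
    samples = [r for r in rows if len(r) == 7 and r[0] == pool]
    best = min(samples, key=lambda r: to_bytes(r[2]))
    return best[1], best[2]


# Header plus four data rows copied from the zpool iostat output above.
SAMPLE = [
    "              capacity     operations     bandwidth",
    "pool        alloc   free   read  write   read  write",
    "----------  -----  -----  -----  -----  -----  -----",
    "d31121       209G  6.78G      1    137  14.3K  15.9M",
    "d31121       209G  6.70G     14    153   476K  2.21M",
    "d31121       209G  6.69G     32    154   425K  1.23M",
    "d31121       208G  7.56G     27      5  15.4M  43.0K",
]

if __name__ == "__main__":
    alloc, free = lowest_free(SAMPLE, "d31121")
    print(f"lowest free: {free} (alloc {alloc})")
```

On the rows shown, the minimum is the 6.69G sample, immediately before free space returns to 7.56G and the workload switches from mostly writes to mostly reads.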

Attachment: tail -f -n 82 messages.txt (tail -f -n 82 /var/log/messages)
OP
grahamperrin


I'll force off the computer then review the state of the disk.

An error-free pool, and no increase in the number of reallocated sectors.

An increase in the number of interface CRC errors, from 7 to 8.
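
The before-and-after comparison can be made mechanical. The sketch below is an illustration only, not part of the original post. It assumes the ten-column attribute table that smartctl -a printed in the opening post, with the raw value in the tenth column and parseable as a plain integer (which holds for the two attributes used here, but not necessarily for every attribute on every drive). The name `raw_values` is hypothetical. The "before" rows are copied from the opening post; the "after" figures are the ones reported in this post.

```python
def raw_values(smartctl_lines):
    """Map attribute name -> raw value for rows of a smartctl attribute table."""
    values = {}
    for line in smartctl_lines:
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) >= 10 and fields[0].isdigit():
            values[fields[1]] = int(fields[9])
    return values


# Rows copied from the smartctl -a output in the opening post.
BEFORE = [
    "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE",
    "  5 Reallocated_Sector_Ct   0x0033   091   091   010    Pre-fail  Always       -       92",
    "199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       7",
]

before = raw_values(BEFORE)

# Values after the forced power-off, as reported in this post.
after = {"Reallocated_Sector_Ct": 92, "UDMA_CRC_Error_Count": 8}

delta = {name: after[name] - before[name] for name in after}

if __name__ == "__main__":
    for name, change in delta.items():
        print(f"{name}: {before[name]} -> {after[name]} ({change:+d})")
```

With those figures, the reallocated sector count is unchanged and the interface CRC error count rises by one.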

Code:
root@mowa219-gjp4-freebsd-d31121-mobile:~ # date ; uptime ; zpool status -v
Sat Jul 24 14:11:58 BST 2021
 2:11PM  up 3 mins, 1 user, load averages: 0.15, 0.10, 0.04
  pool: d31121
 state: ONLINE
  scan: scrub repaired 0B in 00:17:58 with 0 errors on Fri Jul 23 21:20:49 2021
config:

        NAME        STATE     READ WRITE CKSUM
        d31121      ONLINE       0     0     0
          da0p4     ONLINE       0     0     0

errors: No known data errors
root@mowa219-gjp4-freebsd-d31121-mobile:~ # smartctl -a /dev/da0 | grep Reallocated_Sector_Ct
  5 Reallocated_Sector_Ct   0x0033   091   091   010    Pre-fail  Always       -       92
root@mowa219-gjp4-freebsd-d31121-mobile:~ # zpool list
NAME     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
d31121   216G   208G  7.56G        -         -    77%    96%  1.00x    ONLINE  -
root@mowa219-gjp4-freebsd-d31121-mobile:~ # stressdisk clean /var/tmp
2021/07/24 14:12:28 loaded statsfile "stressdisk_stats.json"
2021/07/24 14:12:28
Bytes read:       4156562 MByte ( 878.81 MByte/s)
Bytes written:    2862874 MByte ( 187.91 MByte/s)
Errors:                 0
Elapsed time:  12.022402ms

2021/07/24 14:12:28 Starting round 1
2021/07/24 14:12:28 Removing 205 check files
2021/07/24 14:12:28 Removing file "/var/tmp/TST_0000"
2021/07/24 14:12:28 Removing file "/var/tmp/TST_0001"
…
2021/07/24 14:12:42 Removing file "/var/tmp/TST_0203"
2021/07/24 14:12:42 Removing file "/var/tmp/TST_0204"
2021/07/24 14:12:43 All done
2021/07/24 14:12:43
Bytes read:       4156562 MByte ( 878.81 MByte/s)
Bytes written:    2862874 MByte ( 187.91 MByte/s)
Errors:                 0
Elapsed time:  14.944726834s

2021/07/24 14:12:43 PASSED with no errors
root@mowa219-gjp4-freebsd-d31121-mobile:~ # zpool list
NAME     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
d31121   216G  17.2G   199G        -         -     0%     7%  1.00x    ONLINE  -
root@mowa219-gjp4-freebsd-d31121-mobile:~ # exit
logout
Connection to 192.168.1.4 closed.
%
 