UFS Bad response while reading long long files

Hi all.
I'm trying FreeBSD 10.2-RELEASE(i386) for my new home server.

While testing, I have aware that sometimes the shell operation response is very slow.
I looked for the condition and found, that:
while reading long long files, another disk request may be blocked long time.

Like operations below, simple ls is blocked for several seconds (worst case).

Code:
$ ls -l sample-2G-*
-rw-r--r--  1 uratan  nobody  2147483648 Dec  5 10:02 sample-2G-1.bin
-rw-r--r--  1 uratan  nobody  2147483648 Dec  6 15:33 sample-2G-2.bin

$ (cat sample-2G-[12]* > /dev/null) & (sleep 20; time ls /usr/local/man/)
ja  man1  man3  man4  man5  man6  man7  man8  manm  whatis
  1.46 real  0.00 user  0.00 sys

$ mount
/dev/ada0p5 on /home (ufs, NFS exported, local, noatime, soft-updates)

Don't you have similar experience?
Or please try above commands.
( sleep 20 is a wait to fill the buffer cache by the long file data, force ls (or ls itself) to read from the disk.)
(Using 2 consoles, one for cat and one for ls, is more sensitive for slow response, ... and feel vi xxx or man man takes over 10 seconds for example.)

My current home server FreeBSD 8.1R has no problem.
Before switching to i386, 10.2R amd64 had same problem.

I'm doubting: AdvancedFormat 4KB/sect HDD... (of course formatted by newfs -U -S 4096)

Code:
ada0: <WDC WD40EFRX-68WT0N0 82.00A82> ACS-2 ATA SATA 3.x device
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 3815447MB (7814037168 512 byte sectors: 16H 63S/T 16383C)
ada0: quirks=0x1<4K>

As for my home server, no other problem is found, this is the only and the last problem.
 
Please find my results below.
Code:
# ls -l ssd_ports_10*
-rw-r--r--  1 root  wheel  2334194538  8 Jan 08:52 ssd_ports_10p2_20160108.gz
-rw-r--r--  1 root  wheel  2331801358 12 Jan 18:25 ssd_ports_10.2-STABLE_20160112.gz
-rw-r--r--  1 root  wheel  2333659434 16 Jan 08:42 ssd_ports_10.2-STABLE_20160116.gz

# (cat ssd_ports* > /dev/null) & (sleep 20; time ls /usr/local/man/)
[1] 5241
cat1     cat5     cat9     en.ISO8859-1   fr.UTF-8   it.UTF-8   man3     man7     manm     ru
cat2     cat6     catl     es     hu     ja     man4     man8     pl     ru.KOI8-R
cat3     cat7     catn     fr     it     man1     man5     man9     pl.ISO8859-2   ru.UTF-8
cat4     cat8     de.ISO8859-1   fr.ISO8859-1   it.ISO8859-1   man2     man6     manl     pl.UTF-8   sr
0.000u 0.013s 0:00.01 100.0%   56+272k 0+0io 0pf+0w

# mount (This is the relevant disk)
/dev/gpt/gpt_reserve on /mnt (ufs, local, soft-updates)

# dmesg | grep ada1
ada1 at ata1 bus 0 scbus1 target 0 lun 0
ada1: <HDT722516DLA380 V43OA96A> ATA-7 SATA 1.x device
ada1: Serial Number VDBE1BTCE4J8PP
ada1: 150.000MB/s transfers (SATA 1.x, UDMA5, PIO 8192bytes)
ada1: 157066MB (321672960 512 byte sectors)
ada1: Previously was known as ad2
After showing the prompt the cat(1) process is running as pid 5241 in this specific case. I hope this is helpful for you.
 
There was something mentioned about noatime on the mailing lists, but I don't recall if that was relevant.

Specifying 4K blocks for the filesystem could be a mistake. It does not guarantee that those blocks are actually aligned with the physical blocks on the drive, and might be responsible for other problems. However, misalignment is usually not a problem when reading.
 
Thank you for your actions.

I feel relieved that this symptom is not popular, but on my environment...
I'm thinking (with small act) how to solve this day by day... ;-)

Today, I will show the regularized test process.

At first, Make files for preparation, two long and 10 small files.
To flood the buffer cache, the long files shall be longer than
available memory size.

Code:
$ mkdir test
$ cd test
$ dmesg | grep "avail memory"
avail memory = 1947504640 (1857 MB)

$ dd if=/dev/zero bs=1m count=1857 > L1
1947205632 bytes transferred in 14.201390 secs (137113735 bytes/sec)

$ dd if=/dev/zero bs=1m count=1857 > L2
1947205632 bytes transferred in 14.043931 secs (138651039 bytes/sec)

$ mkdir S
$ cp /etc/fstab S/0
$ ls -l S/0
-rw-r--r--  1 uratan  zinc  461 Jan 30 09:52 S/0

$ for I in 1 2 3 4 5 6 7 8 9; do cp S/0 S/$I; done

$ ls -RF
L1  L2  S/

./S:
0  1  2  3  4  5  6  7  8  9

$ du -sk S/
44  S/

Next, know the static performance, and calc the timing parameter.
To Avoid that small-file-access shall not be insterted just between
the L1 and L2, and to ensure the disk access overlaping,
aim center of the second long file reading.

Code:
$ time cat L1 > /dev/null
  14.45 real  0.37 user  6.51 sys
$ time cat L2 > /dev/null
  13.90 real  0.37 user  6.51 sys
$ time cat S/* > /dev/null
  0.02 real  0.00 user  0.00 sys
$ time cat S/* > /dev/null
  0.00 real  0.00 user  0.00 sys

$ bc -e "(14.45+13.90)/2*1.5" -e quit
21.0

Do the test.
Do not forget to wait the finish of long file access if repeat this test.

Code:
$ (cat L1 L2 > /dev/null) & (sleep 21; time cat S/* > /dev/null)
  2.90 real  0.00 user  0.00 sys
$
[1]  Done  (cat L1 L2 >/dev/null)

I'm collecting infos with this test while changing parameters below:
  • changing sleep timing.
  • file distance of L1/L2 and S/*, or reading order, in the single disk.
  • reading from respective disks, like L* from ada0, S/* from ada1.
  • tuning command queue by camcontrol tags.
  • UFS/ mount parameters like SoftUpdate, noatime, ...

Any other parameters and hints are welcome,
and I want to know how to change the maximum ATA read request length.
(1857MB read must not be requested by only one ATA READ command, isn't it ?)​
 
Hi, I have bought a new Seagate HDD for comparison today and, unfortunately, this symptom is reproduced.

but...

Tuning by both sysctl kern.cam.ada.0.sort_io_queue and camcontrol tags may take me to the solution.

I will report the detail later.


- * - * -

[2016.2.20] add disk info here
Code:
ada0: <ST3000DM001-1ER166 CC25> ACS-2 ATA SATA 3.x device
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada0: quirks=0x1<4K>
 
Last edited:
For waiting. ;-)

I've finally reached to a test script, see the attached file: do_test.sh.

With the script, the test repeated 5 times by sweeping 2 parameters.
sysctl kern.cam.ada.0.sort_io_queue (shown as q=N later)
camcontrol tags (shown as t=N)​

From here, show the results, which are cooked by AWK.

[1] WD40 with my next home server machine
Code:
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
ada0: <WDC WD40EFRX-68WT0N0 82.00A82>
/dev/ada0p5 on /home (ufs, NFS exported, local, noatime, soft-updates)

   -- -- -- min -- ave -- max   0sec- - 1sec- - 2sec- - 3sec- - 4sec- - 5sec
  q=1,t=16: 3.67 / 4.39 / 4.61: |       |       |       |     =====@=   |    
  q=1,t= 5: 4.95 / 4.99 / 5.08: |       |       |       |       |       @    
  q=1,t= 4: 4.92 / 4.94 / 4.95: |       |       |       |       |       @    
  q=1,t= 3: 4.51 / 5.24 / 5.86: |       |       |       |       |    =====@===
  q=1,t= 2: 4.51 / 4.81 / 5.04: |       |       |       |       |    =@=|    
  q=1,t= 1: 4.81 / 5.04 / 5.75: |       |       |       |       |      =@=====
   -- -- -- min -- ave -- max   0sec- - 1sec- - 2sec- - 3sec- - 4sec- - 5sec
  q=0,t=16: 3.69 / 4.21 / 4.58: |       |       |       |      ===@==   |    
  q=0,t= 5: 1.27 / 4.15 / 5.77: |       |  ======================@============
  q=0,t= 4: 3.69 / 4.21 / 4.55: |       |       |       |      ===@=    |    
  q=0,t= 3: 1.89 / 2.23 / 2.54: |       |       ==@=    |       |       |    
  q=0,t= 2: 0.25 / 2.02 / 2.54: |  =============@===    |       |       |    
  q=0,t= 1: 0.22 / 0.26 / 0.27: | @     |       |       |       |       |    
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
This system is heavily tuned, noatime option is one of them.
The test is stable, I can get similar result any time.
Inserting sync has no effect.
This disk seems to do re-ordering except (t = 1).

[2] ST3000 with new test machine
I had setup another machine for this report (same motherboard/CPU used),
so this machine is almost plain 10.2R.
3 volumes are prepared, like below.
Code:
/dev/ada0p5 on /home (ufs, local, soft-updates)   <--- newfs -S -U 4096
/dev/ada0p6 on /media (ufs, local)                <--- newfs    -U 4096
/dev/ada0p7 on /mnt (ufs, local)                  <--- newfs (no options)
Code:
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
ada0: <ST3000DM001-1ER166 CC25>
/dev/ada0p5 on /home (ufs, local, soft-updates)

   -- -- -- min -- ave -- max   0sec- - 1sec- - 2sec- - 3sec- - 4sec- - 5sec
  q=1,t=16: 0.03 / 1.12 / 2.71: |========@============  |       |       |    
  q=1,t= 5: 0.02 / 0.59 / 1.98: |====@==========|       |       |       |    
  q=1,t= 4: 1.34 / 2.05 / 2.93: |       |   ====@====== |       |       |    
  q=1,t= 3: 0.17 / 0.56 / 0.71: | ==@=  |       |       |       |       |    
  q=1,t= 2: 0.70 / 1.25 / 2.01: |      ===@=====|       |       |       |    
  q=1,t= 1: 0.82 / 2.75 / 3.37: |       ==============@====     |       |    
   -- -- -- min -- ave -- max   0sec- - 1sec- - 2sec- - 3sec- - 4sec- - 5sec
  q=0,t=16: 0.04 / 0.71 / 1.51: |=====@=====    |       |       |       |    
  q=0,t= 5: 0.05 / 0.26 / 0.59: |=@==   |       |       |       |       |    
  q=0,t= 4: 0.08 / 0.10 / 0.14: |@      |       |       |       |       |    
  q=0,t= 3: 0.05 / 0.12 / 0.18: |@      |       |       |       |       |    
  q=0,t= 2: 0.07 / 0.09 / 0.14: |@      |       |       |       |       |    
  q=0,t= 1: 0.10 / 0.13 / 0.18: |@      |       |       |       |       |    
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
ada0: <ST3000DM001-1ER166 CC25>
/dev/ada0p6 on /media (ufs, local)

   -- -- -- min -- ave -- max   0sec- - 1sec- - 2sec- - 3sec- - 4sec- - 5sec
  q=1,t=16: 0.04 / 1.16 / 2.67: |========@===========   |       |       |    
  q=1,t= 5: 0.13 / 1.53 / 2.43: | ==========@======     |       |       |    
  q=1,t= 4: 0.93 / 1.28 / 1.86: |       ==@==== |       |       |       |    
  q=1,t= 3: 0.31 / 0.64 / 0.79: |  ==@  |       |       |       |       |    
  q=1,t= 2: 0.32 / 0.69 / 1.42: |   ==@====     |       |       |       |    
  q=1,t= 1: 1.63 / 2.31 / 2.73: |       |     ====@===  |       |       |    
   -- -- -- min -- ave -- max   0sec- - 1sec- - 2sec- - 3sec- - 4sec- - 5sec
  q=0,t=16: 0.05 / 0.51 / 1.51: |===@=======    |       |       |       |    
  q=0,t= 5: 0.10 / 0.29 / 0.94: | @=====|       |       |       |       |    
  q=0,t= 4: 0.08 / 0.26 / 0.83: | @==== |       |       |       |       |    
  q=0,t= 3: 0.07 / 0.14 / 0.18: |@      |       |       |       |       |    
  q=0,t= 2: 0.07 / 0.14 / 0.20: |@      |       |       |       |       |    
  q=0,t= 1: 0.09 / 0.13 / 0.22: |@      |       |       |       |       |    
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
ada0: <ST3000DM001-1ER166 CC25>
/dev/ada0p7 on /mnt (ufs, local)

   -- -- -- min -- ave -- max   0sec- - 1sec- - 2sec- - 3sec- - 4sec- - 5sec
  q=1,t=16: 0.09 / 1.64 / 3.05: | ===========@==========|       |       |    
  q=1,t= 5: 0.10 / 2.03 / 3.15: | ==============@========       |       |    
  q=1,t= 4: 0.10 / 1.34 / 3.34: | =========@===============     |       |    
  q=1,t= 3: 0.19 / 0.93 / 2.71: |  ====@==============  |       |       |    
  q=1,t= 2: 0.44 / 0.90 / 1.39: |    ==@===     |       |       |       |    
  q=1,t= 1: 0.43 / 2.07 / 5.11: |   =============@=======================    
   -- -- -- min -- ave -- max   0sec- - 1sec- - 2sec- - 3sec- - 4sec- - 5sec
  q=0,t=16: 1.56 / 1.63 / 1.79: |       |    @  |       |       |       |    
  q=0,t= 5: 0.05 / 0.86 / 3.16: |======@=================       |       |    
  q=0,t= 4: 0.09 / 0.34 / 0.65: | =@=   |       |       |       |       |    
  q=0,t= 3: 0.14 / 0.28 / 0.70: | @===  |       |       |       |       |    
  q=0,t= 2: 0.13 / 0.15 / 0.18: |@      |       |       |       |       |    
  q=0,t= 1: 0.13 / 0.16 / 0.19: |@      |       |       |       |       |    
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
The test is something unstable, the reproducibility is rather low.
(the sync effect that I saw can not be reproduced today... X-)
This disk seems not to do re-ordering if (t <= 3).


- * - * -

The Workaround

As conclusion to solve my problem, it is needed to
disable functions about disk request queuing by both 2 settings.

# sysctl kern.cam.ada.0.sort_io_queue=0
# camcontrol tags ada0 -N 1 (even though minimum value is 2...)​

Other static factors I mentioned in message #4 are probably not-guilty.
With the variability of the results, some dynamic factors must
be still involved but I can not see or divide them.

I wish to pass this issue for experts.


- * - * -

notes:

(1)
Replacing reading-long-file operation by raw-reading-by- dd,
did not reproduce this symptom.

(2)
Disk request length is splitted to MAXPHYS defined in /usr/src/sys/sys/param.h,
128kB per requests.

(3)
As my thought, by imagination, re-ordering of disk requests makes
another priority class out of the kernel control,
because it is not guaranteed when executed.

Something like ISB/DSB instructions of ARM architecture,
(Instruction Synchronization Barrier)
(Data Synchronization Barrier)​
or FUA attribute of SCSI command set,
(Force Unit Access)​
some barrier to prevent endless re-ordering shall be needed.

Disk access existence like below may hide the symptom,
which stir/disturb the re-ordering determination moderately.
  • disk accesses from the background jobs ( cron, etc.)
  • file system jobs like update-atime, cache flush
My server is tuned to eliminate these accesses to sleep disks
by ATA stand-by timer function. (see the chart sample below)
x.png
 

Attachments

Last edited by a moderator:
Add a result of non-4Ksect disk with amd64 environmnt.

The symptom and the workaround are same.

Code:
$ uname -misr
FreeBSD 10.2-RELEASE amd64 GENERIC

$ dmesg | grep ada0
ada0: <ST32000542AS CC34> ATA8-ACS SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4

$ mount
/dev/ada0p5 on /home (ufs, local, soft-updates)

$ dmesg | grep avail
avail memory = 3928346624 (3746 MB)

$ ls -lh
-rw-r--r--  1 uratan  zinc   3.7G Mar  6 21:34 L1
-rw-r--r--  1 uratan  zinc   3.7G Mar  6 21:37 L2
drwxr-xr-x  2 uratan  zinc   512B Mar  6 21:43 S

$ (cat L* > /dev/null) & (sleep 53; time cat S/* > /dev/null)
        2.73 real         0.00 user         0.00 sys
$ (cat L* > /dev/null) & (sleep 53; time cat S/* > /dev/null)
        2.24 real         0.00 user         0.00 sys

         |
         | # camconrol tags ada0 -N 1
         | # sysctl kern.cam.ada.0.sort_io_queue=0
         |
         V

$ (cat L* > /dev/null) & (sleep 53; time cat S/* > /dev/null)
        0.24 real         0.00 user         0.00 sys
$ (cat L* > /dev/null) & (sleep 53; time cat S/* > /dev/null)
        0.24 real         0.00 user         0.00 sys
 
Interesting.
I would like to get some FreeBSD experts opinion.
 
Here, I will post a final report.

So the symptom will be appeared not only read operation,
see the movie clip, I had been doing simply filling my disk with single dd by attached shell script.

<It takes 90sec to login>

This test was issued for same environment as message #8,
and it becomes quite better if the workaround issued.


TO: dlegrand
I wish, too.​

TO: Juha Nurmela
sorry not to answer for you.​


See you all again.
 

Attachments

You should also add:
Code:
# Enhance desktop responsiveness under high CPU use (200/224)
kern.sched.preempt_thresh=224
for a desktop computer usage.
 
TO: dlegrand
Code:
  w_behind  p_thresh  sort_io_q  tags   telnet to ls -l done
------------------------------------- -----------------
      1        80       -1 (1)   32       76sec   (10.2R default condition)
      1       224       -1 (1)   32       58sec
      0        80       -1 (1)   32       42sec
      0       224       -1 (1)   32       62sec
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
      1        80         0       1       11sec
------------------------------------- -----------------

Summary of my thought about this bad response symptom:
This symptom can be appeared if your kernel says:
Code:
adaN: Command Queueing enabled
The priority of each disk request is determined by where the disk head position is, not by the kernel scheduler, regardless of the read or write direction.
A workaround is to disable it: (both necessary)
Code:
camcontrol tags adaN -N 1
sysctl kern.cam.adaN.sort_io_queue=0

Any kind of disk filesystem may have this symptom if my thought is correct.

So long really


- * - * -
Imagine at some McDonald's,
5 orders of Hamburger are already queued in the counter staff (kernel),
I ordered a Filet-O-Fish, it accepted as 6th in her queue,
new 10 people has come, and 10 Hamburgers are queued after me.

The clerk in the kitchen (CAM) start to make 15 burgers first,
my Filet-O-Fish is re-ordered to last, 16th in his queue.
He says it is fast and efficient because the flyer is at some little distance,
he walk some steps go and back to it, it is wasteful.

Oh, I see, there comes new 100 more people who want Hamburgers...(cat/dd is still run
nning)

My order is at the top of counter staff's queue now,
but keep last in the kitchen clerk's queue,
when can I get my Filet-O-Fish ?
 
Good work uratan,

I was running a NFS server which was used to build and distribute FreeBSD packages over my local network. I had NFS timeouts under high load and high disk activity when building ports with ports-mgmt/poudriere. This server was running with UFS.
When I will have time, I will try your workaround to see if the NFS timeouts will disappear.

For information, I am now running ZFS on my workstation, and I can do other work while locally building ports with ports-mgmt/poudriere. My system is more responsive with ZFS than with UFS.
 
Back
Top