System: FreeBSD 11.0, Dell PowerEdge 730xd with 256GB of RAM and 140TB of disk on a SAS HBA (not hardware RAID).
We have a number of pretty big file servers that suddenly experience a dramatic slowdown in read() & pread() response time (while reading files in the root filesystem - *not* on the ZFS data disks, which are a different zpool).
These servers carry tens of thousands of filesystems and 100,000 users (whose passwd/group records we store in local DB files under /var/db to speed things up).
Normally things work smoothly and, for example, an "ls -l /export/students" completes in around 2-3 seconds. But every now and then, during the busy hours of the day, it takes minutes instead.
Running "truss -D" on the "ls -l" process shows that read()/pread() calls suddenly take 2-3 ms instead of 0.01-0.02 ms - and thus everything grinds to a virtual halt...
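As a rough back-of-the-envelope of why a few ms per call hurts so much (the call counts and the 0.017 ms healthy average come from the truss summaries below; 2.5 ms is an assumed mid-point of the degraded 2-3 ms figure):

```shell
# Rough arithmetic only - numbers are taken/assumed from the truss
# summaries in this post, not measured here.
awk 'BEGIN {
    calls = 10300 + 6204            # pread + read calls from one "ls -l"
    fast  = calls * 0.017 / 1000    # seconds at ~0.017 ms/call (healthy)
    slow  = calls * 2.5  / 1000    # seconds at ~2.5 ms/call (degraded)
    printf "fast: %.2f s  slow: %.1f s  (%.0fx)\n", fast, slow, slow / fast
}'
```

With those counts a single "ls -l" goes from roughly a quarter of a second of syscall time to 40+ seconds, i.e. a ~150x blowup, which matches the "minutes instead of seconds" behaviour.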
The system "load" number (as seen with "top") doesn't indicate anything extreme. zpool iostat doesn't show anything extreme either.
> cat /tmp/truss.out | tr '(' ' ' | awk '{N[$2]++; T[$2]+=$1} END { for (t in T) { printf "%f s\t%d\t%-30s\t%f ms\n", T[t], N[t], t, T[t]*1000/N[t] }}' | sort -nr
gives for a fast system (Total time, ncalls, syscall, time/call):
0.179025 s 10300 pread 0.017381 ms
0.102503 s 10831 fstat 0.009464 ms
0.096788 s 6204 read 0.015601 ms
and for when it's slow:
19.197472 s 6204 read 3.094370 ms
17.227360 s 10300 pread 1.672559 ms
0.101685 s 10831 fstat 0.009388 ms
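In case anyone wants to sanity-check the aggregation, here is the same pipeline run against a few fabricated lines in the truss -D style it assumes ("<elapsed seconds> syscall(args) = ret"; the sample values are made up):

```shell
# Fabricated sample in the "truss -D" format the one-liner expects.
cat > /tmp/truss.sample <<'EOF'
0.000020 read(3,"...",4096) = 4096
0.000010 fstat(3,{...}) = 0
0.003000 read(3,"...",4096) = 4096
0.000015 pread(4,"...",4096,0) = 4096
EOF

# Same aggregation as above: total time, ncalls, syscall, mean ms/call,
# sorted by total time descending.
tr '(' ' ' < /tmp/truss.sample |
awk '{N[$2]++; T[$2]+=$1}
     END { for (t in T) printf "%f s\t%d\t%-30s\t%f ms\n", T[t], N[t], t, T[t]*1000/N[t] }' |
sort -nr
```

Here read() ends up first with 0.003020 s total over 2 calls, i.e. a 1.51 ms average - one slow call is enough to dominate the mean, which is why the per-call averages above are worth reading alongside the totals.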
We are looking for ideas about where to look for possible causes, and for knobs we could tune.
We tried moving some of the files (our DB passwd/group databases) to /tmp (tmpfs-mounted), but it still takes around the same time.
- Peter