Bad performance on FreeBSD 7.1 RELEASE AMD64

Hi,

I've been using all of the major BSDs off and on for about 3 or 4 years. I just got some new boxes and needed a good desktop, so I tried all the latest releases. I had just about settled on FreeBSD AMD64 for this machine, but after using it for a full day the performance feels really bad.

I know that saying the performance feels bad doesn't mean much so I ran some programs to try to quantify it.

I used rarcrack to brute force passwords against a rar archive with a long password. To me it's a good example of CPU throughput that doesn't use much memory. On a really slow Linux on the same box (openSUSE 11.1) I can test about 750 passwords per second. On the same archive on the same box running FreeBSD AMD64 it only processes 22 passwords a second. I don't understand how this can be happening.

I built ubench from ports and the results were:

Ubench CPU: 761518
Ubench MEM: 255535
--------------------
Ubench AVG: 508526

According to the list published on phystech these numbers look pretty good. But the system feels extremely sluggish (applications take forever to load) and there are other performance problems. Most of my downloads while building ports die in the middle, and it took me ages to get things built.

I don't have much disk space left on this box, but I left a primary partition on one of the drives, so I may try to install i386 again and run ubench on that arch to see if it makes any difference.

Any ideas, fellas?

Cheers,
Randall
 
I suppose your hard disk mode is misdetected.

Run atacontrol mode <dev> to check the detected mode. You can force a change with that command if the mode is wrong. On my system the output looks like this:
Code:
# atacontrol mode ad4
current mode = SATA150

I have this line in my /etc/rc.local file, because my DVD-Burner is wrongly detected as PIO4:
Code:
/sbin/atacontrol mode acd0 WDMA2

The atacontrol(8) manual page states the available modes (apart from the SATA modes).
 
kamikaze said:
I suppose your hard disk mode is misdetected.

Run atacontrol mode <dev> to check the detected mode. You can force a change with that command if the mode is wrong. On my system the output looks like this:
Code:
# atacontrol mode ad4
current mode = SATA150

I have this line in my /etc/rc.local file, because my DVD-Burner is wrongly detected as PIO4:
Code:
/sbin/atacontrol mode acd0 WDMA2

The atacontrol(8) manual page states the available modes (apart from the SATA modes).

Hi, thanks for your idea. I think you are on the right track: ubench shows very high numbers, but the system still feels very slow and doesn't give much throughput on rarcrack.

I checked and it's running SATA150 mode.

I tried new installs both i386 and AMD64 with and without softdep and I can still only test 22-25 passwords/second.

Anything else to check, guys?
 
# dd bs=1m if=/dev/zero of=test count=1024
# dd bs=1m if=test of=/dev/null

You can check your file system read and write performance to get a clue whether this is a HD problem. The read command (the 2nd one) will read from the cache, so you might want to reboot and run it a second time to get your read speed for uncached data.
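
If it helps, the two dd runs above could be wrapped in a small script that also turns the timing into MB/s. This is only a rough sketch: the function names (mb_per_sec, time_cmd, dd_bench) are made up for illustration, timing is in whole seconds from date, and bs is given in bytes so the same line works with both BSD and GNU dd.

```shell
#!/bin/sh
# Sketch: time a sequential write and read with dd and report MB/s.
# All function names here are illustrative, not part of any tool.

# mb_per_sec MEGABYTES SECONDS -> integer MB/s (under 1 second counts as 1)
mb_per_sec() {
    secs=$2
    [ "$secs" -lt 1 ] && secs=1
    echo $(( $1 / secs ))
}

# time_cmd COMMAND... -> elapsed whole seconds on stdout
time_cmd() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $(( end - start ))
}

# dd_bench FILE SIZE_MB : write SIZE_MB of zeros to FILE, read it back,
# print both rates, then delete FILE.
dd_bench() {
    bench_file=$1
    bench_mb=$2
    w=$(time_cmd dd if=/dev/zero of="$bench_file" bs=1048576 count="$bench_mb")
    r=$(time_cmd dd if="$bench_file" of=/dev/null bs=1048576)
    echo "write: $(mb_per_sec "$bench_mb" "$w") MB/s"
    echo "read: $(mb_per_sec "$bench_mb" "$r") MB/s (probably from cache)"
    rm -f "$bench_file"
}
```

Calling dd_bench ./test 1024 matches the 1 GB size used above. With whole-second timing, small sizes are dominated by rounding, and the write figure may include data still sitting in the buffer cache.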
 
It's certainly not a hardware problem; it's contained within FreeBSD. I documented the performance difference running openSUSE on the same box.

Not sure where to look next.
 
I will look in ports/benchmarks to see if there's some filesystem benchmarking.

Interestingly, and unrelated: i386 scores significantly lower on Ubench on the same box.
 
richardpl said:
Can you provide more details?
vmstat -i, uptime, top, ...

I'm not sure what you are asking for here with uptime and top. I'll post vmstat -i in a few minutes, running benchmarks now.
 
This is an Intel E8400 Core 2 Duo box on an MSI motherboard, with 4G RAM and Seagate Barracuda 7200.11 drives.

Some benchmarks:

Bonnie 2.0.6 from ports

Code:
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
          100 112685 37.4 95438  7.4 177180 17.3 298364 97.5 3253065 117.3 328569.1 188.1

UnixBench 4.1

Code:
   #    #  #    #  #  #    #          #####   ######  #    #   ####   #    #
   #    #  ##   #  #   #  #           #    #  #       ##   #  #    #  #    #
   #    #  # #  #  #    ##            #####   #####   # #  #  #       ######
   #    #  #  # #  #    ##            #    #  #       #  # #  #       #    #
   #    #  #   ##  #   #  #           #    #  #       #   ##  #    #  #    #
    ####   #    #  #  #    #          #####   ######  #    #   ####   #    #

                 4        1           Based on the Byte Magazine Unix Benchmark
                44       11
   v   v       4 4        1
    v v       44444       1           v4.1 revisions mostly by David C. Niemi,
     v           4   o   111          Reston, VA, USA  <niemi@tux.org>
 


Dhrystone 2 using register variables  1 2 3 4 5 6 7 8 9 10

Double-Precision Whetstone  1 2 3 4 5 6 7 8 9 10

System Call Overhead  1 2 3 4 5 6 7 8 9 10

Pipe Throughput  1 2 3 4 5 6 7 8 9 10

Pipe-based Context Switching  1 2 3 4 5 6 7 8 9 10

Process Creation  1 2 3

Execl Throughput  1 2 3

Filesystem Throughput 1024 bufsize 2000 maxblocks  1 2 3

Filesystem Throughput 256 bufsize 500 maxblocks  1 2 3

Filesystem Throughput 4096 bufsize 8000 maxblocks  1 2 3

Shell Scripts (1 concurrent)  1 2 3
Shell Scripts (8 concurrent)  1 2 3
Shell Scripts (16 concurrent)  1 2 3

Arithmetic Test (type = short)  1 2 3

Arithmetic Test (type = int)  1 2 3

Arithmetic Test (type = long)  1 2 3

Arithmetic Test (type = float)  1 2 3

Arithmetic Test (type = double)  1 2 3

Arithoh  1 2 3

C Compiler Throughput  1 2 3

Dc: sqrt(2) to 99 decimal places  1 2 3

Recursion Test--Tower of Hanoi  1 2 3

==============================================================

  BYTE UNIX Benchmarks (Version 4.1.0)
  System -- localhost.invalid.org
  Start Benchmark Run: Tue Jan 13 16:43:08 UTC 2009
   2 interactive users.
   4:43PM  up  1:13, 2 users, load averages: 0.00, 0.08, 0.40
  -r-xr-xr-x  1 root  wheel  132064 Jan  1 07:48 /bin/sh
  /bin/sh: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), for FreeBSD 7.1, dynamically linked (uses shared libs), FreeBSD-style, stripped
  /dev/ad4s1d    16244334 4116138 10828650    28%    /usr
Dhrystone 2 using register variables     17392567.4 lps   (10.0 secs, 10 samples)
Double-Precision Whetstone                 3985.7 MWIPS (9.8 secs, 10 samples)
System Call Overhead                     1139565.4 lps   (10.0 secs, 10 samples)
Pipe Throughput                          1415830.9 lps   (10.0 secs, 10 samples)
Pipe-based Context Switching             279496.7 lps   (10.0 secs, 10 samples)
Process Creation                          11042.3 lps   (30.0 secs, 3 samples)
Execl Throughput                           3052.1 lps   (29.8 secs, 3 samples)
File Read 1024 bufsize 2000 maxblocks    1115016.0 KBps  (30.0 secs, 3 samples)
File Write 1024 bufsize 2000 maxblocks    75095.0 KBps  (30.0 secs, 3 samples)
File Copy 1024 bufsize 2000 maxblocks     76621.0 KBps  (30.0 secs, 3 samples)
File Read 256 bufsize 500 maxblocks      300892.0 KBps  (30.0 secs, 3 samples)
File Write 256 bufsize 500 maxblocks     112166.0 KBps  (30.0 secs, 3 samples)
File Copy 256 bufsize 500 maxblocks      113295.0 KBps  (30.0 secs, 3 samples)
File Read 4096 bufsize 8000 maxblocks    2511317.0 KBps  (30.0 secs, 3 samples)
File Write 4096 bufsize 8000 maxblocks   103730.0 KBps  (30.0 secs, 3 samples)
File Copy 4096 bufsize 8000 maxblocks     90873.0 KBps  (30.0 secs, 3 samples)
Shell Scripts (1 concurrent)               4462.8 lpm   (59.5 secs, 3 samples)
Shell Scripts (8 concurrent)                802.3 lpm   (59.5 secs, 3 samples)
Shell Scripts (16 concurrent)               421.2 lpm   (59.5 secs, 3 samples)
Arithmetic Test (type = short)           2875767.9 lps   (10.0 secs, 3 samples)
Arithmetic Test (type = int)             2909285.8 lps   (10.0 secs, 3 samples)
Arithmetic Test (type = long)            809862.3 lps   (10.0 secs, 3 samples)
Arithmetic Test (type = float)           2428436.8 lps   (10.0 secs, 3 samples)
Arithmetic Test (type = double)          1483392.5 lps   (10.0 secs, 3 samples)
Arithoh                                  428084233.6 lps   (10.0 secs, 3 samples)
C Compiler Throughput                      2163.1 lpm   (59.8 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places         238913.3 lpm   (30.0 secs, 3 samples)
Recursion Test--Tower of Hanoi           186968.7 lps   (20.0 secs, 3 samples)


                     INDEX VALUES            
TEST                                        BASELINE     RESULT      INDEX

Dhrystone 2 using register variables        116700.0 17392567.4     1490.4
Double-Precision Whetstone                      55.0     3985.7      724.7
Execl Throughput                                43.0     3052.1      709.8
File Copy 1024 bufsize 2000 maxblocks         3960.0    76621.0      193.5
File Copy 256 bufsize 500 maxblocks           1655.0   113295.0      684.6
File Copy 4096 bufsize 8000 maxblocks         5800.0    90873.0      156.7
Pipe Throughput                              12440.0  1415830.9     1138.1
Pipe-based Context Switching                  4000.0   279496.7      698.7
Process Creation                               126.0    11042.3      876.4
Shell Scripts (8 concurrent)                     6.0      802.3     1337.2
System Call Overhead                         15000.0  1139565.4      759.7
                                                                 =========
     FINAL SCORE                                                     665.1

vmstat -i

Code:
interrupt                          total       rate
irq1: atkbd0                        3249          0
irq6: fdc0                            14          0
irq12: psm0                        67835          8
irq18: re0 uhci2                     309          0
irq19: uhci1+                    3325168        432
cpu0: timer                     15478820       2014
cpu1: timer                     15478379       2014
Total                           34353774       4471

Ubench

Code:
Unix Benchmark Utility v.0.3
Copyright (C) July, 1999 PhysTech, Inc.
Author: Sergei Viznyuk <sv-obfuscated-mailaddr@phystech.com>
http://www.phystech.com/download/ubench.html
FreeBSD 7.1-RELEASE FreeBSD 7.1-RELEASE #0: Thu Jan  1 08:58:24 UTC 2009     root-obfuscated-mailaddr@driscoll.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
Ubench CPU:   761053
Ubench MEM:   254555
--------------------
Ubench AVG:   507804
 
This won't help you, but my experience with 7.1 on an i386 system is that the machine is slower than it was on 7.0. I don't know whether the problem is that I have /usr and /var on gjournal, or something else.
For example: on 7.0, installing OpenOffice 3.0 from ports took about 6-7 hours; on 7.1 it took 12! Before, if I was compiling something while working in KDE at the same time, it was not a problem; on 7.1 it is masochism. I have the same configuration and settings as I had on 7.0.
 
@randux: Could you please paste the output of the following two commands (running in two separate terminals) when the problematic test is running? I.e., run "vmstat 10", "iostat 10", then switch to another terminal, wait 20 seconds, run the problematic program and leave it running for a few minutes. Then paste the vmstat and iostat output here. Commands:

vmstat 10

iostat 10
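
If two terminals are awkward, roughly the same capture can be scripted. A minimal sketch, assuming a made-up helper called monitor_run: it starts both monitors in the background, waits 20 seconds, runs the test, then stops the monitors. The INTERVAL and WARMUP variables and the log file names are my own choices.

```shell
#!/bin/sh
# Sketch: log vmstat and iostat while a workload runs.
# monitor_run and the log file names are illustrative only.

# monitor_run LOGDIR COMMAND... : start the monitors, wait for a baseline,
# run COMMAND, stop the monitors, and return COMMAND's exit status.
monitor_run() {
    logdir=$1
    shift
    pids=""
    for tool in vmstat iostat; do
        if command -v "$tool" >/dev/null 2>&1; then
            "$tool" "${INTERVAL:-10}" >"$logdir/$tool.log" 2>&1 &
            pids="$pids $!"
        else
            echo "$tool not found, skipping" >&2
        fi
    done
    sleep "${WARMUP:-20}"    # idle samples before the workload starts
    status=0
    "$@" || status=$?
    if [ -n "$pids" ]; then
        kill $pids 2>/dev/null || true
    fi
    return "$status"
}
```

For example, monitor_run . followed by the usual rarcrack command line should leave vmstat.log and iostat.log in the current directory, ready to paste.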
 
lumiwa said:
This won't help you, but my experience with 7.1 on an i386 system is that the machine is slower than it was on 7.0. I don't know whether the problem is that I have /usr and /var on gjournal, or something else.
For example: on 7.0, installing OpenOffice 3.0 from ports took about 6-7 hours; on 7.1 it took 12! Before, if I was compiling something while working in KDE at the same time, it was not a problem; on 7.1 it is masochism. I have the same configuration and settings as I had on 7.0.

That's really an incredible difference. I hope the devs will look at all these posts and fix the problem. Thanks for your post.
 
trasz@ said:
@randux: Could you please paste the output of the following two commands (running in two separate terminals) when the problematic test is running? I.e., run "vmstat 10", "iostat 10", then switch to another terminal, wait 20 seconds, run the problematic program and leave it running for a few minutes. Then paste the vmstat and iostat output here. Commands:

vmstat 10

iostat 10

Hi, here is the info:

vmstat @ http://randux.pastebin.com/m44c1235d
iostat @ http://randux.pastebin.com/m697766a8

Thank you.
 
SaveTheRbtz said:
Maybe you could try old school 4BSD scheduler?

PS. And how did you build rarcrack? From ports?

Can you revert to the 4BSD scheduler without rebuilding the kernel? How do you do it?
 
Thanks for the info. I may have to pull down stable from source anyway to fix lack of direct rendering for my chipset, so maybe I will get my hands dirty and try to learn a little FreeBSD.
 
I ran rarcrack under truss and I have also looked at its source code.
Most of the time it is vforking unrar and waiting for the results, allocating and freeing memory all the time.
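
Since the cost seems to be in spawning a process per password, it might be worth measuring the raw spawn rate on both systems. A rough sketch follows; spawn_loop and spawn_rate are names I made up, and it measures the shell's fork+exec of a trivial program, not vfork or unrar itself, so treat the number only as a relative comparison between the two OSes.

```shell
#!/bin/sh
# Sketch: estimate how many short-lived processes per second can be spawned.
# spawn_loop and spawn_rate are illustrative names.

# spawn_loop N COMMAND... : run COMMAND N times, print how many runs succeeded.
spawn_loop() {
    n=$1
    shift
    i=0
    ok=0
    while [ "$i" -lt "$n" ]; do
        if "$@" >/dev/null 2>&1; then ok=$((ok + 1)); fi
        i=$((i + 1))
    done
    echo "$ok"
}

# spawn_rate N COMMAND... : print spawns per second (whole-second timing,
# anything under 1 second counts as 1).
spawn_rate() {
    start=$(date +%s)
    spawn_loop "$@" >/dev/null
    end=$(date +%s)
    elapsed=$((end - start))
    [ "$elapsed" -lt 1 ] && elapsed=1
    echo $(( $1 / elapsed ))
}
```

Something like spawn_rate 5000 /usr/bin/true should do (use the full path, because true is a shell builtin and would not fork). If FreeBSD and openSUSE give similar figures here, process creation is probably not the bottleneck.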
 
I just installed 7.0-RELEASE-AMD64 and I get the same poor rarcrack performance so I think we can rule out the scheduler changes in 7.1.

Could it be waiting under FreeBSD because there's a performance problem in unrar? Or is fork or malloc/free slow on FreeBSD?

On openSUSE on the same box it runs 33x faster. There is something wrong here.
 
Maybe if I knew more about what you are saying ;)

Do I just globally change all occurrences of malloc to tcmalloc?
 