FreeBSD 9.2 and too many idle slices?

Hi,

I've got something very, very weird going on.....

I have 10 jails on FreeBSD 9.2.

Both the server OS and all jails are 9.2.

I just upgraded the ports tree in the jails.

I'm recompiling with:

portupgrade -a -r -p -f

to get everything in sync with my packages for my jail flavours as I use ezjail.

Well, I have a 6-core Opteron 4334 with 16 GB of RAM running ZFS. I decided to upgrade 6 jails at once with portupgrade.

Here's where things got really strange. After I hit a load of about 10 (not bad on a 6-core machine), things started to get choppy, with intermittent bursts of activity in the package building (all jails are using PKGNG). All of a sudden, only 1 or 2 of the builds would continue while the other 4-5 jails just "stopped". The load dropped to 1-2.

There is nothing in top or vmstat that says any type of resource is getting starved out.

top -HPS shows this:

Code:
last pid: 29931;  load averages:  1.47,  1.24,  1.31                             up 0+04:16:32  21:30:45
713 processes: 8 running, 678 sleeping, 1 zombie, 26 waiting
CPU 0:  0.8% user,  0.0% nice,  1.2% system,  0.0% interrupt, 98.1% idle
CPU 1:  1.2% user,  0.0% nice,  1.9% system,  0.0% interrupt, 96.9% idle
CPU 2:  0.4% user,  0.0% nice,  2.3% system,  0.0% interrupt, 97.3% idle
CPU 3:  0.0% user,  0.0% nice,  0.8% system,  0.0% interrupt, 99.2% idle
CPU 4:  1.2% user,  0.0% nice,  2.0% system,  0.0% interrupt, 96.9% idle
CPU 5:  0.0% user,  0.0% nice,  0.8% system,  0.0% interrupt, 99.2% idle
Mem: 2283M Active, 882M Inact, 7556M Wired, 14M Cache, 191M Buf, 5096M Free
ARC: 6118M Total, 1989M MFU, 3197M MRU, 938K Anon, 166M Header, 766M Other
Swap: 4096M Total, 4096M Free

  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   11 root        20 ki-1     0K    96K CPU5    5 179:58 100.00% idle{idle: cpu5}
   11 root       155 ki31     0K    96K CPU0    0 179:49 100.00% idle{idle: cpu0}
   11 root       155 ki31     0K    96K CPU3    3 179:24 100.00% idle{idle: cpu3}
   11 root       155 ki31     0K    96K CPU1    1 177:43 100.00% idle{idle: cpu1}
   11 root       155 ki31     0K    96K RUN     2 177:42 100.00% idle{idle: cpu2}
   11 root       155 ki31     0K    96K CPU4    4 177:39 100.00% idle{idle: cpu4}
18635 root        52    0 14536K  2780K wait    5   0:01  0.29% sh
25917 root        20    0 16596K  4040K CPU2    2   0:00  0.10% top
    0 root       -16    0     0K  5520K sched   4   1:37  0.00% kernel{swapper}
    3 root        -8    -     0K   176K tx->tx  2   0:47  0.00% zfskern{txg_thread_enter}
 1742 root        20    0 43500K  8148K select  2   0:46  0.00% snmpd
    0 root       -16    0     0K  5520K -       4   0:19  0.00% kernel{zio_write_issue_}

systat -vm shows:
Code:
    6 users    Load  2.85  1.78  1.51                  Oct 11 21:32

Mem:KB    REAL            VIRTUAL                       VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free           in   out     in   out
Act 1729748  103164 11157016   153160 5158644  count
All 7244064  231460 1087346k   636988          pages
Proc:                                                            Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt   3429 cow    4353 total
  2         269      1851  11k  23k  109  163  11k   6586 zfod        ohci0 ohci
                                                          ozfod       ehci0 17
35.7%Sys   0.0%Intr 14.3%User  0.0%Nice 50.0%Idle        %ozfod       ohci2 ohci
|    |    |    |    |    |    |    |    |    |    |       daefr       ehci1 19
==================>>>>>>>                           13825 prcfr       ahci0 22
                                           dtbuf    20139 totfr   544 cpu0:timer
Namei     Name-cache   Dir-cache    333596 desvn          react       mps0 256
   Calls    hits   %    hits   %    146398 numvn          pdwak    54 em0:rx 0
      30      30 100                 79188 frevn          pdpgs    54 em0:tx 0
                                                          intrn       em0:link
Disks   md0   da0   da1   da2   da3   da4   da5   7760716 wire        em1:rx 0
KB/t   0.00  0.00  0.00  0.00  0.00  0.00  0.00   2382612 act         em1:tx 0
tps       0     0     0     0     0     0     0    908048 inact       em1:link
MB/s   0.00  0.00  0.00  0.00  0.00  0.00  0.00     14180 cache   490 cpu1:timer
%busy     0     0     0     0     0     0     0   5145236 free   1252 cpu3:timer
                                                   195472 buf      54 cpu4:timer
                                                                  653 cpu2:timer

ps -auxww:

Code:
USER         PID  %CPU %MEM    VSZ    RSS TT  STAT STARTED       TIME COMMAND
root          11 600.0  0.0      0     96 ??  RL    5:14PM 1086:44.51 [idle]
root        1742   0.1  0.0  43500   8148 ??  S     5:15PM    0:46.39 /usr/local/sbin/snmpd -p /var/run/net_snmpd.pid
root           0   0.0  0.0      0   5520 ??  DLs   5:14PM    4:08.91 [kernel]
root           1   0.0  0.0   6276    592 ??  ILs   5:14PM    0:03.70 /sbin/init --
root           2   0.0  0.0      0     16 ??  DL    5:14PM    0:00.04 [mps_scan0]
root           3   0.0  0.0      0    176 ??  DL    5:14PM    0:48.88 [zfskern]
root           4   0.0  0.0      0     16 ??  DL    5:14PM    0:00.00 [sctp_iterator]
root           5   0.0  0.0      0     16 ??  DL    5:14PM    0:00.00 [xpt_thrd]
root           6   0.0  0.0      0     16 ??  DL    5:14PM    0:00.00 [ipmi0: kcs]
root           7   0.0  0.0      0     16 ??  DL    5:14PM    0:00.23 [enc_daemon0]
root           8   0.0  0.0      0     16 ??  DL    5:14PM    0:00.01 [pagedaemon]
root           9   0.0  0.0      0     16 ??  DL    5:14PM    0:00.00 [vmdaemon]
root          10   0.0  0.0      0     16 ??  DL    5:14PM    0:00.00 [audit]
root          12   0.0  0.0      0    416 ??  WL    5:14PM    0:34.32 [intr]
root          13   0.0  0.0      0     48 ??  DL    5:14PM    0:12.97 [geom]
root          14   0.0  0.0      0     16 ??  DL    5:14PM    0:01.62 [yarrow]
root          15   0.0  0.0      0    448 ??  DL    5:14PM    0:00.55 [usb]
root          16   0.0  0.0      0     16 ??  DL    5:14PM    0:00.00 [pagezero]
root          17   0.0  0.0      0     16 ??  DL    5:14PM    0:00.05 [bufdaemon]
root          18   0.0  0.0      0     16 ??  DL    5:14PM    0:02.07 [vnlru]
root          19   0.0  0.0      0     16 ??  DL    5:14PM    0:13.59 [syncer]
root          20   0.0  0.0      0     16 ??  DL    5:14PM    0:00.13 [softdepflush]

I don't know if it's the scheduler that has lost its mind or something else.
 
I cannot find anything wrong with what you posted. Anyway, if there were no idle processes (they do nothing but loop forever) running on your system, your CPU might stop working, IMO. In short, an idle process is a must for any OS.
 
Agreed. I don't see anything wrong....

However, things came to a near screeching halt. It wasn't until some of the builds started to complete their jobs that response finally became a little better.

Once everything was 'done' and I tried something else, everything was OK again.

What you see above is 'while it was happening'. Everything went idle as if there was nothing to do. Each build would hang at the 'checksum' of the distfile for somewhere between 5 and 10 minutes and then proceed. All the other CPUs would just go idle, yet there were 4-5 other portupgrade processes running but in an 'idle' state.
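For anyone who hits the same kind of stall: one way to see what the "idle" builds are actually waiting on is to look at their process state and wait channel while it is happening. The following is only a rough sketch. `blocked_procs` is a hypothetical helper name, and it assumes `ps` is asked for the columns PID, STATE, WCHAN and COMMAND in that order. Kernel threads normally sit in D state as well, so expect some of those in the list.

```shell
#!/bin/sh
# Filter `ps` output down to processes in uninterruptible (D) sleep.
# Expects columns: PID STATE WCHAN COMMAND, with a header on the first line.
# blocked_procs is a hypothetical helper name, not an existing tool.
blocked_procs() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3, $4 }'
}

# Intended use on the host while the builds are stalled (not run here):
#   ps -axo pid,state,wchan,comm | blocked_procs
#   procstat -kk <pid>    # kernel stack of one stalled process
```

If the stalled checksum processes all show the same wait channel, that points at what they are queued behind, which top and vmstat alone do not show.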

It seems like the machine was resource starved. Does anyone think this might have been a file handle issue or something else?
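A quick way to check the file handle theory would be to compare the number of open files against the system-wide limit while the builds are stalled. This is a minimal sketch, assuming the standard FreeBSD sysctls `kern.openfiles` and `kern.maxfiles`; `usage_pct` is just a hypothetical helper name for the arithmetic.

```shell
#!/bin/sh
# Percentage of a limit currently in use (integer arithmetic).
# $1 = current count, $2 = limit. usage_pct is a hypothetical helper name.
usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

# Only query the FreeBSD sysctls when actually running on FreeBSD.
if [ "$(uname -s)" = "FreeBSD" ]; then
    open=$(sysctl -n kern.openfiles)
    max=$(sysctl -n kern.maxfiles)
    echo "open files: $open of $max ($(usage_pct "$open" "$max")%)"
fi
```

A value close to 100% during the stall would support the file handle idea. A low value would suggest looking elsewhere.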

Thank you.
 