Hi,
I've got something very, very weird going on...
I have 10 jails on FreeBSD 9.2.
Both the server OS and all jails are 9.2.
I just upgraded the ports tree in the jails.
I'm recompiling with:
Code:
portupgrade -a -r -p -f
to get everything in sync with my packages for my jail flavours, as I use ezjail.
Well, I have a 6-core Opteron 4334 with 16 GB of RAM running ZFS. I decided to upgrade 6 jails at once with portupgrade.
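In case it matters, I kicked the six builds off roughly like this (a minimal sketch; jail1..jail6 are placeholders, not my actual ezjail names):
Code:
#!/bin/sh
# start portupgrade in six jails in parallel; jail names are examples
for j in jail1 jail2 jail3 jail4 jail5 jail6; do
    ezjail-admin console -e "portupgrade -a -r -p -f" "$j" \
        > /var/log/portupgrade-$j.log 2>&1 &
done
wait    # block until all six finish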
Here's where things got really strange... after I hit a load of about 10 (not bad on a 6-core), things started to get choppy, with intermittent bursts of activity on the package building (all are using PKGNG). All of a sudden only 1 or 2 of the builds would continue while the other 4-5 jails just "stopped". The load dropped to 1-2.
There is nothing in top or vmstat that says any type of resource is getting starved out.
top -HPS shows this:
Code:
last pid: 29931; load averages: 1.47, 1.24, 1.31 up 0+04:16:32 21:30:45
713 processes: 8 running, 678 sleeping, 1 zombie, 26 waiting
CPU 0: 0.8% user, 0.0% nice, 1.2% system, 0.0% interrupt, 98.1% idle
CPU 1: 1.2% user, 0.0% nice, 1.9% system, 0.0% interrupt, 96.9% idle
CPU 2: 0.4% user, 0.0% nice, 2.3% system, 0.0% interrupt, 97.3% idle
CPU 3: 0.0% user, 0.0% nice, 0.8% system, 0.0% interrupt, 99.2% idle
CPU 4: 1.2% user, 0.0% nice, 2.0% system, 0.0% interrupt, 96.9% idle
CPU 5: 0.0% user, 0.0% nice, 0.8% system, 0.0% interrupt, 99.2% idle
Mem: 2283M Active, 882M Inact, 7556M Wired, 14M Cache, 191M Buf, 5096M Free
ARC: 6118M Total, 1989M MFU, 3197M MRU, 938K Anon, 166M Header, 766M Other
Swap: 4096M Total, 4096M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 20 ki-1 0K 96K CPU5 5 179:58 100.00% idle{idle: cpu5}
11 root 155 ki31 0K 96K CPU0 0 179:49 100.00% idle{idle: cpu0}
11 root 155 ki31 0K 96K CPU3 3 179:24 100.00% idle{idle: cpu3}
11 root 155 ki31 0K 96K CPU1 1 177:43 100.00% idle{idle: cpu1}
11 root 155 ki31 0K 96K RUN 2 177:42 100.00% idle{idle: cpu2}
11 root 155 ki31 0K 96K CPU4 4 177:39 100.00% idle{idle: cpu4}
18635 root 52 0 14536K 2780K wait 5 0:01 0.29% sh
25917 root 20 0 16596K 4040K CPU2 2 0:00 0.10% top
0 root -16 0 0K 5520K sched 4 1:37 0.00% kernel{swapper}
3 root -8 - 0K 176K tx->tx 2 0:47 0.00% zfskern{txg_thread_enter}
1742 root 20 0 43500K 8148K select 2 0:46 0.00% snmpd
0 root -16 0 0K 5520K - 4 0:19 0.00% kernel{zio_write_issue_}
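The stalled builds just sit in wait states. To see what a stuck process is actually blocked on in the kernel, I can dump its kernel stack with procstat (using the sh PID 18635 from the output above as an example):
Code:
procstat -kk 18635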
systat -vm shows:
Code:
6 users Load 2.85 1.78 1.51 Oct 11 21:32
Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 1729748 103164 11157016 153160 5158644 count
All 7244064 231460 1087346k 636988 pages
Proc: Interrupts
r p d s w Csw Trp Sys Int Sof Flt 3429 cow 4353 total
2 269 1851 11k 23k 109 163 11k 6586 zfod ohci0 ohci
ozfod ehci0 17
35.7%Sys 0.0%Intr 14.3%User 0.0%Nice 50.0%Idle %ozfod ohci2 ohci
| | | | | | | | | | | daefr ehci1 19
==================>>>>>>> 13825 prcfr ahci0 22
dtbuf 20139 totfr 544 cpu0:timer
Namei Name-cache Dir-cache 333596 desvn react mps0 256
Calls hits % hits % 146398 numvn pdwak 54 em0:rx 0
30 30 100 79188 frevn pdpgs 54 em0:tx 0
intrn em0:link
Disks md0 da0 da1 da2 da3 da4 da5 7760716 wire em1:rx 0
KB/t 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2382612 act em1:tx 0
tps 0 0 0 0 0 0 0 908048 inact em1:link
MB/s 0.00 0.00 0.00 0.00 0.00 0.00 0.00 14180 cache 490 cpu1:timer
%busy 0 0 0 0 0 0 0 5145236 free 1252 cpu3:timer
195472 buf 54 cpu4:timer
653 cpu2:timer
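Note that the disks all show 0 tps while the builds are "stopped", so the pool doesn't look saturated. To double-check, per-vdev activity can be watched with:
Code:
zpool iostat -v 1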
ps -auxww:
Code:
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 11 600.0 0.0 0 96 ?? RL 5:14PM 1086:44.51 [idle]
root 1742 0.1 0.0 43500 8148 ?? S 5:15PM 0:46.39 /usr/local/sbin/snmpd -p /var/run/net_snmpd.pid
root 0 0.0 0.0 0 5520 ?? DLs 5:14PM 4:08.91 [kernel]
root 1 0.0 0.0 6276 592 ?? ILs 5:14PM 0:03.70 /sbin/init --
root 2 0.0 0.0 0 16 ?? DL 5:14PM 0:00.04 [mps_scan0]
root 3 0.0 0.0 0 176 ?? DL 5:14PM 0:48.88 [zfskern]
root 4 0.0 0.0 0 16 ?? DL 5:14PM 0:00.00 [sctp_iterator]
root 5 0.0 0.0 0 16 ?? DL 5:14PM 0:00.00 [xpt_thrd]
root 6 0.0 0.0 0 16 ?? DL 5:14PM 0:00.00 [ipmi0: kcs]
root 7 0.0 0.0 0 16 ?? DL 5:14PM 0:00.23 [enc_daemon0]
root 8 0.0 0.0 0 16 ?? DL 5:14PM 0:00.01 [pagedaemon]
root 9 0.0 0.0 0 16 ?? DL 5:14PM 0:00.00 [vmdaemon]
root 10 0.0 0.0 0 16 ?? DL 5:14PM 0:00.00 [audit]
root 12 0.0 0.0 0 416 ?? WL 5:14PM 0:34.32 [intr]
root 13 0.0 0.0 0 48 ?? DL 5:14PM 0:12.97 [geom]
root 14 0.0 0.0 0 16 ?? DL 5:14PM 0:01.62 [yarrow]
root 15 0.0 0.0 0 448 ?? DL 5:14PM 0:00.55 [usb]
root 16 0.0 0.0 0 16 ?? DL 5:14PM 0:00.00 [pagezero]
root 17 0.0 0.0 0 16 ?? DL 5:14PM 0:00.05 [bufdaemon]
root 18 0.0 0.0 0 16 ?? DL 5:14PM 0:02.07 [vnlru]
root 19 0.0 0.0 0 16 ?? DL 5:14PM 0:13.59 [syncer]
root 20 0.0 0.0 0 16 ?? DL 5:14PM 0:00.13 [softdepflush]
I don't know if it's the scheduler that has lost its mind or something else.
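In case the scheduler really is the culprit, which one is in use can be checked with:
Code:
sysctl kern.sched.name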