additional CPU heat caused by something other than core load

Postby ta0kira » 25 Feb 2013, 04:34

I recently built a computer with an AMD FX-8350 8-core processor, running FreeBSD 9.1. I've been testing it under heavy load, and a curious thing has been happening. I have a program that keeps all 8 cores running at 90-100% for quite a while. Without getting into too many specifics: when I run the program one way the CPU stays at about 53°C (95-100% load), and when I run it another way it stays at about 63°C (90-95% load). This makes no sense to me. The program goes through the exact same operations regardless of how it's run (it multiplies matrices incrementally by preloading small sections into RAM). The only differences are how much RAM and address space are used, how often disk I/O happens, and how many times each loop is executed.
  1. The first instance (the process that runs at 53°C) mmaps about 24GB and allocates about 520MB. It reads/writes (to raidz2) about half as often as the other instance, which is why its core loads are higher. All reads and writes are sequential and in large blocks.
  2. The second instance (the process that runs at 63°C) mmaps about 6GB and allocates about 516MB.
It could be that the first instance is making better use of the CPU caches, or that using the southbridge (for disk I/O) generates more CPU heat than using the northbridge (for RAM). I really can't think of any other reason for such a large difference in temperature. Neither temperature is horrible, but if/when I decide to mess around with overclocking I'd like to know that I can max out my CPU heat predictably.
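For reference, the large mappings are plain mmap() over the data files, roughly like this (a simplified sketch with a made-up helper name, not the actual code):
Code: Select all
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Map a large matrix file read-only; the kernel pages blocks in
       and out on demand, so only the working set occupies RAM. */
    double *map_matrix(const char *path, size_t bytes)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return NULL;
        void *p = mmap(NULL, bytes, PROT_READ, MAP_SHARED, fd, 0);
        close(fd); /* the mapping remains valid after close() */
        return (p == MAP_FAILED) ? NULL : (double *)p;
    }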

Thanks!

Kevin Barry

Postby throAU » 25 Feb 2013, 05:43

ta0kira wrote:...when I run the program one way the CPU stays at about 53°C (95-100% load), and when I run it another way it stays at about 63°C (90-95% load)... The only differences are how much RAM and address space are used, how often disk I/O happens, and how many times each loop is executed.


Maybe it's the on-die MMU getting a workout with all the paging going on? I have no idea whether MMU utilisation is included in the reported CPU utilisation statistics, but I suspect it isn't.
I use: FreeBSD, Mac OS X, Windows, Netapp, Cisco UCS, Cisco CUCM, Cisco IOS, Cisco ASA, vSphere 5.1, Cisco ISE, Orion NPM

Postby Crivens » 25 Feb 2013, 08:53

Please supply the wall times of the program. You may also find the CPU event counters a good thing to try out: you may want to check the number of memory transfers, cache misses, and so on in the code. Sometimes surprising things come up with this.
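On FreeBSD that means hwpmc(4) and pmcstat(8). Something along these lines should give a first look (from memory, untested; check pmccontrol -L for the event names your CPU actually supports):
Code: Select all
    # kldload hwpmc
    # pmcstat -p instructions -p dc-misses -p ic-misses ./yourprogram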
Optimists believe we live in the best world possible. Pessimists agree to this.

Two little lights, blinking out in a sky full of stars - we will never forget you. I miss you so much

Postby ta0kira » 25 Feb 2013, 13:10

Crivens wrote:Please supply the wall times of the program. You may also find the CPU event counters a good thing to try out: you may want to check the number of memory transfers, cache misses, and so on in the code. Sometimes surprising things come up with this.
The first instance takes ~175 minutes and the second ~14 minutes. The first does at most 8x more work but takes 12x longer. By "in the code" do you mean to literally count them in the source code? I wrote the code, so I know that all of the reads and writes (for these particular instances) are page-aligned and sequential, but the memory access probably isn't that simple. The program basically just copies columns/rows into RAM, has GSL multiply them, then adds the resulting block to the respective rows of the output matrix. In the first instance GSL is continually multiplying a 4x256 by a 256x16384, and in the second a 4x128 by a 128x32768.
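In GSL terms each step is essentially the following (a rough sketch with made-up buffer names, using the first instance's block sizes; not the actual code):
Code: Select all
    #include <gsl/gsl_blas.h>
    #include <gsl/gsl_matrix.h>

    /* One multiply-accumulate step: C += A * B, where A is a 4x256
       column block, B a 256x16384 row block, and C the matching
       4x16384 strip of the output matrix. */
    void accumulate_block(const double *a_buf, const double *b_buf,
                          double *c_buf)
    {
        gsl_matrix_const_view A = gsl_matrix_const_view_array(a_buf, 4, 256);
        gsl_matrix_const_view B = gsl_matrix_const_view_array(b_buf, 256, 16384);
        gsl_matrix_view       C = gsl_matrix_view_array(c_buf, 4, 16384);

        /* beta = 1.0 makes dgemm accumulate into the existing output rows */
        gsl_blas_dgemm(CblasNoTrans, CblasNoTrans, 1.0,
                       &A.matrix, &B.matrix, 1.0, &C.matrix);
    }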

Kevin Barry

Postby Crivens » 25 Feb 2013, 13:18

I mean what the core does: the code which actually gets executed. You don't only have your own code running, but also the OS doing the paging and MMU handling. This looks like a difference in cache hit rate: your case #1 is most likely running cooler because the cores are waiting for main memory to deliver data. As I said, using the performance counters can tell you a lot.
Optimists believe we live in the best world possible. Pessimists agree to this.

Two little lights, blinking out in a sky full of stars - we will never forget you. I miss you so much

Postby ta0kira » 25 Feb 2013, 15:19

Crivens wrote:I mean what the core does: the code which actually gets executed. You don't only have your own code running, but also the OS doing the paging and MMU handling. This looks like a difference in cache hit rate: your case #1 is most likely running cooler because the cores are waiting for main memory to deliver data. As I said, using the performance counters can tell you a lot.
I think this is a start. I ran both with pmcstat -w 10 -p instructions -p dc-misses -p ic-misses. A typical line of output for each when the temp is high:
  1. Code: Select all
    #  p/instructions     p/dc-misses     p/ic-misses
         309810972151               0         6466051
  2. Code: Select all
    #  p/instructions     p/dc-misses     p/ic-misses
         538303758005               0         7510967
Are there better PMCs to use for this? This is actually the first time I've used it.
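If I'm reading these right, the hotter instance retires about 75% more instructions per 10-second window (~538 billion vs. ~310 billion), which would fit the theory that the cooler instance spends more of its "load" stalled waiting on memory.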

Also, running the program with pmcstat changed the process execution slightly, so those figures might not be representative of what's actually happening. Both ran more slowly, and with pmcstat the core usage dropped below 25% fairly often for the first instance. Thanks!

Kevin Barry

