This concerns, more or less, Intel Core i CPUs.
Normally I don't care about performance. If the machine works and gets things done in a reasonable way, everything is fine. I care much more about stability, so if lowering the bus clock by 5% might increase the safety margin against soft errors, I would do it.
Now I got my hands on a bunch of RAM that is not on the board's QVL (qualified vendor list), but is within spec and so should work nevertheless. These modules carry two different speed ratings: a conservative JEDEC one and an XMP one.
Corollary: it seems we no longer live in times where hardware carries a precise functional rating saying "it is guaranteed to run up to this speed". We now have a culture of so-called "overclockers", and the industry caters to them by providing parts meant to be "overclocked", including instructions on how to do so. Not only is this not what one should understand as overclocking (i.e. running parts under conditions beyond what the manufacturer specifies), it also strays from what I understand as engineering work: it no longer answers the question "what is this part specified to do?" in a measurable way.
(Just imagine somebody selling an airplane and stating: it tolerates a maximum g-force of 2.8, but if you like extreme flying, you can also use a g-force of 3.4.)
Curious as I am, I would now like to know whether there is a measurable difference between these ratings, i.e. some actual metrics. So I looked around and found benchmarks/hpl, which should be more or less the established way to determine those MFlops figures that are all over the technical press.
But then I learned: this tool does not simply measure the ad-hoc throughput of the system (the way dd would show the ad-hoc throughput of a drive); instead it needs to be carefully tuned to the actual topology in order to max out the compute throughput of the cores. On first tries, the figures shown were about a factor of 1000 below what the press says my CPU should do.
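For reference, that tuning happens in HPL's input file HPL.dat. The values below are only illustrative guesses for a 4-core desktop, not validated settings: the problem size N is usually chosen so the N×N double-precision matrix fills most of the RAM, NB is the blocking factor (commonly somewhere between 100 and 250), and P×Q must equal the number of MPI processes started.

```
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout, 7=stderr, file)
1            # of problem sizes (N)
20000        Ns         (illustrative; size to available RAM)
1            # of NBs
192          NBs        (illustrative block size; worth sweeping)
0            PMAP process mapping (0=Row-major)
1            # of process grids (P x Q)
2            Ps
2            Qs         (2x2 = 4 processes for 4 cores)
16.0         threshold
```

The file continues with further algorithm parameters (PFACT, NDIV, and so on), which can be left at the defaults shipped with the port while experimenting with N, NB, P and Q.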
Finally I started to watch what the CPU cores are actually doing, using sysutils/intel-pcm. That port shows detailed per-core activity and an IPC value (instructions per cycle, per core). While I was used to seeing about 1 IPC per core during e.g. compiling, it now showed 3.5 IPC.
So that is what hyper-threading is about (my chip doesn't have it), why Intel charges a lot of extra money for chips with it enabled, and why it is of little use for high-performance mathematical workloads.
Questions that remain:
When the CPU shows about 3.5 IPC and clocks at 2.9 GHz with 4 cores, that should make about 40 GFlops. But strangely, the hpl tool always shows precisely 1/4 of that, i.e. 10 GFlops. Why is this? (It is not related to the 4 cores; the same 1/4 factor applies when running on a single core.)
And: has anybody achieved figures with that hpl tool that come even close to those quoted in the technical press? Probably with some high-end mainboard? (Mine is just basic consumer hardware: an i5-3570T on an Asus P8B75-V board.)
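As a sanity check on the arithmetic behind the first question, here is the back-of-the-envelope calculation spelled out. Note the questionable assumption baked into it: it treats every retired instruction as one floating-point operation, which is not what IPC measures, and may be exactly where the factor hides.

```python
# Naive peak estimate from the counters observed with intel-pcm.
# ASSUMPTION (dubious): every retired instruction is one flop.
ipc = 3.5      # instructions per cycle per core, as shown by intel-pcm
ghz = 2.9      # clock frequency in GHz
cores = 4      # cores on the i5-3570T

naive_gflops = ipc * ghz * cores   # ~40.6 "GFlops", if instructions were flops
measured_gflops = 10.0             # what hpl actually reports here

print(naive_gflops)                     # ~40.6
print(naive_gflops / measured_gflops)   # the mysterious factor, ~4
```

If the 10 GFlops figure is trusted, the discrepancy is a clean factor of about 4, independent of core count, which is what the question above is asking about.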