Other How is the UNIX Calling Convention faster than Microsoft® convention?

Wait, so the "8088" was released after the "8086" and it had a smaller data bus and address bus? What's the point? Was it much cheaper or something? Also, what was the case for "external" cache? Does an external cache even count as a "cache" by definition?
Yes, probably cheaper. The same happened with 386 -> 386SX and 486 -> 486SX (the 486SX was missing the FPU).
The originals were later called 386DX and 486DX.
Yes, external cache was SRAM, which needs no refresh circuitry and so is faster to access.
HP PA-RISC had some chips with external L1, so it could be tuned per usage.
"An interesting aspect of the PA-RISC line is that most of its generations have no level 2 cache. Instead large level 1 caches are used, initially as separate chips connected by a bus, and later integrated on-chip."
 
I personally find the idea of learning how to use a profiler interesting... there's prof, and gprof, for starters, and lots of languages/compiler packages include a profiler. It's not always easy to find the correct port for that stuff, granted. My understanding is that knowing how to use a profiler can help with optimizing the code.
I've been trying to get pmcstat to work recently, without much joy.

I'm also waiting to see if/when GNU binutils gprofng will make it to FreeBSD.

I've not tried Google gperftools on FreeBSD but that is also an option.

Closer to home, there's always callgrind and cachegrind for smaller exes and runtimes.
 
Looks like I was wrong, but I could swear the original PC was 8086 and only the XT was 8088.
Probably because of the Robotron DDR PC clones we had around here in the late '80s.
It happens to all of us! I have had similar cases where I remember something differently from how it actually happened, and I could swear the opposite.

In case you haven't heard about it, also take a look at the Mandela Effect. It's fascinating how complicated the human mind is!
 
Thank you for sharing these tools for people like me who didn't know them! In the case of "pmcstat", have you tried to build it natively or to run it under the Linux compatibility layer?
 
pmcstat is a FreeBSD application.

I get a lot of errors like

addr2line: dwarf_init: Debug info NULL [_dwarf_consumer_init(66)]
pmcstat: WARNING: addr2line function name read error
pmcstat: WARNING: addr2line pipe error

and as a result the calltrees seem incomplete and I don't get an accurate cumulative picture.

Maybe llvm-addr2line or eu-addr2line would work better but I don't know how to tell pmcstat to use them.
 
Hope it can be fixed in the future! Thanks for the info, and have a great Sunday!
 
Original PC was 8086, which had a 16-bit data bus and a 20-bit address bus. The 8088 came later and was used in the IBM XT. External cache was first in the 386 and internal cache in the 486.
Very minor nit-pick. While the 8086 preceded the 8088 from Intel, the original IBM PC ran an 8088 at 4.77 MHz with eight-bit memory plus parity. I'm old enough to remember populating 9 rows of 16 kilobit chips when adding more RAM.
 
Replying to the original question. There's no straightforward answer - no calling convention is best in all cases.

The factors to consider are:
  • How complex the code is (more complex code has more "register pressure").
  • The average number of arguments per function.
  • For some architectures, the depth of the callstack.
So if your code is complex and you pass few arguments, a stack calling convention will be good. Simpler code and more arguments will favour registers.

The "some architectures" I was alluding to are CPUs that use register windows. Back in the day, that meant SPARC and a few others. I think that they are now out of fashion and not used in any current hardware. Register windows were great in theory. Roughly, you could have one "window" of registers per call frame: no need to push/pop on the stack, and no need to save and restore registers. That is, until the callstack got too deep and there would be a relatively slow "register spill". Then of course Sun came out with Java and the JVM, which has enormous callstacks. Just the thing to run on your own CPUs that are best with shallow callstacks!
 