What's "smart linking"? You mean link-time optimization?
Yes and no.
There are things a linker can do that the compiler can't. But the area gets blurry with LLVM and whole-program optimization, where the linker collects the LLVM bitcode of all modules and passes it to the code generator to produce one binary. There you can then have global register allocation and more, but I digress.
Traditionally, a linker operated on object files as a unit. It took the object files from the command line, resolved undefined symbols, and made several passes over the libraries to resolve and pick up the rest. The libraries were themselves just sets of object files.
A smart linker loads only what is needed. You start with the first code section in crt.o, the first object file on the command line, and mark it as used. Then you resolve only what that needs, not the whole object file. If an object contains a big static data structure, old-style linking would include it no matter what. A smart linker checks whether that symbol is referenced and, if not, leaves it out. You need to make many passes over the objects and libraries, which is why this takes more memory. Don't forget, back in the day the PDP came with memory in the kilobyte range.
This approach can give you very small binaries, but it can also expose sloppy coding that depends on side effects. Take this as an example:
Code:
static int some_result = setup_my_signal_handlers();

int main(int ac, char** av)
{
....
}
Someone uses the C++ constructor mechanism to set up his signal handlers. But if he never touches "some_result" in his main(), or in any other used code, the variable will be dropped and the constructor never called. Speaking of constructors, that is some more magic: how do you order them, and what do you do if they form a cycle? Using stream I/O to report an error in the memory allocator is bad, because memory allocation needs to be initialized before I/O, so what now? You at least have to detect this and report it.
What else can a smart linker do, except throwing out the trash (and sometimes your car keys with it when you messed up)?
Coalesce common code. Template expansions create equal code for different parameters. A list template instantiated with int, unsigned int, long, unsigned long, and pointers to different data structures will produce the same code, provided all the data types are 32 bits wide and are only checked for equality. You can throw out all the copies and keep one.
You can check which function calls which other function and place them next to each other, so they end up in the same cache line if small enough, or the same memory page when bigger. Computing this call graph and deciding who goes where takes time, but may save you a lot at runtime later.
Now do the same with data and bss. But keep the entry point nailed down.
The black magic comes when you start to patch instructions to make them PC-relative where possible. PC-relative addressing is often limited to smaller displacements, so you pay with a NOP and win one relocation entry less to be processed later. In shared objects, that can save you from touching a complete page of code in the runtime memory image, so the effort is not to be sneezed at. You may end up with a completely PIC binary without setting a single option in the compiler.
Real black magic comes up when you detect that the source and destination of a call do not share the same calling convention, and you dig into the debug information to auto-magically create the stub code and insert it into the working set. I have not seen this outside of what I did, but it would make so many problems with modern languages and their bindings go away. Guess why I made it.