Solved syscall wrappers

Paul Floyd · Oct 5, 2024

What is the point of the prolog in syscall wrappers? For instance, here is msync from FreeBSD 12.4 amd64, in Intel syntax:

Code:

0000000000092b20 <msync>:
   92b20:       55                      push   rbp
   92b21:       48 89 e5                mov    rbp,rsp
   92b24:       48 8b 05 95 52 13 00    mov    rax,QWORD PTR [rip+0x135295]        # 1c7dc0 <svc_maxfd+0x78>
   92b2b:       5d                      pop    rbp
   92b2c:       ff e0                   jmp    rax
   92b2e:       cc                      int3 
   92b2f:       cc                      int3

In C that's just making a function pointer call via __libc_interposing

Push the base pointer
Copy the stack pointer to the base pointer
Load the address of _msync from a rip-relative location into rax (objdump doesn't seem to be able to figure out the symbol)
Pop the base pointer
Jump to _msync, which looks like this (AT&T syntax this time)

Code:

0x800397440 <_msync>    mov    $0x41,%eax
0x800397445 <_msync+5>  mov    %rcx,%r10
0x800397448 <_msync+8>  syscall
0x80039744a <_msync+10> jb     0x8004055b4 <.cerror>
0x800397450 <_msync+16> ret

The TCO is clear enough.
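The C-level shape of such a wrapper can be sketched roughly as below. This is only an illustration under assumptions: `interposing_table`, `SLOT_MSYNC`, `stub_msync` and `my_msync` are made-up names standing in for the libc internals, and the stub just echoes its flags argument rather than making a system call.

```c
#include <stddef.h>

/* Hypothetical slot index; the real libc uses its own enumeration. */
enum { SLOT_MSYNC = 0, SLOT_MAX };

/* Stand-in for the real system call stub (_msync in the listing above). */
static int stub_msync(void *addr, size_t len, int flags)
{
    (void)addr;
    (void)len;
    return flags;   /* echo an argument so the call is observable */
}

/* Generic function-pointer table, in the spirit of __libc_interposing. */
typedef void (*interpos_func_t)(void);

static interpos_func_t interposing_table[SLOT_MAX] = {
    [SLOT_MSYNC] = (interpos_func_t)stub_msync,
};

/* The wrapper: load the pointer from the table, cast it back to the
   real signature and call it in tail position. */
int my_msync(void *addr, size_t len, int flags)
{
    return ((int (*)(void *, size_t, int))
        interposing_table[SLOT_MSYNC])(addr, len, flags);
}
```

Because the call is the last thing the wrapper does and its result is returned unchanged, an optimizing compiler is free to turn it into a load followed by an indirect jmp, which matches the shape of the msync disassembly.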

What is the use of pushing and popping rbp? Is this always done with -fno-omit-frame-pointer?

The only side effect that I see is that the contents of rbp will still be just below the stack pointer.

Paul Floyd · Oct 6, 2024

OK, to answer my own question with an example, here is some made-up code:

Code:

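/* Table of function pointers, defined in some other translation unit. */
extern int(*pt[])(int, int);

/* Forward both arguments to the second entry; the call is in tail position. */
int tc1(int a, int b)
{
   return pt[1](a, b);
}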

If I compile that without optimization, it disassembles to

Code:

0000000000000000 <tc1>:
 0: 55                            push    rbp
 1: 48 89 e5                      mov     rbp, rsp
 4: 48 83 ec 10                   sub     rsp, 0x10
 8: 89 7d fc                      mov     dword ptr [rbp - 0x4], edi
 b: 89 75 f8                      mov     dword ptr [rbp - 0x8], esi
 e: 48 8b 04 25 00 00 00 00       mov     rax, qword ptr [0x0]
16: 8b 7d fc                      mov     edi, dword ptr [rbp - 0x4]
19: 8b 75 f8                      mov     esi, dword ptr [rbp - 0x8]
1c: ff d0                         call    rax
1e: 48 83 c4 10                   add     rsp, 0x10
22: 5d                            pop     rbp
23: c3                            ret

prolog
make space for temporaries
store args in temporaries
load the call address from pt[1] into rax (an absolute address in this unoptimized build, not rip-relative)
put the exact same values back from the temporaries to the registers where they came from
call function
reset stack pointer
epilog
return

If I compile with -O3 I get

Code:

0000000000000000 <tc1>:
0: 55                            push    rbp
1: 48 89 e5                      mov     rbp, rsp
4: 48 8b 05 00 00 00 00          mov     rax, qword ptr [rip]    # 0xb <tc1+0xb>
b: 5d                            pop     rbp
c: ff e0                         jmp     rax

That's the same as the syscall wrapper. So clang keeps the frame pointer even with -O3. GCC doesn't do that.

If I compile with clang -O3 -fomit-frame-pointer then I get

Code:

0000000000000000 <tc1>:
0: 48 8b 05 00 00 00 00          mov     rax, qword ptr [rip]    # 0x7 <tc1+0x7>
7: ff e0                         jmp     rax

(Almost the same as GCC, which doesn't even bother with rax and does an indirect rip-relative jump.)
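To see what makes tc1 eligible for the jmp, it may help to put it next to a call that is not in tail position. The sketch below is an assumption-laden variant of the made-up code: `add`, `mul` and `not_tail` are hypothetical names, and pt is defined locally (instead of being extern) only so that the example links on its own.

```c
/* Hypothetical targets so the example links on its own. */
static int add(int a, int b) { return a + b; }
static int mul(int a, int b) { return a * b; }

/* Defined here for the sketch; the made-up code only declares it extern. */
int (*pt[])(int, int) = { add, mul };

/* Tail position: nothing is left to do after the call, so the compiler
   may replace call + ret with a single jmp. */
int tc1(int a, int b)
{
    return pt[1](a, b);
}

/* Not tail position: the result is still needed after the call returns,
   so control has to come back here before the function can return. */
int not_tail(int a, int b)
{
    return pt[1](a, b) + 1;
}
```

Compiling both functions with the same flags and comparing the disassembly should show the difference; the exact instructions depend on the compiler and version.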

For completeness, if I want to turn off the TCO then compiling with
-O3 -fomit-frame-pointer -fno-optimize-sibling-calls
results in

Code:

0000000000000000 <tc1>:
0: 50                            push    rax
1: ff 15 00 00 00 00             call    qword ptr [rip]         # 0x7 <tc1+0x7>
7: 59                            pop     rcx
8: c3                            ret

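Pushing rax and popping rcx looks a bit mysterious. Most likely it is just a compact way of keeping the stack 16-byte aligned across the call: the value pushed doesn't matter, and the pop goes to rcx because rax holds the return value.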

So in summary, what surprised me was that clang -O3 doesn't include -fomit-frame-pointer like GCC does.
