15.0-ALPHA4 aarch64 ptrace PT_SETREGS issue

I'm having an issue with ptrace PT_SETREGS on FreeBSD 15.0-ALPHA4 (and ALPHA3) on aarch64.

This seems to be a regression in FreeBSD 15. It worked OK with FreeBSD 14.0 and 14.1, and it works OK with 14.3.

What should happen

The purpose of the code is to force a call to a polling function if the inferior is blocked in a system call.

  1. Call ptrace PT_GETREGS to get the stack pointer
  2. Set the registers for the polling function
     • ELR: the address of the polling function
     • LR: a phony address (0 - the polling function will return via longjmp())
     • SP: based on what we got with PT_GETREGS
     • X0: argument to the polling function, a check value of 0x8BADF00D
  3. Call ptrace PT_CONTINUE

What is happening

All of the above works OK apart from X0. Instead of seeing 0x8BADF00D, the value is 4, which triggers an assert.

How to reproduce

On FreeBSD 15.0-ALPHA4 arm64. You'll need gmake, gdb and autotools installed.

  1. git clone https://sourceware.org/git/valgrind.git
  2. cd valgrind
  3. ./autogen.sh
  4. ./configure
  5. gmake
  6. make sure that you have two terminals
  7. In terminal 1, in the valgrind directory, run
     ./vg-in-place --tool=none --vgdb-error=0 sleep 10000
  8. In terminal 2, run
     gdb
     then at the gdb prompt (replace {path} with the path containing the valgrind directory)
     target remote | {path}/valgrind/coregrind/vgdb
     then
     continue
     then
     ctrl-c

You should get the following (possibly with slightly different line numbers):

valgrind: m_gdbserver/m_gdbserver.c:883 (void vgPlain_invoke_gdbserver(int, int)): Assertion 'check == 0x8BADF00D' failed.

host stacktrace:
==2874== at 0x380A902C: show_sched_status_wrk (m_libcassert.c:426)
==2874== by 0x380A9353: report_and_quit (m_libcassert.c:497)
==2874== by 0x380A932F: vgPlain_assert_fail (m_libcassert.c:564)
==2874== by 0x38180CB3: vgPlain_invoke_gdbserver (m_gdbserver.c:883)
==2874== by 0xFFFFFFFFFFFFFFFF: ???

sched status:
running_tid=0

Thread 1: status = VgTs_WaitSys syscall 240 (lwpid 100170)
==2874== at 0x4D27BF4: _nanosleep (in /lib/libsys.so.7)
==2874== by 0x4010ECF: ??? (in /bin/sleep)
==2874== by 0x49718BB: __libc_start1 (in /lib/libc.so.7)
==2874== by 0x4010C9B: ??? (in /bin/sleep)
client stack range: [0x1FBFFFC000 0x1FC0000FFF] client SP: 0x1FC00009A0
valgrind stack range: [0x100278E000 0x100288DFFF] top usage: 12688 of 1048576

I've already tried quite a few things

  1. Reading back the registers, they are identical.
  2. Reading the instructions from the address put in ELR they match what I see in the binary with objdump.
  3. Adding a second argument works OK. The value that I put in X1 appears as the second argument.
  4. Tried using clang 19 on FreeBSD 14.3 (19 is the default compiler on FreeBSD 15), and it worked OK.

I've looked a bit at the FreeBSD kernel code. I don't really know my way around, but nothing struck me as a possible cause.
 
I have a smallish reproducer now.

C:
#include <setjmp.h>
#include <stdio.h>
#include <assert.h>
#include <unistd.h>

jmp_buf jb;

void get_out_of_jail(int arg)
{
    assert(arg == 0x8BADF00D);
    fprintf(stderr, "in get out of jail!\n");
    longjmp(jb, 1);
}

int main(void)
{
    printf("Arguments for sup: %d %p\n", (int)getpid(), get_out_of_jail);
    if (setjmp(jb) == 0)
    {
        sleep(1000);
    }
    else
    {
        printf("got out of jail!\n");
    }
}

and

C:
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>

int main(int argc, char** argv)
{
    if (argc == 3)
    {
        unsigned long addr;
        pid_t pid;
        struct reg regs;
        unsigned long bad_return = 0;
        unsigned long sp;
        const int regsize = 8;

        sscanf(argv[1], "%d", &pid);
        sscanf(argv[2], "%p", (void**)&addr);

        if (ptrace(PT_ATTACH, pid, 0, 0) != 0)
        {
            perror("attach failed:");
        }

        int status = 0;
        pid_t p = waitpid(pid, &status, 0);
        if (p != pid)
        {
            perror("waitpid failed:");
        }

        if (ptrace(PT_GETREGS, pid, (caddr_t)&regs, 0) != 0)
        {
            perror("getregs failed:");
        }

#if defined(__amd64__)
        sp = regs.r_rsp;
        sp &= ~0xf;
        sp = sp - regsize;

        regs.r_rbp = sp;
        regs.r_rsp = sp;
        regs.r_rip = (__int64_t)addr;
        regs.r_rdi = 0x8BADF00D;

        int buf[2];
        memcpy(buf, &bad_return, sizeof(bad_return));
        if (ptrace(PT_WRITE_D, pid, (caddr_t)sp, buf[0]) != 0)
        {
            perror("write d 0 failed:");
        }

        if (ptrace(PT_WRITE_D, pid, (caddr_t)(sp+sizeof(buf[0])), buf[1]) != 0)
        {
            perror("write d 1 failed:");
        }
#elif defined(__aarch64__)
        sp = regs.sp;
        sp &= ~0xf;
        regs.x[0] = 0x8BADF00D;
        regs.sp = sp;
        regs.elr = (__int64_t)addr;
        regs.lr = bad_return;
#endif

        if (ptrace(PT_SETREGS, pid, (caddr_t)&regs, 0) != 0)
        {
            perror("setregs failed:");
        }

        if (ptrace(PT_DETACH, pid, (caddr_t)1, 0) != 0)
        {
            perror("detach failed:");
        }
    }
}

Compile the first as 'inf' (for inferior) and the second as 'sup'. Run 'inf' in one terminal, then run 'sup' in a second terminal with the two arguments printed by 'inf'.

The output of 'inf' on 14.3 aarch64 (and 14.2 amd64) should be something like

Arguments for sup: 2857 0x2018e0
in get out of jail!
got out of jail!

On 15.0-ALPHA4 I get
Arguments for sup: 9091 0x210780
Assertion failed: (arg == 0x8BADF00D), function get_out_of_jail, file inf.c, line 10.

Opening the core file with gdb shows that x0 is 0.
 
Ouch. That sure sounds like a regression in the bowels of `ptrace`, maybe causing the x0 register to be overwritten by a return value somewhere on the way back from the ptrace call? Good luck.
 
I really don't know what is happening. From what I've seen in the kernel, ptrace PT_SETREGS ultimately calls set_regs, which copies the registers from struct reg to the trapframe of the thread. I can't see where that could be going wrong.
 
I don't have a way of testing it right now.

In the tracer program (the one that calls ptrace) you could "trace" it yourself - after PT_SETREGS, call PT_GETREGS and verify.
Or do a "raw" test - set the pc of the debuggee to 0xcafec0de and check if it segfaults on that address. That way you'll know it sets the registers and the problem is elsewhere.
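
A rough sketch of the read-back check suggested above, for what it's worth. The helper names (report_reg_mismatches, setregs_and_verify) are made up for illustration, and the ptrace part only builds on FreeBSD/aarch64, where struct reg has the x[] array used in the reproducer. The comparison helper itself is portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#if defined(__FreeBSD__) && defined(__aarch64__)
#include <sys/types.h>
#include <sys/ptrace.h>
#include <machine/reg.h>
#endif

/*
 * Compare the register values we asked for against the values read back.
 * Prints one line per mismatch and returns the number of mismatches.
 */
static size_t report_reg_mismatches(const uint64_t *want, const uint64_t *got,
                                    size_t count)
{
    size_t bad = 0;

    assert(want != NULL && got != NULL);
    for (size_t i = 0; i < count; i++) {
        if (want[i] != got[i]) {
            fprintf(stderr, "reg %zu: wrote 0x%llx, read back 0x%llx\n", i,
                    (unsigned long long)want[i], (unsigned long long)got[i]);
            bad++;
        }
    }
    return bad;
}

#if defined(__FreeBSD__) && defined(__aarch64__)
/*
 * Hypothetical wrapper: write "want" with PT_SETREGS, read the registers
 * back with PT_GETREGS and compare the general-purpose registers in x[].
 * Returns -1 if a ptrace call fails, otherwise the number of mismatches.
 */
static int setregs_and_verify(pid_t pid, struct reg *want)
{
    struct reg got;

    if (ptrace(PT_SETREGS, pid, (caddr_t)want, 0) != 0)
        return -1;
    if (ptrace(PT_GETREGS, pid, (caddr_t)&got, 0) != 0)
        return -1;
    return (int)report_reg_mismatches(want->x, got.x,
                                      sizeof(want->x) / sizeof(want->x[0]));
}
#endif
```

Note that this only shows what the kernel reports back while the process is stopped. The original post says the read-back values were identical, so a clean result here does not rule out x0 being changed later, when the thread resumes.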

edit:
Did some quick checks of the code you shared. On 15 I see that the debug.ptrace_attach_transparent sysctl seems to affect your program. Disabling it makes it work.
I'm assuming that with the sysctl enabled it changes the stack and/or the state the process is in when you detach from it. It would be worth checking out further.

As I was writing this I realised that of course it sets the pc as you wanted - it hits the assert() as expected. :)
 
On 16-current/amd64 I get the expected result, not the error.
 
That’s interesting. I’ve had issues with the transparent attach before. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=287050

And I was having the same problem then but I forgot about it since most of the time I work with RELEASE versions. Now I’m back on the case as I’d like there to be no regressions in Valgrind by the time 15.0-RELEASE is out. Unfortunately the sysctl needs privileges to change the value so I can't just modify it in main().
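
Changing the sysctl needs privileges, but a test program could at least read it and warn. Here is a minimal sketch, assuming the sysctl can be read by an unprivileged user. I'm not sure of the value's type, so the code reads the raw bytes and treats any non-zero byte as "enabled". The function names are hypothetical, and on anything other than FreeBSD the check just returns -1.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Return 1 if any of the n bytes is non-zero, 0 otherwise. */
static int any_nonzero(const unsigned char *p, size_t n)
{
    assert(p != NULL);
    for (size_t i = 0; i < n; i++) {
        if (p[i] != 0)
            return 1;
    }
    return 0;
}

/*
 * Returns 1 if debug.ptrace_attach_transparent looks enabled, 0 if it
 * looks disabled, and -1 if it cannot be read (or this is not FreeBSD).
 */
static int transparent_attach_enabled(void)
{
#if defined(__FreeBSD__)
    unsigned char buf[sizeof(long)] = {0};
    size_t len = sizeof buf;   /* in: buffer size, out: bytes returned */

    if (sysctlbyname("debug.ptrace_attach_transparent", buf, &len, NULL, 0) != 0)
        return -1;
    return any_nonzero(buf, len);
#else
    return -1;
#endif
}
```

A tracer could call transparent_attach_enabled() before PT_ATTACH and print a warning when it returns 1, rather than failing later with a confusing assert.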

I’ve opened a new bugzilla item https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=290008
 
I did modify the debugger code to spray the regs with 0xcc:
Code:
        memset(&regs, 0xcc, sizeof regs);
and let the client crash. The restored context was mixed - only a few regs still held the 0xcc contents. Given the info from D50556, it makes sense. I didn't dig through the kernel code.
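
To put a number on "only a few regs", something like the following could be run over the register block recovered after the crash (for example from PT_GETREGS in the tracer). This is just a sketch: spray_regs and count_surviving are made-up helper names, and it treats the block as a flat run of 64-bit slots, which is an assumption about the layout of struct reg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPRAY_BYTE 0xcc

/* Fill a register block with the marker byte, as in the memset above. */
static void spray_regs(void *regs, size_t size)
{
    assert(regs != NULL);
    memset(regs, SPRAY_BYTE, size);
}

/* Count how many whole 64-bit slots still hold the 0xcc... marker. */
static size_t count_surviving(const void *regs, size_t size)
{
    uint64_t marker;
    uint64_t value;
    size_t survivors = 0;

    assert(regs != NULL);
    memset(&marker, SPRAY_BYTE, sizeof marker);
    for (size_t off = 0; off + sizeof value <= size; off += sizeof value) {
        /* memcpy avoids alignment and aliasing problems */
        memcpy(&value, (const unsigned char *)regs + off, sizeof value);
        if (value == marker)
            survivors++;
    }
    return survivors;
}
```

Printing the index of each slot that lost the marker would show which registers were overwritten on the way back to the client.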
 