Trigger an interrupt when the value of a memory location is modified in FreeBSD

I consider writing to an array without being able to find out where it is done a bug. Refactor early. Refactor often. So you don't end up in spaghetti code hell.
I am purely guessing but I do suspect that it might not even be due to awkward code but instead some sort of memory error that is overflowing elsewhere "nearby" and writing all over the OP's dynamic array.

AddressSanitizer and Valgrind can only reliably detect access to canaries / bounds padding, whereas if the code is really broken, it could jump right past them!
 
When you talk about a HW watchpoint, do you mean that I need gdb to set it, but at the same time it will not cost overall performance?
Not necessarily. There are interfaces to the special registers of modern CPUs, such as performance counters. You may be able to use them directly. gdb may use them for you; just check whether performance drops at all, or drops unacceptably.
 
To reference a long-running inside joke: what game is this about?
I'm most likely not around for long enough to get this - please enlighten me.

In any case, the intention was to simply warn about this (while also aligning with the forum guidelines).
Given that this topic is seemingly unrelated to "a problem with FreeBSD", I don't see an issue with this. However, it might still be worth pointing out, as the OP appears to run a production server on an OS that has been EOL for almost three years.
 
I'm most likely not around for long enough to get this - please enlighten me.
Ah, I didn't link the two together before Crivens mentioned it.

Is this the specific version of FreeBSD required? Did the OP just turn into a suspect ;)

(I especially love that sticky. Asking for help with the illicit software in the very post that warns against doing so is... art; as is the self-referential link to the no-METIN sticky.)
 
When you talk about a HW watchpoint, do you mean that I need gdb to set it, but at the same time it will not cost overall performance?
Please check out the link I mentioned above about gdb's internals. When you use gdb's watch command, gdb will try to use a HW-assisted watchpoint for the condition. If it can be set, you won't have any performance degradation, as the HW (CPU) does the checking. If not, a software watchpoint is used. For watch that means single-stepping (instruction by instruction) and checking after each step whether the condition is met. That is very slow.

Now, you said you can't use gdb in production but you can modify the code. The thing is, even in the code you'd be doing what gdb does, and most likely it won't be as good.

And to reiterate, gdb will stop the application once the breakpoint is hit. That's why I mentioned printf debugging, just to see what is modifying the data and with what.
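To illustrate the printf-debugging idea (and why it is weaker than a watchpoint), here is a minimal sketch of a hand-rolled "software watchpoint": it remembers the last value of one element and reports the file and line of the first checkpoint that sees a different value. `watch_arm()`, `watch_check()` and `WATCH_CHECK()` are hypothetical names. The check only runs where you place it, so it narrows down between which checkpoints the write happened rather than catching the exact instruction, as a HW watchpoint would.

```c
#include <stdio.h>

/* Hypothetical poor-man's software watchpoint: remembers the last value
 * seen at *watch_ptr and reports where a change was first noticed. */
static const int *watch_ptr;
static int watch_last;
static unsigned long watch_changes;

static void watch_arm(const int *p)
{
        watch_ptr = p;
        watch_last = *p;
}

/* Returns 1 if the value changed since the previous check, 0 otherwise. */
static int watch_check(const char *file, int line)
{
        if (watch_ptr == NULL || *watch_ptr == watch_last)
                return 0;
        fprintf(stderr, "%s:%d: watched value changed %d -> %d\n",
            file, line, watch_last, *watch_ptr);
        watch_last = *watch_ptr;
        watch_changes++;
        return 1;
}

/* Drop this at suspicious points; it records the caller's location. */
#define WATCH_CHECK() watch_check(__FILE__, __LINE__)
```

Calling `watch_arm(&array[0])` once and `WATCH_CHECK()` before and after each suspect call prints the first checkpoint that sees the new value.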
 
But to set a h/w watchpoint, I first need to run the application under gdb... correct? We need some mechanism so that we don't lose the performance of the box/application in the production environment.
Did you read the manual page I linked?
 
Possibly mprotect(2) the page (mmap(2) that dynamically allocated array). Handle the signal, and check that memory location on each access?
So far mprotect() looks like the most effective way to debug my issue.
With this I am able to trace (using a signal handler) most of the read/write operations on the memory location I am interested in.

Whenever someone tries to write data to that memory location, a signal is generated. As a result, the process is now flooded with signals.
Is there any way to generate a signal only when someone tries to write the value 0 to that memory location?
Or, when the signal is generated, is there any way to know (from the signal handler) which value the write that triggered it was trying to store?

Thanks.
 
Glad to hear you successfully have this (potentially fiddly to implement) debugging code in place.

With mprotect I am assuming that you only allow read access (PROT_READ), so the signal only gets triggered on a write.
So from your signal handler, can you simply read the value (for now just make the dynamic array pointer a global and use something like extern to access it from the signal handler) and ignore if 0 or abort() if not (and then grab the coredump for a stacktrace).
 
Glad to hear you successfully have this (potentially fiddly to implement) debugging code in place.

With mprotect I am assuming that you only allow read access (PROT_READ), so the signal only gets triggered on a write.
So from your signal handler, can you simply read the value (for now just make the dynamic array pointer a global and use something like extern to access it from the signal handler) and ignore if 0 or abort() if not (and then grab the coredump for a stacktrace).
Here is a sample program that I tested. Based on the output, the signal (SIGSEGV) is delivered multiple times although I write only once (ptr[0] = 0). I was expecting it to be delivered only once.

mprotect(ptr, size, PROT_READ); // Only read permission is provided
ptr[0] = 0;


Also, the signal handler prints the value 9 although I tried to write 0. Is there any way to know that the SIGSEGV (11) was generated because the application tried to write 0 to the specified memory location?

1 : Got 11 for address location : 0x801016640 : 0x801016640 => ptr[0] = 9


C:
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static size_t size;
static int *ptr;
static volatile sig_atomic_t cnt;

/*
 * Debug-only handler (printf() is not async-signal-safe).
 * When the handler returns, the faulting store is restarted. While the page
 * is still read-only it faults again, which is why the handler runs several
 * times for a single "ptr[0] = 0". The store has not happened yet, so ptr[0]
 * still holds the old value (9) here.
 */
static void handler(int sig_num, siginfo_t *sig, void *unused)
{
        (void)unused;
        cnt++;
        printf("%d : Got %d for address location : %p : %p => ptr[0] = %d\n",
            (int)cnt, sig_num, sig->si_addr, (void *)&ptr[0], ptr[0]);
        if (5 == cnt)  // when count is 5, provide READ/WRITE permission
                mprotect(ptr, size, PROT_READ | PROT_WRITE);
}

int main(void)
{
        struct sigaction s;

        memset(&s, 0, sizeof(s));
        s.sa_flags = SA_SIGINFO;
        sigemptyset(&s.sa_mask);
        s.sa_sigaction = handler;
        if (sigaction(SIGSEGV, &s, NULL) == -1) {
                perror("sigaction");
                return 1;
        }

        /* mprotect() works on whole pages: use a page-aligned mmap()
         * allocation instead of malloc(). */
        size = (size_t)sysconf(_SC_PAGESIZE) * sizeof(int);
        ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
        if (ptr == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        printf("Starting ....%p : size = %zu\n", (void *)ptr, size);
        sleep(5);
        ptr[0] = 9;

        mprotect(ptr, size, PROT_READ); // Only read permission is provided
        ptr[0] = 0;
        printf("All completed...\n");
        munmap(ptr, size);
        return 0;
}

Output :
==============================

[aadhya@dut078-client02 ~]$ ./mprotect
Starting ....0x801016640 : size = 16384
1 : Got 11 for address location : 0x801016640 : 0x801016640 => ptr[0] = 9
2 : Got 11 for address location : 0x801016640 : 0x801016640 => ptr[0] = 9
3 : Got 11 for address location : 0x801016640 : 0x801016640 => ptr[0] = 9
4 : Got 11 for address location : 0x801016640 : 0x801016640 => ptr[0] = 9
5 : Got 11 for address location : 0x801016640 : 0x801016640 => ptr[0] = 9
All completed...
===============================
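The output is consistent with a write fault being raised before the store completes: the handler still sees 9, and returning from it restarts the same store, which faults again until the 5th call makes the page writable. One way to see the value actually written is sketched below. This is a swapped-in single-stepping technique, not something proposed earlier in the thread, and it assumes x86-64 FreeBSD (`mc_rip`/`mc_rflags`) or Linux/glibc (`gregs[REG_RIP]`/`gregs[REG_EFL]`) and a single-threaded program. On the write fault the handler makes the page writable and sets the CPU trap flag in the saved context, so the kernel raises SIGTRAP right after that one store. The SIGTRAP handler then reads the new value, records the writing instruction if the value is 0, and re-protects the page. All names (`watch_init`, `on_segv`, `on_trap`, `zero_writes`, `last_pc`) are hypothetical, and `mprotect()` is not on POSIX's async-signal-safe list.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                /* glibc: expose REG_RIP / REG_EFL */
#endif
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <ucontext.h>
#include <unistd.h>

/* Saved PC and flags in the signal context; x86-64 only (assumption). */
#if defined(__FreeBSD__) && defined(__x86_64__)
#define CTX_PC(uc)    ((uc)->uc_mcontext.mc_rip)
#define CTX_FLAGS(uc) ((uc)->uc_mcontext.mc_rflags)
#elif defined(__linux__) && defined(__x86_64__)
#ifndef REG_RIP                    /* glibc indices from <sys/ucontext.h> */
#define REG_RIP 16
#define REG_EFL 17
#endif
#define CTX_PC(uc)    ((uc)->uc_mcontext.gregs[REG_RIP])
#define CTX_FLAGS(uc) ((uc)->uc_mcontext.gregs[REG_EFL])
#else
#error "this sketch only covers x86-64 FreeBSD and Linux"
#endif

#define WATCH_EFLAGS_TF 0x100      /* EFLAGS.TF: trap after the next instruction */

static int *watched;               /* the int being watched (start of a page) */
static size_t page_len;
static volatile uintptr_t fault_addr, fault_pc, last_pc;
static volatile sig_atomic_t zero_writes;   /* stores that left 0 behind */

/* Write fault on the read-only page: let this one store through and
 * ask the CPU to trap right after it. */
static void on_segv(int sig, siginfo_t *si, void *ctx)
{
        ucontext_t *uc = ctx;
        uintptr_t a = (uintptr_t)si->si_addr, base = (uintptr_t)watched;

        (void)sig;
        if (a < base || a >= base + page_len) {
                signal(SIGSEGV, SIG_DFL);  /* not our page: crash normally */
                return;
        }
        fault_addr = a;
        fault_pc = (uintptr_t)CTX_PC(uc);
        mprotect(watched, page_len, PROT_READ | PROT_WRITE);
        CTX_FLAGS(uc) |= WATCH_EFLAGS_TF;
}

/* Single-step trap: the store has now happened, so the new value is visible. */
static void on_trap(int sig, siginfo_t *si, void *ctx)
{
        ucontext_t *uc = ctx;

        (void)sig;
        (void)si;
        CTX_FLAGS(uc) &= ~WATCH_EFLAGS_TF;
        if (fault_addr >= (uintptr_t)watched &&
            fault_addr < (uintptr_t)watched + sizeof(int) && *watched == 0) {
                zero_writes++;
                last_pc = fault_pc;        /* instruction that stored the 0 */
        }
        mprotect(watched, page_len, PROT_READ);
}

/* Map one page, install both handlers and start watching. 0 on success. */
static int watch_init(void)
{
        struct sigaction sa;

        page_len = (size_t)sysconf(_SC_PAGESIZE);
        watched = mmap(NULL, page_len, PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANON, -1, 0);
        if (watched == MAP_FAILED)
                return -1;
        memset(&sa, 0, sizeof(sa));
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sa.sa_sigaction = on_segv;
        if (sigaction(SIGSEGV, &sa, NULL) == -1)
                return -1;
        sa.sa_sigaction = on_trap;
        if (sigaction(SIGTRAP, &sa, NULL) == -1)
                return -1;
        return mprotect(watched, page_len, PROT_READ);
}
```

Writes of non-zero values still cost two signals each, but only zero writes are recorded. For a non-PIE binary built with `-g`, `last_pc` could then be resolved to a source line with addr2line(1).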
 
I am purely guessing but I do suspect that it might not even be due to awkward code but instead some sort of memory error that is overflowing elsewhere "nearby" and writing all over the OP's dynamic array.

AddressSanitizer and Valgrind can only reliably detect access to canaries / bounds padding, whereas if the code is really broken, it could jump right past them!

I don't know about asan, but for Valgrind this is configurable at runtime:

--redzone-size=<number> set minimum size of redzones added before/after heap blocks (in bytes). [16]
 
see
If we print a backtrace from the signal handler, it doesn't show the code that caused the signal; instead it gives something like this:

0x400c55 <print_trace+0x1f> at /data/home/user/mprotect
0x400d2d <handler+0x6a> at /data/home/user/mprotect
 
If we print a backtrace from the signal handler, it doesn't show the code that caused the signal; instead it gives something like this:

0x400c55 <print_trace+0x1f> at /data/home/user/mprotect
0x400d2d <handler+0x6a> at /data/home/user/mprotect

That's to be expected. The return address from the signal handler is the "retpoline" (portmanteau word for "return trampoline"). The retpoline is an assembler stub function that just calls the sigreturn syscall. Sigreturn will get the original instruction address from the mcontext that was synthesized when the signal triggered.

Multithreaded code is a bit different: the user signal handler is not called directly. Instead, 'thr_sighandler' gets called. This calls the user signal handler (plus other things such as masking and, under some conditions, locking), and on return from the user routine calls sigreturn itself.

It should be possible to detect the retpoline and work out the return address so that more of the stack can be displayed (I believe that this is what lldb/gdb do).
 
That's to be expected. The return address from the signal handler is the "retpoline" (portmanteau word for "return trampoline"). The retpoline is an assembler stub function that just calls the sigreturn syscall. Sigreturn will get the original instruction address from the mcontext that was synthesized when the signal triggered.

Multithreaded code is a bit different: the user signal handler is not called directly. Instead, 'thr_sighandler' gets called. This calls the user signal handler (plus other things such as masking and, under some conditions, locking), and on return from the user routine calls sigreturn itself.

It should be possible to detect the retpoline and work out the return address so that more of the stack can be displayed (I believe that this is what lldb/gdb do).

As I understand it, whenever a signal is generated for a process, code inside the trampoline page is executed to switch to kernel mode. Then control jumps back to user mode to execute the user-defined signal handler, as shown in the diagram below. Hence, if I print a backtrace within the signal handler, it doesn't show the location where the signal was generated.

So, to find the location of the code that caused the signal, I probably need to hack the trampoline code and print the backtrace there. Is that really possible?

[Diagram: signal.PNG]
 
Andriy, here is the code:

C:
#include <execinfo.h>   /* backtrace(3); link with -lexecinfo on FreeBSD */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <ucontext.h>
#include <unistd.h>

static size_t size;
static int *ptr;
static volatile sig_atomic_t cnt;

static void print_trace(void)
{
        enum { MAX_FRAMES = 1024 };
        void *array[MAX_FRAMES];
        char **strings;
        size_t i, n;

        n = backtrace(array, MAX_FRAMES);
        strings = backtrace_symbols(array, n);
        if (strings == NULL)
                return;
        for (i = 0; i < n; i++)
                printf("%s\n", strings[i]);
        puts("");
        free(strings);
}

static void handler(int sig_num, siginfo_t *sig, void *unused)
{
        /* The faulting PC is in the ucontext passed as the third argument:
         *   FreeBSD/amd64: ((ucontext_t *)unused)->uc_mcontext.mc_rip
         *   Linux/x86-64:  ((ucontext_t *)unused)->uc_mcontext.gregs[REG_RIP]
         */
        (void)unused;
        cnt++;
        printf("%d : Got %d for address location : %p : %p => ptr[0] = %d\n",
            (int)cnt, sig_num, sig->si_addr, (void *)&ptr[0], ptr[0]);
        print_trace();
        if (5 == cnt)
                mprotect(ptr, size, PROT_READ | PROT_WRITE);
}

static void test(void)
{
        printf("-- %s --\n", __func__);
        print_trace();
}

int main(void)
{
        struct sigaction s;

        memset(&s, 0, sizeof(s));
        s.sa_flags = SA_SIGINFO;
        s.sa_sigaction = handler;
        sigemptyset(&s.sa_mask);
        if (sigaction(SIGSEGV, &s, NULL) == -1) {
                perror("sigaction");
                return 1;
        }

        /* Page-aligned allocation so mprotect() covers exactly this array. */
        size = (size_t)sysconf(_SC_PAGESIZE) * sizeof(int);
        ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
        if (ptr == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        printf("Starting ....%p : size = %zu\n", (void *)ptr, size);
        sleep(5);
        ptr[0] = 9;

        mprotect(ptr, size, PROT_READ);

        ptr[0] = 0;

        printf("All completed...\n");
        munmap(ptr, size);
        test();
        return 0;
}
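Rather than unwinding through the trampoline, the address of the faulting instruction can be read directly from the `ucontext_t` that a `SA_SIGINFO` handler receives as its third argument; this is the saved context that sigreturn restores. A minimal sketch, assuming x86-64 FreeBSD (`mc_rip`) or Linux/glibc (`gregs[REG_RIP]`); the page, counters and function names are hypothetical. It only yields the one faulting PC. Frames above it still need a frame-pointer walk or libunwind.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                /* glibc: expose REG_RIP */
#endif
#include <execinfo.h>              /* backtrace_symbols_fd(); -lexecinfo on FreeBSD */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <ucontext.h>
#include <unistd.h>

#if defined(__FreeBSD__) && defined(__x86_64__)
#define CTX_PC(uc) ((uc)->uc_mcontext.mc_rip)
#elif defined(__linux__) && defined(__x86_64__)
#ifndef REG_RIP
#define REG_RIP 16                 /* glibc index from <sys/ucontext.h> */
#endif
#define CTX_PC(uc) ((uc)->uc_mcontext.gregs[REG_RIP])
#else
#error "this sketch only covers x86-64 FreeBSD and Linux"
#endif

static int *page;                  /* hypothetical watched page */
static size_t page_len;
static volatile uintptr_t faulting_pc;
static volatile sig_atomic_t faults;

static void on_segv(int sig, siginfo_t *si, void *ctx)
{
        ucontext_t *uc = ctx;          /* the third SA_SIGINFO argument */
        uintptr_t a = (uintptr_t)si->si_addr, base = (uintptr_t)page;
        void *pc;

        (void)sig;
        if (a < base || a >= base + page_len) {
                signal(SIGSEGV, SIG_DFL);  /* not our page: crash normally */
                return;
        }
        faults++;
        faulting_pc = (uintptr_t)CTX_PC(uc);
        pc = (void *)faulting_pc;
        /* Symbolize the faulting instruction, not the handler's own frames. */
        backtrace_symbols_fd(&pc, 1, STDERR_FILENO);
        /* Make the page writable so the restarted store succeeds. */
        mprotect(page, page_len, PROT_READ | PROT_WRITE);
}

/* Map one page read-only and install the handler. 0 on success. */
static int pc_demo_init(void)
{
        struct sigaction sa;

        page_len = (size_t)sysconf(_SC_PAGESIZE);
        page = mmap(NULL, page_len, PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANON, -1, 0);
        if (page == MAP_FAILED)
                return -1;
        memset(&sa, 0, sizeof(sa));
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sa.sa_sigaction = on_segv;
        if (sigaction(SIGSEGV, &sa, NULL) == -1)
                return -1;
        return mprotect(page, page_len, PROT_READ);
}
```

As with the `-rdynamic` build used in this thread, only exported symbols get names. Static functions show up as raw addresses, which addr2line(1) can map to source lines in a binary built with `-g`.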
 
So, to find the location of the code that caused the signal, I probably need to hack the trampoline code and print the backtrace there. Is that really possible?

As I said, yes, it is possible (here is some gdb output from Linux):

Breakpoint 2, handle_vtalrm (sig=26) at 452274.c:13
13 ticks++;
(gdb) bt
#0 handle_vtalrm (sig=26) at 452274.c:13
#1 <signal handler called>
#2 0x00007ffff7afcfd0 in __write_nocancel () from /lib64/libc.so.6
#3 0x000000000040124e in main (argc=1, argv=0x7fffffffcdf8) at 452274.c:31

Take a look at the lldb or gdb source to see how they do it. I don't know exactly how it works, but my guess is that you need to do two things:

  • detect the retpoline or thr_sigreturn functions
  • dig out the mcontext pointer (it's somewhere on the stack) and from that you can get 'addr'
 
The thing is even in the code you'd do what gdb is doing and most likely it won't be that good.

Take a look at the lldb or gdb source to try to see how they do it.
:) That's what I mentioned earlier. While this is an interesting problem in general, if the focus is to fix the application that is in production, then using gdb is really the way to go. Anything else will cause more issues in production.

Avk Why is it a problem to replicate this in your own environment? E.g., copy the physical box to a VM and try there?

One irrelevant point from my side but I couldn't unsee this :)
if (5 == cnt) mprotect(ptr, size, PROT_READ | PROT_WRITE);
This drove me crazy when I read code from my French colleagues. I thought it was a "French thing". My brain is not able to process code like this; my hemispheres were fighting over it. I always had to manually rewrite it to if (cnt == 5) in my local copy to make the code readable for me.
 
This drove me crazy when I read code from my French colleagues. I thought it was a "French thing". My brain is not able to process code like this; my hemispheres were fighting over it. I always had to manually rewrite it to if (cnt == 5) in my local copy to make the code readable for me.

That's an old-timer's thing, to protect against accidental typos causing errors through assignment in if statements. If you type = instead of ==, then

if (cnt = 5)

is legal and will always be true

if (5 = cnt)

will not compile

Most compilers will warn about this now and ask you to add parens to make it clear that it is deliberate.
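A small illustration of the point, assuming a gcc- or clang-style compiler: the typo version compiles and is always true, while the reversed form would be rejected. `is_five()` and `is_five_typo()` are hypothetical names; the `(5 = cnt)` case is left in a comment because it does not compile.

```c
/* The intended test, written in the reversed style from the thread. */
static int is_five(unsigned long cnt)
{
        return 5 == cnt;
}

/* The typo: '=' instead of '=='. It compiles (the extra parentheses are the
 * "make it clear it is deliberate" form that silences -Wparentheses),
 * assigns 5 to cnt and is therefore always true.
 * Written as "if (5 = cnt)" it would not compile: 5 is not an lvalue. */
static int is_five_typo(unsigned long cnt)
{
        if ((cnt = 5))
                return 1;
        return 0;
}
```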
 
Never thought about it that way. Anyway, it's just terrible, even worse than switching between Intel and AT&T asm syntax (which is an interesting problem too, but I kind of got used to that). Conditions written like the one above make me lose focus and interrupt the flow in my head, if that makes sense to others.
 
I think that it's a problem with backtrace(3) (or rather libunwind) then. It should be able to walk across a signal frame.
See PR 243746.

How do you build the program?
Maybe try using libunwind from devel/libunwind
I used this to build the program:
gcc mprotect.c -rdynamic -fno-omit-frame-pointer -g -std=c99 -fno-inline -lexecinfo -o mprotect1

I also modified the program to include both the dump_trace() and print_trace() functions.

C:
static void dump_trace(void) {
        enum { MAX_FRAMES = 1024 };
        void *buffer[MAX_FRAMES];
        size_t calls = backtrace(buffer, MAX_FRAMES);
        printf("##### %s #####\n", __func__);
        fflush(stdout);   /* keep stdout and stderr output in order */
        fprintf(stderr, "dump_trace - have %zu frames\n", calls);
        backtrace_symbols_fd(buffer, calls, STDERR_FILENO);
        //_Exit(EXIT_FAILURE);
}

With this, I see the same kind of backtrace output:

##### print_trace #####
0x400e05 <print_trace+0x1f> at /data/home/aadhya/mprotect1
0x400fcd <handler+0x6a> at /data/home/aadhya/mprotect1

##### dump_trace #####
dump_trace - have 2 frames
0x400f0c <dump_trace+0x85> at /data/home/aadhya/mprotect1
0x400fd7 <handler+0x74> at /data/home/aadhya/mprotect1


Note that I have not tried that patch (PR 243746) or library (devel/libunwind) yet.
 
Avk Why is it a problem to replicate this in your own environment? E.g., copy the physical box to a VM and try there?
Actually, we tried hard to reproduce this issue in a local environment, but it's sometimes really difficult to replicate the customer environment (with heavy network traffic of different types arriving in different sequences). If we had a local reproduction of this issue, it would have been a lot easier to debug it with gdb.
 