I am using the following HW model with FreeBSD 11.2 running.
FreeBSD 11.2 has been EOL for a while.
I consider writing to an array without being able to find out where it is done a bug. Refactor early. Refactor often. So you don't end up in spaghetti code hell.
I am purely guessing, but I suspect that it might not even be due to awkward code but instead some sort of memory error that is overflowing elsewhere "nearby" and writing all over the OP's dynamic array.
When you talk about a HW watchpoint, do you mean that I need gdb to set it, but that at the same time it will not lose overall performance?
Not necessarily. There are interfaces to the special registers of modern CPUs, such as performance counters. You may be able to use them directly. Gdb may use them for you; just check whether the performance drops at all, or unacceptably.
FreeBSD 11.2 has been EOL for a while.
To reference a long-running insider joke: what game is this about?
I'm most likely not around for long enough to get this - please enlighten me.
Ah, I didn't link the two together before Crivens mentioned it.
Please check out the link I mentioned above about gdb's internals. When you use gdb's watch command, gdb tries to use a hardware-assisted watchpoint for the condition. If one can be set, you won't see any performance degradation because the CPU does the checking. If not, a software watchpoint is used. For watch that means single-stepping (instruction by instruction) and verifying each time whether the condition is met. That is very slow.
But to set a h/w watchpoint, I first need to run the application under gdb, correct? We need some mechanism that doesn't cost us performance of the box/application in the production environment.
Did you read the manual page I linked?
Possibly mprotect(2) the page (mmap(2) that dynamically allocated array). Handle the signal, and check that memory location on each access?
So far mprotect() looks like the most effective way to debug my issue.
Glad to hear you successfully have this (potentially fiddly to implement) debugging code in place.
With the mprotect I am assuming that you only allow read access (PROT_READ). The signal only gets triggered during a write.
So from your signal handler, can you simply read the value (for now just make the dynamic array pointer a global and use something like extern to access it from the signal handler) and ignore it if 0, or abort() if not (and then grab the core dump for a stack trace)?
Here is a sample program that I tested. Based on the output, the signal is raised multiple times although I write only once (ptr[0] = 0). I was expecting it to be raised only once.
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

static size_t size;       /* length of the protected region in bytes */
int *ptr;                 /* the array being watched */
static unsigned long cnt; /* number of signals handled so far */

/* SIGSEGV handler: report each fault and lift the protection on the fifth.
 * printf() is not async-signal-safe; tolerable only in a throwaway test. */
void handler(int sig_num, siginfo_t *sig, void *unused) {
    (void)unused;
    cnt++;
    printf("%lu : Got %d for address location : %p : %p => ptr[0] = %d\n",
           cnt, sig_num, sig->si_addr, (void *)&ptr[0], ptr[0]);
    if (5 == cnt) // when count is 5, provide READ/WRITE permission
        mprotect(ptr, size, PROT_READ | PROT_WRITE);
}

int main()
{
    struct sigaction s;

    memset(&s, 0, sizeof(s));
    s.sa_flags = SA_SIGINFO;
    sigemptyset(&s.sa_mask);
    s.sa_sigaction = handler;
    if (sigaction(SIGSEGV, &s, NULL) == -1)
    {
        perror("sigaction");
        return(1);
    }

    size = (size_t)sysconf(_SC_PAGE_SIZE) * sizeof(int);
    /* mprotect(2) needs a page-aligned address, which malloc() does not
     * guarantee; mmap(2) or posix_memalign(3) would be the safer choice. */
    ptr = (int*)malloc(size);
    printf("Starting ....%p : size = %zu\n", (void *)ptr, size);
    sleep(5);

    ptr[0] = 9;
    mprotect(ptr, size, PROT_READ); // Only read permission is provided
    ptr[0] = 0;                     // this store triggers the signal
    printf("All completed...\n");
    return 0;
}
I am purely guessing but I do suspect that it might not even be due to awkward code but instead some sort of memory error that is overflowing elsewhere "nearby" and writing all over the OP's dynamic array.
AddressSanitizer and Valgrind can only reliably detect access to canaries / bounds padding, whereas if the code is really broken, it could jump right past them!
See "How to write a signal handler to catch SIGSEGV?" on stackoverflow.com.
If we print a backtrace from the signal handler, it doesn't show the code that caused the signal; instead it gives something like this:
0x400c55 <print_trace+0x1f> at /data/home/user/mprotect
0x400d2d <handler+0x6a> at /data/home/user/mprotect
That's to be expected. The return address from the signal handler is the signal trampoline, a "return trampoline" (the word "retpoline" usually names the unrelated Spectre mitigation). The trampoline is an assembler stub function that just calls the sigreturn syscall. sigreturn(2) gets the original instruction address from the mcontext that was synthesized when the signal triggered.
Multithreaded code is a bit different: the user signal handler is not called directly. Instead 'thr_sighandler' gets called. This calls the user signal handler (plus other stuff like masking and, under some conditions, locking), and on return from the user routine calls sigreturn itself.
It should be possible to detect the trampoline and work out the return address so that more of the stack can be displayed (I believe that this is what lldb/gdb do).
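The original instruction address mentioned above is also visible to the handler itself, through the ucontext passed as its third argument, so it can be read without unwinding across the trampoline. A minimal sketch, assuming FreeBSD/amd64 (the `mc_rip` field) with fallbacks for Linux; `CONTEXT_PC`, `record_pc` and `capture_fault_pc` are hypothetical names, and other platforms simply report 0.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc only: exposes REG_RIP; harmless elsewhere */
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <ucontext.h>
#include <unistd.h>

/* The field holding the interrupted program counter is platform-specific. */
#if defined(__FreeBSD__) && defined(__x86_64__)
#define HAVE_CONTEXT_PC 1
#define CONTEXT_PC(uc) ((uintptr_t)(uc)->uc_mcontext.mc_rip)
#elif defined(__linux__) && defined(__x86_64__) && defined(REG_RIP)
#define HAVE_CONTEXT_PC 1
#define CONTEXT_PC(uc) ((uintptr_t)(uc)->uc_mcontext.gregs[REG_RIP])
#elif defined(__linux__) && defined(__aarch64__)
#define HAVE_CONTEXT_PC 1
#define CONTEXT_PC(uc) ((uintptr_t)(uc)->uc_mcontext.pc)
#else
#define HAVE_CONTEXT_PC 0
#define CONTEXT_PC(uc) ((void)(uc), (uintptr_t)0)
#endif

static void *page;                   /* read-only page used to provoke a fault */
static size_t page_len;
static volatile uintptr_t fault_pc;  /* address of the instruction that faulted */
static volatile sig_atomic_t faults; /* number of faults recorded */

/* Save the faulting PC from the signal context, then unprotect the page
 * so that the restarted store can complete. */
static void record_pc(int signo, siginfo_t *info, void *ctx)
{
    const ucontext_t *uc = ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)page;

    if (addr < base || addr >= base + page_len) {
        signal(signo, SIG_DFL); /* unrelated crash: let it take its course */
        return;
    }
    faults++;
    fault_pc = CONTEXT_PC(uc);
    if (mprotect(page, page_len, PROT_READ | PROT_WRITE) == -1)
        _exit(2);
}

/* Returns the number of faults recorded for one store, or -1 on setup failure. */
int capture_fault_pc(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = record_pc;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) == -1 || sigaction(SIGBUS, &sa, NULL) == -1)
        return -1;

    page_len = (size_t)sysconf(_SC_PAGESIZE);
    page = mmap(NULL, page_len, PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    faults = 0;
    fault_pc = 0;
    *(volatile int *)page = 42;           /* this store is what fault_pc points at */
    assert(*(volatile int *)page == 42);  /* the store completed after the fault */
    return (int)faults;
}
```

Outside the handler, the saved `fault_pc` can be printed and resolved to a function and line with a debugger or symbolizer. This only identifies the faulting instruction; walking further up the stack from there is the part that gdb and lldb handle.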
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <execinfo.h>
#include <sys/mman.h>
#include <ucontext.h>

static size_t size;       /* length of the protected region in bytes */
int *ptr;                 /* the array being watched */
static unsigned long cnt; /* number of signals handled so far */

/* Print the current call stack using backtrace(3). */
void print_trace(void) {
    enum Constexpr { MAX_SIZE = 1024 };
    void *array[MAX_SIZE];
    char **strings;
    size_t i, n;

    n = backtrace(array, MAX_SIZE);
    strings = backtrace_symbols(array, n);
    if (strings == NULL)
        return;
    for (i = 0; i < n; i++)
        printf("%s\n", strings[i]);
    puts("");
    free(strings);
}

/* SIGSEGV handler: report the fault, dump a backtrace, and lift the
 * protection on the fifth call. printf() and backtrace_symbols() are not
 * async-signal-safe; tolerable only in a throwaway test. */
void handler(int sig_num, siginfo_t *sig, void *unused) {
    //ucontext_t *u = (ucontext_t *)unused;
    //unsigned char *pc = (unsigned char *)u->uc_mcontext.mc_rip; // FreeBSD/amd64; Linux spells it gregs[REG_RIP]
    (void)unused;
    cnt++;
    printf("%lu : Got %d for address location : %p : %p => ptr[0] = %d\n",
           cnt, sig_num, sig->si_addr, (void *)&ptr[0], ptr[0]);
    print_trace();
    if (5 == cnt)
        mprotect(ptr, size, PROT_READ | PROT_WRITE);
}

void test()
{
    printf("-- %s --\n", __func__);
    print_trace();
}

int main()
{
    struct sigaction s;

    memset(&s, 0, sizeof(s));
    s.sa_flags = SA_SIGINFO;
    s.sa_sigaction = handler;
    sigemptyset(&s.sa_mask);
    if (sigaction(SIGSEGV, &s, NULL) == -1)
    {
        perror("sigaction");
        return(1);
    }

    size = (size_t)sysconf(_SC_PAGE_SIZE) * sizeof(int);
    /* mprotect(2) needs a page-aligned address, which malloc() does not
     * guarantee; mmap(2) or posix_memalign(3) would be the safer choice. */
    ptr = (int*)malloc(size);
    printf("Starting ....%p : size = %zu\n", (void *)ptr, size);
    sleep(5);

    ptr[0] = 9;
    mprotect(ptr, size, PROT_READ);
    ptr[0] = 0; // this store triggers the signal
    printf("All completed...\n");
    test();
    return 0;
}
Now, to find the location of the code that caused the signal, I probably need to hack the trampoline code and print the backtrace there. Is that really possible?
Take a look at the lldb or gdb source to try to see how they do it.
The thing is, even in your own code you'd be doing what gdb is already doing, and most likely not as well.
That's what I mentioned earlier. While this does make an interesting issue in general, if the focus is to fix the application that is in prod then using gdb is really the way to go. Anything else will cause more issues in prod.
if (5 == cnt) mprotect(ptr, size, PROT_READ | PROT_WRITE);
This drove me crazy when I read code from my French colleagues. I thought it was a "French thing". My brain is not able to process code like this; my hemispheres were fighting over it. I always had to manually change it to
if(cnt==5)
in my local copy to make the code readable for me.
I think that it's a problem with backtrace(3) (or rather libunwind) then. It should be able to walk across a signal frame.
See PR 243746.
How do you build the program?
Maybe try using libunwind from devel/libunwind.
I used this to build the program:
static void dump_trace() {
    size_t max_frames = 1024;
    void *buffer[max_frames];
    size_t calls = backtrace(buffer, max_frames);

    printf("##### %s #####\n", __FUNCTION__);
    fprintf(stderr, "dump_trace - have %zu frames\n", calls);
    backtrace_symbols_fd(buffer, calls, 2); /* 2 = stderr */
    //_Exit(EXIT_FAILURE);
}
Avk, why is it a problem to replicate this in your own environment? E.g. copy the physical box to a VM and try there?
Actually, we tried hard to reproduce this issue in our local environment, but it is sometimes really difficult to replicate a customer environment (with huge volumes of network traffic of different types and in different sequences). If there were a local repro of this issue, it would have been a lot easier to debug it using gdb.