Trigger an interrupt when the value of a memory location is modified in FreeBSD

Avk · Aug 5, 2022

Is it possible to generate an interrupt when the value of a variable or memory location gets modified in a FreeBSD or Linux environment, using a C program?

In a C application there is a dynamically allocated array which is used and modified from multiple locations. The application is pretty large and complex, so it is difficult to trace all the places the array is used or modified from. The problem is that in some condition/flow the array[2] element becomes 0, which is not expected in this application. I can't run the application using gdb to debug this issue (because of some constraint). The only way to debug this issue is to modify the source code and run the binary where the issue is happening.

Is it possible to generate an interrupt when the array[2] element is modified, and print the backtrace to know which part of the codebase modified it?

Thanks!!!

zirias@ · Aug 5, 2022

Avk said:
I can't run the application using gdb to debug this issue (because of some constraint).

I guess it would make sense to elaborate on that first. Not that I don't believe you, but it's possible you overlooked something. I mean, that's exactly what debuggers are designed for.

jbo@ · Aug 5, 2022

This thread seems more suitable for the "Userland programming & scripting" category.

VladiBG · Aug 5, 2022

GDB - Conditional Breakpoints — Debugging documentation

Andriy · Aug 5, 2022

I think that this would be a better suggestion given the problem: https://undo.io/resources/gdb-watchpoint/watchpoints-more-than-watch-and-continue/
And, of course, https://sourceware.org/gdb/onlinedocs/gdb/Set-Watchpoints.html

reddy · Aug 5, 2022

From what I understand, the problem of the author is that because of a number of business constraints, he is not in a position to run an interactive debugging session in the environment where the problem occurs. His only option is therefore to try to log the issue, thus his question. Apparently they are not trying to debug something at development time on their local machine, they are trying to chase down a production issue. It is very common not to be able to attach a debugger in production, personally even if I could I probably wouldn't.

Since this is not a crash that would leave a helpful stack trace with debugging symbols, they are asking for a way to log the changes made to a variable, potentially using interrupts. The situation is made difficult by the fact that the variable is modified by many parts of their complex program.

In terms of solution, my 2 cents is that unless people familiar with C programming can propose an interrupt-based trick, I'd say the best approach may be to wrap the array in a method doing the logging you need before reading or changing the value. Even if you do not use an IDE that would make it convenient to find the places using the variable, just wrap the variable in a method: this will break the build, and the compiler errors will show you all the places using the variable so that you can update the data access code. Encapsulating access to shared state is a best practice anyway.

Edit: since C does not have classes or namespaces, just rename the variable to break the build, and moving forward ensure that all data access is done through your method.
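
A minimal sketch of the wrapper idea above, assuming the array holds pointers and that only a stored 0 (NULL) is suspicious. The names queue_slots, slot_set, slot_get, SLOT_SET and null_store_count are hypothetical, and a fixed-size static array stands in for the real dynamically allocated one. The macro passes __FILE__ and __LINE__ so that each log line names the call site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SLOT_COUNT 8                    /* hypothetical size */

/* Renamed storage: any leftover direct use of the old name fails to compile. */
static void *queue_slots[SLOT_COUNT];
static unsigned long null_store_count;  /* number of NULL stores seen so far */

/* Store a value; log the call site whenever NULL (0) is written. */
static void slot_set(size_t idx, void *value, const char *file, int line)
{
    assert(idx < SLOT_COUNT);
    if (value == NULL) {
        null_store_count++;
        fprintf(stderr, "%s:%d: slot[%zu] set to NULL\n", file, line, idx);
    }
    queue_slots[idx] = value;
}

/* Read a value through the same choke point. */
static void *slot_get(size_t idx)
{
    assert(idx < SLOT_COUNT);
    return queue_slots[idx];
}

/* Callers use the macro so the call site is captured automatically. */
#define SLOT_SET(idx, value) slot_set((idx), (value), __FILE__, __LINE__)
```

A call such as SLOT_SET(2, ptr) replaces array[2] = ptr. Writes made through a raw pointer to an element bypass the wrapper, so this only covers accesses that go through the renamed variable.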

ralphbsz · Aug 5, 2022

It can be done, and I've worked in a group that had such a tool. It is difficult. In a nutshell, you end up implementing your own debugger.

Here is what we did: Modify the OS kernel (which we had control of) to add a special debugging hook. That hook is given the address range of the variables you want to protect against modification. The kernel then takes the VM page(s) containing that address range, and write-protects them. Anytime someone tries to modify anything in those pages, the page fault handler will start. The page fault handler has been modified to identify the pages that are being "watched": It looks at the address the page fault happened at, and checks whether it is the variable(s) being watched. If no, it manually performs the write, and then lets the program continue. If yes, it logs the write to a kernel trace log (including the address of the instruction that caused the trap, and the call stack), then performs the write and restarts the program.

The problem with this (other than the sheer complexity, and the need to have experts who understand kernel, VM subsystem, and processor architecture) is that it destroys the program's performance, and leaves very large log files. One way to keep the log files manageable is to decorate the call paths that are "allowed" to write to the variable with an "unlock/lock" pair (which goes to a separate kernel hook that temporarily disables the page protection).

With a team of good people (perhaps a half dozen), and with existing infrastructure (trace collection, kernel configuration), this could be implemented in a few weeks.

No, I don't know of an existing solution.

jbo@ · Aug 6, 2022

reddy Wow! Please teach me the skills of extracting this much detail from as little information as provided by OP's initial post! I am impressed!

ralphbsz If you're constrained to an environment that doesn't allow attaching a debugger to a userland application, how is modifying the kernel an option (i.e. "allowed")? Surely you wouldn't want to modify the kernel of your production environment either, right? I'm honestly asking/interested.

ralphbsz · Aug 6, 2022

Our problem was not that we were not allowed to use a debugger. Debuggers just make the program run too slow. Most debuggers implement watchpoints by running the program one instruction at a time, then checking the value of the watched variable after each instruction. That reduces performance by a large factor (10 or 100), uniformly for all parts of the program. That slowdown may make testing impossible. The technique of using the page protection mechanism to look for unauthorized changes only slows the program down when the area that contains the watched variable is being written to. In many cases, a program first initializes lots of data structures, then uses them. You can turn the "watching" on only after initialization, and then run at reasonable performance.

kpedersen · Aug 6, 2022

Possibly mprotect(2) the page (mmap(2) that dynamically allocated array). Handle the signal, and check that memory location on each access?
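
A minimal user-space sketch of this suggestion, under several assumptions: the watched data lives in its own page obtained from mmap(2), the process is effectively single-threaded while the watch is armed, and calling mprotect(2) from a signal handler is acceptable (POSIX does not list it as async-signal-safe, though it is commonly done). All names (watch_alloc, watch_arm, on_fault, last_fault_addr, fault_count) are hypothetical. The handler records the faulting address, unprotects the page, and returns, which makes the CPU re-execute the write. The watch is therefore one-shot: re-arming has to happen at a point the program chooses.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANON on glibc; ignored on FreeBSD */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void *volatile watched_page;        /* start of the watched page */
static size_t watched_len;                 /* length of the watched region */
static void *volatile last_fault_addr;     /* address of the last trapped write */
static volatile sig_atomic_t fault_count;  /* number of trapped writes */

static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)watched_page;

    (void)ctx;
    if (watched_page == NULL || addr < base || addr >= base + watched_len) {
        /* A genuine crash: restore the default action; the retried
         * instruction then faults again and terminates the process. */
        signal(sig, SIG_DFL);
        return;
    }
    last_fault_addr = info->si_addr;
    fault_count++;
    /* Unprotect so the faulting write succeeds when it is re-executed. */
    mprotect(watched_page, watched_len, PROT_READ | PROT_WRITE);
}

/* Allocate one writable page to hold the watched array. */
static void *watch_alloc(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    void *p;

    if (psz <= 0)
        return NULL;
    p = mmap(NULL, (size_t)psz, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    watched_page = p;
    watched_len = (size_t)psz;
    return p;
}

/* Install the handler and write-protect the page. Returns 0 on success. */
static int watch_arm(void)
{
    struct sigaction sa;

    assert(watched_page != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    /* Both signals are handled because the one delivered for a
     * protection fault can depend on the OS and its configuration. */
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return mprotect(watched_page, watched_len, PROT_READ);
}
```

The sketch records only the data address. Recovering the address of the writing instruction means reading machine registers from the handler's context argument, which is platform-specific and not shown here.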

ralphbsz · Aug 6, 2022

Can you restart the code after the signal? Meaning, if you find that the access was valid, unprotect the page, run the write instruction again, then re-protect it? Doing that in user space means you end up implementing the core of the debugger inside your program.

Actually, I saw this morning that gdb can do memory watch points without single-stepping the code, but only on HP-UX with PA-RISC, and on Linux with x86. I bet it uses a page protection technique, probably with help from kernel-based debugging aids. Don't know whether that extends to other FOSS Unixes like FreeBSD, and to amd64.

_martin · Aug 6, 2022

The guts of gdb is the ptrace syscall. If you can't use gdb for administrative reasons (i.e. tracing is prohibited), you won't be able to watch over it.
To add to Andriy's link: gdb internals: watchpoints. HW watchpoints are HW dependent (kind of stating the obvious, I know).

If you can ptrace, you'd fork (or create a thread) and call ptrace with PT_TRACE_ME (Linux: PTRACE_TRACEME) within the code you control. But then if you can ptrace, gdb is the way to go.

Now can you hack around it? As kpedersen said, mprotect the page. Catch the SIGSEGV signal and analyze what is trying to write there. If the conditions are met, restart (continue) the code. If not, you get the answer you were looking for. setjmp(3) and sigsetjmp(3) (and friends) are very helpful. But if you go this route, I bet it would be easier to analyze the actual application than to do this.
Or enable tracing on the host.
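
A minimal sketch of how sigsetjmp(3) can fit into the signal approach, assuming a single-threaded program. Here the handler does not restart the write: it jumps back to a recovery point, so the offending store is abandoned and the caller learns that it faulted. The names try_store, install_handler, recover_point and recover_armed are hypothetical.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANON on glibc; ignored on FreeBSD */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_point;            /* where the handler jumps back to */
static volatile sig_atomic_t recover_armed; /* nonzero while a guarded store runs */

static void on_fault(int sig)
{
    if (!recover_armed) {
        /* Fault outside a guarded store: let it crash normally on retry. */
        signal(sig, SIG_DFL);
        return;
    }
    recover_armed = 0;
    siglongjmp(recover_point, 1);           /* abandon the faulting write */
}

/* Install the handler for both fault signals. Returns 0 on success. */
static int install_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* Attempt a store. Returns 0 if it completed, -1 if it faulted. */
static int try_store(volatile int *slot, int value)
{
    assert(slot != NULL);
    /* Saving the signal mask (second argument 1) lets siglongjmp restore it,
     * so the fault signal is not left blocked after the jump. */
    if (sigsetjmp(recover_point, 1) != 0)
        return -1;                          /* reached via the handler */
    recover_armed = 1;
    *slot = value;
    recover_armed = 0;
    return 0;
}
```

This form only helps for writes that can be routed through a guarded function. Writes from unknown code paths still need a handler that inspects the faulting address itself.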

Avk · Aug 8, 2022

zirias@ said:
I guess it would make sense to elaborate on that first. Not that I don't believe you, but it's possible you overlooked something. I mean, that's exactly what debuggers are designed for.

In the production environment we are not allowed to run under gdb, as correctly mentioned by reddy.

zirias@ · Aug 8, 2022

That's why you should have an identical testing environment. Identical (virtual) machines, network configuration, operating systems, libraries and services used, and so on. If data is involved, it must be cloned (and, if necessary, anonymized) from production. I know many don't have that, but IMO, it's the only sane way. Trying to debug something "in production" is often attempted with all sorts of trickery, it's sometimes successful, sometimes not, nothing you should ever rely on...

Avk · Aug 8, 2022

Unfortunately we don't have a repro in the local environment ...

Avk · Aug 8, 2022

reddy said:
From what I understand, the problem of the author is that because of a number of business constraints, he is not in a position to run an interactive debugging session in the environment where the problem occurs. His only option is therefore to try to log the issue, thus his question. Apparently they are not trying to debug something at development time on their local machine, they are trying to chase down a production issue. It is very common not to be able to attach a debugger in production, personally even if I could I probably wouldn't.

Since this is not a crash that would leave a helpful stack trace with debugging symbols, they are asking for a way to log the changes made to a variable, potentially using interrupts. The situation is made difficult by the fact that the variable is modified by many parts of their complex program.

In terms of solution, my 2 cents is that unless people familiar with C programming can propose an interrupt-based trick, I'd say the best approach may be to wrap the array in a method doing the logging you need before reading or changing the value. Even if you do not use an IDE that would make it convenient to find the places using the variable, just wrap the variable in a method: this will break the build, and the compiler errors will show you all the places using the variable so that you can update the data access code. Encapsulating access to shared state is a best practice anyway.

Edit: since C does not have classes or namespaces, just rename the variable to break the build, and moving forward ensure that all data access is done through your method.

As far as I understand, if we rename the variable/array, the compiler reports every use of the old name as an error at compile time.
The limitations are:
1> If the array value is modified through a pointer (a pointer points to that array element and then modifies it at run time), the compiler won't detect it. Of course, it would tell us which pointers are pointed at that array or array element, but we would need to trace all such pointers separately.
2> In our case the array is defined through a macro and it is referred to/used by multiple queues. If we rename it, compilation errors come from all those places, including the queue we are concerned with.

I know there is no straightforward way to debug this.
Thanks !!!

Crivens · Aug 8, 2022

You want the code to stop when array[2] gets written, yes? Maybe drop a core dump for analysis?

Avk · Aug 8, 2022

Crivens said:
You want the code to stop when array[2] gets written, yes? Maybe drop a core dump for analysis?

Not exactly. For that we need to know where in the code array[2] is modified to 0 .... we don't know the place. And it is modified from multiple places. Also any value (address) other than 0 (null) is not an issue.

elgrande · Aug 8, 2022

One possibility would be to create a setter/wrapper method for array elements with a trace option and change to using the setter instead of changing the array element directly.

Crivens · Aug 8, 2022

That is not a planned program flow. You should.. no, you HAVE to fix that. Find where these wild pointers are placed and stamp them out.

You may have success with watch registers in the CPU core, you may also place the array on a page boundary so that [1] is on page A and [2] is on page A+1. Then protect A+1 against write, and collect the core dumps. All this is debugging, you need to get that code into gear. It can't be a permanent thing in the program.
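
A minimal sketch of the page-boundary layout, assuming the array can be moved into a dedicated two-page mmap(2) region and that elements are pointer-sized. The names split_array, split_array_create and split_array_protect_tail are hypothetical. Elements [0] and [1] sit at the end of page A and element [2] is the first thing on page A+1, so write-protecting A+1 leaves the first two elements writable.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANON on glibc; ignored on FreeBSD */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

typedef void *slot_t;              /* hypothetical element type */

struct split_array {
    void   *mapping;               /* start of page A */
    size_t  page_size;
    slot_t *elems;                 /* elems[2] is the first slot on page A+1 */
};

/* Map two pages and place the array so that elems[2] starts page A+1. */
static int split_array_create(struct split_array *sa)
{
    long psz = sysconf(_SC_PAGESIZE);
    void *m;

    if (psz <= 0)
        return -1;
    m = mmap(NULL, 2 * (size_t)psz, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANON, -1, 0);
    if (m == MAP_FAILED)
        return -1;
    sa->mapping = m;
    sa->page_size = (size_t)psz;
    /* Step back two slots from the page boundary. */
    sa->elems = (slot_t *)((char *)m + psz) - 2;
    assert((uintptr_t)&sa->elems[2] % sa->page_size == 0);
    return 0;
}

/* Write-protect page A+1, i.e. elems[2] and everything after it. */
static int split_array_protect_tail(const struct split_array *sa)
{
    return mprotect((char *)sa->mapping + sa->page_size,
                    sa->page_size, PROT_READ);
}
```

With no signal handler installed, the first write to elems[2] after split_array_protect_tail() terminates the process with a fault, and the resulting core dump (if core dumps are enabled) shows the writer's stack. Note that every element from [2] onward shares the protected page, so legitimate writes to those elements trap as well.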

Andriy · Aug 8, 2022

Avk said:
Unfortunately we don't have a repro in the local environment ...

You can cheat and emulate whatever the debugger would do using the ptrace(2) interface. I think that if you are on x86 you should be able to set a hardware watchpoint on the memory location that interests you.

Avk · Aug 9, 2022

Andriy said:
You can cheat and emulate whatever the debugger would do using the ptrace(2) interface. I think that if you are on x86 you should be able to set a hardware watchpoint on the memory location that interests you.

But to set a h/w watchpoint, I first need to run the application under 'gdb' .... correct? We need some mechanism with which we don't lose the performance of the box/application in the production environment.

_martin · Aug 9, 2022

Avk said:
But to set a h/w watchpoint, I first need to run the application under 'gdb' .... correct? We need some mechanism with which we don't lose the performance of the box/application in the production environment.

Yes, that's the benefit of HW watchpoints: you don't lose performance.
It doesn't make sense to create your own debugger within the code (using ptrace and logic around the HW debug registers). If you need to do that, you can simply use gdb instead.

Note that if you attach a debugger, your application will stop. You could create a gdb script and attach to the application with it (the script would have commands to set the watchpoint and continue). But the application will still stop once the watchpoint is hit. So it's not only about performance; you must keep in mind that the application will stop when you hit the watchpoint. It's worth mentioning that the watch command in gdb will tell you whether a HW watchpoint is in place when you set it.

Have you considered printf debugging first? Log with printf anywhere before the array is modified, either directly or indirectly through a pointer.

Crivens · Aug 9, 2022

I consider writing to an array without being able to find out/know where it is done a bug. Refactor early. Refactor often. So you don't end up in spaghetti code hell.

Avk · Aug 9, 2022

_martin said:
Yes, that's the benefit of HW watchpoints: you don't lose performance.
It doesn't make sense to create your own debugger within the code (using ptrace and logic around the HW debug registers). If you need to do that, you can simply use gdb instead.

Note that if you attach a debugger, your application will stop. You could create a gdb script and attach to the application with it (the script would have commands to set the watchpoint and continue). But the application will still stop once the watchpoint is hit. So it's not only about performance; you must keep in mind that the application will stop when you hit the watchpoint. It's worth mentioning that the watch command in gdb will tell you whether a HW watchpoint is in place when you set it.

Have you considered printf debugging first? Log with printf anywhere before the array is modified, either directly or indirectly through a pointer.

Sorry, probably I didn't understand your comment completely.

When you talk about a HW watchpoint, do you mean that I need gdb to set it, but that once it is set the application will not lose overall performance?

I am using the following HW model with FreeBSD 11.2 running.
Intel(R) Xeon(R) CPU

Thanks !!!
