[Solved] Service in C: Resident set (column RES in ps) ever increasing?

zirias@

Developer
I have a service roughly designed like this:

- a main thread dispatching connections using pselect(), allocating some context and request objects and freeing them again once the response is sent
- a pool of worker threads doing the actual processing of the requests; the individual processing jobs do quite a few dynamic allocations, but of course free them again before finishing

The only libs linked are libc, libthr and libz.
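
To illustrate the shape of it, here's a stripped-down sketch of just the dispatch part (not my actual code; error handling is omitted and a hypothetical handle_request() stands in for handing the job to the worker pool):
Code:
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* hypothetical per-connection context, allocated per request */
typedef struct Context { int fd; /* ... request/response state ... */ } Context;

static void handle_request(Context *ctx)   /* placeholder for the worker pool */
{
    static const char msg[] = "HTTP/1.1 204 No Content\r\n\r\n";
    write(ctx->fd, msg, sizeof msg - 1);
}

int main(void)
{
    int lsock = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(lsock, (struct sockaddr *)&addr, sizeof addr);
    listen(lsock, 16);

    /* keep SIGTERM blocked except while sitting in pselect() */
    sigset_t blocked, orig;
    sigemptyset(&blocked);
    sigaddset(&blocked, SIGTERM);
    sigprocmask(SIG_BLOCK, &blocked, &orig);

    for (;;)
    {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(lsock, &readfds);
        if (pselect(lsock + 1, &readfds, 0, 0, 0, &orig) < 0) break;
        if (!FD_ISSET(lsock, &readfds)) continue;

        int fd = accept(lsock, 0, 0);
        if (fd < 0) continue;
        Context *ctx = malloc(sizeof *ctx);   /* context allocated per request    */
        ctx->fd = fd;
        handle_request(ctx);                  /* really: enqueue for a worker     */
        close(ctx->fd);
        free(ctx);                            /* ... and freed after the response */
    }
    close(lsock);
    return 0;
}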

Now, I observed the following with ps: The RES value increases with every request processed. I used devel/valgrind to debug this, found one little leak indeed, fixed it, but the behavior is still the same, with valgrind reporting no leaks whatsoever.

I also did a test without zlib, just to make sure it's not some zlib-internal memory-management. Obviously isn't.

Now my question is: am I chasing some phantom here? Maybe that's just how malloc() and/or the pager work? Serving you "new" pages because it's faster, and only cleaning up the old ones when really necessary?

Or do I have a leak somewhere that valgrind can't find? :eek:
 
try a third-party malloc, or just use preallocated / global / static space (by allowing just one thread at a time)
then you rule out malloc
 
How big is the project? If not absolutely ginormous, I would resort to ghetto debugging and try to comment out large sections of the code executed by the worker thread to see if you can pinpoint where the leak is happening.

If the workers do nothing at all, is there still a leak? Then you will know it is the allocation and request data that is potentially causing the leak.

If you only have one worker thread in the pool, does the leak slow down?
 
try to comment out large sections of the code
Uhm. I took out the worker threads completely, just responding with an error response to every single request (and those are malloc'd as well), and RES remains stable. That's weird. Next step: minimal work on the threads... thanks for the idea so far.

If there's really a leak, I wonder why valgrind can't detect it :-/
 
Now I tried to do all request processing synchronously, not using my thread pool. RES still increases, sometimes. valgrind still doesn't find anything suspicious. I also had another closer look at my unmodified code: after it has been idle for a while, RES does not increase for a few requests. After doing many requests, it does ... and handling them is always the same code.

Back to my initial suspicion: it might be the allocator or the pager. Maybe memory is only released immediately when it's freed in exactly the reverse order of allocation? :-/

Sure, serving memory from my own static area would be a way to be sure, but that's really a lot of work to do :eek:
 
Sure, serving memory from my own static area would be a way to be sure, but that's really a lot of work to do :eek:
Hmm. I wonder if it is a standard memory leak (i.e. from malloc, calloc, etc.) or if it is a resource leak. I am thinking the latter, because Valgrind can rarely pick those up.

Do AddressSanitizer and the leak checker give any information about leaked pages?

Could the requests be leaking? How are they coming in? Sockets? If so, perhaps enabling SO_REUSEADDR / SO_REUSEPORT could help avoid a temporary resource increase.
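
Something like this on the listening socket before bind() is what I mean (just a sketch; enable_reuse() is a made-up helper, not from your code):
Code:
#include <sys/socket.h>

/* set SO_REUSEADDR (and SO_REUSEPORT where available) on a listening socket */
static int enable_reuse(int lsock)
{
    int on = 1;
    if (setsockopt(lsock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) < 0)
        return -1;
#ifdef SO_REUSEPORT
    if (setsockopt(lsock, SOL_SOCKET, SO_REUSEPORT, &on, sizeof on) < 0)
        return -1;
#endif
    return 0;
}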
 
Clang’s static analyzer usually finds memory-leak issues as well. Sometimes it also reports false positives, but those we can simply ignore.

This is how I analyze one of my public projects on a FreeBSD machine -- https://github.com/cyclaero/ContentCGI. In this case the run turned up issues which I had already resolved upstream some time ago, so when you repeat the same with my code, it should come up clean.
  1. Install devel/llvm12

  2. cd into the working directory of the project

  3. Run make through the scripted wrapper of Clang’s analyzer which is named scan-build12
    scan-build12 make
    Code:
    scan-build: Using '/usr/local/llvm12/bin/clang-12' for static analysis
    /usr/local/llvm12/bin/../libexec/ccc-analyzer  -g0 -O3 -march=native -mssse3 -std=gnu11 -fno-pic -fvisibility=hidden -fstrict-aliasing -fstack-protector  -Wno-multichar -Wno-parentheses -Wno-empty-body -Wno-switch -Wno-deprecated-declarations -Wshorten-64-to-32  -I/usr/local/include firstresponder.c -c -o firstresponder.o
    ...
    ...
    /usr/local/llvm12/bin/../libexec/ccc-analyzer  -g0 -O3 -march=native -mssse3 -std=gnu11 -fno-pic -fvisibility=hidden -fstrict-aliasing -fstack-protector  -Wno-multichar -Wno-parentheses -Wno-empty-body -Wno-switch -Wno-deprecated-declarations -Wshorten-64-to-32  -I/usr/local/include main.c -c -o main.o
    main.c:486:32: warning: Dereference of null pointer (loaded from variable 'r') [core.NullDereference]
       while (--r >= executable && *r != '/'); r++;
                                   ^~
    main.c:890:9: warning: Although the value stored to 'rc' is used in the enclosing expression, the value is never actually read from 'rc' [deadcode.DeadStores]
       if ((rc = chdir(webroot)) == no_error)
            ^    ~~~~~~~~~~~~~~
    2 warnings generated.
    ...
    
    scan-build: Analysis run complete.
    
    scan-build: Run 'scan-view /tmp/scan-build-2021-11-20-142023-24761-1' to examine bug reports.
  4. Now comes the really interesting and absolutely cool part, which most people never imagined could exist. Look at the resulting web page with your favorite browser:
    # scp -rpP /tmp/scan-build-2021-11-20-142023-24761-1 \
    root@obsigna.com:/usr/local/www/Obsigna/webdocs/ContentCGI_Static_Analysis


See also: https://clang-analyzer.llvm.org/scan-build.html
 
obsigna thanks for this tip as well, but:
Code:
$ scan-build12 gmake BUILDCFG=release strip
scan-build: Using '/usr/local/llvm12/bin/clang-12' for static analysis         
   [CFG]  [release: CC=/usr/local/llvm12/bin/../libexec/ccc-analyzer CXX=/usr/local/llvm12/bin/../libexec/c++-analyzer]
[...]
scan-build: Analysis run complete.
scan-build: Removing directory '/tmp/scan-build-2021-11-20-191055-63909-1' because it contains no reports.
scan-build: No bugs found.

I would have been surprised if a static analyzer had been able to find something here, but still interesting, I didn't know about this tool 👍 It might be helpful on other occasions!

Well for now, I'll observe I guess. If there's really a leak valgrind can't find, I assume the SIZE column in ps should also increase eventually. So far it doesn't…
 
Do AddressSanitizer and the leak checker give any information about leaked pages?
AFAIK, address sanitizer on FreeBSD does not support leak checking…
Could the requests be leaking? How are they coming in? Sockets? If so, perhaps enabling SO_REUSEADDR / SO_REUSEPORT could help avoid a temporary resource increase.
They're coming in on sockets created by accept(). But – this implements HTTP and with Connection: keep-alive I still see RES growing when the browser reuses the same connection (reloading before connection timeout), so it's probably not the sockets...
Can you provide a minimal compilable example program which shows the problem?
No. Well, maybe, if my assumption holds that ordering of malloc()/free() matters. I could try …
 
assuming you have mallocs only (no callocs, reallocs), you can replace all malloc/free with crap_malloc / crap_free:


Code:
#include <stdio.h>
#include <stdlib.h>

#define SH_SZ (1024*1024*8)

/* dumb bump allocator: hands out slices of one big calloc'd block;
 * NOTE: not thread-safe -- meant for testing with a single thread,
 * as suggested above */
void *crap_malloc(size_t sz)
{
    static char *crap_core = NULL;
    static size_t sz_all = 0;
    void *b;

    if (!crap_core) {
        crap_core = calloc(1, SH_SZ);
    }
    if (sz_all + sz > SH_SZ) {
        fprintf(stderr, "crap_core exhausted\n");
        exit(-1);
    }
    b = (void *)crap_core;
    crap_core += sz;
    sz_all += sz;
    return b;
}

/* frees nothing on purpose: takes the real free() out of the picture */
void crap_free(void *p)
{
    (void)p;
}
 
Well thanks, I have a more sophisticated malloc() serving from a static buffer myself (with a linked free-list in the headers, so free() works correctly), but it's also missing realloc(), which I need in my code a few times ... typical use cases like "dynamic arrays" and processing of input where calculating the output size upfront isn't feasible. I see no use in calloc(), after all, multiplication and zero-init can be done easily, but I really need realloc() :eek:

edit: re-thinking this, I could of course do a poor man's realloc by just chaining malloc, memcpy and free… so yes, to be sure, I might build a version using some static char/uint8_t array replacing the heap...
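
Such a poor man's realloc on top of covacat's crap_malloc()/crap_free() would be roughly this (just a sketch; since the bump allocator doesn't record block sizes, the caller has to pass the old size explicitly):
Code:
#include <stddef.h>
#include <string.h>

void *crap_malloc(size_t sz);   /* from the snippet above */
void crap_free(void *p);

void *crap_realloc(void *p, size_t oldsz, size_t newsz)
{
    if (!p) return crap_malloc(newsz);   /* behave like malloc()          */
    if (newsz <= oldsz) return p;        /* shrinking: keep the old block */
    void *n = crap_malloc(newsz);        /* growing: chain malloc ...     */
    memcpy(n, p, oldsz);                 /* ... memcpy ...                */
    crap_free(p);                        /* ... and free                  */
    return n;
}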
 
For many years I have been using Clang’s static analyzer a lot for all sorts of programming projects. I usually run it from within Xcode, and it quite frequently finds suspects for memory leaks. In the example code above, I just included a test leak in the threaded socket loop, and it found it:
C:
...
...
void *socketLoop(void *listenSocket)
{
   long double mt, lastfail = microtime();
   const int optval = 1;
   int rc, socket;
      
   while (!gShutdownFlag)
   {
      char *leak = malloc(64);
      strlcpy(leak, "Dies ist ein Test!", 64);
      printf("%s\n", leak);

      if ((socket = accept(*(int *)listenSocket, NULL, NULL)) < 0)
      {
...
...


Code:
scan-build: Using '/usr/local/llvm12/bin/clang-12' for static analysis
/usr/local/llvm12/bin/../libexec/ccc-analyzer  -g -O0 -march=native -mssse3 -std=gnu11 -fno-pic -fvisibility=hidden -fstrict-aliasing -fstack-protector  -Wno-multichar -Wno-parentheses -Wno-empty-body -Wno-switch -Wno-deprecated-declarations -Wshorten-64-to-32  -I/usr/local/include main.c -c -o main.o
main.c:544:21: warning: Potential leak of memory pointed to by 'leak' [unix.Malloc]
      if ((socket = accept(*(int *)listenSocket, NULL, NULL)) < 0)
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
/usr/local/llvm12/bin/../libexec/ccc-analyzer firstresponder.o fastcgi.o connection.o interim.o utils.o main.o -L/usr/local/lib -lm -lpthread -lcrypto -lssl -o ContentCGI
scan-build: Analysis run complete.
scan-build: 1 bug found.
scan-build: Run 'scan-view /tmp/scan-build-2021-11-20-175604-32058-1' to examine bug reports.

Here is a screenshot of the web page scan-build-2021-11-20-175604-32058-1 which was just generated by the above analyzer run:

[Screenshot: scan-build report page, 2021-11-20]
 
Ah! Try this:
Code:
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <unistd.h>

void *ptr[16];
struct rnd {
    uint16_t word;
    uint8_t byte;
} dat;

int main(void)
{
    for (;;)
    {
        /* pick a random slot and a random size between 1024 and 4095 bytes */
        getrandom(&dat, sizeof dat, 0);
        int i = dat.byte & 0xf;
        size_t s = dat.word % 3072 + 1024;
        /* free whatever was in that slot (free(NULL) is fine) and reallocate */
        free(ptr[i]);
        ptr[i] = malloc(s);
        /* fill the whole block so its pages are actually touched */
        getrandom(ptr[i], s, 0);
        sleep(1);
    }
}

Do you see any leaks in this code?

Compile, run, observe with top -p <pid>. For me, RES grows all the time. :eek:

Edit: obsigna sure, only passing the pointer to std lib functions that are known not to store and/or free it, that's an easy catch for a static analyzer. I'd say your typical scenario where you start hunting for leaks is slightly more complex ;)

Try passing the ptr to a function in a different compilation unit. Maybe make this function store it in some static buffer. Maybe also define a function that would free the ptr from that static buffer. I'd be very surprised if the static analyzer would still catch this ;)
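
I'd expect something as simple as this in a separate compilation unit (a made-up example, not from my project) to already hide the ownership from a per-unit analysis:
Code:
/* stash.c -- takes ownership of pointers handed to it; an analyzer looking
 * at the calling unit only sees the pointer escape into stash_store() and
 * can't tell whether stash_free() is ever called for it. */
#include <stddef.h>
#include <stdlib.h>

#define STASH_SLOTS 16
static void *stash[STASH_SLOTS];

int stash_store(void *p)        /* takes ownership of p, returns slot or -1 */
{
    for (size_t i = 0; i < STASH_SLOTS; ++i)
        if (!stash[i]) { stash[i] = p; return (int)i; }
    return -1;                  /* full: caller keeps ownership */
}

void stash_free(int slot)       /* releases a previously stored pointer */
{
    if (slot < 0 || slot >= STASH_SLOTS) return;
    free(stash[slot]);
    stash[slot] = NULL;
}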
 
Memory allocations in main() would not be considered memory leaks, because once main() is finished, all allocated memory is released anyway. If you stay on the narrow road, then when it comes to allocations of memory in sub-routines, the analyzer can do a very good job of finding occasional bugs. If you play games with it like „let’s see who is more clever, you or I!“, then the analyzer will lose, because it is not designed to be more clever than the programmer.
It is designed to exhaustively (but stupidly) follow all possible execution paths and to see what happens with the allocations,
... and with non-initialized variables - a real bug once these get used,
... and with initialized but never used ones - not a big bug, but it might show that the code does something it was not intended to do,
... and with ..., and, and, and ...
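
For example, a contrived function with the three kinds of findings mentioned above in one place (made-up code; whether each one gets reported depends on the enabled checkers):
Code:
#include <stdio.h>
#include <stdlib.h>

int example(int flag)
{
    int x;                       /* left uninitialized when flag == 0 ...   */
    if (flag) x = 42;
    int rc = printf("%d\n", x);  /* ... but used here anyway                */
    rc = 0;                      /* dead store: this value is never read    */

    char *p = malloc(16);        /* allocated ...                           */
    return flag;                 /* ... and leaked on every return path     */
}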

Anyway, nobody urges anybody to use the tool. It exists, that's it.
 
obsigna I already said it's useful and I'll keep it in mind! You just have to be aware that static code analyzers have natural limitations. And when it comes to leaks in complex programs, you often have things like passing ownership of a pointer to a different module, or runtime decisions about when to free something; to find bugs in that, you need runtime analysis, like clang's address sanitizer (which unfortunately doesn't support leak detection on FreeBSD) or a tool like valgrind.

edit: example of what I mean, I introduced the one leak I found with valgrind again, it looks like this (it's a hashtable with linked lists stored in the buckets):
Code:
struct Template
{
    size_t size;
    union {
        uint8_t *tmpl;
        const uint8_t *stmpl;
    };
    TmplVar *buckets[64];
    int owned;
};

// [...]

void Template_destroy(Template *self)
{
    if (!self) return;
    for (uint8_t h = 0; h < 32; ++h)
    {
        TmplVar *var = self->buckets[h];
        while (var)
        {
            TmplVar *next = var->next;
            if (var->owned) free(var->val);
            free(var);
            var = next;
        }
    }
    if (self->owned) free(self->tmpl);
    free(self);
}

Note how the destruction code only iterates over 32 of the 64 buckets to delete variables stored there. There's no way a static analyzer can detect that (and sure enough, scan-build doesn't).

covacat you can also hardcode e.g. 1234 for the size and it will stop growing pretty soon; any value up to the page size seems to show this behavior. So it seems that with a fixed size up to the page size, a new allocation will reuse the space previously freed; otherwise, the allocator prefers to give you "new" memory. I can only assume it would clean up later. Maybe order matters as well; I didn't check yet what would happen if you allocate random sizes but free them in exactly reverse order.

In a nutshell, I take it an increasing RES value is no proof of a leak :eek: So, I'll just continue to monitor this (and hope valgrind would have found it if there was a leak).
 
it looks like it's the tuning of jemalloc, which prefers speed at the cost of the number of pages used
with
Code:
#include <malloc_np.h>
const char * malloc_conf = "narenas:1,tcache:false,dirty_decay_ms:0,muzzy_decay_ms:0,abort_conf:true";
it grows, then it shrinks back
 
it looks like it's the tuning of jemalloc, which prefers speed at the cost of the number of pages used
Which is of course what you typically want in production ;)
with
#include <malloc_np.h>
const char * malloc_conf = "narenas:1,tcache:false,dirty_decay_ms:0,muzzy_decay_ms:0,abort_conf:true";
Awesome, thanks! Added this behind an #ifdef in my real code and I'm testing it; after some requests, RES doesn't change any more, as expected :cool:
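
For reference, that #ifdef boils down to just this (the macro name here is made up; use whatever your build defines for such a verification build):
Code:
#ifdef NO_ALLOC_CACHING   /* hypothetical build flag for leak-hunting builds */
#include <malloc_np.h>
const char *malloc_conf =
    "narenas:1,tcache:false,dirty_decay_ms:0,muzzy_decay_ms:0,abort_conf:true";
#endif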
 
Garbage collection; it's almost always about cleaning up. Now, standard C library malloc and friends don't really do garbage collection, but when you get into the nuts and bolts of the kernel, where the memory actually gets allocated to your process, you wind up with what I'll call "delayed reclamation".

The kernel typically allocates PAGE_SIZE buffers from somewhere, whacks off a chunk, then hands it to your process. You eventually free it, the kernel marks it as available, and then what? Well, the kernel is not going to release anything less than a page, and a process will often alloc, use and free buffers of the same size, so the kernel keeps it handy for you (the speed part of jemalloc). Eventually the kernel will coalesce your freed memory (memory pressure, a specific call) and then start to give pages back (delayed reclamation). The timing and reasons for when and how are all related to how much memory your process actually uses at any given point in time.

Languages like Java, Lisp and others have a true garbage collection scheme where the user just allocates and never frees, because the allocations and usages are kept track of, and when something is not being used (referenced) any more, it gets freed for you. Depending on your patterns you can actually run out of memory with a bursty allocation pattern, because objects don't get freed quickly enough under the standard config of the garbage collector. You tweak the config and, all of a sudden, no more memory issues.

Zirias did you try running the service under a ulimit/rlimit value? I wonder: if you do that, would jemalloc actually do a better job of reclaiming freed memory?
 
did you try running the service under a ulimit/rlimit value? I wonder: if you do that, would jemalloc actually do a better job of reclaiming freed memory?
I didn't, but that would be my expectation for sure – otherwise, it would be pretty broken :-/

For me, it's now good enough to know how to configure it for "immediate cleanup". With these settings, performance is indeed worse (I noticed requests taking longer!), but it's a nice way to verify that your own code doesn't produce any memory leaks.
 
i tried the rlimit thing for the test code Zirias posted and it had no effect. Then i read the setrlimit(2) manpage, and it seems the RSS limit is a kind of VM hint / soft quota.
Code:
 RLIMIT_RSS      When there is memory pressure and swap is available,
                     prioritize eviction of a process' resident pages beyond
                     this amount (in bytes).  When memory is not under
                     pressure, this rlimit is effectively ignored.  Even when
                     there is memory pressure, the amount of available swap
                     space and some sysctl settings like vm.swap_enabled and
                     vm.swap_idle_enabled can affect what happens to processes
                     that have exceeded this size.

                     Processes that exceed their set RLIMIT_RSS are not
                     signalled or halted.  The limit is merely a hint to the
                     VM daemon to prefer to deactivate pages from processes
                     that have exceeded their set RLIMIT_RSS.
... then i looked for jemalloc config/options :)
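
for completeness, setting that limit from code is just this (a sketch with an arbitrary 64 MiB cap; as the excerpt says, it's merely a hint to the VM daemon):
Code:
#include <sys/types.h>
#include <sys/resource.h>

/* ask the VM daemon to prefer evicting our pages beyond `bytes` of RSS */
static int cap_rss(rlim_t bytes)
{
    struct rlimit rl = { .rlim_cur = bytes, .rlim_max = bytes };
    return setrlimit(RLIMIT_RSS, &rl);
}

/* e.g. cap_rss((rlim_t)64 * 1024 * 1024); early in main() */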
 
BTW obsigna – this static analyzer just helped me find a new "local" bug where the wrong pointer was free()d because of a simple stupid typo :cool:

It was indeed nice to see how it proved, from assumptions and the conclusions following from them, that this "can't be correct", by finding an execution path that would lead to a double free. So, it's a pretty capable analyzer, and I'll definitely use it again! The only thing I was saying above: it can't replace runtime analysis, and especially with "leaks", the structure is often such that a static analyzer just can't possibly detect them.
 
The analyzer quite frequently finds these „small mistakes“ which simply happen to everybody, and I always use it before committing changes to the various repositories.

I do all coding, also for FreeBSD, in Xcode, and I do the runtime analysis with the tools provided by Apple, which are very well designed and snappy to use. A few mouse clicks and you see the leak with the whole history of how it came to this point. I don’t know what the corresponding tools are on FreeBSD, and how useful they are, so I didn’t mention this at all.

I know that clang's static analyzer on FreeBSD has the same capabilities as the one shipped with Xcode; you only need to open a web page on FreeBSD, while it is integrated into Xcode. That's why I mentioned it, and especially because I knew that it is useful for finding „common“ leaks, i.e. allocations in a compilation unit which seemingly never become free’d.

Here are two screenshots of runtime analysis within Xcode/Instruments of a huge commercial project of mine which is targeted at FreeBSD.
[Screenshots: Xcode/Instruments leak analysis, 2021-11-22]
No leaks so far :)
 