Trouble: bus errors and segfaults in libc.so (from postgres)

Recently I started to get crashes from my postgres instances:
Code:
Feb 2 03:52:57 LOG: background worker "parallel worker" (PID 79324) was terminated by signal 10: Bus error
Feb 2 14:48:20 LOG: server process (PID 97673) was terminated by signal 11: Segmentation fault
Feb 12 03:38:03 LOG: background worker "parallel worker" (PID 26340) was terminated by signal 10: Bus error

The three crashes happened on three different postgres clusters, which rules out an application-induced issue. The machine has been stable for years, currently at 14.3-RELEASE, with no significant changes.

The second crash happened right between port upgrading and rebooting, and while the ports commit log showed that the upgrade changed only the packing list of postgres (and therefore the *_N freebsd revision, but not the binary), I wrote that one off as coincidence. From the first one the coredump was not retained, but the third one looks like this:

Code:
* thread #1, name = 'postgres', stop reason = signal SIGBUS
 * frame #0: 0x0000000829930ac3 libc.so.7`___lldb_unnamed_symbol5890 + 131
   frame #1: 0x000000082992da28 libc.so.7`___lldb_unnamed_symbol5865 + 504
   frame #2: 0x000000082992e889 libc.so.7`___lldb_unnamed_symbol5871 + 2617
   frame #3: 0x000000082990ca84 libc.so.7`___lldb_unnamed_symbol5446 + 644
   frame #4: 0x000000082990c6b7 libc.so.7`___lldb_unnamed_symbol5445 + 839
   frame #5: 0x0000000829952945 libc.so.7`___lldb_unnamed_symbol6064 + 21
   frame #6: 0x0000000829900013 libc.so.7`___lldb_unnamed_symbol5410 + 755
   frame #7: 0x00000000009c0577 postgres`AllocSetContextCreateInternal + 199
   frame #8: 0x00000000006d588c postgres`ExecAssignExprContext + 108
   frame #9: 0x00000000006faab9 postgres`ExecInitSeqScan + 73
   frame #10: 0x00000000006cf188 postgres`ExecInitNode + 248
   frame #11: 0x00000000006c8440 postgres`standard_ExecutorStart + 1056

AllocSetContextCreateInternal creates a postgres memory context, i.e. it does memory allocation.
Given this I assumed a problem with the extra shared memory used for inter-process communication of parallel workers. (Postgres allocates that memory differently, because it is fetched on demand and due to ASLR it will not have the same location in every process, and therefore variable pointers must be recomputed.)

Yesterday two more crashes happened:
Code:
Feb 28 04:38:36 LOG:  server process (PID 30492) was terminated by signal 10: Bus error
Feb 28 04:39:17 LOG:  server process (PID 31023) was terminated by signal 11: Segmentation fault

Again, these are two different and independent database clusters. That they happened at almost the same time points to a common issue, probably hardware-related.

More interesting are the backtraces:
Code:
* thread #1, name = 'postgres', stop reason = signal SIGBUS
  * frame #0: 0x000000082aad3ac3 libc.so.7`___lldb_unnamed_symbol5890 + 131
    frame #1: 0x000000082aad0a28 libc.so.7`___lldb_unnamed_symbol5865 + 504
    frame #2: 0x000000082aad1889 libc.so.7`___lldb_unnamed_symbol5871 + 2617
    frame #3: 0x000000082aaae44d libc.so.7`___lldb_unnamed_symbol5434 + 525
    frame #4: 0x000000082aad65f0 libc.so.7`___lldb_unnamed_symbol5920 + 400
    frame #5: 0x000000082aaa3177 libc.so.7`___lldb_unnamed_symbol5410 + 1111
    frame #6: 0x00000000009c76a7 postgres`GenerationAlloc + 311
    frame #7: 0x00000000009c88db postgres`palloc0 + 43
    frame #8: 0x000000000050d950 postgres`heap_form_minimal_tuple + 192
    frame #9: 0x00000000009d02ae postgres`copytup_heap + 62
    frame #10: 0x00000000009d293a postgres`tuplesort_puttupleslot + 58
    frame #11: 0x00000000006fb7fc postgres`ExecSort + 316
    frame #12: 0x00000000006c880b postgres`standard_ExecutorRun + 299

Code:
* thread #1, name = 'postgres', stop reason = signal SIGSEGV
  * frame #0: 0x000000082a5fbac3 libc.so.7`___lldb_unnamed_symbol5890 + 131
    frame #1: 0x000000082a5f8a28 libc.so.7`___lldb_unnamed_symbol5865 + 504
    frame #2: 0x000000082a5f9889 libc.so.7`___lldb_unnamed_symbol5871 + 2617
    frame #3: 0x000000082a5d7a84 libc.so.7`___lldb_unnamed_symbol5446 + 644
    frame #4: 0x000000082a5d76b7 libc.so.7`___lldb_unnamed_symbol5445 + 839
    frame #5: 0x000000082a61d945 libc.so.7`___lldb_unnamed_symbol6064 + 21
    frame #6: 0x000000082a5cb013 libc.so.7`___lldb_unnamed_symbol5410 + 755
    frame #7: 0x00000000009c0711 postgres`AllocSetAlloc + 49
    frame #8: 0x00000000009c8664 postgres`MemoryContextAllocExtended + 68
    frame #9: 0x000000000086ee3a postgres`pgstat_entry_ref_hash_create + 74
    frame #10: 0x000000000086e18a postgres`pgstat_get_entry_ref + 1930
    frame #11: 0x000000000086aebd postgres`pgstat_prep_pending_entry + 45
    frame #12: 0x000000000086c7c9 postgres`pgstat_assoc_relation + 57

Even without debugging symbols one can clearly see that this is three times the same location in libc.

The invocation paths are each different, but they are all somehow related to memory allocation.

Now what to make of this?
How likely is it that a machine flaw would always only concern postgres, and always manifest at the very same location in libc?
 
Even without debugging symbols one can clearly see that this is three times the same location in libc.
Crash handlers? I mean, the process crashed (for whatever reason) and libc needs to clean up that dead process?
 
I copied the unstripped libc from /usr/obj into place, and it shows a little bit more:

Code:
* thread #1, name = 'postgres', stop reason = signal SIGBUS
  * frame #0: 0x000000082aad3ac3 libc.so.7`___lldb_unnamed_symbol6622 + 131
    frame #1: 0x000000082aad0a28 libc.so.7`___lldb_unnamed_symbol6614 + 504
    frame #2: 0x000000082aad1889 libc.so.7`__je_extent_alloc_wrapper + 2617
    frame #3: 0x000000082aaae44d libc.so.7`__je_arena_extent_alloc_large + 525
    frame #4: 0x000000082aad65f0 libc.so.7`__je_large_palloc + 400
    frame #5: 0x000000082aaa3177 libc.so.7`__je_malloc_default + 1111
    frame #6: 0x00000000009c76a7 postgres`GenerationAlloc + 311

Code:
* thread #1, name = 'postgres', stop reason = signal SIGSEGV
  * frame #0: 0x000000082a5fbac3 libc.so.7`___lldb_unnamed_symbol6622 + 131
    frame #1: 0x000000082a5f8a28 libc.so.7`___lldb_unnamed_symbol6614 + 504
    frame #2: 0x000000082a5f9889 libc.so.7`__je_extent_alloc_wrapper + 2617
    frame #3: 0x000000082a5d7a84 libc.so.7`___lldb_unnamed_symbol6308 + 644
    frame #4: 0x000000082a5d76b7 libc.so.7`__je_arena_tcache_fill_small + 839
    frame #5: 0x000000082a61d945 libc.so.7`__je_tcache_alloc_small_hard + 21
    frame #6: 0x000000082a5cb013 libc.so.7`__je_malloc_default + 755
    frame #7: 0x00000000009c0711 postgres`AllocSetAlloc + 49

In my understanding this says there was a call to malloc().
I still have to look into the code at je_extent_alloc_wrapper(). (The remaining two frames might be optimized away, or whatever.)

It might be that the system was out of memory, but even then SIGBUS is not what should happen.
 
Hm, I think I found something. A few months ago I tried to optimize the NUMA memory allocation a bit to my needs (including things in the kernel ;) ). And it looked good so far - but this issue seems to trigger only (or mostly) when the machine is paging heavily - which only happens during portbuild, i.e. quarterly plus something.
With that taken into account, the problem aligns with the change.
 
And it wasn't my fault!

Neither is it related to NUMA.
I disabled all of this, and the crashes continue to happen.

Now Tom Lane came up with something interesting.

Here is a new backtrace with some more information:

Code:
* thread #1, name = 'postgres', stop reason = signal SIGBUS
  * frame #0: 0x00000008296ad173 libc.so.7`extent_try_coalesce_impl(tsdn=0x00003e616a889090, arena=0x00003e616aa00980, r_extent_hooks=0x0000000820c5be88, rtree_ctx=0x00003e616a8890c0, extents=0x00003e616aa058d8, extent=0x00003e616ab0ec00, coalesced=0x0000000000000000, growing_retained=<unavailable>, inactive_only=<unavailable>) at jemalloc_extent.c:0
    frame #1: 0x00000008296aa0d8 libc.so.7`extent_record [inlined] extent_try_coalesce(tsdn=0x00003e616a889090, arena=0x00003e616aa00980, r_extent_hooks=0x0000000820c5be88, rtree_ctx=0x00003e616a8890c0, extents=0x00003e616aa058d8, extent=0x00003e616ab0ec00, coalesced=0x0000000000000000, growing_retained=<unavailable>) at jemalloc_extent.c:1678:9
    frame #2: 0x00000008296aa0c3 libc.so.7`extent_record(tsdn=0x00003e616a889090, arena=0x00003e616aa00980, r_extent_hooks=0x0000000820c5be88, extents=0x00003e616aa058d8, extent=0x00003e616ab0ec00, growing_retained=<unavailable>) at jemalloc_extent.c:1717:12
    frame #3: 0x00000008296aaf39 libc.so.7`__je_extent_alloc_wrapper [inlined] extent_grow_retained(tsdn=0x00003e616a889090, arena=0x00003e616aa00980, r_extent_hooks=0x0000000820c5be88, size=32768, pad=4096, alignment=<unavailable>, slab=<unavailable>, szind=<unavailable>, zero=<unavailable>, commit=<unavailable>) at jemalloc_extent.c:1383:4
    frame #4: 0x00000008296aaef7 libc.so.7`__je_extent_alloc_wrapper [inlined] extent_alloc_retained(tsdn=0x00003e616a889090, arena=0x00003e616aa00980, r_extent_hooks=0x0000000820c5be88, new_addr=0x0000000000000000, size=32768, pad=4096, alignment=<unavailable>, slab=<unavailable>, szind=<unavailable>, zero=<unavailable>, commit=<unavailable>) at jemalloc_extent.c:1480:12
    frame #5: 0x00000008296aaef7 libc.so.7`__je_extent_alloc_wrapper(tsdn=0x00003e616a889090, arena=0x00003e616aa00980, r_extent_hooks=0x0000000820c5be88, new_addr=0x0000000000000000, size=32768, pad=4096, alignment=64, slab=<unavailable>, szind=40, zero=0x0000000820c5bedf, commit=0x0000000820c5be87) at jemalloc_extent.c:1539:21
    frame #6: 0x0000000829687afd libc.so.7`__je_arena_extent_alloc_large(tsdn=<unavailable>, arena=0x00003e616aa00980, usize=32768, alignment=<unavailable>, zero=0x0000000820c5bedf) at jemalloc_arena.c:448:12
    frame #7: 0x00000008296afca0 libc.so.7`__je_large_palloc(tsdn=0x00003e616a889090, arena=<unavailable>, usize=<unavailable>, alignment=64, zero=<unavailable>) at jemalloc_large.c:47:43
    frame #8: 0x00000008296afb02 libc.so.7`__je_large_malloc(tsdn=<unavailable>, arena=<unavailable>, usize=<unavailable>, zero=<unavailable>) at jemalloc_large.c:17:9 [artificial]
    frame #9: 0x000000082967c8a7 libc.so.7`__je_malloc_default [inlined] tcache_alloc_large(tsd=0x00003e616a889090, arena=<unavailable>, tcache=0x00003e616a889280, size=<unavailable>, binind=<unavailable>, zero=false, slow_path=false) at tcache_inlines.h:124:9
    frame #10: 0x000000082967c82c libc.so.7`__je_malloc_default [inlined] arena_malloc(tsdn=0x00003e616a889090, arena=0x0000000000000000, size=<unavailable>, ind=<unavailable>, zero=false, tcache=0x00003e616a889280, slow_path=false) at arena_inlines_b.h:169:11
    frame #11: 0x000000082967c818 libc.so.7`__je_malloc_default [inlined] iallocztm(tsdn=0x00003e616a889090, size=<unavailable>, ind=<unavailable>, zero=false, tcache=0x00003e616a889280, is_internal=false, arena=0x0000000000000000, slow_path=false) at jemalloc_internal_inlines_c.h:53:8
    frame #12: 0x000000082967c818 libc.so.7`__je_malloc_default [inlined] imalloc_no_sample(sopts=<unavailable>, dopts=<unavailable>, tsd=0x00003e616a889090, size=<unavailable>, usize=32768, ind=<unavailable>) at jemalloc_jemalloc.c:1953:9
    frame #13: 0x000000082967c818 libc.so.7`__je_malloc_default [inlined] imalloc_body(sopts=<unavailable>, dopts=<unavailable>, tsd=0x00003e616a889090) at jemalloc_jemalloc.c:2153:16
    frame #14: 0x000000082967c818 libc.so.7`__je_malloc_default [inlined] imalloc(sopts=<unavailable>, dopts=<unavailable>) at jemalloc_jemalloc.c:2262:10
    frame #15: 0x000000082967c74a libc.so.7`__je_malloc_default(size=<unavailable>) at jemalloc_jemalloc.c:2293:2
    frame #16: 0x00000000009c0577 postgres`AllocSetContextCreateInternal + 199
    frame #17: 0x000000000058e56c postgres`StartTransaction + 332
    frame #18: 0x000000000058e3be postgres`StartTransactionCommand + 30
    frame #19: 0x00000000009a82fd postgres`InitPostgres + 461
    frame #20: 0x00000000007b8e0d postgres`AutoVacWorkerMain + 765
    frame #21: 0x00000000007b8ac7 postgres`StartAutoVacWorker + 39
    frame #22: 0x00000000007c0c21 postgres`sigusr1_handler + 785
    frame #23: 0x0000000822ae79b6 libthr.so.3`handle_signal(actp=0x0000000820c5c600, sig=30, info=0x0000000820c5c9f0, ucp=0x0000000820c5c680) at thr_sig.c:318:3
    frame #24: 0x0000000822ae6eba libthr.so.3`thr_sighandler(sig=30, info=0x0000000820c5c9f0, _ucp=0x0000000820c5c680) at thr_sig.c:261:2
    frame #25: 0x00000008210662d3
    frame #26: 0x00000000007c2545 postgres`ServerLoop + 1605
    frame #27: 0x00000000007bffa3 postgres`PostmasterMain + 3251
    frame #28: 0x0000000000720601 postgres`main + 801
    frame #29: 0x000000082958015c libc.so.7`__libc_start1(argc=4, argv=0x0000000820c5d6b0, env=0x0000000820c5d6d8, cleanup=<unavailable>, mainX=(postgres`main)) at libc_start1.c:180:7
    frame #30: 0x00000000004ff4e4 postgres`_start + 36

And this is the commentary by Tom Lane.

What I get from it is:
fork() inside a signal handler might be a problem when subsequent malloc() calls happen.

Anybody here know something about that?
 
What I get from it is:
fork() inside a signal handler might be a problem when subsequent malloc() calls happen.

Anybody here know something about that?

fork(2) is legal to call from a signal handler. malloc(3) most certainly is not.

Are you saying the malloc is inside the signal handler?
 
You could try,
Code:
service postgresql stop
pg_checksums -D /usr/local/pgsql/data --check
service postgresql start

Otherwise dump & re-import,
Code:
pg_dumpall -f full_backup.sql
 
fork(2) is legal to call from a signal handler. malloc(3) most certainly is not.

Are you saying the malloc is inside the signal handler?

That seems to depend on the viewpoint: with the fork() inside the signal handler, the entire new process might be perceived as being inside the signal handler - and that new process will certainly do malloc() occasionally.
 
That seems to depend on the viewpoint: with the fork() inside the signal handler, the entire new process might be perceived as being inside the signal handler - and that new process will certainly do malloc() occasionally.

Surely it does a return somewhere?
 
That again led nowhere. I have now upgraded to PG r16, which does not use that signal-handler construct anymore.
Nevertheless the crashes continue to happen. This time it's gethostbyaddr():

Code:
* thread #1, name = 'postgres', stop reason = signal SIGBUS
  * frame #0: 0x000000082bba3159 libc.so.7`extent_arena_get [inlined] extent_arena_ind_get(extent=0x79f696918ed45a56) at extent_inlines.h:40:23
    frame #1: 0x000000082bba3159 libc.so.7`extent_arena_get(extent=0x79f696918ed45a56) at extent_inlines.h:49:23
    frame #2: 0x000000082bba3a14 libc.so.7`extent_can_coalesce(arena=0x00003d43fd800980, extents=0x00003d43fd8058d8, inner=0x00003d43fd90f080, outer=0x79f696918ed45a56) at jemalloc_extent.c:1565:6
    frame #3: 0x000000082bba363b libc.so.7`extent_try_coalesce_impl(tsdn=0x00003d43fd67a090, arena=0x00003d43fd800980, r_extent_hooks=0x0000000820af5198, rtree_ctx=0x00003d43fd67a0c0, extents=0x00003d43fd8058d8, extent=0x00003d43fd90f080, coalesced=0x0000000000000000, growing_retained=true, inactive_only=false) at jemalloc_extent.c:1628:24
    frame #4: 0x000000082bba3448 libc.so.7`extent_try_coalesce(tsdn=0x00003d43fd67a090, arena=0x00003d43fd800980, r_extent_hooks=0x0000000820af5198, rtree_ctx=0x00003d43fd67a0c0, extents=0x00003d43fd8058d8, extent=0x00003d43fd90f080, coalesced=0x0000000000000000, growing_retained=true) at jemalloc_extent.c:1680:9
    frame #5: 0x000000082bba055f libc.so.7`extent_record(tsdn=0x00003d43fd67a090, arena=0x00003d43fd800980, r_extent_hooks=0x0000000820af5198, extents=0x00003d43fd8058d8, extent=0x00003d43fd90f080, growing_retained=true) at jemalloc_extent.c:1719:12
    frame #6: 0x000000082bba6043 libc.so.7`extent_grow_retained(tsdn=0x00003d43fd67a090, arena=0x00003d43fd800980, r_extent_hooks=0x0000000820af5198, size=65536, pad=4096, alignment=64, slab=false, szind=44, zero=0x0000000820af51ef, commit=0x0000000820af5197) at jemalloc_extent.c:1385:4
    frame #7: 0x000000082bba0f3f libc.so.7`extent_alloc_retained(tsdn=0x00003d43fd67a090, arena=0x00003d43fd800980, r_extent_hooks=0x0000000820af5198, new_addr=0x0000000000000000, size=65536, pad=4096, alignment=64, slab=false, szind=44, zero=0x0000000820af51ef, commit=0x0000000820af5197) at jemalloc_extent.c:1482:12
    frame #8: 0x000000082bba0d39 libc.so.7`__je_extent_alloc_wrapper(tsdn=0x00003d43fd67a090, arena=0x00003d43fd800980, r_extent_hooks=0x0000000820af5198, new_addr=0x0000000000000000, size=65536, pad=4096, alignment=64, slab=false, szind=44, zero=0x0000000820af51ef, commit=0x0000000820af5197) at jemalloc_extent.c:1541:21
    frame #9: 0x000000082bb7a87d libc.so.7`__je_arena_extent_alloc_large(tsdn=<unavailable>, arena=0x00003d43fd800980, usize=65536, alignment=<unavailable>, zero=0x0000000820af51ef) at jemalloc_arena.c:448:12
    frame #10: 0x000000082bba77b0 libc.so.7`__je_large_palloc(tsdn=0x00003d43fd67a090, arena=<unavailable>, usize=<unavailable>, alignment=64, zero=<unavailable>) at jemalloc_large.c:47:43
    frame #11: 0x000000082bba7612 libc.so.7`__je_large_malloc(tsdn=<unavailable>, arena=<unavailable>, usize=<unavailable>, zero=<unavailable>) at jemalloc_large.c:17:9 [artificial]
    frame #12: 0x000000082bb7c477 libc.so.7`__je_arena_malloc_hard(tsdn=<unavailable>, arena=<unavailable>, size=<unavailable>, ind=<unavailable>, zero=<unavailable>) at jemalloc_arena.c:1528:9 [artificial]
    frame #13: 0x000000082bb6f5a7 libc.so.7`__je_malloc_default [inlined] arena_malloc(tsdn=0x00003d43fd67a090, arena=0x0000000000000000, size=<unavailable>, ind=<unavailable>, zero=false, tcache=0x00003d43fd67a280, slow_path=false) at arena_inlines_b.h:176:9
    frame #14: 0x000000082bb6f598 libc.so.7`__je_malloc_default [inlined] iallocztm(tsdn=0x00003d43fd67a090, size=<unavailable>, ind=<unavailable>, zero=false, tcache=0x00003d43fd67a280, is_internal=false, arena=0x0000000000000000, slow_path=false) at jemalloc_internal_inlines_c.h:53:8
    frame #15: 0x000000082bb6f598 libc.so.7`__je_malloc_default [inlined] imalloc_no_sample(sopts=<unavailable>, dopts=<unavailable>, tsd=0x00003d43fd67a090, size=<unavailable>, usize=65536, ind=<unavailable>) at jemalloc_jemalloc.c:1953:9
    frame #16: 0x000000082bb6f598 libc.so.7`__je_malloc_default [inlined] imalloc_body(sopts=<unavailable>, dopts=<unavailable>, tsd=0x00003d43fd67a090) at jemalloc_jemalloc.c:2153:16
    frame #17: 0x000000082bb6f598 libc.so.7`__je_malloc_default [inlined] imalloc(sopts=<unavailable>, dopts=<unavailable>) at jemalloc_jemalloc.c:2262:10
    frame #18: 0x000000082bb6f4ca libc.so.7`__je_malloc_default(size=<unavailable>) at jemalloc_jemalloc.c:2293:2
    frame #19: 0x000000082bb6fa2d libc.so.7`__malloc(size=<unavailable>) at jemalloc_jemalloc.c:0 [artificial]
    frame #20: 0x000000082bad08a4 libc.so.7`_dns_gethostbyaddr(rval=0x0000000820af5a90, cb_data=<unavailable>, ap=<unavailable>) at gethostbydns.c:619:13
    frame #21: 0x000000082badeab2 libc.so.7`_nsdispatch(retval=0x0000000820af5a90, disp_tab=0x000000082bbd8800, database="", method_name="", defaults=<unavailable>) at nsdispatch.c:726:14
    frame #22: 0x000000082bad2be8 libc.so.7`gethostbyaddr_r(addr=0x0000000820af5ae0, len=<unavailable>, af=<unavailable>, hp=0x000000082bbebda0, buf="", buflen=8800, result=0x0000000820af5a90, h_errnop=0x0000000820af5a8c) at gethostnamadr.c:650:9
    frame #23: 0x000000082bad34f9 libc.so.7`gethostbyaddr(addr=0x0000000820af5ae0, len=16, af=28) at gethostnamadr.c:700:6
    frame #24: 0x000000082baddcd8 libc.so.7`getipnodebyaddr(src=0x0000000820af5ae0, len=<unavailable>, af=28, errp=0x0000000820af5b50) at name6.c:378:7
    frame #25: 0x000000082bad4242 libc.so.7`getnameinfo_inet(afd=0x000000082bbd8980, sa=0x00003d43fda5e098, salen=<unavailable>, host=<unavailable>, hostlen=<unavailable>, serv=<unavailable>, servlen=0, flags=4) at getnameinfo.c:311:8
    frame #26: 0x000000082bad405d libc.so.7`getnameinfo(sa=<unavailable>, salen=<unavailable>, host=<unavailable>, hostlen=<unavailable>, serv=<unavailable>, servlen=<unavailable>, flags=4) at getnameinfo.c:157:10
    frame #27: 0x0000000000a85081 postgres`pg_getnameinfo_all + 177
    frame #28: 0x0000000000774262 postgres`hba_getauthmethod + 1202
    frame #29: 0x000000000076a412 postgres`ClientAuthentication + 50
    frame #30: 0x0000000000a49fd1 postgres`InitPostgres + 2273
    frame #31: 0x00000000008eac4d postgres`PostgresMain + 285
    frame #32: 0x0000000000857108 postgres`BackendRun + 40
    frame #33: 0x0000000000855a1a postgres`ServerLoop + 7866
    frame #34: 0x000000000085300e postgres`PostmasterMain + 3278
    frame #35: 0x000000000077bac3 postgres`main + 803
    frame #36: 0x000000082ba72edc libc.so.7`__libc_start1(argc=4, argv=0x0000000820af8700, env=0x0000000820af8728, cleanup=<unavailable>, mainX=(postgres`main)) at libc_start1.c:180:7
    frame #37: 0x0000000000556de4 postgres`_start + 36
 
take it down and do a memtest, or clone the installation to another piece of hardware to see if it blows up similarly.
 
In addition to potential memory problems, I wouldn't rule out disk problems. I'd keep reliable, up to date backups if I were you and run a few S.M.A.R.T. tests and keep an eye for any sudden CAM status errors/failure messages ( tail -f /var/log/messages).
 
take it down and do a memtest,
Hm. I thought the purpose of the memory RAS features was to not need to take the machine down, but to have automatic detection of defective memory. The BIOS of the machine shows support for the (Xeon E5) advanced ECC Lockstep, Sparing and Mirroring modes, and while my memory configuration may not be symmetrical enough for these features to work, I was assuming that it should at least report corrected errors (and, even more so, uncorrected ones).

There may still be a hardware problem - but then I do not think it is a memory chip issue.

or clone the installation to another piece of hardware to see if it blows up similarly.
Up to now it does not even happen on the similarly configured neighbour jail on the same machine.
 
In addition to potential memory problems, I wouldn't rule out disk problems. I'd keep reliable, up to date backups if I were you and run a few S.M.A.R.T. tests and keep an eye for any sudden CAM status errors/failure messages ( tail -f /var/log/messages).
Sorry, but that doesn't really help. I am trying to pinpoint this to a possible cause, and while some aspects actually speak for a hardware issue (e.g. that it appeared out of thin air at the beginning of February, that it seems to happen more frequently over time, that it appears more likely during high memory demand but does not strictly adhere to that condition, and that it occasionally hits multiple independently running programs at almost the same time), other aspects make a hardware defect rather unlikely (e.g. that it always crashes at the same code line with the same error, in only one and the same application, in only one and the same jail). I would assume that jemalloc is used by a lot of programs, so why does only one installation of one program have problems?

What I am doing with the data is an entirely different matter. Actually data is copied out to Neuschwabenland, Antarctica every night automatically. The machine has currently some 22 disks attached, and these are daily monitored and (in case of mechanical devices) run surface analysis every three weeks (for SSD, surface analysis has no benefit).
 
ECC memory can still fail. Are you looking at the syslog?

Not to mention CPU or chipset failure.

It's also SIGBUS, not SIGSEGV.
 
ECC memory can still fail. Are you looking at the syslog?
Yes, everything can fail, and there also is Murphy's law. But it should either report an error or halt the machine then. There are messages when it still can fix the error, and a checkstop when it can no longer. And yes, I'm reading that syslog (at least when something does misbehave).

There can certainly be an error that is somehow different from those errors that are covered by the error recovery. But there is also probability, and the improbability level seems quite high already - unless one considers improbability as synchronistically meaningful ;)

Not to mention CPU or chipset failure.

Let's see. The chipset is common to all tasks, but it mainly handles I/O, not memory. The CPU I can choose with cpuset, and with it the on-socket memory controller and consequently the physical memstick.

It's also SIGBUS, not SIGSEGV.

What do You mean?
Maybe I don't fully understand this, but what I have learned is: SIGSEGV means the process does memory access to a place where it isn't allowed to. And SIGBUS means the process does memory access to some place where memory is not even supposed to be.

In this case:

frame #1: 0x000000082982c159 libc.so.7`extent_arena_get(extent=0xe678eaf4d86f3741) at extent_inlines.h:49:23
frame #1: 0x000000082bba3159 libc.so.7`extent_arena_get(extent=0x79f696918ed45a56) at extent_inlines.h:49:23
frame #1: 0x0000000829521159 libc.so.7`extent_arena_get(extent=0x4d26ee6c052a0000) at extent_inlines.h:49:23

It seems to me that this 'extent' address happens to be more-or-less random bits, which in some cases might result in SIGBUS and in others SIGSEGV - and the first message in this thread shows examples of both from the same line of code.

But that all does not yet determine one or the other cause.
For now I am going to re-enable NUMA, and swap the CPU sockets between the jail that does and the one that does not crash...
 
- Can you move everything to one filesystem?
- Use one simple method of authentication/security.

Another approach to problem solving can sometimes be to remove complexity.

And if the software is important but virtualised/emulated, moving it to another place/hardware is a piece of cake.
 
The crashes continue to happen. It killed my Sunday backups in mid-flight.

After upgrading to 14.4 and postgres to r16, I have now moved compute to the other socket and memory to different controllers and channels, and the crash is still the same:

Code:
* thread #1, name = 'postgres', stop reason = signal SIGBUS
  * frame #0: 0x000000082baf5159 libc.so.7`extent_arena_get [inlined] extent_arena_ind_get(extent=0xbcf4419bcf298ee9) at extent_inlines.h:40:23
    frame #1: 0x000000082baf5159 libc.so.7`extent_arena_get(extent=0xbcf4419bcf298ee9) at extent_inlines.h:49:23
    frame #2: 0x000000082baf5a14 libc.so.7`extent_can_coalesce(arena=0x00001552a60008c0, extents=0x00001552a6005818, inner=0x00001552a600a1c0, outer=0xbcf4419bcf298ee9) at jemalloc_extent.c:1565:6
    frame #3: 0x000000082baf563b libc.so.7`extent_try_coalesce_impl(tsdn=0x00001552a347a090, arena=0x00001552a60008c0, r_extent_hooks=0x000000082147d7d8, rtree_ctx=0x00001552a347a0c0, extents=0x00001552a6005818, extent=0x00001552a600a1c0, coalesced=0x0000000000000000, growing_retained=true, inactive_only=false) at jemalloc_extent.c:1628:24

One can see the arena pointers are now at some other banks of memory, but there is that same stray pointer outer=0xbcf4419bcf298ee9. It appears when calling extent_can_coalesce() as a value formerly returned by extent_lock_from_addr():

Code:
        extent_t *next = extent_lock_from_addr(tsdn, rtree_ctx,
            extent_past_get(extent), inactive_only);
        if (next != NULL) {
            /*
             * extents->mtx only protects against races for
             * like-state extents, so call extent_can_coalesce()
             * before releasing next's pool lock.
             */
            bool can_coalesce = extent_can_coalesce(arena, extents,
                extent, next);

Still in extent.c:

Code:
static extent_t *
extent_lock_from_addr(tsdn_t *tsdn, rtree_ctx_t *rtree_ctx, void *addr,
    bool inactive_only) {
    extent_t *ret = NULL;
    rtree_leaf_elm_t *elm = rtree_leaf_elm_lookup(tsdn, &extents_rtree,
        rtree_ctx, (uintptr_t)addr, false, false);
...

Then in rtree.h:

Code:
JEMALLOC_ALWAYS_INLINE rtree_leaf_elm_t *
rtree_leaf_elm_lookup(tsdn_t *tsdn, rtree_t *rtree, rtree_ctx_t *rtree_ctx,
    uintptr_t key, bool dependent, bool init_missing) {
    assert(key != 0);
    assert(!dependent || !init_missing);

    size_t slot = rtree_cache_direct_map(key);
    uintptr_t leafkey = rtree_leafkey(key);
    assert(leafkey != RTREE_LEAFKEY_INVALID);

    /* Fast path: L1 direct mapped cache. */
    if (likely(rtree_ctx->cache[slot].leafkey == leafkey)) {
        rtree_leaf_elm_t *leaf = rtree_ctx->cache[slot].leaf;
        assert(leaf != NULL);
        uintptr_t subkey = rtree_subkey(key, RTREE_HEIGHT-1);
        return &leaf[subkey];
    }
...

uintptr_t key is produced as extent_past_get(extent); these are functions in extent_inlines.h,
basically the sum of extent->e_addr and extent->e_size_esn:

Code:
(lldb) p *extent
(extent_t) {
  e_bits = 17592246919328
  e_addr = 0x00001552a8801000
   = (e_size_esn = 6287360, e_bsize = 6287360)

Code:
#define EXTENT_SIZE_MASK    ((size_t)~(PAGE-1))
#define PAGE_ADDR2BASE(a)   ((void *)((uintptr_t)(a) & ~PAGE_MASK))
#define PAGE_MASK           ((size_t)(PAGE - 1))
#define PAGE                ((size_t)(1U << LG_PAGE))
#define   LG_PAGE                 PAGE_SHIFT
#define PAGE_SHIFT 12

So key is 0x00001552a8e00000.

rtree_cache_direct_map() (in rtree.h) does compute from various #defines:

Code:
JEMALLOC_ALWAYS_INLINE size_t
rtree_cache_direct_map(uintptr_t key) {
    unsigned ptrbits = ZU(1) << (LG_SIZEOF_PTR+3);
    unsigned cumbits = (rtree_levels[RTREE_HEIGHT-1].cumbits -
        rtree_levels[RTREE_HEIGHT-1].bits);
    unsigned maskbits = ptrbits - cumbits;
    return (size_t)((key >> maskbits) & (RTREE_CTX_NCACHE - 1));
}

Code:
#define ZU(z)    ((size_t)z)
#define LG_SIZEOF_PTR 3           # for amd64

typedef struct rtree_level_s rtree_level_t;
struct rtree_level_s {
    unsigned        bits;
    unsigned        cumbits;
};

static const rtree_level_t rtree_levels[] = {
#if RTREE_HEIGHT == 1
    {RTREE_NSB, RTREE_NHIB + RTREE_NSB}
#elif RTREE_HEIGHT == 2
    {RTREE_NSB/2, RTREE_NHIB + RTREE_NSB/2},
    {RTREE_NSB/2 + RTREE_NSB%2, RTREE_NHIB + RTREE_NSB}
#elif RTREE_HEIGHT == 3
    {RTREE_NSB/3, RTREE_NHIB + RTREE_NSB/3},
    {RTREE_NSB/3 + RTREE_NSB%3/2,
        RTREE_NHIB + RTREE_NSB/3*2 + RTREE_NSB%3/2},
    {RTREE_NSB/3 + RTREE_NSB%3 - RTREE_NSB%3/2, RTREE_NHIB + RTREE_NSB}
...

#if RTREE_NSB <= 10
#  define RTREE_HEIGHT 1
#elif RTREE_NSB <= 36
#  define RTREE_HEIGHT 2
#elif RTREE_NSB <= 52
#  define RTREE_HEIGHT 3

#define RTREE_NSB      (LG_VADDR - RTREE_NLIB)
#define RTREE_NLIB     LG_PAGE               
#define RTREE_NHIB ((1U << (LG_SIZEOF_PTR+3)) - LG_VADDR)

#ifdef _USE_LG_VADDR_WIDE
#  define LG_VADDR              64
#else
#  define LG_VADDR              48
#endif

#define RTREE_CTX_NCACHE (1 << RTREE_CTX_LG_NCACHE)   
#define RTREE_CTX_LG_NCACHE 4

Here it gets somehow interesting: _USE_LG_VADDR_WIDE got changed per https://reviews.freebsd.org/D46686

We get ptrbits = 64, cumbits = 34, maskbits = 30, and return 0xa, so (back up in rtree_leaf_elm_lookup()) slot = 0xa.

rtree_leafkey() (also in rtree.h) does mostly the same compute again:

Code:
JEMALLOC_ALWAYS_INLINE uintptr_t
rtree_leafkey(uintptr_t key) {
    unsigned ptrbits = ZU(1) << (LG_SIZEOF_PTR+3);
    unsigned cumbits = (rtree_levels[RTREE_HEIGHT-1].cumbits -
        rtree_levels[RTREE_HEIGHT-1].bits);
    unsigned maskbits = ptrbits - cumbits;
    uintptr_t mask = ~((ZU(1) << maskbits) - 1);
    return (key & mask);
}

We get ptrbits = 64, cumbits = 34, maskbits = 30, mask = ~(2^30-1), and return 0x155280000000, which becomes (back up in rtree_leaf_elm_lookup()) leafkey.
Code:
(lldb) p rtree_ctx->cache[0xa]
(rtree_ctx_cache_elm_t) {
  leafkey = 23444078985216
  leaf = 0x00001552a3a000c0
}

0x155280000000 equals 23444078985216, so we enter the if-clause and get leaf = 0x00001552a3a000c0.

rtree_subkey() (in rtree.h) does again compute from #defines:

Code:
JEMALLOC_ALWAYS_INLINE uintptr_t
rtree_subkey(uintptr_t key, unsigned level) {
    unsigned ptrbits = ZU(1) << (LG_SIZEOF_PTR+3);
    unsigned cumbits = rtree_levels[level].cumbits;
    unsigned shiftbits = ptrbits - cumbits;
    unsigned maskbits = rtree_levels[level].bits;
    uintptr_t mask = (ZU(1) << maskbits) - 1;
    return ((key >> shiftbits) & mask);
}

ptrbits = 64, cumbits = 52, shiftbits = 12, maskbits = 18, mask = 2^18-1, and (uplevel) subkey becomes 0x28e00.

Finally, the return value of rtree_leaf_elm_lookup() is &leaf[subkey], and looking at this:

Code:
(lldb) p ((rtree_leaf_elm_t *)(0x00001552a3a000c0))[0x28e00]
(rtree_leaf_elm_t) {
  le_extent = (repr = 0xbcf4419bcf298ee9)
  le_szind = (repr = 2614068281)
  le_slab = (repr = true)
}

Here is the crappy pointer! So at least the walk through the code with all its computations was done correctly.

But what to make of this now? I have no idea how jemalloc is supposed to work; to me an array index of 0x28e00 looks a bit big, but then again this is designed for la57 5-level paging, so things might be big.

Looking at the whole span of that array, most of it consists of 0-bytes, while some areas do indeed contain proper values for rtree_leaf_elm_t. Looking closer:

Code:
from 0x1552a3a000c0 to 0x1552a3c380b0   0-bytes                                 145408 recs 568 pg
from 0x1552a3c380c0 to 0x1552a3c3a0b0   valid struct mixed w/ some 0-bytes         512 recs   2 pg
from 0x1552a3c3a0c0 to 0x1552a3c400b0   0-bytes                                   1536 recs   6 pg
from 0x1552a3c400c0 to 0x1552a3c540d0   valid struct mixed w/ some 0-bytes        5122 recs
from 0x1552a3c540e0 to 0x1552a3c56ff0   0-bytes                                    754 recs
from 0x1552a3c57000 to 0x1552a3c57ff0   dense garbage                              256 recs   1 pg
from 0x1552a3c58000 to 0x1552a3c69ff0   3 valid structs, otherwise 0-bytes        4608 recs  18 pg
from 0x1552a3c6a000 to 0x1552a3c6aff0   dense garbage                              256 recs   1 pg
from 0x1552a3c6b000 to 0x1552a3c8dff0   3 valid structs, otherwise 0-bytes        8960 recs  35 pg
from 0x1552a3c8e000 to 0x1552a3c8eff0   dense garbage                              256 recs   1 pg
from 0x1552a3c8f000 to 0x1552a3d0d4b0   0-bytes
no idea how long this is supposed to be
 