Samba smbd abnormally high startup memory usage

It took me a long time to notice, but smbd is showing hundreds of megabytes of memory usage per process, even at startup and with a blank config file. I've used it for a long time and don't remember it ever standing out, so it must have been in the 30MB range before, not the 3×250MB it is now.

Freshly installed FreeBSD 14.3 VM, before and after starting smbd 4.19 or 4.20:
Code:
Mem:  10M Active,  860K Inact, 201M Wired, 25M Buf, 3740M Free
Mem: 310M Active, 2424K Inact, 927M Wired, 38M Buf, 2712M Free
SIZE RES
292M 485M smbd --daemon
251M 407M smbd: notifyd
251M 407M smbd: cleanupd
For comparison, a fresh Ubuntu 24.04 LTS VM with smbd 4.19.5:
Code:
Mem: 3824.7 total, 3350.8 free, 475.3 used, 217.8 buff
Mem: 3824.7 total, 3337.2 free, 481.8 used, 228.1 buff
VIRT RES SHR
 88M 25M 22M smbd
 85M  6M  3M smbd-notifyd
 85M  6M  2M smbd-cleanupd
This is what I'd expect the numbers to look like, and I'm pretty sure this is what they looked like at some point. I have no idea what caused the change or how to determine its cause. The FreeBSD port carries 47 patches, so it could be one of them. It could also be bad code in the codebase.
 
I find the virtual size difference to be even more puzzling.

/proc/$pid/map might have more clues on FreeBSD, /proc/$$/maps on Linux.
 
The vnode-backed entries add up to 39MB. The swap-backed entries are 43.5MB. There's only one other that stands out...
Code:
( 36kB) 0x2f24bd8a3000 0x2f24bd8ac000  9 0 0xfffff8000307c000 r--  6 0 0x1000  COW  NC vnode /usr/local/sbin/smbd NCH -1
( 28kB) 0x2f24bd8ac000 0x2f24bd8b3000  7 0 0xfffff8000307c000 r-x  6 0 0x1000  COW  NC vnode /usr/local/sbin/smbd NCH -1
(  4kB) 0x2f24bd8b3000 0x2f24bd8b4000  1 0 0xfffff8002ca14528 r--  3 0 0x0210  COW  NC swap  - CH 0
(  4kB) 0x2f24bd8b4000 0x2f24bd8b5000  1 0 0xfffff800a5abe528 rw-  1 0 0x3310  COW NNC swap  - CH 0
(511.875MB) 0x2f2cbde9c000 0x2f2cdde7c000 0 0 0               ---  0 0 0x0000 NCOW NNC none  - NCH -1
(128kB) 0x2f2cdde7c000 0x2f2cdde9c000 10 0 0xfffff800d1f35420 rw-  1 0 0x3310  COW NNC swap  - CH 0
(  4kB) 0x2f2cde70d000 0x2f2cde70e000  1 0 0xfffff80003d71e70 r-x 25 0 0x6000 NCOW NNC phys  - NCH -1
( 36kB) 0x2f2cdec0f000 0x2f2cdec18000  9 0 0xfffff8002cbed000 r-- 20 6 0x1000  COW  NC vnode /usr/local/lib/samba4/private/libsecrets3-private-samba.so NCH -1
...
Meanwhile, the Linux proc dump is simple: almost everything is .so files, plus a 500kB heap and a small stack, and that's about it.

So, this thing. It appears very early, after a 32GB gap from the main executable. It's 512MB minus 128kB, followed by 128kB and then a single 'phys' (guard page?), followed by the initial executable mappings with no gaps. It is non-swappable, non-accessible and has no kernel obj. This makes me wonder if this is some botched DATA segment force-fed to the linker. Maybe I should dissect sbin/smbd...
 
Running `procstat vm` does produce a slightly more useful listing.
Here it is in full in case you need it (5MB download): https://www.easypaste.org/file/oBCh7Ub2/procstat.txt
The strange 512MB block identifies as 'gd' (guard VM object pseudo-type).
There are 60,000 lines identifying with the path 'posixshm@anon'; the proc map last time only listed 10,000 of them. Each is a single 4kB page. Some are clustered, some have gaps of up to 128MB between them. I don't know what this is either, but it would add up to 240MB total, which seems to correspond to the process SIZE.
 
Can you post a /proc/$$/smaps from Linux, to compare resident parts of mappings?

Also, it would be good to use an upload site that doesn't require downloading. Plain text files should just display in the browser.
 
Here is the Linux equivalent proc map: https://pastebin.com/raw/yWC6KsTS
I really tried, but I couldn't find any paste site that would accept a 5MB blob, wouldn't completely freeze the browser for minutes, and would still allow viewing the raw text. I also did not attempt to paste it directly into the thread as a code block.
 
I have no idea what smbd is doing with shared memory, and I can't find references to it in the Samba source. Maybe the FreeBSD port pulls in a library that does it?

Syscall tracing and then breakpointing on the posixshm usage would be the next step.
 
The 30960 shared memory segments I have here are created by calls to get process-shared locks.

Code:
_umtx_op(0x0,UMTX_OP_SHM,0x4,0x245d5c7443b8,0x0) ERR#3 'No such process'
_umtx_op(0x0,UMTX_OP_SHM,0x1,0x245d5c7443b8,0x0) = 15 (0xf)
mmap(0x0,4096,PROT_READ|PROT_WRITE,MAP_SHARED,15,0x0) = 39988531408896 (0x245e8e
close(15) = 0 (0x0)
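For reference, here is a minimal C sketch (my own test, not Samba code) that should reproduce the same syscall pair, assuming a plain process-shared mutex init is all it takes:
Code:
/* compile with: cc -o pshared pshared.c -lpthread */
#include <pthread.h>
#include <stdio.h>

int main(void)
{
    pthread_mutexattr_t ma;
    pthread_mutex_t m;

    pthread_mutexattr_init(&ma);
    /* the attribute that makes libthr put the lock in shared memory */
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    /* under truss, this should show _umtx_op(UMTX_OP_SHM) + mmap(4096) */
    pthread_mutex_init(&m, &ma);
    pthread_mutexattr_destroy(&ma);
    printf("one shared mutex, one 4kB page\n");
    return 0;
}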
 
Thanks, I think I got something. The above syscalls happen in __thr_pshared_offpage().
Code:
Catchpoint 1 (call to syscall _umtx_op), _umtx_op () at _umtx_op.S:4
#0  _umtx_op () at _umtx_op.S:4
#1  in pshared_clean at /usr/src/lib/libthr/thread/thr_pshared.c:211
#2  in __thr_pshared_offpage at /usr/src/lib/libthr/thread/thr_pshared.c:234
#3  in __Tthr_mutex_init at /usr/src/lib/libthr/thread/thr_mutex.c:397
#4  in tdb_mutex_init at ../../common/mutex.c:591
#5  in tdb_new_database at ../../common/open.c:146
#6  in tdb_open_ex (name="/var/db/samba4/smbXsrv_version_global.tdb", hash_size=131, tdb_flags=6145, open_flags=1049090, mode=384, log_ctx=0x7fffffffe178, hash_fn=0x0) at ../../common/open.c:489
#7  in tdb_wrap_private_open at ../../lib/tdb_wrap/tdb_wrap.c:109
#8  in tdb_wrap_open at ../../lib/tdb_wrap/tdb_wrap.c:158
#9  in db_open_tdb at ../../lib/dbwrap/dbwrap_tdb.c:484
#10 in dbwrap_local_open at ../../lib/dbwrap/dbwrap_local_open.c:38
#11 in db_open at ../../source3/lib/dbwrap/dbwrap_open.c:190
#12 in smbXsrv_version_global_init at ../../source3/smbd/smbXsrv_version.c:81
#13 in main at ../../source3/smbd/server.c:1949
Of particular interest is this loop in tdb_mutex_init():
Code:
     for (i=0; i<tdb->hash_size+1; i++) {
         /* hash_size+1 process-shared mutexes: one per hash chain
            (the extra one covers the freelist); &ma carries the
            PTHREAD_PROCESS_SHARED attribute */
         pthread_mutex_t *chain = &m->hashchains[i];
         ret = pthread_mutex_init(chain, &ma);
         if (ret != 0) {
             goto fail;
         }
     }
So each tdb creates an array of mutexes, presumably to facilitate parallel operations. Unfortunately, there's 60k of them...
Code:
smbXsrv_version_global.tdb : 131
smbXsrv_client_global.tdb : 131
smbXsrv_session_global.tdb : 10007
smbXsrv_tcon_global.tdb : 10007
brlock.tdb : 10007
locking.tdb : 10007
leases.tdb : 10007
~~~gencache.tdb : 10000~~~ (not posix mutexes)
smbXsrv_open_global.tdb : 10007
Each mutex structure is 96 bytes, but libthr allocates them with mmap(page_size), so the 5.5MB of data gets stretched over 235MB of allocations. And this memory seems not to be shareable between forks, so this memory load is multiplied (3 + numClients) times. DEFAULT_HASH_SIZE is 131, SMBD_VOLATILE_TDB_HASH_SIZE is 10007.
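A quick sanity check of those numbers (the hash sizes from the table above, plus the +1 per tdb from the init loop):
Code:
#include <stdio.h>

int main(void)
{
    /* the eight mutex-locked tdbs from the list above */
    int sizes[] = { 131, 131, 10007, 10007, 10007, 10007, 10007, 10007 };
    long n = 0;
    for (unsigned i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
        n += sizes[i] + 1;
    printf("mutexes: %ld\n", n);                          /* 60312 */
    printf("payload: %.1f MB\n", n * 96.0 / (1 << 20));   /* ~5.5MB */
    printf("mapped:  %.1f MB\n", n * 4096.0 / (1 << 20)); /* ~235.6MB */
    return 0;
}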
At this point I'd need more info to determine whether this behavior is intended and FreeBSD just implements it poorly (supposedly glibc pthreads use efficiently stored futexes?), or whether there's some malfunction or bad patch.
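For comparison, the same init on Linux; my assumption (unverified) is that glibc keeps a process-shared mutex inline in the pthread_mutex_t itself as a futex, so no extra page gets mapped per mutex:
Code:
#include <pthread.h>
#include <stdio.h>

int main(void)
{
    pthread_mutexattr_t ma;
    pthread_mutex_t m;

    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&m, &ma); /* no mmap or extra syscall expected here */
    /* prints 40 on glibc/x86-64: the whole lock lives in these bytes */
    printf("sizeof(pthread_mutex_t) = %zu\n", sizeof(pthread_mutex_t));
    return 0;
}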
 
Well, at the very least it allows me to do a quick and dirty edit when using bundled libs:
Code:
lib/tdb/common/tdb_private.h
-#define DEFAULT_HASH_SIZE 131
+#define DEFAULT_HASH_SIZE 1

source3/include/local.h
 /* tdb hash size for the databases having one entry per open file. */
-#define SMBD_VOLATILE_TDB_HASH_SIZE 10007
+#define SMBD_VOLATILE_TDB_HASH_SIZE 1
It is unclear what the full consequences of such a change are, and that code comment doesn't seem right, but whatever.
Code:
Mem:   9M Active,  672K Inact, 200M Wired, 24M Buf, 3741M Free
Mem:  55M Active, 1492K Inact, 256M Wired, 31M Buf, 3635M Free
SIZE RES
 53M  20M smbd --daemon
 51M  19M smbd: notifyd
 51M  19M smbd: cleanupd
These are the new numbers in a clean VM with a samba420 partial source build and a config containing just wins support = yes, rather than an empty one (that setting saves 50M for an unknown reason). On my mini server, with a more customized source build and a proper config, it's even less: 40M/30M across all 3 instances. At least this confirms that the tdb mutex allocation is indeed the sole factor. As to why this wasn't an issue before, maybe samba3 used a single global mutex or something; I'm still lacking a lot of info. Maybe I should open a bug report for this.
 
Next I had the idea to check the samba revision history for these defines.

Most of the above issue manifested after the Dec 2022 change https://github.com/samba-team/samba/commit/87fddbad, which "cranked up the hashsize" by applying it to 4 additional databases. The commit message says that by going with the much larger hash size we get O(1) lookup instead of O(n). That gave me the realization that tdb is an in-memory chained hash table, that the hash size controls the number of hash buckets (that's why both are prime numbers), and that they went with each bucket having its own mutex. So my patch is only really good for my light, single-user use case. Maybe switching everything to 131 (= 4MB allocation overhead) would be a balanced enough move:
Code:
#define SMBD_VOLATILE_TDB_HASH_SIZE DEFAULT_HASH_SIZE
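For intuition, here is a schematic of that tradeoff (my sketch, not tdb's actual code): a key hashes to one of hash_size chains, and that chain is searched linearly, so more buckets mean shorter chains and less lock contention, at the cost of one 4kB mutex page per bucket.
Code:
/* Schematic only, not tdb's actual code. */
unsigned int bucket_for(unsigned int key_hash, unsigned int hash_size)
{
    /* a prime hash_size spreads keys evenly across the buckets */
    return key_hash % hash_size;
}
/* expected lookup cost ~ O(records / hash_size)
 * mutex memory ~ (hash_size + 1) * 4kB per tdb
 * at hash_size 131: 8 tdbs * 132 pages * 4kB = ~4.1MB, the estimate above */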

Regarding FreeBSD, "process-shared locks support" was added as a one-and-done deal in 2016 in https://github.com/freebsd/freebsd-src/commit/1bdbd705 (not touched since). From the libthr code, it seems all mutexes are private by default and require an attribute change to become shared; in libtdb this happens just before the mutexes are initialized. So they should only be getting allocated once, and then mapped into the other forks for free.

I also revisited the oddity that is gencache.tdb. Its hash size is 10000, which is not a prime number. It is configurable via gencache:hash_size, but changing it to 1 has no noticeable impact on memory usage. It seems to come down to its tdb flags, which include TDB_MUTEX_LOCKING but not the TDB_CLEAR_IF_FIRST flag that mutex locking additionally requires (it is only honored with tdb >= 1.3.0 and together with TDB_CLEAR_IF_FIRST); the result is instead the mmap allocation of an array of 8-byte posix_mutex_t handles. I was incapable of following the spaghetti code further, but given the lack of observed excess allocation, it must use some sort of fallback mechanism. Whether any of this is intended, or a result of bad coding, is unclear.
 
I have read that tdb is supposedly a fancy volatile construct that operates purely on the mmap'd disk file and uses offset calculations instead of pointers to get around. What I don't get, then, is where all this memory usage is coming from. I inflated the volatile tdb hash size from 10000 to 100000 'to render the effect visible to the naked eye':
Code:
Mem:   11M Active,  848K Inact,  240M Wired, 24M Buf, 11640M Free
service samba_server onestart
service samba_server onestop
Mem:    9M Active, 1012K Inact,  801M Wired, 38M Buf,   ~11G Free
service samba_server onestart
Mem: 2417M Active, 2072K Inact, 6836M Wired, 38M Buf,  2397M Free
kill <notifyd>
Mem: 2417M Active, 2284K Inact, 4962M Wired, 38M Buf,  4265M Free
(no change if cleanupd is killed, presuming that that one forks cleanly)
So it seems 600k mutexes, which should have a raw overhead of 2344MB, cost around 8600MB across the three processes. Killing the subprocess temporarily reveals a split of 6735+1868.
Observing smbd as it starts, I see that for 28s at 1.00 load it allocates (+1967M active / +2824M wired), then it forks and the system immediately gets (+0M active / +1573M wired), then it continues for another 17s to allocate a further (+433M active / +1635M wired) just for the main process, which then shows an increase of (size=+432M, res=+845M) in 'top'. The forks show no CPU load during the entire thing.
I'm not exactly sure what I witnessed, but this makes me wonder if the problem is only partially the 2.3% allocation efficiency of the mutexes, and whether the core of the matter is instead the allocation of temporary structures to support the hashtables. I checked 'procstat vm' again, but none of the reported allocations account for the increase in memory usage (there's only around 30MB of actual allocations). That then makes me wonder if all this overhead is at the kernel level. Someone more familiar with the OS should start looking into this.
 
Code:
CONFIGURE_ARGS+= --disable-rpath \
--disable-rpath-install \
--bundled-libraries=NONE \
--builtin-libraries=replace \
--without-gettext \
--disable-tdb-mutex-locking

If you add this to the Makefile of databases/tdb, it will lock using fcntl() and all the mutexes are gone.

BUT
it seems you have to kill all the .tdb files and re-create them.
I nuked /var/db/samba4 and restarted (I had to rejoin the domain).
 
That is a curious suggestion. I have found a historical note from 2014 at lib/tdb/docs/mutex.txt and the commit that initially implemented this at db5bda56. As rationale for switching to posix mutexes, it talks about a then-relevant Linux kernel issue involving, I think, a single spinlock shared by all threads waiting on fcntl locks. It also mentions that the fcntl locks are implemented by locking the first byte of each hashtable linked list's head, which feels equivalent to using a mutex object; the performance might even be comparable. Though I have no idea if FreeBSD handles concurrent waits any better or worse than Linux. In my tiny zero-concurrency setup it would work just fine. I'll slip in that configure option the next time I rebuild samba.
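Out of curiosity, this is roughly what that byte-range locking looks like (a sketch based on my reading of mutex.txt; the real tdb offsets and code paths will differ):
Code:
#include <fcntl.h>

/* lock the single byte at offset 'off' in the tdb file; one byte per
 * hash chain head stands in for that chain's mutex */
static int chain_lock(int fd, off_t off)
{
    struct flock fl = {
        .l_type   = F_WRLCK,  /* exclusive lock */
        .l_whence = SEEK_SET,
        .l_start  = off,      /* first byte of the chain's list head */
        .l_len    = 1,        /* exactly one byte */
    };
    return fcntl(fd, F_SETLKW, &fl); /* block until acquired */
}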

Based on my extreme scaling test from earlier, FreeBSD uses (very roughly) 10kB of kernel memory and 4kB of user memory to implement one robust shared-memory posix mutex. I have learned of the command `vmstat -z`, which prints a listing of various kernel structures, but I don't know enough to draw conclusions from it.
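For anyone who wants to sanity-check that estimate, the arithmetic is:
Code:
/* ~600k mutexes from the inflated test; observed cost was ~8600MB */
#include <stdio.h>

int main(void)
{
    long n = 600000;
    double user_mb = n * 4096.0  / (1 << 20); /* one page each: ~2344MB */
    double kern_mb = n * 10240.0 / (1 << 20); /* ~10kB each: ~5859MB */
    printf("user %.0f MB + kernel %.0f MB = %.0f MB\n",
           user_mb, kern_mb, user_mb + kern_mb); /* ~8203MB, same ballpark */
    return 0;
}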
 