C/C++ Realistic estimate of the amount of RAM actually available

I'm trying to make a fairly realistic estimate of how much RAM is "really" available to a program at any given time (clearly it will vary; it's just an estimate).

By "really free" I mean the amount that a single program, possibly split across multiple threads, can allocate before causing a serious system slowdown, swapping and possibly even thrashing.

It is, in effect, an archiver that at a certain moment "takes" all the available RAM (minus 25%, in order not to slow the system down excessively) for a certain job.
On Windows the estimate I have made, empirically, is quite good.
But now I'm porting to BSD.

The header
...and
shows it is not so easy to answer

A very quick-and-dirty total RAM size:
Code:
#if defined(_SC_PHYS_PAGES) && defined(_SC_PAGE_SIZE)
    /* needs <unistd.h>; sysconf() returns -1 on error */
    long pages=sysconf(_SC_PHYS_PAGES);
    long page_size=sysconf(_SC_PAGE_SIZE);
    if (pages>0 && page_size>0)
        return pages*page_size;
#endif

I cannot (and do not want to) use complex libraries, dmesg parsing or command redirection (sysctl, top and so on).
Code:
root@aserver:/tmp/zp # top |grep -Em2 '^(Mem|Swap):'
Mem: 19M Active, 34G Inact, 188M Laundry, 26G Wired, 739M Buf, 1566M Free
Swap: 16G Total, 217M Used, 16G Free, 1% Inuse
Short version: I need a quick way to estimate the "Inact" memory (yes, there is the ARC that can be "grabbed" etc., but, as I said, better to stay on the safe side).
Something like the "available" column of free -m (on Linux).


Thanks to all responders
 
Hi
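In one application, this is the code I use to get the amount of free memory (free plus inactive pages, returned in KiB).

C:
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdint.h>

uint64_t get_free_memory(void)
{
        int pagesize=0;
        int inactive=0;
        int free_count=0;       /* renamed: "free" shadows free(3) */
        size_t size = sizeof free_count;

        sysctlbyname("vm.stats.vm.v_free_count", &free_count, &size, NULL, 0);
        size = sizeof inactive;
        sysctlbyname("vm.stats.vm.v_inactive_count", &inactive, &size, NULL, 0);
        size = sizeof pagesize;
        sysctlbyname("vm.stats.vm.v_page_size", &pagesize, &size, NULL, 0);
        /* widen to 64 bits before multiplying: with tens of GB inactive,
           the product does not fit in an int */
        return ((uint64_t)free_count + (uint64_t)inactive) * (uint64_t)pagesize / 1024;
}
This code is derived from top(1)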
 
Is the program going to do IO, in particular file system IO? In that case, it will need file system buffers. When I say "need", that means that performance of IO will be much better if it has sufficient buffer and cache memory. So if the userspace program first locks down (allocates) all the "free" memory, and then starts IO, it shoots itself in the foot.

You said you will reserve 25%; that might be enough to cover the need of the IO system; or it might not.
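
Since it is not clear whether 25% is enough, it may help to make the reserve a parameter rather than a constant. A minimal sketch, assuming the reclaimable amount has already been obtained as a page count and a page size (the names pages_to_bytes and ram_budget are hypothetical, not from the archiver):

```c
#include <assert.h>
#include <stdint.h>

/* Convert a page count to bytes using 64-bit arithmetic throughout. */
uint64_t pages_to_bytes(uint64_t pages, uint64_t page_size)
{
    return pages * page_size;
}

/* Return what is left of avail_bytes after keeping back reserve_percent
   (0..100) for the rest of the system, e.g. file system buffers. */
uint64_t ram_budget(uint64_t avail_bytes, unsigned reserve_percent)
{
    assert(reserve_percent <= 100);
    /* Split the percentage computation so that the multiplication
       cannot overflow 64 bits even for very large inputs. */
    uint64_t reserved = (avail_bytes / 100) * reserve_percent
                      + (avail_bytes % 100) * reserve_percent / 100;
    return avail_bytes - reserved;
}
```

With this shape the reserve can be raised for IO-heavy runs and lowered for pure in-memory work without touching the code that queries the OS.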
 
The archiver (a zpaq fork) does not write anything (unless you choose to, for real filesystem restorability; that is the -paranoid switch :)

It reads data (from disk), deduplicates/uncompresses it, then writes the extracted data "to a RAMDISK" (in fact memory obtained with malloc() of course, not a "real" ramdisk).

Then it does some things and, in some cases, writes the "ramdisk" to media, with one thread (on a spinning drive: no seeks, therefore max bandwidth, plus latency for small files of course) or with as many threads as you want (for SSD, NVMe and "real" RAMDISKs, max bandwidth).

It is necessary to check that, in the archive, the biggest single file is NOT bigger than the "real-free-RAM" (to prevent swapping and even thrashing).

From a performance point of view, there are two possibilities (that you can choose in my software)

- the first, the "greedy" one, which allocates all the memory it can to reduce the number of iterations (=the-largest-chunk-possible). This is the question of this thread. My smallest machine has 128GB, the largest 768GB

- the second, the "frugal" one, where the minimum necessary memory is allocated (=more iterations, but good for 8/16GB physical RAM)

Obviously this is not feasible for really large files (e.g. virtual disks); it essentially works for fileservers, where there are many files, but each a few tens of GB at the most (mbox, MP4 video etc).

Otherwise the use of a temporary disk is mandatory (no "ramdisk").

In this example (greedy) about 124GB will be extracted in 4 chunks of up to ~34GB each
Code:
73 versions, 175.679 files, 1.530.407 fragments, 88.416.098.834 bytes (82.34 GB)
Free RAM (-25%)               34.410.501.120 (as reported by OS)
Minimum needed  (+10%)         5.180.731.422 /tank/d/posta/postathun/locale/Trash
Chunks 0004 x                 34.410.501.120 (total decompressed size 124.758.792.572)

In this example (frugal) 22 iterations of 6GB each
Code:
73 versions, 175.679 files, 1.530.407 fragments, 88.416.098.834 bytes (82.34 GB)
Free RAM (-25%)               34.409.527.296 (as reported by OS)
Minimum needed  (+10%)         5.180.731.422 /tank/d/posta/postathun/locale/Trash
Chunks 0022 x                  6.000.000.000 (total decompressed size 124.758.792.572)
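
The numbers in the two listings can be roughly reproduced with plain integer arithmetic. This is only a sketch under assumptions: it takes "Minimum needed (+10%)" to mean the largest single file plus a 10% margin, and it treats the chunk count as a simple ceiling division; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if the largest file, plus a 10% safety margin, fits in the
   RAM budget; 0 means a temporary disk would be needed instead. */
int largest_file_fits(uint64_t largest_file, uint64_t budget)
{
    uint64_t needed = largest_file + largest_file / 10;
    return needed <= budget;
}

/* Lower bound on the number of passes: ceil(total_size / chunk_size). */
uint64_t min_chunks(uint64_t total_size, uint64_t chunk_size)
{
    assert(chunk_size > 0);
    return total_size / chunk_size + (total_size % chunk_size != 0);
}
```

For 124.758.792.572 bytes this gives 4 passes with a 34.410.501.120 byte chunk, matching the greedy listing. With a 6.000.000.000 byte chunk it gives 21, one fewer than the 22 reported in the frugal listing; presumably the real count is higher because a file is not split across chunks, so treat the result as a lower bound.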

Basically the goal is to use as much CPU (cores) and RAM as possible, so that the extraction is as CPU-bound (and not IO-bound) as possible, without wearing out the device with temporary files.

If you know zpaq, in particular unzpaq206, the reference decoder that operates essentially in RAM, my version is (obviously depending on the circumstances, type of CPU etc) about X times faster, where X is the number of CPU cores (just about linear). At least with 16 cores (Win). For more cores I need... the BSD version (just about all my servers run BSD)

Is it really not possible to obtain a better result?
I'm not really sure. It would be possible to read in parallel (= NVMe), but completely redoing the extraction procedure becomes really difficult.

One step at a time; it is a really complex piece of software
 