How should I query the level 1 data cache size in C?

On Linux I use sysconf(_SC_LEVEL1_DCACHE_SIZE),
but it seems FreeBSD does not have this.

I need to query the cache line size in the following code:

C:
// aligned to cache line to avoid false sharing
void *
allocate_shared(size_t size) {
    long cache_line_size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    assert(cache_line_size > 0);  // sysconf returns -1 on failure
    size_t real_size = ((size / (size_t)cache_line_size) + 1) * (size_t)cache_line_size;
    void *pointer = aligned_alloc((size_t)cache_line_size, real_size);
    assert(pointer);  // check before touching the memory
    assert(pointer_is_8_bytes_aligned(pointer));
    assert(pointer_is_cache_line_aligned(pointer));
    memset(pointer, 0, real_size);
    return pointer;
}
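The predicates `pointer_is_8_bytes_aligned` and `pointer_is_cache_line_aligned` are not defined in the post; here is a minimal sketch of what they presumably look like, assuming they just test the address modulo the alignment (these definitions and the fixed 64-byte line size are my assumptions, not the OP's code):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; the real code queries it via sysconf(). */
#define CACHE_LINE_SIZE 64

static bool pointer_is_aligned_to(const void *p, size_t alignment) {
    return ((uintptr_t)p % alignment) == 0;
}

static bool pointer_is_8_bytes_aligned(const void *p) {
    return pointer_is_aligned_to(p, 8);
}

static bool pointer_is_cache_line_aligned(const void *p) {
    return pointer_is_aligned_to(p, CACHE_LINE_SIZE);
}
```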
 
I think the OP was asking how to do it programmatically. macOS exposes this value via sysctl; FreeBSD doesn't seem to.
 

Code:
    int value = -1;
#if (defined(__APPLE__) || defined(__FreeBSD__) || defined(__NetBSD__)) && defined(HW_L2CACHESIZE)
    int mib[2] = {CTL_HW, HW_L2CACHESIZE};
    size_t len = sizeof(value);
    if (sysctl(mib, 2, &value, &len, NULL, 0) < 0) {
        return -1;  // error
    }
#endif
    return value;
 
Unfortunately, at least on the FreeBSD 14.2 amd64 machines we've seen, that sysctl node does not exist:
Code:
# sysctl hw.l2cachesize
sysctl: unknown oid 'hw.l2cachesize'
 
Are cache sizes even constant on any given machine now? What about Intel's E-cores, or AMD's dual-CCD X3D chips?

Also, using this for optimization is problematic: because caches are associative, you can't fill them to the hilt with just the data you want (unless you take the associativity into account).
 
Cache-size optimizations usually only make sense when you want maximum performance, as in number crunching running at full throttle. Any small-but-weak cores do not factor in there. This is also usually best handled in the lower-level libraries, like BLAS/LAPACK/..., which can even auto-tune to the cache size and which employ algorithms that exploit how the cache works to the maximum. I would not bet against the people writing that code that I could do better. I tried, and while my code still scaled better than linearly with the number of cores, that was not good enough.
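For reference, the main cache-size technique those BLAS-style libraries use is blocking (tiling): the matrices are processed in tiles small enough that the working set stays resident in cache across the inner loops. A minimal sketch; N and BLOCK are illustrative values, whereas real libraries auto-tune the tile size to the measured cache:

```c
#include <string.h>

#define N 64
#define BLOCK 16  /* illustrative: pick so the tiles fit in L1 */

/* c = a * b for N x N row-major matrices, computed tile by tile so
 * each BLOCK x BLOCK tile of a, b, and c is reused while hot in cache. */
static void matmul_blocked(const double *a, const double *b, double *c) {
    memset(c, 0, N * N * sizeof *c);
    for (size_t ii = 0; ii < N; ii += BLOCK)
        for (size_t kk = 0; kk < N; kk += BLOCK)
            for (size_t jj = 0; jj < N; jj += BLOCK)
                for (size_t i = ii; i < ii + BLOCK; i++)
                    for (size_t k = kk; k < kk + BLOCK; k++) {
                        double aik = a[i * N + k];
                        for (size_t j = jj; j < jj + BLOCK; j++)
                            c[i * N + j] += aik * b[k * N + j];
                    }
}
```

The loop order (i, k, j innermost) also keeps the innermost accesses sequential, which is the cache-line-locality point made below.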

Optimizing for cache line size usually makes much more sense; Valgrind (cachegrind) will help you there with sorting your data structures for cache line locality. Do that first; then cache size may start to be a thing to address.
 
Maybe optimizing for cache sizes and/or cache line sizes on recent heterogeneous (non-fully-symmetric) CPUs would almost mandatorily require near-perfect scheduler support, contributed directly by each CPU vendor.
 
Thanks for the advice. I am writing a simple (single-producer, single-consumer) queue,
and I ran experiments to see the effect of the cache optimization, which is about a 2x-3x speedup.

Here is part of the optimization (I also do tricks like caching cursors, but those are not shown in the code):

Before:

C:
struct queue_t {
    size_t size;
    size_t mask;
    void **values;
    atomic_cursor_t front_cursor;
    atomic_cursor_t back_cursor;
    destroy_fn_t *destroy_fn;
};

queue_t *
queue_new(size_t size) {
    assert(size > 1);
    assert(is_power_of_two(size));
    queue_t *self = new_shared(queue_t);
    self->size = size;
    self->mask = size - 1;
    self->values = allocate_pointers(size);
    self->back_cursor = 0;
    self->front_cursor = 0;
    return self;
}

After:

C:
struct queue_t {
    size_t size;
    size_t mask;
    void **values;
    atomic_cursor_t *front_cursor;
    atomic_cursor_t *back_cursor;
    destroy_fn_t *destroy_fn;
};

queue_t *
queue_new(size_t size) {
    assert(size > 1);
    assert(is_power_of_two(size));
    queue_t *self = new_shared(queue_t);
    self->size = size;
    self->mask = size - 1;
    self->values = allocate_pointers(size);
    self->back_cursor = new_shared(atomic_cursor_t);
    self->front_cursor = new_shared(atomic_cursor_t);
    return self;
}
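An alternative to the two extra heap allocations, shown here only as a sketch, is to pad the cursors inline with C11 alignas so each one sits on its own cache line. The fixed 64-byte line size and the `cursors_t` / `atomic_size_t` names are my assumptions, not the OP's types:

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumed line size; query it at runtime if possible */

/* Each cursor starts on its own (assumed) cache line via alignment
 * padding, so producer and consumer never write to the same line. */
typedef struct {
    alignas(CACHE_LINE) atomic_size_t front_cursor;
    alignas(CACHE_LINE) atomic_size_t back_cursor;
} cursors_t;

_Static_assert(offsetof(cursors_t, back_cursor) % CACHE_LINE == 0,
               "back_cursor must start on its own cache line");
```

The trade-off: one fewer indirection and no extra allocations, but the line size is baked in at compile time instead of queried at startup.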

Where:

C:
#define new(type) allocate(sizeof(type))
#define new_shared(type) allocate_shared(sizeof(type))
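`allocate()` and `allocate_pointers()` are likewise not shown in the thread; a guess at their shape, assuming they return zeroed memory like allocate_shared does (these definitions are mine, not the OP's):

```c
#include <assert.h>
#include <stdlib.h>

/* Plain zeroed allocation; new_shared() instead goes through the
 * cache-line-aligned allocate_shared() from the first post. */
static void *allocate(size_t size) {
    void *p = calloc(1, size);
    assert(p);
    return p;
}

/* Zeroed array of `count` void pointers, used for queue->values. */
static void **allocate_pointers(size_t count) {
    void **p = calloc(count, sizeof *p);
    assert(p);
    return p;
}
```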
 