IMHO it is perfectly advantageous to use assert() as long as you use the NDEBUG macro to disable it during production build. Conditional compilation is your friend: #ifdef...#endif
Conditional compilation is a testing nightmare: you have to test all possible combinations of the macros, which quickly explodes in complexity. Assert-based error handling is troublesome to test as well, because you can't verify the error handling code's ability to handle the error without crashing the test binary, and if you configure your test suite to require a crash of the binary on a given input, it is difficult to tell whether the program terminated because the error was identified or because of a different issue. To properly validate that the application crashed in the expected way you have to tune the test framework a lot, and many test frameworks cannot do it at all. Return code checking is much more precise and doesn't require complex test framework setup, either.
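For contrast, here is a minimal sketch of the return-code style (the function name and the overflow guard are my own additions, not from the snippet below): a test can trigger the failure path and simply inspect the return value, with no crash-detection machinery in the framework.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: report failure through the return value instead of
 * assert(), so a test can pass an invalid argument or provoke an allocation
 * failure and just check for NULL -- no "expected crash" setup needed. */
void *
allocate_shared_checked(size_t cache_line_size, size_t size)
{
    if (cache_line_size == 0 || size > SIZE_MAX - cache_line_size)
        return NULL;                    /* invalid argument / overflow */

    size_t real_size = ((size / cache_line_size) + 1) * cache_line_size;
    void *pointer = aligned_alloc(cache_line_size, real_size);
    if (pointer == NULL)
        return NULL;                    /* allocation failed: caller decides */

    memset(pointer, 0, real_size);      /* only reached with a valid pointer */
    return pointer;
}
```

A test for the failure path is then a one-liner: call it with `cache_line_size` 0 and assert the result is NULL.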
To take an example, assume the code in this post's snippet is to be tested to handle a failure of
aligned_alloc(3):
I use
sysconf(_SC_LEVEL1_DCACHE_SIZE)
in Linux,
but it seems FreeBSD does not have this:
man.freebsd.org
I need to query the cache size in the following code:
C:
// aligned to cache line to avoid false sharing
void *
allocate_shared(size_t size) {
size_t cache_line_size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
assert(cache_line_size > 0);
size_t real_size = ((size / cache_line_size) + 1) * cache_line_size;
void *pointer = aligned_alloc(cache_line_size, real_size);
memset(pointer, 0, real_size);
assert(pointer);
assert(pointer_is_8_bytes_aligned(pointer));
assert(pointer_is_cache_line_aligned(pointer));
return pointer;
}
namely the sequence
Code:
void *pointer = aligned_alloc(cache_line_size, real_size);
memset(pointer, 0, real_size);
assert(pointer);
assert(pointer_is_8_bytes_aligned(pointer));
assert(pointer_is_cache_line_aligned(pointer));
Let's assume we test this snippet with the FreeBSD libc's
aligned_alloc
to perform an allocation we know in advance it cannot handle, and we want to verify that the snippet handles the error, with the expectation that the test application asserts and thereby terminates. For this purpose, the test framework is instructed that this test is known to fail and in fact expected to fail. After all, we're
abort(3)-ing in the
assert(pointer)
line, right?
When the test is run, the test application abnormally terminates, returning to the test framework's runner with a non-success child process exit code, which is what we told the framework to expect, so the error handling works, right?
What actually happened, however, was that the first line assigned NULL to
pointer
, the next line then asked
memset(3)
to write into that NULL, and
memset
caused a segmentation fault. The segfault then caused an abnormal termination, without the error handling code ever having been involved, since the snippet checks the value of
pointer
only after having passed it to
memset
.
Even if the snippet is fixed by first asserting that the return value is acceptable and only invoking memset after the assertion has passed, the release build, where the assertion, i.e. the error handling, was "optimized out", will still run straight into memset and segfault.
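A minimal sketch of that ordering fix (assuming a fixed 64-byte cache line for brevity, which is my simplification): the check happens before the memory is touched, and it is a plain branch, so it survives NDEBUG builds where assert() compiles to nothing.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of the ordering fix: validate the pointer before touching the
 * memory, and keep the check in release builds by using a plain branch
 * instead of assert(), which NDEBUG compiles away. */
void *
allocate_shared_fixed(size_t size)
{
    const size_t cache_line_size = 64;  /* assumed here; query it at runtime */
    size_t real_size = ((size / cache_line_size) + 1) * cache_line_size;

    void *pointer = aligned_alloc(cache_line_size, real_size);
    if (pointer == NULL)        /* checked BEFORE memset, in every build */
        return NULL;            /* caller handles the failure */

    memset(pointer, 0, real_size);
    return pointer;
}
```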
The next line after the null pointer check asserts 8-byte alignment. This smells like the memory chunk is meant to be used for
mmintrin.h stuff, or handwritten SIMD assembly code. If it is used for that purpose, the value should be checked at allocation, and on failure the code should emit a diagnostic any end user understands before returning to the application's main loop or retrying the allocation; continuing into the SIMD code would crash and coredump on some opcodes, and hardly anyone would be able to understand what happened. If the assertion is "optimized out" for release builds, it should provide a meaningful diagnostic nevertheless, and equally have a means of recovery: retry the allocation, return to the main loop to process other input that doesn't need 8-byte alignment while the SIMD request is handled by another thread/process/node, or use a potentially slower non-SIMD implementation to process the request, e.g.
multimedia/ffmpeg does multithreaded video decoding using xmmintrin code, but has pure C fallback implementations for when it can't use xmmintrin, which it checks at runtime on a per-thread level, so it can potentially be using SIMD on some threads and non-SIMD on others (which AMD folks are probably quite happy about). It also has had a number of CVEs for asserting on values only after having used them, just like the snippet that passes an unchecked pointer to memset and then asserts on it being non-NULL.
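A sketch of that dispatch pattern (the function names and the alignment requirement are invented for illustration, not ffmpeg's actual API): pick the fast path only when its preconditions hold, and fall back to plain C instead of asserting.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable fallback implementation. */
static int64_t
sum_scalar(const int32_t *data, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Stand-in for a SIMD kernel requiring 16-byte-aligned input; a real
 * implementation would use intrinsics here. */
static int64_t
sum_simd(const int32_t *data, size_t n)
{
    return sum_scalar(data, n);
}

/* Dispatch at runtime: no assert, no crash, just the slower path when the
 * fast path's precondition doesn't hold. */
int64_t
sum(const int32_t *data, size_t n)
{
    if (((uintptr_t)data % 16) == 0)
        return sum_simd(data, n);
    return sum_scalar(data, n);
}
```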
For the last line, I doubt it should be handled as an error at all. All it affects is performance, "eventually", and I wouldn't bother unless a profiler proves that cache coherency is actually an issue. Number crunching apps, which are more or less the only ones measurably affected by cache coherency, are commonly run on clusters and GPUs. In either environment, the process can be moved between nodes at runtime, meaning that the cache line size might vary at runtime. If it's cheap node usage, such as cloud infrastructure that is bought on demand and canceled on job completion to save costs, the compute nodes can be live migrated transparently at run time, potentially causing the application to continue running on a system with a different cache line size. FreeBSD can handle being live migrated, and reinvoking
sysctl(3) will provide the new cache line size for future allocations, but what will happen to already allocated memory chunks? How much performance would it waste to check whether the process got live migrated? If you bail and coredump over an unaligned allocation, losing all data of all threads that isn't in consistent storage, restart the process in the hope it will successfully allocate aligned memory, and re-request work from the compute cluster's master node, thereby making the master node work more too, is that really faster than just proceeding with unaligned memory and losing some cache coherency on a single thread for a single request?
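For what it's worth, a sketch of querying the cache line size at run time on both systems; the FreeBSD "machdep.cacheline_size" sysctl name is an assumption on my part, _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension, and the 64-byte fallback is a guess for when neither source answers.

```c
#include <stddef.h>
#include <unistd.h>
#if defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Re-queryable cache line size: call again after a suspected migration to
 * pick up the new value for future allocations. */
static size_t
query_cache_line_size(void)
{
#if defined(__FreeBSD__)
    /* Assumed sysctl name; verify against sysctl(3) / sysctl -a output. */
    int line = 0;
    size_t len = sizeof(line);
    if (sysctlbyname("machdep.cacheline_size", &line, &len, NULL, 0) == 0 &&
        line > 0)
        return (size_t)line;
#elif defined(_SC_LEVEL1_DCACHE_LINESIZE)
    long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);  /* glibc extension */
    if (line > 0)
        return (size_t)line;
#endif
    return 64;  /* conservative assumption when the query fails */
}
```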