Async socket

[edit: see posts below for the obligatory premature optimization warning. Make sure the simplest (but still keeping the library in mem; not calling out to an executable) possible approach isn’t sufficient first.]

The (optimization) problem here is primarily to make sure you are overlapping IO (disk/network) and processing (CPU) time; you don’t want too many compute tasks to become runnable at the same time. You need a cadence that pipelines: job B is on the CPU while A is writing back to the network and C is loading from disk.

Note sendfile also works with shared memory objects, so if these are disposable thumbnails you’re creating (not saved), that might be something to consider. (See shm_open(2).) But I’m not convinced from the problem description that you’re likely to see much benefit; you’re not expecting to pump data out of the box at anywhere near line rate.
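
Roughly what I have in mind, as a minimal sketch: I'm assuming Linux here (FreeBSD's sendfile(2) takes a different argument list), the shm object name is a placeholder (real code needs one unique per request), and ideally you'd have the resampler render straight into the mapping instead of the memcpy shown:

/* Sketch: serve an in-memory thumbnail through a POSIX shared memory
 * object with sendfile(2). Linux calling convention; error handling
 * is minimal and the object name is a placeholder. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

static int send_thumb(int client_fd, const void *thumb, size_t len)
{
    /* use a unique per-request name in real code */
    int shm_fd = shm_open("/thumb-scratch", O_CREAT | O_EXCL | O_RDWR, 0600);
    if (shm_fd < 0)
        return -1;
    shm_unlink("/thumb-scratch");     /* object dies with the last fd */

    if (ftruncate(shm_fd, (off_t)len) < 0)
        goto fail;
    void *dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, shm_fd, 0);
    if (dst == MAP_FAILED)
        goto fail;
    memcpy(dst, thumb, len);          /* ideally: resample directly into dst */
    munmap(dst, len);

    off_t off = 0;
    while ((size_t)off < len)         /* sendfile may send short counts */
        if (sendfile(client_fd, shm_fd, &off, len - (size_t)off) < 0)
            goto fail;
    close(shm_fd);
    return 0;
fail:
    close(shm_fd);
    return -1;
}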

Assuming you are working off files on disk, I would be tempted to have a handful of loader threads (you’ll need to test on your workload and hardware; more is not necessarily better, especially on spinning rust) that use blocking IO to read the file for a single request at a time (per thread) from disk into memory buffers. Partially loading tons of files in parallel through a pile of fancy AIO calls doesn’t get any of them to the resampling step faster, and just creates a new bottleneck later at processing time. Have those threads populate a work-item queue: {socket handle, inbuf, outbuf, any additional metadata libvips needs}. The goal here is to make sure you are saturating your disk read bandwidth (throughput) rather than launching thousands of concurrent IOs.
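
Something along these lines; a sketch only, and every name in it (work_item, queue_push, loader_main, ...) is invented for illustration rather than taken from any library:

/* A possible shape for the work item, its queue, and a loader thread. */
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

struct work_item {
    int         client_fd;   /* socket to answer on */
    const char *path;        /* source image, filled in by the acceptor */
    void       *inbuf;       /* whole source file */
    size_t      inlen;
    void       *outbuf;      /* filled in by the resample stage */
    size_t      outlen;
    struct work_item *next;
    /* ...plus whatever metadata libvips needs (target size, format) */
};

struct queue {               /* init with PTHREAD_*_INITIALIZER */
    pthread_mutex_t   lock;
    pthread_cond_t    nonempty;
    struct work_item *head, *tail;
};

static void queue_push(struct queue *q, struct work_item *w)
{
    pthread_mutex_lock(&q->lock);
    w->next = NULL;
    if (q->tail) q->tail->next = w; else q->head = w;
    q->tail = w;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

static struct work_item *queue_pop(struct queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (!q->head)
        pthread_cond_wait(&q->nonempty, &q->lock);
    struct work_item *w = q->head;
    q->head = w->next;
    if (!q->head) q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    return w;
}

struct stage_args { struct queue *in, *out; };

/* One loader thread: plain blocking IO, one whole file at a time. */
static void *loader_main(void *arg)
{
    struct stage_args *a = arg;
    for (;;) {
        struct work_item *w = queue_pop(a->in);
        struct stat st;
        int fd = open(w->path, O_RDONLY);
        if (fd < 0 || fstat(fd, &st) < 0) {
            if (fd >= 0) close(fd);
            close(w->client_fd);     /* real code would send an error reply */
            free(w);
            continue;
        }
        w->inlen = (size_t)st.st_size;
        w->inbuf = malloc(w->inlen);
        for (size_t got = 0; got < w->inlen; ) {
            ssize_t n = read(fd, (char *)w->inbuf + got, w->inlen - got);
            if (n <= 0) break;       /* error handling elided */
            got += (size_t)n;
        }
        close(fd);
        queue_push(a->out, w);       /* hand off to the resample stage */
    }
    return NULL;
}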

Have another thread calling the resample function and populating a completed queue of resampled images; it looks like libvips is already parallelized, so one thread may be OK, but you can set how parallel libvips runs and have a handful of processing threads, each with its own queue (to keep one large file from blocking a number of small ones, for example; up to a maximum of the number of cores you have, with libvips set to use one core each). Avoid oversubscribing cores with performance-optimized code. Each pass through the loop pops off a work item, populates (resamples into) outbuf, frees inbuf, and places the item on the completed queue, and won’t be waiting on any disk/network IO itself. (Ideally; paging can make anything depend on disk.) You could spend a lot of time trying to optimize the scheduling here; if (otherwise unloaded) single-threaded vips provides fast enough turnaround on your images, I would consider spawning one processing thread per CPU, pinning it, and telling libvips to run single-threaded.
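
A sketch of one such worker, reusing the invented work_item/queue/stage_args types from above. vips_thumbnail_buffer(), vips_image_write_to_buffer(), vips_concurrency_set() and VIPS_INIT() are real libvips calls; THUMB_WIDTH and the wiring are placeholders:

/* In main(), before spawning workers:
 *     if (VIPS_INIT(argv[0])) vips_error_exit(NULL);
 *     vips_concurrency_set(1);   // one libvips worker per pinned thread
 */
#include <vips/vips.h>

#define THUMB_WIDTH 256              /* placeholder target size */

static void *resample_main(void *arg)
{
    struct stage_args *a = arg;
    for (;;) {
        struct work_item *w = queue_pop(a->in);
        VipsImage *thumb = NULL;
        if (vips_thumbnail_buffer(w->inbuf, w->inlen, &thumb,
                                  THUMB_WIDTH, NULL) == 0) {
            /* outbuf is g_malloc()ed by libvips; release with g_free() */
            if (vips_image_write_to_buffer(thumb, ".jpg",
                                           &w->outbuf, &w->outlen, NULL))
                w->outbuf = NULL;    /* encode failed; writers will see NULL */
            g_object_unref(thumb);
        }
        free(w->inbuf);              /* input kept until the encode finishes */
        w->inbuf = NULL;
        queue_push(a->out, w);
    }
    return NULL;
}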

I’d additionally spawn a thread pool (roughly 2x what you think the max number of concurrent requests will be) for the return operations (writing to the client socket); these threads may unexpectedly block or take a long time depending on traffic and bandwidth to the client. These “return” threads pop the socket to write on, and the completed thumbnail buffer associated with it, off the completed queue.
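
Sketched below, continuing the same invented types; it deliberately uses plain blocking write(2), and the HTTP headers and error replies are elided:

/* One return thread, draining the completed queue. */
#include <glib.h>                    /* g_free() for the libvips buffer */
#include <stdlib.h>
#include <unistd.h>

static void *writer_main(void *arg)
{
    struct stage_args *a = arg;      /* a->in is the completed queue */
    for (;;) {
        struct work_item *w = queue_pop(a->in);
        const char *p = w->outbuf;
        size_t left = w->outlen;
        while (p && left > 0) {      /* may block a while on a slow client */
            ssize_t n = write(w->client_fd, p, left);
            if (n <= 0) break;       /* client went away; give up */
            p += n;
            left -= (size_t)n;
        }
        close(w->client_fd);
        g_free(w->outbuf);           /* from vips_image_write_to_buffer() */
        free(w);
    }
    return NULL;
}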

At some point you’ll need to decide how best to handle DOS attacks. (Requests for lots of thumbnails, then reading the responses at artificially low bits per minute…) Limiting outstanding completions (roughly the sum of both queue sizes) and delaying accepts is one potential option.
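
One way that could be wired up is a counting semaphore around accept(2), again reusing the invented types above; MAX_IN_FLIGHT and the request parsing are placeholders, and each writer releases a slot once its response is done:

#include <semaphore.h>
#include <stdlib.h>
#include <sys/socket.h>

#define MAX_IN_FLIGHT 64             /* tune to your memory/queue budget */

static sem_t slots;                  /* sem_init(&slots, 0, MAX_IN_FLIGHT) */

static void accept_loop(int listen_fd, struct queue *requests)
{
    for (;;) {
        sem_wait(&slots);            /* no free slot -> stop accepting */
        int fd = accept(listen_fd, NULL, NULL);
        if (fd < 0) { sem_post(&slots); continue; }
        struct work_item *w = calloc(1, sizeof *w);
        w->client_fd = fd;
        /* parse the request and fill w->path here (elided) */
        queue_push(requests, w);
    }
}
/* ...and in writer_main(), after close(w->client_fd): sem_post(&slots); */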
 
I don't think sendfile is fs-dependent.
Depends on what you mean by "dependent" here.

Sure, its semantics can be implemented with any fs. But the real point is how it's done: entirely inside the kernel, ideally without copying anything, i.e. using the same memory for reading the file and sending to the socket. For this to work, some support from the fs is needed.
 
Does the overhead of whatever concurrency scheme you use on the web-serving side of things really matter if the operation performed on every HTTP request is a full image conversion? Is that conversion done via a C library or via fork/exec of a command-line program?

Caching is not handled by this program, for what it's worth.

And why can't you use sendfile(2)? That should be fastest, assuming you have the output file on disk and the filesystem is not ZFS.
Thank you! Sometimes people get tied up thinking about micro-optimizations that don't matter as much in the larger picture.
 

Too true. All my suggestions above should only even be considered if you are sure you are hitting bottlenecks with the simplest possible solution (but still using library calls rather than launching an executable)!
 
ZFS on FreeBSD and Linux doesn't support sendfile, since its cache (the ARC) is not integrated with the VM page cache, so zero-copy is unattainable right now. I don't know whether ZFS on Solaris has sendfile.
 