C Increase named pipe size

Hi,

What is the preferred way of increasing the size of a named pipe (fifo) in FreeBSD? I just found fcntl(2) with the F_SETPIPE_SZ option, which is borrowed from Linux, and I'm not sure whether it is the right approach, as I can't make it compile even when using:
Code:
#define _GNU_SOURCE

Thanks a lot!
 
I guess the real question is why? If you have a slow reader, for example, then it's a design issue not a buffer issue.
I would have to check but I'm pretty sure FreeBSD uses a dynamic buffer.

Edit: I had a quick look. Circular buffer sizes slide from 5461 to 10922. If you find yourself coding against the size of some system buffer, you're doing it wrong. Design a protocol and use your own buffer to pull out the data.
 
To begin with: what you are changing here is not the actual size of the pipe, in the sense of how many bytes it can hold. That's because pipes have effectively infinite size: as long as the reader absorbs data fast enough, you can write infinitely much into them. Contrariwise, if the reader doesn't read the data, the pipe will be full very soon, typically after a few kilobytes.

What this controls is the size of the in-kernel memory buffer, which is to say: how far the writer can get ahead of the reader. Which is equivalent to saying: the largest atomic write a writer can perform with a single write(2) system call. As I remember it, pipes usually allocate one or a few VM pages (4 KiB each), so this call adjusts the size of that buffer, not some total capacity.

No, I don't know of a way to adjust this on FreeBSD, but for most uses of pipes it shouldn't be necessary in the first place, since most protocols will chop the data into reasonably sized packets.
 
No, pipekva reports the current memory used by pipes, not buffer sizes. maxpipekva is the pageable limit. From the source:
Code:
* Based on how large pipekva is relative to maxpipekva, the following
 * will happen:
 *
 * 0% - 50%:
 *     New pipes are given 16K of memory backing, pipes may dynamically
 *     grow to as large as 64K where needed.
 * 50% - 75%:
 *     New pipes are given 4K (or PAGE_SIZE) of memory backing,
 *     existing pipes may NOT grow.
 * 75% - 100%:
 *     New pipes are given 4K (or PAGE_SIZE) of memory backing,
 *     existing pipes will be shrunk down to 4K whenever possible.
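If you want to see where a system currently sits on that scale, those counters can be read with sysctl(3). A rough, untested sketch in C, assuming the kern.ipc.pipekva and kern.ipc.maxpipekva OIDs (the same values should be visible from the shell with sysctl(8)):
Code:
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <stdlib.h>

/* Fetch a long-sized sysctl value by name (assumes both OIDs are longs). */
static long
get_long_sysctl(const char *name)
{
    long value = 0;
    size_t len = sizeof(value);

    if (sysctlbyname(name, &value, &len, NULL, 0) == -1) {
        perror(name);
        exit(1);
    }
    return (value);
}

int
main(void)
{
    long used = get_long_sysctl("kern.ipc.pipekva");
    long max = get_long_sysctl("kern.ipc.maxpipekva");

    printf("pipe KVA: %ld of %ld bytes (%.0f%%)\n",
        used, max, 100.0 * used / max);
    return (0);
}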
 
Thanks for the answers! I am using a named pipe in the following scenario: the writer continuously gathers data about the system from a monitoring tool and passes it to the reader in chunks (size + data). The reader is slower than the writer because it needs to assign the received data to some data structures. The pipeline works correctly for a while, but at some point it fails because the reader has less data to read than the writer indicated. Given this scenario, the only explanation I could find is that the fifo buffer reached its maximum capacity and data is being overwritten (otherwise I assume the write call would fail).
 
Well then you need to rethink your approach. ;) No matter what size your buffer is, an error will eventually occur, probably a SIGPIPE. :'(

I would approach it this way:

You will need a buffer in your reader which is basically a sliding buffer: data comes in at the head, complete packets are extracted, and the remainder is "moved" up. It doesn't really matter how large the buffer is, but I would make it large enough that a single read(2) can empty the kernel's buffer, whatever size that may be.

I would devise some protocol to discern a packet of data from the writer.

To assist the reader in processing data from the writer, you would use select(2) (or even kqueue(2)) to tell you when data is available. When it fires, you read what's there and decide whether it's the end of a packet of data or there's still more to come. If the latter, you just wait to be notified again.

Once that complete packet of data is received in your buffer, you copy it and pass it off for processing. You can either process it in the main thread or have a thread waiting for this packet or create a new thread for this packet. That all depends on your data demands and speed requirements.

Named pipes, sockets, ports and so on all act the same. Data in, data out. It's up to you to devise a scheme to put the data in and take the data out.

An example:
Suppose you want a packet to have a % at the end of it to mark/delimit the data.
Suppose the data is This is my test data%It is mundane but important%. Suppose now that select(2) is in a loop and it fires off when you've received:

This is my test
So you put this in your buffer using read(2), scan it and find there's no % in there, so you wait for another select(2) until you finally get the percent sign.

Let's suppose the next select(2) call returns:
data%It is mun

So, you read(2) it and place that in your buffer, scan the buffer from head until either the percent or the end of the buffer. In your case, you've now got This is my test data%, so copy that out of the buffer, move It is mun to the start (logically) and when the next select(2) alerts you to data being available, append the incoming data to that.

And so on.


Of course you could even just do a select(2) and read(2) of 1 byte, processing the data 1 byte at a time. It is effective but horrible. There are many ways to achieve what you're apparently attempting.
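To make that concrete, here is a bare-bones, untested sketch of such a reader in C. The fifo path, buffer size and the '%' delimiter are just placeholders for whatever protocol you devise:
Code:
#include <sys/select.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define DELIM '%'

int
main(void)
{
    char buf[65536];            /* sliding buffer */
    size_t used = 0;
    int fd = open("/tmp/myfifo", O_RDONLY);

    if (fd == -1) {
        perror("open");
        return (1);
    }
    for (;;) {
        fd_set rfds;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        /* Sleep until the kernel has data (or EOF) for us. */
        if (select(fd + 1, &rfds, NULL, NULL, NULL) == -1) {
            perror("select");
            return (1);
        }
        ssize_t n = read(fd, buf + used, sizeof(buf) - used);
        if (n <= 0)
            break;              /* writer closed the fifo, or error */
        used += (size_t)n;

        /* Extract every complete packet, then slide the rest down. */
        char *start = buf, *end;
        while ((end = memchr(start, DELIM, used - (start - buf))) != NULL) {
            /* Replace this with your real packet handler. */
            fwrite(start, 1, end - start, stdout);
            putchar('\n');
            start = end + 1;
        }
        used -= (size_t)(start - buf);
        memmove(buf, start, used);
        /* A real program would also handle a packet larger than buf. */
    }
    close(fd);
    return (0);
}
A kqueue(2) version is structurally the same; only the waiting part changes.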
 
I've found that pipes (whether named and accessible through the file system, or unnamed and passed between processes) are hard to use for anything but really short (dozens or hundreds of bytes), self-describing messages. Why don't longer messages work? Because of the size limit, as mark_j said: both the reader and the writer might block, and they have to deal with partial messages.

I have two solutions that work. One is: if the data is pretty large (many KB to MB and more), then use the file system. The writer takes a whole bunch of data and plops it in the file system somewhere as a file. The reader then finds those files, reads them, processes them, and deletes them.

One thing I've done is to use a named pipe as a mechanism that mostly delivers the file locations: the writer puts a short self-contained message in the pipe, which could just say (in a suitable encoding) "New data is in /tmp/xfer/123456.data\n", and the reader looks for lines of text (terminated by newline) in the pipe.

One advantage of using the file system is that the data survives either the reader or the writer process crashing. One disadvantage of this scheme is that if the reader crashes and restarts, it has to look in a well-known location (in my example above, /tmp/xfer) for all the files it SHOULD have processed but missed. Clearly this scheme is inefficient (all the data has to go to/from disk), but for small quantities that just doesn't matter.
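For illustration, the reader side of that handoff can be as small as this untested sketch; the fifo path, the message prefix and process_file() are made up for the example:
Code:
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler: read and digest the file's contents here. */
static void
process_file(const char *path)
{
    printf("processing %s\n", path);
}

int
main(void)
{
    /* Opening the fifo blocks until some writer opens it for writing. */
    FILE *fp = fopen("/tmp/xfer/control.fifo", "r");
    char line[1024];

    if (fp == NULL) {
        perror("fopen");
        return (1);
    }
    while (fgets(line, sizeof(line), fp) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* strip the newline */

        const char *prefix = "New data is in ";
        if (strncmp(line, prefix, strlen(prefix)) == 0) {
            const char *path = line + strlen(prefix);
            process_file(path);
            unlink(path);                   /* done with it */
        }
    }
    fclose(fp);
    return (0);
}
When the last writer closes the fifo, fgets() sees end-of-file; a long-running reader would simply reopen the fifo and continue.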

The better solution is to give up on pipes and hand-written protocols, and use machinery that others have already written. There are some really good libraries that use sockets and protocols directly, without you having to think about the details. Recently, I started using gRPC as a mechanism for two Python programs to communicate. The beauty is that you no longer have to implement the handling of partial messages, lost messages, and all that yourself. Just describe your communication in terms of what data flows in which packets (called RPCs), write the functions, and it starts working. And it works not only within a computer, but across the network.
 
Putting misc/mbuffer in your pipeline can give you an arbitrary-sized buffer without changing your utilities. Sticking it right before the reader: mbuffer -i ${NAMEDPIPE} | reader will likely do the trick, but if you need the buffer to be larger, see mbuffer(1).

I've used it to great effect in things like ZFS send/recv where either end has portions of the process that are slow relative to what the other end is doing; rather than having the send block while the recv is writing out lots of small metadata, for example, the data can continue to be shoveled over the (typically limiting bottleneck) network connection and soaked up in the buffer until the writes catch up.
 
I found that gRPC doesn't have an API for C. Would protocol buffers be an potion?
If you want a potion, I recommend a good brandy. I'm particularly in favor of Greek, Spanish and Portuguese ones. "Cardenal Mendoza", "Gran Duque de Alba", "Metaxa 7 star", "Maciera", and so on. (I know, the Metaxa is not just a brandy, but also contains wine and spices.)

OK, once you recover from drinking those ...

Protocol buffers are not an alternative to gRPC, but an ingredient of gRPC. You use the protocol buffer language to describe the data that is sent back and forth for each RPC. For example, in my home pump monitoring system there is an RPC called "get status": the client sends a request containing one bit of data, namely a boolean called "short" that says whether the client wants a short and efficient version of the state or a long and detailed (but slow) one. The reply to the RPC has several fields: an overall status OK (a boolean), a floating-point time value that indicates when the status was obtained, the version of the monitoring server as a string, and finally the status itself. And to be honest, since I like to program in Python at home, I've been mixing protocol buffers with Python pickles: I take the status (an array of many things, which are individually complex objects) and just pickle it into a byte array and transmit it.

Having defined my requests and responses (the protocol) using protocol buffers, I then use gRPC as the framework for sending and receiving. It really takes nearly all the work out of it; both the client and the server framework are a handful of lines of code (really, about 5 lines).

Yes, there is no binding of gRPC to C itself, only to C++. That's because (like most modern frameworks) gRPC is object-oriented and relies on inheritance. If you are absolutely stuck on using C, then this won't work for you. But here would be my suggestion: your programs would probably compile just fine with a C++ compiler, even though they are plain C. If that works for you, there is nothing wrong with mixing C and C++ in the same program.

And one more warning: A program that can handle RPCs is pretty much by definition a parallel or multi-threaded program. There are easy ways to deal with that, such as locking or queues. But you do have to think about these issues.

There are many other RPC and message queue packages around.
 
Thanks for the answers! I am using a named pipe in the following scenario: the writer continuously gathers data about the system from a monitoring tool and passes it to the reader in chunks (size + data). The reader is slower than the writer because it needs to assign the received data to some data structures. The pipeline works correctly for a while, but at some point it fails because the reader has less data to read than the writer indicated. Given this scenario, the only explanation I could find is that the fifo buffer reached its maximum capacity and data is being overwritten (otherwise I assume the write call would fail).
I think it's important to understand the problem you are trying to solve.

If you write to a full named pipe, the write system call will block until the reader reads out enough of the kernel buffer to free up enough space for the write to complete. If the reader can't keep up very well, the writer will just spend most of its time sleeping in the kernel waiting for the required buffer space to become available. Nothing should break. But progress may be slow or jerky.

There are many options to speed things up, or smooth out the process. These generally involve adding more processes and/or buffering to alleviate the bottleneck.

You generally only need to consider the speed-up if there are real-time issues. The classic one is to keep a tape drive constantly spinning (avoid start/stop mode). But gathering of any real-time data may need timely processing.

Sometimes extra buffering is enough. Eric A. Borisch mentioned mbuffer(1).

Other times you have to create a queue for the reader processes to clear down the load in an orderly fashion. In terms of first principles, you need Edsger Dijkstra's semaphores. Eric A. Borisch mentioned devel/zmq.
 
If you write to a full named pipe, the write system call will block until the reader reads out enough of the kernel buffer to free up enough space for the write to complete.

I was expecting the writer to block if the pipe is full (especially because, on the writer side, the write() system call returns the correct number of bytes), but the data gets lost (or overwritten).
 
If the write returns (without an error) then the data have been stored in a kernel buffer, and should be readable. I'd be fairly confident that you have a bug in your code. That's certainly where I would start the investigation.
 
I was expecting the writer to block if the pipe is full (especially because, on the writer side, the write() system call returns the correct number of bytes), but the data gets lost (or overwritten).

I would be shocked if this is the case. Pipes are used every day to transport large and critical data without dropping bits. I believe you will find a bug in your code; perhaps not checking the return value of write(2) to notice and complete a partial write, or likewise a partial read in read(2).

Edit: changed retry to complete for partial read/write to be more explicit. You have to pick up where it finished, not retry the whole operation.
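Roughly what I mean, as an untested sketch (the helper name is made up):
Code:
#include <errno.h>
#include <unistd.h>

/* Keep calling write(2) until the whole buffer is out, resuming after
 * short writes instead of retrying the whole thing. */
ssize_t
write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, just try again */
            return (-1);        /* real error */
        }
        p += n;
        left -= (size_t)n;
    }
    return ((ssize_t)len);
}
The read side needs the same kind of loop.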
 
If you have a simple code reproducer, we can try to help debug it. Do you write the size & payload in one write() call?

In general, if you haven't set non-blocking, I would expect the read to block until the bytes are available from a pipe, and likewise the write should block when it becomes filled. That said, you should always check the return codes from write() and read() for completion and potential continuation.

I'll also point to this in read(2):

The system guarantees to read the number of bytes requested if the descriptor references a normal file that has that many bytes left before the end-of-file, but in no other case.
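If you are currently sending the size and the payload with two separate write() calls, one option (just a suggestion, not necessarily what you do today) is to hand both to a single writev(2) so they always travel together; the 4-byte header here is an assumed wire format:
Code:
#include <sys/types.h>
#include <sys/uio.h>
#include <stdint.h>

/* Write a 4-byte length header plus the payload with one system call. */
ssize_t
send_chunk(int fd, const void *data, uint32_t len)
{
    uint32_t hdr = len;              /* host byte order; both ends are local */
    struct iovec iov[2];

    iov[0].iov_base = &hdr;
    iov[0].iov_len = sizeof(hdr);
    iov[1].iov_base = (void *)(uintptr_t)data;
    iov[1].iov_len = len;

    /* The caller still has to check for a short write and finish the
     * remainder, as discussed above. */
    return (writev(fd, iov, 2));
}
As a bonus, writes of PIPE_BUF bytes or less to a pipe are atomic, so small chunks sent this way shouldn't get interleaved with other writers.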
 
Now, first off, what is this actually about here?
Are we talking about pipes, which are a unix tool to exchange data between local processes, or are we talking about networking stuff that would involve sockets (as some comments suggest)?

The following is concerned with local pipes. If networking is involved, then we would need to look into socket programming, which makes things quite a bit more complex.

Thanks for the answers! I am using a named pipe in the following scenario: the writer continuously gathers data about the system from a monitoring tool and passes it to the reader in chunks (size + data). The reader is slower than the writer because it needs to assign the received data to some data structures. The pipeline works correctly for a while, but at some point it fails because the reader has less data to read than the writer indicated. Given this scenario, the only explanation I could find is that the fifo buffer reached its maximum capacity and data is being overwritten (otherwise I assume the write call would fail).

This is strange. I have been doing this all the time since I started using unix. There should not be a technical difference between a named pipe and a command-line '|' pipe, and I am using command-line pipes, as well as named pipes, to chain together dozens of things, and there was never such a problem. If the writer doesn't send, the reader waits, and if the reader doesn't read, the writer waits. (Actually, my first unix program was a converter for mails from usenet to some mailbox format, and it consisted solely of sed commands chained together in a long pipe. The next guy who got that script complained that his system ran out of processes that way. That was some Xenix PC, limited to 40 processes by default.)

There is, however, a problem with pipes, and it has to do with blocksizes and chunks. It is not fully clear to me what actually happens there, but, as it appears, the pipe itself has no size at all and only passes on the buffers as it receives them.
Now, if the blocksize and the chunksize differ from each other, under some conditions there may be data loss.

I ran into this most often with dd. dd is commonly used to read from devices, and it does so using the native blocksize of the device. (If you try to read a tape with the wrong blocksize, that just doesn't work. Same if you try to read from a CD with a blocksize that is not a multiple of 2048.) So the blocksize is some kind of attribute of a data stream, and it is important (in certain cases).

But then, you can also use dd to read an ordinary file (or another pipe). But what is the native blocksize of an arbitrary file or pipe? It appears to be arbitrary.

And this will hit you badly, as it has hit me in a couple of cases. The first is when you try to use the conversion features of dd. These features are block based, and if the blocksize is arbitrary, they behave erratically and data will be lost. You can twiddle around with ibs=, obs= and cbs= as much as you want, and you can put as many pipes before it as you like, it just doesn't work: you don't get rid of that blocksize attribute.

The second, maybe easier to understand, is the following use-case. Imagine you have only one file handle and need to transfer multiple files over it. If these files are really streams, so that you don't know their individual sizes beforehand, there is no way to tell the other end where one file ends and the next begins.

The way to solve this is to use a chunked approach. Split each file, as it appears, into little chunks of data, and send each chunk with a header that tells the length of the chunk. So, whenever that length is zero, the other end knows the current file has ended.
The simplest approach to do this would be something like the following (stdin to stdout):
Code:
size=-1
while test $size -ne 0; do
  dd bs=128k count=1 of=tempfile
  size=`wc -c < tempfile`
  echo "chunk: size = $size"
  cat tempfile
done
But, this does not work. It will lose data in a strange and random fashion.
That is because dd reads the input in its native blocksize, and if that does not line up with the 128k, then, since there is no receiver for the leftover, dd throws it away.

So, maybe your use-case is in some way related to this phenomenon. In that case, twiddling with the pipes does not help. You can put as many pipes in the scheme as you want, that does not change the issue - the pipes appear to be actually zero-sized, they do not buffer.

In these cases, I had to resort to C coding, and use nothing other than an ordinary, simple read(2), which does the correct thing.
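For what it's worth, the C version of that chunking loop is tiny; an untested sketch (the 4-byte length header is just one possible format):
Code:
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define CHUNK (128 * 1024)

int
main(void)
{
    static char buf[CHUNK];
    ssize_t n;

    /* read(2) returns whatever is currently available, up to CHUNK. */
    while ((n = read(STDIN_FILENO, buf, sizeof(buf))) > 0) {
        uint32_t len = (uint32_t)n;

        /* Error/short-write checking omitted for brevity. */
        write(STDOUT_FILENO, &len, sizeof(len));
        write(STDOUT_FILENO, buf, (size_t)n);
    }
    if (n < 0) {
        perror("read");
        return (1);
    }
    /* A zero-length chunk marks the end of this file/stream. */
    uint32_t zero = 0;
    write(STDOUT_FILENO, &zero, sizeof(zero));
    return (0);
}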

I'll also point to this in read(2):
The system guarantees to read the number of bytes requested if the descriptor references a normal file that has that many bytes left before the end-of-file, but in no other case.
Certainly one needs to evaluate the return values of the read, and repeat it as long as necessary.
Code:
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static int debug = 0;     /* set to 1 for per-read diagnostics */

/* Read exactly num bytes from stdin into buf, looping over short reads.
   Returns the number of bytes read (less than num only at end-of-file). */
int readn(int num, char *buf)
{
  int count = 0, rnum;

  for (;;) {
    rnum = read(fileno(stdin), buf + count, num - count);
    if (debug)
      fprintf(stderr, "read %d of %d\n", rnum, num);
    if (rnum == 0)                /* end of file */
      return(count);
    if (rnum < 0) {               /* read error */
      fprintf(stderr, "bufcpy: readerr %d\n", errno);
      exit(3);
    }
    count += rnum;
    if (count == num)
      return(count);
  }
}
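The receiving end of the size+data scheme then becomes a small wrapper around readn(); again untested, and the 4-byte header is an assumption about the wire format:
Code:
#include <stdint.h>
#include <stdlib.h>

/* Read one length-prefixed chunk using readn() above.
   Returns the payload length, or 0 on the end-of-stream marker. */
uint32_t
read_chunk(char *data, uint32_t max)
{
  uint32_t len;

  if (readn(sizeof(len), (char *)&len) != (int)sizeof(len))
    exit(3);                 /* stream ended in mid-header */
  if (len == 0)
    return(0);               /* zero length: end of this file */
  if (len > max || readn((int)len, data) != (int)len)
    exit(3);                 /* oversized or truncated chunk */
  return(len);
}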
 