Now, first off, what is this actually about? Are we talking about pipes, the Unix tool to exchange data between local processes, or are we talking about networking, which would involve sockets (as some comments suggest)?
The following is concerned with local pipes. If networking is involved, then we would need to look into socket programming, which makes things quite a bit more complex.
Thanks for the answers! I am using a named pipe in the following scenario: The writer continuously gathers data about the system from a monitoring tool and passes it to the reader in chunks (size + data). The reader is slower than the writer because it needs to assign the received data to some data structures. The pipeline works correctly for a while, but at some point it fails because the reader has less data to read than the writer indicated. Given this scenario, the only explanation I could find is that the fifo buffer reached its maximum capacity and data is being overwritten (otherwise I assume the write call would fail).
This is strange. I have been doing this all the time, ever since I started using Unix. There should not be a technical difference between a named pipe and a command-line '|' pipe, and I use command-line pipes as well as named pipes to chain together dozens of things, and there was never such a problem. If the writer doesn't send, the reader waits, and if the reader doesn't read, the writer waits. (Actually, my first Unix program was a converter for mails from Usenet to some mailbox format, and it consisted solely of sed commands chained together in a long pipe. The next guy who got that script complained that his system ran out of processes that way. That was some Xenix PC, limited to 40 processes by default.)
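If you want to convince yourself of that blocking behaviour, a quick test with a named pipe shows it (the path is just an example):
Code:
mkfifo /tmp/testpipe          # create the named pipe
cat /tmp/testpipe &           # reader: blocks until data arrives
echo "hello" > /tmp/testpipe  # writer: the reader wakes up and prints it
rm /tmp/testpipe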
There is, however, a problem with pipes, and it has to do with blocksizes and chunks. It is not fully clear to me what actually happens there, but, as it appears, the pipe itself has no size at all and only passes on the buffers as it receives them. Now, if the blocksize and the chunksize differ from each other, under some conditions there may be data loss.
I ran into this most often with dd. dd is commonly used to read from devices, and it does so by using the native blocksize of the device. (If you try to read a tape with the wrong blocksize, that just doesn't work. The same happens if you try to read from a CD with a blocksize that is not a multiple of 2048.) So the blocksize is some kind of attribute of a data stream, and it is important (in certain cases).
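For illustration, typical invocations look like this (the device names are examples and vary between systems):
Code:
dd if=/dev/st0 bs=32k | tar tvf -      # tape: bs must match the written blocksize
dd if=/dev/cdrom bs=2048 of=cd.iso     # CD: sector size is 2048 bytes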
But then, you can also use dd to read an ordinary file (or another pipe). But what is the native blocksize of an arbitrary file or pipe? It appears to be arbitrary.
And this will hit you badly, as it has hit me in a couple of cases. The first is when you try to use the conversion features of dd. These features are block based, and if the blocksize is arbitrary, they behave erratically, and data will be lost. You can twiddle around with ibs=, obs=, and cbs= as much as you want, and you can put as many pipes before it as you like; it just doesn't work - you don't get rid of that blocksize attribute.
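You can watch this happen: with count=1, dd issues a single read(2), and on a pipe that read may well return fewer bytes than requested. dd's own statistics show it as a partial record:
Code:
(echo part1; sleep 1; echo part2) | dd bs=128k count=1 | wc -c
# dd reports "0+1 records in" - a partial block. Only the 6 bytes of
# "part1" come through; "part2" is still in the pipe when dd exits.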
The second, maybe easier to understand, is the following use-case. Imagine you have only one file-handle and need to transfer multiple files over it. If these files are in fact streams, so you don't know their individual size beforehand, there is no way to tell the other end where one file ends and the next begins.
The way to solve this is to use a chunked approach. Split each file, as it comes in, into little chunks of data, and send each chunk with a header that tells the length of the chunk. Whenever that length is zero, the other end knows the current file has ended.
The simplest approach to do this would be something like this (stdin to stdout):
Code:
size=-1
while test "$size" -ne 0; do
    dd bs=128k count=1 of=tempfile    # one read per iteration
    size=`wc -c < tempfile`           # size of what actually arrived
    echo "chunk: size = $size"        # header line for the receiver
    cat tempfile                      # the chunk itself
done
But this does not work. It will lose data in a strange and random fashion. That is because dd reads the input in its native blocksize. And if that does not divide evenly into the 128k, and since there is no receiver for the overhead, it will throw it away.
So maybe your use-case is in some way related to this phenomenon. In that case, twiddling with the pipes does not help. You can put as many pipes into the scheme as you want; that does not change the issue - the pipes appear to be effectively zero-sized, they do not buffer.
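(As an aside: if you happen to have GNU dd, it knows iflag=fullblock, which makes dd repeat the read until the block is actually full - that addresses exactly this short-read behaviour. A classic or non-GNU dd does not have that flag.)
Code:
dd iflag=fullblock bs=128k count=1 of=tempfile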
In these cases, I had to resort to C coding, and use nothing else than an ordinary, simple read(2), which does the correct thing.
I'll also point to this passage in read(2):
The system guarantees to read the number of bytes requested if the descriptor references a normal file that has that many bytes left before the end-of-file, but in no other case.
Certainly one needs to evaluate the return value of the read, and repeat it as long as necessary.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>

static int debug = 0;

/* Read exactly num bytes from stdin into buf, repeating the read()
 * as often as necessary. Returns the number of bytes actually read,
 * which is less than num only at end-of-file. */
int readn(int num, char *buf)
{
    int count = 0, rnum;

    for (;;) {
        rnum = read(fileno(stdin), buf + count, num - count);
        if (debug)
            fprintf(stderr, "read %d of %d\n", rnum, num);
        if (rnum < 0) {                 /* read error */
            fprintf(stderr, "bufcpy: readerr %d\n", errno);
            exit(3);
        }
        if (rnum == 0)                  /* end-of-file */
            return count;
        count += rnum;
        if (count == num)               /* buffer filled completely */
            return count;
    }
}
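To tie this back to the chunked scheme above, a reader built on readn() might look like the following sketch. This is only an illustration: the 4-byte native-endian length header and the buffer size are assumptions for the example, not something prescribed by the original scenario.
Code:
#include <stdio.h>
#include <stdint.h>

int readn(int num, char *buf);     /* the function from above */

/* Hypothetical reader loop for the size+data chunk scheme. */
int main(void)
{
    static char buf[131072];       /* 128k, matching the chunk size above */
    uint32_t len;

    for (;;) {
        if (readn((int)sizeof(len), (char *)&len) != (int)sizeof(len))
            break;                 /* EOF or truncated header */
        if (len == 0)              /* zero length: current file has ended */
            break;
        if (len > sizeof(buf)) {
            fprintf(stderr, "chunk too large\n");
            return 1;
        }
        if (readn((int)len, buf) != (int)len) {
            fprintf(stderr, "short chunk\n");
            return 1;
        }
        /* ... process len bytes in buf here ... */
    }
    return 0;
}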