[Solved] Is it possible to make a 'concatenated link' to multiple files?

Imagine that we have several large multimedia files called file1, file2, file3, ..., fileN. Is there a way to create a very small file in the file system, say called clink, that points to these files, so that accessing clink is effectively accessing file1, file2, ..., fileN sequentially, with seeking supported? The file clink should be 'static' in the sense that it requires no operation beyond mounting the file system, unlike a named pipe, which needs a process feeding it data (and is easily broken). For my purposes, seekable read-only access is enough. Thanks!
 
It would be possible to write your own small device driver to do this, in which the driver presents a device /dev/clink that can be read in the way you suggest. See the page on character devices in the FreeBSD Architecture Handbook or, for more detailed information, the book FreeBSD Device Drivers.

Another option would be to find a filesystem that stores files' data sequentially in the way you need, and then use mdconfig(8) to create a vnode-backed memory device and gnop(8) to offset it to the start of the data, skipping any filesystem and header information. At first I thought FAT might be a good candidate, but you are likely to end up with slack space between the files, so they would not be truly contiguous. I prefer the custom device driver option, since you could write something robust to file deletion, modification and addition.
 
It's possible to create a vnode-backed memory disk with mdconfig for each file and then concatenate them with GEOM CONCAT:
Code:
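# each mdconfig creates a vnode-backed md(4) device with 1-byte sectors
# and prints its unit name (md0, md1, ... if no other md devices exist)
mdconfig -S 1 -f file1
mdconfig -S 1 -f file2
gconcat create -v DEVNAME /dev/md0 /dev/md1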
The file is then accessible at /dev/concat/DEVNAME.
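To illustrate what the concatenated device does on every read or seek: a logical offset is translated into a member device and an offset inside it. A minimal C sketch of that lookup, assuming the member sizes are known up front (`concat_locate` is a hypothetical name for illustration, not GEOM code):

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: map a logical offset in a concatenation of n
 * members with the given sizes to (member index, offset inside that
 * member). Returns 0 on success, -1 if off is negative or past the end. */
static int concat_locate(const off_t *sizes, size_t n, off_t off,
                         size_t *idx, off_t *local)
{
    if (off < 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (off < sizes[i]) {
            *idx = i;
            *local = off;
            return 0;
        }
        off -= sizes[i];    /* skip over this whole member */
    }
    return -1;              /* at or beyond the total size */
}
```

For example, with member sizes of 100, 50 and 200 bytes, offset 120 lands in the second member at local offset 20, so a seek is just this lookup followed by a read from that member.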

I split one of my movie files to try this out. I had trouble playing it with mpv, but mplayer and VLC worked fine. However, the checksum differs from the original's... Not sure why.

EDIT: When I force the sector size of the memory disks to 1 byte, mpv works and the checksum is also correct. I've edited the commands above.
 
Really nice solution, tobik. I wasn't previously aware of GEOM CONCAT. It would be easy to script for a large number of files, with the caveat that GEOM CONCAT doesn't seem to support device modification, meaning that the device would need to be destroyed and recreated to add or remove files.

I suspect, but haven't confirmed, that mdconfig(8) defaults to a sector size larger than one byte, so you were ending up with some unallocated bytes at the end of each device, which caused your checksum mismatch (similar to the slack space issue I mentioned in my earlier gnop(8)/vnode FAT suggestion).
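To put numbers on that suspicion: if a vnode-backed md device's size is rounded to a whole number of sectors, a file whose size isn't a multiple of the sector size either loses its tail or gains padding. The sketch below only does that arithmetic; which way md(4) actually rounds is an assumption not confirmed in this thread, and 512 bytes is assumed as the default sector size.

```c
#include <sys/types.h>

/* Assumed default md(4) sector size; -S 1 overrides it. */
#define ASSUMED_SECTOR_SIZE 512

/* Bytes cut off the end if the device size is rounded DOWN to whole
 * sectors. */
static off_t tail_lost_if_rounded_down(off_t file_size, off_t sector)
{
    return file_size % sector;
}

/* Bytes of padding appended if the device size is rounded UP to whole
 * sectors. */
static off_t padding_if_rounded_up(off_t file_size, off_t sector)
{
    off_t rem = file_size % sector;
    return rem == 0 ? 0 : sector - rem;
}
```

With a 1000-byte file and 512-byte sectors, rounding down would drop 488 bytes and rounding up would add 24; with -S 1 both are always 0, which is consistent with tobik's fix.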
 
Thanks for your suggestion, asteriskRoss! I'll look into the device driver approach more thoroughly when I have more free time.

tobik, your approach is almost exactly what I wanted! Thanks! I still have a bit of a problem with it, though. For the checksum, I find that cksum and shasum give the same results as for the original file. However, all the other tools, like md5, sha1, sha256 and sha512, give different results. I guess the difference comes from the latter commands accessing the character device differently. So if I use
Code:
cat /dev/concat/DEVNAME | md5
instead, the result matches the MD5 of the original file.

Meanwhile, if you just run md5 on a symbolic link to a real concatenation of the split files, the result is the same as for the original. Also, running ffprobe on /dev/concat/DEVNAME gives an error like this:
Code:
/dev/concat/DEVNAME: Invalid data found when processing input

So it seems that different programs interact with the character device differently. Do you have any further suggestions for making the character device appear the same to all software accessing it? It would actually be better if it could behave like a symbolic link or a hard link.
 
This happens because fstat(2) returns a file size of 0 for the device, so those programs think there is nothing to read. Some programs, like mpv or shasum, don't care about the size or never call fstat(2), and just read until EOF.
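A minimal C sketch of the two reading strategies, to show why cat /dev/concat/DEVNAME | md5 works while md5 /dev/concat/DEVNAME doesn't. Whether md5 and friends actually bound their reads by st_size is an assumption based on the explanation above, and all function names here are hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Reads at most st_size bytes, the way a size-trusting tool might.
 * For a device whose fstat(2) reports st_size == 0, this reads nothing. */
static off_t read_trusting_fstat(int fd)
{
    struct stat st;
    char buf[65536];
    off_t total = 0;

    if (fstat(fd, &st) == -1)
        return -1;
    while (total < st.st_size) {
        size_t want = sizeof(buf);
        if ((off_t)want > st.st_size - total)
            want = (size_t)(st.st_size - total);
        ssize_t n = read(fd, buf, want);
        if (n <= 0)
            break;              /* error or unexpected EOF */
        total += n;
    }
    return total;
}

/* Reads until read(2) returns 0 (EOF), like cat(1); st_size is ignored. */
static off_t read_until_eof(int fd)
{
    char buf[65536];
    off_t total = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += n;
    return n < 0 ? -1 : total;
}

/* Demo: print how many bytes each strategy sees for path. */
static int compare_readers(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror(path);
        return -1;
    }
    off_t a = read_trusting_fstat(fd);
    lseek(fd, 0, SEEK_SET);     /* rewind before the second pass */
    off_t b = read_until_eof(fd);
    printf("%s: fstat-bounded %lld bytes, until-EOF %lld bytes\n",
           path, (long long)a, (long long)b);
    close(fd);
    return 0;
}
```

Calling compare_readers("/dev/concat/DEVNAME") on FreeBSD would be expected to report 0 bytes for the fstat-bounded read and the full size for the read-until-EOF one; on a regular file both agree.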

If it doesn't fit, use a bigger hammer: https://gist.github.com/t6/91ab2481b9f4f91583abe8e75b3b5b1f. Do not use. EDIT: updated to query the device's media size with the DIOCGMEDIASIZE ioctl, which makes this a little less hacky.
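For a program you control, the DIOCGMEDIASIZE ioctl gives the real size directly. A minimal C sketch, assuming FreeBSD for the device case (GEOM providers such as /dev/concat/DEVNAME are character devices there); the ioctl branch is compiled only on FreeBSD, and `usable_size` and `print_size` are hypothetical names.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/disk.h>           /* DIOCGMEDIASIZE */
#endif

/* Usable size of fd: st_size for regular files and, on FreeBSD, the
 * media size reported by DIOCGMEDIASIZE for character devices.
 * Returns -1 if the size cannot be determined. */
static off_t usable_size(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == -1)
        return -1;
    if (S_ISREG(st.st_mode))
        return st.st_size;
#ifdef __FreeBSD__
    if (S_ISCHR(st.st_mode)) {
        off_t mediasize;
        if (ioctl(fd, DIOCGMEDIASIZE, &mediasize) == 0)
            return mediasize;
    }
#endif
    return -1;
}

/* Demo: print the size usable_size() finds for path. */
static int print_size(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror(path);
        return -1;
    }
    off_t size = usable_size(fd);
    printf("%s: %lld bytes\n", path, (long long)size);
    close(fd);
    return size < 0 ? -1 : 0;
}
```

print_size("/dev/concat/DEVNAME") should then print the media size rather than 0; devices that don't support the ioctl, such as /dev/null, yield -1.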

Instead, I'm thinking you could probably write a very simple FUSE filesystem for this that delegates all reads, lseeks, etc. to the device. That is probably less frustrating than writing a device driver.
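The heart of such a FUSE filesystem is a read that can cross file boundaries. A minimal C sketch of that core, kept independent of the FUSE API so it can be tried on its own: `concat_open` records the member files' descriptors and sizes, and `concat_pread` serves a read at any logical offset. Both names are hypothetical, and the code is not taken from concatfs or devconcatfs.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open each member read-only and record its size, roughly what a FUSE
 * open() callback would do. Returns 0, or -1 after closing anything it
 * had already opened. */
static int concat_open(const char *const *paths, size_t n,
                       int *fds, off_t *sizes)
{
    for (size_t i = 0; i < n; i++) {
        struct stat st;

        fds[i] = open(paths[i], O_RDONLY);
        if (fds[i] == -1 || fstat(fds[i], &st) == -1) {
            if (fds[i] != -1)
                close(fds[i]);
            while (i-- > 0)
                close(fds[i]);
            return -1;
        }
        sizes[i] = st.st_size;
    }
    return 0;
}

/* Read up to len bytes at logical offset off from the concatenation,
 * crossing member boundaries as needed. Returns the number of bytes
 * read (0 at or past the end) or -1 on error. */
static ssize_t concat_pread(const int *fds, const off_t *sizes, size_t n,
                            char *buf, size_t len, off_t off)
{
    size_t done = 0;
    size_t i = 0;

    if (off < 0) {
        errno = EINVAL;
        return -1;
    }
    while (done < len) {
        /* Skip members lying entirely before off (including empty ones). */
        while (i < n && off >= sizes[i]) {
            off -= sizes[i];
            i++;
        }
        if (i == n)
            break;                  /* end of the last member */

        size_t want = len - done;
        if ((off_t)want > sizes[i] - off)
            want = (size_t)(sizes[i] - off);

        ssize_t r = pread(fds[i], buf + done, want, off);
        if (r < 0)
            return -1;
        if (r == 0)
            break;                  /* member shrank since concat_open */
        done += (size_t)r;
        off += r;
    }
    return (ssize_t)done;
}
```

A FUSE read callback would call concat_pread with the offset and size it is handed; seeking needs no extra work because every read carries its own offset.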
 
Another option would be to do it all within a preloaded "biggerhammer" library that hooks all the related syscalls. Icky, maybe, but a high benefit-to-work ratio.

Juha
 
As tobik mentioned, it'll probably be easier to write a FUSE file system to implement this idea. As it turns out, someone has already written one, called concatfs, and it works on both Linux and FreeBSD. The source code is available here: https://github.com/schlaile/concatfs

I am currently trying this program out. So far the code seems to work more or less as expected, although (at least on FreeBSD) it needs some improvement. Still, it is a good place to start.

For those who are interested in trying it out too, you can compile the source code using
Code:
gcc48 -Wall concatfs.c `pkgconf fuse --cflags --libs` -o concatfs
Obviously you need to install sysutils/fusefs-libs and load the fuse.ko kernel module, either with kldload fuse or by adding the line fuse_load="YES" to /boot/loader.conf. By default the compiled program can't be used as root, so you need to run sysctl vfs.usermount=1 and make sure that the user running the program is in the operator group. Alternatively, you can comment out lines 672 to 677 of the source code to allow root to use it. After all this, you can use the program as described by the author.
 
I tried this out, but after checksumming the concatenated file, and after quitting mpv, it would just hang in getattr() for a while. There seem to be some locking-related bugs in there. (I'm not convinced that struct concat_file needs to be stored globally and locked. It could be stored in the fuse_file_info.fh field, which is large enough to hold a pointer, removing the need to lock anything yourself. See examples/fusexmp_fh.c from sysutils/fusefs-libs.) Other than that, it seems to work OK.
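The fh suggestion boils down to allocating per-open state in the open callback, storing its pointer in the 64-bit fh field, and casting it back in read and release. A minimal sketch of that round trip; so that it compiles without libfuse, `struct file_info_standin` is a hypothetical stand-in that mirrors only the `uint64_t fh` member of `struct fuse_file_info`, and `struct concat_file` here is illustrative, not concatfs's definition.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for struct fuse_file_info: only the fh member matters here. */
struct file_info_standin {
    uint64_t fh;
};

/* Illustrative per-open state (not concatfs's actual struct). */
struct concat_file {
    size_t nfiles;
};

/* open(): allocate the state and stash its pointer in fh. There is no
 * global table, so no lock is needed to find it again later. */
static int demo_open(struct file_info_standin *fi, size_t nfiles)
{
    struct concat_file *cf = calloc(1, sizeof(*cf));
    if (cf == NULL)
        return -1;
    cf->nfiles = nfiles;
    fi->fh = (uint64_t)(uintptr_t)cf;
    return 0;
}

/* read() and friends: recover the state from fh. */
static struct concat_file *demo_state(const struct file_info_standin *fi)
{
    return (struct concat_file *)(uintptr_t)fi->fh;
}

/* release(): free the state that open() allocated. */
static void demo_release(struct file_info_standin *fi)
{
    free(demo_state(fi));
    fi->fh = 0;
}
```

Because each open file carries its own state, concurrent opens don't have to look anything up in a shared table; state that really is shared between opens would still need its own locking.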

I wrote a small FUSE fs (https://github.com/t6/devconcatfs) that re-exports /dev/concat read-only, reports the correct sizes for the devices, and presents them as regular files. Mount it with devconcatfs -o direct_io /mnt/point.
 
Thanks very much for your awesome code, tobik! It indeed works much better than the other one; I did notice that the other one hung quite often. Yours works pretty much perfectly! I consider this thread solved!
 