question about block devices

Hi,
in another thread someone pointed me to this page, which states:

Code:
Because the implementation of the aliasing of each disk 
(partition) to two devices with different semantics 
significantly complicated the relevant kernel code FreeBSD 
dropped support for cached disk devices

It is not clear to me what this means. I guess that having a disk device, like da0, means that before the redesign of the I/O system we had a character device and a block (cached) device for da0, so two devices for the same disk (slice). Now only one device exists (the character device) and the block device is a kind of software layer on top of it (is it GEOM?). Is this correct? In that case, does not having caching mean that, for instance, fsync is no longer useful, or are we talking about something else here? I'm really confused about this...
 
Caching in this context refers to access to the raw disk device.
After the redesign, the caching is done in the vnode layer and not in the I/O layer. This means that if you read a sector from the raw disk device and then read it again, the second read results in a new I/O operation, as no caching is done in the I/O system. The fact that the cache moved upwards into the vnode layer does not mean that you no longer need an fsync operation. The information held in the memory pages gets flushed out after (by default) 30 seconds, if the vnode (and thus the file) lives that long. When the file is deleted before that time has elapsed, no I/O is done at all, as the vnode is scrubbed before the writing.
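
To make the fsync part concrete, here is a minimal userland sketch (the path is just an example): the write leaves the data in the vnode's dirty pages, and without the fsync(2) call it would only reach the disk when the syncer flushes it, within roughly 30 seconds by default.

Code:
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	const char msg[] = "important data\n";
	/* Example path; any writable file system will do. */
	int fd = open("/tmp/fsync-demo", O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd == -1) {
		perror("open");
		return EXIT_FAILURE;
	}
	if (write(fd, msg, sizeof(msg) - 1) == -1) {
		perror("write");
		return EXIT_FAILURE;
	}
	/*
	 * Right now the data only lives in the vnode's dirty pages.
	 * Left alone, the syncer would write it out later (within
	 * roughly 30 seconds by default), and a crash before that
	 * could lose it.  fsync() forces the disk I/O to happen now.
	 */
	if (fsync(fd) == -1) {
		perror("fsync");
		return EXIT_FAILURE;
	}
	close(fd);
	return EXIT_SUCCESS;
}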
 
That makes it clearer, but I still have doubts. When you talk about raw devices, are you referring to character-based access or block access? Because it is not clear to me what the device alias the documentation refers to actually is.
Moreover, if the vnode system caches the disk data, does that mean the operating system keeps a single cache of that data, available to all processes, or does every process get its own vnode cache?

Finally, I'm curious to know when the I/O caching was removed from FreeBSD, and whether other Unix systems still use it or whether dropping it is a common trend.
 
The vnode cache works like this: when you open a file, the file is known to the kernel as a vnode, and when you read from or write to that file, the memory associated with it is linked to that vnode. This works best with memory-mapped files, like executables. When a page of memory has been read from, say, /usr/bin/cc, that page is part of the vnode's memory list. After closing, the vnode still hangs around until it is needed again for some other file; only then is the memory unlinked completely. In case of memory shortage, individual pages can be dropped to free memory. Should the file be opened again, the vnode gets reactivated and the memory still associated with that vnode does not need to be read back from disk. That cache is global, just as the file system name space is global. When the cache was in the I/O system, memory had to be copied from the cache area to the requesting process's memory; it just did not have to be re-read from disk.
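
As a small illustration of the mapped-file case (just a sketch; /usr/bin/cc is the example from above, and the byte sum is only there to force the pages in):

Code:
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	long pg = sysconf(_SC_PAGESIZE);
	int fd = open("/usr/bin/cc", O_RDONLY);	/* example file from above */
	struct stat st;

	if (fd == -1 || fstat(fd, &st) == -1) {
		perror("open/fstat");
		return EXIT_FAILURE;
	}
	/* Map the file; the pages belong to the vnode's memory object. */
	unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
	    MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}
	/*
	 * Touch one byte per page.  Each touch either faults the page
	 * in from disk or finds it already cached; a later run of this
	 * program (or any other process mapping the same file) reuses
	 * the pages for as long as the vnode keeps them around.
	 */
	unsigned long sum = 0;
	for (off_t i = 0; i < st.st_size; i += pg)
		sum += p[i];
	printf("touched %jd bytes, byte sum %lu\n", (intmax_t)st.st_size, sum);
	munmap(p, (size_t)st.st_size);
	close(fd);
	return EXIT_SUCCESS;
}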

Unifying the cache with the virtual memory system, by means of the vnode objects, increases performance. If my memory serves me right, all major Unix systems dropped the separate I/O cache some time ago, with HP-UX being the last of the lot. But that may well not be correct; corrections are welcome.
 
Very good explanation. Just to clear up all my doubts: before this redesign of the block devices there was a cache at the I/O level and the vnode cache, right? The two caches failed to stay synchronized, or it was very difficult to synchronize them, right? This also means that a file system implementation does not do any caching itself, since it is done by the memory layer, is this correct?

Last question: what are the "two aliased devices" the manual refers to?
 
fluca1978 said:
Very good explanation. Just to clear up all my doubts: before this redesign of the block devices there was a cache at the I/O level and the vnode cache, right?
No, the cache was
a: allocated at compile/boot time with a fixed size.
b: data needed to be copied from cache to "user memory" when needed

This design made the cache pretty simple in the code path and thus robust. Remember, no system is slower than a frozen one, and this goes to the power of pi for systems at a remote site where no one can easily press the reset button. Also, the early file systems were a lot less forgiving when they were not unmounted correctly.
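
To picture that old design, here is a toy sketch (purely illustrative, not the historical kernel code): a cache whose size is fixed up front, with a copy from the cache buffer into the caller's memory on every read, hit or miss.

Code:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLKSIZE	512	/* bytes per block */
#define NBUF	64	/* cache size, fixed once and for all */

struct buf {
	int64_t		blkno;		/* cached block number, -1 = empty */
	unsigned char	data[BLKSIZE];	/* cached contents */
};

static struct buf cache[NBUF];		/* the whole cache, sized up front */

/* Stand-in for the real disk driver: fabricate recognizable bytes. */
static void
disk_read(int64_t blkno, unsigned char *dst)
{
	printf("disk I/O for block %lld\n", (long long)blkno);
	memset(dst, (int)(blkno & 0xff), BLKSIZE);
}

/*
 * Read one block into the caller's memory.  On a miss we go to the
 * disk; either way the data is copied from the cache buffer into the
 * caller's buffer -- the extra copy the old design paid on every read.
 */
static void
bread(int64_t blkno, unsigned char *dst)
{
	struct buf *b = &cache[blkno % NBUF];	/* trivial placement */

	if (b->blkno != blkno) {
		disk_read(blkno, b->data);
		b->blkno = blkno;
	}
	memcpy(dst, b->data, BLKSIZE);
}

int
main(void)
{
	unsigned char buf[BLKSIZE];
	int i;

	for (i = 0; i < NBUF; i++)
		cache[i].blkno = -1;
	bread(7, buf);		/* miss: one disk I/O, then a copy */
	bread(7, buf);		/* hit: copy only, no disk I/O */
	return 0;
}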

This also means that a file system implementation does not do any caching itself, since it is done by the memory layer, is this correct?
It should be so; in the case of FreeBSD, the memory management layer also provides the caching for msdos file systems and even NFS, if my memory serves me right. Because all these file systems use the same cache system, they coexist more gracefully than if each one had its own cache, which would lead to unpleasant results. For details, please refer to e.g. Tanenbaum and Hennessy & Patterson.
Last question: what are the "two aliased devices" the manual refers to?
Maybe the documentation was done "as usual", meaning not completely up to date. These two devices may not have existed as such in current systems for quite some time.
 
Sorry to pick up such an old post, but sometimes things I thought I understood generate new doubts (this may be a sort of bug in my brain :p).

To recap: in FreeBSD every disk device is a character device, and the fact that the I/O subsystem does no caching makes it more reliable and robust against system crashes, which should sound like heaven for databases! However, and this is not meant to start a flame war, why do other popular systems not provide the same character-device interface? I mean, in Linux block devices are marked as block devices, and in OpenBSD there is even the double device (e.g., wd0 and rwd0). And while OpenBSD does not use devfs, Linux has something similar. While I understand that each operating system has its own way of thinking, the fact that there is so much disagreement on this subject makes me think that maybe I did not understand it very well. Are these systems doing the same thing under different names, or are there rationales behind these choices? Any good paper or documentation that covers this subject?
 
fluca1978 said:
Sorry to pick up such an old post, but sometimes things I thought I understood generate new doubts (this may be a sort of bug in my brain :p).
Not a bug - it is called the scientific process ;)
fluca1978 said:
To recap: in FreeBSD every disk device is a character device, and the fact that the I/O subsystem does no caching makes it more reliable and robust against system crashes,
Not completely; it makes it more reliable by being smaller and simpler, compared with the alternative. The low-level code for the disc does no caching; the levels higher up do.
In the case of file system payload, as explained before, the memory system does it. The file systems can cache metadata, but that is nowhere near the volume of the file contents. The problem of different caches pushing against each other's memory usage does not arise there (if it does, you already have other problems that are far more obvious, like disc thrashing).

Other systems provide services of this kind as well, even when calling them differently and providing a different interface. The interface was most likely kept from the early versions (wd0 / rwd0, e.g.), but what they do internally pretty much converges on the same way of doing things.
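
If you want to see from userland which flavour a given node is, stat(2) will tell you (a small sketch; pass it whatever device node your system has, e.g. /dev/da0 on FreeBSD, /dev/sda on Linux, /dev/wd0c or /dev/rwd0c on OpenBSD):

Code:
#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
	struct stat st;

	if (argc != 2) {
		fprintf(stderr, "usage: %s /dev/<node>\n", argv[0]);
		return EXIT_FAILURE;
	}
	if (stat(argv[1], &st) == -1) {
		perror("stat");
		return EXIT_FAILURE;
	}
	if (S_ISCHR(st.st_mode))
		printf("%s is a character device\n", argv[1]);
	else if (S_ISBLK(st.st_mode))
		printf("%s is a block device\n", argv[1]);
	else
		printf("%s is not a device node\n", argv[1]);
	return EXIT_SUCCESS;
}

On FreeBSD this reports the disk nodes as character devices, Linux reports its disk nodes as block devices, and OpenBSD gives you both flavours under different names - which is exactly the interface difference you noticed, with the convergent machinery hidden behind it.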

As for papers, I would need to come back later (maybe in some days; no idea when I will have time to dig through my ~/papers/ pile and then also find where the fluff I got that file from). Should you really be interested in this level of OS design, reading Hennessy & Patterson will provide some relaxed evenings ;) I still do not regret buying it.

For starters, you may want to read this document, which goes into some interesting details of FS design, especially the when-to-cache and when-not-to-cache points.
 
I will read the file system implementation paper, thank you very much.
At least now I'm pretty sure that all systems are doing the same thing in the same way (or almost the same way), even if they provide different interfaces.
And anyone else who has other good documentation that can help in understanding this topic is welcome to join the discussion.
 