mmap and persistence

I am investigating memory-mapped files as a means for persistence in an application I'm writing. mmap provides an easy and transparent way to persist any data in memory. I am concerned, however, with inconsistent states resulting from partial writes to the disk in case of a system crash. Does anyone know about a solution to this problem? Thanks in advance.
 
msync forces a write to the disk, but it gives no guarantee that the write completes as a whole if the system crashes partway through (as you indicated). I realized that a plain mmap approach will not satisfy my needs, but maybe someone knows of a library on top of it that adds some database-like properties?
 
Really, there is no way to avoid data loss at the application level, given how modern file systems buffer writes. While you cannot avoid data loss, you can reduce the risk of data corruption by timing your syncs/flushes at the end of every session; this almost removes the chance of a partial write corrupting your data.

What "database-like properties" do you have in mind? Specific tasks call for specific tools.
 
I don't mind some data loss as long as I can roll back to a previous version of my data/file that is consistent. The main problem for me is avoiding inconsistent states. So what would be perfect is some way of snapshotting a file, then letting mmap and the OS do their magic, and in case of trouble rolling back to the previous snapshot. Maybe I'm asking for too much. :-)
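
For what it's worth, the snapshot-and-rollback idea can be sketched in a few lines of Python. This is only a rough illustration, not a tested library: the helper names (take_snapshot, rollback, demo) and the file names are made up for the example, and it assumes a POSIX file system where rename is atomic. The snapshot is a full copy that is fsynced and then renamed into place, so the snapshot file is always either the old complete copy or the new one.

```python
import mmap
import os
import shutil
import tempfile


def _copy_durably(src, dst):
    """Copy src over dst via a temporary file and an atomic rename."""
    tmp = dst + ".tmp"
    shutil.copyfile(src, tmp)
    with open(tmp, "rb+") as f:
        os.fsync(f.fileno())   # push the copy itself to disk first
    # Atomic on POSIX: dst is either the old or the new file, never a mix.
    # A stricter version would also fsync the containing directory.
    os.replace(tmp, dst)


def take_snapshot(path, snap_path):
    """Record the current (assumed consistent) contents of path."""
    _copy_durably(path, snap_path)


def rollback(path, snap_path):
    """Replace path with the last snapshot."""
    _copy_durably(snap_path, path)


def demo():
    workdir = tempfile.mkdtemp()
    data = os.path.join(workdir, "data.bin")    # hypothetical data file
    snap = os.path.join(workdir, "data.snap")   # hypothetical snapshot file
    with open(data, "wb") as f:
        f.write(b"consistent state".ljust(4096, b"\0"))

    take_snapshot(data, snap)

    # Let mmap and the OS do their thing on the live file.
    with open(data, "r+b") as f:
        m = mmap.mmap(f.fileno(), 0)
        m[0:10] = b"half-writt"   # pretend we crashed in the middle of an update
        m.flush()
        m.close()

    rollback(data, snap)
    with open(data, "rb") as f:
        result = f.read(16)
    shutil.rmtree(workdir)
    return result
```

Two caveats: copying the whole file is expensive for large data sets, and because rollback swaps in a new file, any existing mapping still points at the old one and has to be recreated afterwards.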
 
You can always have two file maps on the same file descriptor, one MAP_PRIVATE and the other MAP_SHARED. The private map will never be flushed back to the file system, while the shared map will be flushed by the kernel periodically or on unmap/manual sync. So you can do heavy I/O on the private map without changing the file at all, and only move data between the maps when you need to synchronize. This, however, does not guarantee a corruption-free file if the system goes down while you are syncing.
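
A minimal sketch of the two-map setup, in Python for brevity (the flags map directly onto the C mmap call). The function name demo_two_maps and the scratch file are made up for the example, and it assumes a Unix system where os.pread and the MAP_* flags are available.

```python
import mmap
import os
import tempfile


def demo_two_maps():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"A" * mmap.PAGESIZE)
        rw = mmap.PROT_READ | mmap.PROT_WRITE
        # Copy-on-write view: stores here never reach the file.
        private = mmap.mmap(fd, 0, flags=mmap.MAP_PRIVATE, prot=rw)
        # Write-through view: stores here are written back by the kernel.
        shared = mmap.mmap(fd, 0, flags=mmap.MAP_SHARED, prot=rw)

        private[0:5] = b"draft"          # work on the private copy
        before = os.pread(fd, 5, 0)      # file contents are still untouched

        shared[0:5] = private[0:5]       # "synchronize": move data across
        shared.flush()                   # msync the shared map
        after = os.pread(fd, 5, 0)

        private.close()
        shared.close()
        return before, after
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling demo_two_maps() should show the file unchanged after the private write and updated after the copy into the shared map. Note that private pages you have not written to yet may or may not pick up later changes to the file (POSIX leaves that unspecified), so the private map is not a stable snapshot on its own.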
 
I came up with something along those lines. I could open a file using MAP_PRIVATE and then mprotect the memory against writes. I could then catch the signal (SIGSEGV) raised on the first write to a page and maintain a second file containing all the changed pages. Then, every once in a while, I could write back all the changes to the main file. The advantage is that I could save the original contents before the write, so that I could roll back in case of trouble. This approach sounds complex, but no one said it was going to be easy. ;-)
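
The "save the original contents before the write" part can be sketched without the mprotect/signal machinery. The version below is a simplification, not the scheme described above: writes go through an explicit write() method instead of being trapped by a signal handler, the file is mapped shared, and the second file holds the original pages (an undo log) rather than the changed ones. The class name UndoLoggedFile, the ".undo" suffix, and the record layout are all made up for the example, and the data file size is assumed to be a multiple of the page size.

```python
import mmap
import os
import struct
import tempfile

PAGE = mmap.PAGESIZE
HEADER = struct.Struct("<Q")   # page number stored in front of each saved page


class UndoLoggedFile:
    def __init__(self, path):
        self.fd = os.open(path, os.O_RDWR)
        self.map = mmap.mmap(self.fd, 0)   # shared, read/write by default
        self.log_path = path + ".undo"
        self.saved = set()                 # pages already saved since last commit
        self.recover()                     # undo any uncommitted changes

    def write(self, offset, data):
        first = offset // PAGE
        last = (offset + len(data) - 1) // PAGE
        with open(self.log_path, "ab") as log:
            for page in range(first, last + 1):
                if page not in self.saved:
                    start = page * PAGE
                    log.write(HEADER.pack(page) + self.map[start:start + PAGE])
                    self.saved.add(page)
            log.flush()
            os.fsync(log.fileno())         # originals reach disk before the change
        self.map[offset:offset + len(data)] = data

    def commit(self):
        self.map.flush()                   # msync the new state
        self._clear_log()                  # originals are no longer needed

    def recover(self):
        if os.path.exists(self.log_path):
            with open(self.log_path, "rb") as log:
                while True:
                    header = log.read(HEADER.size)
                    original = log.read(PAGE)
                    if len(header) < HEADER.size or len(original) < PAGE:
                        break              # end of log, or a torn final record
                    (page,) = HEADER.unpack(header)
                    self.map[page * PAGE:(page + 1) * PAGE] = original
            self.map.flush()
        self._clear_log()

    def _clear_log(self):
        if os.path.exists(self.log_path):
            os.remove(self.log_path)
        self.saved.clear()

    def close(self):
        self.map.close()
        os.close(self.fd)


def demo_undo():
    fd, path = tempfile.mkstemp()
    os.write(fd, b"." * (2 * PAGE))
    os.close(fd)
    store = UndoLoggedFile(path)
    store.write(10, b"hello")
    store.commit()
    store.write(10, b"WORLD")          # never committed
    store.close()                      # stand-in for a crash
    reopened = UndoLoggedFile(path)    # recovery runs in the constructor
    result = reopened.map[10:15]
    reopened.close()
    os.remove(path)
    return result
```

A torn record at the end of the log is safe to ignore here because the page is only modified after its original has been fsynced. The sketch does not fsync the directory after removing the log, and it does nothing about concurrent writers, so treat it as an outline of the idea rather than something crash-proof.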
 