Other database content: data revision control and history



Reaction score: 577
Messages: 1,199

I'm not sure if I did talk about that already earlier, search doesn't find a thing (but my search-fu isn't currently well developed, anyway).

With our code development, we have version control systems like SVN or GIT. They record all changes to a codebase, and provide us with the ability to (at least)
  • see a log of concerned changes in chronological order
  • display the differences between any two points in time
  • restore any historical version
These tools work upon a filesystem tree. They are not limited to code base development, but can work for anything that can be stored in a file tree: e.g. documentation projects, or system configuration files.

Now switch to SQL: when we store data in a database, we might have the same desire to retrieve those mentioned kinds of historical information. (Specifically when trying to store systems configuration data in a database, this gets quite relevant.) But that data is not present in the filesystem, it is stored inside an RDBMS, and therefore is not accessible for GIT or SVN.

This matter gets occasional attention at places like StackOv., and gets usually answered in a way, that there are tools available to log/archive/audit modified table rows. Which brings up another issue:

In SQL there is a thing called "normalized" form - and it is a good thing, because it means that you clean up your data structure to the most logical and consistent form. In practice this means that you will get not one table full of (possibly redundant) informations, but many tables with specific informations, which are all interrelated in some way.
And this is the end of those archiving/auditing tools, because they cannot be aware of the relationships: they may still create audit logs of all the specific changes that have happened, but there is no way of getting an intellegible historical timeline or diff out of that, even less to restore the whole structure to a certain point-in-time.

What we certainly can to is restore the whole database to a certain point in time - the RDBMS usually provides for that. But that is not what is desired; usually only a specific object in a table plus all it's adjacent information in dependent tables should be restored/copied as it was at some point in time. (The RDBMS itself would have the necessary knowledge and tools: it does it's own revision management; but this is only used internally to keep the database consistent and manage transaction isolation levels. It is not accessible to the user.)

I am currently trying to devise a solution. Comments, inspirations, strategies etc are welcome, as usual. :)