Yeah, but the two logs (the database write-ahead log and the ZFS ZIL) are not guaranteed to be in sync.
No, they are completely independent. One builds upon the other.
They could be on different devices altogether. Suppose the database WAL has a checkpoint at transaction X. On restart, the database will rely on all transactions up to and including X having been committed to the database data files. However, if transaction X was actually only partially applied, there could be data consistency problems like missing foreign keys.
Hm. One would need to look into that and understand how the checkpoints are actually done, and how the database decides which of them were completed.
These problems might not even manifest right away. It could be that nothing bad will happen until someone actually tries to access the table with the missing foreign key, at which point the database system is likely to crash with some assertion.
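A missing foreign key of this kind can be searched for proactively rather than waiting for a query to trip over it. The following is only a minimal sketch: the thread does not say which database is involved, so SQLite is used here purely because it ships with Python, and the `customer` and `orders` tables are hypothetical. Enforcement is switched off so that the "lost" parent row can be faked, and `PRAGMA foreign_key_check` then reports the orphan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Enforcement is off so the damage can be simulated; this stands in for
# a parent row that never reached the data files.
conn.execute("PRAGMA foreign_keys = OFF")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customer(id))"
)
conn.execute("INSERT INTO customer VALUES (1, 'alice')")
conn.execute("INSERT INTO orders VALUES (100, 1)")
conn.execute("INSERT INTO orders VALUES (101, 2)")  # customer 2 is missing

# Each result row: (child table, rowid, parent table, constraint index)
violations = conn.execute("PRAGMA foreign_key_check").fetchall()
for table, rowid, parent, fk_index in violations:
    print(f"{table} rowid {rowid} references a missing row in {parent}")
```

Other database systems need their own equivalent, typically an anti-join of the child table against the parent; a scan like this only finds the damage, it does not repair it.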
Recovery from that could be very, very difficult. You'll have to hope that you can identify transaction X somehow, and that you've archived enough WAL logs to be able to re-apply X and all the transactions that followed.
Silent corruption is an ugly thing. Otherwise I am very relaxed: if a database does not recover, I'll restore from backup and then apply all the logs that are there - and then I will see where and why that fails, and create another bug report.
This is indeed another gotcha in the whole logic: it assumes that everything works bug-free.
And there is yet another: typical SSDs use triple-level (or denser) cells, so if a cell gets erased to store another bit, there may be two more bits of old data that need to be preserved. What if a power failure comes in between? We rely on the embedded SSD controller to employ some algorithm so that this can never happen. And we don't know whether that is bug-free.
The redo log is only replayed from the last checkpoint onward. Checkpoints typically happen after a transaction has been committed. The same problem arises if the checkpoint transaction got corrupted by the ZIL failure.
That is interesting. A checkpoint happens and writes lots and lots of data. At some point it is completed, and then there must be a flush before the WAL can be dumped. What if that flush gets lost... I would assume that if the flush is lost, then the WAL dump is also lost, and how would the database at restart know that this checkpoint was ever completed?
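The ordering argument above can be played through with a toy model. This is only a sketch under stated assumptions, not how any particular database implements checkpoints: `SimDisk`, `checkpoint`, `recover` and the `checkpoint_lsn` marker are all hypothetical names, the WAL itself is assumed to survive the crash, and the device is assumed to honour `flush()` honestly. The point it illustrates is that if the marker never becomes durable, recovery simply starts from the older position and replays more WAL.

```python
class SimDisk:
    """Toy device: writes sit in a volatile cache until flush() is called."""

    def __init__(self):
        self.durable = {}
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value

    def flush(self):
        self.durable.update(self.cache)
        self.cache.clear()

    def crash(self):
        self.cache.clear()  # power loss: unflushed writes are gone


def checkpoint(disk, pages, wal, crash_before_marker=False):
    """Write pages, flush, then write the checkpoint marker and flush again."""
    for key, value in pages.items():
        disk.write(("page", key), value)
    disk.flush()                       # data pages must be durable first
    lsn = wal[-1][0] if wal else 0
    disk.write("checkpoint_lsn", lsn)
    if crash_before_marker:
        disk.crash()                   # the marker never reached the disk
        return
    disk.flush()                       # only after this may old WAL be dropped


def recover(disk, wal):
    """Durable pages plus every WAL record newer than the durable marker."""
    start = disk.durable.get("checkpoint_lsn", 0)
    state = {k[1]: v for k, v in disk.durable.items() if isinstance(k, tuple)}
    for lsn, key, value in wal:
        if lsn > start:
            state[key] = value         # redo is idempotent in this model
    return state


wal = [(1, "a", 10), (2, "b", 20), (3, "a", 30)]
pages = {"a": 30, "b": 20}

disk = SimDisk()
checkpoint(disk, pages, wal, crash_before_marker=True)
recovered = recover(disk, wal)         # marker lost: replays the whole WAL
print(recovered)

disk2 = SimDisk()
checkpoint(disk2, pages, wal)
recovered2 = recover(disk2, wal)       # marker durable: nothing to replay
print(recovered2)
```

Both runs end in the same state, which is the benign case. The model says nothing about a device that acknowledges a flush it did not perform; in that case the marker could become durable while the pages did not, and recovery would skip WAL records that were never applied.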
That would still need some logical verification. But then, before I start buying expensive enterprise SSDs, I would rather get a diesel. I cannot know what's inside the SSD logic, but I can make sure that a diesel works - and I love heavy metal...