We will be using a distributed database that spreads its data across multiple nodes (computers). Each node only holds a small part of the data, so to maintain data consistency, a snapshot of the database would require taking a snapshot at the exact same point in time on every node. In practice, I believe the database system would be resilient if the snapshots are taken a few seconds or minutes apart (less than 5 minutes). However, I would prefer not to rely on this assumption and to make the distributed snapshot as atomic as possible, since there are already enough things to worry about by the time a backup is actually needed...
I was thinking that one option could be to have one central storage server for all database instances, so that each node uses a different dataset on the same server, since ZFS makes it easy to snapshot multiple datasets simultaneously on the same machine (see the sketch below). However, this would largely defeat the scalability benefits of sharding this database system, since the storage server would become an I/O bottleneck. I was thinking that assigning one hard drive per dataset might address the problem, but I feel this goes against the intended use of ZFS, which relies on the idea that hard drives are abstracted into pools. From experience (generally speaking), I fear that going against the intended usage could turn into a rabbit hole where our life becomes increasingly painful as we try to tweak the system to work in a way it wasn't meant to. And even if creating one zpool per hard drive, so as to have one hard drive per dataset, turned out to be perfectly fine and supported, other aspects of the single storage server could still become a bottleneck (RAM, CPU, etc.).
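For reference, this is the single-server scenario I have in mind, as a minimal sketch assuming a hypothetical tank/dbshards hierarchy with one child dataset per database node (names are placeholders, not our real layout). Passing several dataset names to a single zfs snapshot invocation, or using -r on the parent, creates all the snapshots in one transaction, so they are mutually consistent:

```sh
#!/bin/sh
# Hypothetical layout: tank/dbshards/node1, tank/dbshards/node2, tank/dbshards/node3
TAG="backup-$(date -u +%Y%m%dT%H%M%SZ)"

# All snapshots passed to a single 'zfs snapshot' call are created atomically,
# in one transaction; 'zfs snapshot -r tank/dbshards@...' would do the same
# recursively for every dataset under the parent.
zfs snapshot \
  "tank/dbshards/node1@${TAG}" \
  "tank/dbshards/node2@${TAG}" \
  "tank/dbshards/node3@${TAG}"
```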
And maybe more importantly, restricting ourselves to storing the data of all shard nodes in the same data center would cancel many of the benefits of this database system, as it has seamless support for advanced replication scenarios such as data locality (replicating specific parts of the data to geographically distributed data centers so that data stays close to its users and latency is reduced). So the best option for me is to figure out how to create or coordinate ZFS snapshots on multiple machines with the highest possible degree of atomicity, while keeping the management of the system reasonably convenient (a rough sketch of what I have in mind follows below). We need to treat every node as an independent commodity computer.
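As a starting point, the closest I can get to an atomic multi-node snapshot with plain shell tooling is probably something like the sketch below: fan out zfs snapshot over SSH in parallel under a shared snapshot name, so all nodes snapshot within a window of roughly the SSH launch latency rather than minutes. The node names and the tank/db dataset are assumptions for illustration, not our real deployment:

```sh
#!/bin/sh
# Hypothetical node list and dataset name; adjust to the real deployment.
NODES="db-node1 db-node2 db-node3"
DATASET="tank/db"
TAG="backup-$(date -u +%Y%m%dT%H%M%SZ)"

# Fan out in parallel; each local 'zfs snapshot' is atomic on its own node,
# so the remaining skew between nodes is only the SSH connection/launch latency.
for node in $NODES; do
  ssh "root@${node}" "zfs snapshot ${DATASET}@${TAG}" &
done
wait

# Verify that the snapshot now exists on every node.
for node in $NODES; do
  ssh "root@${node}" "zfs list -t snapshot -o name ${DATASET}@${TAG}" \
    || echo "WARNING: snapshot missing on ${node}" >&2
done
```

This still leaves a skew of some milliseconds to seconds between nodes, so it relies on the database tolerating that small window, but it keeps every node an independent commodity machine and stays manageable as a shell script.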
We are talking about multi-terabyte databases serving heavy workloads (including storing and streaming media files to the web - this database system is optimized for that) with high-availability requirements. Hence the concerns about I/O and single points of failure in general. Initially we are talking about only 3 nodes, but this will rapidly grow to a couple dozen or hundreds of nodes across several independent deployments and hundreds of terabytes of data. So we need an approach that we will be able to manage with shell scripts as our needs scale.
The database system itself offers pretty good data redundancy and replication features, so those ZFS features would not even be used. The reason I am considering ZFS here is mainly backups via its snapshot feature, since replication does not help against application bugs that corrupt data or against a ransomware attack.