Solved ZFS Storage Separation

I have data that I want to be as separated as possible.

Currently my thought was to just create a new pool each time I acquired a chunk of data. Then I realized this might not be the right approach if ZFS datasets are already logically separated; in that case it would make a lot more sense to just create a new dataset whenever I acquire a chunk of data that I want to keep separate. This data has to be separate for legal reasons.

So the gist of my question is: does it make more sense to create separate pools, or to create a single pool with multiple datasets that logically separate the data? Really, I guess everything I'm asking hinges on whether data in one ZFS dataset can touch data in another dataset.
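
For concreteness, the two layouts I'm weighing look roughly like this (pool, dataset, and disk names are just placeholders, not my real hardware):

    # Option 1: a brand-new pool per client, each on its own disks (physical separation)
    zpool create clienta_pool mirror da2 da3
    zpool create clientb_pool mirror da4 da5

    # Option 2 (instead of option 1): one shared pool, one dataset per client (logical separation)
    zpool create tank mirror da2 da3
    zfs create tank/clienta
    zfs create tank/clientb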
 
What does "separated" even mean in this context? I could be glib and say "put it in two different files, that's already separated enough, the bytes from one file won't leak into another file". And you could reply with a glib answer "how do I know the 1 bit I put into the file wasn't silently switched with one from another file?", and we would both laugh. The problem is: you said "legal reasons", and lawyers tend to not have a sense of humor in their professional life (in private, they're perfectly human and funny).

So what kind of separation do you want? Permissions, access control, security in general? Performance isolation? Space quotas? Backups and snapshots at different times? What are you really worried about, and what kinds of actions do you need boundaries against?
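
For instance, some of those concerns already map onto per-dataset settings rather than separate pools; a rough sketch, assuming the single-pool layout (pool names, dataset names, and sizes are placeholders):

    # Space quota per client, set independently on each dataset
    zfs set quota=500G tank/clienta
    zfs set quota=2T tank/clientb

    # Snapshots (and therefore backups) on their own schedule per dataset
    zfs snapshot tank/clienta@before-migration
    zfs snapshot tank/clientb@nightly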
 
The context that I can provide here is that they're for different clients. I think my answer is to separate them into different pools. That way they are physically separate, with all data being stored on different disks.
 
Verify with someone in your company who's familiar with those legal requirements. We store a lot of sensitive data (I work for a governmental department), so you can imagine the rules and regulations we have to follow. Sometimes just making sure CustomerA can't access data from CustomerB is enough (file-level access permissions); other times we have to build an entirely separate network for a certain application and its data. And we have various solutions in between. It all depends on the sensitivity of the data and who needs what kind of access.
 
The context that I can provide here is that they're for different clients. I think my answer is to separate them into different pools. That way they are physically separate, with all data being stored on different disks.
Why? Are you worried about client A reading data that is owned by (intended for, controlled by, ...) client B? In that case, pools don't help any. Access control (permission setting, organizing users and groups) helps. As does encryption: if client A encrypts their files with their private key (that only they have access to) before storing, and client B does likewise, then they can't read each other's data.
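
If you'd rather have ZFS handle encryption server-side, in addition to or instead of clients encrypting before upload, native per-dataset encryption (OpenZFS 2.0, i.e. FreeBSD 13 and later) with a different key per client is one way to sketch it. It isn't the same guarantee as client-held keys, because the server sees the key material, but each client's data at least sits under its own key (names below are placeholders):

    # One encrypted dataset per client, each with its own passphrase (prompted on create)
    zfs create -o encryption=on -o keyformat=passphrase tank/clienta
    zfs create -o encryption=on -o keyformat=passphrase tank/clientb

    # Keys can be loaded and unloaded independently per client
    zfs unmount tank/clienta
    zfs unload-key tank/clienta
    zfs load-key tank/clienta && zfs mount tank/clienta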

The same goes for separation as far as writing is concerned. Pools or datasets don't prevent client A from running "rm -R /home/client_b/*" once they are logged in. Permissions do.
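
In other words, plain old ownership and modes on the per-client mount points do the heavy lifting; a minimal sketch, assuming each client has their own local user (user, group, dataset, and path names are placeholders):

    # Give each client's dataset its own mountpoint, owner, and mode
    zfs create -o mountpoint=/home/client_a tank/clienta
    chown client_a:client_a /home/client_a
    chmod 0700 /home/client_a
    # Logged in as themselves, client B now gets "Permission denied" under
    # /home/client_a, regardless of whether it lives in a shared or separate pool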

Are you worried about performance interference? For example, client A might measure how busy (loaded) the whole system is, by measuring the latency of certain requests. Then if the system shares any performance bottlenecks between clients A and B, client A can see whether any other clients are using resources. You said you don't want to share disks between clients, but are disks your only bottlenecked resource?
 
'Separate' from a legal perspective usually boils down to "not residing on the same machine".
That is an over-generalization. In some cases it does. As SirDice already said, sometimes it even means not on the same (sub-)network, or not connected via network at all, or all sys admins carry an assault rifle on their back at all times. In other cases, access controls are sufficient: if you set permissions (or the moral equivalent on modern larger systems) so client A can not read or modify data of client B, that can be completely sufficient. In large production systems, end-to-end encryption is often used as a means to safeguard data privacy between clients in a multi-tenant system, but it is not a hard requirement. As an example, look at the GDPR: It talks about "appropriate technical and organizational measures", not "the most extreme measures".

One of my favorite examples, directly related to the GDPR: How does data have to be destroyed when it is no longer needed? Maybe it just needs to be deleted (with the moral equivalent of an rm command, which makes it unreadable with normal file system means, assuming that normal users can't perform things like block reads or raw device access). Maybe it needs to be overwritten in place (if we're worried about block reads). Maybe (if the data was written in encrypted form), the encryption key for it needs to be forgotten (and one key caching period allowed to expire), so even though the bits remain readable, they can't be decrypted any longer. Maybe the disk drive needs to be physically destroyed (many large data centers have shredders or crushers on site for this purpose). Maybe the disk drive needs to be immediately locked against future access (using SED, part of the SCSI standard), and later shredded/crushed, to reduce the chance that a rogue employee steals the drive. All these can be sensible answers, and that shows you the range of possible policies.
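
To tie this back to ZFS: with the per-client encrypted datasets sketched earlier in the thread, the "forget the key" variant looks roughly like this (dataset name is a placeholder, and whether this satisfies your particular regulation is exactly what you need to confirm with your legal people):

    # Stop using the dataset and drop its key from memory
    zfs unmount tank/clienta
    zfs unload-key tank/clienta

    # Destroy the dataset and discard the passphrase/key file itself;
    # whatever blocks remain on disk stay encrypted and can no longer be decrypted
    zfs destroy -r tank/clienta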
 
Lots of helpful information here. I appreciate the responses; I'm going to dig in deeper to see what the actual legal requirements are and, as msplsh said, take into account whether it's an acceptable outcome if a single pool holding data from multiple clients goes down. My initial thought is no, but like I said, I just need to get with other people in the company to identify the hard-and-fast requirements.

Thanks everyone!
 