How to plan racked storage for the long term?

This community has helped me out in the past, and I've had good success with our 15 TB system so far. So I'm hoping you will grace me with your insight once again.

I'm working on a project where the data is going to be ingested at a rate of about 250 TB in the first year, and possibly up to 500 TB in year two, but I'm guessing it will steady out between 300 and 600 TB per year (that's just how much we generate). It's de-identified clinical imaging data. As the images are generated, an extra copy will be mailed off to a collaborator. If we get all the approvals, we'll start sending it offsite by wire, but for now it's sneakernet.
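To get a feel for the totals, here is a rough capacity projection in Python. It is a sketch under loud assumptions. Year one is 250 TB. Year two takes the 500 TB upper estimate. Every later year is a flat steady-state rate, using 300 or 600 TB (the two ends of the guess above). Nothing is ever deleted, and only one local copy is counted, since the collaborator's copy lives offsite. The function names are made up for illustration.

```python
from itertools import accumulate


def yearly_ingest(steady_tb: float, years: int,
                  first_years: tuple[float, ...] = (250.0, 500.0)) -> list[float]:
    """Per-year ingest in TB: the known early years, then a flat steady rate.

    Assumption: year 2 uses the 500 TB upper estimate; later years are flat.
    """
    head = list(first_years[:years])
    return head + [steady_tb] * (years - len(head))


def cumulative_tb(yearly: list[float]) -> list[float]:
    """Running total on disk at the end of each year (assumes no deletion)."""
    return list(accumulate(yearly))


if __name__ == "__main__":
    for steady in (300.0, 600.0):  # low and high ends of the steady-state guess
        totals = cumulative_tb(yearly_ingest(steady, years=5))
        print(f"steady {steady:.0f} TB/yr:", ", ".join(f"{t:.0f}" for t in totals))
```

Under those assumptions, five years lands at 1,650 TB (300 TB/yr steady) to 2,550 TB (600 TB/yr steady) of usable space. That figure is before RAID/parity overhead, spare drives, or any local backup copy.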

We are looking at cloud providers of course, but, because it's clinical, at least the most recent couple years need to be available at high speed, preferably without choking our inbound internet during peak hours.
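One way to put "without choking our inbound internet" into numbers is to work out the average rate needed to push a year's ingest to a cloud provider. This Python sketch uses decimal units (1 TB = 10^12 bytes) and ignores protocol overhead and retransmits, so treat the result as a floor. `usable_fraction` is a hypothetical knob for restricting transfers to, say, off-peak hours.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s; leap years ignored


def required_mbps(tb_per_year: float, usable_fraction: float = 1.0) -> float:
    """Average link rate in megabits/s needed to move tb_per_year in one year.

    usable_fraction is the share of the year the link may carry this traffic
    (e.g. 0.5 if transfers are limited to 12 off-peak hours a day).
    """
    if not 0.0 < usable_fraction <= 1.0:
        raise ValueError("usable_fraction must be in (0, 1]")
    bits = tb_per_year * 1e12 * 8          # decimal terabytes -> bits
    seconds = SECONDS_PER_YEAR * usable_fraction
    return bits / seconds / 1e6            # bits/s -> megabits/s


if __name__ == "__main__":
    for tb in (250, 500, 600):
        print(f"{tb} TB/yr: {required_mbps(tb):.0f} Mbit/s around the clock, "
              f"{required_mbps(tb, 0.5):.0f} Mbit/s if only half the day is usable")
```

So 250 to 600 TB/yr works out to roughly 63 to 152 Mbit/s of sustained throughput, every second of the year, just to keep up with new data. Reading the archive back out of a cloud provider comes on top of that.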

I can put in at least a couple of racks. We have the cooling, power, and floor space for that. How does one even spec out a job like this? What parameters do people negotiate over? Does OSHA impose a limit on overhead storage?

In short: what the hell am I getting into?
 
I'm working on a project where the data is going to be ingested at a rate of about 250 TB in the first year, and possibly up to 500 TB in year two, but I'm guessing it will steady out between 300 and 600 TB per year (that's just how much we generate).
How does this data need to be organized and accessed? Is a bunch of separate servers acceptable, or does all of the space need to show up as a single (huge) pile? If the former, you can purchase prebuilt storage appliances from places like iXsystems or build your own. My "homebuilt" RAIDzilla 2.5 systems provide 85 TB of usable storage in a 3RU form factor. Systems like the Backblaze Storage Pod give you a lot more capacity at a lower price, though I don't particularly like the design (it's fine for the use Backblaze puts it to, but less useful as a generic storage device). If you need all of the data to be accessed via a single server, then you're looking at fancier solutions, including those from (expensive) name brands.
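If separate servers are acceptable, sizing is mostly division. Here is a rough Python sketch using the 85 TB usable per 3RU figure quoted above for RAIDzilla 2.5. Several pieces are assumptions: the 42U rack with 36U left for storage (the rest reserved for switches, PDUs, and patch panels), the function names, and the example capacity targets.

```python
import math

USABLE_TB_PER_CHASSIS = 85.0  # RAIDzilla 2.5 figure quoted above
RU_PER_CHASSIS = 3
STORAGE_RU_PER_RACK = 36      # assumption: 42U rack minus ~6U for network/PDUs


def chassis_needed(total_tb: float,
                   tb_per_chassis: float = USABLE_TB_PER_CHASSIS) -> int:
    """Whole chassis required to hold total_tb of usable data."""
    return math.ceil(total_tb / tb_per_chassis)


def racks_needed(chassis: int, ru_per_chassis: int = RU_PER_CHASSIS,
                 storage_ru_per_rack: int = STORAGE_RU_PER_RACK) -> int:
    """Whole racks required, packing only complete chassis into each rack."""
    per_rack = storage_ru_per_rack // ru_per_chassis
    if per_rack < 1:
        raise ValueError("chassis does not fit in the rack space allowed")
    return math.ceil(chassis / per_rack)


if __name__ == "__main__":
    for target_tb in (1500, 2500):  # hypothetical multi-year capacity targets
        n = chassis_needed(target_tb)
        print(f"{target_tb} TB -> {n} chassis, {n * RU_PER_CHASSIS}U, "
              f"{racks_needed(n)} rack(s)")
```

With these assumptions, each rack holds 12 chassis, so two racks top out at 24 chassis, or about 2,040 TB usable.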
We are looking at cloud providers of course, but, because it's clinical, at least the most recent couple years need to be available at high speed, preferably without choking our inbound internet during peak hours.
Even disregarding the cost of bandwidth to upload and download your data, I've found that the break-even point, after which owning on-site storage for large data is cheaper than the cloud, is less than 2 years (comparing the one-time cost of servers + drives plus the monthly power cost against the monthly cost of cloud storage).
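The comparison described here reduces to a simple break-even formula. On-site costs an upfront amount C plus power P per month, and the cloud costs S per month, so on-site wins after C / (S - P) months. What follows is a minimal Python sketch. It deliberately ignores bandwidth (as above), drive failures, and staff time. It treats both monthly figures as flat, even though a growing archive changes both. The dollar figures in the example are placeholders, not real quotes.

```python
import math


def breakeven_months(upfront: float, power_per_month: float,
                     cloud_per_month: float) -> float:
    """Months after which owning the hardware becomes cheaper than the cloud.

    On-site total after m months:  upfront + power_per_month * m
    Cloud total after m months:    cloud_per_month * m
    Returns math.inf if the cloud bill never exceeds the power bill.
    """
    monthly_saving = cloud_per_month - power_per_month
    if monthly_saving <= 0:
        return math.inf
    return upfront / monthly_saving


if __name__ == "__main__":
    # Placeholder figures only; substitute real quotes.
    months = breakeven_months(upfront=60_000, power_per_month=500,
                              cloud_per_month=5_000)
    print(f"break-even after about {months:.1f} months")
```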
I can put in at least a couple of racks. We have the cooling, power, and floor space for that. How does one even spec out a job like this? What parameters do people negotiate over?
It isn't just the rackmount systems. You also need to plan for power, cooling, management, etc.
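For the power and cooling side, a back-of-the-envelope load estimate is straightforward. Every watt the gear draws ends up as about 3.412 BTU/hr of heat, and one ton of cooling is 12,000 BTU/hr. The per-chassis wattage, the switch overhead, and the single-phase 208 V circuit in this Python sketch are hypothetical; use the measured draw of the actual hardware.

```python
from typing import NamedTuple

BTU_PER_HR_PER_WATT = 3.412   # 1 W of electrical load ~= 3.412 BTU/hr of heat
BTU_PER_HR_PER_TON = 12_000   # 1 ton of cooling capacity = 12,000 BTU/hr


class RackLoad(NamedTuple):
    watts: float
    btu_per_hr: float
    cooling_tons: float
    amps: float


def rack_load(chassis: int, watts_per_chassis: float,
              overhead_watts: float = 0.0, volts: float = 208.0) -> RackLoad:
    """Estimate electrical draw and heat output for one rack.

    amps assumes single-phase power and a power factor of 1, so the real
    current will be somewhat higher; treat it as a rough figure.
    """
    watts = chassis * watts_per_chassis + overhead_watts
    btu = watts * BTU_PER_HR_PER_WATT
    return RackLoad(watts=watts, btu_per_hr=btu,
                    cooling_tons=btu / BTU_PER_HR_PER_TON, amps=watts / volts)


if __name__ == "__main__":
    # Hypothetical: 12 storage chassis at 600 W each plus 400 W of switches.
    load = rack_load(chassis=12, watts_per_chassis=600, overhead_watts=400)
    print(f"{load.watts:.0f} W, {load.btu_per_hr:,.0f} BTU/hr, "
          f"{load.cooling_tons:.2f} tons, {load.amps:.1f} A at 208 V")
```

Running the numbers per rack, and then for the whole room, shows quickly whether "we have the cooling and power" still holds once the racks are full.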

A bunch of folks on this forum do this sort of thing for a living, including me. I'm in the NYC area (US) if that's relevant to you.
 