UFS Need to know if I need to cluster?

hockey97


Hi, I am trying to figure out a way to manage my storage needs. For example, say I host 500 websites that take up a total of about 200 terabytes of data. I need storage for the actual websites and then a backup storage area, so I need a total of 400 terabytes of storage.
I also plan on using the server itself, which will hold 20 terabytes of data, and I will need to back that data up too.

My question here is: why would someone use a cluster? I was told that to solve my problem I need to cluster NAS servers. I know what clustering is, but others have told me you only cluster servers if you need processing power. I was told most people don't cluster storage servers; they mostly cluster servers to act as one, usually so scientists and engineers can run complex algorithms for their fields of study.

So, I am not sure whether clustering is the key to solving my problem or whether I should use separate servers.
I have 2 websites right now, and one of them will set up website configs on my server and store the files in a folder. Right now it's set up to store all website information on my server. However, after a while I would like to migrate to storing the actual website files on a NAS server, or otherwise off the server. I have seen NAS hard drives and thought I could use those instead, but someone told me that's not efficient and that I need to use a NAS server.

If I use a NAS server, how do I avoid breaking the storage links? What I mean is this: say my website stores its files on network drive A. If the NAS server has something like 500 terabytes of storage made up of multiple drives, are all of those drives considered A? Or would there be 500 separate drives? Or are both possible depending on how you set it up?

The thing is, if drive A gets full (meaning it maxes out at 500 terabytes) and, let's say, I attach another NAS server with another 500 terabytes, can the second NAS server be part of the first one, so I wouldn't need to keep manually changing the directory to store data in the new location?


What I am looking for here is a solution that needs less manual labor to update the network. If I need more storage, I want to be able to just buy the hardware, attach it to the network, and have it configure itself automatically. I don't want to keep spending hours setting things up.
The reason is that I want to focus on my websites rather than on administering them. I don't want to spend most of my time making sure my servers have enough space to keep my websites up and running. The same goes for processing power.

Pretty much, how do ISPs and web hosting providers handle their infrastructure? Do they use NAS servers for storage? Do they cluster everything? Or is clustering strictly meant for research-type facilities processing complex math problems?
 

Terry_Kennedy


hockey97 said:
Hi, I am trying to figure out a way to manage my storage needs. For example, say I host 500 websites that take up a total of about 200 terabytes of data. I need storage for the actual websites and then a backup storage area, so I need a total of 400 terabytes of storage.
I also plan on using the server itself, which will hold 20 terabytes of data, and I will need to back that data up too.

That is a lot of storage. You say this is for web sites, so you need a fast connection to the Internet (unless it is for a company intranet or similar). At the speeds you will likely need, you should look at colocation services in order to save money on data circuit charges. A good "carrier neutral" colocation facility will have a number of providers offering Internet access, and you can select one or more based on price, quality, etc. Note that in a colocation facility you will be paying for space and power (which is often "marked up" to well over utility rates because the site has UPS systems, generators, etc.). What you save on data circuit charges should more than offset those costs, but you need to select equipment with size, power usage, etc. in mind.

Note that simply duplicating the data is not backup. Having another copy in the same building (or perhaps even the same city) is not backup. Neither is RAID by itself. More on this below. Tape may be a workable solution for backups - LTO6 holds 2.5 TB/tape (up to 6.25 TB compressed). There are robotic libraries that hold from 8 to several hundred cartridges - you then ship the cartridges to an off-site storage facility.
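
For a rough sense of scale, here is a small back-of-the-envelope sketch in Python. It only uses the 2.5 TB native LTO6 capacity and the 200 TB and 20 TB figures from your post; the function name is made up for illustration, and compression is ignored because the ratio depends entirely on the data (already-compressed images and video will gain very little).

```python
import math

LTO6_NATIVE_TB = 2.5  # native (uncompressed) capacity per LTO6 cartridge


def tapes_needed(data_tb: float, tape_capacity_tb: float = LTO6_NATIVE_TB) -> int:
    """Return how many cartridges one full copy of data_tb needs, rounded up."""
    if data_tb < 0 or tape_capacity_tb <= 0:
        raise ValueError("data size must be >= 0 and tape capacity must be > 0")
    return math.ceil(data_tb / tape_capacity_tb)


# One full backup of the web site data alone.
print(tapes_needed(200))

# One full backup of the web site data plus the 20 TB held on the server itself.
print(tapes_needed(200 + 20))
```

That is the count for a single full copy. Keeping more than one generation off-site multiplies it accordingly, which is why a robotic library is worth considering at this size.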

hockey97 said:
If I use a NAS server, how do I avoid breaking the storage links? What I mean is this: say my website stores its files on network drive A. If the NAS server has something like 500 terabytes of storage made up of multiple drives, are all of those drives considered A? Or would there be 500 separate drives? Or are both possible depending on how you set it up?

The thing is, if drive A gets full (meaning it maxes out at 500 terabytes) and, let's say, I attach another NAS server with another 500 terabytes, can the second NAS server be part of the first one, so I wouldn't need to keep manually changing the directory to store data in the new location?

Whether you need a single monolithic piece of storage depends on what you're doing. For multiple web sites, you should be able to use a collection of network storage, since a single web site probably does not need access to all 200 TB of data.
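
To illustrate the "collection of network storage" idea, here is a minimal Python sketch of how new sites could be placed on whichever storage server has the most room, so that adding a server does not mean moving existing sites. Everything in it is an assumption for illustration: the mount points, the site names, the sizes, and the most-free-space rule. It is not how any particular hosting product works.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """One network storage server as seen from the web server (hypothetical)."""
    mount_point: str            # e.g. an NFS mount such as /mnt/nas1 (made-up path)
    capacity_tb: float
    used_tb: float = 0.0
    sites: list = field(default_factory=list)

    @property
    def free_tb(self) -> float:
        return self.capacity_tb - self.used_tb


def place_site(volumes: list, site: str, size_tb: float) -> str:
    """Assign a new site to the volume with the most free space.

    Returns the directory the site's files would live in.
    """
    if not volumes:
        raise RuntimeError("no storage volumes configured")
    target = max(volumes, key=lambda v: v.free_tb)
    if target.free_tb < size_tb:
        raise RuntimeError("no volume has room; add another storage server")
    target.used_tb += size_tb
    target.sites.append(site)
    return f"{target.mount_point}/{site}"


volumes = [Volume("/mnt/nas1", 100.0), Volume("/mnt/nas2", 100.0)]
print(place_site(volumes, "site-a.example", 0.4))
print(place_site(volumes, "site-b.example", 0.4))

# Growing later: register the new server; existing sites stay where they are.
volumes.append(Volume("/mnt/nas3", 100.0))
print(place_site(volumes, "site-c.example", 0.4))
```

The point of the sketch is that each site only needs to know its own directory. Which server that directory lives on is a lookup you do once, when the site is created.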

You can either use traditional NAS devices (file-level storage) via NFS or you can use block-level storage via a protocol like iSCSI. FreeBSD offers HAST - Highly Available STorage - if you need that sort of capability.

hockey97 said:
What I am looking for here is a solution that needs less manual labor to update the network. If I need more storage, I want to be able to just buy the hardware, attach it to the network, and have it configure itself automatically. I don't want to keep spending hours setting things up.

"Easily buy the stuff" does not really describe storage at this capacity. You are looking at specialized hardware with a potentially long lead time (at least compared with ordering a desktop or notebook system online). If you want all of the hardware to have interchangeable parts, you need to deal with a supplier who will guarantee that the same hardware will still be offered for sale for some time afterward (a year at least), or you'll end up with a hodgepodge of different hardware.

The closest to "set it and forget it" would be something like Amazon Elastic Block Store (EBS). But for reasonably large amounts of storage, that will be more expensive than purchased storage (buying is cheaper than renting).

There are companies that specialize in making easy-to-use (and easy-to-manage) storage servers. Probably the one that most people think of is EMC. Then there are the usual "big brand" players - IBM, Dell, HP - who can offer you hardware and support for storage as well as servers. And there are other companies that specialize in storage, like NetApp. There are also companies that offer FreeBSD-based solutions, such as iXsystems. You are likely to experience "sticker shock" when you price 200 TB of storage.

hockey97 said:
Pretty much, how do ISPs and web hosting providers handle their infrastructure? Do they use NAS servers for storage?

Not all ISPs have huge amounts of storage. It depends on what their focus is. Backblaze is in the cloud backup business and they are quite open about what they do - look at the link. Several vendors provide generic versions of the Backblaze Storage Pod if you want to use their hardware design. They have gotten the price down about as low as possible (by leaving out features they don't make use of).

You might want to read my article about my RAIDzilla II file servers to get an idea of what's involved in "roll your own" storage. I also address a number of other points, like backups and dealing with hardware faults.
 