What clustering technology for HPC?

Hi,

Newbie here. Please recommend good clustering software for HPC, given that I have a bunch of HP DL380p servers, each with 300 GB of RAM and 20 cores. I have InfiniBand as well, plus FreeNAS (TrueNAS?) serving shared LUNs over iSCSI. Please don't hesitate to ask any questions. Thank you.
 
My knowledge of the general tools is a bit dated, as I've been (too?) specialized in the storage layer for over a decade. There are several things you need:
  • First, an installation / update mechanism, to keep all your machines in sync and to automate system administration.
  • Then a mechanism for starting parallel jobs, usually with a batch queuing system. In the really old days there was free software called Condor or DQS; then people used a commercial product called LoadLeveler (it used to come from a small company that was later acquired by IBM). These systems come with a parallel shell that lets you either (a) execute a command on all machines at the same time (very useful for system administration, see above), or (b) execute a command immediately on the most lightly loaded machine. A do-it-yourself parallel shell is sketched right after this list.
  • You need to think about the storage system. The traditional solution (which makes life easy) is a cluster file system, so all machines see the same file system. The easiest thing is a cluster file system that allows parallel I/O, meaning multiple clients can read AND WRITE the same file at the same time; solutions include GPFS, Ceph, Panasas, and Lustre. For small clusters, an outboard NFS server may actually be sufficient (it pretty much has to be NFSv4 with caching). What I've seen more recently is that each node has a local file system, and all I/O is done on an individual-file basis against a (set of) file servers. The shared-file pattern is what the MPI-IO sketch below demonstrates.
  • You probably want your code to run really fast (that's the whole point of HPC). One step is to replace the standard compilers with the ones from the Portland Group.
  • Some user workloads are "embarrassingly parallel": each node (or core) runs one program that reads from an input file and writes to an output file, with no interaction between nodes. Others require lots of interaction between nodes. For that inter-node communication, it is common to use a software package called MPI, which also includes a component called MPI-IO that organizes the file system usage; see the two MPI sketches below.
  • In many cases, people forego much of the above complexity and use Hadoop as an integrated package that covers much of the same territory (a toy Hadoop Streaming example is included below).
  • And these days, I/O is often done by reading from databases instead of flat files, which opens up the Pandora's box of setting up and using a parallel database (the last sketch below shows the usual partitioning trick).
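To make the parallel-shell idea concrete, here is a minimal do-it-yourself sketch in Python that runs one command on every node over ssh. The host names are placeholders, and it assumes passwordless ssh keys are already set up; any real batch system ships a far more robust version of this.

    import subprocess
    from concurrent.futures import ThreadPoolExecutor, as_completed

    NODES = ["node01", "node02", "node03"]  # hypothetical host names

    def run_on(host, cmd):
        # relies on passwordless ssh keys to each node
        result = subprocess.run(["ssh", host, cmd],
                                capture_output=True, text=True, timeout=30)
        return host, result.returncode, result.stdout.strip()

    # fan the command out to all nodes at once, print results as they arrive
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        jobs = [pool.submit(run_on, h, "uptime") for h in NODES]
        for job in as_completed(jobs):
            host, rc, out = job.result()
            print(f"{host} [exit {rc}]: {out}")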
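For the "lots of interaction" case, here is a minimal MPI sketch using the mpi4py bindings (assuming mpi4py and an MPI runtime are installed): each rank computes a partial sum over its own slice of the work, and one collective call combines the results. Launch it with something like mpirun -n 4 python partial_sum.py.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Each rank sums its own strided slice of 0..N-1; there is no
    # communication at all until the final reduction.
    N = 1_000_000
    local = sum(range(rank, N, size))

    # One collective call combines the partial sums on every rank.
    total = comm.allreduce(local, op=MPI.SUM)

    if rank == 0:
        print(total == N * (N - 1) // 2)  # sanity check, prints True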
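And the MPI-IO side of it: every rank opens the SAME file and writes its own fixed-size slot concurrently, which is exactly the multiple-writers pattern a parallel file system is built for. The file name and record width here are placeholders.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    WIDTH = 32  # fixed record size, so per-rank offsets never collide
    record = f"rank {rank:03d}".ljust(WIDTH - 1) + "\n"

    # All ranks open one shared file and write to disjoint offsets.
    fh = MPI.File.Open(comm, "shared.dat",
                       MPI.MODE_WRONLY | MPI.MODE_CREATE)
    fh.Write_at(rank * WIDTH, record.encode())
    fh.Close()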
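On the Hadoop side, its Streaming interface lets any executable that reads stdin and writes stdout serve as a mapper or reducer, so the classic word count is a few lines per script. These are two separate files; the names are mine.

    # mapper.py -- emit "word<TAB>1" for every word on stdin
    import sys
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

    # reducer.py -- input arrives sorted by key; sum the counts per word
    import sys
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

You then wire them together with the hadoop-streaming jar that ships with the distribution, roughly: hadoop jar hadoop-streaming-*.jar -input in/ -output out/ -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py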
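Finally, if you do open that Pandora's box, the usual trick is to partition the table across workers so each one reads a disjoint slice. A sketch using the psycopg2 PostgreSQL driver; the connection string, table, and columns are made up for illustration.

    import sys
    import psycopg2  # third-party PostgreSQL driver

    rank, nworkers = int(sys.argv[1]), int(sys.argv[2])

    # Hypothetical database and table; each worker selects a disjoint
    # slice of the rows by hashing on the primary key.
    conn = psycopg2.connect("dbname=expdata host=dbhost")
    cur = conn.cursor()
    cur.execute("SELECT id, payload FROM events WHERE id %% %s = %s",
                (nworkers, rank))
    for row_id, payload in cur:
        pass  # process this worker's share of the rows
    conn.close()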
These days, many HPC users forego the complexity of doing all of the above and go with commercial solutions. That may sound expensive, but once you count the personnel cost of setting all of this up and maintaining it, buying or renting a canned solution is quite competitive. All the major cloud providers have products targeted at HPC, and if you want to run your own hardware, a variety of service providers (for example IBM) will administer it for you.
 
Thanks ralphbsz, cracauer@. Much appreciated. I'm looking to hone my job skills; the job I am targeting is at Fermilab in IL.
I need to learn a generic HPC clustering technology and file system that I can use as a launching point. It might sound strange, but I'd be interested in giving developers access to the HPC cluster and tuning it to their requirements... in short, emulating a production scenario.
 