Tuesday, May 5, 2009

"Cloud" and HPC? Huh?

For the most part I have tried not to post on the phenomenon known as "cloud computing." "Cloud" is still evolving and, as such, has many different meanings. This whitepaper caught my attention because of its attempt to connect high performance computing (HPC) with "cloud computing." The way I see it, "cloud" is still more of an evolving idea than a true product. True, many companies are offering "cloud" products, but the standards are still evolving, as is the true meaning of "cloud computing."

In my mind, "cloud" is the next logical evolution of computing - better resource management, achieved by enabling applications to communicate with their supporting infrastructure (servers, storage, network, CPU, and memory resources) so that they have the intelligence to scale up and down based on demand. "Cloud computing" also has a valid connection to outsourcing, in the sense that shared infrastructures will at some point overtake the privately managed information technology (IT) infrastructures that are common today.

There are several points about the above listed whitepaper from UnivaUD that caught my attention:
  • MPI was only mentioned once. The Message Passing Interface (MPI) is the standard on which most HPC applications and platforms are built. For a paper to truly look at the potential of outsourcing HPC to a "cloud" environment, an in-depth review of MPI is needed to ensure the proper updates are made to handle the additional physical-layer errors that could occur in a shared environment, as well as the added challenges of communicating in an unknown environment.
  • There was very little mention of the actual applications that are common in HPC. Applications like Fluent, NAMD, NWChem, Gaussian, and FFTW are commonly used on clusters built in-house to meet the specific needs of a given community. Moving those applications out of these small, in-house environments will take time and review to ensure they can scale in shared environments and properly handle the greater variation possible in hardware and configurations.
  • There was no mention of parallel file systems. These are a fundamental requirement of modern HPC environments. To truly move common HPC environments into the "cloud," a solution will be needed for data management and transfer at the high speeds required by today's applications.
In short, the above-linked whitepaper is typical of what I am seeing in the "cloud" space: lots of talk about the possible benefits of shared environments. What we need to stop doing as a community is trying to associate all things IT with "cloud." I have no doubt that in time we will evolve toward more use of shared resources - this has been occurring for quite a while with the migration to larger clusters within universities and national laboratories, as well as the ongoing outsourcing of email and specific applications - but as a community we need to ensure that each time we change how we do things in a given area of IT, it is with specific goals in mind. Without those clearly defined goals, we will not know whether we were successful.

As time allows, I hope to explore the above issues, particularly alternatives for parallel file systems in environments that may have varying latency and are distributed across multiple data centers.
