Thursday, May 21, 2009

Understanding Lustre Internals

Lustre can be a complex package to manage and understand. The folks at ORNL, with assistance from the Lustre Center of Excellence, have put out a wonderful paper on Understanding Lustre Internals.

I recommend that all Lustre administrators read it; it is very useful information for understanding how all the Lustre pieces plug together.

Tuesday, May 5, 2009

"Cloud" and HPC? Huh?

I have tried for the most part not to post on this phenomenon known as "cloud computing." "Cloud" is still evolving and as such has many different meanings. The reason this whitepaper caught my attention is its attempt at connecting high performance computing (HPC) with "cloud computing." The way I see it, "cloud" is still more of an evolving idea than a true product. True, many companies are offering "cloud" products, but the standards are still evolving, as is the true meaning of "cloud computing."

In my mind "cloud" is the next logical evolution of computing - better resource management through enabling applications to communicate with their supporting infrastructures (server, storage, network, CPU, and memory resources) so they have the intelligence to scale up and down based on demand. "Cloud computing" also has a valid connection to outsourcing, in the sense that shared infrastructures will at some point overtake the privately managed information technology (IT) infrastructures that are common today.

Several points about the UnivaUD whitepaper linked above caught my attention:
  • MPI was only mentioned once. The Message Passing Interface (MPI) is the standard on which most HPC applications and platforms are built (a minimal sketch of MPI-style communication follows this list). For a paper to truly look at the potential of outsourcing HPC to a "cloud" environment, an in-depth review of MPI will need to be done to ensure the proper updates are made to handle the additional physical-layer errors that could occur in a shared environment, as well as the added challenges of communicating in an unknown environment.
  • There was very little mention of the actual applications that are common in HPC. Applications like Fluent, NAMD, NWChem, Gaussian, and FFTW are commonly run on clusters built in-house to meet the specific needs of a given community. Moving those applications out of these small, in-house environments will take time and review to ensure they can scale in shared environments and properly handle the increased variation possible in hardware and configurations.
  • There was no mention of parallel file systems. These are a fundamental requirement of modern HPC environments. To truly move common HPC environments into the "cloud," a solution will be needed for data management and transfer at the high speeds required by today's applications.
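
To make the MPI point concrete, here is a minimal sketch using the mpi4py bindings (my choice for illustration; most production HPC codes call MPI from C or Fortran). Every rank blocks in the collective operation until all ranks arrive, which is exactly why the variable latency of a shared "cloud" environment is a problem:

```python
# A minimal mpi4py sketch of the tightly coupled, synchronizing
# communication most HPC applications are built on.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank contributes a value; allreduce makes every rank wait on
# every other rank, so one slow node or flaky link stalls the whole job.
local_value = rank + 1
total = comm.allreduce(local_value, op=MPI.SUM)

if rank == 0:
    print(f"{size} ranks computed a global sum of {total}")
```

Run it under an MPI launcher, for example: mpirun -np 4 python global_sum.py
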
In short, the whitepaper linked above is typical of what I am seeing in the "cloud" space: lots of talk of the possible benefits of shared environments. What we need to stop doing as a community is trying to associate all things IT with "cloud." I have no doubt that in time we will move to greater use of shared resources - this has been occurring for quite a while with the migration to larger clusters within universities and national laboratories, as well as the ongoing outsourcing of email and specific applications - but as a community we need to ensure that each time we change how we do things for a given area of IT, it is with specific goals in mind. Without those clearly defined goals we will not know if we were successful.

As time allows I hope to explore the above issues, particularly looking at alternatives for parallel file systems in environments that may have varying latency and are distributed across multiple data centers.

Monday, May 4, 2009

Balancing Security and Productivity – Part 4 of 4

Proxy Internet Connections

Companies often look to proxy servers as a method to monitor and block harmful traffic on their networks. Proxy servers provide a gateway between company networks and outside networks to ensure that all connections are logged, filtered, and denied per company policies. Proxy servers can present a challenge because they can slow access for staff and inadvertently limit access to sites that are authorized but may initially appear unauthorized to the automated tools filtering access.

  • Open Internet Access – Open internet access allows staff unrestricted connections from a corporate network to the outside world; these connections are free from any proxy servers, bandwidth restrictions, or other traffic filters. While this gives staff maximum ability to do their jobs, the question must be asked: is this too much access? When a network allows that level of outbound connectivity, there is an inevitable risk that confidential information could be transmitted out of the company with little or no record of the event.

  • Limited Internet Access – Outside access can be limited by a variety of methods, including blocking specific ports, utilizing proxy servers, or deploying other network traffic monitoring solutions (a minimal client-side sketch follows this list). When used correctly, these tools can not only prevent confidential company information from being inappropriately transmitted outside the company, but also provide a solid audit trail in the event an investigation is needed. The trade-off is that staff performance can suffer from the overhead these tools add, and that traffic required for legitimate business can be blocked, adversely affecting productivity.
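
As a concrete illustration of the limited-access model, here is a minimal client-side sketch, using only the Python standard library, of routing outbound HTTP traffic through a proxy. The host proxy.example.com:3128 is a placeholder; in practice the proxy is usually enforced at the network edge or pushed out via system policy rather than configured per application:

```python
# A minimal sketch of sending outbound HTTP requests through a
# corporate proxy, where they can be logged and filtered.
import urllib.request

proxy = urllib.request.ProxyHandler({
    "http": "http://proxy.example.com:3128",
    "https": "http://proxy.example.com:3128",
})
opener = urllib.request.build_opener(proxy)

# Every request made through this opener is subject to the proxy's
# logging and filtering; a blocked site typically returns an error
# such as HTTP 403.
with opener.open("http://example.com/") as response:
    print(response.status, response.reason)
```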

In part 1 of this discussion we asked the question: how do we balance allowing employees to access company data with personal devices against protecting proprietary company information? The answer will ultimately be different for every company, but there are some common criteria that will be consistent across all solutions:

  • Consistency of security policies – It is critical that security policies are not compromised simply because a staff member is using a personal laptop. This means that personal systems must adhere to the same policies for storage of company data, use of virus scanning applications, and use and storage of company passwords.

  • Centralization of storage – Utilizing central, company-controlled storage allows the information technology (IT) department to ensure all company data is regularly backed up, archived, and available in the event of laptop or mobile device loss. There are many tools on the market that can automatically replicate data from remote devices to a company-managed data center (a minimal sketch follows below). This ensures data is always available, regardless of the type or ownership of the device connecting.
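
As one illustration of that replication idea, the sketch below mirrors a device's working directory to central storage with rsync over SSH. The paths and the hostname backup.example.com are placeholders, and commercial endpoint-backup tools do this continuously and with versioning rather than as a one-shot copy:

```python
# A minimal sketch of replicating local data to company-managed storage.
import subprocess

SOURCE = "/home/user/work/"                      # local data to protect
DEST = "backup.example.com:/srv/backups/user/"   # central, IT-managed storage

# -a preserves permissions and timestamps, -z compresses in transit,
# --delete keeps the central copy an exact mirror of the device.
subprocess.run(["rsync", "-az", "--delete", SOURCE, DEST], check=True)
```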


Finding the proper balance of security and productivity is a complicated, dynamic process for both the end users and those forming company policies. Any company today must ensure that staff have the proper IT resources at their disposal to do their jobs, and that those tools are open enough to be used in the most efficient way but closed enough that proprietary or otherwise confidential data is not put at unnecessary risk. All risks have a potential downside and all functionality has a potential benefit, both of which can be expressed in dollars. It is important to ensure that the balance of risk and benefit is on the side of benefits, and that the risk is not so great as to cause harm to your company.

Friday, May 1, 2009

Balancing Security and Productivity – Part 3 of 4

Database Encryption

Often companies will encrypt data stored within a database. This ensures that data is secure from simple eavesdropping by requiring a key to manipulate or view the data.

  • Encrypted Databases – Encrypted databases are becoming more common, whether encrypted in their entirety or only in the portions that are particularly sensitive (a minimal sketch follows this list). While encrypted databases do provide a lot of protection against unauthorized users, they can also slow access because of the additional CPU time needed to decrypt the data for use. Encrypted databases also pose a hazard of data loss in the event the keys necessary for encryption and decryption are lost or otherwise must be regenerated.

  • Non-Encrypted Databases – Standard databases, which store data in traditional ways without encryption, are the most common today. The risk they pose is that if the database's clients or backups are compromised, it is quite trivial to read the data contained in the database, which could include personal information like user names, passwords, and addresses. While traditional, non-encrypted databases can scale much larger because of the lower CPU usage, they carry a significant risk of data compromise.
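
To make those trade-offs concrete, here is a minimal sketch of application-level encryption of a single sensitive column, using Python's built-in sqlite3 module and the third-party cryptography package. These tools are my own choices for illustration, and the single in-memory key stands in for the key-management system that makes or breaks this approach:

```python
# A minimal sketch of encrypting a sensitive column before it reaches
# the database, so neither the database files nor their backups expose it.
import sqlite3
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in production this lives in a key store;
fernet = Fernet(key)          # lose it and the encrypted rows are gone

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, ssn BLOB)")

# Encrypt only the sensitive field; the rest stays queryable as plaintext.
db.execute("INSERT INTO users VALUES (?, ?)",
           ("alice", fernet.encrypt(b"123-45-6789")))

# Reading the value back requires the key and costs extra CPU time.
(ciphertext,) = db.execute("SELECT ssn FROM users WHERE name = ?",
                           ("alice",)).fetchone()
print(fernet.decrypt(ciphertext).decode())
```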

Device Ownership

Device ownership is often a big topic of discussion, especially within companies hiring younger workers right out of college. Individuals often get very comfortable with a platform while in school and expect to be using that same platform when they enter the workforce. When they later find out that their employer has standardized on a different OS or brand of laptop, employees will often use their personal devices for company business.

  • Company Devices – From a security standpoint, company-owned devices are the most secure option, but at a cost: employees will be less productive if they are forced to use a platform they are uncomfortable with or new to using. Company-owned devices ensure that the company can recover the device should an employee leave, and that all software being used is licensed, virus free, and properly monitored by corporate IT staff.

  • Personal Devices – While personal devices can allow workers to be more productive and comfortable with their operating environment, they come at the cost of very decentralized IT management. Personal devices may not be covered by corporate software licensing agreements and may not be kept up to date with security patches per company policy.

  • Combination – Most firms have settled on a combination: allowing personal hardware, but putting policies and tools in place to ensure it is managed by a centralized IT organization. This ensures that staff can have the tools they are most familiar with, while data integrity, security, and virus scanning are maintained as company policies evolve.

File Transfer Policies

All companies need to transfer files, both internally and externally, for review, collaboration, and company communication. These documents present a risk to the company because confidential information could inadvertently be sent to unauthorized parties.

  • File Attachments to Email – Attaching files to email carries several risks, including the large capacity mail servers need to handle the volume of traffic, as well as the potential that files could inadvertently be sent outside the company. While some modern email systems have the ability to scan outgoing email for specific content (a minimal sketch follows this list), this is often time consuming and can slow down the flow of communication.

  • Collaboration Tools – Limiting employees' ability to send files via email attachments is becoming much more common; as a solution to the need to share files, many companies are beginning to use collaboration tools like Trac, TWiki, or SharePoint. These solutions allow files to be stored internally, access to be restricted, and proper versions of files to be available to those who need them, without the risk of email and attachments being inadvertently forwarded to outsiders.
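
As an illustration of the outbound content scanning mentioned above, here is a minimal sketch using only Python's standard email library. The flagged terms and the sample message are placeholders; real data-loss-prevention products use far richer pattern matching, and this naive scan of every message part is exactly the kind of overhead that slows communication down:

```python
# A minimal sketch of scanning an outgoing message and its attachments
# for flagged terms before allowing delivery.
from email.message import EmailMessage

FLAGGED_TERMS = {"confidential", "internal only"}

def violates_policy(msg: EmailMessage) -> bool:
    """Return True if any text part or attachment contains a flagged term."""
    for part in msg.walk():
        payload = part.get_payload(decode=True)  # None for multipart containers
        if payload is None:
            continue
        text = payload.decode("utf-8", errors="ignore").lower()
        if any(term in text for term in FLAGGED_TERMS):
            return True
    return False

msg = EmailMessage()
msg["Subject"] = "Q2 numbers"
msg.set_content("Draft attached.")
msg.add_attachment(b"CONFIDENTIAL - do not distribute",
                   maintype="application", subtype="octet-stream",
                   filename="q2.txt")

print("blocked" if violates_policy(msg) else "allowed")
```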