Wednesday, April 22, 2009

Balancing Security and Productivity – Part 2 of 4

Chat Applications and Boundaries

Many companies are looking to real-time communication tools like instant messenger and other chat applications to enable staff to communicate real time, either internally or with external customers or partners. These tools can enable staff to be very efficient at communication and issue escalations, but the risks of information being shared incorrectly, or not properly archived present a risk that should be evaluated.

  • Internal-only – Internal only chat solutions provide staff the ability to quickly communicate internally, while limiting the change of accidental exposure of customer data outside the company. What internal-only chat solutions lack is the ability to communicate real time with customers or partners. By eliminating this capability, staff could have to use other, more time consuming solutions for external communication.

  • Internal and external – By providing staff with the ability to chat real time both internally, and externally they are enabled to communicate real time with customers, partners and other outside groups that contribute to the bottom line. The potential risk is a staff member could send an incorrect file, or cut/paste incorrect text into a chat window and reveal company proprietary data to an external entity.

  • No-chat – On one end of the extreme is to block all real-time chat communication, limiting staff to communication using standard email or phone conversations. While this can ensure no company sponsored tools are used for external communication, todays tech-savvy employees will often attempt to circumvent this limitation and use their own tools, potentially creating larger security implications because of non-centralized management. While eliminating chat applications can contribute to a more secure environment, the potential effect on employee productivity can be negative.

  • Compliance – Compliance is the other large factor for chat and other instant messenger type applications. Compliance can include a variety of items include detailed record keeping, legal documentation of discussions and industry-standard policies for data storage and handling. Most chat applications offer the option of storing an archive of all discussions, this feature should be evaluated against compliance requirements to ensure that necessary records are kept and unnecessary information is purged.

File Storage Locations

Storing of company files, including email archives, customer communications and other company documents must be done in a way that files can be recovered if lost, but also to ensure that access to those files is only grated to those requiring access to complete their assigned job. Few companies have a consistent method for file storage and sharing; most companies have differing policies for each department. It is important that a company have a defined policy that becomes part of the corporate culture to ensure collaboration and exchange of ideas, as well as compliance for document storage.

  • Local – Local file storage is individual employees storing company documents on the computers and other devices they use for conducting company business. Local file storage presents a challenge in all facets of security because of a lack of an audit trail for file access, a lack of recovery capabilities if an employee accidentally deletes a file, a lack of a recovery mechanism for lost laptops and ultimately a lack of recoverability if an employee were to leave and take their laptop with them. While local only storage provides an individual employee with the easiest access to the files they work with regularly, the company as a whole has very limited visibility into that employees archive of company data.

  • Network Shares – Network shares provide a loosely controlled environment for storing files that individual staff members have worked on or created. Network shares provide minimal levels of recoverability because they can be backed up more easily then individual laptops and desktops, and they can also do minimal revision control. They do lack real audit capabilities for file access and updates and do not provide staff a formal method for communicating who is working on any given document at any given time. Because of the lack of real auditing paired with the lack of real capability around access controls, network shares are not a good long term strategy for a company that could have many documents to manage.

  • Shared Collaboration Sites – Shared collaboration sites are the most common method in companies today to share files and documents internally. They provide a very robust method for storing documents, managing multiple revisions and managing access controls for documents based on a variety of factors including need-to-know, manager approval, project participation and department ownership.

Operating System Usage

Many companies will evaluate a given operating system (OS) as part of a security review, when the actual OS in use is a very minor component of the equation. At some point in time a security vulnerability has been found in all major operating systems. The risk posed by these various vulnerabilities has much more to do with how the vulnerability is responded too then the actual OS with the vulnerability.

  • Staff Skill Level – Probably the most important topic when addressing what operating systems (OS) to use in any environment is skill set of the system administration team, yet it is often not looked at in depth. Staff are most efficient at administering operating systems that they are familiar with and have experience with. If new operating systems are introduced, the initial ramp up time to be proficient for staff can be on the order of months. During this time there is risk that best practices will not be followed and work could potentially have to be redone. When evaluating operating systems for a given environment, the time consideration for training staff with the necessary skills must be considered.

  • Patch Process – The process to install performance, security and feature upgrade packages differs very widely from OS to OS. This has significant implications to the security of a system, the longer it takes the administration team to install patches, the longer a vulnerability could be exploited. When reviewing new operating systems, the tools they offer for installing and managing patches should be reviewed to ensure that patches can be installed and tested in a timely manner.

  • Vendor Relationship and commitment – A vendor's commitment to a particular OS and application stack is critical to ensuring a secure environment. When reviewing operating systems for use in your environment, it is important to understand the vendors commitment to the platform; this has implications for the speed of patches being released, as well as the capabilities a vendor has for developing patches in a timely manner.


Tuesday, April 21, 2009

Lustre Users Group 2009

Last week we held the 2009 Lustre Users Group. It was a success; we had the largest user turn out ever.

All slides can be found here.

I did a presentation on Best Practices for the Sun Lustre Storage System, those slides can be found here.

Friday, April 17, 2009

Balancing Security and Productivity – Part 1 of 4

This is the first part of an ongoing discussion. The additional parts will be posted in the coming weeks.


An often challenging debate in any IT organization is the proper balance of security and productivity. Most organizations struggle to balance a loss in productivity for staff due to tighter security restrictions around passwords, data access, allowed applications, automated monitoring and threat detection. People at various levels within an organization will have differing solutions for balancing risk and ease of completing work for various staff. Every risk that must be understood for security changes has an associated cost, either in the cost of lost data, lost capability or bad publicity. On the flip side, every change made in the name of security and lowering risk could potentially lower employee productivity which can both affect output and have a cost, as well as affect morale if tasks become more difficult to complete.

In addition to evaluating risk for security policies and it's impact on staff and their productivity is assessing that impact across different staff with different duties at the company. Often times staff with more tightly controlled tasks are easier to limit impact for then staff that have a larger range of duties that may require off hours work, remote work or constantly changing duties and tasks.

With any activity within an enterprise, be it adding an application, adding a new mobile device or adding a new network connection poses a level of risk. That risk must be weighed against the benefits gained by adding that network connection. Take one of the most common tasks for an IT department; adding a new active network connection to someones office within a company facility. This activity has little risk associated with it because most often only staff will be in the area and able to physically use the connection. The benefit of this can be great by allowing an additional productive staff member, an additional printer for staff use or allowing faster network access then existing connections would allow. In this case this risk to reward balance is reasonable. Now take an activity that is just as common; installing VPN software on a laptop so that a staff member can connect to the company network remotely. What if this laptop is then lost and has company data on it? What if this laptop is infected with a virus that could infect other corporate machines? I intend to explore various trade offs that must often be reviewed by IT departments and the associated risks and rewards that go with each.

Passwords versus Tokens

One of the most common methods for increasing security within a computing environment is by eliminating one-time passwords and replacing them with a token based approach for non-reusable passwords. In this forum I call any authentication solution that provides a challenge response or requires an external token to be the alternative to standard passwords. There are several trade offs that must be considered for this approach to provide a high-level of assurance that accounts are only used by the designated owners:

  • Login Speed – Using tokens or other 2-factor methods for logins has the potential to slow down staffs' ability to login. If a staff member can not find their token for login that will slow down their ability to complete tasks. Additionally, the time needed to use a token is often longer then the time required to enter a traditional password from memory and be authenticated.

  • Seamless Integration – Integration company wide can pose a challenge for tokens and 2-factor authentication solutions. While much improvement has been made on this level with modern identity management tools, most firms still have a diverse range of applications and integration with all of them is often not possible. This leaves companies in a situation where they must decide which applications and tools make sense for token based authentication and which should remain password based.

  • Ease of Memory – Tokens often use a pin number that is shorter then common passwords. This shorted pin paired with a specific token that is time specific creates a combination of information that is easier to remember, and thus less likely to be written down by staff. This ease of memory of necessary login information can ensure a situation where staff passwords are

VPN versus Public Secure Web Sites

There are two primary methods for ensuring that company data is secure when being accessed by employees and authorized personnel; the primary method is to use web based applications that run over encrypted channels, the https protocol is the most common. Often times companies will implement a virtual private network (VPN) solution to further ensure that all data transmitted is secure.

The primary issue being discussed here is providing access to company applications to staff that are located in remote locations, this could be working from home, while on travel or via remote devices.

  • VPN Assurances – VPNs, when properly used can ensure compliance with a variety of company security policies around virus protection, password length and expiration and a systems patch status. These policies can ensure all hosts connected to the companies network are secure. The trade off is that VPNs are often difficult for users to utilize because of the time necessary to connect and the technical challenge in ensuring users can always connect to the VPN when necessary.

  • VPN Restrictions – While VPNs ensure that systems connected to the network meet compliance, they restrict an employees ability to login quickly and complete a task. If an employee needs access but does not have a company computer, a VPN only approach may limit their ability to use nearby computers to complete the task.

  • Availability of Web Based Applications – Web based applications that are encrypted and outside of company VPN infrastructure allow staff to connect in a secure fashion, regardless of who's computer they are using. While this does enable productive work to be done in more locations, it increases the potential that data or passwords could be compromised by keystroke loggers on non-company controlled machines.

Wednesday, April 1, 2009

Security considerations in a virtualized environment

Virtualization is becoming the standard method for consolidating large information technology (IT) environments down to less hardware then was once required. Because of the rapid increase in both processor performance, and memory density, paired with increased disk capacities, a single server can handle the load that it used to take many to accomplish.

This consolidation effort has presented multiple challenges, including:
  • Increased complexity of IT environments
  • Increased requirements for System Administrator's skills sets
  • Unknown quantities around security within virtualized environments
  • Increased need for processes to ensure compliance with applicable industry regulations
  • Increased need for executives to understand resource utilization and allocation across the environment(s)
  • Increased need for disaster recovery planning so that single hardware outages do not cripple an environment

I am going to talk primarily about the security aspect, and some mitigation techniques used with virtualization. Security is a difficult subject within virtualization because the topic is in it's infancy and because of that we are still learning the proper processes that are needed to secure virtual environments at the same level our traditional physical infrastructures are secured at. The introduction of hypervisors within an IT environment add a level of complexity to the environment, and create an entirely new tier where data access, user authorization and monitoring must be implemented to ensure security.

Lets also talk about the boundaries for our discussion and the definition of security I will use for the remainder of this posting. Security can mean many things to many different people. The boundaries for what falls within the realm of a security team within a company will also vary greatly from firm to firm. Security as I describe it is the actions and processes that ensure an individual can only access and modify data that management has approved them access too. This includes ensuring permissions and other configuration settings are only changed by those authorized, and private information is only accessed by those that management feel have a valid reason to access it.

Definitions
  • Physical Host – A physical server running a hypervisor and having one or more virtual machines active on it
  • Virtual Machine – A single running instance of an operating system (OS) sharing physical resources with other running OS instance
  • Hypervisor – The software layer that resides on a physical host and allows multiple concurrent virtual machines to effectively share the same physical resources
  • System Administrator – An individual with root or administrative level rights on one or more physical or virtual hosts
  • SAN Administrator – An individual with the ability to manipulate shared storage devices or switch configuration between shared storage and servers using that storage
  • VLANs – Virtual Local Area Networks, a method to logically partition a single physical network into multiple logical networks
  • LUNSs – Logical Units, a unit of storage exported from a shared storage device to one or more hosts

Now, lets discuss some scenarios that are specific to virtualization, and some techniques to mitigate these threats.

Administrators with full access to hypervisors
Probably the best known and most thought about security vulnerability within virtualized environments is the hypervisor and it's inherent access to the virtual machines above it. Most current virtualization solutions have a single root user at the hypervisor level with access to power virtual machines up and down, modify virtual machines (VM) boot parameters and gain console access to those VMs.

This type of model requires both a high level of trust for system administrators, as well good processes in place to ensure all changes are approved, properly tested and periodically reviewed by staff other then those responsible for making them. All administrators within a virtual environment should only have access privileges on systems required to complete their job, and systems that contain data they are authorized to see and handle. Management should implement audit policies to periodically review logs and ensure that all changes were approved, properly tested and meet all IT policies.

Console access to VMs
Most hypervisors by default will allow anyone with administrative rights on the host system to access the console for all VMs hosted on that system. This creates a situation where an unauthorized party could access the console of a system and perform password recovery activities, or see system output to the console.

Ensuring that administrators have the least amount of access to successfully complete their job is key to ensuring that console access is limited to those that need it. Often times, administrators will rarely need to access the console of a system because of technologies like remote desktop and remote shells for managing a virtual system. Modern hypervisors will allow permissions to be set so that console access is only given to those that are authorized. It is suggested this be enabled so that an administrator can only access the console for systems they are immediately responsible for.

Patches at the hypervisor level
The hypervisor within a virtual environment creates a single tier with essentially administrator level access to many more systems then the administrator would have before virtualization. This hypervisor layer has access too all VM data, the ability to power VMs up and down and the ability to see the console for all VMs on a single physical server. This hypervisor layer adds a single tier of access, that if compromised could create a path to easy compromise of many additional systems.

Ensuring security now requires additional levels of testing during the phase that was traditionally penetration testing. New applications must also include load testing from a security standpoint to ensure that new applications, if compromised would not affect the performance or response time of remaining applications. This all means that a security patch at the hypervisor level has much more sever implications then patches on individual VMs because of the increased threat.

Ultimately, the most important aspect with hypervisor security is ensuring that only those that require access to it, can connect to management tools. This means using host based and network based firewalls to explicitly allow traffic that is allowed and deny all other connections to the hypervisor for VM management. In addition to restricting access, companies should have an efficient process to test patches when they are released from the vendor to ensure they are implemented, particularly at the hypervisor as quick as possible to limit any windows of opportunity.

Complexity
Any addition of new technology, tools or features has the potential to add more complexity to an already complex IT environment. Complexity creates a variety of long term problems including making upgrades harder to manage, creating the potential for mistakes and configuration errors, creating the potential for one change adversely affecting other aspects of the environment, and most notable putting a higher workload on IT staff.

As virtual environments grow, testing and validating all processes becomes only more critical. The best defense to complexity is careful documentation that has been tested, and is continually updated to reflect changes in the environment or methods of management around that environment or the company as a whole. The more carefully things are documented, the more efficiently actions can then be automated, ensuring that the potential for human error is further removed. By automating processes around auditing, patch testing, software deployment and VM creation, IT staff can be left to focus on operational efficiencies, while ensuring that all systems will operate within the boundaries of company policy with minimal intervention.

LUNs Zoned to Hypervisor
It is common to utilize a SAN in todays virtualized environment to simplify management of data growth, movement of virtual machines and increase performance of backups. This use of a SAN creates a level within the hypervisor, that anyone with administrative access to the hypervisor can manipulate the LUNs destined for virtual machines. This creates the potential for not only having people access data they do not have the need to access, but the potential that data is manipulated without proper authorization.

Properly encrypting data at the file system level will ensure that data is only accessed by authorized applications and users. Encrypting data ensures that only the authorized application and administrators can manipulate production data, this level of assurance also ensures that if any physical disks were to become unaccounted for, management can be assured the data will not be read by unauthorized parties.

Ability to power VMs up and down
Virtual machines share an underlying management infrastructure and physical machine infrastructure. This creates the potential that a rouge system administrator or staff member can cause harm to one segment of the infrastructure, simply because they have access to another. Having a shared hypervisor creates the potential that if the administrator account is abused, systems can be stopped, started and rebooted at unexpected time.

Critical services should not be hosted in virtual environments. This will ensure an added layer of protection for things like LDAP, Kerberos, Active Directory, DNS and critical web servers. By hosting these critical services on dedicated virtual machines, you ensure that security problems within the hypervisor environment, or rogue staff do not cause harm to the services that are most critical to the stability of your enterprise.

Staff accounts with permissions to power up and down VMs should be closely monitored and restricted to only allow access to the systems an administrator needs to access to complete their job. This limiting of access will ensure that if an account is abused, the damage it can incur is limited in scope.

Shared networks on physical machines
Companies often times will use VLANs as a way to separate systems based on usage, security risk, data type and physical site. This reliance on VLANs often times extends as far out as the firewalls at the edge of a corporate network. When using virtual machines, there is the added risk of having multiple virtual machines on a single physical machine that require separate VLANs to function and adhere to existing network policies. Mistakes with initial virtual machine setup, as well as system compromises can create a situation where VMs add unexpected paths between networks.

When initially planning the use of virtual machines, it is vital to include the staff responsible for both security, as well as network routing and switching implementation. They can provide valuable insight into the reasons for using VLANs or other network separation techniques. By including them, you can review what physical systems will house what virtual machines, and if network changes will be required to ensure security is not compromised and unexpected paths are not created between separate networks.

Implementing a new VM
Implementing new virtual machines has an inherent risk in both the threats posed by any new applications, but additionally the necessity to manage and patch an additional host within the environment. Every new virtual machine is a full OS that could potentially compromised, or otherwise used to launch attacks on your network, or others' networks.

A toolkit should be implemented before any virtual machines are activated that is used for two primary purposes:
  1. Penetration Testing on new systems – All new hosts should be properly tested to ensure they meet company security policies. This testing process should include a review of running services, a review of host level firewall policies, a review of active system accounts and passwords and finally, ensure the system is integrated in with corporate monitoring and patch management tools
  2. Patch management and monitoring on all systems – A corporate wide patch management suite should be used and inclusive off all virtual machines. This centralization will ensure staff are aware of all virtual machines that are active, and aware of systems that are not up to date on security patches. More advanced tools can also provide staff with the ability to quickly audit systems for other security policies like password length, password expiration and firewall policies.

All virtual machines should be retired as soon as they are no longer needed. This removes the overhead on staff of managing the system, and removes the risk of having the system sit potentially unmonitored and used. Virtual machines should be considered the same as the sprawl of old, unused physical servers, and removed as soon as practically possible.

Application layer vulnerabilities
Ultimately a server is only as strong as it's weakest active service, and most often servers are compromises not because of a lack of OS patches, but because of failed application implementations or configuration errors. VMs are vulnerable to this same risk around application level security problems. Virtual machines have the added risk of being compromised that if their load increases, they put other virtual machines on the same physical infrastructure at risk

Boundaries should be enforced across all tiers of an infrastructure; storage, physical systems, network connections, management tools and applications. An application is an extension of the OS from a security perspective, and an applications residing on a physical system via virtual machines should have similar security characteristics including risk, data classification and company policies.

Externally facing VMs
The location and use of VMs must be closely tracked. If a physical host has VMs with both internal access and access from external users, the threat of outside attacks affecting internal resources increases dramatically. Any VM on a single physical host is vulnerable to a host of threats because of the other VMs it shares physical resources with.

By working with the networking and security teams before implementing virtual machines, system administrators can ensure that physical hosts only host common virtual machines, grouped by access levels, data classification and risk. Most companies do not cross network boundaries with virtual machines. Separate physical machines will be places in each separate security environment to host virtual machines for that security and access level.

Audits and Tools
Auditing is a critical function in all IT environments. By properly auditing an environment, administrators can be notified to problems before they become serious or data is potentially compromised. A solid audit trail is often required by outside firms that may certify a companies ability to house or process certain types of data. Auditing is an entire topic on its own, but some common items to monitor and alert in a consistent fashion are:
System level logs from all hosts, both physical and virtual
Monitoring network traffic for unexpected changes to typical traffic pasterns
Logging of all manipulation of VMs including console usage, powering on and off of systems, installation of patches and changes to configuration files
Changes to storage configuration that could include LUNs, zoning or encryption characteristics

Security within a virtual environment has the same underlying principals as the traditional physical environments we are accustomed too. Least access must be ensured so that compromised accounts or rogue staff have a limited amount of damage that can be caused. Process is the most important way to ensure access is limited in a way that staff can successful complete their job, yet not access resources they do not have an immediate need to work with. Clear process can ensure new systems are thoroughly tested, reviewed and put into service, and then managed for the life of the application or host. Staff are more effective at overall administration if consistency is ensured across an environment.