This consolidation effort has presented multiple challenges, including:
- Increased complexity of IT environments
- Increased requirements for system administrators' skill sets
- Unknown quantities around security within virtualized environments
- Increased need for processes to ensure compliance with applicable industry regulations
- Increased need for executives to understand resource utilization and allocation across the environment(s)
- Increased need for disaster recovery planning so that single hardware outages do not cripple an environment
I am going to talk primarily about the security aspect and some mitigation techniques used with virtualization. Security is a difficult subject within virtualization because the topic is in its infancy, and we are still learning the processes needed to secure virtual environments to the same level as our traditional physical infrastructures. The introduction of hypervisors into an IT environment adds a level of complexity and creates an entirely new tier where data access, user authorization and monitoring must be implemented to ensure security.
Let's also set the boundaries for our discussion and the definition of security I will use for the remainder of this post. Security can mean many things to many different people. The boundaries for what falls within the realm of a security team will also vary greatly from firm to firm. Security as I describe it is the set of actions and processes that ensure an individual can only access and modify data that management has approved them to access. This includes ensuring that permissions and other configuration settings are changed only by those authorized, and that private information is accessed only by those whom management feels have a valid reason to access it.
- Physical Host – A physical server running a hypervisor and having one or more virtual machines active on it
- Virtual Machine – A single running instance of an operating system (OS) sharing physical resources with other running OS instances
- Hypervisor – The software layer that resides on a physical host and allows multiple concurrent virtual machines to effectively share the same physical resources
- System Administrator – An individual with root or administrative level rights on one or more physical or virtual hosts
- SAN Administrator – An individual with the ability to manipulate shared storage devices or switch configuration between shared storage and servers using that storage
- VLANs – Virtual Local Area Networks, a method to logically partition a single physical network into multiple logical networks
- LUNs – Logical Unit Numbers, each identifying a unit of storage exported from a shared storage device to one or more hosts
Now, let's discuss some scenarios that are specific to virtualization, and some techniques to mitigate these threats.
Administrators with full access to hypervisors
Probably the best known and most discussed security vulnerability within virtualized environments is the hypervisor and its inherent access to the virtual machines above it. Most current virtualization solutions have a single root user at the hypervisor level with the ability to power virtual machines up and down, modify virtual machine (VM) boot parameters and gain console access to those VMs.
This type of model requires both a high level of trust in system administrators and good processes to ensure all changes are approved, properly tested and periodically reviewed by staff other than those responsible for making them. Administrators within a virtual environment should only have access privileges on the systems required to complete their job, and on systems that contain data they are authorized to see and handle. Management should implement audit policies to periodically review logs and ensure that all changes were approved, properly tested and meet all IT policies.
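Part of the periodic review described above can be automated. The following is a minimal Python sketch, assuming hypervisor change events and approved change tickets can be exported as simple records. The `ChangeEvent` and `Approval` types and their field names are hypothetical and not tied to any particular hypervisor or ticketing product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One change recorded in a hypervisor log (hypothetical record layout)."""
    admin: str                 # account that made the change
    vm: str                    # virtual machine that was changed
    action: str                # e.g. "modify_boot_params"
    change_id: Optional[str]   # ticket referenced by the change, if any


@dataclass(frozen=True)
class Approval:
    """An approved change ticket (hypothetical record layout)."""
    change_id: str
    reviewer: str              # person who approved the change


def find_violations(events, approvals):
    """Return (event, reason) pairs for changes that fail the review policy."""
    by_id = {a.change_id: a for a in approvals}
    violations = []
    for event in events:
        approval = by_id.get(event.change_id) if event.change_id else None
        if approval is None:
            violations.append((event, "no approved change ticket"))
        elif approval.reviewer == event.admin:
            # Changes must be reviewed by someone other than the person making them.
            violations.append((event, "approved by the person who made the change"))
    return violations


if __name__ == "__main__":
    approvals = [Approval("CHG-1", "alice"), Approval("CHG-2", "bob")]
    events = [
        ChangeEvent("bob", "web01", "power_off", "CHG-1"),
        ChangeEvent("bob", "db01", "modify_boot_params", "CHG-2"),
        ChangeEvent("carol", "db02", "console_access", None),
    ]
    for event, reason in find_violations(events, approvals):
        print(f"{event.admin} -> {event.vm} ({event.action}): {reason}")
```

A report like this does not replace a human review, but it narrows the review to the changes that have no matching approval or were self-approved.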
Console access to VMs
Most hypervisors by default will allow anyone with administrative rights on the host system to access the console for all VMs hosted on that system. This creates a situation where an unauthorized party could access the console of a system and perform password recovery activities, or see system output to the console.
Giving administrators the least amount of access needed to complete their job is key to ensuring that console access is limited to those who need it. Administrators will rarely need to access the console of a system, because technologies like remote desktop and remote shells are available for managing a virtual system. Modern hypervisors allow permissions to be set so that console access is given only to those who are authorized. I suggest enabling this so that an administrator can only access the console of systems they are immediately responsible for.
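The deny-by-default idea behind per-VM console permissions can be sketched in a few lines of Python. This is an illustration only: the `CONSOLE_ACL` mapping, the administrator names and the VM names are made up, and a real hypervisor would enforce this through its own role and permission system.

```python
# Hypothetical mapping of administrators to the VMs whose console they may open.
CONSOLE_ACL = {
    "alice": {"web01", "web02"},
    "bob": {"db01"},
}


def can_open_console(admin, vm, acl=CONSOLE_ACL):
    """Deny by default: allow console access only for explicitly granted VMs."""
    return vm in acl.get(admin, set())


def excess_grants(acl, responsibilities):
    """List (admin, vm) console grants that go beyond an admin's current duties."""
    extra = []
    for admin, vms in sorted(acl.items()):
        for vm in sorted(vms - responsibilities.get(admin, set())):
            extra.append((admin, vm))
    return extra
```

Running `excess_grants` against an up-to-date list of responsibilities is one way to spot console permissions that were never removed after a role change.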
Patches at the hypervisor level
The hypervisor within a virtual environment creates a single tier with essentially administrator level access to many more systems than an administrator would have had before virtualization. This hypervisor layer has access to all VM data, the ability to power VMs up and down and the ability to see the console of all VMs on a single physical server. If this single tier of access is compromised, it creates an easy path to the compromise of many additional systems.
Ensuring security now requires additional levels of testing during the phase that was traditionally penetration testing. New applications must also undergo load testing from a security standpoint to ensure that, if compromised, they would not affect the performance or response time of the remaining applications. This all means that a security patch at the hypervisor level has much more severe implications than patches on individual VMs because of the increased threat.
Ultimately, the most important aspect of hypervisor security is ensuring that only those who require access can connect to the management tools. This means using host based and network based firewalls to explicitly allow approved traffic and deny all other connections to the hypervisor for VM management. In addition to restricting access, companies should have an efficient process for testing patches when the vendor releases them, so that they are implemented as quickly as possible, particularly at the hypervisor, to limit any window of opportunity.
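The "explicitly allow, deny everything else" rule for management traffic can be expressed as a small policy check. The sketch below uses Python's standard `ipaddress` module; the network ranges in `ALLOWED_MGMT_NETWORKS` are placeholder assumptions, and the code only models the decision a firewall would make. It is not a firewall itself.

```python
import ipaddress

# Hypothetical management networks; replace with the ranges your policy approves.
ALLOWED_MGMT_NETWORKS = [
    ipaddress.ip_network("10.10.0.0/24"),     # assumed admin jump-host subnet
    ipaddress.ip_network("192.168.50.8/32"),  # assumed patch-management server
]


def is_allowed(source_ip, networks=ALLOWED_MGMT_NETWORKS):
    """Explicit allow, implicit deny for hypervisor management connections."""
    try:
        address = ipaddress.ip_address(source_ip)
    except ValueError:
        return False  # malformed addresses are denied
    return any(address in network for network in networks)
```

A check like this is also useful for auditing: replaying the source addresses found in hypervisor management logs through `is_allowed` shows whether any connection came from outside the approved ranges.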
Any addition of new technology, tools or features has the potential to add more complexity to an already complex IT environment. Complexity creates a variety of long term problems: upgrades become harder to manage, mistakes and configuration errors become more likely, one change can adversely affect other aspects of the environment and, most notably, the workload on IT staff increases.
As virtual environments grow, testing and validating all processes becomes even more critical. The best defense against complexity is careful documentation that has been tested and is continually updated to reflect changes in the environment, in the methods used to manage it or in the company as a whole. The more carefully things are documented, the more efficiently actions can then be automated, which further reduces the potential for human error. By automating processes around auditing, patch testing, software deployment and VM creation, IT staff can focus on operational efficiencies while ensuring that all systems operate within the boundaries of company policy with minimal intervention.
LUNs Zoned to Hypervisor
It is common to use a SAN in today's virtualized environments to simplify management of data growth, ease the movement of virtual machines and increase backup performance. With a SAN in place, anyone with administrative access to the hypervisor can manipulate the LUNs destined for virtual machines. This creates the potential not only for people to access data they have no need to access, but also for data to be manipulated without proper authorization.
Properly encrypting data at the file system level helps ensure that data is accessed only by authorized applications and users, and that only the authorized application and administrators can manipulate production data. It also means that if any physical disks were to become unaccounted for, management can be assured the data will not be read by unauthorized parties.
Ability to power VMs up and down
Virtual machines share an underlying management infrastructure and physical machine infrastructure. This creates the potential for a rogue system administrator or staff member to cause harm to one segment of the infrastructure simply because they have access to another. With a shared hypervisor, if the administrator account is abused, systems can be stopped, started and rebooted at unexpected times.
Critical services should not be hosted in virtual environments. This provides an added layer of protection for things like LDAP, Kerberos, Active Directory, DNS and critical web servers. By hosting these critical services on dedicated physical machines, you ensure that security problems within the hypervisor environment, or rogue staff, do not harm the services that are most critical to the stability of your enterprise.
Staff accounts with permission to power VMs up and down should be closely monitored and restricted to the systems an administrator needs to access to complete their job. Limiting access in this way ensures that if an account is abused, the damage it can cause is limited in scope.
Shared networks on physical machines
Companies often use VLANs as a way to separate systems based on usage, security risk, data type and physical site. This reliance on VLANs often extends as far out as the firewalls at the edge of a corporate network. When using virtual machines, there is the added risk of having multiple virtual machines on a single physical machine that require separate VLANs to function and adhere to existing network policies. Mistakes during initial virtual machine setup, as well as system compromises, can create a situation where VMs add unexpected paths between networks.
When initially planning the use of virtual machines, it is vital to include the staff responsible for security as well as those responsible for network routing and switching. They can provide valuable insight into the reasons for using VLANs or other network separation techniques. By including them, you can review which physical systems will house which virtual machines, and whether network changes will be required to ensure that security is not compromised and unexpected paths are not created between separate networks.
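One kind of unexpected path is a VM with interfaces in VLANs that belong to different security zones. The Python sketch below flags such VMs from an inventory export. The `VLAN_ZONE` table, the zone names and the VM names are all hypothetical; in practice the VLAN-to-zone assignments would come from the network team.

```python
# Hypothetical VLAN-to-zone assignments supplied by the network team.
VLAN_ZONE = {10: "dmz", 20: "internal", 30: "internal", 40: "storage"}


def zones_for(vlans, vlan_zone=VLAN_ZONE):
    """Map a VM's VLAN IDs to zones; unknown VLANs are reported as 'unassigned'."""
    return {vlan_zone.get(vlan, "unassigned") for vlan in vlans}


def find_bridging_vms(vm_vlans, vlan_zone=VLAN_ZONE):
    """Return VMs whose interfaces span more than one zone, with those zones."""
    findings = {}
    for vm, vlans in vm_vlans.items():
        zones = zones_for(vlans, vlan_zone)
        if len(zones) > 1:
            findings[vm] = sorted(zones)
    return findings
```

A VM attached to two VLANs in the same zone is not flagged, while a VM attached to an unknown VLAN is, on the assumption that an undocumented VLAN deserves a second look.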
Implementing a new VM
Implementing new virtual machines carries inherent risk, both from the threats posed by any new applications and from the need to manage and patch an additional host within the environment. Every new virtual machine is a full OS that could potentially be compromised, or otherwise used to launch attacks on your network or on others' networks.
A toolkit should be implemented before any virtual machines are activated that is used for two primary purposes:
- Penetration testing on new systems – All new hosts should be properly tested to ensure they meet company security policies. This testing process should include a review of running services, a review of host level firewall policies, a review of active system accounts and passwords and, finally, a check that the system is integrated with corporate monitoring and patch management tools.
- Patch management and monitoring on all systems – A corporate wide patch management suite should be used, and it should include all virtual machines. This centralization will ensure staff are aware of all virtual machines that are active, and aware of systems that are not up to date on security patches. More advanced tools can also give staff the ability to quickly audit systems for other security policies like password length, password expiration and firewall policies.
All virtual machines should be retired as soon as they are no longer needed. This removes the overhead on staff of managing the system, and removes the risk of having the system sit potentially unmonitored and unused. Unneeded virtual machines should be treated the same as the sprawl of old, unused physical servers, and removed as soon as practically possible.
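Two of the checks above lend themselves to simple automation: finding VMs that the patch management tool does not know about, and finding VMs that have sat idle long enough to be considered for retirement. The following Python sketch assumes both inventories can be exported as plain lists and that a "last used" date is recorded for each VM. The 90-day threshold is an arbitrary example, not a recommendation.

```python
from datetime import date, timedelta


def unmanaged_vms(hypervisor_inventory, patch_managed):
    """VMs the hypervisors report that the patch management tool does not list."""
    return sorted(set(hypervisor_inventory) - set(patch_managed))


def retirement_candidates(last_used, today, max_idle_days=90):
    """VMs with no recorded use for more than max_idle_days (assumed threshold)."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(vm for vm, used_on in last_used.items() if used_on < cutoff)
```

For example, `unmanaged_vms(["web01", "db01", "tmp-test"], ["web01", "db01"])` reports `tmp-test` as a VM that is running but not covered by patch management.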
Application layer vulnerabilities
Ultimately a server is only as strong as its weakest active service, and most often servers are compromised not because of a lack of OS patches, but because of failed application implementations or configuration errors. VMs are vulnerable to this same risk from application level security problems. Virtual machines carry the added risk that, if a compromised VM's load increases, it puts other virtual machines on the same physical infrastructure at risk.
Boundaries should be enforced across all tiers of an infrastructure: storage, physical systems, network connections, management tools and applications. An application is an extension of the OS from a security perspective, and applications residing on the same physical system via virtual machines should have similar security characteristics, including risk, data classification and company policies.
Externally facing VMs
The location and use of VMs must be closely tracked. If a physical host has VMs with both internal access and access from external users, the threat of outside attacks affecting internal resources increases dramatically. Any VM on a single physical host is vulnerable to a host of threats because of the other VMs it shares physical resources with.
By working with the networking and security teams before implementing virtual machines, system administrators can ensure that each physical host only hosts similar virtual machines, grouped by access level, data classification and risk. Most companies do not cross network boundaries with virtual machines. Instead, separate physical machines are placed in each security environment to host the virtual machines for that security and access level.
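The grouping rule above can be checked against a placement inventory. This is a minimal Python sketch, assuming each VM is recorded as a `(vm, host, zone)` tuple; the zone labels and host names are illustrative, and a real inventory would likely carry data classification and risk level as well.

```python
from collections import defaultdict


def zones_by_host(placements):
    """Group the security zones present on each physical host."""
    result = defaultdict(set)
    for _vm, host, zone in placements:
        result[host].add(zone)
    return result


def mixed_hosts(placements):
    """Return hosts that run VMs from more than one zone, with those zones."""
    return {
        host: sorted(zones)
        for host, zones in zones_by_host(placements).items()
        if len(zones) > 1
    }


def can_place(host, zone, placements):
    """Allow a new VM only on an empty host or a host already in the same zone."""
    existing = zones_by_host(placements).get(host, set())
    return existing <= {zone}
```

`mixed_hosts` audits what is already deployed, while `can_place` is the kind of check a provisioning script could run before a new VM is created.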
Audits and Tools
Auditing is a critical function in all IT environments. By properly auditing an environment, administrators can be notified of problems before they become serious or data is potentially compromised. A solid audit trail is often required by outside firms that may certify a company's ability to house or process certain types of data. Auditing is an entire topic on its own, but some common items to monitor and alert on in a consistent fashion are:
- System level logs from all hosts, both physical and virtual
- Network traffic, monitored for unexpected changes to typical traffic patterns
- All manipulation of VMs, including console usage, powering systems on and off, installation of patches and changes to configuration files
- Changes to storage configuration, which could include LUNs, zoning or encryption characteristics
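As an illustration of alerting on unexpected changes to typical traffic patterns, the sketch below compares a current measurement against a historical baseline using a simple standard deviation test. This is one possible heuristic chosen for brevity, not the method of any particular monitoring product; the threshold of 3.0 and the sample values are assumptions.

```python
from statistics import mean, stdev


def is_unusual(history, current, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the baseline."""
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return current != baseline  # perfectly flat history: any change stands out
    return abs(current - baseline) / spread > threshold
```

With a history of `[100, 110, 90, 105, 95]` (for example, megabytes per hour on a VM's interface), a reading of 104 is within the normal range while a reading of 200 is flagged. Real traffic has daily and weekly cycles, so a production baseline would need to account for time of day.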
Security within a virtual environment has the same underlying principles as the traditional physical environments we are accustomed to. Least access must be enforced so that compromised accounts or rogue staff can cause only a limited amount of damage. Process is the most important way to ensure access is limited such that staff can successfully complete their job, yet cannot access resources they have no immediate need to work with. Clear process can ensure new systems are thoroughly tested, reviewed and put into service, and then managed for the life of the application or host. Staff are more effective at overall administration if consistency is ensured across an environment.