Thursday, August 12, 2010

Considerations for always-on environments

In today’s information technology (IT) departments, more and more applications are becoming 24x7 in nature, requiring availability at all times, with minimal or no downtime to accommodate upgrades, patches, troubleshooting or maintenance activities. This is a tall order with today’s integrated solutions. Ensuring that all pieces of an environment work regardless of maintenance, outages or upgrades is a complex challenge, often only addressed by companies with large resources at their disposal.

There is no magic bullet to solve all availability problems. But, if you are in a position to develop a solution iteratively and own the entire infrastructure, there are considerations that can be planned for ahead of time to provide significantly better availability of an application than could be obtained through traditional methods for supporting always-on applications.

The most common challenges in today’s always-on environments are:

  • Hardware Failures - Hardware fails; there is nothing we can do about that. The more hardware you have, the more components that will fail. All modern environments should be designed to ensure that critical applications and services are not taken offline by failed hardware.

  • Facility Failures - Modern data centers are a complex combination of traditional construction, complex power generation and distribution, cooling and environmental controls. These systems, while redundant, can still fail for a variety of reasons. Additionally, data centers in many locations are susceptible to environmental disasters. All modern application environments should span multiple locations in a way that ensures a failure of the application is not caused by the failure of a single facility.

  • Upgrades Must Occur - Upgrades are one of those activities within IT that must occur. There is a lot of flexibility around when and how they are completed, but eventually, all hardware and software will need to be updated to ensure new features are available and the solution remains secure and stable.

  • Rapid Growth - Today’s IT environments are growing at a very rapid pace compared to even 5 years ago. Many organizations add new servers and applications on a weekly basis. This growth must be managed at the same time that legacy systems are upgraded, eliminated and supported through all this change.

  • Data - Both the size of data and the life cycle of that data are becoming more complex to manage. More and more data presents challenges to upgrades, backup, recovery, available bandwidth and a list of other pieces of the environment. As data grows, it also becomes more valuable to the organization, and its life cycle must be managed in a more automated way to ensure that data is deleted when required and, on the flip side, available when required.

  • External Dependencies - No matter how many resources and skills a company has internally, almost all firms have to use a third-party product that is developed by an outside firm for some portion of their IT environment. These third-party products present challenges in integration, management and support that consume time and resources from the IT department using them.

  • Shifting Loads - Very few solutions today have a consistent load on the environment every day, and every hour. This variation adds complexity by requiring a constant monitoring of capacity and adjustments based on user demand.

  • Variety of Delivery Platforms - Most applications today are delivered via a variety of platforms including web browsers, smart phones, exposed web services or appliances. Each of these requires a separate set of development and testing procedures, each has separate security policies and all have very different methods for storing, rendering and handling dynamic content. Modern applications must support a seamless experience across all of them for each individual user and their preferences.

While that is not an exhaustive list, it is a good intro to the kinds of challenges that are encountered in most IT departments today. In order to overcome those, many shops develop their own software to ensure they have features and capabilities to make the above challenges easier to manage. Here are some of the most common design considerations to keep in mind when designing new software within your environment:

  • Mindset of Rolling Upgrades - In today’s environments, 100% uptime is not an unreasonable expectation when serving an application supporting users in multiple timezones and countries. Modern software should be designed in a way that upgrades can be done in a non-disruptive way, allowing users to continue using the application during upgrades and other maintenance activities. While taking the entire application offline is often not an option, software can be written to allow some subset of servers or features to be down for periods of time, while preserving other portions of the application.

  • Inflight Transaction State Tracking - Modern applications often communicate with many different systems across one or more data centers. All this communication must be properly tracked so that the application can recover, and a user can continue with their transactions, even if hardware or software within the environment fails. It is critical that modern software implement mechanisms for tracking the state of communications between servers; this state can be used to recover should a failure of a server or facility occur. This tracking, and the ability to recover automatically, will allow the application to function regardless of the underlying hardware state.

  • Consistent and Automated QA Processes - Quality assurance within the software development realm has become more and more critical as applications have become more complex and upgrades more frequent. A fully automated regression testing environment allows new builds of an application to be tested quickly, and allows new tests to be developed as bugs are found to ensure they do not make it into future releases.
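The rolling-upgrade mindset above can be sketched in a few lines: drain one server at a time from the pool, upgrade it, verify it is healthy, and return it before touching the next. This is a minimal illustration only; the server names and the drain/upgrade/health-check functions are hypothetical placeholders, not a real load-balancer API.

```python
pool = ["app-01", "app-02", "app-03", "app-04"]  # hypothetical server pool

def drain(server):
    """Stop routing new requests to the server (placeholder action)."""
    print(f"draining {server}")

def upgrade(server):
    """Apply the new build to the server (placeholder action)."""
    print(f"upgrading {server}")

def healthy(server):
    """Verify the upgraded server passes its health checks (always true here)."""
    return True

def restore(server):
    """Return the server to the load-balancer pool (placeholder action)."""
    print(f"restoring {server} to the pool")

def rolling_upgrade(pool):
    # Upgrade one server at a time so the rest keep serving users;
    # halt the rollout if any upgraded server fails its health checks.
    for server in pool:
        drain(server)
        upgrade(server)
        if not healthy(server):
            raise RuntimeError(f"{server} failed health checks; halting rollout")
        restore(server)

rolling_upgrade(pool)
```

The key design choice is that at every moment, all but one server remains in service, so users never see the application go fully offline.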

Today’s 24x7 IT environments are extremely complex and very difficult to manage. As they grow and user demands grow, they will only become more complex. Automation is one primary way to stay ahead of this challenge - any process, from QA to deployment, that can be automated should be automated; this increases quality, decreases variation and limits the chance for human mistakes. That, paired with distributed software and hardware environments, will ensure that as the environment scales, it will not become more susceptible to a failure of any single component. Finally, design all applications assuming that things will fail. By ensuring that applications can recover from failures, either hardware or software, you can ensure that the end user experience is as positive as possible.

Monday, August 2, 2010

The trend of consolidation in IT

Consolidation is a common term within Information Technology (IT) departments; it is often used by CIOs and set as a goal for IT departments. While consolidation is a valuable goal, IT departments need to focus less on the concept of consolidation, and more on setting good, long-term habits for the department around hardware purchases, reuse and elimination of assets. Consolidation makes for a good short-term project with defined end dates and targets. What is more critical for IT departments is to embrace the concept of Return on Investment (ROI) and ensure that any solution deployed has a solid ROI for the business and properly factors in all costs for the life-cycle of the purchase. The more focus on long-term planning and ROI calculations, the fewer times an organization will find itself in a place where it feels it must start a consolidation project. Using ROI calculations during all projects ensures the IT portion of the business is as efficient as possible.

I have used this term consolidation multiple times already, but what does it really mean? Within the context of IT, consolidation is most often the process of reviewing all applications and the servers they are hosted on and getting rid of components that are no longer necessary, with the goal of utilizing the most up-to-date technology to better match capacity needs with actual capacity.

In my mind, more important than working on consolidation projects, IT can better utilize its project time and resources by looking at Return on Investment (ROI) for the entire enterprise while working the many projects that make up an IT environment. By reviewing ROI as part of projects, a company can ensure that the solutions being implemented will be cost-effective over the life of the project and not require large consolidation projects that consume time, money and human resources. ROI can be influenced both positively and negatively by a regular update cycle defined for both hardware and software. This regular cycle ensures that all projects are reviewed for changes on a regular basis, so that a change can be made to go with modern hardware, less capacity or different solutions to minimize cost.

What types of things should be reviewed in all projects to ensure that capacity does not grow so large that a consolidation project is needed?

  1. Reducing hardware - As part of all new projects, updates, changes or capacity requests the hardware needs should be closely reviewed to ensure that the amount of hardware is not excessive, but rather just enough to handle the availability needs, performance and geographic distribution. A capacity plan should be developed to ensure that additional capacity can be added as needed, but before performance is negatively impacted.

  2. Reducing the number of vendors - ROI for all projects should include a review of complexity and the associated cost of managing that complexity. Consolidation often targets reducing the number of vendors within an IT environment. Carefully reviewing vendors at the beginning of projects can ensure that each vendor added has a strong reason for being there, since every vendor adds costs to projects related to vendor management, support escalation and compatibility matrices.

  3. Reducing the number of instances - Companies often have multiple copies of applications or data sets, often referred to as instances, running within the environment. These are created for a variety of reasons, but often contribute to high administration costs and difficulty when auditing the environment. Projects should carefully review the use of multiple instances and ensure that there is a valid business need prior to deploying a solution that could require consolidation or high administration costs after initial deployment.

  4. Rotation of hardware - All hardware will eventually hit the end of its useful life and have to be upgraded or replaced. To minimize the need for consolidation, new projects should plan during implementation what the minimum and maximum life-cycle of the hardware for the project will be. This will ensure that as the capacity needs grow, hardware can be rotated in for the project to ensure capacity and needs match.

  5. Upgrades of software - Software requires regular patches and upgrades to ensure stability and the ability to easily apply required patches later in the lifecycle of the solution. Projects should include adequate time for applying incremental patches as needed, to avoid costly downtime and staffing scrambles should a critical patch become available but be unable to be applied due to missing prerequisites. All project plans should include adequate time for upgrades, maintenance and testing to ensure maximum stability and manageability.

  6. Placement of new applications - Often the assumption is made that a new application requires a new server, or some variation on that theme. All projects should include a careful review to ensure that all capacity being added will be utilized at its maximum level, and that duplication of services, hardware and capabilities is not being added into the environment.

  7. Proactive addition of capacity - One method of ensuring that consolidation is not needed down the road, and that your IT environment is running efficiently, is capacity planning: proactively adding the necessary capacity to handle demand through centralized management of capacity. This centralized management will ensure the environment has capacity to handle growth, that old hardware is adequately retired, and that capacity being added is based on adequate planning rather than guesses or last-minute panicked requests from end users.
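The proactive capacity planning in item 7 can be sketched with a simple projection: given current load, total capacity and an observed growth rate, estimate how many months of headroom remain and order capacity before the procurement lead time is exhausted. The growth model, numbers and three-month lead time below are invented for illustration; real planning would use measured trends.

```python
def months_until_full(current_load, capacity, monthly_growth_rate):
    """Project how many whole months remain before load exceeds capacity,
    assuming simple compound monthly growth (a deliberately naive model)."""
    months = 0
    load = current_load
    while load < capacity:
        load *= 1 + monthly_growth_rate
        months += 1
    return months

# Invented example: 600 units of load against 1000 units of capacity,
# growing 5% per month, with an assumed 3-month procurement lead time.
lead_time_months = 3
remaining = months_until_full(600, 1000, 0.05)
if remaining <= lead_time_months:
    print("Order additional capacity now")
else:
    print(f"Roughly {remaining} months of headroom remain")
```

Even a crude projection like this turns capacity from a last-minute panic into a scheduled purchase reviewed alongside every other project.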

I have used the term return on investment (ROI) several times. What components make up the ROI for an IT-focused project?

  1. Current administration costs - These administration costs include the physical power and cooling necessary to run the servers for a given set of applications and the associated data, the labor necessary to upgrade, patch and keep the environment stable and the maintenance costs for any hardware or software being used in the environment.

  2. Delta in new solution administration costs - The delta, or difference, in the cost of managing the new solution being proposed. This delta could be represented in weeks, months or even years, depending on the size and complexity of the project being assessed.
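The two components above can be combined into a back-of-the-envelope ROI figure: annual savings (current administration costs minus the new solution's costs) accumulated over the evaluation period, net of the up-front cost of moving, divided by that up-front cost. This is one common formulation among several, and every number below is invented purely for illustration.

```python
def simple_roi(current_annual_cost, new_annual_cost, migration_cost, years):
    """Return ROI as a fraction: net gain over the period divided by the
    up-front migration cost (one simple formulation; others exist)."""
    annual_savings = current_annual_cost - new_annual_cost
    net_gain = annual_savings * years - migration_cost
    return net_gain / migration_cost

# Invented numbers: $120k/yr current administration cost (power, cooling,
# labor, maintenance), $80k/yr projected for the new solution, $60k to
# migrate, evaluated over an assumed 3-year hardware life-cycle.
roi = simple_roi(120_000, 80_000, 60_000, 3)
print(f"ROI over 3 years: {roi:.0%}")
```

Running the arithmetic during project planning, rather than after capacity has sprawled, is exactly what keeps a later consolidation project off the calendar.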

As you can probably tell, I am not a fan of consolidation projects. I prefer to manage an enterprise proactively and ensure that capacity added is capacity needed. But that is not always an option. In the event you are beginning to look at your environment for potential areas of efficiency, here are some items to review for possible consolidation savings:

  1. Application review - The quickest way to consolidate is to eliminate duplicate applications and functionality from the enterprise. Review all applications and ensure they have a valid business need, executive sponsor and are not a duplicate for other functionality found within the enterprise.

  2. Hardware review - A thorough review of all hardware in the data center will allow you to catalog its age, power consumption, speed, memory capacity and use in the environment. This information can be compared to the latest information on systems available to determine if power or space savings can be gained by moving to newer hardware platforms.

  3. Data center costs - Paired with the above data, a review of the costs for space and power for your data center can be completed to see if savings can be found through the use of less space within the data center.

Consolidation is a commonly used buzzword in IT today. Consolidation is most often the use of virtualization to cut back the number of servers utilized in an environment, reaping cost savings from the lower power, cooling and space costs. Consolidation can also include work like eliminating applications with duplicate functionality and eliminating pockets of information that must be managed separately from the rest of the corporate enterprise. While consolidation is a worthwhile goal in all IT departments, it is equally important that IT leadership ensure that as new solutions are deployed, they are done in a cost-efficient manner and with the necessary amount of capacity. This will ensure capacity is not sitting idle that will need to be consolidated down the road. IT departments can save both money and time by ensuring that a solid strategy is in place to add capacity, tools, servers and applications in a way that is the most efficient for the enterprise.