Thursday, August 12, 2010

Considerations for always-on environments

In today’s information technology (IT) departments, more and more applications are becoming 24x7 in nature, requiring availability at all times, with minimal or no downtime to accommodate upgrades, patches, troubleshooting or maintenance activities. This is a tall order with today’s integrated solutions. Ensuring that all pieces of an environment work regardless of maintenance, outages or upgrades is a complex challenge, often only addressed by companies with large resources at their disposal.

There is no magic bullet to solve all availability problems. But if you are in a position to develop a solution interactively and own the entire infrastructure, there are considerations you can plan for ahead of time that provide significantly better availability than traditional methods for supporting always-on applications.

The most common challenges in today’s always-on environments are:

  • Hardware Failures - Hardware fails; there is nothing we can do about that. The more hardware you have, the more components will fail. All modern environments should be designed so that critical applications and services are not taken offline by failed hardware.

  • Facility Failures - Modern data centers are a complex combination of traditional construction, power generation and distribution, cooling and environmental controls. These systems, while redundant, can still fail for a variety of reasons. Additionally, data centers in many locations are susceptible to environmental disasters. All modern application environments should span multiple locations so that the failure of a single facility does not take the application offline.

  • Upgrades Must Occur - Upgrades are one of those IT activities that must occur. There is a lot of flexibility around when and how they are completed, but eventually all hardware and software will need to be updated to make new features available and to keep the solution secure and stable.

  • Rapid Growth - Today’s IT environments are growing at a very rapid pace compared to even 5 years ago. Many organizations add new servers and applications on a weekly basis. This growth must be managed while legacy systems are upgraded, eliminated and supported through all this change.

  • Data - Both the size of data and its life cycle are becoming more complex to manage. More data presents challenges for upgrades, backup, recovery, available bandwidth and many other parts of the environment. As data grows, it also becomes more valuable to the organization, and its life cycle must be managed in a more automated way to ensure that data is deleted when required and, on the flip side, is available when required.

  • External Dependencies - No matter how many resources and skills a company has internally, almost all firms have to use a third-party product that is developed by an outside firm for some portion of their IT environment. These third-party products present challenges in integration, management and support that consume time and resources from the IT department using them.

  • Shifting Loads - Very few solutions today place a consistent load on the environment from day to day or hour to hour. This variation adds complexity because it requires constant monitoring of capacity and adjustments based on user demand.

  • Variety of Delivery Platforms - Most applications today are delivered via a variety of platforms including web browsers, smart phones, exposed web services or appliances. Each of these requires a separate set of development and testing procedures, each has separate security policies and all have very different methods for storing, rendering and handling dynamic content. Modern applications must support a seamless experience across all of them for each individual user and their preferences.
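To make the Hardware Failures and Shifting Loads points a little more concrete, here is a minimal Python sketch of sizing a server pool for the current load. The function name, the 70% utilization target and the single spare server (N+1) are illustrative assumptions, not figures from any particular environment.

```python
import math


def servers_needed(requests_per_sec: float,
                   capacity_per_server: float,
                   target_utilization: float = 0.7,
                   spare_servers: int = 1) -> int:
    """Return how many servers to run for the current load.

    Hypothetical sizing rule (an assumption for illustration):
    - each server handles `capacity_per_server` requests/sec at 100% utilization;
    - servers are kept at or below `target_utilization` to absorb short spikes;
    - `spare_servers` extra nodes (N+1 by default) cover a hardware failure.
    """
    if capacity_per_server <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("invalid capacity or utilization target")
    usable = capacity_per_server * target_utilization
    # Keep at least one active server even when the load drops to zero.
    base = math.ceil(requests_per_sec / usable) if requests_per_sec > 0 else 1
    return base + spare_servers
```

For example, with servers that each handle 1,000 requests/sec, hourly loads of 1,200, 4,500 and 800 requests/sec call for 3, 8 and 3 servers respectively. Re-running a rule like this as demand shifts is the kind of constant capacity adjustment described above.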

While that is not an exhaustive list, it is a good introduction to the kinds of challenges encountered in most IT departments today. To overcome them, many shops develop their own software so they have the features and capabilities that make these challenges easier to manage. Here are some of the most common considerations to keep in mind when designing new software for your environment:

  • Mindset of Rolling Upgrades - In today’s environments, 100% uptime is not an unreasonable expectation for an application serving users in multiple time zones and countries. Modern software should be designed so that upgrades can be done non-disruptively, allowing users to continue using the application during upgrades and other maintenance activities. While taking the entire application offline is often not an option, software can be written to allow a subset of servers or features to be down for periods of time while the rest of the application keeps running.

  • Inflight Transaction State Tracking - Modern applications often communicate with many different systems across one or more data centers. All this communication must be properly tracked so that the application can recover and users can continue their transactions even if hardware or software in the environment fails. It is critical that modern software implement a mechanism for tracking the state of communications between servers; this state can then be used to recover if a server or facility fails. This tracking, together with the ability to recover automatically, allows the application to function regardless of the state of the underlying hardware.

  • Consistent and Automated QA Processes - Quality assurance in software development has become more and more critical as applications have grown more complex and upgrades more frequent. A fully automated regression testing environment allows new builds of an application to be tested quickly, and new tests can be added as bugs are found so that those bugs do not reappear in future releases.
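As a rough illustration of the rolling-upgrade mindset, the Python sketch below upgrades a pool in batches: each batch is drained from rotation, upgraded, health-checked and returned, so the remaining servers keep serving users throughout. The `Server` class, the `rolling_upgrade` function and the `health_check` callback are hypothetical stand-ins for a real load balancer and deployment tool, and the "upgrade" is simulated by changing a version string.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Server:
    name: str
    version: str
    in_rotation: bool = True  # True while the load balancer sends it traffic


def rolling_upgrade(servers: List[Server],
                    new_version: str,
                    batch_size: int,
                    health_check: Callable[[Server], bool]) -> List[str]:
    """Upgrade servers one batch at a time so the others keep serving.

    Returns the names of the servers that were upgraded and put back in
    rotation. If a health check fails, raises RuntimeError and leaves the
    failed batch out of rotation, so a bad build never reaches the whole pool.
    """
    if not 0 < batch_size < len(servers):
        raise ValueError("batch_size must leave at least one server in rotation")
    upgraded: List[str] = []
    for start in range(0, len(servers), batch_size):
        batch = servers[start:start + batch_size]
        for s in batch:  # 1. drain: stop sending new requests to this batch
            s.in_rotation = False
        for s in batch:  # 2. deploy the new build (simulated here)
            s.version = new_version
        if not all(health_check(s) for s in batch):
            raise RuntimeError(f"health check failed in batch starting at index {start}")
        for s in batch:  # 3. return healthy servers to rotation
            s.in_rotation = True
            upgraded.append(s.name)
    return upgraded
```

Stopping at the first failed batch is a deliberate choice: users keep hitting servers on the last known-good build while someone investigates, which is the "some subset of servers down, the rest preserved" behavior the bullet describes.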
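Inflight transaction state tracking can be sketched as a journal that records each transaction's state before and after a remote call, so a recovery pass can re-drive anything left unfinished. This is a simplified, hypothetical Python example: the in-memory dictionary stands in for what would need to be a durable store replicated across facilities, and it assumes the remote call is idempotent for a given transaction ID, because a failure may occur after the remote side finished but before the journal recorded it.

```python
from enum import Enum
from typing import Callable, Dict, List


class State(Enum):
    PENDING = "pending"
    SENT = "sent"   # remote call started; outcome unknown until DONE
    DONE = "done"


class TransactionJournal:
    """Tracks each transaction's state so work can resume after a failure.

    Assumption: in-memory only, for illustration. A real journal would be a
    durable store that survives the loss of a server or a whole facility.
    """

    def __init__(self) -> None:
        self.entries: Dict[str, State] = {}

    def begin(self, txn_id: str) -> None:
        # Does not overwrite an existing entry, so retries keep their state.
        self.entries.setdefault(txn_id, State.PENDING)

    def mark(self, txn_id: str, state: State) -> None:
        self.entries[txn_id] = state

    def incomplete(self) -> List[str]:
        return [t for t, s in self.entries.items() if s is not State.DONE]


def process(journal: TransactionJournal, txn_id: str,
            send: Callable[[str], None]) -> None:
    journal.begin(txn_id)
    if journal.entries[txn_id] is State.DONE:
        return  # already finished; calling process again is harmless
    journal.mark(txn_id, State.SENT)  # record intent before the remote call
    send(txn_id)                      # must be idempotent per txn_id
    journal.mark(txn_id, State.DONE)


def recover(journal: TransactionJournal,
            send: Callable[[str], None]) -> List[str]:
    """Re-drive every transaction that was in flight when a failure hit."""
    resumed = journal.incomplete()
    for txn_id in resumed:
        process(journal, txn_id, send)
    return resumed
```

If `send` raises partway through, the transaction stays in the SENT state, and a later `recover` call (for example, on a surviving server in another facility reading the same journal) finishes it without the user having to start over.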

Today’s 24x7 IT environments are extremely complex and very difficult to manage. As they grow and user demands grow, they will only become more complex. Automation is one primary way to stay ahead of this challenge: any process, from QA to deployment, that can be automated should be automated. This increases quality, decreases variation and limits the chance for human error. Paired with distributed software and hardware environments, automation helps ensure that as the environment scales, it does not become more susceptible to the failure of any single component. Finally, design all applications assuming that things will fail. By ensuring that applications can recover from failures, whether hardware or software, you can keep the end-user experience as positive as possible.
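Designing for failure often comes down to how a client behaves when a dependency disappears. The Python sketch below assumes one endpoint per data center and a caller-supplied `request` function (both hypothetical); it retries each endpoint with exponential backoff on transient network errors, then fails over to the next facility.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(endpoints: Sequence[str],
                       request: Callable[[str], T],
                       attempts_per_endpoint: int = 2,
                       base_delay: float = 0.1) -> T:
    """Call `request` against each endpoint in order until one succeeds."""
    last_error: Optional[Exception] = None
    for endpoint in endpoints:
        for attempt in range(attempts_per_endpoint):
            try:
                return request(endpoint)
            except (ConnectionError, TimeoutError) as exc:
                # Only transient network errors are retried; anything else
                # (e.g. a bug in the request itself) propagates immediately.
                last_error = exc
                if attempt < attempts_per_endpoint - 1:
                    # Exponential backoff: base_delay, 2*base_delay, 4*base_delay...
                    time.sleep(base_delay * (2 ** attempt))
        # This endpoint (facility) looks unavailable; fail over to the next one.
    raise RuntimeError("all endpoints failed") from last_error
```

The ordering of `endpoints` expresses preference (for example, the nearest facility first), and giving up on an endpoint after a bounded number of attempts keeps a single dead facility from stalling every request.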
