Merging Business and IT

Wednesday, October 28, 2009

Risk Workshops

Risk workshops are an important part of managing risk for all projects. They are typically done at the beginning of a project and any time a major change is made to the requirements or acceptance documents for the project. Risk workshops are brainstorming sessions to develop a list of risks for a project and determine the factors associated with those risks so that they can be mitigated.

Throughout this posting I use the word project a lot. In this context, a project is a defined set of activities with a given beginning and end. Companies commonly use projects to take a unit of work and properly manage it to completion.

A Risk Workshop is a sit-down, face-to-face, thorough review of all aspects of a project. This commonly includes delivery schedules, acceptance plans, project financials and contracts associated with project delivery. The primary purpose of the risk workshop is to allow both project participants and outside observers to brainstorm all possible risks that could come up and to agree on risk levels and mitigation strategies.

There are two priorities for all risk workshops. All participants should keep these at the top of their mind as they look for mitigation strategies and project plan changes:

Protecting the company from financial loss. This can include penalties for delays or missed features, or having to redo work because of low quality.
Delivering the project on time. All risk mitigation strategies should be developed together with the project plan so that estimates are realistic.

The primary deliverable for all risk workshops will be a risk workbook. The most common categories to track for each risk in this workbook are:

Risk Number – A unique identifier within the project so that team members can clearly communicate about each individual risk.
Raised By – The name of the individual who first brought up the risk. This should be tracked in the event clarification is needed about the risk or impact.
Date raised – The date that the risk was first discussed. All notes associated with the risk should have associated dates as well to track the progression of the risk discussions.
E/B/C – Engagement/Business/Customer – This category defines the risk type. Engagement risks are associated with contractual details or the relationship between customer and vendor. Business risks are risks that a product lifecycle may change or priorities may shift for a team. Customer risks are associated with delays on the customer side, either because prerequisites are not met or because the customer's requirements change.
Description – This is the detailed description of the risk, what it affects and any supporting details to what could trigger it.
Risk Cost – This is the monetary cost of correcting the risk should it become a problem during the project. This includes time, facilities, and all associated resources needed to resolve the risk should it become a problem.
Risk % – This is the chance that the risk will occur during the project. It is paired with the risk cost to determine a risk budget for the project.
Mitigation Strategy – This category defines the solution for mitigating the risk. This should define what steps will be taken to lessen the chance of the risk turning into a problem. This could include additional staff on the project, earlier testing, or a change in architecture.
Mitigation Cost – Mitigation cost documents the cost to minimize the chance of a risk occurring. This cost is compared to the chance of the risk occurring and the cost if it does occur, to determine whether the mitigation cost should be spent or whether to continue with the project and manage the risk if it does occur.
Risk Owner – The risk owner is the individual that best understands the risk and associated mitigation strategies. This is most commonly the person responsible for monitoring for the risk occurring and documenting the risk mitigation strategies.
Risk Trigger – Not all risks will become problems and impact a project. The risk trigger is what defines when the risk does become a problem so that staff can take steps to address the problem.
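
As a rough illustration, one row of the workbook and the comparison described under Mitigation Cost could be modeled like this. It is only a sketch: the field names, the sample figures, and the assumption that mitigation removes the risk entirely are illustrative and not part of any standard workbook format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Risk:
    """One row of a risk workbook (field names are illustrative)."""
    number: int
    raised_by: str
    date_raised: date
    category: str             # "E", "B" or "C"
    description: str
    risk_cost: float          # cost to correct the problem if it occurs
    probability: float        # Risk %, expressed as 0.0 to 1.0
    mitigation_strategy: str
    mitigation_cost: float
    owner: str
    trigger: str

    def exposure(self) -> float:
        """Expected cost of the risk: cost multiplied by chance of occurring."""
        return self.risk_cost * self.probability

    def mitigation_is_worthwhile(self) -> bool:
        """Assumes mitigation fully removes the risk, which is a simplification."""
        return self.mitigation_cost < self.exposure()


# Hypothetical sample entry; every value here is made up for the example.
late_hardware = Risk(
    number=1,
    raised_by="Project lead",
    date_raised=date(2009, 10, 28),
    category="C",
    description="Customer hardware may arrive after integration testing starts",
    risk_cost=40000.0,
    probability=0.25,
    mitigation_strategy="Begin testing early on loaner hardware",
    mitigation_cost=6000.0,
    owner="Test manager",
    trigger="Hardware not on site two weeks before integration testing",
)
```

With these sample numbers the exposure is 40000 x 0.25 = 10000, which is more than the 6000 mitigation cost, so the mitigation would be judged worth funding.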

The first part of any risk workshop is to discuss the objective and purpose with the participants. All risk workshops should begin with a discussion of why the team has come together and what deliverable is expected at the end of the meeting. This deliverable will most often be a risk workbook containing all risks and their associated risk level, potential cost and mitigation strategies. All risk workshops should set time limits so that an extended discussion of one risk does not overwhelm the meeting time. This time limit is also to ensure everyone has a chance to speak on the topic. If consensus is not reached in that time frame, someone delegated by management should be responsible for getting input from all parties and making a decision on the risk level and other details.

I cannot remember a project that I have worked on that had zero risk, or a list of zero risks. All projects have some level of risk, and the purpose of a risk workshop is to clearly define those risks and the plan for avoiding delays because of them. A long list of risks coming out of the risk workshop shows that the team was successful in thinking of possible pitfalls and mitigating them. The purpose of the meeting should not be to produce a list of zero risks; that is not the same thing as a zero-risk project.

As part of the risk mitigation portion of the risk workshop, there are two primary strategies for handling high risk components of a project:

Redesign – Often a design can be redone to limit or minimize the risk to a project. The redesign may have other impacts, including cost of delivery or schedule impact, that must be weighed against the potential risk.
Risk Mitigation – Mitigation is the most common strategy for managing risk. This is the early planning of how to handle a risk, should it become a problem. Mitigation often involves having clearly defined paths for escalation to other teams or additional resources available.

Ultimately the risk workbook will be used to develop a risk budget. This risk budget will be built into the project financials to ensure adequate resources are available to respond to risks if they do become problems, as well as providing funding to cover risk mitigation as necessary.
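
One simple way to turn a workbook into a budget figure is sketched below. The formula is an assumption for illustration, not a prescribed method: it adds the mitigation cost for every risk the team decided to mitigate and the probability-weighted cost for every risk it decided to carry. The dictionary keys and sample numbers are hypothetical.

```python
def risk_budget(risks):
    """Sum mitigation spend plus the probability-weighted cost of carried risks.

    Each risk is a dict with 'cost', 'probability' (0.0 to 1.0),
    'mitigation_cost' and 'mitigate' (True if the team chose to mitigate).
    Assumes a mitigated risk needs no further reserve, which is optimistic.
    """
    total = 0.0
    for risk in risks:
        if risk["mitigate"]:
            total += risk["mitigation_cost"]
        else:
            total += risk["cost"] * risk["probability"]
    return total


# Hypothetical workbook rows, reduced to the fields the budget needs.
workbook = [
    {"cost": 40000, "probability": 0.25, "mitigation_cost": 6000, "mitigate": True},
    {"cost": 10000, "probability": 0.5, "mitigation_cost": 8000, "mitigate": False},
    {"cost": 4000, "probability": 0.25, "mitigation_cost": 1500, "mitigate": False},
]

budget = risk_budget(workbook)
```

For these rows the budget is 6000 + 5000 + 1000 = 12000. A more cautious team might also keep a reduced reserve for mitigated risks, since mitigation rarely removes a risk completely.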

Risk workshops are a critical component of all successful projects. A risk workshop allows all interested parties to express any risks they foresee and to discuss how to properly plan for and mitigate these risks. Risk workshops should not consume an unlimited amount of time, but should allow everyone to express an opinion on risk levels and allow that to be documented in the risk workbook for the project. Risk workbooks are living documents for the duration of a project and provide a single reference for developing the risk budget and showing mitigation strategies for a project.

Saturday, October 3, 2009

Time scheduling for IT Staff

Information Technology (IT) staff often must juggle the daily demands of user requests and repair activities with long term projects like upgrade testing, capacity planning and new feature evaluation. These two distinct types of work are difficult to juggle, in addition to a never ending array of meetings, office interruptions and service outages. Many IT jobs today are high stress, both because of the level of work to be completed and because of chronic mismanagement of time, which creates both higher stress levels and lower productivity levels.

As with all professions, the goal of time management, for both staff and management, should be to minimize context switching. A context switch occurs each time a person must change from one task to another; this can include changing project focus, phone calls, office interruptions or stopping a task to go to a meeting. By limiting context switching, IT management can allow more time for staff to focus and provide them clearer blocks of time to complete their work in a more efficient way.

It is quite common within the IT space to schedule meetings mid-day as well as pull staff into meetings during the day. This is quite disruptive and often not necessary. It is important that managers within IT organizations clearly define what constitutes an emergency and how to properly justify pulling staff away from their daily work load versus planning for a meeting in the future.

Suggestions for minimizing interruptions and increasing time utilization:

Meeting Free Days – Blocking out days specifically for meetings will allow the remaining days to be used by staff to focus, free of interruptions, on long term projects, research and other work that is more efficiently completed during a focused period of time.

Set Aside Time for Ticket Based Work – It is very common for IT organizations to have a ticket tracking system to handle incoming requests and common tasks. This should be monitored by a dedicated person; if that is not possible, dedicated time should be set aside for other staff to monitor it. Tracking and managing many small requests in the middle of project based work is very disruptive and negatively affects productivity on the long term projects.

Clearly Defined Office Hours – Clearly defining staff's office hours can set the stage for limiting interruptions to set times within the day and giving staff dedicated time for focusing on ticket based work and project based work. This will ensure that staff are available for drop in discussions, but that these do not dominate their available time.

Staff Privacy – One method to ensure IT staff can focus and use their time properly is giving IT staff a private office and workspace. All IT jobs require some level of collaboration, but they also require time to focus on projects and work as an individual. This focus requires a place free of interruptions like ringing phones, conference calls, others talking in the hallway and side discussions.

Within IT, time management is important to ensure staff can properly focus on both daily needs and long term projects and goals. By minimizing context switching through the use of dedicated blocks of time, staff can focus and concentrate better on their projects, which helps ensure on-time completion with minimal delay and interruption.

Saturday, September 19, 2009

Importance of Code Reviews

Code reviews are an important part of the software development process. They are the period during development where a more senior team member reviews the code written by another team member, prior to submission into a company's version control system. Code reviews are a formal process both to improve the quality of submitted code and to allow for mentoring of all developers on the team.

Any time a piece of code is being submitted for eventual inclusion in an application, a code review should be part of the process prior to formal inclusion. This ensures that a minimum of two people review all changes to the software to check for defects. This code review process also ensures that knowledge is duplicated within the enterprise to better manage project transition and long term support responsibilities for all applications.

There are several primary areas that should be of focus for all code reviews:

Company Coding Standards
All companies should have standards for software development. These should include the libraries used during development, the documentation of the code base and the languages used for development. This is the first item that should be reviewed during all code reviews. Reviewing all code for adherence to company standards ensures that all team members not only follow the standards, but have a chance to learn any standards that they may not be aware of or that may have changed.

Company Enterprise Architecture Standards
In addition to company coding standards, all firms should have a formal set of Enterprise Architecture (EA) standards. These often include how data is stored, managed, tagged, backed up and secured during transport and manipulation. All code reviews should ensure that new code being submitted follows existing company EA standards for ease of interoperability, as well as long term software life cycle management.

Mentoring
Mentoring is a key component of all code reviews. Code reviews allow senior staff to review the code of their teammates and provide them suggestions for improvement based on experience. This mentoring is key to ensuring better long term quality from all produced code, as well as to providing staff a path for development. Each staff member who is having their code reviewed could potentially be reviewing code in the future, so it is key that this mentoring process be official, and an important part of the software development team's culture.

Security
In today's IT environments, security is a critical component of software development. All code reviews should include a portion of time for reviewing security to ensure that input and variables are handled securely, that temp data is cleaned up properly and that host to host communication is handled in a secure fashion, just to name a few.

Security is a complex topic, especially in the software development arena because of the wide range of attacks, challenges and threats. Code reviews allow for a formal process to ensure common mistakes are not made, previous mistakes are not made again and that staff have a forum for discussion of implementation details.
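
Two of the checks mentioned above, handling input securely and cleaning up temp data, can be shown with a small sketch of what a reviewer might hope to see. The table, function names and sample data are hypothetical, and the example uses only Python's standard sqlite3 and tempfile modules.

```python
import os
import sqlite3
import tempfile


def find_user(conn, username):
    """Look up a user with a parameterized query instead of string building."""
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cursor.fetchall()


def process_with_scratch_file(payload):
    """Write and read back temp data inside a directory removed on exit.

    Returns the data read back and the (now deleted) scratch file path.
    """
    with tempfile.TemporaryDirectory() as scratch_dir:
        path = os.path.join(scratch_dir, "work.tmp")
        with open(path, "w") as handle:
            handle.write(payload)
        with open(path) as handle:
            data = handle.read()
    # The directory and everything in it are gone once the block exits.
    return data, path


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

# A classic injection string is treated as plain data and matches nothing.
injection_attempt = find_user(conn, "alice' OR '1'='1")
normal_lookup = find_user(conn, "alice")
```

The point for a reviewer is the pattern rather than the specifics: user input is passed as a bound parameter, and temporary data lives in a location that is cleaned up automatically even if an error occurs.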

Scalability
Today, many applications are scaling to levels of usage never envisioned when the application was first written. This causes many problems for both the administrators of these applications and the users. Code reviews should ensure that applications properly handle resources like CPU time, system memory and disk bandwidth so that the application can scale over time. Scalability is a combination of many components and is the responsibility of both the developer and other IT administrators; code reviews should ensure that all code written is properly prepared to scale over time and handle even the most extreme loads on the system.

Coding Quality
Ultimately, the final key of all code reviews is ensuring quality. Quality can come from many aspects of the code base including documentation, ease of understanding of the code and the maintainability of the code. These are all key aspects that if properly addressed and corrected during code reviews can ensure not only better developers, but more manageable code over time.

Code reviews are an important process component for all companies developing software, either for internal use or external sale. Code reviews ensure that staff are formally mentored on the code they contribute, allowing them to increase their skills and experience as developers and become more valuable to the organization over time. A side effect of this mentoring is higher quality code submissions, with fewer defects and better long term manageability of the code base.

Sunday, September 6, 2009

Migrating Applications between OS Platforms

At some point in time most Information Technology (IT) departments have had to migrate an application or service from one platform to another; in this case, by platform I mean a different operating system. This is most often driven by a cost savings that can be obtained on the new platform, either through lower hardware maintenance costs or through lower support costs for the software on the new platform. The challenge with these migrations is that the application is often stable on the existing platform, and any migration carries the risk of introducing instability.

The points of review documented below are not specific to any operating system (OS) on the market, but rather are a guide for migrating from any single OS to a different OS. Currently the IT world is seeing the largest percentage of these types of application migrations from UNIX-based platforms to Linux-based platforms. Just because this is occurring now does not mean it will always be the most common migration path; in time a new OS could come on the market providing advantages not currently available.

Many modern programming languages are portable in the sense that they can very easily be migrated from one host OS to another. This is not true for some legacy programming languages; this framework is meant to cover both of these cases. Even with modern programming languages, some underlying libraries can vary from OS to OS and will require detailed migration planning.

Below is a framework for the process for reviewing the application being migrated and developing a plan for the migration. This framework is structured to ensure that the same steps can be used, regardless of the original and future OS.

Application Source Code
When initially reviewing an application to migrate from one OS platform to another, the source code must be checked from a process, availability and legal standpoint. This is the first phase to determine if the application can even be migrated to a new platform.

Is the source code available?
This is often an overlooked component of legacy applications. Often the source code is not available, either because it was lost or because the intellectual property for the application has been transferred to another party. This is an important part of porting an application, and it can force the team to look at alternate applications or to develop the application from scratch.

Legal obligations?
As part of reviewing the availability of the source code, it is also important to review legal obligations around that source code. Specifically open source applications often have requirements for submitting changes to the community, depending on the usage model of the application. These legal obligations are also important regarding trademarks, copyrights, and their implications on staff that previously worked on the application being reviewed.

Review of Application Source Code
After determining that the application source code is available, what changes can be made, and how any required communication to external parties will be handled, it is time to review the source code technically to develop a plan for the migration and porting activities later.

What language?
Looking at what language the application is developed in is a critical first step. This will enable the planning team to determine if the company has the necessary skills to port the application, or if external resources will be needed for the migration. Knowing the language can also assist with planning supportability on the new OS, based on how well the language is supported and used in the community.

What libraries?
As part of reviewing the source code, a review of the libraries used should be done. This review should be done to ensure that the libraries will properly work on the new OS, that they are still available, and that they are compatible with other libraries that will need to be installed. This is the time to ensure no dependency problems are found later in the migration.
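
A first pass at this review can be as simple as comparing the libraries the application needs against an inventory of what the new OS provides. The sketch below assumes both lists have already been collected by hand or from a package manager; the library names and versions are hypothetical, and the version parsing only handles purely numeric dotted versions.

```python
def parse_version(text):
    """Turn '1.2.10' into (1, 2, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_libraries(required, available):
    """Compare required minimum versions against what the new OS provides.

    Both arguments map library name to a dotted version string.
    Returns (missing, too_old) as sorted lists of library names.
    """
    missing = sorted(name for name in required if name not in available)
    too_old = sorted(
        name
        for name, minimum in required.items()
        if name in available
        and parse_version(available[name]) < parse_version(minimum)
    )
    return missing, too_old


# Hypothetical inventories for illustration only.
required = {"libxml2": "2.9.0", "openssl": "1.0.2", "libpng": "1.6.0"}
available_on_new_os = {"libxml2": "2.9.10", "openssl": "1.0.1"}

missing, too_old = check_libraries(required, available_on_new_os)
```

Here libpng would be reported as missing and openssl as too old. Comparing versions as tuples of integers avoids the common mistake of treating 2.9.10 as older than 2.9.2 in a plain string comparison.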

Deprecated calls?
The source code review should also include an assessment of what calls and functions are now deprecated; this can include external libraries, kernel functions and other external resources. Any section of the application code that references deprecated functions should be reviewed to determine the best supportable path forward to ensure that functionality is not compromised.
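
A simple text scan is often enough to get an initial list of places to look at. The sketch below searches source text for calls to functions on a deprecated list. The list itself is only an example and would in practice come from the release notes of the target OS and libraries; because this is a plain text search, it will also flag matches inside comments and strings.

```python
import re

# Illustrative list only; build the real one from the target platform's
# release notes and library documentation.
DEPRECATED_CALLS = {"gets", "tmpnam", "bcopy"}


def find_deprecated_calls(source_text, deprecated=DEPRECATED_CALLS):
    """Return (line_number, call_name) pairs for each deprecated call found."""
    names = "|".join(sorted(re.escape(name) for name in deprecated))
    # \b keeps "gets(" from matching inside "fgets(".
    pattern = re.compile(r"\b(" + names + r")\s*\(")
    hits = []
    for line_number, line in enumerate(source_text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((line_number, match.group(1)))
    return hits


sample = """char buf[64];
gets(buf);
char *name = tmpnam(NULL);
fgets(buf, sizeof buf, stdin);
"""

hits = find_deprecated_calls(sample)
```

For the sample text this reports gets on line 2 and tmpnam on line 3, and leaves the fgets call alone.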

Define Testing and Roll out Strategy
Now that the source code has been reviewed, it is time to define success for the migration. This component of the process is to ensure that relevant metrics are clearly defined for the time period of the migration, and after the migration so that staff using the application are not negatively impacted by the migration.

Data Integrity
Defining data integrity standards should be the first metric for all migrations from OS to OS. This is critical to ensure that data is consistent both during the migration, and handled in the proper way after the migration. A migration of an application from one OS to another should not ever require the compromise of data integrity standards.
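
One common way to check that data arrived unchanged is to compare checksums of the files before and after they are copied to the new platform. The sketch below is one possible approach using SHA-256 from Python's standard library; the function names are hypothetical. It only suits data that is expected to be byte-for-byte identical, so data that is converted during the migration (for example changed line endings or character encodings) needs a record-level comparison instead.

```python
import hashlib


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def data_matches(source_path, migrated_path):
    """True if both files have identical contents according to SHA-256."""
    return file_checksum(source_path) == file_checksum(migrated_path)
```

Reading in chunks keeps memory use flat even for very large data files, which matters when the check is run against production-sized data sets.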

Functionality
Second to data integrity is functionality. Staff become used to the tools they use on a daily basis, and any change in the capability or functionality of those tools can cause a significant drop in productivity. All migrations should include reviews to ensure that all utilized features will continue to be available for staff to use.

Performance
Performance is an important metric to define prior to migrating an application from one OS to another. Performance can change dramatically between OS platforms, so both testing and proper application tuning should be planned for during the migration process. Performance can include many metrics, including response time, report generation time and response time under heavy loads.
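
A performance metric is only useful if it is measured the same way on both platforms and compared against an agreed limit. The sketch below shows one way that might look: time an operation several times, take the median, and check it against a baseline recorded on the old platform. The 10 percent allowed slowdown is an arbitrary example value, not a recommendation.

```python
import statistics
import time


def measure(operation, runs=5):
    """Return the median wall-clock seconds taken by operation over several runs."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def within_budget(new_seconds, baseline_seconds, allowed_slowdown=0.10):
    """True if the new platform is no more than allowed_slowdown slower."""
    return new_seconds <= baseline_seconds * (1 + allowed_slowdown)


# Stand-in workload; a real test would call the application being migrated.
median_seconds = measure(lambda: sum(range(10000)))
```

Using the median rather than a single run reduces the effect of one-off delays such as a cold cache or a background job on the test host.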

Security
While one OS is not necessarily more or less secure than another, each has its own methods for setting permissions, logging system activity and patching against known vulnerabilities. The migration plan should include a proper review of these differences to ensure that staff are properly trained to handle securing the application once it is running on the new OS.

Stability
Stability is commonly defined as uptime or availability of an application. Introducing a new OS to an environment can change the availability characteristics, either because of new, unfamiliar processes, or because of a misplaced expectation about an OS's capability. A plan should be developed to define what availability is required of the application and to document how those metrics will be monitored.
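
Availability targets are easier to agree on when they are translated into minutes of downtime. The small sketch below shows the arithmetic, assuming availability is measured as the fraction of a period during which the application was usable; the function names and the 30-day month are illustrative choices.

```python
def availability_percent(total_minutes, downtime_minutes):
    """Percentage of the period during which the application was available."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def allowed_downtime_minutes(total_minutes, target_percent):
    """Minutes of downtime permitted in the period for a given target."""
    return total_minutes * (100.0 - target_percent) / 100.0


MINUTES_IN_30_DAYS = 30 * 24 * 60  # 43200

measured = availability_percent(MINUTES_IN_30_DAYS, 43.2)
budget_minutes = allowed_downtime_minutes(MINUTES_IN_30_DAYS, 99.9)
```

For a 30-day month, a 99.9 percent target allows roughly 43 minutes of downtime, which is a useful figure to compare against the expected length of patching and maintenance windows on the new OS.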

Porting of Code
After defining the above metrics, we can begin the longest portion of any application migration: the actual porting and testing of the application on the new OS platform. This phase will include both making modifications to the code base to ensure it works on the new OS platform and testing the application on the new OS platform to ensure it properly meets the metrics defined above for success.

Maintenance Cycle Definition
During the porting of the application, data can be gathered about necessary maintenance that will need to be done regularly on the new platform. This maintenance cycle will need to include time both to apply patches to the underlying OS and to do maintenance on the data supporting the application. This maintenance cycle should be defined prior to roll out so that staff can be properly trained on it and end users can be prepared for a possible change in availability policies for the application.

Update DR Processes and Tools
Disaster Recovery (DR) is an important component of all application migrations, ensuring that a proper plan is in place to recover from catastrophic failures and ensure the data and application are available for use. As part of the application migration, the DR processes should be reviewed to adequately reflect the changes in how the application is hosted and what precautions should be taken for backup, replication and training for recovery.

Training
Training is a two part activity: both the administrators of the application and the end users will need to be trained on the changes in administering and using the application. Training should be provided to the appropriate staff prior to migrating the application; this will ensure that staff are ready for all changes that come as part of the migration. Training material should additionally be made available for staff to refer back to after the migration to answer questions that could come up about the migration.

Application Roll out
After the above metrics for success are defined, the code is ported and tested and staff are trained, the application migration can be completed. This migration will include the migration of any necessary data for the application, as well as the application delivery infrastructure. This migration can be done in phases if the architecture of the application will support it, or may require an extended outage to properly migrate and test all components.

Migrating an application from one hosting OS to another is a common practice, yet very often it is done with very little planning. As IT continues to evolve, it is inevitable that new OSs with innovative features will become available, making it necessary to migrate applications between them. Keeping a solid process that is followed each and every time will ensure stability in the migration, integrity of the data and continued productivity of the end users.