Creating a Business Continuity Plan (BCP) requires thought and planning. This blog explores what a BCP is, a high level approach to defining a BCP and how it differs from a Disaster Recover Plan (DRP).
So What’s The Difference Between BCP And DRP?
The obvious answer is that BCP deals with business continuity and Disaster Recovery Plans (DRP) deal with the restoration of systems after a disaster.
Normally DRPs are far more focused on actual technology and steps, where as BCPs have to consider everything surrounding it. The Business Continuity Plan must look at risks to business and likely scenarios we need to manage, where as DRP’s normally are more specific; although they may also be scenario based. Typically the DRP is written by a technical specialist with experience and scope around what happens with specific technologies.
BCPs are important because they consider the needs of the business and not only the technology. Technical subjects, Such as daily backups have a business implication. Performing daily backups implies a Restore Point Objective (RPO) has been set to 24hrs which effectively means that at any point up to 24 hrs of data can be lost. Is that acceptable in a large company? Possibly not. It’s a business decision that sometimes is made by a technical resource with little thought for the fact that loosing a day of business could result in extremely high costs; If one person loses a working day of information the cost may be considered to be 8 hrs of work, but if 100 people lost 8 hrs of information, the cost could be 800 hours.
If you have ever been in the unfortunate position to lose a large system and have to recover from backup due to a series of extremely improbable events you will see some of these issues first hand – It can take months to restore things, causing all manner of financial penalties and chaos. BCP’s should be tested at least once a year – because business and technology change. Even if you only use public cloud services, you still need a BCP in place.
In a large systems failure a simple question can greatly reduced the cost to your business and customers – for example, “What order should we do restores in?”.
Operations might restore in an order that made sense to them, but that’s not necessarily the right order for the customer or the business. Its possible that key critical services & infrastructure for our customers can be grouped together and restored first to minimize impact to them.
The same question could be asked for a product team – in the event of a catastrophe – what do we really need to get things working as a bare minimum? Operations cannot know whats business critical by itself, the BCP guides them.
Customer vs Internal BCP
In an IT services operation its important to remember the customer and supplier are two different business entities. In pretty much every business model the customer doesn’t want to spend money – they want to receive value. As a provider, We want to provide value in the most efficient way we can so we can reduce risk, optimise our costs and improve our profit.
At the heart of a BCP we are managing risk to our business – and the customer must manage risk to theirs. In order to manage risk to business you need an understanding of the business strategy and goals, A customers BCP is about managing risk to their business and not ours. Properly defining a BCP with a customer can be considered to be consultancy work which may involve connecting to their stakeholders, understanding and modelling their strategy, analysing their working practices, risks, and potential business impact. This requires a level of intimacy with the customer.
Similarly, a service provider does not want to expose customers to all the risks that they need to mitigate; they need to protect the business, and its a level of detail they do not normally need. Typically BCPs are classified as “Internal”, or “Confidential”.
For these reasons above, It’s essential that a service provider doesn’t mix these up.
How do I build a BCP?
People often just pick up a template and fill things in on it – which often gives unpredictable results and isn’t really covering the things that are critical to business. Consider a structured approach:
Risk Analysis (RA)
This is key to building a proper BCP. If we haven’t identified our risk, then how do we know a business continuity plan is providing value, and mitigating key risks to the business? I have seem businesses that have not gone through risk analysis at all, leading to some very high level scenarios, which have no value, because at that level, in an emergency, just making things up would work equally as well. There are formal mechanisms we can use such as SABSA, or if we modelled our business continuity scenarios and processes in BPMN, we could apply something like I suggested in my blog Risk Analyzing BPMN Models.
Thinking about your end to end delivery is a good starting point for doing BCP work and then drawing it as a process.
Requirements are also a good place to start; do not forget those come from all kinds of places – we have customer requirements & wishes, security non functional requirements and may also have goal related or other requirements from our business. Understanding priority requirements and looking at possible risks to meeting them can form a basis for a risk analysis.
Of course a skilled architect designing solutions and documenting them to an ISO 42010 standard would already be managing stakeholder concerns, and would be able to identify the key concerns easily.
Business Impact Analysis (BIA)
Once we identify risks we need to establish their cost to business. There may already be guidelines around this; many people like to asses in terms of potential financial loss. In a well defined business there are normally a set of established metrics defined in architecture, and / or a policy around how we measure risk impact. Very basic values can be calculated with a set of assumptions – for example – if we have defined a risk that there will be a loss of customer data we could say the impact hits us in several ways: financial penalties, reputation, potential loss of customers.
If we think a risk will impact multiple customers – such as may happen if we lose a complete platform, you may wish to assess how many customers we might permanently lose, as the missing revenue may impact us in a long term.
We could make a rough guess on the percentage of customers we might lose, but we could also look at previous examples of similar events – for example how many customers did we loose when we lost our servers during a previous outage? what did it cost? You can use such figures or percentages as a guideline when calculating potential impact. Think about how you can use the figures you have at your disposal and use those to influence your assumptions. Once you have run through an impact analysis you may actually decide to re-.prioritize your requirements.
Writing the BCP
Once you have done the RA and BIA you have a good starting point to all the key areas you need to cover in a BCP. With or without a template, we have a good idea of the scenarios we need to cover in our BCP. A few things to note:
Normally I try to avoid repeating information I have in other places – i would rather refer to other documents. Doing that however means that you need to ensure the referred documents are accessible to the entire audience of your BCP. The audience of a BCP is something that needs to be considered carefully. Of course, you have all the security related people but that is the tip of the iceberg. All the players in our BCP process need to be aware of their part in it and need to agree to their part in it. The owner of the BCP must ensure access to the BCP and all related resources for its audience. At this point we can take a document template and start to build a document that really brings value.
There is another school of thought on where to keep information – when I discussed it with some members of my employers security team, they preferred to copy and paste key material into the BCP template. The value that has is that all the information you have is in a single location – the disadvantage is its another place to maintain which can easily become outdated.
In some previous jobs I have had to also maintain a print copy in a physical safe. Of course we are supposed to regularly test this so its arguable that the document will be kept up to date… you decide.
I eluded to the fact that there can be a hierarchy of BCP’s; depending on the structure of the business; and there can also be dependencies on disaster recovery plans and on teams and people – its important that as part of the disaster recovery planning excercise you ensure the availability of all of the things you depend on – be it resources, systems, documentation.
Bear in mind that if your services rely on other teams, or other companies they could well be an integral part of your BCP and its important to establish a proper interface and expectation. This becomes a lot easier if you have defined your process using a notation such as BPMN as i mentioned earlier.
Discipline & Testing
Testing should be regularly done – at least one a year – and documented. If its not documented it never happened. Things change.
Losing The Value Of BCP
If its not tested, it loses its value. If its not communicated, it loses its value. if its done as a copy+paste exercise without walking through your processes and thinking though its goals and your business… it loses value.
Like many things in Architecture the value in a BCP is not in the result, but more in the process you take to reach it. Without the risk analysis or an idea on how your business actually works the value is greatly diminished. The question isn’t actually – “Do I have a BCP Document?” the question is “Do I understand the key areas of risk in my business and do i have a solid plan defined and communicated for if something catastrophic happens?”
Should I Be Doing A BCP?
Business Continuity Plans are Rarely a singular document. In a large corporation some scenarios are taken care of at corporate level and then the different levels of the corporation should have their own – individual service areas, and in the case of Tieto, individual Products/services.
At a corporate level a BCP will usually cover things such as Loss Of Life, and how we should handle things like the media in a catastrophe. Bear in mind also we may rely on other BCPs and should document if its the case.
I’ve heard from product managers sometimes in different service organizations that they shouldn’t be responsible for BCPs – its a customer thing. The reality is, if you have a business that you value, you need to be able to protect it, which is why the BCP exists. There may be exceptions but at the end of the day product teams and operations are running parts of the business too – often together – We should not forget the architecture side of this – we are defining a solution that needs to cover our aspects – That looks at risk relating to People, Processes & Roles, Tools & Technology, Organization, and Information. Product managers have P&L responsibilities – so naturally the continuation of business should be of interest to both.
Summing it up
If you have a business, how important is it to you that it continues? If its important at all, then why not spend a little bit of time protecting it, and rather than blindly running through a paper exercise, really think about what you need to do to protect it. Maybe take in some key resources to work together and take a structured approach. It may be that your BCP exercise yields unexpected results, and improvements to your architectures.