Risk Analyzing BPMN Models

People often make risk assessments without a structured approach, relying on intuition or experience. In this blog I show some methods I use to quickly identify risks in BPMN models when formal risk methodologies (such as SABSA) are not in use.

As mentioned in previous blogs, I normally do a process overview in ArchiMate and then align it to BPMN, because BPMN is easy to understand and creating BPMN diagrams forces us to think about who is doing what, and when.

Figure 1 – An example BPMN process

Design vs Run-time

At the highest level I normally think of things in terms of design time vs run time risks. Design time risks are the ones that arise from the structure of the process and the decisions we made when building it; run time risks happen when we execute the process. I will talk a little more about this in a while, but it’s important we consider both aspects.

Running Through The Process With TELOS

TELOS (Technical, Economic, Legal, Operational, Scheduling) is an acronym we usually use for feasibility checking. If anyone is interested I can blog on this later. For now, the basics: we look at each of the tasks (the square blocks on the diagram) and ask these questions:

Technical

  • Do we have the software / hardware in place to perform this task? If a task requires the user to modify a Visio file, do they have Visio? Are we relying on a specific platform? For example, if we need to read something from CRM, what happens if that goes down? Are we relying on a system that isn’t in place yet or is due to be decommissioned?
  • Is it clear how to do this? Having the correct level of detail in the documentation is important. If we are creating or saving files, for example, have we said exactly where to put them?
  • Do we have the skills & competencies in place to execute this? Can the person who is expected to perform the task actually do it, or do they need training? Are they sufficiently trained to cope if something goes wrong?

Economic

  • Are we doing this in the most economical way? For example, using a contractor to perform a task may be more expensive than using an internal resource.
  • What would be the economic ramifications of this step going wrong? Would it potentially be a breach of the Service Level Agreement (SLA)? Could it result in a major incident, or financial penalties?

Legal

  • Are we violating any known regulation? For example, non-EU teams may not be allowed to work with personal information under GDPR. Legal requirements may also exist for information retention if we are working with financial data, so we may need to ensure it is kept as part of a process.

Operational

  • Have we used the proper roles? It’s important that the right person is doing the job. For example, you would not design a process to have a Business Architect install a server. Even if a particular Business Architect has the capability to do it, that kind of work most likely belongs to a technical specialist; assigning the wrong resources can cause all kinds of issues.
  • Do we have the resources in place? In order to execute the process, have we thought through how often the process will be run, and whether we have sufficient resourcing?

Scheduling

  • Will we be able to do all this when the process goes live? For example, will the training be in place? Will the resources be in place?
  • Can the step be performed in good time? Would the time it takes to perform the step force a breach of SLA when you add it up with the related steps?
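
That last question is simple arithmetic, so here is a minimal sketch of it in Python. The step names, the durations and the 48-hour SLA are all made up for illustration; they are not taken from the process in Figure 1.

```python
from datetime import timedelta

# Hypothetical steps on one path through the process, with estimated durations.
steps = {
    "Receive order": timedelta(hours=1),
    "Check stock": timedelta(hours=4),
    "Order parts from supplier": timedelta(hours=36),
    "Ship to customer": timedelta(hours=12),
}

# Hypothetical SLA for the whole path.
sla = timedelta(hours=48)


def sla_headroom(durations, sla_limit):
    """Return the SLA limit minus the summed step durations (negative means a breach)."""
    total = sum(durations.values(), timedelta())
    return sla_limit - total


headroom = sla_headroom(steps, sla)
if headroom < timedelta():
    print(f"Design-time risk: path exceeds SLA by {-headroom}")
else:
    print(f"Path fits within SLA with {headroom} to spare")
```

No single step here looks alarming on its own; the breach only shows up when the related steps are added together, which is why the question is worth asking per path and not only per task.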

There are many further questions we could ask around a design using TELOS as our guideline, but these are the questions I will typically ask when risk analyzing the design of a process.
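
If you want to make the walk-through repeatable, the questions can be turned into a worksheet with one row per task and question. This is only a sketch of how I might do it: the questions are shortened versions of the ones above, and the task names are hypothetical.

```python
# Questions paraphrased from the TELOS list above; extend them to suit your own reviews.
TELOS_QUESTIONS = {
    "Technical": [
        "Is the software / hardware in place to perform this task?",
        "Is it clear how to do this?",
        "Are the skills and competencies in place?",
    ],
    "Economic": [
        "Is this the most economical way to do it?",
        "What are the economic ramifications if this step goes wrong?",
    ],
    "Legal": ["Are we violating any known regulation?"],
    "Operational": [
        "Have we used the proper roles?",
        "Do we have the resources in place?",
    ],
    "Scheduling": [
        "Will everything be ready when the process goes live?",
        "Can the step be performed in good time?",
    ],
}


def telos_worksheet(tasks):
    """Return one (task, category, question) row per task and question."""
    return [
        (task, category, question)
        for task in tasks
        for category, questions in TELOS_QUESTIONS.items()
        for question in questions
    ]


# Hypothetical task names; use the task labels from your own diagram.
rows = telos_worksheet(["Raise purchase order", "Approve purchase order"])
for task, category, question in rows:
    print(f"{task} | {category} | {question}")
```

Each row that cannot be answered with a confident "yes" is a candidate entry for the risk list.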

Run Time Risks

When analyzing run time issues I keep it very simple. I look at all the relationships between the elements and I ask myself the following questions:

  • What happens if the communication never happens? This normally breaks a process unless a contingency is put in place. A simple example: if we order something that is never delivered, is it a risk we need to handle? We can always handle risk as part of our normal escalations process, but sometimes we want to put steps in place to be a bit more proactive. We could have a timed event in our process that checks after a week whether the delivery arrived and, if not, escalates with the supplier, so that we get the required deliverable before it becomes a critical issue.
  • What happens if the communication is wrong? If we send an order for parts that is incorrect, then that’s just as bad as the communication not happening, if not worse. Do we need to put in steps to check the communication before it happens, or are we willing to accept the risk?
  • What happens if the communication is delayed? A delayed order may lead to a breach of SLA. Do we need something in place to avoid that?
  • What happens if a resource isn’t available? As well as individual resources not being available for each step, consider what happens if no resources are available to perform a role at all, for example if nobody is assigned.
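
Because the same few questions are asked of every relationship, they are easy to generate from a list of flows. The sketch below assumes you have typed the flows and role assignments out by hand; the flows, roles and names are hypothetical, and it does not read a BPMN file.

```python
# The three communication questions from the list above.
COMMUNICATION_FAILURES = ["never happens", "is wrong", "is delayed"]


def runtime_risk_prompts(flows, role_assignments):
    """Build 'what happens if...' prompts for flows and for roles.

    flows: list of (source, target, label) tuples, one per relationship.
    role_assignments: dict mapping a role name to the list of people assigned.
    """
    prompts = []
    for source, target, label in flows:
        for failure in COMMUNICATION_FAILURES:
            prompts.append(
                f"What happens if '{label}' ({source} -> {target}) {failure}?"
            )
    for role, people in role_assignments.items():
        if not people:
            prompts.append(f"Nobody is assigned to the role '{role}'.")
        else:
            prompts.append(f"What happens if no '{role}' is available?")
    return prompts


# Hypothetical flows and roles for illustration.
flows = [
    ("Buyer", "Supplier", "Purchase order"),
    ("Supplier", "Buyer", "Delivery"),
]
roles = {"Buyer": ["Alice"], "Goods receiver": []}

prompts = runtime_risk_prompts(flows, roles)
for prompt in prompts:
    print(prompt)
```

The output is only a list of prompts. Deciding whether each one is a risk worth handling, or one to accept, is still a judgment call.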

What About Events?

The things that start and stop our processes, and the events we trigger and receive during a process, should be handled with the same questions I listed for run time risk.
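
The timed check I suggested earlier for the undelivered order is itself an event, added to handle a run time risk. Here is a minimal sketch of the decision behind it, assuming the one-week wait from that example. The function name and the dates are made up for illustration.

```python
from datetime import date, timedelta


def should_escalate(ordered_on, delivered_on, today, wait=timedelta(days=7)):
    """Return True when the wait has elapsed and nothing has been delivered."""
    if delivered_on is not None:
        return False
    return today - ordered_on >= wait


# Hypothetical dates for illustration.
ordered = date(2024, 3, 1)
print(should_escalate(ordered, None, date(2024, 3, 5)))   # still within the week
print(should_escalate(ordered, None, date(2024, 3, 8)))   # a week has passed, no delivery
print(should_escalate(ordered, date(2024, 3, 4), date(2024, 3, 8)))  # already delivered
```

The timer deserves the same questions as any other event: what happens if it never fires, fires late, or escalates to the wrong person?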

Let’s Be SMART

It’s another acronym. In the slide above it was about goal setting, but it applies equally to process. We cover some of these points already in TELOS, but for each task in our process:

  • Specific – Is it vaguely defined, or can anyone understand exactly what needs to be done? For example, “check that the architecture is good” can be interpreted in many different ways; “Check the architecture against the ISO 42010 Checklist” is a bit better defined. In the associated documentation you would of course also state where the checklist is. If we are vague in our definitions there is a risk we will be misunderstood, leading to communication overheads and possibly the wrong things being done (an overhead in work).
  • Measurable – In defining processes we should also be defining metrics to ensure that processes are healthy. Those metrics should be clearly defined and measurable. If we have a process step to deliver something from point A to point B, we should probably have a metric for the amount of time that takes, which can act as a key performance indicator. If these indicators aren’t defined, it’s most likely a design related risk.
  • Agreed Upon – It’s OK to build a process with 10 different actors involved, but each one must agree. Even if you are the manager of those resources, this step is still important because they may identify issues in the process that you do not see. If something is not agreed upon, there’s a risk that the execution may not happen according to design.
  • Realistic – You could define a process step such as “Get Owen to eat Broccoli”. That’s fine. Now try getting me to do it. If you define a step like that, there could be an associated risk!
  • Time-Based – You should have a fairly good idea of how long each process step takes. If you cannot define this, it’s likely a risk.
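
Several of these points come down to "is it written down or not", which makes them easy to check mechanically. The sketch below uses a hypothetical Task record with fields I have invented for the purpose. Realistic is deliberately left out because it needs a human judgment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """Hypothetical record of what the process documentation says about a task."""
    name: str
    instructions: str = ""                  # Specific
    metric: Optional[str] = None            # Measurable
    agreed_by_actors: bool = False          # Agreed Upon
    expected_hours: Optional[float] = None  # Time-Based


def smart_gaps(task):
    """Return the SMART points this task leaves undefined."""
    gaps = []
    if not task.instructions.strip():
        gaps.append("Specific: no instructions documented")
    if task.metric is None:
        gaps.append("Measurable: no metric defined")
    if not task.agreed_by_actors:
        gaps.append("Agreed Upon: actors have not signed off")
    if task.expected_hours is None:
        gaps.append("Time-Based: no expected duration")
    return gaps


# Hypothetical task for illustration.
review = Task(
    name="Review architecture",
    instructions="Check the architecture against the ISO 42010 Checklist",
    agreed_by_actors=True,
)
for gap in smart_gaps(review):
    print(f"{review.name}: {gap}")
```

A check like this only tells you that something is missing. It cannot tell you whether the instructions that are present are actually specific enough, so the review with the actors still matters.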

Summing It Up…

What I have presented here becomes very easy once you have done it a few times, and without exception, whenever I have personally applied these techniques I have identified unconsidered areas of risk. I haven’t talked about mechanisms for determining impact and probability, but if you follow these guidelines I believe you will have a better, more mature risk analysis. Remember, it’s not a bad thing to identify risks. When you identify risks, it’s OK if the business wants to accept them, and if they don’t, you have improvement actions and quality goes up. A risk list with many risks on it shows that you have done a good job in design, and identifying risks is the first step to solving them. These techniques aren’t as comprehensive as a full risk management methodology, but they are significantly better than just trying to guess what risks may occur. I hope you find this useful; if so, let me know.