Additional commentary has been added by Jon Speer, where noted.
NOTE: The article below makes it seem like I hate FMEAs. Au contraire: I love FMEAs, if they are performed correctly! And "correctly" in the medical device industry is slightly different from what the original automotive/aerospace standards that created the FMEA methodology call for.
NOTE 2: Thanks to my risk management clients who have laughed but thumbs-upped this analogy enough times for me to finally write an article!
"Our risk management process is.... FMEA."
If I hear that phrase one more time, I promise to quit my medtech regulatory consulting career and become a Bahamian fisherman in the Exumas (note: I may do this regardless, so I'm just putting it out there like in "The Secret").
In working with dozens of companies over the last few years remediating risk management files and processes, I've found that no acronym is used more than ye olde F-M-E-A. "Fuh-me-uhs", "EFF-EM-E-EYS", and my personal favorite - "FUHMEEKAS" (the 'C' for criticality), are often the first tools out of the auditee's toolbox, as they plop the ten-pound Excel file on the desk with a smug look. However, missing from 90% of these FMEAs is the linear, closed-loop, progressive risk assessment sequence of events that ISO 14971 specifically calls out. To paraphrase, in most FMEAs that I see, "everything leads to death." Catheter delivery systems: kinking you say? Death. Poor guide wire lubricity? Death. Wrong color? Death (exaggerating, but you get the point).
The simple issue with FMEA as a standalone risk assessment tool is that - by definition - FMEA is a single fault failure analysis technique. It doesn't accurately capture the true progression of how a hazard becomes a harm, which is the most critical factor in risk determination. Let's look at a stupidly simple, but hopefully effective, example.
[Jon's comments: Agree 100%. Risk management is about assessing hazards and hazardous situations. Rarely does a hazardous situation result from a single fault failure. Additionally, hazards and hazardous situations most definitely occur from normal use conditions as well.]
THE SPOILED FOOD RISK ANALOGY
Thanksgiving 2017. You had an excellent meal of turkey, mashed potatoes, gravy and other goodies. Your family comes to visit and you all leave town and rent a cabin up north for a week to continue the bonding session. Upon driving up to your house seven days later, it hits you: THE FOOD IN THE FRIDGE IS PROBABLY SPOILED! Since you're a medical device expert, you automatically think: let's tackle this with 14971, right?!
So here is where the analogy gets... weird/interesting.
You determine that the hazard in this scenario is "Spoiled food." Since hazard = a potential source of harm, it's reasonable to think that harm may visit you if you eat the leftovers in the fridge. If you consume spoiled food, you may become sick and nauseous; so let's make the harm in this situation "Sickness" or "Nausea."
If you ran an FMEA in this situation, you might decide that the spoiled food, or something that leads to the spoiled food (design: fridge breaks; use: Dad leaves the fridge door open), is the failure mode, and like in most FMEAs, you would deduce that the failure mode leads to the harm - Sickness or Nausea. But question: do you automatically become sick when you walk through the door? Why not? Because, as 14971 makes clear, something needs to happen in order for that hazard to become a harm. That something is called the hazardous situation!
In this case, the hazardous situation is the eating of the spoiled food!
So let's recap:
- The hazard is the spoiled food in the fridge.
- If consumed, the spoiled food may make you sick, which is the harm.
- In order for the hazard to become a harm, the hazardous situation of eating the food has to occur, otherwise the hazard never materializes into you getting sick (the harm)!
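One way to keep these three elements from collapsing into one another is to record them as separate fields. The snippet below is only a minimal sketch of that idea; the `RiskChain` class and its field names are hypothetical and are not taken from ISO 14971 or from any risk management tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskChain:
    """Hypothetical record of one hazard -> hazardous situation -> harm sequence."""

    hazard: str               # potential source of harm
    hazardous_situation: str  # the exposure that has to happen first
    harm: str                 # the outcome we actually care about

    def describe(self) -> str:
        # Render the sequence in the order it has to occur.
        return f"{self.hazard} -> {self.hazardous_situation} -> {self.harm}"


# The fridge example from the recap above.
fridge = RiskChain(
    hazard="Spoiled food in the fridge",
    hazardous_situation="Someone eats the spoiled food",
    harm="Sickness or nausea",
)

print(fridge.describe())
```

A row that cannot fill in the middle field is a hint that the analysis has jumped straight from hazard (or failure mode) to harm.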
Like I said earlier, you don't walk through the door and BOOM - you get sick. Steps have to occur in a sequential manner in order for risk to occur. In an FMEA, an overestimation of risk happens frequently because the assumption that a failure mode automatically leads to a harm doesn't accurately describe the sequential progression of risk!
The above has huge implications for risk estimation. Let me guess, you're still using the probability of occurrence of the failure mode to do Occurrence/Probability/Frequency assessments within your risk estimation, right? How does your company justify this if risk is defined as the probability of occurrence of HARM and the consequences of that harm?
Risk = Severity of Harm (S) x Probability of Occurrence of Harm (POH).
...and if you've been following closely, you'll notice a problem: how can you assess the probability of a harm if you're using FMEA to do a 1:1 CORRELATION BETWEEN FAILURE MODE AND HARM OCCURRING? The answer: you can't!
[Jon's comments: Keep in mind the ISO 14971 definition of "risk": combination of the probability of occurrence of harm and the severity of that harm.]
This brings us to my two favorite designations in ISO 14971: P1 and P2.
We know that Risk = S x POH.
- Severity: severity is usually assessed as the consequences of a clinical event or harm to a patient or user (or environment). It is usually a single factor value, assessed qualitatively as low, medium, high or any other language that companies employ (catastrophic, moderate, minimal, etc.).
- Probability of Occurrence of Harm: hidden in plain sight within 14971 Annex E is something called the risk "sequence of events." So if you think I've been BS-ing you until now, well, here's the proof. Let's look at E.1, which describes our new friends P1 and P2:
As you can see, although Severity is a single factor, POH is broken up into two distinct variables called P1 and P2:
- P1: Probability of a hazardous situation occurring: what's the probability that all the stuff leading up to a hazardous situation occurs? In our fridge example: what's the probability that someone eats the food in the fridge (hazardous situation)? You remembered in the driveway that the food was spoiled, there was a bad stench when you walked in the house that alerted you to the decomposing turkey, and you would most likely have seen the mold growing on the cornbread. So in this scenario, the P1 probability of the hazardous situation occurring is probably LOW.
- P2: Probability of hazardous situation leading to harm: what's the probability that, even if the hazardous situation has occurred (you ate the food!), the harm will occur (i.e. you do get sick)? In our fridge example: let's say for some reason you arrive home inebriated and you ignore all rational cues and eat the food. Does that automatically mean that you get sick? You have a strong immune system, you notice right away that the food didn't taste good so you take some Tums, and you go to bed and avoid the worst symptoms. In this situation, the P2 likelihood that the harm will occur (i.e. that you'll actually get sick even if you ate the food) is also probably LOW.
Our POH for the fridge example is then a combination of the P1 AND P2 values: P1 x P2 (LOW x LOW) = a LOW POH. Plain English:
The probability of occurrence of you getting sick (harm) from eating (hazardous situation) spoiled food (hazard) is LOW. Depending on the severity of the sickness, your overall risk may be low or maybe slightly higher....
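To make the P1 x P2 combination concrete, here is a small sketch with made-up numbers. The article only rates P1 and P2 as LOW; the values 0.05 and 0.20 below are assumptions chosen purely for illustration, and the function name `probability_of_harm` is hypothetical.

```python
def probability_of_harm(p1: float, p2: float) -> float:
    """Combine P1 and P2 into POH as P1 x P2.

    p1: probability that the hazardous situation occurs
    p2: probability that the hazardous situation leads to harm
    """
    for name, p in (("p1", p1), ("p2", p2)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {p}")
    return p1 * p2


# Illustrative values only (assumed, not from the article or the standard).
P1_EATS_SPOILED_FOOD = 0.05  # someone actually eats the food despite the cues
P2_SICK_AFTER_EATING = 0.20  # eating the food actually makes them sick

poh = probability_of_harm(P1_EATS_SPOILED_FOOD, P2_SICK_AFTER_EATING)
print(f"POH = {poh:.3f}")
```

Many companies do this step with a qualitative lookup table rather than multiplication of numbers; either way, the point is that POH comes out lower than P1 or P2 alone whenever both are below 1.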
Let's assess this using FMEA now instead. Let's say that the main reason the food got spoiled is that the fridge broke (failure mode, design). That failure mode leads to a hazard being present in the "system" - spoiled food. We already did the hazard analysis on this and determined that the overall risk is probably low because the P1 and P2 were low.
And voila, the CORNERSTONE OF THIS ARTICLE:
If we did FMEA on this example like most medical device companies, it would look like this:
Fridge breaks (failure mode) --> Sickness (harm). So if the design failure mode happens often, the FMEA inaccurately correlates it to a person getting sick, 100% of the time! If the FMEA had its way, you'd be driving to your cabin, and as soon as the fridge breaks, you'd get sick! This is a poor reflection of the actual outcomes!
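The gap between the two approaches can be sketched numerically. Everything below is an assumption for illustration: the three probabilities are invented, and folding the fridge failure into P1 as one step in the sequence of events is my reading of "all the stuff leading up to a hazardous situation," not a formula quoted from the standard.

```python
# Invented probabilities for illustration only.
P_FRIDGE_BREAKS = 0.30        # failure mode occurs
P_EATEN_GIVEN_SPOILED = 0.05  # someone eats the food once it has spoiled
P_SICK_GIVEN_EATEN = 0.20     # P2: eating the spoiled food leads to sickness

# FMEA-style shortcut: failure mode maps 1:1 to harm,
# so the occurrence of the failure is treated as the occurrence of harm.
fmea_poh = P_FRIDGE_BREAKS

# Hazard-analysis style: the failure is only the first step toward
# the hazardous situation (P1), which then has to lead to harm (P2).
p1 = P_FRIDGE_BREAKS * P_EATEN_GIVEN_SPOILED
hazard_analysis_poh = p1 * P_SICK_GIVEN_EATEN

overestimate_factor = fmea_poh / hazard_analysis_poh

print(f"FMEA-style POH:      {fmea_poh:.3f}")
print(f"Hazard-analysis POH: {hazard_analysis_poh:.3f}")
print(f"Overestimated by a factor of about {overestimate_factor:.0f}")
```

With these made-up inputs, the 1:1 shortcut overstates the probability of harm by roughly two orders of magnitude, which is the "everything leads to death" effect in miniature.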
[Jon's comments: No, "detection" (a classic FMEA component) does not have a place in this example. If you detected the broken fridge, would this reduce the overall risk? Yes, it would. But detecting this actually reduces the probability of the hazardous situation.]
Long story short: an FMEA that does not include the Annex E elements of hazard --> hazardous situation --> harm does not present an accurate risk management process.
[Jon's comments: FMEA-based risk management practices may have served you well up until now. Realize, however, that if your risk management process is not aligned with ISO 14971, then this will present issues going forward. The entire medical device regulatory world has accepted ISO 14971 as THE standard for risk management. ISO 14971 is also a significant aspect of the revised ISO 13485:2016 as the accepted methodology for risk-based QMS and decision-making processes.]
I've seen many companies use a hybrid FMEA that incorporates a hazard analysis very effectively. It. Just. Makes. Sense.
Moral of the story: upgrade your fridges, people.