System Safety (was Best EFIS in Glass Cockpit Forum)

Flying Scotsman · Nov 11, 2010

Just for context purposes, here's where we were on the other thread over in the Glass Cockpit area:

Originally Posted by rleffler
I choose not to have a software bug or hardware failure take out all the systems. Independent systems will still function in a known state.


This gets into an area I work with quite a bit: safety-critical software. I'll spare you all the definitions of what constitutes a "hazard", etc., or even what we mean by "safety-critical" (versus "mission-critical", and so on).

This probably doesn't mean much for EFISes, which are pretty simple systems by comparison, but there have been considerable studies of the idea of "different software done by different teams". What was found was that the teams made the same sorts of design "errors", or, more properly, failed to deal with unsafe conditions *in the same way*.

Here is Nancy Leveson of MIT, a noted expert on software safety:

To cope with software design errors, "diversity" has been suggested in the form of independent groups writing multiple versions of software with majority voting on the outputs (like modular redundancy in hardware). This approach is based on the assumption that such versions will fail in a statistically independent manner, but this assumption has been shown to be false in practice and to be ineffective in both carefully controlled experiments and mathematical analysis [14,15,16]. Common-cause (but usually different) logic errors tend to lead to incorrect results when the various software versions attempt to handle the same unusual or difficult-to-handle inputs. The lack of independence in the multiple versions should not be surprising as human designers do not make random mistakes; software engineers are not just monkeys typing on typewriters. As a result, versions of the same software (derived from the same requirements) developed by different people or groups are very likely to have common failure modes: in this case, common design errors.
In addition, such redundancy schemes usually involve adding to system complexity, which can result in failures itself. A NASA study of an experimental aircraft with two versions of the control system found that all the software problems occurring during flight testing resulted from errors in the redundancy management system (which was necessarily much more complex than the original control software). The control software versions worked perfectly [17].
Software Challenges in Achieving Space Safety by Nancy Leveson. Journal of the British Interplanetary Society, Vol. 62, 2009
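
A toy sketch of the N-version scheme Leveson describes may make the point concrete. The three `version_*` functions below are hypothetical stand-ins for independently written bank-angle routines (not code from any real EFIS, nor from the cited studies). Each handles normal inputs correctly, but each mishandles the same unusual input, a bank beyond 90 degrees, in its own way, and the majority voter passes the wrong answer straight through.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Optional


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of versions, or None if there is none."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


# Hypothetical "independent" versions of a bank-angle computation (degrees).
# Different logic errors, same difficult input: attitudes beyond +/-90 degrees.
def version_a(bank: float) -> float:
    # Clamps to +/-90, so an inverted attitude reads as 90.
    return max(-90.0, min(90.0, bank))


def version_b(bank: float) -> float:
    # Reflects about 90, so 135 degrees is confused with 45 degrees.
    if bank > 90.0:
        return 180.0 - bank
    if bank < -90.0:
        return -180.0 - bank
    return bank


def version_c(bank: float) -> float:
    # Also clamps, but written differently from version_a.
    return 90.0 if bank >= 90.0 else (-90.0 if bank <= -90.0 else bank)


VERSIONS = (version_a, version_b, version_c)

if __name__ == "__main__":
    for true_bank in (30.0, 135.0):
        outputs = [v(true_bank) for v in VERSIONS]
        print(true_bank, outputs, "->", majority_vote(outputs))
```

For a true 30-degree bank all three agree and the vote is correct; for a true 135-degree bank two versions say 90, so the voter reports 90. The voter only helps if the versions fail independently, which is exactly the assumption the quote says does not hold.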

If the "hazard" you are guarding against is some sort of non-internal fault, do not assume that at least one of the two systems will handle the condition safely; *both* may fail to.

Now, a problem with an OS, or a circuitry problem, or something like that, yes...but then, that alone wouldn't require independent vendors, would it?

Further, safety is an *emergent property* of the system as a whole, and cannot be evaluated on a component-by-component basis. And that system includes your vehicle, you, the ground components, etc.
__________________
Steve
Santa Clarita, CA
PP-ASEL, ASES, Instrument Airplane

Empennage and wings done (except fiberglass)
Lyc YIO-360 (mounted) + Hartzell 74" CS prop
Starting cowling; avionics arriving!

--------------------------------------------------------------------------------

Next post:

----------------------------

Steve, you bring up an issue I've been looking at for some time and still don't have the correct words to verbalize it succinctly.

Here is what I'm thinking:

Which is safer, airplane A or airplane B?

Airplane A:
Primary and backup flight instruments, electrical system, etc.
Either two EFIS units or an EFIS unit with steam gauge backups and multiple electrical power sources, including dual batteries and generators. This includes the required wiring designed to limit back feeding/battery draining situations and the necessary switches to control it.

Airplane B:
One set of flight instruments, either steam gauges or an EFIS. Simple electrical system with no "E-Bus" or backup battery.

When Airplane A has a problem, how much time does the pilot spend resolving the conflict and getting on with the business of flying vs. Airplane B?

Could the complexity of Airplane A, when something goes Tango Uniform, actually make it less safe than the simple Airplane B? When something goes wrong in Airplane B, the pilot has to get on with flying, whereas the pilot of Airplane A may spend more time trying to debug the situation, which could be fatal.

Thoughts?

Flying Scotsman · Nov 11, 2010

So here's my answer...

It depends. You first need to identify the hazards you are trying to protect against.

"A hazard is a condition or changing set of circumstances that presents a potential for injury, illness, or property damage. It is the potential or inherent characteristics of an activity, condition, or circumstance, which can produce adverse or harmful consequences."

You can't talk about safety until you've defined ALL of the hazards you wish to mitigate or prevent.

rleffler · Nov 11, 2010

I would agree with Steve.

This is one reason why I decided on using the Vertical Power VP-200. It allows me to mitigate some of the complexity that comes with redundant equipment and buses. It does this by allowing me to predefine multiple failure scenarios and automate certain tasks. It won't cover everything, but it's a step in the right direction.

I would again point folks to Paul Dye's article that I referenced in the other thread. The answer is going to differ for each of us. There is no one-size-fits-all solution.

You can find Paul's article, "Equipment Redundancy - What is Enough?", by following the link.

Flying Scotsman · Nov 11, 2010

Paul's article is talking about Fault Tree Analysis, just without using the words. And he does a very good job of explaining it, as well.

FTAs (and their counterparts, FMEAs and FMECAs) are stock-in-trade for engineering (at least, the kind we do).
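
For readers who haven't seen one, a fault tree can be evaluated in a few lines. This is a minimal sketch, not a real FTA tool: the `Basic`/`Gate` types, the event names, and the probabilities are all hypothetical, and the AND/OR formulas below are only valid if the basic events are independent, which is precisely the assumption the Leveson quote warns about for software.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Union


@dataclass
class Basic:
    """A basic event with an assumed probability over some exposure period."""
    name: str
    p: float


@dataclass
class Gate:
    """An AND or OR gate combining child events."""
    kind: str  # "AND" or "OR"
    children: List["Node"] = field(default_factory=list)


Node = Union[Basic, Gate]


def probability(node: Node) -> float:
    """Top-event probability, ASSUMING all basic events are independent."""
    if isinstance(node, Basic):
        return node.p
    ps = [probability(child) for child in node.children]
    if node.kind == "AND":  # all children must occur
        return prod(ps)
    if node.kind == "OR":  # at least one child occurs
        return 1.0 - prod(1.0 - p for p in ps)
    raise ValueError(f"unknown gate kind: {node.kind!r}")


# Hypothetical tree: attitude information is lost only if the primary display
# AND the backup are both lost; each of those can happen in more than one way.
loss_of_attitude = Gate("AND", [
    Gate("OR", [Basic("EFIS fault", 2e-4), Basic("main bus fault", 1e-4)]),
    Gate("OR", [Basic("vacuum pump fault", 1e-3), Basic("backup gyro fault", 1e-4)]),
])

if __name__ == "__main__":
    print(f"P(loss of attitude) ~ {probability(loss_of_attitude):.2e}")
```

If the backup were a second EFIS on the same bus, "main bus fault" would appear under both OR gates, and this naive evaluation would treat the two copies as independent and overstate the benefit of the redundancy; proper FTA handles such repeated events, e.g. through minimal cut sets.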

What I was pointing out was that the proposed *mitigation* (in this case, relying on independent software solutions) has to be evaluated to determine if it's actually mitigating the hazard or not.

E.g., let's say you want to mitigate a loss of attitude knowledge when severe turbulence in clouds has quickly bounced you into an extremely unusual attitude. Assuming that two different EFISes will not behave the same, and that at least one will still show the correct attitude, *may* not work: the common-mode failure may be that both systems are unable to react that quickly, due to either software design or hardware limitations, and thus BOTH will fail to mitigate the hazard.

That's what Leveson's research was getting at...independent solutions may fail to mitigate the same hazard, for various reasons (and as she noted, introduce additional system complexity which has its own failure modes).

N-version programming has been demonstrated not to be anywhere near as effective as people think it *should* be.

jdeas · Nov 11, 2010

Steve,
Would it be fair to say that a steam gauge backup to an EFIS would have the following traits?

More likely to have a single system failure
Less likely to have a dual system failure

Ron Lee · Nov 11, 2010

My guess is that any problem suggested here is so far outnumbered by other causal factors (e.g., running out of fuel, VFR into IMC, loss of control, CFIT) as to be insignificant.

Fix the big killers.

Flying Scotsman · Nov 11, 2010

jdeas said:
Steve,
Would it be fair to say that a steam gauge backup to an EFIS would have the following traits?

More likely to have a single system failure

Less likely to have a dual system failure

Ah, my beer-drinking friend...this sounds like a conversation over a Red Ale at our local microbrew tonight

The question is ill-posed...you have many components with different failure modes, etc.

It's also a question of reliability...maybe I could work up a PRA (probabilistic risk assessment) on it.

(ETA: BTW, I'm fairly sure that if I *did* do a PRA on a combined EFIS + "steam gauge" system, the driver would NOT be the electronics, but would turn out to be vacuum system failures, due to the low MTBF of vacuum pumps.)
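
A toy two-unit calculation, in the spirit of the PRA mentioned above, shows why jdeas's question has no single answer. Everything here is an assumption: constant failure rates (so an MTBF converts to a probability via the exponential model), made-up MTBF figures, and a single hypothetical shared-bus event as the only common cause. `prob_from_mtbf` and `pair_failure_probs` are illustrative names, not from any tool.

```python
import math


def prob_from_mtbf(mtbf_hours: float, exposure_hours: float) -> float:
    """P(failure within the exposure time), assuming a constant failure rate."""
    return 1.0 - math.exp(-exposure_hours / mtbf_hours)


def pair_failure_probs(p_a: float, p_b: float, p_common: float = 0.0) -> tuple[float, float]:
    """Return (P(at least one unit fails), P(both units fail)).

    p_a, p_b: independent failure probabilities of each unit.
    p_common: probability of a shared event (e.g. a common bus) that fails both.
    All three events are assumed independent of one another.
    """
    p_any_indep = 1.0 - (1.0 - p_a) * (1.0 - p_b)
    p_both_indep = p_a * p_b
    p_any = p_common + (1.0 - p_common) * p_any_indep
    p_both = p_common + (1.0 - p_common) * p_both_indep
    return p_any, p_both


if __name__ == "__main__":
    hours = 1.0                             # one flight hour of exposure
    efis = prob_from_mtbf(5_000, hours)     # hypothetical EFIS MTBF
    vacuum = prob_from_mtbf(1_000, hours)   # hypothetical, lower vacuum pump MTBF
    bus = prob_from_mtbf(50_000, hours)     # hypothetical shared-bus failure

    print("Dual EFIS, shared bus: any=%.2e both=%.2e" % pair_failure_probs(efis, efis, bus))
    print("EFIS + vacuum backup:  any=%.2e both=%.2e" % pair_failure_probs(efis, vacuum))
```

With these invented numbers, the EFIS plus vacuum backup is more likely to suffer *a* failure but less likely to lose both units, the pattern jdeas suggested. Change the MTBFs, or give the mixed pair a common cause of its own, and the ranking can change.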

rv8ch · Nov 12, 2010

The big ones

Ron Lee said:
My guess is that any problem suggested here is so far outnumbered by other causal factors (e.g., running out of fuel, VFR into IMC, loss of control, CFIT) as to be insignificant.

Fix the big killers.

Gotta agree with Ron here - a dead EFIS is waaaaay down the list of reasons for NTSB reports, last time I checked.

penguin · Nov 12, 2010

I think Steve has hit upon a very valid point. It's easy to sit down with pencil and paper and convince yourself that a dual EFIS, dual alternator, dual battery system is the only way ahead because it has to be "safer", but what have you actually mitigated that a much simpler system does not? Ron's observation is equally valid: all the avionics in the world won't help if you run out of gas or into a mountain.

My vote is for simple systems built from the best quality components that I can run to.

Pete

Sig600 · Nov 12, 2010

My take is that it's a double-edged sword.

Look at the accident statistics for light twins. Wouldn't one think twice the engines, twice the safety? Or are you a "twice the likelihood of failure" person?

Conversely you can build the most complicated panel in the world, which could be construed as either:

1.) very redundant and very safe
or
2.) more components and more prone to failure

As was mentioned, I think you need to look at your mission requirements, identify the hazards, and plan from there.

Personally, I want my RV to be compatible with my skill/experience, i.e., full IFR with WX and precision approach capabilities (a 500-knot cruise would be nice).

I also want EFIS and synthetic vision. My risk mitigator? I'll have dual screens, dual AHRS, and a battery backup for all. Steam gauge backup? I've had more vacuum pump failures than any other failure.

Although even the most capable plane in the world can't fix stupid (CFIT, VMC->IFR). Approaches to mins in mountainous terrain, in winter weather at night? No thanks.

Flying Scotsman · Nov 12, 2010

I knew an instructor who used to put all of his new students, regardless of prior flight experience or ratings, into a simulator (simulating IMC) for one of the first lessons. Then, he'd fail the vacuum system...realistically. Not just "up and died", but slooowly died.

He told me that *every single one* of his new students followed the AI right down into the ground. Every. Single. One.

No vacuum system on MY airplane, thanks.

B25Flyer · Nov 25, 2010

I believe a significant amount of the effort in designing a panel needs to be focused on the human-machine interface. Make it simple.

Unfortunately, pilots buy hardware based on the feature list, since there is a perception that more is better. EFIS manufacturers therefore keep adding features and functionality to their boxes, and the result is that they often cross a line and become confusing, especially in a high-stress environment, like when a system failure is in progress.

I occasionally fly an RV-7 with dual high-end EFIS (mfr name withheld), G-430W, G-496, Sorcerer, SL-30, yadda yadda yadda.... This is an incredibly powerful panel. It has steam gauge A/S, Alt, and T/B backups.

The danger in this airplane is not the reliability of the equipment.... The panels are crowded with too much information, there are too many ways to decide what to view, what information is being displayed and what is driving the A/P. The possibility of information overload is high when everything is working. Add the stress of a system failure and it would be very difficult to sort out a problem.....

One of the simplest system integration issues is the CDI button on the Garmin 430, and yet it is the #1 error that I see on IPCs, initial instrument checkrides, and even FAR 135 checkrides flown by professional pilots in airplanes they fly regularly....

Personally, my Rocket has a G-430W driving my Dynon and a Digiflight II. My Baron has a G-430W driving an Aspen and a Century III. The way the GPS, HSI, and A/P integrate is totally different in these two airplanes.... I fly the Rocket 200 hrs per year and the Baron 100... I am fairly proficient, but I still struggle with this sometimes.

Autopilots are now starting to incorporate straight-and-level buttons. I think EFIS manufacturers should start to incorporate a KISS, or B2B (Back to Basics), button. Pushing this button would clear the panel of everything but the most basic information, to reduce the data stream going to the pilot in high-stress situations....

So my advice is, make it simple. In many cases, less really is more, and it is almost certainly safer....

Tailwinds,
Doug Rozendaal
F1

delusional · Nov 27, 2010

which system is failing?

Ron Lee said:
My guess is that any problem suggested here is so far outnumbered by other causal factors (e.g., running out of fuel, VFR into IMC, loss of control, CFIT) as to be insignificant.

Fix the big killers.

Clearly, there is a failed subsystem, but as Ron points out it is Us. Or perhaps it is We.. dunno.

Anyway, can it be a coincidence that in Part 121 operations the pertinent, simple redundancy that results in better outcomes is a second operator: hot, load-balancing, and ready for failover?

How many NTSB reports would be prevented by a professional FO in the right seat? But not so easy to implement for us.

Better hardware systems and redundancy only help in a few cases. Mostly we need to make ourselves better pilots and that seems to mainly mean better decision-makers. Maybe we need the help of the business schools? Sorry, just brainstorming here...

terrye · Nov 27, 2010

Failure Modes

I'd like to put this question to the group:
If you have system redundancy, for example two different brands of EFIS, or an EFIS backed up with an analog horizon, airspeed and altimeter, how do you decide which system to trust if they are giving you different information?

I'm not talking about one EFIS with a big red X through it, or an analog vacuum horizon that is tumbling. I mean EFIS 1 shows a different heading, airspeed or altitude from EFIS 2. You have redundancy, and maybe both systems are equally reliable, but if you are in IMC how do you as the biological computer decide which mechanical computer is lying to you?

Bavafa · Nov 28, 2010

terrye said:
I'd like to put this question to the group:
If you have system redundancy, for example two different brands of EFIS, or an EFIS backed up with an analog horizon, airspeed and altimeter, how do you decide which system to trust if they are giving you different information?

I'm not talking about one EFIS with a big red X through it, or an analog vacuum horizon that is tumbling. I mean EFIS 1 shows a different heading, airspeed or altitude from EFIS 2. You have redundancy, and maybe both systems are equally reliable, but if you are in IMC how do you as the biological computer decide which mechanical computer is lying to you?

This was exactly one of my questions and dilemmas when I was planning my panel. I wanted redundancy but did not want to rely on one brand. What I concluded was that two different brands can create more of an issue when the failure is not a complete one. The second issue with two different brands is proficiency and learning, not to mention capability and wiring headaches.

My panel consists of dual, redundant EFISes with dual AHRS/magnetometers, plus a set of steam gauges as my backup. If the EFISes misbehave and don't agree with each other, I am going to revert to the steam gauges until I am on the ground, or see which EFIS agrees with the steam gauges and use that one.

But I view the steam gauges as a necessity, as we can always have an electrical problem, and then it will not matter how many EFISes you have, unless they have battery backup, which introduces its own set of issues.

rleffler · Nov 28, 2010

Bavafa said:
This was exactly one of my questions and dilemmas when I was planning my panel. I wanted redundancy but did not want to rely on one brand. What I concluded was that two different brands can create more of an issue when the failure is not a complete one. The second issue with two different brands is proficiency and learning, not to mention capability and wiring headaches.

Unfortunately, I think one component needs to be from a different vendor to eliminate the very small chance of a software glitch taking out both EFISes at the same time. Fortunately, most of the EFIS vendors are getting better at error management. I agree that you have more to learn and that it increases the complexity.

Bavafa said:
My panel consists of dual, redundant EFISes with dual AHRS/magnetometers, plus a set of steam gauges as my backup. If the EFISes misbehave and don't agree with each other, I am going to revert to the steam gauges until I am on the ground, or see which EFIS agrees with the steam gauges and use that one.

This is a great approach; you need the third component as a tie-breaker in case your two EFISes don't agree.

Bavafa said:
But I view the steam gauges as a necessity, as we can always have an electrical problem, and then it will not matter how many EFISes you have, unless they have battery backup, which introduces its own set of issues.

I think this may start another religious debate with no correct answer, only an answer that is best for you.

My preference is to deal with an appropriate electrical design, as opposed to dealing with a vacuum pump. I've had a history of vacuum failures in my Cherokee, and I believe an appropriate electrical design suits my needs better.

There are several electrical designs that address this issue available in Nuckolls's book, or you can always implement one of the Vertical Power solutions.

As in most things in life, you need to determine which issues you want to deal with and which ones you don't. There isn't a one-size-fits-all solution, but that's what keeps all this interesting. Mehrdad chose steam backups and I chose an electrical solution. Both work; both have advantages and disadvantages. Pick the solution that best meets your needs.

bob

Ironflight · Nov 28, 2010

The Man With Two Watches Dilemma

Yup - having two different instruments measure the same parameter is always going to put you in a dilemma without a tie-breaker. That's just a logical fact. Fortunately, you can have dissimilar redundancy and alternate cues to help break the tie. I am a firm believer in dissimilar redundancy, which is why I like a separate EFIS and autopilot head (which can be linked for integration, of course): it is generally going to give me different software, and even different sensor packages.

Alternate cues can come from a variety of things though - in a traditional six-pack, you use airspeed and VSI to help decide if you are climbing or descending, right? Bank can be indicated by heading changes. And of course, a loud rushing noise generally says that you have sped up and are headed for the ground.

But if you want pure EFIS instrumentation redundancy without ambiguity, you need an odd number greater than one....
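
The "odd number greater than one" rule is the idea behind what is often called mid-value select: with three sources, take the median, and any single source that wanders away from it is outvoted and flagged. With two sources you can only detect a disagreement. This is a minimal sketch; the function names, tolerance, and altitude readings are hypothetical, not taken from any EFIS.

```python
from statistics import median
from typing import List, Optional, Sequence, Tuple


def compare_two(a: float, b: float, tolerance: float) -> Optional[float]:
    """Two sources: a disagreement can be detected, but not attributed.

    Returns the average when they agree within `tolerance`, else None
    (meaning "I can't tell which one is lying").
    """
    return (a + b) / 2 if abs(a - b) <= tolerance else None


def mid_value_select(readings: Sequence[float], tolerance: float) -> Tuple[float, List[int]]:
    """Odd number (>= 3) of sources: select the median and flag outliers.

    Returns the selected value and the indices of readings that differ from
    it by more than `tolerance`.
    """
    if len(readings) < 3 or len(readings) % 2 == 0:
        raise ValueError("mid-value select needs an odd number of readings, at least three")
    selected = median(readings)
    outliers = [i for i, r in enumerate(readings) if abs(r - selected) > tolerance]
    return selected, outliers


if __name__ == "__main__":
    # Hypothetical altitude readings (ft) from EFIS 1, EFIS 2, and a standby altimeter.
    print(compare_two(8500, 7200, tolerance=100))               # disagreement, no winner
    print(mid_value_select([8500, 8520, 7200], tolerance=100))  # third source breaks the tie
```

The tie-breaker only defeats a single bad source. If two of the three share a common-mode error (say both read about 7,200 ft when the airplane is at 8,500 ft), the median sides with them and flags the one correct source, which is the same trap as N-version voting.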
