When engineers were forced to respond to this wave of change, it became clear that traditional maintenance methods were no longer adequate - a new approach to equipment maintenance was required. The commercial aviation industry was the first to realise that change was necessary and committed significant resources to developing a solution in the 1960s and 1970s. The results entered the public domain in 1978 under the name "Reliability Centred Maintenance" or "RCM".
In industry generally, there is now extreme pressure on the maintenance function to deliver maximum performance at minimum cost.
This new understanding of equipment failure made the civil aviation industry realise that their existing maintenance regime was flawed and even caused failures, some with catastrophic consequences.
A new approach to equipment maintenance was essential and subsequently developed. The end result was what we now call Reliability Centred Maintenance - RCM.
Third Generation: today there is a vast, and even bewildering, range of highly advanced maintenance techniques available.
The problem for maintenance engineers (besides learning what the available techniques are in the first place) is knowing which techniques are appropriate for which equipment and how often to use them.
RCM helps with this enormously.
Each of these questions is considered in more detail in the following sections.
A similar logic will apply to many of the engine’s other failure modes, resulting in two quite different maintenance schedules for the very same engine.
A function is a statement of what the user of the equipment wants it to do and to what standard of performance.
A complete set of functions for a piece of equipment represents the objectives of its maintenance schedule. The user will be very happy if the maintenance schedule keeps the equipment performing its functions!
The function list is the foundation for the remainder of the RCM analysis, and the RCM analysis group will use it to deduce exactly what is meant by failure. From this ‘failed state’, the RCM analysis group can list the failure modes that could cause each failed state.
A system’s “primary function” is usually obvious, easy to determine and normally states why the system was purchased in the first place.
However, most systems are expected to perform other “secondary functions” which represent the user’s requirements for environmental and safety integrity, protection, control, economy, appearance, etc.
By documenting functional failures, the RCM analysis group defines the “failed state” (i.e. exactly what is meant by “failed” and “partially failed”) for the equipment’s operating context.
The functional failures are the starting point for the RCM analysis group to identify the failure modes that could cause the equipment to be in the failed state.
However, any unlikely failure modes that have extremely severe consequences would also be considered.
When writing failure modes, it is important to identify the cause of the failure in sufficient detail so that the RCM analysis group can identify appropriate maintenance later in the RCM process (using the RCM maintenance task selection logic).
Insufficient detail may well mean that appropriate maintenance tasks are missed, rendering the analysis ‘superficial’ (and possibly dangerous). On the other hand, if failure modes are identified in too much detail the RCM analysis group could end up wasting time unnecessarily.
The first 4 questions of the RCM process make up the information gathering phase. The answers to these questions document what the equipment or system should do (functions), how it could fail (functional failures), what causes it to fail (failure modes) and what problems result (failure effects) when it does fail.
This information becomes an excellent equipment reference. It can subsequently be used to support a safety case, act as an audit trail and produce a comprehensive fault-finding guide. It can also be the starting point for determining spare parts provisioning and for working out how to work around problems that arise when failures occur.
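The information gathered by the first four questions forms a simple hierarchy: functions, their functional failures, the failure modes behind each functional failure, and the effects of each failure mode. The following is a minimal sketch of how such a record might be held in Python. The class names, field names and the `worksheet_rows` helper are hypothetical and are not part of any RCM standard or tool; the example entries are illustrative and loosely based on the diesel engine example used in this article.

```python
from dataclasses import dataclass, field


@dataclass
class FailureMode:
    cause: str    # question 3: what causes the functional failure
    effects: str  # question 4: what happens when the failure occurs


@dataclass
class FunctionalFailure:
    description: str  # question 2: how the function is not fulfilled
    failure_modes: list = field(default_factory=list)


@dataclass
class Function:
    statement: str  # question 1: what the user wants, and to what standard
    functional_failures: list = field(default_factory=list)


def worksheet_rows(functions):
    """Flatten the hierarchy into one row per failure mode."""
    rows = []
    for fn in functions:
        for ff in fn.functional_failures:
            for fm in ff.failure_modes:
                rows.append((fn.statement, ff.description, fm.cause, fm.effects))
    return rows


# Illustrative entry only (wording is an assumption, not taken from a real analysis).
engine = Function(
    statement="Provide traction power to the train",
    functional_failures=[
        FunctionalFailure(
            description="Provides no traction power",
            failure_modes=[
                FailureMode(
                    cause="Cooling water pump fails",
                    effects="Engine overheats and its protection shuts it down",
                )
            ],
        )
    ],
)

for row in worksheet_rows([engine]):
    print(" | ".join(row))
```

Keeping one row per failure mode makes it straightforward to reuse the same record as an audit trail or a fault-finding guide.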
Hidden failures are usually associated with equipment or systems that provide some sort of protection (e.g. a boiler pressure relief valve). Hidden failures on their own do not have any direct consequences but they leave the protected equipment or system without the protection that they should have - in the case of a pressure relief valve failing closed, the boiler may explode if a second failure causes the boiler to over-pressurise.
There are many ways in which a failure with Operational consequences can incur costs; these include lost production, increased operating costs, degradation in product quality, poor customer service, etc.
If it is practical to monitor for point P and the P-F interval is long enough for action to be taken to reduce, avoid or eliminate the consequences of failure then it may be possible to do the condition monitoring task.
If failure cannot be predicted as it begins to occur, then RCM looks to see if it can be prevented from occurring. This would mean performing some sort of intervention before a failure even begins.
In the RCM task selection logic, the available choices are:
Scheduled restoration and scheduled discard tasks are carried out just before the item enters the wear-out zone (i.e. towards the end of its “life”, which is the age at which the item's conditional probability of failure begins to rise rapidly).
Sometimes these tasks are carried out earlier (i.e. more often) if the consequences of failure are very severe. The shorter interval provides a “safety factor”.
In addressing this question, RCM takes special note of the consequences of failure. For example, where the consequences are purely economic, RCM permits No Scheduled Maintenance (or Run-to-Failure) as a valid default action; however, doing nothing is not an option if the failure mode has safety or environmental consequences.
The possible default actions are:
RCM analysis group members are drawn from equipment maintainers, operators, possibly manufacturers/suppliers and occasionally specialists. The most important factor is that they know and understand the equipment being analysed using the RCM process.
The aim is to reduce the size of the “black hole” in knowledge (i.e. the black area in the box representing “all there is to know about the equipment” in the diagram). Inevitably, there will be some gaps in the group’s combined knowledge, but at the end of the RCM analysis each group member will usually have acquired useful knowledge about the equipment from other members of the group.
Under the guidance of the RCM facilitator, the group follows the RCM process. The outputs of the analysis are:
When the RCM analysis is complete, the output should be audited by whoever has overall responsibility for the equipment or system. This is so they can satisfy themselves that the analysis has been carried out correctly and that it is both sensible and defensible.
The final step is to implement the results of the RCM analysis when the audit is complete.
Quality - Improved quality due to:
Life-Cycle Cost - Reduced life-cycle costs by optimising the maintenance workloads and providing a clearer view of spares and staffing requirements
Equipment Life - Longer useful life of expensive items due to an increased use of on-condition maintenance techniques
Maintenance Data - A comprehensive maintenance database which:
Motivation - Greater motivation of individuals, particularly those involved in the review process. This gives improved understanding of the equipment in its operating context and wider "ownership" of the resulting maintenance schedules
Teamwork - Better teamwork brought about by the highly structured group approach to analysing and addressing maintenance problems.
There are a number of different RCM variants available; the most widely applied version world-wide is RCM2 (promoted by the Aladon Network). Mutual Consultants use RCM2 exclusively.
You may like to visit rcm.uk.net where you can find more information on RCM and download free RCM-related tools such as screensavers and browser toolbars.
The world of equipment maintenance changed dramatically during the second half of the 20th century and it continues to do so today.
Several major influences have been responsible for driving these changes:
Looking back to the 1930s, we can divide up the years since then into three “generations”. We can then examine the expectations placed on the maintenance function in each of the three generations as follows:
First Generation: prior to the Second World War, equipment was relatively simple and over-designed, so it tended to be reasonably reliable. The failures that did occur didn’t matter too much and were quick and easy to repair. There was little need for the planned maintenance systems that are commonplace today.
Second Generation: the Second World War quickly led to increased demand for many types of manufactured goods and severely limited the supply of skilled labour to industry. In response, factory equipment became more mechanised and more complex. Failures (and their downtime) began to matter more so “preventive” maintenance systems were developed in an attempt to prevent them - usually these were fixed interval overhauls.
Third Generation: the last 30-40 years have seen an enormous increase in demand for manufactured goods and mass transportation. Industry responded with ever more automation and complexity in order to reduce the manpower needed to meet this demand; this in turn greatly increased costs of ownership and maintenance costs.
We can also look back at what was generally understood about the way in which equipment behaved and failed over the same three generations:
First Generation: it was widely believed that new equipment had a very low probability of failure and that this remained the case for a long period of time. After a certain age, the equipment would "wear-out" and, therefore, become more likely to fail.
Second Generation: an understanding of the concept of "infant mortality" led to the notion of an initial high probability of failure (which quickly settled down), followed by a long period of low failure probability before wear-out resulted in equipment becoming more likely to fail. Plotting conditional probability of failure against time on a graph produces the classic "bathtub curve".
Equipment maintenance consisted of nursing the equipment through the "bedding in" phase and then overhauling (or replacing) it before it reached the wear-out phase.
Third Generation: in the 1960s and 1970s the civil aviation industry undertook an extensive research project into the ways in which equipment behaves and, in particular, how it fails. This research showed that only 4% of civil aviation equipment failures actually fitted the classic bathtub failure pattern and that there were, in fact, an additional five failure patterns - most failures in the aviation industry conform to the sixth pattern.
The maintenance techniques available to engineers have grown in number and complexity over the three generations:
First Generation: the only real option was to leave equipment running and fix it if it failed.
Second Generation: the pressure for output fuelled demand for higher equipment availability. This in turn led to the development of the first “preventive maintenance” systems. Large and cumbersome (by today's standards) computers were introduced into the maintenance function in order to manage these systems.
Having examined how the changing world of maintenance drove the development of RCM, the rest of this article describes what it is in some detail and explains Mutual Consultants’ role in assisting companies with its application.
The developers of RCM took the unusual view (at the time) that the objective of equipment maintenance should be to keep the equipment doing whatever its users want it to do, rather than to prevent failures for the sake of preventing failures.
With this emphasis on preserving what the user wants, Moubray defines RCM as:
A process used to determine what must be done to ensure that any physical asset continues to do what its users want it to do in its present operating context.
It is, therefore, no surprise that determining the operating context and what the user wants the equipment to do is the starting point for the RCM process, which is applied by asking and answering the following seven questions:
What are the functions and associated performance standards of the asset in its present operating context?
In order to answer question 1, it is important to have a clear understanding of the operating context of the equipment being studied. This is because the operating context can influence what should be defined as failure and, therefore, whether a maintenance task is worthwhile.
For example: consider a small diesel engine used to power trains. This engine could be the only engine in a two-car train or it could be one of eight in a much longer train. These are very different operating contexts which will result in two very different views of what constitutes failure.
If a cooling water pump fails, the engine will eventually overheat and its protection will shut it down. On the two-car train this will result in very serious operational consequences because the train will come to a halt mid-journey. On the longer train, with eight engines, it will result in a loss of about 12.5% of its traction power (one engine in eight). The train will continue to its destination with only a minor delay.
Any maintenance task for the cooling water pump that is considered in the RCM analysis will be much more likely to be evaluated as worthwhile on the two-car train than on the eight-car train.
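The difference between the two operating contexts can be put into numbers. The short sketch below assumes that every engine on a train contributes an equal share of traction power; that assumption, and the function name `traction_power_lost`, are illustrative and not taken from a real analysis.

```python
def traction_power_lost(engines_total, engines_failed=1):
    """Fraction of traction power lost, assuming all engines are equally rated."""
    if engines_total <= 0 or not 0 <= engines_failed <= engines_total:
        raise ValueError("engine counts must be consistent")
    return engines_failed / engines_total


# One engine shut down on a single-engine train and on an eight-engine train.
print(f"Single engine: {traction_power_lost(1):.1%} lost")
print(f"Eight engines: {traction_power_lost(8):.1%} lost")
```

Under this assumption the single-engine train loses all of its traction power, while the eight-engine train loses one eighth of it, which is why the same failure mode is judged so differently in the two contexts.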
In what ways does it fail to fulfil its functions?
A system or piece of equipment is said to have ‘failed’ if it is unable to perform its intended function(s) to the desired standard of performance. This includes partial failure (as well as complete failure) where the equipment still functions, but not to an acceptable standard (e.g. it may be operating too slowly or producing poor quality).
What causes each functional failure?
A failure mode is any event which is reasonably likely to cause a functional failure. “Any event” is not limited to equipment failures caused by wear and tear or deterioration (sudden or slow), but also includes human error, poor procedures and design issues.
“Reasonably likely” (i.e. credible) failure modes fall into the following broad categories:
What happens when failure occurs?
The RCM analysis group needs to have sufficient information so that they can make robust decisions about how to manage each failure mode.
In particular, the effects of each failure (i.e. what would happen when the failure occurs if nothing was done to prevent it) are required. This information allows the RCM analysis group to answer the questions posed in the RCM decision logic.
The failure effects record the problems (e.g. any undesirable/costly events) that the RCM-derived maintenance schedule is intended to manage (i.e. predict or prevent).
The failure effects should, therefore, contain the following information:
What can be done to predict or prevent each failure?
Once each failure mode has been categorised according to the consequences of failure, a structured decision logic is used to select maintenance tasks. The RCM decision logic first looks to see if it is appropriate to perform a scheduled task to predict when the failure mode is going to occur.
If such a task is not appropriate, RCM then considers whether the failure should be prevented by regularly restoring the item’s original resistance to failure before it fails and if not, whether a scheduled replacement of the item (before it fails) is appropriate.
This entails monitoring the equipment in order to identify a detrimental change (i.e. a warning) that indicates that the failure is in the process of happening (early enough so that action can be taken before the failure actually occurs). This is known as Condition-based Maintenance or Condition Monitoring.
How often the equipment needs to be monitored is governed by the time it would take from when the warning can be identified to the point at which full failure occurs. This is illustrated in the diagram below: the warning is shown at point P (Potential failure) and the full failure occurs at point F (Functional failure).
The monitoring task should be carried out at an interval which is less than the time between P and F (known as the P-F interval).
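The relationship between the monitoring interval and the P-F interval can be sketched as a simple check. The function below is illustrative only: its name and parameters are hypothetical, and it assumes the worst case in which an inspection takes place just before point P, so the warning is not picked up until the following inspection.

```python
def on_condition_task_feasible(pf_interval_days, task_interval_days, response_time_days):
    """Return True if monitoring at the given interval leaves time to act.

    pf_interval_days:   time from potential failure (P) to functional failure (F)
    task_interval_days: how often the monitoring task is carried out
    response_time_days: time needed to act once a warning is found (an assumption)
    """
    # The task interval must be shorter than the P-F interval,
    # otherwise the warning can be missed entirely.
    if task_interval_days >= pf_interval_days:
        return False
    # Worst case: P occurs just after an inspection, so the warning is found
    # one full task interval later.
    worst_case_warning_days = pf_interval_days - task_interval_days
    return worst_case_warning_days >= response_time_days


print(on_condition_task_feasible(90, 30, 14))  # interval well inside the P-F interval
print(on_condition_task_feasible(90, 90, 14))  # interval not shorter than P-F
print(on_condition_task_feasible(10, 7, 5))    # too little warning left to act
```

The third case shows that an interval shorter than the P-F interval is necessary but may not be sufficient: the time remaining after detection must also be long enough for action to be taken.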
What should be done if a suitable proactive task cannot be found?
The RCM task selection logic ensures that proactive tasks are identified only for those failure modes that need them. When a suitable proactive task cannot be found there still remains the question of what else could be done in order to manage the failure mode.
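The order in which this article describes the decision logic (predict the failure if possible, otherwise restore, otherwise discard, otherwise fall back to a default action) can be sketched as a small function. This is a simplified illustration and not the full RCM decision diagram: the function name, its boolean inputs and the returned labels are hypothetical, each input stands for a judgement made by the analysis group, and hidden failures are not modelled.

```python
def select_task(on_condition_ok, restoration_ok, discard_ok, safety_or_environmental):
    """Pick a maintenance policy in the order described in the text.

    Each *_ok flag is the analysis group's judgement that the task is both
    practical and worth doing for the failure mode being considered.
    """
    if on_condition_ok:
        return "on-condition task"
    if restoration_ok:
        return "scheduled restoration"
    if discard_ok:
        return "scheduled discard"
    # No suitable proactive task: the default depends on the consequences.
    if safety_or_environmental:
        return "default action required (run-to-failure not permitted)"
    return "no scheduled maintenance (run-to-failure)"


# Purely economic consequences and no worthwhile proactive task.
print(select_task(False, False, False, safety_or_environmental=False))
# Safety consequences and no worthwhile proactive task.
print(select_task(False, False, False, safety_or_environmental=True))
```

Writing the logic as an ordered chain of checks mirrors the point made in the text: proactive tasks are considered first, and a default action is only reached when none of them is suitable.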
RCM has been applied in a wide range of industries in most countries throughout the world. Correctly applied, RCM produces a maintenance schedule that is optimised for the equipment in its operating context; the aim is to achieve inherent levels of equipment reliability and availability. The RCM-derived maintenance and the process itself bring about the following benefits:
Safety - Greater safety and environmental protection due to:
Performance - Improved operating performance due to:
Cost Effectiveness - Greater cost effectiveness due to:
Our role is to impart an understanding of RCM to clients and provide support and guidance in its application; our goal is for clients to become competent to apply RCM themselves.
This is achieved via a combination of: