
Reliability Centred Maintenance (RCM)

When engineers were forced to respond to this wave of change, it became clear that traditional maintenance methods were no longer adequate - a new approach to equipment maintenance was required. The commercial aviation industry was the first to realise that change was necessary and committed significant resources to developing a solution in the 1960s and 1970s. The results entered the public domain in 1978 under the name "Reliability Centred Maintenance", or "RCM".

In industry generally, there is now extreme pressure on the maintenance function to deliver maximum performance at minimum cost.

This new understanding of equipment failure made the civil aviation industry realise that their existing maintenance regime was flawed and even caused failures, some with catastrophic consequences.

A new approach to equipment maintenance was essential and was subsequently developed. The end result was what we now call Reliability Centred Maintenance (RCM).

How Equipment Fails

Third Generation: Today there is a vast, and even bewildering, range of highly advanced maintenance techniques available.

The problem for maintenance engineers (besides learning what the available techniques are in the first place) is knowing which techniques are appropriate for which equipment and how often to use them.

RCM  helps with this enormously.

Advances in Maintenance Techniques

Reliability Centred Maintenance (RCM)

  1. What are the functions and associated performance standards of the asset in its present operating context?
  2. In what ways does it fail to fulfil its functions?
  3. What causes each functional failure?
  4. What happens when failure occurs?
  5. In what way does each failure matter?
  6. What can be done to predict or prevent each failure?
  7. What should be done if a suitable proactive task cannot be found?

Each of these questions is considered in more detail in the following sections.

A similar logic will apply to many of the engine’s other failure modes, resulting in two quite different maintenance schedules for the very same engine.

A function is a statement of what the user of the equipment wants it to do and to what standard of performance.

A complete set of functions for a piece of equipment represents the objectives of its maintenance schedule. The user will be very happy if the maintenance schedule keeps the equipment performing its functions!

The function list is the foundation for the remainder of the RCM analysis, and the RCM analysis group will use it to deduce exactly what is meant by failure. From this ‘failed state’, the RCM analysis group can list the failure modes that could cause each failed state.

A system’s “primary function” is usually obvious, easy to determine and normally states why the system was purchased in the first place.

However, most systems are expected to perform other “secondary functions” which represent the user’s requirements for environmental and safety integrity, protection, control, economy, appearance, etc.

1. Operating Context and Functions

By documenting functional failures, the  RCM  analysis group defines the “failed state” (i.e. exactly what is meant by “failed” and “partially failed”) for the equipment’s operating context.

The functional failures are the starting point for the  RCM  analysis group to identify the failure modes that could cause the equipment to be in the failed state.

2. Functional Failures

However, any unlikely failure modes that have extremely severe consequences would also be considered.

When writing failure modes, it is important to identify the cause of the failure in sufficient detail so that the RCM analysis group can identify appropriate maintenance later in the RCM process (using the RCM maintenance task selection logic).

Insufficient detail may well mean that appropriate maintenance tasks are missed, rendering the analysis ‘superficial’ (and possibly dangerous). On the other hand, if failure modes are identified in too much detail, the RCM analysis group could end up wasting time.

3. Failure Modes

The first four questions of the RCM process make up the information gathering phase. The answers to these questions document what the equipment or system should do (functions), how it could fail (functional failures), what causes it to fail (failure modes) and what problems result (failure effects) when it does fail.

This information becomes an excellent equipment reference which can subsequently be used to support a safety case, act as an audit trail, produce a comprehensive fault-finding guide and be the starting point for determining spare parts provisioning and how to work around problems that arise when failures occur.
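The information gathered by the first four questions nests naturally: each function has functional failures, and each functional failure has failure modes with their effects. The following is only a minimal sketch of how such a record might be held in Python. The class names, field names and the cooling pump entry (including the 50 litres/minute figure) are invented for illustration and are not part of any RCM standard or tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FailureMode:
    cause: str    # question 3: what causes the functional failure
    effects: str  # question 4: what happens when the failure occurs


@dataclass
class FunctionalFailure:
    failed_state: str  # question 2: how the function is lost
    failure_modes: List[FailureMode] = field(default_factory=list)


@dataclass
class Function:
    statement: str  # question 1: what the user wants, and to what standard
    functional_failures: List[FunctionalFailure] = field(default_factory=list)


def count_failure_modes(functions: List[Function]) -> int:
    """Count every failure mode recorded under every function."""
    return sum(
        len(ff.failure_modes)
        for fn in functions
        for ff in fn.functional_failures
    )


# Hypothetical entry; the wording and figures are invented for illustration.
pump = Function(
    statement="Circulate cooling water at not less than 50 litres/minute",
    functional_failures=[
        FunctionalFailure(
            failed_state="Circulates no water at all",
            failure_modes=[
                FailureMode(
                    cause="Impeller drive shaft sheared",
                    effects="Engine overheats and is shut down by its protection",
                ),
                FailureMode(
                    cause="Pump bearing seized",
                    effects="Engine overheats and is shut down by its protection",
                ),
            ],
        ),
        # Partial failure: still pumping, but below the required standard.
        FunctionalFailure(failed_state="Circulates less than 50 litres/minute"),
    ],
)

print(count_failure_modes([pump]))
```

Keeping the partial failure as its own entry, even before any failure modes have been listed against it, mirrors the idea that the failed state is defined first and its causes are identified afterwards.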

4. Failure Effects

Hidden failures are usually associated with equipment or systems that provide some sort of protection (e.g. a boiler pressure relief valve). Hidden failures on their own do not have any direct consequences but they leave the protected equipment or system without the protection that they should have - in the case of a pressure relief valve failing closed, the boiler may explode if a second failure causes the boiler to over-pressurise.

There are many ways in which a failure with Operational consequences can incur costs; these include lost production, increased operating costs, degradation in product quality, poor customer service, etc.

5. Failure Consequences

If it is practical to monitor for point P and the P-F interval is long enough for action to be taken to reduce, avoid or eliminate the consequences of failure, then a condition monitoring task may be feasible.

Preventing Failure
If failure cannot be predicted as it begins to occur, then RCM looks to see if it can be prevented from occurring. This would mean performing some sort of intervention before a failure even begins.

In the  RCM  task selection logic, the available choices are:

Scheduled restoration and scheduled discard tasks are carried out before the wear-out zone (i.e. towards the end of “life”, which is the age at which its conditional probability of failure begins to rise rapidly).

Sometimes these tasks are carried out earlier (i.e. the task is carried out more often) if the consequences of failure are very severe. This increases the frequency of the scheduled task and provides a “safety factor”.

6. Proactive Tasks

In addressing this question,  RCM  takes special note of the consequences of failure. For example, where the consequences are purely economic,  RCM  permits No Scheduled Maintenance (or Run-to-Failure) as a valid default action; however, doing nothing is not an option if the failure mode has safety or environmental consequences.
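The rule in the paragraph above can be written down as a small check. This is a sketch only: it uses just the three consequence labels named in that paragraph (safety, environmental, economic), and the enum and function names are hypothetical. It says nothing about which default action should be chosen when run-to-failure is ruled out.

```python
from enum import Enum
from typing import Set


class Consequence(Enum):
    SAFETY = "safety"
    ENVIRONMENTAL = "environmental"
    ECONOMIC = "economic"


def run_to_failure_permitted(consequences: Set[Consequence]) -> bool:
    """Return True only when the consequences of failure are purely economic."""
    if not consequences:
        raise ValueError("the failure mode has not been given any consequences")
    # Any safety or environmental consequence rules out "doing nothing".
    blocking = {Consequence.SAFETY, Consequence.ENVIRONMENTAL}
    return not (consequences & blocking)


print(run_to_failure_permitted({Consequence.ECONOMIC}))
print(run_to_failure_permitted({Consequence.ECONOMIC, Consequence.SAFETY}))
```

The first call prints `True` and the second prints `False`, because a single safety consequence is enough to remove run-to-failure from the list of valid default actions.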

The possible default actions are:

7. Default Actions

Applying  RCM 

RCM  analysis group members are drawn from equipment maintainers, operators, possibly manufacturers/suppliers and occasionally specialists. The most important factor is that they know and understand the equipment being analysed using the  RCM  process.

The aim is to reduce the size of the “black hole” in knowledge (i.e. the black area in the box representing “all there is to know about the equipment” in the diagram). Inevitably, there will be some gaps in the group’s combined knowledge, but at the end of the  RCM  analysis each group member will usually have acquired useful knowledge about the equipment from other members of the group.

Under the guidance of the  RCM  facilitator, the group follows the  RCM  process. The outputs of the analysis are:

When the  RCM  analysis is complete, the output should be audited by whoever has overall responsibility for the equipment or system. This is so they can satisfy themselves that the analysis has been carried out correctly and that it is both sensible and defensible.

The final step is to implement the results of the  RCM  analysis when the audit is complete.

What  RCM  Achieves 

Quality - Improved quality due to:

Life-Cycle Cost - Reduced life-cycle costs by optimising the maintenance workloads and providing a clearer view of spares and staffing requirements

Equipment Life - Longer useful life of expensive items due to an increased use of on-condition maintenance techniques

Maintenance Data - A comprehensive maintenance database which:

Motivation - Greater motivation of individuals, particularly those involved in the review process. This gives improved understanding of the equipment in its operating context and wider "ownership" of the resulting maintenance schedules

Teamwork - Better teamwork brought about by the highly-structured group approach to analysing and addressing maintenance problems.

Mutual Consultants’ Role 

There are a number of different RCM variants available; the most widely applied version world-wide is RCM2 (promoted by the Aladon Network). Mutual Consultants use RCM2 exclusively.

For more information about RCM2, please see our RCM2 page or open our RCM eBrochure.


You may like to visit rcm.uk.net where you can find more information on RCM and download free RCM-related tools such as screensavers and browser toolbars.

If you would like to register your interest in place(s) on one of our  RCM  courses then please fill in the form on our  RCM  Training page.


Maintenance Has Changed 

Increased Expectations 

The world of equipment maintenance changed dramatically during the second half of the 20th century and it continues to do so today.

Several major influences have been responsible for driving these changes:

Looking back to the 1930s, we can divide up the years since then into three “generations”. We can then examine the expectations placed on the maintenance function in each of the three generations as follows:

First Generation: prior to the Second World War, equipment was relatively simple and over-designed, so it tended to be reasonably reliable. The failures that did occur didn’t matter too much and were quick and easy to repair. There was little need for the planned maintenance systems that are commonplace today.

Second Generation: the Second World War quickly led to increased demand for many types of manufactured goods and severely limited the supply of skilled labour to industry. In response, factory equipment became more mechanised and more complex. Failures (and their downtime) began to matter more so “preventive” maintenance systems were developed in an attempt to prevent them - usually these were fixed interval overhauls.

Third Generation: the last 30-40 years have seen an enormous increase in demand for manufactured goods and mass transportation. Industry responded with ever more automation and complexity in order to reduce the manpower needed to meet this demand; this in turn greatly increased costs of ownership and maintenance costs.

We can also look back at what was generally understood about the way in which equipment behaved and failed over the same three generations:

First Generation: it was widely believed that new equipment had a very low probability of failure and that this remained the case for a long period of time. After a certain age, the equipment would "wear-out" and, therefore, become more likely to fail.

Second Generation: an understanding of the concept of "infant mortality" led to the notion of an initial high probability of failure (which quickly settled down), followed by a long period of low failure probability before wear-out resulted in equipment becoming more likely to fail. Plotting conditional probability of failure against time on a graph produces the classic "bathtub curve".

Equipment maintenance consisted of nursing the equipment through the "bedding in" phase and then overhauling (or replacing) it before it reached the wear-out phase.

Third Generation: in the 1960s and 1970s the civil aviation industry undertook an extensive research project into the ways in which equipment behaves and, in particular, how it fails. This research showed that only 4% of civil aviation equipment failures actually fitted the classic bathtub failure pattern and that there were, in fact, an additional five failure patterns - most failures in the aviation industry conform to the sixth pattern.

The maintenance techniques available to engineers have grown in number and complexity over the three generations:

First Generation: the only real option was to leave equipment running and fix it if it failed.

Second Generation: the pressure for output fuelled demand for higher equipment availability. This in turn led to the development of the first “preventive maintenance” systems. Large and cumbersome (by today's standards) computers were introduced into the maintenance function in order to manage these systems.

Having examined how the changing world of maintenance drove the development of RCM, the rest of this article describes what it is in some detail and explains Mutual Consultants’ role in assisting companies with its application.

The developers of  RCM  took the unusual view (at the time) that the objective of equipment maintenance should be to keep the equipment doing whatever its users want it to do, rather than to prevent failures for the sake of preventing failures.

With this emphasis on preserving what the user wants, Moubray defines  RCM  as:

A process used to determine what must be done to ensure that any physical asset continues to do what its users want it to do in its present operating context.

It is, therefore, no surprise that determining the operating context and what the user wants the equipment to do is the starting point for the  RCM  process, which is applied by asking and answering the following seven questions:

What are the functions and associated performance standards of the asset in its present operating context?

Operating Context
In order to answer question 1, it is important to have a clear understanding of the operating context of the equipment being studied. This is because the operating context can influence what should be defined as failure and, therefore, whether a maintenance task is worthwhile.

For example: consider a small diesel engine used to power trains. This engine could be the only engine in a two-car train or it could be one of eight in a much longer train. These are very different operating contexts which will result in two very different views of what constitutes failure.

If a cooling water pump fails, the engine will eventually overheat and its protection will shut it down. On the two-car train this will result in very serious operational consequences because the train will come to a halt mid-journey. On the eight-car train it will result in a loss of about 12.5% (one-eighth) of its traction power, and the train will continue to its destination with only a minor delay.

Any maintenance task for the cooling water pump that is considered in the  RCM  analysis will be much more likely to be evaluated as worthwhile on the two-car train than on the eight-car train.
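The difference between the two operating contexts comes down to simple arithmetic. The sketch below assumes that every engine contributes an equal share of traction power, which the example implies but does not state; the function name is hypothetical.

```python
def traction_power_lost(engines_total: int, engines_failed: int = 1) -> float:
    """Fraction of traction power lost, assuming identical engines."""
    if engines_total <= 0:
        raise ValueError("engines_total must be positive")
    if not 0 <= engines_failed <= engines_total:
        raise ValueError("engines_failed must be between 0 and engines_total")
    return engines_failed / engines_total


# One engine shut down on a single-engine train and on an eight-engine train.
for n in (1, 8):
    print(f"{n} engine(s): {traction_power_lost(n):.1%} of traction power lost")
```

With one engine the loss is total and the train stops; with eight engines the same failure mode removes one-eighth of the traction power, which is why the same maintenance task can be judged worthwhile in one context and not in the other.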

In what ways does it fail to fulfil its functions?

A system or piece of equipment is said to have ‘failed’ if it is unable to perform its intended function(s) to the desired standard of performance. This includes partial failure (as well as complete failure) where the equipment still functions, but not to an acceptable standard (e.g. it may be operating too slowly or producing poor quality).

What causes each functional failure?

A failure mode is any event which is reasonably likely to cause a functional failure. “Any event” is not limited to equipment failures caused by wear and tear or deterioration (sudden or slow), but also includes human error, poor procedures and design issues.

“Reasonably likely” (i.e. credible) failure modes fall into the following broad categories:

What happens when failure occurs?

The  RCM  analysis group needs to have sufficient information so that they can make robust decisions about how to manage each failure mode.

In particular, the effects of each failure (i.e. what would happen when the failure occurs if nothing was done to prevent it) are required. This information allows the  RCM  analysis group to answer the questions posed in the  RCM  decision logic.

The failure effects record the problems (e.g. any undesirable/costly events) that the RCM-derived maintenance schedule is intended to manage (i.e. predict or prevent).

The failure effects should, therefore, contain the following information:

In what way does each failure matter?

RCM  recognises that maintenance is actually far more about preventing or mitigating the consequences of failure than about preventing the failures themselves. In this way  RCM  focuses maintenance spend where it will do the most good.

Some failures matter a great deal (i.e. they have severe consequences) when they occur and some failures hardly matter at all (i.e. they have insignificant or trivial consequences).

It is usually worth putting effort into predicting or preventing high-consequence failures, even if they occur infrequently. On the other hand, failures that matter very little are often tolerated, even if they happen relatively frequently.

This can be illustrated by considering a simple maintenance task: listening to a bearing for any signs of rumbling. The onset of any unusual rumbling noise tells us that the bearing has already started to fail and that it must be replaced in the near future (if we wish to avoid the failure occurring). By checking the bearing for unusual noise, we are not doing the task to prevent the bearing failure; we are doing it in order to avoid the consequences of failure (which might be expensive if, say, the engine is destroyed).

RCM, therefore, categorises each failure according to the consequences of failure as follows:

What can be done to predict or prevent each failure?

Once each failure mode has been categorised according to the consequences of failure, a structured decision logic is used to select maintenance tasks. The  RCM  decision logic first looks to see if it is appropriate to perform a scheduled task to predict when the failure mode is going to occur.

If such a task is not appropriate,  RCM  then considers whether the failure should be prevented by regularly restoring the item’s original resistance to failure before it fails and if not, whether a scheduled replacement of the item (before it fails) is appropriate.
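The order in which the decision logic considers tasks can be sketched as a simple cascade. In this sketch each boolean stands for the analysis group's judgement that a task of that kind is appropriate for the failure mode; how that judgement is reached is outside the sketch. The enum and function names are hypothetical, and the final "default action" branch stands for whatever is decided when no proactive task is suitable.

```python
from enum import Enum


class Task(Enum):
    ON_CONDITION = "scheduled task to predict the failure"
    SCHEDULED_RESTORATION = "scheduled restoration"
    SCHEDULED_DISCARD = "scheduled replacement (discard)"
    DEFAULT_ACTION = "default action"


def select_task(predict_ok: bool, restore_ok: bool, discard_ok: bool) -> Task:
    """Return the first appropriate task, in the order the logic considers them."""
    if predict_ok:
        return Task.ON_CONDITION
    if restore_ok:
        return Task.SCHEDULED_RESTORATION
    if discard_ok:
        return Task.SCHEDULED_DISCARD
    # No suitable proactive task was found.
    return Task.DEFAULT_ACTION


print(select_task(predict_ok=False, restore_ok=True, discard_ok=True).value)
```

Because prediction is tested first, a failure mode for which both a predictive task and a scheduled replacement are appropriate ends up with the predictive task.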

Predicting Failure
This entails monitoring the equipment in order to identify a detrimental change (i.e. a warning) that indicates that the failure is in the process of happening (early enough so that action can be taken before the failure actually occurs). This is known as Condition-based Maintenance or Condition Monitoring.

How often the equipment needs to be monitored is governed by the time it would take from when the warning can be identified to the point at which full failure occurs. This is illustrated in the diagram below: the warning is shown at point P (Potential failure) and the full failure occurs at point F (Functional failure).

The monitoring task should be carried out at an interval which is less than the time between P and F (known as the P-F interval).
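The interval rule can be checked numerically. In the worst case the warning appears just after a check, so it is only found one full task interval later; the time left to act is then the P-F interval minus the task interval. The sketch below adds a `response_time_days` parameter for the time needed to act on a warning. That parameter, the function name and the example figures are assumptions made for illustration; with a response time of zero the check reduces to the rule stated above.

```python
def monitoring_interval_ok(
    task_interval_days: float,
    pf_interval_days: float,
    response_time_days: float = 0.0,
) -> bool:
    """True if the worst-case warning still leaves enough time to act before F."""
    if task_interval_days <= 0 or pf_interval_days <= 0:
        raise ValueError("intervals must be positive")
    if response_time_days < 0:
        raise ValueError("response_time_days cannot be negative")
    # Worst case: point P occurs immediately after a check has been done.
    return task_interval_days + response_time_days < pf_interval_days


# Invented figures: a 30-day P-F interval and 10 days needed to plan a repair.
print(monitoring_interval_ok(14, 30, response_time_days=10))
print(monitoring_interval_ok(28, 30, response_time_days=10))
```

The first call prints `True` and the second prints `False`: a 28-day check is shorter than the P-F interval, but in the worst case it leaves only 2 days to act when 10 are needed.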

What should be done if a suitable proactive task cannot be found?

The  RCM  task selection logic ensures that proactive tasks are identified only for those failure modes that need them. When a suitable proactive task cannot be found there still remains the question of what else could be done in order to manage the failure mode.

It is not possible for one person to answer all the questions that RCM asks. The solution is to bring together a group of people (the “RCM analysis group”) who have technical knowledge about the equipment, knowledge of its operation (within its current operating context) and a basic understanding of RCM itself (through suitable training).

A sound understanding of the  RCM  process is also required in order to guide the  RCM  analysis group through the  RCM  process and achieve consensus in answering the questions. This role is fulfilled by an  RCM  facilitator.

RCM has been applied in a wide range of industries in most countries throughout the world. Correctly applied, RCM produces a maintenance schedule that is optimised for the equipment in its operating context; the aim is to achieve inherent levels of equipment reliability and availability. The RCM-derived maintenance and the process itself bring about the following benefits:

Safety - Greater safety and environmental protection due to:

Performance - Improved operating performance due to:

Cost Effectiveness - Greater cost effectiveness due to:

Our role is to impart an understanding of  RCM  to clients and provide support and guidance in its application; our goal is for clients to become competent to apply  RCM  themselves.

This is achieved via a combination of:

RCM  yields results very quickly; most organisations can complete an  RCM  review on existing equipment and achieve substantial benefits in a matter of months.

It is also an ideal approach for determining the maintenance requirements of new equipment of all kinds. When applied correctly, it transforms both the maintenance requirements themselves and the way in which the maintenance function as a whole is perceived.