Making Sense of the CMS Market

Six key steps to consider when navigating the sea of condition monitoring purchase decisions.


Condition Monitoring Framework
The recent acceptance of the value of condition monitoring in the wind industry is driven by two major factors. First, turbines coming off warranty are exposing owners to the true operations and maintenance costs of the wind farm. Second, major component failures are driving excessive maintenance costs. To combat these challenges, owners and operators are deploying condition monitoring systems (CMS) to detect faults early, before they cause secondary damage. Catching faults early reduces the cost of repair, resulting in significant savings.

With the acceptance of condition monitoring systems on the rise, there is a corresponding increase in the number of vendors bringing CMS products to market. Almost every turbine OEM offers a CMS, and gearbox and bearing suppliers are readily joining this increasingly crowded market. Further complicating the choice of a condition monitoring system is the fact that several different technologies are available (e.g., vibration, oil debris, SCADA). Deciding which system or technology provides the highest value can feel like comparing apples to oranges.

Unfortunately, there is no performance standard or benchmark for comparing the various condition monitoring systems. A prospective buyer is charged with the difficult task of determining which CMS will provide the right performance at the right price. To assist in this decision-making process, it is important to focus on the overarching goal of a condition monitoring system: providing the user with recommendations that allow them to make optimal O&M decisions.

With this goal in mind, it is important to understand the process through which a CMS converts a physical measurement (e.g., vibration, oil debris, temperature, pressure) into a recommendation for action. Luckily for prospective buyers, there is a generic process that all condition monitoring systems follow. This process consists of six steps:
• Data acquisition – translation of a physical phenomenon into an analog measurement that can be converted into digital format.
• Data processing – processing the digitized sensor measurements into meaningful indications of component health.
• Detection – classification of whether the condition indicators are “normal” or “abnormal.”
• Diagnosis – validation of the fault and a determination of its location and severity.
• Prognosis – estimation of how much longer the faulted component will last before it needs to be replaced.
• Recommendation – determination of what maintenance action is necessary and when it should be performed.
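Because each step consumes the output of the one before it, the framework can be pictured as a simple chain of functions. The Python sketch below illustrates only that chaining; the stage functions (`acquire`, `process`, `detect`, `diagnose`, `prognose`, `recommend`), the 1.0 detection threshold, the severity cut-off and the remaining-life lookup are all hypothetical placeholders, not how any particular CMS works.

```python
from typing import Any, Callable, Dict, List, Tuple

# A stage is a (name, function) pair; each function takes the previous stage's output.
Stage = Tuple[str, Callable[[Any], Any]]


def acquire(raw: List[float]) -> List[float]:
    # Stand-in for A/D conversion: quantize the analog readings to two decimals.
    return [round(x, 2) for x in raw]


def process(samples: List[float]) -> float:
    # Reduce the digitized signal to one condition indicator (peak amplitude).
    return max(abs(x) for x in samples)


def detect(ci: float) -> Dict[str, Any]:
    # Compare against a hypothetical threshold of 1.0.
    return {"ci": ci, "abnormal": ci > 1.0}


def diagnose(d: Dict[str, Any]) -> Dict[str, Any]:
    # Assign a coarse severity only when something abnormal was detected.
    if not d["abnormal"]:
        return {**d, "severity": None}
    return {**d, "severity": "severe" if d["ci"] > 2.0 else "minor"}


def prognose(d: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical remaining-useful-life lookup, in days.
    rul = {None: None, "minor": 90, "severe": 7}[d["severity"]]
    return {**d, "rul_days": rul}


def recommend(d: Dict[str, Any]) -> str:
    # Turn the prognosis into an action the operator can schedule.
    if d["rul_days"] is None:
        return "no action"
    return f"replace within {d['rul_days']} days"


STAGES: List[Stage] = [
    ("data acquisition", acquire),
    ("data processing", process),
    ("detection", detect),
    ("diagnosis", diagnose),
    ("prognosis", prognose),
    ("recommendation", recommend),
]


def run_pipeline(raw: List[float], stages: List[Stage] = STAGES) -> Tuple[Any, List[str]]:
    """Run the stages in order, feeding each output into the next stage."""
    data: Any = raw
    trace: List[str] = []
    for name, step in stages:
        data = step(data)
        trace.append(name)
    return data, trace
```

For example, `run_pipeline([0.1, -1.5, 0.4])` flows through all six stages and ends in a recommendation. A weak step early in the chain (say, a `process` function that averages away the fault signature) silently degrades everything downstream, which is the point the text makes about evaluating the system as a whole.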

Figure 1 shows this condition monitoring framework in a graphical format. Understanding how a particular condition monitoring system performs each of these steps will give a prospective buyer a much clearer understanding of the system’s capability and the cost of each step in the process.

The process moves from data acquisition to recommendation in a linear fashion. Since the output of each step drives the next, the quality of work done in one step affects downstream performance, so it is most appropriate to evaluate the CMS holistically, at the system level. It is also worth noting that the condition monitoring process stops at recommendation; it is up to the operator to use those recommendations to make better operations and maintenance decisions.

There are many different types of wind turbine condition monitoring systems, with very different methods of providing turbine health information. Instead of using one of these technologies as an example, we will use a more generic example that everyone can recognize: a visit to your physician.

Data Acquisition
Every CMS starts with a sensor that translates a physical phenomenon into an analog measurement, which is then converted into digital format for further processing. In our example, a doctor will take a blood pressure measurement as a routine part of the visit. The data acquisition sensors in this case are the stethoscope used to measure cardiac cycles, and the pressure cuff used to measure arterial pressure. The digitization of the sensor outputs is performed by the ears (stethoscope) and eyes (pressure cuff) of the doctor.

The stethoscope and pressure cuff needed for a blood pressure measurement are not high-precision sensors. Wind turbine condition monitoring systems are very different: the fidelity of their measurements affects downstream processing, so it is important to understand the sensitivity, bandwidth and accuracy of the chosen sensor. It is also important to understand whether a CMS can determine component health for all of the fault modes that can affect each component. Oil debris systems can detect pitting failures but cannot detect cracking faults. Vibration-based systems can detect both pitting and cracking, but most cannot determine the health of components in the planetary section. Prospective buyers should take an inventory of the components on their wind turbines that have been driving the largest maintenance costs and determine their most common fault modes.
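The inventory suggested above can be checked mechanically against what each technology can see. The sketch below encodes only the coverage statements made in this article (oil debris: pitting but not cracking; vibration: pitting and cracking, but usually not the planetary section). The table names, the component labels and the `coverage_gaps` helper are hypothetical, and any real coverage claim should be confirmed with the vendor.

```python
from typing import Dict, List, Set, Tuple

# Fault modes each technology can detect, as stated in the text (an assumption to verify per vendor).
DETECTABLE_MODES: Dict[str, Set[str]] = {
    "oil debris": {"pitting"},
    "vibration": {"pitting", "cracking"},
}

# Components a technology usually cannot assess; the text notes that most
# vibration-based systems cannot assess the planetary section.
BLIND_SPOTS: Dict[str, Set[str]] = {
    "vibration": {"planetary section"},
}


def coverage_gaps(technology: str, needs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (component, fault mode) pairs in `needs` that `technology` would miss."""
    modes = DETECTABLE_MODES.get(technology, set())
    blind = BLIND_SPOTS.get(technology, set())
    return [(comp, mode) for comp, mode in needs if comp in blind or mode not in modes]


# Hypothetical inventory of high-cost components and their most common fault modes.
inventory = [
    ("high-speed bearing", "pitting"),
    ("intermediate gear", "cracking"),
    ("planetary section", "pitting"),
]
```

Running `coverage_gaps("oil debris", inventory)` flags the cracked intermediate gear, while `coverage_gaps("vibration", inventory)` flags the planetary section, which makes the trade-off between technologies concrete for a specific fleet.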

Data Processing
After the sensor measurement has been converted to a digital format, the CMS must process the sensor measurements into meaningful indications of component health. In our physician example, the data processing step requires combining the data from the cardiac cycle (stethoscope output) with the pressure variations (pressure cuff output) measured at the same time. The physician must then average all the pressure variations over the course of the measurement. The complexity of the two measurement signals is reduced to two simple numbers, 120/80 mmHg for example, which characterize the patient’s current blood pressure.

The data processing step involves two distinct sub-steps. The first is to isolate the relevant portion of the measurement signal from the ‘noise’ and involves some sort of filtering of the original signal. In our example, the physician only looks at the pressure levels during specific parts of the cardiac cycle, filtering out the rest of the extraneous values. When the signal isolation is done well, it will increase the sensitivity of the CMS, allowing for easier discrimination between “un-faulted” and “faulted” components. It will also reduce the inevitable variation in these component condition indicators due to the complex environments and varied conditions in which wind turbines operate.
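As an illustration of signal isolation, the sketch below uses a crude brick-wall band-pass filter built with NumPy's FFT: every frequency outside a chosen band is zeroed. This is a stand-in chosen for brevity, not the method any particular CMS uses (gear diagnostics commonly rely on techniques such as time-synchronous averaging instead). The 10 kHz sample rate, the 500 Hz "fault" tone and the 450 to 550 Hz band are all assumed values.

```python
import numpy as np


def bandpass_fft(signal: np.ndarray, fs: float, f_lo: float, f_hi: float) -> np.ndarray:
    """Zero every FFT bin outside [f_lo, f_hi] Hz and transform back (brick-wall band-pass)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


fs = 10_000.0                                   # assumed sample rate, Hz
t = np.arange(int(fs)) / fs                     # one second of data
tone = 0.5 * np.sin(2 * np.pi * 500.0 * t)      # hypothetical fault-related component at 500 Hz
noisy = tone + np.random.default_rng(0).normal(0.0, 1.0, t.size)  # broadband "noise"
isolated = bandpass_fft(noisy, fs, 450.0, 550.0)
```

Comparing `np.corrcoef(noisy, tone)[0, 1]` with `np.corrcoef(isolated, tone)[0, 1]` shows the effect described in the text: most of the broadband noise falls outside the band and is discarded, so the isolated signal tracks the component of interest far more closely.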

The second sub-step of data processing is extracting the salient features of the filtered signal that provide an indication of component condition. The resulting condition indicators should ideally identify the presence of different fault modes in the component. For example, a gear can have several fault modes, including root cracks, surface pitting or misalignment. Each of these fault modes manifests itself in different ways, so no single condition indicator will accurately characterize all of them. Therefore, several condition indicators based on different filtering methods should be used to identify potential fault modes in each component.
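To show why one indicator is rarely enough, the sketch below computes four generic vibration statistics that are widely used as condition indicators: RMS, peak, crest factor and kurtosis. Which indicator best matches which fault mode is system-specific; here the only assumption is the textbook behavior that short, repeated impacts raise kurtosis and crest factor much more than RMS. The test signals are synthetic.

```python
import numpy as np


def condition_indicators(x: np.ndarray) -> dict:
    """Generic vibration statistics; each responds to a different signal characteristic."""
    x = x - x.mean()                                       # remove DC offset
    rms = float(np.sqrt(np.mean(x ** 2)))                  # overall vibration energy
    peak = float(np.max(np.abs(x)))                        # largest single excursion
    kurt = float(np.mean(x ** 4) / np.mean(x ** 2) ** 2)   # impulsiveness (about 3 for Gaussian noise)
    return {"rms": rms, "peak": peak, "crest_factor": peak / rms, "kurtosis": kurt}


rng = np.random.default_rng(1)
baseline = rng.normal(0.0, 1.0, 20_000)   # synthetic "un-faulted" vibration
impulsive = baseline.copy()
impulsive[::1000] += 8.0                  # synthetic periodic impacts, e.g. from a hypothetical crack

ci_base = condition_indicators(baseline)
ci_imp = condition_indicators(impulsive)
```

In this synthetic case RMS barely moves, while kurtosis and crest factor roughly double. An indicator set built on RMS alone would miss the impacts, which is why several indicators are computed per component.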

The goal is to design data processing that calculates condition indicators that identify all potential fault modes and easily discriminate between faulted and un-faulted components. Figure 2 shows two different scenarios: a condition indicator that results from poor data processing (top), and a condition indicator that results from effective data processing (bottom). In both graphs, the green distribution is the range of the condition indicator typical for an un-faulted component, while the red distribution is the range typical for a faulted component. In the top graph, there is a great deal of overlap between the two distributions due to inadequate data processing, so the ability to discriminate between a faulted and un-faulted component is poor. In the bottom graph, effective data processing has provided adequate separation of the faulted and un-faulted conditions, so discriminating between the two is straightforward.
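The separation pictured in Figure 2 can be quantified. The sketch below fits a normal distribution to each population with Python's `statistics.NormalDist` and reports two measures: the distance between the means in pooled standard deviations (often called d-prime) and the overlapping area of the two fitted curves. The normality assumption and both sample sets are hypothetical, chosen only to mimic the "poor" and "effective" cases.

```python
from statistics import NormalDist
from typing import Dict, Sequence


def separation(unfaulted: Sequence[float], faulted: Sequence[float]) -> Dict[str, float]:
    """Fit a normal distribution to each population and summarize how well they separate."""
    u = NormalDist.from_samples(unfaulted)
    f = NormalDist.from_samples(faulted)
    pooled_sd = ((u.variance + f.variance) / 2) ** 0.5
    return {
        "d_prime": (f.mean - u.mean) / pooled_sd,  # mean difference in pooled standard deviations
        "overlap": u.overlap(f),                   # shared area under the two fitted curves (0 to 1)
    }


# Hypothetical condition indicator samples.
poor = separation([1.0, 1.2, 0.9, 1.1, 1.3, 0.8], [1.2, 1.4, 1.1, 1.5, 1.0, 1.3])
effective = separation([1.0, 1.1, 0.9, 1.0, 1.05, 0.95], [2.0, 2.1, 1.9, 2.2, 1.8, 2.0])
```

With these numbers the "poor" indicator shares more than half of its area with the faulted distribution, while the "effective" one shares almost none, which is the contrast between the top and bottom graphs.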

Detection
Once the measured signals have been turned into condition indicators, the CMS must classify whether the condition indicators are “normal” or “abnormal.” This is achieved by comparing the current condition indicator to a reference range, which can be either a statistical baseline or model-based. In our example, the physician has determined the patient’s blood pressure, but that measurement by itself is not instructive. Only when it is compared to the commonly used 120/80 mmHg threshold can we determine whether it is high.

Setting the level of the threshold used to classify the condition indicators as either “normal” or “abnormal” is the crux of the detection step. The threshold is typically a high limit set on a condition indicator. In our example, the blood pressure threshold is based on studies of large populations of patients with no known hypertension. For wind turbine condition monitoring, setting the threshold is much more difficult due to the complexity of the systems and the number of different turbine makes and models in the field. Figure 3 shows the un-faulted (green curve) and faulted (red curve) component distributions as before, but with a fault threshold added. In this case the un-faulted and faulted distributions overlap significantly, so misclassifications are inevitable.
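When the two distributions overlap, every threshold trades false alarms against missed detections. The sketch below makes that trade-off explicit for a high-limit threshold, assuming (hypothetically) that both populations are normally distributed with the parameters shown: a false alarm is an un-faulted reading above the limit, and a missed detection is a faulted reading below it.

```python
from statistics import NormalDist
from typing import Tuple


def error_rates(threshold: float, unfaulted: NormalDist, faulted: NormalDist) -> Tuple[float, float]:
    """Return (false-alarm probability, missed-detection probability) for a high-limit threshold."""
    p_false_alarm = 1.0 - unfaulted.cdf(threshold)  # healthy component reads above the limit
    p_missed = faulted.cdf(threshold)               # faulted component reads below the limit
    return p_false_alarm, p_missed


healthy = NormalDist(mu=1.0, sigma=0.2)  # hypothetical un-faulted distribution
damaged = NormalDist(mu=1.5, sigma=0.2)  # hypothetical faulted distribution

for th in (1.1, 1.25, 1.4):
    pfa, pmd = error_rates(th, healthy, damaged)
    print(f"threshold={th:.2f}  P(false alarm)={pfa:.3f}  P(missed)={pmd:.3f}")
```

With equal spreads, the midpoint between the two means (1.25 here) balances the two error rates, which corresponds to the balanced threshold drawn in Figure 3. Moving the limit in either direction lowers one error rate only by raising the other.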

In the figure, the threshold was set to balance missed detections against false alarms. In practice, setting thresholds is even more difficult because there are few (if any) measurements of what a faulted component looks like. In the best case, CMS thresholds are set based on knowing what an un-faulted component looks like (green distribution in the graph) and a predefined probability of false alarm. Because threshold setting is inherently complex and directly affects performance, understanding how a condition monitoring provider sets thresholds is one of the critical questions to ask when selecting a system. Experience shows that systems that use a poor process for setting thresholds are more prone to false alarms that drive unnecessary maintenance.
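The "best case" described above, a threshold derived only from un-faulted data and a chosen false-alarm probability, can be sketched in a few lines. This version assumes the un-faulted condition indicator is normally distributed and places the threshold at the quantile that healthy readings exceed with probability `p_false_alarm`; the baseline values and the helper names are hypothetical.

```python
from statistics import NormalDist
from typing import List, Sequence


def threshold_from_baseline(baseline: Sequence[float], p_false_alarm: float) -> float:
    """High limit that healthy readings exceed with probability p_false_alarm,
    assuming the un-faulted condition indicator is normally distributed."""
    healthy = NormalDist.from_samples(baseline)
    return healthy.inv_cdf(1.0 - p_false_alarm)


def classify(readings: Sequence[float], threshold: float) -> List[str]:
    """Label each reading against the high-limit threshold."""
    return ["abnormal" if r > threshold else "normal" for r in readings]


# Hypothetical condition indicator history from components known to be healthy.
baseline = [0.98, 1.02, 1.00, 0.97, 1.03, 1.01, 0.99, 1.00]
limit = threshold_from_baseline(baseline, p_false_alarm=0.01)
labels = classify([1.01, 1.05, 1.20], limit)
```

Note that no faulted data is needed, which matches the situation the text describes. The price is that the missed-detection rate is unknown, and if the baseline is not actually normal (or is too short to be representative) the real false-alarm rate will differ from the one requested.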

Diagnosis
Once one of the condition indicators has detected a faulted component, the CMS must validate the fault and determine its location and severity. Validation is done by examining the context in which the indication was high: the condition monitoring system can compare the current condition indicator to historic values of the same indicator and to the operating conditions under which each occurred. If this is the first high value and it happened under high transient loading, it may be best to ignore the indication until more evidence is gathered.
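One simple way to encode this kind of contextual validation is a persistence rule. The sketch below is a hypothetical rule, not a method any vendor is known to use: readings taken under high transient loading are set aside, and a fault is confirmed only when the most recent `required` steady-state readings all exceed the threshold. The `Reading` type and the `transient_load` flag are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Reading:
    ci: float             # condition indicator value
    transient_load: bool  # hypothetical flag: True if recorded during high transient loading


def validated_fault(history: Sequence[Reading], threshold: float, required: int = 3) -> bool:
    """Confirm a fault only if the latest `required` steady-state readings all exceed
    the threshold; readings taken during transients are skipped, not counted."""
    streak = 0
    for r in history:
        if r.transient_load:
            continue  # not enough evidence either way; wait for steady-state data
        streak = streak + 1 if r.ci > threshold else 0
    return streak >= required
```

For instance, a single high value recorded during a transient, following normal steady-state readings, is not confirmed, whereas three consecutive high steady-state readings are. The `required` count sets how much evidence is demanded before acting.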

Continuing our example, if a patient had a high blood pressure reading, a physician might be inclined to diagnose hypertension. Suppose, however, that further discussion reveals the patient had a stressful week. In addition, the patient has no family history of hypertension, and their historic blood pressure values were lower. Given this context, a physician would not diagnose hypertension, even though high blood pressure was detected during this single visit. Now imagine that the patient did have a family history of hypertension and their blood pressure had been trending upward for several years. In this context, the diagnosis would be hypertension, and the next step would be to determine its severity.

Just as in the physician example, determining the severity of a turbine fault is a critical part of a CMS diagnosis. The action needed for a component with a small fault and months of remaining useful life is very different from the action needed for a severely faulted component with only hours left.

Prognosis
Once the fault has been validated and its severity is known, the next piece of information needed is an estimate of how much longer the component will last before it needs to be replaced, also known as the remaining useful life. The remaining useful life of the component can be estimated in several ways, but each requires knowledge of two things: the current severity of the fault and an estimate of the future operating conditions of the component. Using our example, once the severity of the patient’s condition is understood, the physician can estimate how it is likely to progress. If the hypertension is currently mild (fault severity) and the patient already lives a healthy lifestyle (future operating conditions), the prognosis may be that the hypertension will have little impact on the patient’s future well-being. If the hypertension is currently mild but the patient lives a sedentary, unhealthy lifestyle, the prognosis may be that, left untreated, the hypertension will lead to heart disease in five years. Both situations started at the same severity level, but the anticipated future conditions led to vastly different prognoses.

A wind turbine CMS uses prognosis slightly differently. Instead of changing future operating conditions to prevent component failure, it uses an estimate of future operating conditions to determine when the component will reach the end of its useful life. Figure 4 shows a graphical example of the projection of future component health. The remaining useful life of the component is the time between now and the point at which the estimated trajectory of the component’s future health (blue dotted line) crosses a predefined threshold (red line).
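The projection in Figure 4 can be approximated with the simplest possible model: a straight-line fit to the health history, extended until it crosses the failure threshold. The sketch below does exactly that with an ordinary least-squares fit, and represents the anticipated future operating conditions with a hypothetical `load_factor` that scales the degradation rate. Real prognostic models are usually nonlinear; the linear trend and the load scaling are illustrative assumptions only.

```python
from typing import Optional, Sequence


def remaining_useful_life(times: Sequence[float], health: Sequence[float],
                          failure_threshold: float, load_factor: float = 1.0) -> Optional[float]:
    """Fit a least-squares line to the health history (needs at least two distinct times),
    scale its slope by an assumed future load factor, and return the time from the
    last sample until the projection reaches failure_threshold.
    Returns None if the trend is flat or improving."""
    n = len(times)
    t_mean = sum(times) / n
    h_mean = sum(health) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (h - h_mean) for t, h in zip(times, health))
    slope = sxy / sxx
    intercept = h_mean - slope * t_mean
    rate = slope * load_factor            # assumption: heavier future loading degrades faster
    if rate <= 0:
        return None                       # no degradation trend, so no crossing is projected
    h_now = intercept + slope * times[-1]  # fitted health at the most recent sample
    return max(0.0, (failure_threshold - h_now) / rate)
```

With a hypothetical history of `[1.0, 1.2, 1.4, 1.6]` at days `[0, 10, 20, 30]` and a threshold of 2.6, the projection crosses 50 days out; doubling `load_factor` halves that to 25 days. As in the physician example, the same current severity yields different prognoses under different assumed future conditions.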

Recommendation
Once the condition monitoring system has an estimate of fault severity and the remaining useful life of the component, it can determine the necessary maintenance action and when it should be performed. The recommendation step is really an aggregation step: information from the diagnosis and prognosis steps is combined into a clear recommendation of what to do next. In our example, if the patient is diagnosed with mild hypertension and the prognosis is that there will be no impact on their overall health, the recommendation would be to maintain their current lifestyle. If the patient receives the same diagnosis but the prognosis is that the mild hypertension will lead to heart disease in five years, the recommendation would be to exercise more and change to a healthier diet.

For wind turbines, the recommendation will take the form of a required maintenance action. If a bearing is faulted, the recommendation could be to verify the fault through visual inspection within the next month and schedule a replacement of the bearing within three months. This recommendation would allow an operator to plan maintenance outages ahead of time, reducing downtime and lost revenue.
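The aggregation step can be pictured as a small rule that combines the diagnosis (is the fault validated?) with the prognosis (how many days of life remain?). The sketch below is hypothetical throughout: the 30-day inspection window and the 0.5 safety margin on remaining life are made-up planning parameters, chosen so that a bearing with 180 days of projected life reproduces the text's example of inspecting within a month and replacing within three months.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recommendation:
    action: str
    inspect_within_days: Optional[int]
    replace_within_days: Optional[int]


def recommend_maintenance(component: str, fault_validated: bool, rul_days: Optional[float],
                          inspection_days: int = 30, margin: float = 0.5) -> Recommendation:
    """Combine diagnosis and prognosis into one action. inspection_days and margin
    are hypothetical planning parameters, not industry values."""
    if not fault_validated or rul_days is None:
        return Recommendation(f"continue monitoring {component}", None, None)
    replace_by = int(rul_days * margin)            # replace well before the projected end of life
    inspect_by = min(inspection_days, replace_by)  # confirm the fault before the replacement slot
    return Recommendation(f"inspect and schedule replacement of {component}", inspect_by, replace_by)
```

For example, `recommend_maintenance("bearing", True, 180)` returns a 30-day inspection and a 90-day replacement window, while a projected life of only 20 days compresses both to 10 days. The value for the operator lies in having dates to plan outages around rather than raw indicator values.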

Closing Thoughts
The framework of the condition monitoring process presented here should provide a guideline for prospective buyers considering a condition monitoring system purchase. Many available systems do not cover the entire condition monitoring process, which may require an operator to interpret a significant amount of data. Be sure to ask vendors which parts of the process their systems cover and whether additional services are required to get to a recommendation. In the end, a condition monitoring system is only as good as its ability to provide operators with information that can be used to drive better operations and maintenance decisions.