Demo C.1: Using Vibration Data to Predict the RUL of a Bearing

Asset: Bearing on a wind turbine
Dataset: Timestamp, Vibration (g), RPM
Dataset explanation: Data collected periodically from the normal condition until the identified failed state
Objective: Build a model to predict the Remaining Useful Life (RUL) of the bearing

1. Data Visualization

The first step in the analysis is to visualize the data. Visualizing the data helps us to identify if there are any discernible trends or if the data needs cleaning (or processing) due to missing values or outliers. It also shows if the dataset is of the right quality to build a machine-learning model. Datasets with lots of noise and a mix of operating conditions without proper labels are not ideal for prediction analysis.

In this problem, the dataset has a timestamp, vibration, and tachometer reading.

Data was collected every day from Day 1 (when the bearing was in normal condition) up until Day 50 (when the bearing failed due to an inner race fault). The vibration data is plotted on the Y-axis against time on the X-axis.

Each color band represents the data collected on one day. Each daily recording lasts 6 seconds, so the dataset contains 50 × 6 = 300 seconds of data in total. From the plot, it is clear that as Day 50 approaches the vibration pattern changes: the acceleration (g) increases. Is this change alone enough to label the bearing as failed? No; we need more certainty before declaring a failure. And how about predicting the failure before it happens? Setting a threshold at 20 g to declare failure is too late, while a threshold of 10 g is too soon, because the data breaches it multiple times during normal running.
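As an illustration, a plot of this kind can be produced with a short script along the following lines. It is only a sketch: the DataFrame layout and the column names (`Day`, `Time`, `Vibration`) are assumptions about how the raw dataset is organized, not part of the original demo.

```python
# Sketch: plot the raw vibration signal day by day on one time axis.
# Assumes a pandas DataFrame `df` with columns 'Day' (1..50), 'Time'
# (seconds within each 6-second recording) and 'Vibration' (g).
import matplotlib.pyplot as plt
import pandas as pd

def plot_vibration(df: pd.DataFrame) -> None:
    fig, ax = plt.subplots(figsize=(10, 4))
    days = sorted(df["Day"].unique())
    for day in days:
        daily = df[df["Day"] == day]
        # Offset each daily recording so the 50 days line up end to end.
        t = (day - 1) * 6 + daily["Time"]
        ax.plot(t, daily["Vibration"],
                color=plt.cm.viridis(day / max(days)), linewidth=0.5)
    ax.set_xlabel("Time (s, concatenated daily recordings)")
    ax.set_ylabel("Vibration (g)")
    ax.set_title("Daily vibration recordings, Day 1 to Day 50")
    plt.show()
```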

We need a way to describe the dataset in a way that shows a clear, predictable trend toward failure.

2. Feature Generation

Features are quantities derived from the dataset that show the data pattern in a different projection. The given dataset contains vibration data in the time domain; the same data can also be represented in the frequency domain. Each domain can then be processed further to extract statistical features such as mean, standard deviation, skewness, RMS, crest factor, and shape factor. The frequency-domain representation can also be used to extract spectral features such as spectral kurtosis and its mean and standard deviation.
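A sketch of how such time-domain statistical features might be computed is shown below. It assumes one day's vibration recording is available as a NumPy array; the feature definitions follow common conventions and should be checked against your own.

```python
# Sketch: compute time-domain statistical features for one day's recording.
import numpy as np
from scipy.stats import skew, kurtosis

def time_domain_features(signal: np.ndarray) -> dict:
    rms = np.sqrt(np.mean(signal ** 2))
    peak = np.max(np.abs(signal))
    return {
        "mean": np.mean(signal),
        "std": np.std(signal),
        "skewness": skew(signal),
        "kurtosis": kurtosis(signal),
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms,                     # peak relative to RMS
        "shape_factor": rms / np.mean(np.abs(signal)),  # RMS relative to mean |x|
        "impulse_factor": peak / np.mean(np.abs(signal)),
    }
```

Applying this function to each day's recording produces one row of features per day, which is the feature table used in the following steps.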

The dataset of timestamp and vibration is thus converted into a dataset of timestamp and features, as the table below shows:

Many features can be extracted from a given dataset, but not all of them are useful for predicting asset failure. Candidate features are evaluated using metrics such as monotonicity, prognosability, and trendability.

3. Data Partition

Before we select any feature and use it to make predictions, let’s partition the given dataset into training data and testing data. The model is built from the training portion and its performance is tested on the testing portion. There are different methods to partition datasets; for our problem, we take the first 20 of the 50 days as training data and the remaining 30 days as testing data.
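In code, such a day-based split could look like the snippet below, assuming the per-day feature table from the previous step is a pandas DataFrame named `features` with a `Day` column (both names are assumptions for illustration).

```python
# Sketch: day-based partition of the per-day feature table.
train = features[features["Day"] <= 20]   # first 20 days: training data
test = features[features["Day"] > 20]     # remaining 30 days: testing data
```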

4. Feature Selection

Using the monotonicity metric to evaluate the list of generated features in the dataset:

The histogram above shows the monotonicity value for each of the generated features. The higher the monotonicity score, the better suited the feature is for making a prediction. Looking at the feature rank table, the features cluster at different score levels. For this analysis, all features with a monotonicity value above a chosen threshold are kept: the top 5 features are retained and the rest are ignored. It may be tempting to choose only one feature or all of them, but choosing one makes the indicator sensitive to noise in that single feature, while choosing all of them dilutes the trend with uninformative features. A sketch of the ranking step follows.
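The snippet below sketches one way to score and rank the features. It uses one common definition of monotonicity (the normalized difference between the number of positive and negative day-to-day changes); other definitions, such as rank-correlation based ones, exist, so this may differ from the exact metric behind the histogram above. `train` is the training-period feature table from the previous step.

```python
# Sketch: rank features by a simple monotonicity score and keep the top 5.
import numpy as np
import pandas as pd

def monotonicity(series: pd.Series) -> float:
    diffs = np.sign(np.diff(series.to_numpy()))
    n = len(diffs)
    return abs(np.sum(diffs == 1) - np.sum(diffs == -1)) / n if n else 0.0

feature_cols = [c for c in train.columns if c != "Day"]
scores = pd.Series({c: monotonicity(train[c]) for c in feature_cols})
top_features = scores.sort_values(ascending=False).head(5).index.tolist()
print(scores.sort_values(ascending=False))
```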

5. Health Indicator using PCA

The five chosen features are then fused into a lower-dimensional representation that is easier to monitor. The method used here is Principal Component Analysis (PCA). The result is a plot of Combined Feature A vs. Combined Feature B: the dataset has been compressed from 5 features down to 2.

Exploring the PCA plot further shows that Combined Feature A increases in value as the machine approaches failure, which is supported by the trend of the color map attached to the plot. Combined Feature A therefore becomes the single feature used to predict the failure of this bearing. Because its trend is predictable, we can use it to establish an alarm threshold and to model the underlying degradation.
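The fusion step could be sketched as follows, reusing `features`, `train`, and `top_features` from the previous steps. The first principal component is kept as Combined Feature A; fitting the PCA on the training period only is an assumption about the workflow, not something stated in the original demo.

```python
# Sketch: fuse the top-5 features with PCA and keep the first two components.
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(train[top_features])
pca = PCA(n_components=2).fit(scaler.transform(train[top_features]))

components = pca.transform(scaler.transform(features[top_features]))
features["CombinedFeatureA"] = components[:, 0]   # health indicator
features["CombinedFeatureB"] = components[:, 1]
```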

6. Building Model

Building a model involves fitting the pattern of the chosen feature (Combined Feature A) to a mathematical equation. Fitting data is a heuristic process, and asset domain knowledge helps narrow down the model type. Since the asset under study is a bearing, it accumulates damage progressively once wear starts, so an exponential degradation model is fitted to Combined Feature A.

The model is built on the training data and extrapolated to predict the future. The model carries confidence bounds to quantify the uncertainty in its prediction. At the beginning (days 1 to 10), little data is available, so the prediction has wide confidence bounds. As more data comes in, the fit improves and the confidence bounds shrink.
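As a rough illustration of this step, the sketch below performs a plain least-squares fit of an exponential curve to the health indicator and derives approximate confidence bounds by sampling the parameter covariance. It is a stand-in for a dedicated exponential degradation model (such as the one in the Predictive Maintenance Toolbox), not the exact procedure used in the demo; the initial guesses and the extrapolation horizon are assumptions.

```python
# Sketch: fit an exponential degradation curve to the health indicator and
# extrapolate it with rough 95% confidence bounds.
import numpy as np
from scipy.optimize import curve_fit

def exp_model(t, phi, theta, beta):
    # health indicator = offset + amplitude * exp(growth_rate * time)
    return phi + theta * np.exp(beta * t)

train_hi = features.loc[features["Day"] <= 20, ["Day", "CombinedFeatureA"]]
days = train_hi["Day"].to_numpy(dtype=float)
hi = train_hi["CombinedFeatureA"].to_numpy(dtype=float)

params, cov = curve_fit(exp_model, days, hi,
                        p0=[hi[0], 0.01, 0.1], maxfev=10000)

t_future = np.arange(1, 61)                  # extrapolate past the observed days
prediction = exp_model(t_future, *params)

# Approximate bounds by sampling the fitted parameter distribution.
samples = np.random.multivariate_normal(params, cov, size=500)
curves = np.array([exp_model(t_future, *p) for p in samples])
lower, upper = np.percentile(curves, [2.5, 97.5], axis=0)
```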

7. Reliability Engineering

With this model fitted to the dataset, we can establish thresholds that raise an alarm for the reliability engineers or maintenance crew to respond to. Here, the alarm limit is set at a value of 20. Asset management professionals will recognize this limit as the potential failure point on the P-F curve: crossing it means an impending failure has been detected.

By extension, the time between the potential failure point and the point of failure is the time available for the maintenance crew to repair or replace this asset. The potential failure point is crossed at day 38 and the functional failure point is at day 50, so 50 − 38 = 12 days is the interval available for the maintenance team to perform the work.
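The alarm-day calculation itself is simple. A sketch, reusing the extrapolated prediction from the previous step:

```python
# Sketch: find the day the extrapolated health indicator crosses the alarm
# limit, and the interval remaining before functional failure.
ALARM_LIMIT = 20            # potential failure point on the P-F curve
FAILURE_DAY = 50            # functional failure observed in the dataset

crossing = t_future[prediction >= ALARM_LIMIT]
if crossing.size:
    alarm_day = int(crossing[0])
    print(f"Alarm raised on day {alarm_day}; "
          f"{FAILURE_DAY - alarm_day} days available for maintenance")
else:
    print("Alarm limit not reached within the extrapolation horizon")
```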

If the maintenance team needs more time (to source parts, reach the site, or absorb administrative delays), the way to manage this asset is to move the alarm limit to a lower value (say 15 instead of 20), which raises the alarm earlier.

The reliability program for this asset needs to take this interval into account and plan the adjacent processes, such as stocking spare parts, ensuring personnel availability, and production planning.

References:

  1. MathWorks: Predictive Maintenance Toolbox documentation
  2. Dataset from GitHub, used under its license