Getting Started with Real-Time Anomaly Detection in Manufacturing

The era of Industry 4.0 has revolutionized how data is collected and analyzed. Industrial machines are increasingly equipped with IoT-enabled software that collects sensor and production process data in real time.
This data collection, combined with automated analysis, can be used in condition monitoring systems to keep track of the health and performance of the machinery. Condition monitoring also provides the foresight needed to schedule maintenance work ahead of failures, preventing unplanned downtime while operating the system at optimal efficiency [2].
This leads to the idea of predictive maintenance, a strategy that uses the insights gained from condition monitoring to predict when a component in a system might fail. Machine learning models can be trained to predict failures well before they occur. This allows for optimal maintenance scheduling to prevent breakdowns and thereby boost productivity [3].

The deterioration of equipment condition over time, up to the point of failure, is clearly shown in
a so-called potential failure diagram (P-F diagram). The critical time it takes an asset to go from the initial detection of a potential failure (P) to a functional failure (F) is denoted as the P-F interval. Here is an illustration from IFM highlighting the importance of vibration data as an early indicator of a potential equipment breakdown.

P-F Curve Showing Run To Failure

Why Predictive Maintenance is Problematic

The difficulty with predictive maintenance is that its effectiveness is heavily dependent on the availability and quality of data.

Data collected from machines operated up to the point of failure is often missing, because many failures would result in costly downtime and are therefore prevented at all costs. Without such run-to-failure data, it can be challenging to accurately predict when a machine might fail.

If the data is noisy, incomplete, or inaccurate, it can lead to erroneous predictions. There might be gaps in the data collection infrastructure, such as missing/ill-configured sensors or connectivity issues.

Anomaly Detection to the Rescue

Anomaly detection can play a significant role in mitigating the challenges associated with predictive maintenance. It helps to identify unusual patterns or behaviors in the operational data, which might indicate a potential failure. This is particularly useful where there is a lack of run-to-failure data.

Robust anomaly detection algorithms can often work with noisy or incomplete data and still manage to detect anomalies. This can be particularly useful in scenarios where data quality or availability is a concern.

But the first step in applying anomaly detection is understanding the nature of the data itself.

Time Series Data

Time series data arises whenever data is sequentially collected over a period of time.
The most familiar example of time series data for many of us is the stock price chart, where each data point represents the stock price at a given time. But an ECG, i.e. the electrical activity of your heart, can also be interpreted as a time series.
Time hereby serves as a well-ordered, monotonically increasing index.

In a manufacturing process, data is collected from various sources such as sensors or controllers over time, with each value “stamped” with its time of origin, forming multiple time series.
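To make this concrete, here is a minimal sketch (using pandas, with hypothetical sensor names and values) of how timestamped readings form a time series:

```python
import pandas as pd

# Hypothetical sensor readings, each value "stamped" with its time of origin.
readings = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2024-01-01 08:00", "2024-01-01 08:01", "2024-01-01 08:02"]
        ),
        "sensor_id": ["temp_mold", "temp_mold", "temp_mold"],
        "value": [212.4, 213.1, 212.8],
    }
)

# Indexing by timestamp turns the readings into a time series;
# grouping by sensor_id would yield one series per sensor.
series = readings.set_index("timestamp")["value"]
print(series.index.is_monotonic_increasing)  # True: time is well ordered
```

In practice the same structure arises per machine and per signal, so a single plant quickly produces many parallel time series.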

Time Series Analysis

Time series data is unique in the sense that there is often a high degree of correlation between the data points. This correlation can be exploited to characterize the data more meaningfully. Think of a typical manufacturing process, where we measure the temperature of the molding material in a highly efficient injection molding machine. It is very likely that the temperature always follows the same recurring pattern.
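This recurring pattern can be made visible by computing the autocorrelation of the signal. The sketch below uses a simulated molding temperature; the 60-sample cycle length and all values are illustrative assumptions:

```python
import numpy as np

# Simulated temperature of an injection molding process: a recurring
# cycle (60 samples long, repeated 20 times) plus measurement noise.
rng = np.random.default_rng(0)
cycle = 200 + 15 * np.sin(np.linspace(0, 2 * np.pi, 60))
temperature = np.tile(cycle, 20) + rng.normal(0, 0.5, 60 * 20)

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# Neighboring points are strongly correlated, and the correlation
# peaks again at the cycle length -- the recurring pattern.
print(autocorr(temperature, 1))   # close to 1: neighbors are similar
print(autocorr(temperature, 60))  # high again: one full cycle apart
print(autocorr(temperature, 30))  # negative: half a cycle apart
```

High autocorrelation at the cycle length is exactly the structure an anomaly detector can exploit: a point that breaks the expected correlation stands out.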

The structure of time series data can be characterized by several key components, such as:

  • Seasonality or cycles: This refers to recurring patterns or cycles over time.
  • Trends: This component represents the overall direction in which the data is moving. In contrast to seasonality, the trend describes the long-term progression over time.

These components can also be thought of as forming a regime that characterizes the stochastic behavior of the time series. However, it is important to note that within the same time series, these regimes can in turn change over time at certain change points. E.g. a previously flat trend might suddenly start increasing after the occurrence of an anomalous event.
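As a rough sketch of how these components can be separated, the snippet below decomposes a synthetic sensor signal into trend, seasonal part, and residual using a centered moving average; the period, magnitudes, and variable names are illustrative assumptions:

```python
import numpy as np

# Synthetic signal: a slow upward trend plus a daily cycle plus noise.
rng = np.random.default_rng(1)
n, period = 7 * 24, 24                         # one week of hourly samples
t = np.arange(n)
trend = 0.05 * t                               # long-term progression
seasonal = 3 * np.sin(2 * np.pi * t / period)  # recurring daily cycle
signal = trend + seasonal + rng.normal(0, 0.3, n)

# 1) Estimate the trend with a centered moving average over one period.
kernel = np.ones(period) / period
est_trend = np.convolve(signal, kernel, mode="same")

# 2) Average the detrended values per phase to estimate the seasonal part.
detrended = signal - est_trend
est_seasonal = np.array(
    [detrended[phase::period].mean() for phase in range(period)]
)

# 3) What remains is the residual -- the part anomaly detection inspects.
residual = detrended - np.tile(est_seasonal, n // period)
```

Once trend and seasonality are removed, a "surprising" value shows up as a large residual rather than being masked by the normal daily swing.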

Anomaly Detection in Time Series Data

Anomaly detection in time series data aims to identify data points that deviate from an expected pattern over time.

These anomalies can in turn be divided into different types:

  • Point Anomalies or Outliers are individual data points that deviate significantly from the rest of the data, like a spike in temperature in an otherwise controlled environment.
  • Collective Anomalies involve data points that collectively deviate from the expected behavior. In terms of time series analysis, they form a change in the stochastic regime over time, e.g. a change in trend.
  • Contextual Anomalies are anomalies that occur only within a specific context. Here we look at the correlation of multiple time series. For instance, a machine might legitimately operate at a higher temperature when processing a certain product, while the same temperature when processing another product would be considered a contextual anomaly.

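As an illustration of detecting a point anomaly like the temperature spike above, here is a simple rolling z-score detector. The window size, threshold, and data are illustrative assumptions, not a recommendation:

```python
import numpy as np

# A controlled temperature signal with one injected spike -- a point anomaly.
rng = np.random.default_rng(2)
temperature = rng.normal(80.0, 0.5, 500)
temperature[250] = 95.0  # the spike

def rolling_zscore_anomalies(x, window=50, threshold=4.0):
    """Flag points that deviate strongly from a trailing reference window."""
    anomalies = []
    for i in range(window, len(x)):
        ref = x[i - window:i]
        z = (x[i] - ref.mean()) / ref.std()
        if abs(z) > threshold:
            anomalies.append(i)
    return anomalies

# The spike at index 250 is flagged.
print(rolling_zscore_anomalies(temperature))
```

Collective and contextual anomalies need more machinery (e.g. change-point detection, or models over multiple correlated series), but the principle is the same: score deviation from expected behavior.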
When we have the knowledge to decompose the data and structure a specific problem, we can select the appropriate solution approach.

Before applying any sophisticated machine learning model, particularly when dealing with simple, single-dimensional problems, consider traditional statistical methods first, such as fixed thresholds, moving averages, or control charts.

Several machine learning models have proven to be powerful tools for detecting anomalies, especially in complex, multi-dimensional problems. However, to determine their suitability in the context of real-time IoT applications specifically, the following characteristics are crucial:

  • Robustness in the sense that the model should provide reliable results even on incomplete or noisy data.
  • Computational efficiency, i.e. the algorithm’s processing speed and resource usage, which is crucial for taking decisions in real time.
  • Simplicity is important to ensure ease of implementation and understanding of the algorithm, which adds transparency and trust to the resulting decisions.
  • A preference for supervised learning methods. Supervised algorithms require labeled training data to learn a mapping from inputs to outputs. This provides enough flexibility to incorporate as much domain knowledge as possible.
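To illustrate the robustness and simplicity criteria, here is a minimal sketch of a median/MAD-based detector: unlike mean and standard deviation, these statistics are barely affected by outliers already present in the reference data. All values are hypothetical:

```python
import numpy as np

# Reference readings that already contain a corrupted value (55.0).
reference = np.array([10.1, 9.9, 10.0, 10.2, 9.8, 55.0, 10.1, 9.9])

median = np.median(reference)
mad = np.median(np.abs(reference - median))  # median absolute deviation
robust_scale = 1.4826 * mad                  # ~= std for normal data

def is_anomalous(value, threshold=3.5):
    """Modified z-score test with the commonly used 3.5 cutoff."""
    return abs(value - median) / robust_scale > threshold

print(is_anomalous(10.3))  # False: a normal reading
print(is_anomalous(20.0))  # True: flagged despite the corrupted reference
```

For contrast: a plain z-score on the same data (mean ≈ 15.6, std ≈ 14.9, both dragged by the 55.0 outlier) would score the 20.0 reading at only about 0.3 and let it slip through.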

Several popular methods meet these criteria.

Equally important is the choice of an underlying platform able to handle large-scale, real-time data analysis, such as Azure Data Explorer.
Azure Data Explorer is a fully managed service that combines high-volume data ingestion with advanced data analytics capabilities.

Conclusion

Putting it all together, we now have a starting point for building our next generation manufacturing IoT application, one that provides more insights in a more timely manner.

In summary, the following building blocks have been discussed:

  • Consider data collected by a condition monitoring system as time series data
  • Decompose time series data into essential components to create a well-defined problem formulation
  • Select a robust, efficient machine learning approach to detect anomalous data points or patterns
  • Integrate the solution into a highly scalable, data-driven compute environment to process high volumes of data in (near) real-time