Predictive Maintenance has long been the holy grail of the IoT. However, experience has also shown that successfully implementing predictive maintenance for industrial use cases is harder than one might think, from finding a sustainable business model to actually delivering the technical implementation. This case study provides an account of a successful predictive maintenance implementation for hydraulic systems from the perspective of Bosch Rexroth, a leading supplier in this field.
A hydraulic system uses pressurized fluids (usually mineral oil) to drive actuators to produce linear or rotational movements. Example use cases include hydraulic excavators, hydraulic presses, mining conveyor belts, shredders, hydraulic lifts, etc. Hydraulic components include cylinders and motors (to produce linear or rotational movements) and hydraulic power units, which supply pressurized fluid to the actuators and consist of pumps, coolers, a tank, etc. Valves are used to control the fluid flow and pressure. Hydraulic oil is not only used for power transmission but also serves as a lubricant and cooling fluid.
Benefits of hydraulic systems include:
- Simple generation of high forces (> 1 × 10⁶ N) and torques (> 1 × 10⁶ Nm)
- High power density
- Accurate control of high forces
- Simple, cheap and fast overload protection (Pressure Control Valve)
Hydraulic equipment vendors such as Bosch Rexroth supply machine builders with hydraulic components and systems either directly or via sales partners. Machine builders utilize hydraulic equipment to build industrial machinery, e.g., a hydraulic press, a plastic injection molding machine, or a conveyor belt for heavy loads. This machinery is operated by different types of operators. The hydraulic equipment vendor would usually also offer these operators different services, including spare parts, field service, and repairs. Without predictive maintenance, these services would naturally be reactive, i.e., only triggered after a problem with the hydraulic equipment in the field. This can lead to significant production outages. For example, the outage of a hydraulic component powering a conveyor belt at a mining site could lead to a shutdown of the entire mining operation.
Typical Problem Scenarios
What are the typical issues with hydraulic systems and components? A main cause of wear and breakdowns is hydraulic oil contaminated with particles or water. Contamination causes wear, which in turn leads to reduced efficiency (e.g., increased volumetric losses in pumps, external leakage in cylinders due to worn seals) or malfunction (e.g., a blocked valve spool), and ultimately to a higher probability of breakdown. Breakdowns can be expensive: while the exact costs are usually use case specific, target customers for Predictive Maintenance typically have downtime costs exceeding €10,000 per hour.
Often, downtime is reduced by built-in redundancy. However, this cannot fully guarantee availability: cylinders are typically not redundant, and a pump breakdown can contaminate the hydraulic fluid and cause further damage, such as valve malfunction due to particles. Cleaning the hydraulic fluid after a breakdown and replacing all damaged components can be very time consuming (this may take weeks). Large, expensive machines often do not have a replacement machine to continue production after a sudden breakdown. With no advance warning, diagnosing the causes and deciding on the necessary maintenance measures can take a long time. Existing fail-safes in the machine are built to shut down the machine after a catastrophic failure and protect the operators and the environment but rarely contain advance warning features.
Predictive Maintenance: issues and solutions
A key problem for building a predictive maintenance solution for hydraulic components is the complex, individual machine behavior:
- Many different products are produced with the same machine
- Hydraulics are usually only a small part of the whole machine
- Upgrades/changes to the machine after years of operation, e.g., new cooler
- Environmental effects: e.g., temperature, vibration
- Individual changes applied by machine operator
The result is that in most situations, there are initially insufficient data for building an end-to-end AI solution. This is why the Rexroth team has taken an approach where AI-based anomaly detection is used to find interesting data patterns. This is combined with human experts to diagnose the anomaly and subsequently make customer-specific maintenance recommendations.
What can be measured, and what can be learned from it?
A key question for building a predictive maintenance solution is: what can be measured, and what can be learned from it to detect wear at an early stage? In hydraulics, wear is a key issue. However, wear is very difficult to measure directly in practice. Wear processes and component/system functions must be understood in detail in order to determine the correct sensors for data collection. Indirect indication of wear is typically achieved using multiple sensors. Additionally, sensors for measuring the operating point of the components are required since many values, such as leakage and vibration, are operating point dependent. Commercially available sensors are used to reduce costs.
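The operating-point dependence mentioned above can be illustrated with a minimal sketch. It assumes leakage is compared against a baseline taken at a similar pressure rather than against one global value. The function names, the 50 bar bin width and the sample values are hypothetical and are not taken from the Rexroth solution.

```python
import math
from collections import defaultdict
from statistics import mean


def bin_index(pressure_bar, width_bar=50.0):
    """Index of the pressure bin (the operating-point class) a reading falls into."""
    return math.floor(pressure_bar / width_bar)


def build_baseline(samples, width_bar=50.0):
    """Mean leakage per pressure bin from (pressure_bar, leakage_l_min) reference samples."""
    by_bin = defaultdict(list)
    for pressure, leakage in samples:
        by_bin[bin_index(pressure, width_bar)].append(leakage)
    return {index: mean(values) for index, values in by_bin.items()}


def relative_leakage(pressure_bar, leakage_l_min, baseline, width_bar=50.0):
    """Leakage divided by the baseline of the same bin; None if the bin was never observed."""
    reference = baseline.get(bin_index(pressure_bar, width_bar))
    if not reference:
        return None
    return leakage_l_min / reference


# Hypothetical reference data recorded while the pump was known to be healthy.
reference_samples = [(100, 1.0), (110, 1.2), (250, 2.9), (260, 3.1)]
baseline = build_baseline(reference_samples)

# The same absolute leakage is normal at high pressure but elevated at low pressure.
print(relative_leakage(255, 3.0, baseline))  # close to 1.0
print(relative_leakage(105, 3.0, baseline))  # well above 1.0
```

The point of the sketch is only that a raw leakage value is ambiguous until it is related to the operating point at which it was measured.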
A good example is external leakage on pumps and motors: flow meters measure the leakage flow, and the operating point is captured via pressure, speed, displacement and temperature. Another example is cavitation on pumps, which occurs when the pressure on the suction side of the pump falls below the vapour pressure (due to contamination, excess speed, dissolved air in the hydraulic fluid, etc.). Oil vapour bubbles implode during the transition to the high-pressure side and cause wear when this happens close to metal parts (e.g., the distributor plate). Structure-borne sound measurements with accelerometers are used to detect changes in the frequency spectrum of the structure-borne sound. The operating point (pressure, speed and displacement) also has to be included.
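To make the idea of detecting changes in the frequency spectrum more concrete, the following sketch computes the energy in a few frequency bands of a vibration signal. It is an illustration under stated assumptions, not the feature extraction used in ODiN: the band limits, the sample rate and the synthetic signals are hypothetical, and a plain discrete Fourier transform is used to keep the example dependency-free.

```python
import cmath
import math


def magnitude_spectrum(signal, sample_rate_hz):
    """One-sided magnitude spectrum via a plain O(n^2) discrete Fourier transform."""
    n = len(signal)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
        freqs.append(k * sample_rate_hz / n)
        mags.append(abs(acc) / n)
    return freqs, mags


def band_energy(freqs, mags, low_hz, high_hz):
    """Sum of squared magnitudes for all bins with low_hz <= f < high_hz."""
    return sum(m * m for f, m in zip(freqs, mags) if low_hz <= f < high_hz)


def band_features(signal, sample_rate_hz, bands):
    """Map each (low_hz, high_hz) band to its energy."""
    freqs, mags = magnitude_spectrum(signal, sample_rate_hz)
    return {band: band_energy(freqs, mags, *band) for band in bands}


SAMPLE_RATE_HZ = 1000
BANDS = [(0, 100), (100, 500)]  # hypothetical low and high band

# Synthetic signals: the "suspect" one has an extra 300 Hz component.
healthy = [math.sin(2 * math.pi * 50 * i / SAMPLE_RATE_HZ) for i in range(200)]
suspect = [h + 0.5 * math.sin(2 * math.pi * 300 * i / SAMPLE_RATE_HZ)
           for i, h in enumerate(healthy)]

healthy_features = band_features(healthy, SAMPLE_RATE_HZ, BANDS)
suspect_features = band_features(suspect, SAMPLE_RATE_HZ, BANDS)
print(healthy_features[(100, 500)], suspect_features[(100, 500)])
```

In practice such band energies would be evaluated together with the operating point, since the spectrum itself shifts with pressure, speed and displacement.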
Why not simple rules-based analysis?
The next question is how to analyse this. Does it have to be AI, or could a simpler, rule-based or analytical model be applied? The problem with these approaches is complexity. While the hydraulic components are standardized, this does not apply to the machines built using them. Consequently, this would require new rules for each machine individually or models to be created and model parameters to be tweaked for each application, meaning a very high individual effort per customer. Furthermore, machine operation (e.g., dynamic operating points, variable environmental effects, changes in production, retrofits and modifications to the machine, etc.) would make the rules very complex and error prone, resulting in false alarms.
Why ML-based anomaly detection, but not prescriptive analytics/automated recommendations?
Because of the high complexity and missing labeled failure data of the individual customer environments, it has proven not to be feasible to apply an end-to-end AI approach, e.g., using deep learning with nonanalytic feature extraction using CNNs (Convolutional Neural Networks).
Consequently, the solution chosen by the Rexroth team is based on "classical" ML, combining feature extraction (using domain-specific methods) with unsupervised learning. Complex dependencies between features are captured by the ML model. The result is working anomaly detection, although a detected anomaly can have many possible causes.
This means that in addition to automated anomaly detection, a human expert is required for failure diagnosis and maintenance recommendations, because every application is different. It is also possible that changes were made to the machine that cannot be measured with sensors (e.g., a new cooling water supply), or that machine operators have changed the settings or are producing different products on the same machine.
The resulting approach is a two-step analysis process:
- (1) Machine Learning-based anomaly detection: Classic domain knowledge-based feature extraction + Machine Learning. Algorithm scans the data for interesting patterns. Output metrics, e.g., system behavior, are calculated and visualized on a GUI for human experts. Dashboards provide a quick overview of machine behavior.
- (2) Human experts diagnose suspicious data patterns based on general domain experience and application/customer-specific know-how. Sometimes it is necessary to ask the customer for further details (e.g., if mechanical modifications have been made to the machine or settings/parameters have been changed). This manual work is necessary as not everything can be captured in the data.
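The two-step process can be sketched as follows. The example learns a reference from unlabeled feature rows and places conspicuous rows in a review queue for a human expert. A simple per-feature z-score stands in for the unsupervised method, which the source does not specify; the feature names, values and threshold are hypothetical.

```python
from statistics import mean, pstdev


def fit_reference(feature_rows):
    """Learn per-feature mean and spread from unlabeled reference rows (list of dicts)."""
    model = {}
    for name in feature_rows[0]:
        values = [row[name] for row in feature_rows]
        # Fall back to 1.0 if a feature never varied, to avoid division by zero.
        model[name] = (mean(values), pstdev(values) or 1.0)
    return model


def anomaly_score(row, model):
    """Largest absolute z-score across features, plus the feature that produced it."""
    scores = {name: abs(row[name] - mu) / sigma for name, (mu, sigma) in model.items()}
    worst = max(scores, key=scores.get)
    return scores[worst], worst


def review_queue(rows, model, threshold=4.0):
    """Step 1 output: rows that a human expert should diagnose in step 2."""
    queue = []
    for index, row in enumerate(rows):
        score, feature = anomaly_score(row, model)
        if score > threshold:
            queue.append({"index": index, "score": round(score, 1), "feature": feature})
    return queue


# Hypothetical extracted features from a period of normal operation.
reference_rows = [
    {"leakage_ratio": 1.00, "hf_energy": 0.010},
    {"leakage_ratio": 1.02, "hf_energy": 0.012},
    {"leakage_ratio": 0.98, "hf_energy": 0.011},
    {"leakage_ratio": 1.01, "hf_energy": 0.009},
]
new_rows = [
    {"leakage_ratio": 1.01, "hf_energy": 0.011},
    {"leakage_ratio": 1.03, "hf_energy": 0.060},
]

model = fit_reference(reference_rows)
queue = review_queue(new_rows, model)
print(queue)
```

The queue deliberately contains no diagnosis. It only tells the expert where to look and which feature deviated, which mirrors the division of labor described above.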
The ODiN Solution Offering
Based on the capabilities but also the limitations of the ML-based approach, the Bosch Rexroth team decided to build the ODiN solution, which is a predictive maintenance service consisting of:
- Application of a specific sensor package to be retrofitted into the customer machine
- Data acquisition unit and IoT gateway for cloud connectivity
- AI pipeline in the cloud
- Personal service support in case of anomalies and quarterly status reports
- Optional additional services, e.g., spare parts management, field service, repairs
The maintenance contract is signed with the machine operator. Maintenance can be carried out by Rexroth, a Rexroth service partner, the customer or a maintenance contractor. Maintenance contract templates are country unit specific and may be customer specific. The contract always contains an appendix detailing data use.
The offered solution is a one stop shop for predictive maintenance covering everything from application-specific engineering to maintenance recommendations and data transmission as well as a secure operation of the data platform. A monthly fee is charged for the service, and parts of the contract are charged as a one-time payment (e.g., installation of data acquisition).
The target customers are machine operators with high downtime costs. These machines are typically already in the field and have been operating for many years. Existing sensors do not provide enough data for a reliable diagnosis. Therefore, a retrofit sensor package and data acquisition unit must be installed onsite. After commissioning, data are sent to the cloud and stored on Bosch servers to be analyzed. ML-based anomaly detection provides insights into general machine behavior, and a human expert will offer maintenance recommendations to customers if required. Additionally, experience from field data is fed back to the continuous development of the ODiN platform and analytics solution.
Customizing the ML solution
Because of the high level of heterogeneity found at customer sites, efficient customization of the solution is important. The approach taken will be explained in the following.
Development and Customization Processes
The solution is developed using two parallel processes: the generic development process, and the customer-specific customization process. They are defined as two individual cycles: the AI DevOps cycle and the AI application cycle. These two are carried out by separate teams. The AI application team is responsible for implementing customer projects from customer acquisition all the way to monitoring the running applications. The task of the AI DevOps team is to continuously develop the analytics pipeline and deliver improved versions for the service as well as operation of the analytics platform.
A single, generic, analytics pipeline for anomaly detection is used for all applications. This enables scaling, as no customer-specific programming is required. The pipeline has the following steps:
- Data export: export data from the big data store for analysis
- Preprocessing: domain-specific preprocessing and feature extraction
- Anomaly detection: automated Machine Learning model generation for anomaly detection. The first model is always generated with the first data batch. Subsequent batches are applied to the model, and new model generation with the current data batch is triggered if the error exceeds a predefined limit. This results in a model library with each model describing a specific machine behavior. These behaviors can be manually labeled to create metrics for visualization in the next pipeline step
- Post-processing: generation of metrics for visualization and monitoring of applications
- Publishing: calculated metrics and logs are published to Kafka
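The batch-wise model generation in the anomaly detection step can be sketched as follows. This is a simplified illustration, not the Rexroth implementation: a vector of per-feature means stands in for a real ML model, the error is a mean absolute deviation, and the sketch assumes that each new batch is compared against every model already in the library. The class name, the error limit and the data are hypothetical.

```python
from statistics import mean


class ModelLibrary:
    """Grows a library of models, one per distinct machine behavior."""

    def __init__(self, error_limit):
        self.error_limit = error_limit
        self.models = []  # each model: list of per-feature means (stand-in for an ML model)
        self.labels = {}  # model index -> label assigned manually by an expert

    @staticmethod
    def _fit(batch):
        return [mean(column) for column in zip(*batch)]

    @staticmethod
    def _error(model, batch):
        """Mean absolute deviation of all batch values from the model."""
        return mean(abs(value - center) for row in batch for value, center in zip(row, model))

    def process(self, batch):
        """Return the index of the model describing this batch, creating one if needed."""
        if self.models:
            errors = [self._error(m, batch) for m in self.models]
            best = min(range(len(errors)), key=errors.__getitem__)
            if errors[best] <= self.error_limit:
                return best
        # First batch, or no existing model fits: generate a new model from this batch.
        self.models.append(self._fit(batch))
        return len(self.models) - 1


library = ModelLibrary(error_limit=2.0)
batches = [
    [[100.0, 40.0], [102.0, 41.0]],  # first batch always creates model 0
    [[101.0, 40.5], [100.0, 41.0]],  # similar behavior, reuses model 0
    [[150.0, 55.0], [151.0, 56.0]],  # different behavior, triggers model 1
]
results = [library.process(batch) for batch in batches]
library.labels[1] = "high-load production"  # manual labeling by an expert
print(results)
```

Because each model index corresponds to one observed behavior, the sequence of indices over time is itself a metric that can be visualized in the post-processing step.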
The pipeline is configured for each application via a JSON configuration file, which contains sections for each pipeline step. This enables application-specific analyses without customized programming work.
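As an illustration of such a configuration file, the sketch below parses a JSON document with one section per pipeline step and checks that no section is missing. The source does not describe the actual schema, so all section names, keys and values here are assumptions made for the example.

```python
import json

# Hypothetical section names, one per pipeline step described above.
PIPELINE_STEPS = [
    "data_export",
    "preprocessing",
    "anomaly_detection",
    "post_processing",
    "publishing",
]

# Hypothetical configuration for a single application.
EXAMPLE_CONFIG = """
{
  "application": "example-press-01",
  "data_export": {"signals": ["pressure_bar", "leakage_l_min"], "batch_hours": 24},
  "preprocessing": {"pressure_bin_width_bar": 50},
  "anomaly_detection": {"error_limit": 2.0},
  "post_processing": {"metrics": ["system_behavior"]},
  "publishing": {"topic": "example.metrics"}
}
"""


def load_config(text):
    """Parse the JSON text and verify that every pipeline step has a section."""
    config = json.loads(text)
    missing = [step for step in PIPELINE_STEPS if step not in config]
    if missing:
        raise ValueError(f"missing config sections: {', '.join(missing)}")
    return config


config = load_config(EXAMPLE_CONFIG)
for step in PIPELINE_STEPS:
    print(step, config[step])
```

Keeping all application-specific values in one file of this kind is what allows the same pipeline code to serve every customer.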
Lafarge Holcim is a global supplier of cement and aggregates (crushed stone, gravel and sand), as well as ready-mix concrete and asphalt. In their cement manufacturing facility in Bulacan, Philippines, Lafarge Holcim’s challenge was to monitor the key indicators of the hydraulically operated clinker cooler in order to detect possible failures in good time.
A clinker cooler is an essential component of cement production. If it stops working, the entire production must stop within five minutes. Therefore, the sensors on the hydraulic system were installed in such a way that they send the essential physical quantities to the ODiN platform for analysis. Employees of Lafarge Holcim receive regular reports about the system behavior of their machine. A local service partner interprets the information provided by the ODiN system and gives recommendations for action to the maintenance technicians on site.
Feedback from Lafarge Holcim: "Now we are able to learn much more about our own equipment than we did before. We can predict, we can see the health of the machine. We will install it to other hydraulic units within our business. With ODiN, we can be more proactive in capturing all the equipment data, making better decisions."
Summary and lessons learned
The following provides a summary and key lessons learned:
- Retrofit sensors are necessary to achieve the required data quality; specialized data acquisition is needed
- Application-specific anomalies, modifications to the machine or to the production process by operators, etc. cause generic Machine Learning models to fail
- Building a working generic analytics pipeline for anomaly detection is possible, but it requires application-specific configuration
- Manual model labeling by experts is necessary
- A human expert is required for failure diagnosis due to complex machine behavior
Perhaps the most important lesson learned in this project is that due to insufficient data for generic, end-to-end ML solutions, a low-cost solution for Predictive Maintenance of heterogeneous industrial environments is not realistic. Consequently, the team decided to offer a full service contract together with personalized support to maximize customer value. The combination of human expertise with ML-based anomaly detection enables a reliable and efficient Predictive Maintenance solution for the customer, helping to significantly reduce downtime and improve OEE (Overall Equipment Effectiveness).