Rocío Pérez Núñez 28 May, 2021

Seven keys to success with machine learning


An operational data journey to AI

With the boom of recent advances in industrial digitization, many companies are applying predictive techniques and algorithms to optimize and improve production processes.

Because time-series data generated in discrete manufacturing or industrial process operations provides valuable information on the past, present, and future health of equipment and production lines, it is integral to any company’s digitalization efforts.

Machine learning uses time-series data to find patterns undetected by the human eye. It has quickly become a leading approach to maximizing the value of operations data. Companies can apply machine learning to their time-series data when:

  • There is a pattern.
  • Historical data is available.
  • The problem cannot be solved directly with mathematics; in other words, there is no known equation relating the process variables, such as reactor temperature and yield. (A brief modeling sketch follows this list.)
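
As an illustration of that last point, the minimal sketch below fits a model to historical process data when no explicit equation relates the variables. The file name and column names (temperature, pressure, reactor_yield) are hypothetical placeholders, and the random forest regressor from scikit-learn is just one of many approaches that could be used.

```python
# Minimal sketch: learning a relationship between process variables when no
# closed-form equation is known. The CSV file and column names are
# hypothetical placeholders for a real historian export.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Historical operations data, one row per timestamp.
history = pd.read_csv("reactor_history.csv", parse_dates=["timestamp"])

features = history[["temperature", "pressure"]]
target = history["reactor_yield"]

# Hold out the most recent 20% of rows to mimic predicting the future.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, shuffle=False
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

print("MAE on held-out data:", mean_absolute_error(y_test, model.predict(X_test)))
```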

When it comes to machine learning projects, there are seven keys to success.

[Figure: the seven keys to machine learning success]

1. Understand your company maturity level

In order to identify the merits of operational intelligence initiatives, companies need to first assess the maturity of their efforts.

The digitalization team must consider both the technology and the culture of their company.

The team should first ensure that data is contextualized and properly cleaned and that users know proper operational processes. Next, the team should identify basic operational processes and their trends. The team can then standardize best practices and establish a real-time data governance strategy.

2. Ensure data quality

Engineers have an expression for when poor quality data input leads to unreliable and even unusable data output: "Garbage in, garbage out."

Operations data from sensors, control systems, assets, and mobile devices can be of poor quality for a variety of reasons:

  • Communication failures, such as control system or OPC server faults that return error values like "Comm Fail" or "I/O Timeout" instead of real readings.
  • Stale data caused by network problems that delay updates, repeat the same values, or return nonsensical values.
  • Sensor accuracy failures that may affect one or more pieces of equipment.

As a company’s population of connected assets continues to grow, monitoring for poor-quality data becomes an increasingly important challenge. Companies need to integrate information, equipment, and processes with information technologies to ensure data quality.

Data standardization, range checking to remove unrealistic values, and gap filling are all crucial for creating a usable, accurate data set.
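
As a rough illustration of these steps, the sketch below uses pandas to turn error sentinels into missing values, apply range checking, and fill short gaps on a standardized time grid. The tag file, engineering limits, and sampling interval are assumptions made for the example.

```python
# Minimal cleaning sketch, assuming a raw tag history whose "value" column can
# contain sentinel strings such as "Comm Fail" or "I/O Timeout".
import pandas as pd

raw = pd.read_csv("tag_history.csv", parse_dates=["timestamp"], index_col="timestamp")

# 1. Turn bad-quality sentinels and other non-numeric entries into missing values.
values = pd.to_numeric(raw["value"], errors="coerce")

# 2. Range checking: discard readings outside the physically plausible range.
LOW, HIGH = 0.0, 500.0  # assumed engineering limits for this tag
values = values.where(values.between(LOW, HIGH))

# 3. Standardize the sampling interval and fill short gaps by interpolation.
clean = (
    values.resample("1min").mean()   # uniform 1-minute grid
          .interpolate(limit=5)      # fill gaps of up to 5 minutes
)

print(clean.describe())
```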

3. Employ real-time data governance

While the people, processes, and tools used during different phases of the data collection process may differ, it’s important to establish a real-time data governance strategy to ensure data accuracy and quality. Establishing standard policies and processes helps companies develop and accurately manage their real-time data.

4. Use a machine learning platform that fits your model, not the model that fits the machine learning platform

The cloud is crucial to integrating information technology (IT) systems with operational technology (OT) systems.

Each cloud provider offers similar building blocks for high-volume, diverse data analytics solutions.

Use an IoT platform that enables and streamlines the implementation of common design patterns within and between these environments.

5. Visualize and find the pattern

Visualization allows users to interact with the environment, spotting trends or catching errors against acceptance thresholds. Users can add data from various systems to the visualizations to manage and detect the most significant events. Turning data into knowledge is key for rapid decision-making.

Visualization plays a fundamental role in all stages of data analysis.

With real-time visualizations, users detect anomalies or filter operations by event type as they happen. This enables richer data for subsequent machine learning and means that critical business decisions can be made in real time.
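
For example, a simple acceptance threshold can be computed on the fly from a rolling window and used to flag readings that fall outside it. The sketch below assumes a vibration tag stored in a CSV file; the 30-minute window and three-sigma band are illustrative choices, not values from this article.

```python
# Minimal sketch of threshold-based anomaly flagging on a time-indexed series.
# The acceptance band (rolling mean +/- 3 standard deviations) and window size
# are assumptions for illustration.
import pandas as pd

readings = pd.read_csv(
    "turbine_readings.csv", parse_dates=["timestamp"], index_col="timestamp"
)["vibration"]

window = readings.rolling("30min")
center = window.mean()
band = 3 * window.std()

# Flag readings that fall outside the rolling acceptance threshold.
anomalies = readings[(readings - center).abs() > band]
print(f"{len(anomalies)} readings fell outside the acceptance threshold")
print(anomalies.tail())
```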

[Figure: PI Vision display of turbine data]

6. Share your knowledge

In a data-sharing context, data integrity and security are of utmost importance. Uploading operations data on a shared platform should come with a guarantee that it is only accessible to authorized people for a well-defined purpose.

[Figure: sharing data in OCS]

Visual trending in OCS improves situational awareness and enables real-time anomaly detection. Stream workspaces can be securely shared with colleagues inside or outside the plant, as well as with partners and vendors.

7. Automate the solution

Finally, model training allows companies to automate machine learning solutions, minimizing the risk of human error and reducing time to market. In order to properly automate a solution, your company should first establish requirements, which could range from ensuring a continuous data supply to defining data visualization goals.
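
One minimal way to automate retraining is sketched below: a script that rebuilds and persists the model whenever fresh data has arrived, intended to be triggered by a scheduler rather than run by hand. The file names, feature list, and retraining trigger are assumptions for illustration.

```python
# Minimal automation sketch: retrain and persist a model when fresh data
# arrives, so no manual step is needed. File names, features, and the
# retraining trigger are assumptions, not a prescribed workflow.
import pathlib

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

DATA_FILE = pathlib.Path("reactor_history.csv")
MODEL_FILE = pathlib.Path("yield_model.joblib")


def retrain_if_new_data() -> None:
    # Continuous data supply requirement: only retrain when the data file has
    # changed since the last saved model.
    if MODEL_FILE.exists() and MODEL_FILE.stat().st_mtime >= DATA_FILE.stat().st_mtime:
        return  # nothing new to learn from

    history = pd.read_csv(DATA_FILE, parse_dates=["timestamp"])
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(history[["temperature", "pressure"]], history["reactor_yield"])
    joblib.dump(model, MODEL_FILE)


if __name__ == "__main__":
    # In practice this would be invoked by a scheduler (cron, an orchestration
    # tool, or the IoT platform itself) rather than run by hand.
    retrain_if_new_data()
```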

A machine learning algorithm is only as good as its input data. Machine learning project success lies not in the technology but in the solution itself.

Rocío Pérez Núñez is a Pre-Sales Engineer and IoT professor at MIOTI (Institute of Technology in Madrid, Spain). With expertise in how real-time data management can accelerate digital transformation, Rocío focuses on providing the best OT/IT, cloud, and artificial intelligence solutions for industrial customers. Rocío has a degree in Mathematics from Universidad Complutense de Madrid.
©2021 OSIsoft, LLC