The fourth industrial revolution, or Industrie 4.0, has brought far-reaching changes to industrial operations and manufacturing. Digital technologies and sensor-based data are fueling everything from advanced analytics and machine learning to augmented and virtual reality models. Sensor-based data is not easily handled by traditional relational databases. As a result, time-series databases are on the rise and, according to ARC Advisory Group research, this market is expected to grow at over 6 percent per year. These databases specialize in collecting and contextualizing sensor-based data and making it available.
In general, two classes of time-series databases have emerged: well-established operational data infrastructures (or operational historians), such as OSIsoft’s PI System, and newer open source time-series databases, such as InfluxDB or Graphite.
What’s the Difference?
Functionally, at a high level, they both perform the same task of capturing and serving up machine and operational data. The differences revolve around types of data, features, capabilities, and relative ease of use.
Established Data Infrastructure
As the industrial world’s version of commercial off-the-shelf (COTS) software, most established data infrastructure solutions can be integrated into operations relatively quickly. OSIsoft, for example, says its PI System can sometimes be deployed in a week or less and can take advantage of a broad ecosystem of more than 450 data connectors, third-party analytics, visualization tools, and other technologies. (According to OSIsoft, Aurelia Metals obtained a complete return on investment from its PI System in just 12 days.) While most customers have installed the PI System as on-premise software acquired under a standard perpetual license, ARC anticipates that more customers will subscribe to OSIsoft technology through cloud services. In some cases, that “cloud” could be a bank of servers in a facility owned by a third-party service provider.
In general, established historian platforms, such as the PI System, are designed to make it easier to access, store, and share real-time operational data securely within a company or across an ecosystem. In the past, industrial data was consumed primarily by engineers and maintenance crews; increasingly, it will also be used by finance departments, insurance companies, upstream and downstream suppliers, equipment providers selling add-on monitoring services, and others. The associated security mechanisms are already relatively sophisticated, and they continue to become more robust.
Another major strength of established operational data infrastructures, such as the OSIsoft PI System, is that they were purpose-built, and have evolved over time, to efficiently store and manage time-series data from industrial operations. As a result, they are better equipped to help users optimize production, reduce energy consumption, implement predictive maintenance strategies that prevent unscheduled downtime, and enhance safety. The shift from the term “data historian” to “data infrastructure” is intended to convey the value of compatibility and ease of use.
New Open Source Products
In contrast, flexibility and a lower upfront purchase cost are the strong suits of the newer open source products. Not surprisingly, these newer tools are initially being adopted by financial companies (which often have sophisticated in-house development teams) or for specific projects where scalability, ease of use, and the ability to handle real-time data are less critical. Because these new systems are somewhat less proven in terms of performance, security, and applications, users are likely to experiment with them for tasks in which safety, lost production, or quality are less of a concern.
While some of the newer open source time-series databases are starting to build the kind of data management capabilities typically available in a mature operational historian, they are not likely to replace operational data infrastructures in the foreseeable future. Industrial organizations should use caution before leaping into newer open source technologies. They should carefully evaluate the potential consequences in terms of application development time, security, maintenance and update costs, and the ability to align, integrate, or co-exist with other technologies. It is also important to understand operational processes, as well as the domain expertise and applications already built into an established operational data infrastructure.
Convergence and Harmony
Rather than compete head-on, established historians/data infrastructures and open source time-series databases are likely to co-exist in the coming years. OSIsoft, for instance, is collaborating with open source companies to develop edge technologies that make it easier to link more devices directly to the PI System and that provide greater local compute and analytic power.
As open source time-series database companies add distinguishing features to their products, it will be interesting to observe whether those products lose some of their open source characteristics. To a certain extent, we saw this dynamic play out in the Linux world.
The Real Database Battle
The most important difference among relational databases, time-series databases, data lakes, and other data sources is the ability to handle time-stamped process data and ensure data integrity.
While relational databases are designed to structure data in rows and columns, a time-series database or infrastructure aligns sensor data with time as the primary index. This is relevant because the primary job of the data management technology is to:
- Accurately capture a broad array of data streams
- Deal with very fast process data
- Align time stamps
- Ensure the quality and integrity of the data
- Ensure cybersecurity
- Serve up these data streams in a coherent, contextualized way for operational personnel
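To make “time as the primary index” and “align time stamps” more concrete, here is a minimal in-memory sketch in Python. The `TimeSeries` class and `align` function are hypothetical names for illustration only; production time-series databases and historians use far more sophisticated storage engines. The sketch keeps samples sorted by time (even when they arrive out of order), answers range queries with binary search, and aligns two unsynchronized streams by taking each stream’s last known value at a shared set of times.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


class TimeSeries:
    """Hypothetical in-memory series that uses time as its primary index."""

    def __init__(self):
        self._times = []   # sorted timestamps (the index)
        self._values = []  # values kept in the same order as _times

    def insert(self, ts, value):
        # Binary-search for the slot so the series stays time-ordered,
        # even when samples arrive out of order.
        i = bisect_right(self._times, ts)
        self._times.insert(i, ts)
        self._values.insert(i, value)

    def between(self, start, end):
        # All (time, value) pairs with start <= time < end.
        lo = bisect_left(self._times, start)
        hi = bisect_left(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))

    def value_at(self, ts):
        # Last recorded value at or before ts ("as-of" lookup); None if nothing yet.
        i = bisect_right(self._times, ts)
        return self._values[i - 1] if i else None


def align(series_a, series_b, times):
    # Sample both streams on a shared time grid so their values line up row by row.
    return [(t, series_a.value_at(t), series_b.value_at(t)) for t in times]


# Usage: two sensors reporting at different, unsynchronized moments.
t0 = datetime(2020, 1, 1, 8, 0, 0)
temperature, pressure = TimeSeries(), TimeSeries()
temperature.insert(t0 + timedelta(seconds=10), 72.0)
temperature.insert(t0, 71.5)  # arrives late, but is still stored in time order
pressure.insert(t0 + timedelta(seconds=3), 1.8)

grid = [t0 + timedelta(seconds=s) for s in (0, 5, 10)]
rows = align(temperature, pressure, grid)
```

Carrying the last known value forward is only one possible alignment policy; interpolating between neighboring samples is another, and which one is appropriate depends on the signal.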
To gain maximum value from the sensor data of operational machines, the data must be handled relative to its chronology, or time stamp. Depending on the data source, the time stamp may reflect either the time when the sensor made the measurement or the time when the measurement was stored in the historian, so it is important to distinguish between the two.
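The distinction matters whenever data is buffered or delayed on its way to storage. The Python sketch below uses a hypothetical `Sample` record that carries both a source (measurement) time and a stored time; the tag name and the edge-buffering scenario are invented for illustration. Sorting by stored time puts the buffered reading out of sequence, while sorting by source time restores the order in which the process actually behaved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Sample:
    tag: str               # hypothetical tag name, e.g. "FIC-101.PV"
    value: float
    source_time: datetime  # when the sensor made the measurement
    stored_time: datetime  # when the historian stored the measurement

    @property
    def latency(self) -> timedelta:
        # Delay between measurement and storage (e.g. edge buffering during an outage).
        return self.stored_time - self.source_time


def chronological(samples):
    # Order by when things happened in the process, not by when they were written.
    return sorted(samples, key=lambda s: s.source_time)


t0 = datetime(2020, 1, 1, 8, 0, 0)
samples = [
    Sample("FIC-101.PV", 10.0, t0, t0 + timedelta(seconds=1)),
    # Buffered at the edge during a network outage and forwarded two minutes in:
    Sample("FIC-101.PV", 12.5, t0 + timedelta(seconds=30), t0 + timedelta(minutes=2)),
    Sample("FIC-101.PV", 11.0, t0 + timedelta(seconds=60), t0 + timedelta(seconds=61)),
]

by_storage = sorted(samples, key=lambda s: s.stored_time)  # misleading order
by_measurement = chronological(samples)                    # true process order
```

In this example the buffered reading was stored 90 seconds after it was measured, so a trend built on stored time would show the 12.5 value after the 11.0 value, which is not what happened in the process.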
Time-series data technologies, whether open source databases or established historians, are built for real-time data. Relational databases, in contrast, are built to highlight relationships, including the metadata attached to a measurement (alarm limits, control limits, customer spend, bounce rate, geographic distribution between different data points, etc.). Relational technologies can be applied to time-series data, but doing so requires substantial data preparation and cleaning and can make it difficult to maintain data quality, governance, and context at scale.
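As a rough illustration of the extra work involved, the following Python sketch stores readings in a relational table using the standard-library `sqlite3` module. The `reading` table, its columns, and the tag name are hypothetical. Even in this tiny example, the schema, the (tag, time) index, and the time bucketing needed for per-minute averages all have to be designed by hand, and nothing here addresses cleansing, quality flags, or context.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reading (
        tag   TEXT    NOT NULL,
        ts    INTEGER NOT NULL,  -- measurement time, Unix epoch seconds
        value REAL,
        PRIMARY KEY (tag, ts)    -- composite key doubles as a (tag, time) index
    )
""")
rows = [("TI-201", 0, 70.0), ("TI-201", 30, 72.0),
        ("TI-201", 60, 74.0), ("TI-201", 90, 76.0)]
conn.executemany("INSERT INTO reading VALUES (?, ?, ?)", rows)

# Time has to be bucketed by hand: integer division groups epoch seconds
# into 60-second windows before averaging.
per_minute = conn.execute("""
    SELECT (ts / 60) * 60 AS bucket, AVG(value)
    FROM reading
    WHERE tag = ? AND ts >= ? AND ts < ?
    GROUP BY bucket
    ORDER BY bucket
""", ("TI-201", 0, 120)).fetchall()
conn.close()
```

The query returns one (bucket start, average) row per minute; handling gaps, out-of-order arrivals, or bad-quality readings would require further logic on top of this.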
Data lakes, meanwhile, score well on scalability and cost per GB, but poorly on data access and usability. Not surprisingly, while data lakes hold the largest volume of data, they have the fewest users. As with time-series technologies, the market will decide which of these technologies get used, and how. But this will take time.
Why Use an Operational Data Infrastructure?
ARC believes that modern operational historians and data infrastructures, such as the OSIsoft PI System, will be key enablers for the digital transformation of industry. When making investments, industrial organizations should give serious consideration to modern operational historians and data platforms designed for industrial processes. Ten things to consider when selecting a data infrastructure for operations:
- Data quality - The ability to ingest, cleanse, and validate data. For example, are you really obtaining a true average? If someone calibrates a sensor, will the average include the calibration data? If an operator or maintenance worker puts a controller in manual, an instrument fails, or alarms are being overridden, does the historian or database still record the data? Will the average include the manual calibration setpoint?
- Contextualized data - When working with asset and process models based on years of experience integrating, storing, and accessing industrial process data and its metadata, it’s important to be able to contextualize data easily. A key attribute is the ability to combine different data types and data sources. Can the historian combine data from spreadsheets and other databases or data sources, precisely synchronize their time stamps, and make sense of the result?
- High-frequency/high-volume data - The ability to manage high-frequency, high-volume data according to process requirements, and to expand and scale as needed. Increasingly, this includes edge and cloud capabilities.
- Real-time accessibility - Data must be accessible in real time so that it can be used immediately to run the process better or to prevent abnormal behavior. This alone can bring significant insight and value to organizations.
- Data compression - Specialized compression algorithms that reduce the volume of stored data while still enabling users to reproduce a trend when needed.
- Sequence of events (SOE) - The ability to reconstruct precisely what happened in operations or a production process.
- Statistical analytics - Built-in analytics capabilities, ranging from spreadsheet-like statistical calculations to more complex regression analysis. Additionally, time-series systems should be able to stream data to third-party applications for advanced analytics, machine learning (ML), or artificial intelligence (AI).
- Visualization - The ability to easily design and customize digital dashboards that provide situational awareness, enabling workers to see and understand what is going on.
- Connectivity - The ability to connect to data sources such as operational and plant equipment and instruments. Special connectors can help, although they are often time-consuming to build. OPC is a good standard but may not work for all applications.
- Time stamp synchronization - The ability to synchronize time stamps based on the time the instrument is read, wherever the data is stored (on-premise, in the cloud, etc.), so that the time stamps align with the data and metadata associated with the application.
- Partner ecosystem - A partner ecosystem can make it easy to layer purpose-built vertical applications onto the infrastructure for added value.
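Two of the considerations above, data quality and data compression, can be sketched in a few lines of Python. The sketch is illustrative only: the `Point` record, its `good` flag, and both functions are hypothetical, and the compression shown is a simple deadband filter rather than the more sophisticated algorithms commercial historians use. The first function computes a time-weighted average that skips spans flagged as bad quality (for example, a calibration or a controller in manual); the second discards points that stay within a deadband of the last stored value.

```python
from dataclasses import dataclass


@dataclass
class Point:
    t: float            # seconds from the start of the interval
    value: float
    good: bool = True   # False for calibration, manual mode, failed instrument, etc.


def time_weighted_average(points, end):
    """Step-wise time-weighted average from points[0].t to end.

    Each value is held until the next point; spans that start at a
    bad-quality point are excluded from both the sum and the weight.
    """
    total, weight = 0.0, 0.0
    for cur, nxt in zip(points, points[1:] + [None]):
        stop = nxt.t if nxt is not None else end
        if cur.good:
            span = stop - cur.t
            total += cur.value * span
            weight += span
    return total / weight if weight else None


def deadband_compress(points, deadband):
    """Keep a point only if it moves more than `deadband` from the last kept value.

    The first and last points are always kept so the trend spans the full interval.
    """
    if len(points) <= 2:
        return list(points)
    kept = [points[0]]
    for p in points[1:-1]:
        if abs(p.value - kept[-1].value) > deadband:
            kept.append(p)
    kept.append(points[-1])
    return kept


# A sensor reading about 50 that is pulled to 0.0 during a 10-second calibration.
readings = [Point(0, 50.0), Point(10, 0.0, good=False), Point(20, 52.0)]
naive_mean = sum(p.value for p in readings) / len(readings)
clean_avg = time_weighted_average(readings, end=30)

trend = [Point(i, v) for i, v in enumerate([10.0, 10.2, 10.4, 11.0, 11.1, 11.2])]
stored = deadband_compress(trend, deadband=0.5)
```

Here the naive mean of 34.0 is badly skewed by the calibration value, while the quality-aware average of 51.0 reflects the process; the deadband filter keeps 3 of the 6 points while preserving the step in the trend.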
When choosing operational historians, data infrastructures, and time-series databases, many issues need to be considered and carefully evaluated within a company’s overall digital transformation process. These include type of data, speed of data, industry- and application-specific requirements, legacy systems, and potential compatibility with newly emerging technologies. Both established operational data infrastructures and the newer open source platforms continue to evolve and add new value to the business, but the significant domain expertise now embedded within the former should not be overlooked.
Keywords: Operational Historian, Real-time Database, Time Series Database, Open Source Database, Open Source Applications, ARC Advisory Group.