Industrial Data Platform Guide

The algorithms are not the problem. The data is.

Why is predictive maintenance so hard? Why do AI pilots die on the shopfloor? Because we keep reinventing the wheel for every data project. It's time for platform thinking.


Every data project starts from zero

Every data project becomes a massive undertaking because you need to untangle the data spaghetti over and over again. Finding data, integrating it, cleaning it, adding context — that cycle eats 60% to 80% of every project's time before anyone gets to the actual analysis.

70% of data projects fail or stall before completion
80% of project time spent on finding & cleaning data
Outcomes scale less than linearly with more budget

For the 30% that succeed, learnings and results typically stay locked within the project scope. And the data engineers who glue everything together? They've become the bottleneck for every report, trend, and calculation being requested.

Data is trapped
Sensor data locked in SCADA, historians, and PLCs — inaccessible to anyone outside the control room.
Context is missing
L15.T01A.PV means something to an operator, but nothing to a data scientist. Asset and production context are absent.
Interface spaghetti
Point-to-point connections between systems. Every new use case requires a new custom integration.
IT ≠ OT data
Time-series data doesn't fit in relational databases. Batch context shifts constantly. IT tools weren't built for this.

The Good, The Bad, and The Ugly

The problem isn't starting — it's scaling. Making something work once, then making it work again somewhere else, and again, and again. Most organisations we work with have launched multiple data initiatives. The question is: which line does your organisation follow?

Figure: Three scaling scenarios — The Good (green upward trending curve), The Bad (red dashed stalling curve), and The Ugly (black dotted line). You can reuse this image with reference to us.

The Scaling Effect

The ideal scenario. Early progress moves slowly because you're building a proper foundation. But as time passes, each subsequent project becomes easier. The fifth project launches faster than the fourth. The tenth faster still.

Data connections get established once and serve multiple use cases. Quality improvements made in one dataset benefit everyone. Knowledge compounds — lessons learned on Line A accelerate work on Line B. What took three months initially now takes three weeks. The organisation crosses a tipping point where digital capabilities spread naturally.

Stalling

The more common scenario. Early wins come through shortcuts: custom scripts, manual exports, temporary workarounds that become permanent. Someone writes a PowerShell script to export data via FTP. Another builds a Power BI dashboard querying the MES directly — unknowingly slowing the MES down for every other user. A third creates Excel macros so complex that nobody else understands them.

None of these solutions outlive the original creator. When that person leaves, the solution becomes a mystery box nobody dares touch but everyone depends on. The line flattens because each new project fights accumulated complexity rather than building on solid foundations. At a certain point, this way of working always ends in gridlock.

Pilot Purgatory

Brief spikes of activity followed by flatlines. A pilot launches with a tight scope and delivers early success — only to die quietly because nobody owns ongoing maintenance and the solution was never designed to last. The data science model predicted equipment failures brilliantly for three months, then stopped working because someone changed a few setpoints or simply because winter turned into summer.

These are "innovation theatre" projects: impressive in presentation decks, invisible in operational reality. The result is a cemetery of abandoned pilots — each one technically successful but none of them surviving operational reality. Speed without sustainability produces exactly this.

The answer is platform thinking. Stop building point-to-point and start building foundations. Because every data connection established once, every context model maintained centrally, every quality improvement shared across teams — they all compound. That's how you turn patchwork projects into repeatable scale.

The Industrial Data Platform Capability Map

The only way to truly scale is to break free from the cycle of rebuilding custom data pipelines. We developed a Capability Map that captures the essential building blocks of any Industrial Data Platform. It's not a product — it's a framework for the right conversation.

Figure: The Industrial Data Platform Capability Map V2 - 7 core capabilities and 4 supporting functions to Ingest › Enrich › Store › Expose Industrial Data in Context. To help with adoption, we have licensed it under Creative Commons BY-SA 4.0: everyone can freely use, share, and adapt it, as long as you attribute us and distribute your work under the same license.

Every platform begins with getting the data in. In industrial environments, that means a secure, scalable connectivity layer speaking the many languages of industrial systems — old and new. A brand-new IIoT sensor streaming over MQTT next to a 30-year-old PLC on Modbus RTU. High-throughput for defect detection, redundancy for critical processes, buffering during connectivity loss.
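
As a rough illustration of the buffering idea, here is a minimal store-and-forward sketch in Python. The read_sensor and publish_to_broker functions are hypothetical stand-ins for whatever driver or broker client you actually use (OPC UA, Modbus, MQTT); only the buffering pattern is the point.

```python
import time
from collections import deque

# Minimal store-and-forward sketch: readings are buffered locally whenever the
# upstream connection is down and flushed in order once it comes back.
# read_sensor() and publish_to_broker() are hypothetical stand-ins for the
# actual driver and broker client (OPC UA, Modbus, MQTT, ...).

BUFFER = deque(maxlen=100_000)  # bounded, so a long outage cannot exhaust memory


def ingest_loop(read_sensor, publish_to_broker, interval_s=1.0):
    while True:
        # e.g. {"tag": "L15.T01A.PV", "value": 182.4, "ts": "2025-01-16T10:00:00Z"}
        BUFFER.append(read_sensor())
        try:
            while BUFFER:                    # flush oldest-first while the link is up
                publish_to_broker(BUFFER[0])
                BUFFER.popleft()
        except ConnectionError:
            pass                             # link is down: keep buffering, retry next cycle
        time.sleep(interval_s)
```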

Raw data alone is not enough. This capability links unstructured values to real-world meaning: asset hierarchies, production batches, maintenance events. ISA-95/88 aren't academic exercises — they're proven structures capturing decades of manufacturing wisdom. Context is what transforms L15.T01A.PV into "the baking temperature during batch 2025-W03."
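
A minimal sketch of what that contextualisation step can look like, assuming a simple in-memory asset model; the TagContext structure and the hierarchy values are illustrative, not a real site model.

```python
from dataclasses import dataclass

# Illustrative contextualisation step: a raw tag is resolved against an
# ISA-95-style asset model plus the active batch, so downstream users see
# meaning instead of a cryptic identifier.

@dataclass
class TagContext:
    site: str
    area: str
    equipment: str
    measurement: str

ASSET_MODEL = {
    "L15.T01A.PV": TagContext("Plant A", "Line 15", "Oven T01A", "baking temperature"),
}

def enrich(tag: str, value: float, active_batch: str) -> dict:
    ctx = ASSET_MODEL[tag]
    return {
        "value": value,
        "measurement": ctx.measurement,   # "baking temperature"
        "equipment": ctx.equipment,
        "area": ctx.area,
        "site": ctx.site,
        "batch": active_batch,            # e.g. "2025-W03"
    }
```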

Sensor data is notoriously messy: outliers, flatlines, NULL values, calibration drift, missing metadata. These issues are often invisible until you're deep into analysis. Data quality should be tracked, scored, and exposed as part of the platform — not addressed on an ad-hoc basis. Without it, your AI models become "The Oracle" that operators abandon the moment nobody is watching.
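
One way to make quality visible is a handful of rule-based checks rolled into a score that travels with the data. The sketch below is illustrative: the three checks and their thresholds are assumptions, not a standard.

```python
# Rule-based quality scoring for one tag's recent samples. The checks
# (completeness, range, flatline) and the thresholds are illustrative.

def quality_score(samples: list, low: float, high: float, flatline_window: int = 30):
    checks = {
        "completeness": sum(s is not None for s in samples) / len(samples),
        "in_range": sum(s is not None and low <= s <= high for s in samples) / len(samples),
        # a long run of identical values usually means a stuck or frozen sensor
        "not_flatlined": 0.0 if len(set(samples[-flatline_window:])) == 1 else 1.0,
    }
    return sum(checks.values()) / len(checks), checks  # expose the score alongside the data

# score, details = quality_score(last_hour_of_values, low=150.0, high=220.0)
```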

Keeping your data — and keeping it accessible. A high-performance time-series store at the core, plus event/alarm storage and publish-subscribe via MQTT. Layered storage following the medallion pattern: Bronze (raw), Silver (cleaned), Gold (analysis-ready). The broker connects all the dots in real-time — it's not a historian replacement.
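
A minimal sketch of the medallion idea as three plain transformation steps, assuming simple dict-shaped records; the field names and cleaning rules are illustrative.

```python
from datetime import datetime, timezone

# Medallion layering as three plain transformation steps.

def to_bronze(raw_message: dict, source: str) -> dict:
    # Bronze: keep exactly what arrived, plus where and when it arrived.
    return {**raw_message, "source": source,
            "ingested_at": datetime.now(timezone.utc).isoformat()}

def to_silver(bronze_row: dict, asset_model: dict) -> dict:
    # Silver: cleaned and contextualised (tag resolved, value normalised to float).
    ctx = asset_model[bronze_row["tag"]]
    return {"equipment": ctx["equipment"], "measurement": ctx["measurement"],
            "value": float(bronze_row["value"]), "ts": bronze_row["ts"]}

def to_gold(silver_rows: list) -> dict:
    # Gold: analysis-ready aggregate, e.g. one KPI record per batch.
    values = [r["value"] for r in silver_rows]
    return {"mean": sum(values) / len(values), "max": max(values), "samples": len(values)}
```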

Sometimes data is most valuable processed close to the source. Edge analytics can be as simple as computing statistical summaries or as advanced as running ML models on video feeds in real-time. Virtual tags, batch analytics, anomaly detection, predictive maintenance — all become possible here. Preprocessing at the edge also reduces load and minimises network traffic.
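
As an illustration, the sketch below computes a one-minute rolling summary at the edge and forwards only that summary instead of every raw sample. The window size and the virtual tag name are assumptions.

```python
from collections import deque
from statistics import mean, stdev

# Edge-side sketch: compute a rolling summary ("virtual tag") locally and
# forward only the summary, which cuts network traffic.

WINDOW = deque(maxlen=60)  # e.g. the last 60 one-second samples

def on_new_sample(value: float):
    WINDOW.append(value)
    if len(WINDOW) == WINDOW.maxlen:
        return {
            "tag": "L15.T01A.PV.1min_stats",   # virtual tag derived at the edge
            "mean": mean(WINDOW),
            "stdev": stdev(WINDOW),
            "min": min(WINDOW),
            "max": max(WINDOW),
        }                                      # publish this instead of 60 raw samples
    return None
```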

If data isn't presented in a meaningful way, does it matter? Most users will never touch connectivity or data models — but they will use dashboards and reports. Whether they're process engineers monitoring KPIs or operators reviewing last week's performance, their experience needs to be seamless. Rich visualisations, easy sharing, and collaborative tools that encourage discovery.

No platform exists in a vacuum. Exposing REST APIs for data science tools, sending curated datasets to the corporate data warehouse, enabling digital twin connections — sharing is critical to scale. This is also where MCP (Model Context Protocol) enters: a standardised way for AI agents to query maintenance history, pull real-time sensor data, or update work orders. Experimental, powerful — but guardrails are non-negotiable.
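
To make the idea concrete, here is a minimal read-only REST sketch using FastAPI as an example framework; the endpoint path and the in-memory "gold" dataset are hypothetical.

```python
from fastapi import FastAPI

# Minimal read-only API sketch (FastAPI chosen as an example framework).
# The endpoint path and the in-memory "gold" dataset are hypothetical.

app = FastAPI(title="Industrial Data Platform - read API")

GOLD_BATCH_KPIS = {
    "2025-W03": {"mean_baking_temp": 183.1, "max_baking_temp": 191.4},
}

@app.get("/batches/{batch_id}/kpis")
def batch_kpis(batch_id: str) -> dict:
    # A data science tool or digital twin pulls analysis-ready numbers here,
    # without ever touching the underlying historian or MES directly.
    return GOLD_BATCH_KPIS.get(batch_id, {})

# Run locally with:  uvicorn api:app --reload
```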

See the concepts in action

Capability Map
David explains the Data Platform Capability Map
UNS Explained
The Unified Namespace: what it is, what it isn't, and why it matters
Key insight

This is not a technology problem alone

We have enough technology available. The vendors in our DataOps Vendor Database are ready to help you implement. But technology without cooperation is just expensive shelfware.

The real challenge is organisational: getting IT and OT to speak the same language, agreeing on who owns the data, building a team that understands both the shopfloor and the cloud. If your organisation operates in silos, your platform will end up fragmented too — that's Conway's Law in action.

That's exactly why we built the ITOT.Academy — a 6-week live online course where IT and OT practitioners learn the frameworks, vocabulary, and cooperation models to push past "just a POC." Built by practitioners, for practitioners. Short, to the point, and brutally real. Our first 40 students scored it 9 out of 10.

Explore the ITOT.Academy

The full article series

The concept of a data platform is straightforward. But in many industrial environments, we're still far from achieving it. Left alone, every system drifts into chaos — entropy is the default. The difference between sustained transformation and one-hit wonders? Recognising that people, cooperation, and governance matter just as much as the technology.