Willem (00:00)
Welcome, you’re listening to the IT/OT Insider Podcast. I’m your host Willem and in this special series on Industrial Data Ops, I’m joined by my co-author David and Aron Semle from HighByte. If you want to hear more about what’s shaping the world of industrial data and AI, don’t forget to subscribe to this podcast on our blog. Welcome, Aron.
Aron Semle (00:20)
Thank you, yeah, awesome to be here and talking to you guys today.
Willem (00:23)
And welcome, David.
David (00:25)
Yeah, thanks, Willem. Yeah, I'm really happy to be introducing Aron today to our audience. So Aron started his career at Kepware, later PTC, and in 2020 he became the CTO at HighByte. So with over 15 years of experience in the world of industrial connectivity, we're in for a very interesting talk. Aron, can you walk us through your career and introduce HighByte?
Aron Semle (00:51)
Yeah, sure thing. So you hit the highlights there. I think back in '08, I got out of school; Kepware is located in Portland, Maine, so they were just a local company I interviewed with. I was actually an hour late to my interview with Tony Paine, the CEO at the time, because I'm terrible with scheduling, but I ended up getting the job anyway. So that was kind of my on-ramp into industrial manufacturing. I started out as a software engineer and kind of grew into product management there.
And then Kepware was acquired by PTC, I believe it was 2016, which is crazy; that was almost a decade ago now. And then I was launched into ThingWorx. So I was introduced to ThingWorx and Kepware and trying to drive that into the market. And for folks that know ThingWorx, I think it was one of the first industrial IoT platforms really in the market. So I got a front row seat to that. And then, yeah, I joined HighByte in 2020. In between there, I had a brief stint at another startup in healthcare.
David (01:33)
Yeah.
Aron Semle (01:43)
And then I kind of got beat up there, and I came back to industry, where I think I belong, and I've enjoyed the time.
David (01:49)
Well, that's actually interesting. I also did a couple of things in healthcare myself. But for some reason, the voice of the industry just always pulls you back in.
Aron Semle (02:01)
Pulls you back. Yep.
Yep. It's pretty cool, the breadth of customers and the breadth of spaces you get to see in industry. I always tell people we work with fish farms in Scandinavia and then some of the biggest oil and gas companies in the world, you know, luxury goods. It's amazing, the different sectors of the economy you get to interact with. But yeah, so then I joined HighByte in 2020. HighByte is focused on industrial data ops. A lot of the founding team came from Kepware at the time.
At Kepware, we felt like we'd solved the connectivity challenge in the factory: connecting to PLCs, making that data available via OPC. But we never really put any context around the data. We'd always talked about it, but the main reason was there weren't a lot of consumers of contextualized data back then. A lot of people weren't integrating with IT systems; even standards like OPC UA, which had structured data in it, we could implement, but who was going to consume structured data sets? So when I joined in 2020, the hypothesis was, hey:
There are more consumers of data on the IT side, and data needs to be contextualized coming out of the factory. That was our hypothesis. And I kind of laugh, because when I joined, John Harrington, one of the co-founders, you know, you can think of it as we're in a small shop: I'm sitting there coding all day, trying to get this thing going, and John is in conversations every half hour on the hour, talking to all kinds of people about industrial data ops back in 2020. And I would say, with his pitch, which I heard hundreds of times, maybe he got through to five or 10% of people
David (03:11)
You
Aron Semle (03:27)
who would go, no, I understand the problem, I understand what's going on, and we need this. Fast forward to today: I mean, you guys have a whole podcast you've done, and I've seen your sector analysis on the different parts of a data platform. The industry has just matured so much that now we have customers that call us and say, hey, we need to contextualize our data, tell us more about these strategies. So I think we'll get more into what industrial data ops is, but it's really cool, over that brief period of time, to see how fast it's accelerated.
David (03:30)
Yeah.
Yeah, absolutely. The uptake is unbelievable. Typically, I start this series by asking guests to introduce their product and link it to the capability map. But I'll start with another question first, because in the first episode of this series, Willem asked me the question: David, what is DataOps? Since you just mentioned Industrial DataOps, Aron,
I'm gonna ask you the same question: what is DataOps?
Aron Semle (04:26)
Yeah, so DataOps is an IT practice, right, for linking data between disparate systems. You can think of MuleSoft or Boomi as great examples of DataOps platforms in IT. You can synchronize SQL databases, you can connect to ERP systems. So it's this idea of, I can create workflows and move data between systems. There are small teams at very large enterprises that manage that tool set and link data between systems.
Industrial DataOps is a twist on that that says, hey, we need the same kind of capabilities to move data in and out of the factory, but we have different sets of challenges when you enter the factory environment, the most prominent of which is that our data is not contextualized or in any kind of shape or form, unless you look at, let's call them more modern companies that were founded by IT folks, like a Tesla, right, who look at data as their product. A lot of
historical manufacturers don't, right? We're trying to get product out the door, we're not thinking about our data strategy. That has changed, I think, in the past five years. But industrial data ops is the application of that discipline to manufacturing, with the addition of: we have to connect to systems that are different, and we need to provide the context around all this data we have. Otherwise we can't bring it up into IT, because they're not going to be able to make any sense of it.
David (05:39)
Yeah, thank you. That's very, very interesting. I also like the difference you make with these data-first types of companies, like for example Tesla. I've had several discussions with them, and I would say they're a data company first that happens to also have physical assets, whereas the majority, I'd say the entire industry excluding
companies like Tesla and SpaceX, well, they start with their assets, and that's how they end up thinking of data as a byproduct, unfortunately. So it's up to us to change that, I would say, to change that narrative. With your introduction to industrial data ops, I think that's a perfect bridge onto our capability map. Again, for those who are now just tuning in,
the link to the capability map will be in the show notes. We came up with this map because we wanted to have a meaningful conversation: not just "buy our product," but "buy our product because we are very good at this, this, and this capability." Aron, can you walk us through your offering and somehow
try to link that to these capabilities, these seven capabilities we've noted down?
Aron Semle (07:07)
Yeah, yeah, sure thing. So I think, in my opinion, industrial data ops is still an emerging market, so conversations like this are really helpful to help people frame, like, what is this? If you look at where specifically we sit on that map, we don't cover all of it. In fact, we cover probably half of it, and that's by design, and I can explain that as we go. But certainly connectivity, right? We can connect to all the different types of data sources you'd find in the factory, OPC being a common one for machine connectivity, but also historians.
Files are really common, believe it or not, especially with test equipment, so we have to parse and translate all that into context; but MES systems, et cetera. So, very strong connectivity. Contextualization is like our bread and butter, right? What we focused on initially is not just connecting, because we know that problem's been solved, but how do we connect and provide rich contextualization to the data as it moves up the stack? That's our core modeling capability. Data quality
I believe we cover, but I'd be curious what your definition of that is, because years ago I read this book called Zen and the Art of Motorcycle Maintenance, which, if you've never read it, is a great book, but most of the book is on the definition of quality and, really abstractly, what it means. Specifically in manufacturing, you could say quality is gathering the right data at the right time, contextualizing it in the right shape, and getting it to where it needs to be. That's quality data
David (08:17)
Yeah. Yeah. Yeah.
Aron Semle (08:28)
at its simplest form. I think the most complex form is: I actually have a temperature sensor that has some bad readings on it, right? There are systems in place to try to detect that. That's a more advanced data quality capability, which we don't provide out of the box, but applications on top of us could.
David (08:35)
Yep.
Ahem.
Yeah, absolutely. I think when I, or we, talk about data quality, we really mean the entire span here, indeed: from having your basics right, having the right metadata, making sure you actually connect it to the right asset or the right sensor, really up until, I don't know, going from this, I would say, bronze to silver layer. So from your really raw data to validated data, filtering out your spikes, filtering out your noise, all these types of things. But indeed, it's…
Aron Semle (08:59)
Right? Yep.
David (09:13)
It’s also an emerging capability.
Aron Semle (09:15)
Yeah, yep. So I would say we cover that. Data broker, I think, was the other one, I'm reading some notes here, and then storage. We do provide MQTT as a broker, so you can use this for onsite movement of data. That, again, is not our bread and butter; there are enterprise brokers that do that better. On storage, we don't offer any storage in the platform. And we don't do that because, along with analytics and visualization, which we don't cover either, our challenge there is we feel like industrial data ops is this very specific layer of
David (09:45)
Yeah.
Aron Semle (09:45)
I
need to connect data, contextualize it, and get it to the systems that need it, at scale. And this is a challenge on the R&D side: as we develop, people are like, can you just add a chart so I can graph some stuff? And it's like, could I do it? Yeah, I could probably turn it around in a day, but as soon as I do that, suddenly all the requirements come rushing in for a visualization platform, and we're never going to be Grafana, we're never going to be best in class. So we think the industrial data ops market is big enough that we can focus enough of our attention there
David (09:56)
Yeah.
Yeah.
Aron Semle (10:15)
and allow other products that do better analytics and visualization to leverage those data sets more effectively. Because on the other side of it, if you look at visualization companies, they'll provide some level of connectivity, but it's generally not as full and as rich as you would need in industrial data ops. So those are areas where maybe someday we'll do it, but historically we're trying to be the best in the market at industrial data ops, and we think those capabilities aren't necessarily included in that category.
The last one would be data sharing; I think that's also sort of our bread and butter. We don't contextualize and put together data for ourselves; we're doing it for other IT systems and platforms, whether that's Seeq or, you name it, any platform that sits on top of us.
Willem (11:02)
Okay, well, another question I have. Usually we keep this one for later, but since we're now talking about the data platform: this is our first iteration of it, our first attempt at it. How would you change it? Would you add something, remove something, change something?
Aron Semle (11:20)
on the day. No, I think you
Willem (11:21)
Or does it look great
the way it is now? I’ll take the compliment.
Aron Semle (11:25)
Yeah, it does look great. No, I think you covered all the capabilities, and all of those are, not requirements, but questions we've been asked, right? Can you add storage? And I think our answer there was, we try to leverage what's existing. So when the storage question gets asked, right, we have connectivity to PI, to InfluxDB, to Timescale. If you want to take your UNS, take part of that and historize it, you can push it to the existing historian technology you have in the factory, and then we can have an API that pulls that data back out, or pulls out averages to use,
you know, in some kind of contextualized data flow you have upstream. That's the idea, because one of the challenges is, if you come in and say, okay, I'm data ops, I'm going to layer on top of your existing industrial environment because it's heterogeneous and kind of a mess, right, but I'm going to provide an API layer on top of that; well, if I also have storage and visualization, I'm going to start to potentially compete with some of the other solutions you already have in the factory. And if you're trying to roll out a data ops strategy, next thing you know you're at a factory in Illinois fighting over what the SCADA system is going to be.
You just got dragged down a hole away from what you're trying to accomplish. You're not trying to rewrite their SCADA; you're trying to enable enterprise-wide use cases on top of the data they already have.
Willem (12:34)
So Aron, data ops is still very, let's say, fluffy, non-concrete, vague for a lot of people in manufacturing, like we said. You brought a use case; maybe it can make things a bit more concrete for most people.
Aron Semle (12:50)
Yeah, yeah. I love to be more concrete, less abstract, personally. So I'll give you a really simple example a lot of people can relate to. We have a really large industrial customer that has hundreds of warehouses across the States, right, or the world. In those warehouses, over the last decade, they've implemented AGVs, or autonomous guided vehicles. These are robots that are going around, picking up parts and moving them around. One of the challenges they have with them, that a lot of people who have had a Roomba or other
robotic vacuum devices can relate to, is they get stuck, right? Stuck under a chair, stuck in a corner. These AGVs are no different, they're just more expensive. So you can imagine a large AGV stuck in a corner. It's supposed to be doing something, but it's stuck. That's downtime for the warehouse, right? You can think of it that way; it's supposed to be doing work. So that was their challenge. You dig into that challenge and the issue is that, well, they actually have like three or four AGV vendors across all of their…
David (13:30)
You
Aron Semle (13:48)
factories. So it wasn't an enterprise-wide purchase. One AGV vendor has Modbus, which we're all familiar with in industry, right? Another one has OPC UA, and then another one implemented Sparkplug B over MQTT. These are just different interfaces to get access to the data; the AGVs are all doing the same thing. So what they did is they built a really simple use case that monitors them. They built a data model: they said, what data do we need to be able to detect if one of these things is stuck? That's what we call a data model.
The data model is, you know: current location; is it currently in production or not, you know, is it sitting on the sidelines or is it supposed to be doing something; and what was the last time it moved. Really simple, right? There was more data to it than that, but those are a couple of the data points it needs. With data ops, you send that data model to the factory, you know, with HighByte there, and the message is, hey, here's the data we need. You hook it into your factory systems where the data is coming from, whether it's Modbus, Sparkplug, or OPC UA,
hydrate the data model, and then send it upstream to this AWS Lambda function that they wrote one time, right? That Lambda function processes all the data coming in from these AGVs, detects when one is stuck, and then sends a message back down to the factory to say, hey, AGV X is stuck in this corner. The operator gets that through their local SCADA system, let's say it's Ignition, an alert comes up, they go find the AGV, give it a kick, and it's back in production.
It's a simple use case to understand, but it highlights all the challenges: hey, it's a use case that scales across the enterprise, but all of my factories are different, and at the enterprise level I can't deal with those differences. I need that layer in there that OT can work with but IT can manage, to enable these use cases that drive real ROI. We're not moving data for moving data's sake; there's a real financial number behind that use case.
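[Editor's note: to make the AGV example concrete, here is a minimal Python sketch of what such a data model and stuck-detection check could look like. The field names, the five-minute threshold, and the alerting path are illustrative assumptions, not HighByte's or the customer's actual implementation.]

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Illustrative data model: the fields the enterprise asks each site to provide,
# regardless of whether the AGV speaks Modbus, OPC UA, or Sparkplug B.
@dataclass
class AgvStatus:
    agv_id: str
    site: str
    location: str         # current location on the floor
    in_production: bool   # is it supposed to be doing work right now?
    last_moved: datetime  # timestamp of the last observed movement

def is_stuck(status: AgvStatus, max_idle: timedelta = timedelta(minutes=5)) -> bool:
    """An AGV counts as 'stuck' if it should be working but hasn't moved recently."""
    if not status.in_production:
        return False
    return datetime.now(timezone.utc) - status.last_moved > max_idle

# Example payload as it might arrive at the cloud-side function:
status = AgvStatus(
    agv_id="AGV-17",
    site="warehouse-illinois",
    location="aisle-4-corner",
    in_production=True,
    last_moved=datetime.now(timezone.utc) - timedelta(minutes=12),
)

if is_stuck(status):
    # In the real deployment this alert would be sent back down to the site's
    # SCADA system; here we just print it.
    print(f"{status.agv_id} at {status.site} appears stuck near {status.location}")
```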
David (15:15)
Yeah.
Mm-hmm.
And it's a really interesting use case. It's, I would say, a non-standard one; typically we would talk about standard industrial assets connecting somehow to, I don't know, predict failures or something like that. But I can imagine this robot being stuck somewhere in a corner. You mentioned IT versus OT. In this type of company, especially logistics companies,
I would say the intersection, or the boundary, between IT and OT must be very thin, because if you are building or running warehouses, they should probably be very tightly integrated into their ERP system. So what did you encounter there? Was this an OT-led initiative? How did collaboration go across these lines?
Aron Semle (16:30)
Yeah, that's a great question. So there they actually had a team that, they didn't label themselves this, but I would call it a data ops team. The idea, you know, the long-term vision of industrial data ops, is that at a large manufacturer you probably have a team of half a dozen people that manage these data ops solutions. And that was the case there. That team includes folks with both OT and IT knowledge, right? And our most successful customers have that; again, they don't call them data ops teams, but that's what you see. What they basically have is,
they have the tooling in place, and then they have, think of it as a backlog of potential use cases: okay, this is one of them, the AGV use case; well, we have this other one around scrap reduction on our rolling machines, right? And these use cases can come from corporate. They could be supply chain use cases, like, we need to share these data sets with our supplier so they're more aware of when we need orders. Or they could be very specific to the factory; they could come from the factory and up the chain: hey, we found this scrap reduction use case, we think it'll scale everywhere else.
David (17:09)
Yeah.
Aron Semle (17:27)
This data ops team is responsible for defining those, the data models and the data flows up to the corporate systems, and then distributing those locally again, using the local OT talent as effectively as they can: hey, can you just enable this? I'm not asking you for all thousand tags you have at the site, I don't even want that, that would just be noise to me. But if you give me this set of 30, we could save, you know, half a million dollars a year on procurement, as an example. So it's really targeted
across that kind of business and IT/OT divide.
David (17:57)
Yeah.
And maybe to take this even one step further: what makes or breaks a data strategy, for this use case but maybe also in general? What is a good data strategy? What works? What doesn't?
Aron Semle (18:17)
Yeah, I’ve seen so much now in five years. A bad sign is when a company approaches you and says, I have 100,000 tags changing once a second. Can we stream this up to the cloud? That’s generally not a good sign, right? And it’s not that we haven’t helped customers do that. We can do it. But what happens is in those kind of use cases, you spend maximum engineering effort doing an integration to some cloud service.
You're spending the max on the cloud side because all that data is being ingested, and you haven't defined one use case you're going to drive out of it. It's not a bad project in the sense that, you know, you've just digitized some of your data and there's probably value there, but you've yet to define any of what that value is. And what eventually happens is someone on the business side comes in, looks at the expense, and asks: what's the ROI? The answer's mushy.
And I've seen those projects get killed, and as an engineer that sucks, right? Because you put in all that time and investment, and you're like, no, we did make progress, but we just didn't prove that last mile out.
David (19:11)
Yeah.
But you just
had to raise the limit of your credit card.
Aron Semle (19:21)
Right, right.
Willem (19:22)
sponsoring
those poor big cloud providers. I mean…
David (19:25)
Yeah,
yeah, doing a good thing.
Aron Semle (19:28)
Yep. The cloud providers are coming around on it too. I think early on they were hungry for those use cases, but I think that's churn for them in the long run too. They want to see their customers get value out of their use of the cloud and then expand from there. So that's generally a bad sign, if it's completely OT-driven; a lot of times when those kinds of things happen, like "I want to get all hundred thousand tags up," that conversation starts on the OT side.
The other side is if it's solely IT-driven. We've seen times when a MuleSoft or a Boomi gets put down on the factory floor as an ETL tool to get data out. It doesn't have the connectivity the factory needs, it doesn't have the contextualization, OT people look at it and have no idea what it is or how to interact with it, and it ultimately fails. Or, you know, something like Kafka gets pushed down, which is complex for an OT person to understand.
So generally it's a good sign if IT and OT are both involved and there is a clear use case. I think that's one of the things that makes us a little different than some of the other vendors in the space: we are very use case driven. And we believe in, you know, building a UNS, let's say, or building an architecture; you want that vision in place, but use case by use case you can build it up and leverage it, versus, day one, I'm going to send all hundred thousand tags to the cloud. We'll build the architecture so that eventually you can get there, but let's move the contextualized data that you need today,
that you can drive value from, and then build from there.
Willem (20:51)
How do you manage that tension between going use case by use case, with a very short-term view and limited scope of course, and a long-term strategy, a long-term view, figuring out what the good choices are that you have to make now in order to be ready for the future?
Aron Semle (21:11)
Yeah, it depends sometimes on the culture. I think over the past year or so it's definitely improved. A lot of times our first interaction with customers is, hey, we have this use case, but, you know, here's our overall architecture plan. So they actually come to us with some of that in place already. If you look back farther, you'd either get: here's my thousand-tag thing, and I've just designed an architecture with no use case. And that's not terrible, because you can ask, well, let's
hypothesize some use cases on top: what are you going to do with that data? That's an easier conversation. Where they're too use case specific and too focused on that, I think a lot of times those customers probably aren't even reaching out to us, right? Because they're still kind of in the Industry 3.0 mentality of, I'm just going to make these bespoke point-to-point connections and use some custom code or an integration I'm aware of. And they're not even thinking about, how could I leverage this data in other places, how could I build the architecture? I think that part of the market is shrinking, but I think it's still there.
Willem (22:08)
Okay, now that's a bit of a different question, because you've been in the industry for some while, but at the same time you sit between manufacturing and data, which is quite broad across a lot of different industries. What makes data in manufacturing different, according to you, compared to finance, healthcare, and other industries? Why are we so late with adoption?
Aron Semle (22:32)
Why?
Because we're focused on units of output per square foot and just getting the job done. Yeah. I think different industries, different segments, are in different places. Discrete is probably farther behind than most because it's more people and process. If you look at batch, like pharma as an example, they're probably farther along in digitization. So it does depend somewhat on the industry. But in general, we have a lot of time series data.
It's not all of our data, right, but let's say 80% is actually machine data coming out, and that's slightly different. I mean, if you look at the financial segment, they have what you could call time series data, like rapid stock tickers, and all that financial data is similar, but not quite the same. But then we also have this blend of, hey, we have a lot of machine data, but these other IT products have over time come into manufacturing too. SQL is very prominent, right? We still have stuff in files.
But it's funny, for people that join HighByte who aren't familiar with the industry, my joke is: go walk through an IT data center and take some notes, then go walk through a manufacturing plant and take some notes, and then tell me about the difference. And I think that's the heart of it, right? We're not this nice, neat set of blinking lights where everything's in a rack. No, it's a mess. How we make stuff is a mess.
David (23:38)
Yeah.
Yeah, just recently I walked through a production facility where it was still all Windows XP, right? And you know, when you see Windows XP, you also know that they're probably not running the latest version of a SCADA application.
Aron Semle (24:02)
man. Yep.
Yeah,
yep. And it’s so much easier to be like, hey, if it’s not broken, don’t fix it because we’re focused on the bottom line of production. And we know that strategy isn’t good in the long term, but in the short term, it’s an easy compromise to make.
Willem (24:22)
Even in the short term, when I see those Windows XP PCs, I ask how long has it been there? As long as the pump. And if that PC breaks, what happens? Yeah, then we have a problem. Do you have spare parts for that PC? Can you reinstall it? And then it’s like, crap, you’re asking the annoying questions here.
Aron Semle (24:35)
Great question.
Yeah.
Yeah, is Windows going to still sell you an XP license for that thing?
Willem (24:49)
Will
David (24:49)
You
Willem (24:49)
you
even find a way to install it and configure it? Probably there's no manual either.
Aron Semle (24:54)
Right?
David (24:55)
Now, that also is kind of related to the Unified Namespace, UNS. You mentioned it a couple of times; it's a concept we also talk about. I have tons of questions, Aron, but maybe just to start off, again from your perspective: what is the UNS? And what do you see regarding adoption, et cetera?
Aron Semle (25:22)
Yeah, UNS is, I would say, hot, right? A lot of people know the term now; you know, Walker Reynolds kind of evangelized it, and it's definitely starting to reach critical mass. I think our perspective on UNS is a little different than most vendors', in the sense that a lot of times when you think of UNS, you see it as: I've got data in MQTT, I've got a client connected, and I can see this tree and I can see the data flowing. And people say, well, that's the UNS, right? It's the broker and it's the data in motion.
And then you'll hear people say, Snowflake is my UNS, or some other system. And then it's like, okay, well, what is this thing? Part of its genius is that it is hard to define, right? But in our perception, UNS is this idea of: I have a logical way I think of the factory, right, and think of that as the MQTT address space, the hierarchy, all that stuff. So I have a logical way I think of it, and then I've linked that logical model to the actual data in the factory that can hydrate it.
David (25:54)
Yeah, yeah, yeah.
You
Aron Semle (26:20)
Once I have that definition and the linkage, then I can put that in motion over MQTT. I can publish it all over a broker and that data is available, I can move it to Snowflake, I can move it to these other destinations. So it's a little more abstract, but that would be my definition of UNS: the logical way I think of the factory, linked to the sources, so I can put it in motion. And I think that's powerful. But you know, we see a lot of customers where building a UNS is an OT initiative: we're going to build a UNS. So they go in and they
contextualize every asset in their factory. They spend six months arguing over the perfect data model for an injection molding machine. And then, I know, right? Let's define this thing. And then they publish it to MQTT, and like a year later you jump in, you see what they created, and it's like, this is cool, who's using the data? Crickets, right? They've made progress, but that's where the use case driven approach comes in:
David (26:52)
Yeah, over.
Willem (26:55)
Only six months? I mean, it's engineers together.
David (27:11)
Mm-hmm.
Aron Semle (27:16)
Don't just build a UNS to build a UNS. Have a use case. Maybe MQTT is the ideal transport for that use case to the consumer; publish it through that and then continue to build it out as you go. But don't start with, you know, I just need to throw all of my data into a broker, and I don't have any consumers. It's just a bunch of data shouting in a room and no one's listening. What did we accomplish with that? And I think
UNS has other challenges when you think about MQTT: it doesn't support transactions. People try to do writes through the UNS MQTT broker, and in my opinion that's really dangerous, because there's no feedback loop inherent in the protocol. It's not good with historical data either. So there are other data movement paradigms and data types in the factory that it doesn't address well enough, and I think over time the definition of UNS will morph. But we'll see.
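[Editor's note: as a rough illustration of Aron's description, a logical model linked to a source and "put in motion" over MQTT, here is a minimal Python sketch using the paho-mqtt client. The topic path, payload shape, and broker address are invented for the example; they are not a HighByte feature or a UNS specification.]

```python
import json
import time

import paho.mqtt.client as mqtt

# Illustrative, ISA-95-flavoured logical path for where this data "lives"
# in the factory, independent of the PLC/OPC source it actually comes from.
TOPIC = "acme/portland/packaging/line2/filler/state"

def read_from_source() -> dict:
    """Stand-in for the real connectivity layer (OPC UA, Modbus, historian, ...)."""
    return {"temperature_c": 72.4, "running": True}

def contextualize(raw: dict) -> dict:
    """Wrap the raw values in the context an IT consumer needs to make sense of them."""
    return {
        "site": "portland",
        "asset": "filler-02",
        "timestamp": time.time(),
        "values": raw,
    }

try:  # paho-mqtt >= 2.0 requires an explicit callback API version
    client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION1)
except AttributeError:  # paho-mqtt 1.x
    client = mqtt.Client()

# Assumes an MQTT broker reachable on localhost:1883.
client.connect("localhost", 1883)
client.publish(TOPIC, json.dumps(contextualize(read_from_source())), qos=1, retain=True)
client.disconnect()
```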
David (28:07)
And any crystal ball statements on which direction it should morph, in your opinion?
Aron Semle (28:13)
I think
eventually what it becomes is almost like this API layer for the factory, right? And I don't say that because I think HighByte is a UNS; I would never make that statement, I think that's too broad, and I'm not big on jumping on marketing terms. But the idea is: it's the abstraction between what I have in the factory for assets and control, PLCs, OPC servers, all that, versus how I logically think about it and how I interact with it.
Streaming data over MQTT is just one of those interaction patterns, but I could also have an API where I can reach in and request data, change set points, for example, or pull historical data by other means; something more holistic in terms of all the data patterns we see in the factory.
David (28:55)
That's an interesting one. I'm gonna ask you about another buzzword. Well, artificial intelligence obviously is one, but I saw you writing a post on LinkedIn on edge. So that's two buzzwords: edge computing and artificial intelligence. You also said you're bullish on edge AI, so,
Aron Semle (29:11)
Right.
David (29:20)
again, same question: what is edge AI, and why do you think it has such great potential?
Aron Semle (29:26)
Yeah, yeah, I'm excited. We'll see; I'm bullish, but we'll have to see the evidence. But we are getting pulled into a number of initiatives around the use of edge-based AI, specifically around operator assistance, which I'm excited about because, like in the post I mentioned, a lot of initiatives over my career have been all about getting data out of the factory for better planning, better predictive maintenance. They don't necessarily help the operator who's sitting in front of the line. And I think edge AI lowers the tech barrier of entry so that
an operator could actually benefit significantly from it. So I'm excited there. I think edge-based AI is the concept of what you're doing in ChatGPT, but I want to run that locally. And the reason I want to run it locally is because there could be a lot of IP in leveraging this and feeding it the right data to make interesting, meaningful output. And if I do that with an LLM owned by some large tech company,
could they just rip that off and then go scale it to all my competitors really quickly? So I don't want my industrial data going out of the factory while I figure this out. I actually want to run this locally on compute I have, Nvidia cards or whatever it is, running these at the edge to experiment. And what's interesting about that is, you know, we contextualize data inside the factory, and also for that layer on top with IT systems, and this really speeds up the need for that internal contextualization, right? I'm not just contextualizing data for IT consumers.
David (30:27)
Yeah
Aron Semle (30:49)
I'm going to contextualize the data inside the factory for this new edge-based AI consumer. And what's fascinating for me, where I think this is going, is you're going to have really hyper-tuned edge AI agents, and you're going to have to, from a data ops perspective, provide really specific data sets for really specific questions. And you're going to need the tooling inside the factory to do that, to say, hey, at a shift change, this is the specific data I need to feed into the LLM so it can make an inference on, like,
how fast we're going to be able to speed up on the next shift. Those kinds of curated, specific data sets, which are use cases, are where products like ours are integral: exploring that kind of capability, building it on the fly without needing to write code and do a lot of customization. So whether or not the LLMs can come up with meaningful results and not hallucinate, that's the open question, which we're trying to figure out.
Willem (31:40)
Yeah,
because usually when we hear about AI in this space, it's mostly analytics, machine learning, those kinds of AI applications. You're specifically mentioning LLMs. I was just wondering, do you have some concrete examples of how LLMs can work in a production environment, where knowing something for 95% usually doesn't cut it?
Aron Semle (32:05)
Yeah, I think they can be suggestive; I'll call them suggestive agents, I guess I'll just make that word up. The idea being that the operator can go to this thing, ask simple questions, and get broad answers, right? That could be: hey, you know, I'm behind on my shift, what could I do to catch up, given what you've seen historically? And it can actually give me suggestions on things I can investigate. It's not going to do process control, right? It's not going to go change set points and all that, but it can at least assist me
while I'm on the line trying to fill the hopper and do all this stuff. I don't have time to jump to the HMI and be like, click, click, click, let me look; I don't have time. But if I could just ask this thing and get some kind of inference out of it, in terms of what's going on, to clue me in on where I should look, I think those are the early use cases that could be really interesting.
Willem (32:50)
So we’re
not going to make ChatGPT closed-loop applications.
Aron Semle (32:55)
I would not. All my experimentation with it, I would not. Like nine times out of
David (32:58)
You
Aron Semle (32:59)
10, you get a great answer, and then you get like a cat meme, and you're like, right, right. I don't think that's reliable enough yet. Yep.
Willem (33:02)
Your PLC says does not compute.
David (33:11)
Although a cat meme from time to time in operations might also just… Yeah, on your HMI.
Willem (33:18)
On your HMI, on your HMI. Like everything’s
Aron Semle (33:18)
Maybe that’s what you need. Yeah.
Willem (33:21)
going to hell. Here's a cat meme to deal with the stress. I don't know how you would sell the business case, but it might sell to the internal user.
Aron Semle (33:22)
Right.
Yeah, yeah, that’s awesome.
David (33:33)
At least we had fun while the thing exploded.
Aron Semle (33:36)
Right.
Willem (33:38)
Just another question. Like you said, 80% of our data is time series data. I'm also expecting that we're going to see more and more variety in the types of data; I'm hearing that lots of use cases even include video, audio, other types of data that are not only time series based. How do you deal with that in the data ops space,
and in general, how do you contextualize those kinds of things?
Aron Semle (34:11)
Yeah, so we do a lot of use cases with images, you know, for AI and image recognition quality stuff. Typically in data ops there are two ways you can do it. You can either move the data as binary, you know, up to S3 or Azure Blob or whatever mechanism. The other thing that's pretty common is you base64-encode both images and video and send or stream that over HTTP, or sometimes MQTT as well. So there are different ways to do it,
and when you base64-encode it, you can start to embed other metadata. Think of it as a JSON payload: the file is base64-encoded, but there's metadata around it, like what line it came off of, timestamps, that kind of thing. So there are different ways to slice and dice it.
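[Editor's note: a minimal sketch of the second pattern Aron describes, base64-encoding an image and wrapping it in a JSON payload with production context. The file path and metadata field names are illustrative assumptions, not a HighByte schema.]

```python
import base64
import json
from datetime import datetime, timezone

def image_to_payload(path: str, line: str, camera: str) -> str:
    """Base64-encode an image file and wrap it in JSON with its production context."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return json.dumps({
        "line": line,                  # which production line the image came off of
        "camera": camera,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "content_type": "image/jpeg",
        "image_b64": encoded,          # the file itself, base64-encoded
    })

# The resulting JSON string could then be sent over HTTP or published to MQTT, e.g.:
# payload = image_to_payload("inspection_0042.jpg", "line-3", "cam-end-of-line")
```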
David (34:52)
That’s interesting because then I would say the image itself is stored probably somewhere at a cloud provider. Do you also see the other way around where you do the processing on the edge?
Aron Semle (35:04)
Yeah, yep. Depending on the infrastructure they have, you could also feed that to something local. And a lot of times that has more to do with fault tolerance. Like, you know, if the cloud went down, it doesn't happen often, but could you deal with it? If not, okay, you need to run it locally; and do you have the compute and the hardware to run it locally as well?
David (35:22)
One of the challenges I'm seeing, not just with this type of data but with, I would say, manufacturing data in general, is that most of the, let's just call them the cloud providers, so the big ones on the IT side, are not really used to holding manufacturing data. Well, you can store everything in blob storage, obviously, but making sensor data readily accessible is sometimes quite a challenge, I would say.
But I think what’s even more challenging is to also stream that metadata, the asset context, production context, MES data, et cetera, et cetera. So I’m just wondering, again, from your perspective, how is this right now? Is this changing? Do you see certain cloud providers moving towards some kind of a standardization like what we already have for so many years?
Let’s put it the other way around. I have the intention or no, sorry, not the intention, the feeling that we’re kind of reinventing the wheel today for something we kind of fixed 20 years ago in OT.
Aron Semle (36:34)
Yeah, I saw your talk with TwinThread on this. I think there is a challenge there, right? The cloud providers do store it in different formats. You are starting to see Parquet as a format for time series data emerge more; Amazon just added S3 Tables support, which is Parquet under the hood. That is a standard, but it's kind of like SQL: you can format it in Parquet, it's got a schema, but you can format it however you want. Still, you can open Parquet files.
It is a challenge. My only take is that the challenge we had in the factory, let's talk about OPC servers connecting to PLC protocols, that's hard, right? Writing PLC drivers is hard; I mean, I came from Kepware, I know it. For all the HMI vendors to go write top-quality PLC drivers to everything was almost impossible. So what emerged is the OPC standard, and companies like Kepware emerged, because that was a really challenging problem. Like if I spun off one of my guys,
or one of my engineers, and said, go write a ControlLogix driver, that person's gone for six months to a year, even if they're efficient, right? It's hard. Whereas when you get up to the layer of, we've got contextualized data in the cloud but it's in slightly different formats, it's not as big of a challenge to write a parser and be able to manipulate that. And I think that's part of why a standard doesn't exist yet today. That doesn't mean there isn't a need for one, and I totally agree that a standard would benefit
that space, but I think that might be the reason we are where we are today. And competition among the cloud providers as well.
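[Editor's note: for readers less familiar with Parquet, here is a minimal sketch of writing a small batch of time-series readings to a Parquet file with pandas (using the pyarrow engine). The column names are illustrative; there is no standard schema, which is exactly Aron's point.]

```python
import pandas as pd  # assumes pandas and pyarrow are installed

# Illustrative time-series batch: each row is one sensor reading with its context.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-15T08:00:00Z",
        "2024-01-15T08:00:01Z",
        "2024-01-15T08:00:02Z",
    ]),
    "site": ["portland"] * 3,
    "asset": ["filler-02"] * 3,
    "tag": ["temperature_c"] * 3,
    "value": [72.4, 72.6, 72.5],
})

# Columnar, compressed, and readable by most cloud analytics engines.
df.to_parquet("filler-02_readings.parquet", engine="pyarrow", index=False)

# Reading it back is just as simple:
print(pd.read_parquet("filler-02_readings.parquet").head())
```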
David (38:05)
Yeah. Yeah.
Willem (38:09)
I have one last question before we wrap this up. You work with OT, you work with IT. I’m curious about your experiences in working with IT data teams, because usually they come from an environment working more with relational data, ERP data, and their tools and mindset are really centered around that, because that’s where most use cases for them are. What are the difficulties in working with them when you’re working in manufacturing?
and how do you evolve that relationship?
Aron Semle (38:42)
Yeah, the first challenge is you have to get all their lingo, right? They're like: data lake, data lakehouse, data mesh. You have to go ChatGPT or Google all of that to try to understand what they're talking about. There are a lot of terms they throw around on the IT data side that are, for some reason, water related. Like, I don't know, all the analogies. But typically,
David (38:53)
You
water related pipelines.
Willem (39:01)
Data flows, it’s the new oil.
Aron Semle (39:07)
I mean, really smart folks, right? The best thing to do is just try to deliver the data in the format they're most familiar with. Maybe they want it through Kinesis Data Streams where they can process it with, I think pandas is one of the technologies, but whatever they're most familiar with, you're better off trying to meet them there than forcing them down into the factory to deal with, you know, the intricacies that we have. So I think a lot of the time we're dealing less with IT data groups and more with business units that have IT data folks on them.
So being able to, maybe it's Parquet, but being able to deliver it in the format and the location they're familiar with, in their cloud storage mechanism, and not creating a one-off new technology for them to learn. I think the other thing, from our product perspective, is we're divided straight down the middle between OT and IT. When you're on the floor and logged into the system, you can browse tags and stuff, you can drag and drop, all the OT things we're familiar with. And the surface layer and capabilities we expose toward IT are things IT is really familiar with: being able to containerize us and deploy us at scale,
manage state, observe hundreds of running instances and be notified. So we play really well with both of those groups, and I think that's really important in the industrial data ops ecosystem. But specifically with the data, you're best off just asking them, what do you want, and then delivering it in that format and dealing with all the transformation and hardness on our side.
Willem (40:25)
I want 100,000 tags with one second resolution in my data lake, data house, data pool thing.
Aron Semle (40:32)
Right,
right. The answer to that is: no, you don't. You actually don't, but you don't know it yet. Yeah.
David (40:34)
We should, we should, yeah, we should intervene. We should. Why don't we
Willem (40:37)
Yeah, let me show you the bill.
David (40:41)
introduce the data jacuzzi or something like that, you know? Hey, Aron, it was really great having you on the show. That's a wrap for this episode of the IT/OT Insider Podcast, where we again explored how to make industrial data work for us. So thank you again, Aron and HighByte,
Aron Semle (40:43)
Right, right.
David (41:01)
for sharing your insights, and to you, our listeners, for tuning in. If you enjoyed the episode, don't forget to subscribe at itotinsight.com and leave a rating. Yeah, and see you next time for more insights on bridging IT and OT. Until then, take care. Bye-bye.