
Willem (00:00)
Hi David!

David (00:01)
Hello Willem.

Willem (00:03)
So, who did you bring today?

David (00:05)
Another David.

Willem (00:07)
Okay, this is gonna be confusing.

David (00:09)
So David Rogers, thank you for joining us. David is not just another IT guy. He combines experience from the manufacturing world, for example at Boeing and Sight Machine, with a strong background in AI. And he’s now the senior solutions architect at Databricks focusing on their manufacturing customers. So yeah, David, welcome to the show.

David Rogers (00:13)
Thanks for having me. Yeah, it’s a pleasure. Big fan of your work. Just really happy to be on here with you all and talk a little bit about what’s going on in the manufacturing space and all things data and AI.

David (00:47)
Yeah, and we are looking forward to this as well. I have to say, Databricks is currently all over the world; we come across Databricks probably every day. But for those who don’t know what Databricks is, can we start with just a short introduction? What is Databricks? What is your role there? What do they do today?

David Rogers (01:10)
Excellent. Yes, Databricks is a cloud data platform that sits on top of all the major hyperscalers, so Google Cloud, Azure, and AWS. What we provide is an integrated toolset to do any sort of data and AI workload. That can be ETL, extract, transform, load. It can be generative AI with large language models or AI agents, or classical ML around

linear regression, support vector machines, classification algorithms, you name it. So it’s all integrated tooling with integrated governance, able to scale very efficiently using the cloud. That’s the Databricks technology, and I’m looking forward to telling you a little bit more about what we do in the manufacturing space. I come from a background in manufacturing, applying AI in the manufacturing space; you listed off some of the places I’ve been before.

That’s what I bring to Databricks today: I lead our industry solutions. So we’re thinking about all parts of a manufacturing enterprise. That could be in the supply chain, it could be in the factory in the operations domain, or it could be in the customer realm and the customer experience: you have equipment and machinery performing in the field, and you need to understand how it’s performing and manage it through its life cycle. For all of these sorts of things, a manufacturer has data coming at them from

David (02:31)
Mm-hmm.

David Rogers (02:35)
all different sides, and they need to be able to process it, manage it, and govern it. That governance aspect is very, very important, and we’ll talk a little bit more about that here today.

David (02:45)
Yeah, sure.

Willem (02:47)
Before we get into the exciting topic of data governance... just kidding. Now, I’m hearing that you work in a manufacturing team, or a manufacturing industry group. What sets it apart from the other groups? Because I’m sure they’re all saying, no, for us, we have lots of data, data governance is important, and we’re unique. What sets the manufacturing group apart?

David Rogers (02:59)
Yes.

David Rogers (03:11)
Yeah, really just the way we deal with the physical world, right? Manufacturers have these physical assets. So for us, there are a lot of different partners in the ecosystem that are really key to helping us reach where the data is being generated. It’s generated in the physical world somewhere, and it’s very difficult to process it where it’s being generated. So bringing that data into Databricks in the cloud

can really help drive efficiencies there. Some of the other teams across Databricks maybe work with a fully digital substrate, so it’s a lot easier for them to bring in data within their ecosystem. In ours, we’re excited to be at Hannover Messe this year and have our partners come with us on this journey to realize this physical AI.

David (04:06)
Could we follow the bits and bytes? Let’s assume we’re on the shop floor somewhere; maybe you can share an example. I would say many people in our audience are probably aware of SCADA systems and the like on the shop floor. I’d love to follow the bits and the bytes from the shop floor until they end up…

Maybe not even in Databricks, but in a data product somewhere. Could you walk us through a typical story?

David Rogers (04:42)
Yeah, absolutely. In a factory context, there’s a lot of data being generated, and it oftentimes stays very local to that factory. There are many reasons why, but the starting point is that they have acute problems. There’s a stop in production, or a quality issue with product escapes or things they’re finding out about. So they’re constantly triaging these problems. And the only way you can triage them

is by looking at the SCADA system data, the data historians, and the other sorts of inspection equipment that have been collecting data on the last n products that went through. That’s what process engineers and factory managers are looking at every single day to keep that factory running, hitting the appropriate yield, and really improving their performance over time.

All of that data is sometimes trapped, sometimes forgotten about; unfortunately, it gets deleted or the storage runs out. But when you start to think about the AI age we’re living in now, it’s that long context of history, all the different products you’ve run in your factory, all the changes you’ve made to your line, all the maintenance procedures and actions you’ve taken. All of that is data. It can be natural-language text, it can be time series data.

It can be image data, it can be video data; every data type you can imagine is being generated in a factory. So the cloud is a way to process this very, very efficiently, and that’s really where we come in with our technology. But there is a long journey to track the bits that are generated at the source, on the machinery, on a PLC, and get them all the way to the cloud.

I’m happy to take you through that in more detail. But I think it’s really important to think about the broader estate of what data is, and then think about how to use that data more effectively to drive whatever outcome you’re looking for in the factory.

David (06:47)
That’s interesting. We touched on that in a previous episode of this series as well, the fact that you mention this multimodal data, or multiple types of data. In a traditional manufacturing data system, or whatever you want to call it, we typically work with sensor data, maybe some manual inputs, maybe some transactional data that comes out of SAP, et cetera.

You clearly mention other types of data as well. In your experience, is that more of an add-on, or are there really very interesting use cases to be developed using these types of data we’re less used to, or less capable of working with?

David Rogers (07:39)
Yeah, let’s take video, maybe the hardest data set. Oftentimes factory environments have safety concerns you have to be aware of, and you wear PPE, personal protective equipment. Being able to automatically detect that from CCTV, and then bar you from entering that work cell or that place in the factory if you’re not wearing it, that is something real. Factories are doing it today. They’re processing that data

in Databricks. And that is really the thing that matters most, right? The health and wellbeing of the workforce. So I think that’s a good one: you don’t think of it as, hey, I need CCTV data to run my factory, but you really do, because you need your workers to be safe in their environment so that they can do their job efficiently and effectively. So that’s a big one,

one of those data sets you don’t really consider. Of course, everyone is trying to use the actual sensor data, but there are a lot of other contextual problems around it that the industry is trying to solve in order to make sense of it for AI and ML systems, which then provide value back to the workforce so it can take action quicker.

David (08:45)
Yeah.

Willem (09:03)
You also mentioned that you guys are really strong in everything that’s AI. Now, to most people that’s still ChatGPT. I guess you probably have a more nuanced view on what it means in a manufacturing context, because I once tried to ask ChatGPT, can you run a plant for me? And it politely declined. So maybe, do you have some insights on that?

David Rogers (09:12)
Yeah, absolutely.

Yeah, AI is an amazing term; it conjures up ChatGPT and humanoid robots at this point, all sorts of different stuff. But really, AI is a set of technologies that allows a machine to learn some set of patterns in the data or in the way the equipment is operating. So this can be

David (09:40)
Yeah.

David Rogers (09:58)
regression-type algorithms, where you determine that certain variables occurring in your factory produce certain outcomes. Very explainable, doing a relatively simple optimization. Then you get into other types of AI algorithms, the neural networks of the world, that are able to detect something in an image. So you’re able to

create a bounding box or a segmentation to say there’s a scratch on this part. And then you start getting into language models that can help reference maintenance manuals and things like that, where you’re trying to repair a machine but you have gloves on, glasses, PPE. So you want something you can just talk to, and it tells you back: look under this thing, then turn that switch off so everything shuts down and you can go work on the machine.

Each of these is a different AI approach, but it all sits under that umbrella of AI. So when someone tells you “AI for manufacturing”, really start by thinking about what actual problem you’re trying to solve, and then you can start thinking about which AI technique or approach will help you solve it.
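
Editor’s note: to make the first category a bit more tangible, here is a minimal sketch of a regression-type approach in Python. The process variables, the data, and the scrap-rate target are entirely made up for illustration; a real project would pull them from contextualized factory data rather than random numbers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical process data: three controllable variables and an observed
# scrap rate per production run.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(200, 3))  # columns: temperature, pressure, line_speed (scaled)
scrap_rate = 0.05 + 0.02 * X[:, 0] - 0.01 * X[:, 2] + rng.normal(scale=0.005, size=200)

model = LinearRegression().fit(X, scrap_rate)

# Explainability: the coefficients say how much each variable moves the scrap rate.
print(dict(zip(["temperature", "pressure", "line_speed"], model.coef_)))

# "Relatively simple optimization": predict scrap for a proposed setpoint.
proposed_setpoint = np.array([[0.5, 0.0, -1.0]])
print(model.predict(proposed_setpoint))
```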

Willem (11:18)
So it’s more like a menu of techniques, solutions, and technologies that you’re going to use. What they probably all have in common is having access to the data and making sure it’s complete and understandable. That brings me back a bit to the topic of data management and data governance, to be able to do those things. How do you usually see that happening when you enter a project? Do you have, like…

David Rogers (11:33)
Yes.

Willem (11:45)
a wonderful data set, all labeled, complete, high quality, and then you’re asked to do the work? Or do you just get a bunch of incomplete data with errors and you’re asked to perform miracles?

David Rogers (11:58)
Yeah, you get a little bit of everything. I think the key is really thinking about what outcome you’re trying to achieve, and then making sure you’re able to align the data against it. You don’t want to pursue a project where, yes, the AI techniques to solve it are available, but the data isn’t, and vice versa: maybe the data is available, but there isn’t actually an algorithm for solving it.

David (12:00)
Hmm.

David Rogers (12:28)
We’re very pragmatic at Databricks about making sure our customers are using our product to drive outcomes in their business. So we work with them to think about where their digital manufacturing maturity is, to help them say, okay, for this set of data, you’re using a UNS, a Unified Namespace, to capture the context of that data before you bring it into the cloud.

So when you have really smart data scientists and data engineers working on this data in the cloud, they’re able to understand what is actually happening in the manufacturing process and put the pieces back together. If you don’t have that context, that UNS or ISA-95 or other way of tagging the data with where it originated, what product was running on the line, which work cell, which machine, et cetera, then you’ll never be able to put that puzzle back together, because

when we think about manufacturing, it’s a line, right? You have cause and effect upstream and downstream of whatever operation. And in order to train any AI or ML algorithm, you want to be able to trace back from where the data was generated to where the quality issue or the degradation in performance happened. In Databricks, in the governance layer, we can govern each piece of data at scale.

And then we can put together the lineage of this data, so we can understand what transformations happened on it, what it was joined with from the MES or the ERP system about what product was running, and really bring that rich context. That then allows a data scientist or AI practitioner to reason holistically about what’s happening in the manufacturing process and devise the algorithm. So that

greatly increases their chances of success in pulling off a data product project within their factories. And it’s something we’re uniquely able to deliver, alongside all of our other tooling and capabilities within the Databricks platform.
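
Editor’s note: to make the “putting the puzzle back together” idea concrete, here is a minimal PySpark sketch of the kind of contextualizing join described above: raw historian readings enriched with MES order context by machine and time window. The table names, columns, and schema are hypothetical illustrations, not a prescribed Databricks data model.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already defined in a Databricks notebook

# Hypothetical tables: raw historian/SCADA readings and MES production orders.
sensor = spark.read.table("plant_bronze.historian_readings")  # machine_id, ts, tag, value
orders = spark.read.table("plant_silver.mes_orders")          # machine_id, product_id, work_cell, start_ts, end_ts

# Attach production context to every reading: which product and work cell
# were active on that machine at the moment the value was recorded.
contextualized = (
    sensor.join(
        orders,
        on=[
            sensor.machine_id == orders.machine_id,
            sensor.ts.between(orders.start_ts, orders.end_ts),
        ],
        how="left",
    )
    .select(sensor["*"], orders.product_id, orders.work_cell)
)

# Persist as a governed table so lineage back to both source tables is tracked.
contextualized.write.mode("overwrite").saveAsTable("plant_silver.readings_with_context")
```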

David (14:39)
I’d like to understand a bit, let’s say, the balance around where you actually have your master data and at which level. You also have a product, I believe it’s called Unity Catalog. So what’s the best practice, in your opinion?

David Rogers (14:52)
Yes.

Yep. So Databricks is a cloud-based technology. We’ve open-sourced the Unity Catalog governance framework; that is our core governance framework within the Databricks platform. It relies on cloud constructs to govern the data once you get it into object storage within the cloud. For those unfamiliar, that’s AWS S3, Azure Data Lake Storage Gen2, or Google Cloud Storage. Once you get it there, however you get it there,

we have many, many great partners in the space that can help. But you just get it there with all the context. Whether it’s JSON, XML, whatever the payload may look like, it doesn’t matter; you can just land it with the right context in object storage. And from there, we’re able to govern it fully through its lifecycle in the cloud. As for edge governance and how some of that works, it’s really case by case.

To my knowledge, and maybe you can help me here, I haven’t seen full end-to-end governance at the edge. There are certainly a lot of great industrial DataOps solutions coming to market that are very smart on AI, ML, and UNS and bringing all of this together. But governance is something that maybe gets overlooked; it was overlooked in the cloud layer for a long time.

David (16:03)
No.

David Rogers (16:21)
It’s going to take a little while before it makes it to the edge. Someone very smart, whether it’s a company or some group of people, will figure out what that governance framework looks like for an edge environment, or specifically a factory environment, and get the community to rally around it. And certainly, with Unity Catalog being open source, you can connect into that at the cloud layer.
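
Editor’s note: for a flavor of “just land it in object storage with the right context”, here is a minimal sketch using Databricks Auto Loader to pick up JSON payloads from a cloud bucket and write them into a Unity Catalog-governed Delta table. The bucket paths, table name, and payload shape are hypothetical, and the exact options will depend on your cloud and workspace setup.

```python
from pyspark.sql import functions as F

# Assumes a Databricks runtime, where `spark` is predefined and Auto Loader
# ("cloudFiles") is available. Paths and table names are hypothetical.
landing_path = "s3://acme-plant-landing/historian/json/"

raw = (
    spark.readStream.format("cloudFiles")                      # Databricks Auto Loader
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3://acme-plant-landing/_schemas/historian/")
    .load(landing_path)
    .withColumn("ingested_at", F.current_timestamp())
    .withColumn("source_file", F.col("_metadata.file_path"))   # file metadata on recent runtimes
)

# Write into a Unity Catalog table (catalog.schema.table) so access control,
# auditing, and lineage apply from the moment the data lands.
(
    raw.writeStream
    .option("checkpointLocation", "s3://acme-plant-landing/_checkpoints/historian/")
    .trigger(availableNow=True)                                # process what's there, then stop
    .toTable("plant_bronze.historian.raw_json")
)
```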

David (16:47)
Yeah, I know you have a question to ask, but I just want to say one thing to David first. You mentioned that governance has only recently been adopted on the IT and cloud side.

Willem (16:47)
Okay.

Yeah.

David (17:02)
Isn’t that maybe one of the reasons? In OT we see a lot of differences: differences between lines, between assets, between the ages of those assets. In general, I would say it seems to be rather difficult.

David Rogers (17:21)
Yes, absolutely. You have equipment from many, many different vendors across the ecosystem, and their focus is on that mechanical and electrical process, right? The equipment itself needs to be very, very good at that. The way it governs data is probably an afterthought. I haven’t worked at a machine builder myself, so I can’t speak

fully for them, but my guess is it’s probably more of an afterthought. And going back to that earlier question about what sets manufacturing apart from other businesses within Databricks: when you’re dealing with OT or other non-cloud data sets, you really have to think about bringing governance to the data as soon as possible.

David (17:53)
Yeah.

David Rogers (18:17)
When you live entirely in the cloud, you can govern all of this from the moment it originates, because the origination happens and it’s brought straight into object storage. For the physical world of manufacturing, we can add to our lineage graph the additional things that were happening on the OT side, or outside the cloud. So that’s something pretty unique about our technology.

And you can also federate down to your on-prem systems as well. There are a lot of unique capabilities at Databricks for how we do this. We can govern not just the data in the cloud, but also the connections to the various systems that may be living in your factory environment.

Willem (19:09)
How would you define data governance? Because I have a clear idea of what somebody means when they say data lineage; I can understand that, it’s easy to explain. Data governance sounds a bit fluffy. What would it look like in practice? And you even mentioned federated governance. So is there central governance, decentral governance? Have you seen examples where it works?

David Rogers (19:34)
Yeah, absolutely. This is something I work on with the largest enterprises, the largest manufacturers on earth. It was a very difficult problem for some of the earlier governance technologies, and now with Unity Catalog you’re really able to orchestrate it. When you ask what governance is, there are many core elements to it. I think number one is access control. A lot of the time, manufacturers are working with

suppliers, outside vendors, and customers. They have PII data and certain contractual requirements on how to use data, and you have to respect all of those. It’s very hard to prove that if you’re not using a technology like Unity Catalog. With that governance it’s provable that this person accessed it; it follows zero trust; you’re able to audit every single access; you’re able to

review all the permissions granted on every single piece of data. And when I say data, I don’t just mean a table. I mean images, time series data, the machine learning models themselves. All of that is also data, and we can govern all of it through the same access and permission hierarchy. And then you want to be able to share data very securely. You want to be able to…

drive, again, the auditing I mentioned; all these core capabilities you can prove to your partners, vendors, and customers. The other thing I was going to go into is masking of PII. You have sensitive data, you have data sovereignty: you want to make sure data resides in the specific regions it was generated in, per the rules and regulations of certain regions of the world. All of these sorts of things are what I mean when I say governance. You need a catalog able to address every single one of them,

David (21:14)
Mm-hmm.

David Rogers (21:26)
so the right piece of information and the right piece of data is only seen by the right set of eyes, and it’s provable that it was only seen by that set of eyes.
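
Editor’s note: as a rough illustration of what access control, PII masking, and auditing can look like in Unity Catalog, here is a minimal sketch of the kind of statements involved, issued from a notebook via spark.sql. All catalog, schema, table, column, and group names are hypothetical, and the exact syntax should be checked against the current Databricks documentation.

```python
# Assumes a Databricks notebook on a Unity Catalog-enabled workspace, where
# `spark` is predefined. All object and group names below are hypothetical.

# 1. Access control: only the quality engineering group may read this table.
spark.sql("""
  GRANT SELECT ON TABLE plant.quality.inspection_results TO `quality-engineers`
""")

# 2. PII masking: operator names are hidden from everyone outside HR.
spark.sql("""
  CREATE OR REPLACE FUNCTION plant.quality.mask_operator(name STRING)
  RETURNS STRING
  RETURN CASE WHEN is_account_group_member('hr-admins') THEN name ELSE '***' END
""")
spark.sql("""
  ALTER TABLE plant.quality.inspection_results
  ALTER COLUMN operator_name SET MASK plant.quality.mask_operator
""")

# 3. Auditing: recent governance events from the built-in audit system table.
spark.sql("""
  SELECT event_time, user_identity.email, action_name
  FROM system.access.audit
  WHERE service_name = 'unityCatalog'
  ORDER BY event_time DESC
  LIMIT 20
""").show(truncate=False)
```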

David (21:35)
That’s certainly not an easy problem to solve. I’d like to take this one step further, into scaling, scaling of, well, the thing people used to call big data, maybe. But anyway, moving beyond a proof of concept. So David, if you allow me, I’m going to play devil’s advocate here. There is this consensus, or at least a lot of people say, yeah, we now understand that just

dumping all the data we have into the cloud is probably not the best idea. On the other hand, other people used to say, or still say, that we still need some kind of central data repository in order to bring all these data products to our end users. So I’d love to understand how you tackle the big data challenge at Databricks.

David Rogers (22:32)
I think the biggest thing goes back to my earlier point about being very pragmatic. What I love about working with manufacturers is that they have problems to solve that cost millions and sometimes billions of dollars. So you want to be very pragmatic about it. When I’m advising our customers, I’m really helping them think about what their core production processes are.

Many manufacturers specialize in very specific ways; they have intellectual property around that. You’re going to want to collect all the data for those processes, because that is core to your business. And from there, as you’re optimizing and improving the performance of that process, you have all the right data available. Now, if you have non-core processes that aren’t contributing to defects or downtime or whatever, equipment that

isn’t super costly or super impactful to your business, you probably don’t need to collect that data, right? But a lot of times you don’t know that a machine was critical. People often skip the step of being very thoughtful about their strategy: which data, which machinery, which processes are critical to their business, and which are the ones where, if the upstream isn’t also collected,

they won’t be able to understand the downstream. We hear a lot about feedstock optimization, and to get very deep here: if you have different feedstocks from different suppliers, you want to collect all of the quality information from your suppliers. The only way to do that very scalably today is through Databricks, with Delta Sharing and Unity Catalog, sharing that data at scale. Now you’re able to bring that into your QA processes, and then go into, say,

doing some value-added operation on top of that feedstock to produce either a finished good or an intermediate good. Now you have enough data to say, okay, if I get this set of feedstock, I need to set the controls to X, Y, and Z to produce this amount of yield at the other end. And you’re able to have a machine learning model, getting to the AI, that can now understand cause and effect from feedstock plus machinery controls to outcomes.

That’s how you piece together the puzzle. And it only gets done when you’re thinking holistically about how you make your product and collecting the data for each key input or set of parameters. So don’t make a data swamp, but be thoughtful about the way your business works. Honestly, the easiest way is usually when you’re creating a new process.

Especially in the automotive industry, there’s a lot going on around batteries. The battery is changing rapidly: there’s all this experimentation with chemistries, you name it, but there are struggles with yield, casings, and things like that. So think, when you’re ramping that production, about how you’re instrumenting the data to understand the core processes for making that battery. That way you’re going to improve your

David (25:34)
Yeah.

David Rogers (25:51)
commissioning, reduce your commissioning time so you get a faster time to yield, and then you’ll be able to understand it as you tinker with the process and improve it along the way. New processes and new product introductions are great opportunities to think through a data-first mindset, because that’s going to give you a competitive advantage in how you bring that product to market.
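
Editor’s note: as a small illustration of the supplier-data part of this story, here is a minimal sketch using the open-source delta-sharing Python connector to read quality data a supplier has shared and join it with internal process data. The profile file, share, table, and column names are hypothetical, and whether you use the pandas or Spark reader depends on data volume.

```python
import delta_sharing

# Hypothetical share profile distributed by the supplier (endpoint + credentials).
profile = "/dbfs/FileStore/shares/supplier_acme.share"

# "<share>.<schema>.<table>" inside the supplier's Delta Share.
table_url = profile + "#acme_quality.lab.feedstock_coa"

# Small table: load the certificates of analysis straight into pandas.
feedstock_quality = delta_sharing.load_as_pandas(table_url)
print(feedstock_quality.head())

# For larger volumes, read the same share as a Spark DataFrame (requires the
# delta-sharing Spark connector; `spark` is predefined on Databricks) and join
# it with internal process data keyed on the feedstock lot number.
feedstock_sdf = delta_sharing.load_as_spark(table_url)
process_sdf = spark.read.table("plant_silver.process_runs")   # hypothetical internal table
runs_with_feedstock = process_sdf.join(feedstock_sdf, on="lot_number", how="left")
```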

Willem (26:15)
Another aspect of scaling, not so much on the data side: I’m often wondering what makes the most sense in your experience. Where do you see the most value? Is it those very specific analyses of the physical reality, combined with the right techniques and the right data, but of course very niche, maybe even limited to one production line? Or

do you see more value in deploying more generalized solutions that might not be the perfect fit, but might be just good enough, and are easily rolled out to, I don’t know, the whole company? How do you see the balance between those two? What do you get called in for most often?

David Rogers (27:02)
I think it’s important to think about the problem in terms of organizational capability. You want to have a repeatable way to solve whatever your problems may be. Yes, you can start with high-value ones, or you can start with low-value but easy-to-implement ones; there are all these different ways you can think about it. But what I really think is important right now, in 2025, is that we’ve been doing this with digital technologies for a decade now.

These digital technologies are real, tried and true. And with technologies like Databricks in particular, we have manufacturers operating at enormous scale on the platform, and those are the ones who have figured out this organizational capability aspect. For us, we’re guiding them: here are the opportunities we’re seeing that are of the highest value. But it’s really important for them to know how to deliver on that repeatably,

whether that’s through their own workforce or by scaling through their partners of choice to help them implement. We want to make sure we have the technology platform and substrate that allows them to solve whatever problem they have that can be solved with data and AI. So that’s where I’ll leave it on that topic. It’s such a personal conversation when we go in on

what the client is working toward or what their problems may be. I’m very much back to being pragmatic: let’s make sure we’re solving a problem that builds that larger organizational capability. That becomes a repeatable way for them to adopt the Databricks technology, of course, but also, down in their OT technologies, the way they need to work moving forward to be successful.

David (28:58)
From the end user perspective, so not from the Databricks side but from the end user side, what are the capabilities an end user should have in their organization to make these projects successful? Because something we often read is that lots of transformation projects in general, not only AI,

tend to fail, and I believe that’s mostly not because of the technology; it’s mostly about how an organization behaves. So what should an organization be capable of?

David Rogers (29:38)
Yeah, I think a lot about finding your core team that’s going to be able to go to the factory and also to, we’ll say, the IT world, right? I know you all work on IT/OT convergence; it’s really a communication problem. We can solve part of that in the Databricks context, in the cloud layer: we have collaboration tools, you can

collaborate across the world on the same exact notebook and the same exact data through the governance layer, all that good stuff, right? But nothing replaces showing up where the value is created, on the factory floor. So I think a lot about the Toyota Production System type principles: go where the value is created, be a part of that manufacturing community. And then, vice versa, the manufacturing and OT community also needs to

recognize the talents of the IT world. Sometimes that just breaks down, way too often, and that’s the number one thing. Every time I’ve given that advice and really seen a senior leader within a manufacturing enterprise take it to heart and actually operationalize it, I’ve seen rampant success a year later, every single time. But don’t try to

say we’re going to solve all problems, we’re going to scale this out to our factory network of hundreds of plants. Be very thoughtful; go back to: how do we make money as a manufacturer? Where are our core processes? What do we have to be the best at to remain competitive? Start there, get your team together, and then you can scale across the network. But if you try to boil the ocean, so to speak,

then you will struggle and you’ll end up in that failure category. So that’s how I do it. And then it becomes the path of least resistance: after people see, hey, this factory was able to think about it this way, they had success, and it was actually enjoyable, not such a burden, then other factories start to pick it up. You get that cross-pollination across your network: you

scale the team that started on that first one, bring them to the next one, and the next one. Slowly but surely you’re building that new capability, this new digital and AI capability, within your OT environment. And your IT folks get a much better appreciation for the challenges in the factory environment; I think that’s often missed. So find your IT leaders, your talented data scientists and data engineers who are willing to go to the factory,

and empower them to do that, reward them for doing that. That’s going to be huge for connecting them to, again, where the value is created. It’s not necessarily created in the IT world, sometimes it is, with service offerings and other things, but a lot of the time it’s actually created in the factory, so you need to connect the business back into that place.

David (32:48)
Do you dare to make a crystal ball statement? What is going to happen in 2025? Is the domain maturing? Will it be about DataOps? Any thoughts?

David Rogers (33:01)
I’m not one for too many predictions. The AI world is moving fast enough; keeping up with that is plenty. I think it’s really this growing community of practitioners converging the two worlds, and that continues to mature. We’re at Hannover, and the digital halls will probably be

David (33:05)
Ha

Yeah.

David Rogers (33:26)
bursting out the doors. Last year it was already jam-packed; I’m sure this year there may as well be digital technologies out on the sidewalk, right? It’s going to be like that all over again. So it’s really about this community, bringing it together to continue the momentum we have. That’s what I’ll say is going to keep happening. And through publications like yours, it’s really going to continue to

build that community and showcase: here’s how you actually do it. And then it starts to become just how we do business in manufacturing moving forward.

David (34:07)
Even on the sidewalk! We will be visiting your booth in a couple of weeks. Hey David, I think that’s a wrap for this super insightful episode of the IT/OT Insider podcast, where we again explored how to make industrial data work for us, with a specific focus on AI. So David Rogers, thank you so much for joining us.

David Rogers (34:11)
Yeah, yeah, yeah.

Thanks for having me. I look forward to seeing everyone in Hannover. We’ll be over in the Microsoft and AWS booths; they’re big partners of ours. And we’ll also be on stage with SAP and Litmus, speaking about our partnerships with both of them. So really looking forward to seeing you all there, and thanks for having me.

David (34:54)
Thank you so much. And to our listeners, thank you for tuning in. If you enjoyed the conversation, don’t forget to subscribe at itotinsider.com, and see you next time for more insights on bringing IT and OT together. Until then, take care. Bye-bye.