
The AI Fundamentalists
A podcast about the fundamentals of safe and resilient modeling systems behind the AI that impacts our lives and our businesses.
AI governance: Building smarter AI agents from the fundamentals, part 4
Sid Mangalik and Andrew Clark explore the unique governance challenges of agentic AI systems, highlighting the compounding error rates, security risks, and hidden costs that organizations must address when implementing multi-step AI processes.
Show notes:
• Agentic AI systems require governance at every step: perception, reasoning, action, and learning
• Error rates compound dramatically in multi-step processes - a model that is 90% accurate at each step is only about 65% accurate over four steps
• Two-way information flow creates new security and confidentiality vulnerabilities. For example, targeted prompting to improve awareness comes at the cost of performance. (arXiv, May 24, 2025)
• Traditional governance approaches are insufficient for the complexity of agentic systems
• Organizations must implement granular monitoring, logging, and validation for each component
• Human-in-the-loop oversight is not a substitute for robust governance frameworks
• The true cost of agentic systems includes governance overhead, monitoring tools, and human expertise
Make sure you check out Part 1: Mechanism design, Part 2: Utility functions, and Part 3: Linear programming. If you're building agentic AI systems, we'd love to hear your questions and experiences. Contact us.
What we're reading:
- We took a reading "break" this episode to celebrate Sid! This month, he successfully defended his Ph.D. thesis on "Psychological Health and Belief Measurement at Scale Through Language." Say congrats!
What did you think? Let us know.
Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:
- LinkedIn - Episode summaries, shares of cited articles, and more.
- YouTube - Was it something that we said? Good. Share your favorite quotes.
- Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
The AI Fundamentalists, a podcast about the fundamentals of safe and resilient modeling systems behind the AI that impacts our lives and our businesses. Here are your hosts, Andrew Clark and Sid Mangalik. Welcome everybody to part four in our series about agentic AI, building it from the ground up the right way. Before we begin the topic, though, I want to make sure that we celebrate today, on this recording of this episode: Sid Mangalik is now a PhD. Yay!
Speaker 2:Thank you, thank you. It's all very exciting.
Speaker 1:Yay! Before the call, I was asking you how it feels to defend, and then what does it feel like after that?
Speaker 2:I think if you ask most academics, they'll say the work never stops, and I've accumulated enough debt to my PhD that I'm going to be working for them for another good year. But it's feeling a lot better, and there's a lot of stress taken off being out of the uncertainty of it after defending.
Speaker 1:Yeah, I can imagine. That's awesome work. You have been helping our listeners, probably over the past two years really, understand the inner workings of AI through natural language processing and all the research that you've done. We know we've benefited greatly, and for everybody here trying to adopt better practices for the modeling systems that are going to carry their businesses forward, we're all beneficiaries of and appreciative of your work.
Speaker 2:It's been a lot of fun too, and if anyone who's an NLP fan is headed to ACL this year, let me know. I'll be there.
Speaker 3:Awesome. We should definitely get you some swag to take, some Fundamentalists swag.
Speaker 2:Oh, that's a good idea.
Speaker 3:Ooh, yeah, I like that. Also, I'd love the listeners to respond back. I think it'd be a great idea for a podcast episode, if Sid's game, to walk through some of his PhD work. He has a lot of great contributions, like what you'll be presenting at ACL, as an example, so I think that'd be a great episode too.
Speaker 1:Let's do it. Well, Sid, let's not delay this anymore for everyone. We've had a lot of information and ground to cover, so let's get to the wrap-up.
Speaker 2:Awesome. So here we are in our final chapter of our agentic AI seminar block, and now we're going to be talking about governing these models, and what the difference is between governing this travel agent model we've been building for the last few episodes versus what you're seeing now in the market, these LLM-based models. Just to give a bird's eye view: at its core, when we're talking about governance, we really want to think about how we challenge the assumptions of our models and address their limitations, and this is going to require us to do a lot of the standard model development lifecycle work, monitoring, and documentation that we do, and also enumerate all of our IT and security considerations. So think of governance as basically the full package of making sure that our model does what we said it would do when we set out to build it, and that it will continue to do what we expect it to do in the future.
Speaker 3:Definitely. And that's what I'm very excited to be providing soon: the full wrap-up of getting all the different pieces together from this fun journey we've been on, revisiting how we get back into mechanism design and how our utility system fits in, and we'll also be providing a mathematical specification around that. But today I really wanted to focus on the unique governance considerations on top of the bird's-eye view that Sid was mentioning. I also think we should do a deeper dive in another podcast, revisiting, or visiting for the first time, some of the critical blocking-and-tackling governance work. We're not wanting to trivialize the actual amount of work you normally want to be doing; there are a lot of really great practices that help make your modeling systems better, and the model development lifecycle and IT and security are a lot of work. So by no means are we saying those items aren't super critical and super exciting topics in themselves. But today, because of the agentic multi-step series we've been on, we want to wrap up with the additional governance considerations on top of the normal ones.
Speaker 3:So with agentic, instead of just one-step systems, you now have multi-step systems. Almost all the systems we've talked about up until this point on the podcast are one-step: predictive is one step, and even generative is one step. But now we're increasing the surface area immensely by having the multi-step work with agentics. Even within the same principle-based buckets of fairness, performance and security, it gets a lot more nuanced, and a lot of the additional things you might already be aware of from the MDLC become much more complicated and intertwined as we add these additional states into the system.
Speaker 2:And so let's tackle the first big question, which is basically: how are governance expectations different for agentic AI? If you're pitching me that these are intelligent systems that are fully autonomous, that can learn, adapt and act on their own, well, do we even need to govern them then? And the answer is obviously yes. If you need an example, just remember that humans need governance too, and humans are about as autonomous as we can get. So if we take humans to be the bare minimum of what governance looks like, let's build on top of that and think about some of the specific limitations of agentic AI systems that we need to actually catalog.
Speaker 2:So we're going to go through performance, security, agency, confidentiality and cost. On the performance front, agentic AI systems have to do multiple steps and multiple modelings through a decision-making process, and those errors compound. So even a really good model that is 90% accurate on each individual step becomes, over a four-step process, a model that's only correct about 65% of the time. So one of the considerations when governing agentic AI systems is making sure that we capture their performance through the entire lifecycle and then report final results.
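A quick back-of-the-envelope sketch of that compounding arithmetic, assuming four independent steps that are each 90% accurate (the step names below are just illustrative labels):

```python
# Minimal arithmetic sketch of compounding per-step accuracy in a multi-step agent.
# Assumes each step succeeds independently with the same probability.
per_step_accuracy = 0.90
steps = ["perceive", "reason", "act", "learn"]

end_to_end = per_step_accuracy ** len(steps)
print(f"Per-step accuracy: {per_step_accuracy:.0%}")
print(f"End-to-end accuracy over {len(steps)} steps: {end_to_end:.1%}")  # ~65.6%
```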
Speaker 3:And it's not just us saying this. This is what the academic studies are showing, and even if you go to OpenAI's or Anthropic's websites, they show you the single-step accuracy and the multi-step accuracy. How some of these systems are overfitting on the benchmarks is a topic for another day, but you'll see the performance drop off significantly because of these multiple steps. So, as we have since the start of this whole series, we're still questioning: do you always want an LLM as the reasoning component in an agentic system? What we're very much suggesting with some of these thought processes, and we'll get into more of the risks and governance attributes in a little bit where it falls down even more, is: why would you put that 90%-accurate step on something like data ingestion, on the state of "hey, I'm providing you my credit card number now"? That's a lot of potential for performance errors from dropping a digit or something, on top of all the security and other issues you might have with that.
Speaker 3:So it goes back to: why don't we use a solid UI for some of those steps to reduce that error rate? Because, as Sid mentioned, 0.9 times 0.9 times 0.9 times 0.9 is no longer 0.9, right, it's about 0.65. So it's another thing to consider. Just very simply, forget any of our biases about whether it's the right tool for the job and literally diagram out what happens to these performance ratios as a function of the number of steps you have. Honestly, I think it's closer to an exponential increase in error rate, but even being generous and calling it a linear increase, you're still in a potentially not ideal position.
Speaker 2:Absolutely, and you know, I think you keyed us up nicely for our next two concerns, which are security and confidentiality.
Speaker 2:Using these LLMs to do our agentic work for us, as I'm sure we're all aware, ends up putting us back in the situation we were talking about when we were all first getting ChatGPT, which is: hey, don't put client code in here.
Speaker 2:Hey, don't put your personal, important emails through the system. You have now opened up a new potential security vector, so how are you doing your due diligence and your governance to make sure that you're not leaking any of your information out? Likewise, by using these agents, you now have to handle the opposite case: clients are going to use your model, and someone's going to put their address in there, someone's going to put their credit card number in there, someone's going to put their social security number in there. Do you have the standards for data maintenance, cleansing and automation to handle this type of data being put into your database? This is now text given to you, and how do you make sure that you don't become a vector for releasing that data?
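Purely as an illustration of the kind of data-cleansing step being described, here is a naive sketch that scrubs obvious PII patterns from free text before it is stored or passed downstream. The regexes and labels are assumptions made up for the example and are nowhere near exhaustive; a real deployment would lean on data-classification policy, review, and purpose-built tooling.

```python
# A minimal sketch of scrubbing obvious PII patterns from free text before it is
# stored or handed to the next step. Patterns are illustrative, not exhaustive.
import re

PII_PATTERNS = {
    "ssn":         re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "email":       re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> str:
    """Replace anything matching a known PII pattern with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text

print(redact("My card is 4111 1111 1111 1111 and my SSN is 123-45-6789."))
```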
Speaker 3:And this is a great point as well.
Speaker 3:These are the second-order considerations, because traditionally we had single-step systems.
Speaker 3:You had to make sure your static database, which maybe had consumer data coming back in, was maintained and well done, but it was essentially a one-way transaction. You're making predictions on the information: take data in, make a prediction; take data in, make a prediction. Now we have feedback loops, with data commingling and sitting in there, so you essentially have a two-way street for some of this security-sensitive information, versus the always one-way street of a historical XGBoost or even a multi-layer deep neural network. Those only shoot information in one direction: they take in specific information, make a decision based on how they were trained, and put out a result. It's a one-way transaction, whereas with these systems you're making it two-way, again increasing the potential for security issues and error rates. And that's before the security ramifications Sid was mentioning that are specific to these GPT-based systems compared with a more basic system. Set those additional vulnerability areas aside, and we're still talking about a two-way street versus a one-way street when it comes to security.
Speaker 2:And then that leads us into the question of cost. I think we often brush over cost because we feel that, well, this stuff is pennies on pennies to run, but these costs absolutely compound, especially in agentic models where we need to use the model over and over and over again to do quote-unquote reasoning, repeated step processing, and refinement of prompts and questions to get our expected solutions. So, as part of your governance journey, do you have mechanisms for capturing the true cost of running these models, and do you have guardrails in place to prevent overspending? Can a client talk to your travel agent built with an LLM about, you know, what kind of car to buy, and end up burning your credits? Do you have mechanisms to make sure that your model is being used how you expect it to be used?
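As a rough illustration of the kind of spend guardrail being described, here is a minimal sketch. The per-token prices, the session budget, and the SpendGuard class are assumptions invented for the example, not any particular vendor's API.

```python
# A minimal sketch of a per-session spend guardrail around LLM calls.
# Prices and budget below are illustrative assumptions, not real rates.
class BudgetExceeded(Exception):
    pass

class SpendGuard:
    def __init__(self, max_usd_per_session: float):
        self.max_usd = max_usd_per_session
        self.spent_usd = 0.0

    def charge(self, prompt_tokens: int, completion_tokens: int,
               usd_per_1k_prompt: float = 0.01, usd_per_1k_completion: float = 0.03) -> float:
        """Accumulate the cost of one call and halt the session once the cap is hit."""
        cost = (prompt_tokens / 1000) * usd_per_1k_prompt \
             + (completion_tokens / 1000) * usd_per_1k_completion
        self.spent_usd += cost
        if self.spent_usd > self.max_usd:
            raise BudgetExceeded(
                f"session spend ${self.spent_usd:.4f} exceeds cap ${self.max_usd:.2f}")
        return cost

guard = SpendGuard(max_usd_per_session=0.50)
guard.charge(prompt_tokens=1200, completion_tokens=400)  # fine for now
# Repeated "reasoning" calls eventually trip the guard instead of silently overspending.
```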
Speaker 3:And then there's the other piece, one of the things very near and dear to our hearts on this podcast: you need to start baking in the governance cost as well, the human cost and the technology cost. Say you do want to be careful with your PII, so you add something like AWS guardrails if you're using Bedrock; that's an additional cost on top of some of these other systems. And with the increased surface area, now that we're on the two-way street and have the performance concerns, you're going to need more validation, more specific monitoring, more objective review, all of the less tangible spend beyond the very direct costs Sid is mentioning. The variable costs are direct: these systems cost money to run, full stop. But then you have the second-order costs of that additional surface area: governance man-hours, expertise, additional software to help keep it in line. All of those other costs need to be factored in as well. Definitely my economics bent coming back here, and I also have an undergrad in accounting, so that's coming back here too.
Speaker 3:All that additional money isn't being talked about. As we do this horse racing, you have to consider it holistically when you're asking, hey, can we use a utility-based function like we've talked about? I of course have a bias there that everybody knows about. But the cost has to come into effect too. So say, for the sake of argument, depending on what your application is, the utility-based system is 80% accurate. And it's 80% accurate because that's the only uncertain part; the other parts are deterministic. It's always going to get the credit card right, it's always going to get the other parts right, but the inside is 80% versus 90%. Technically we're still at higher accuracy than 65%. But in any case, how much is that 10% worth to you?
Speaker 3:If, say, it is 90 versus 80, for the sake of argument, how much is that worth, and have you factored in all of those fixed and variable costs? The utility-based approach is basically free to run and instantaneous. You can run it in a small serverless environment, in a Lambda function or something super trivial, and it still gives very high accuracy. So sometimes it could be worth it, but you have to consider all of this second-order, third-order, fourth-order costing as part of the decision, and I think that's often brushed over. It's "ooh, it's cool, it's AI, we want to use it," without considering even the naive costs, let alone the governance or the risks.
Speaker 2:That's right. And even if you get the cost right, you're still left with the question of: does your model do what you told it to do? That's the last piece of governance here, and it goes back to when you first set out your project and make your business case. Can the model accomplish the task you want it to accomplish, and have you done any work to confirm that your model is actually aligned with completing your task?
Speaker 1:You can very easily end up with a model that gives clients what they want on paper but doesn't actually map onto any useful reality. You could almost term that the cost of not doing governance: not knowing any of those steps going in, or how to measure them as you're building, or even in pre-deployment. Is that right?
Speaker 2:Yeah, absolutely. I think that's part of the cost of "oh well, we don't want to do this the hard way," and so you'll pay the cost in agent error.
Speaker 3:And there are two additional areas I think we should briefly touch on as well. Again, the validations on agentic systems are more expensive, and it's more complex to do those objective validations we need to do, as you mentioned, Susan. The first area is agency. Remember, at the start of this whole series we had Dr. Michael Zargham on, and he was talking about the principal-agent problem. It used to be that you'd just make a prediction or not, that one-way, naive type; no matter how complex the system, it's a very simple one-step process. Now we are handing our agency as individuals to a system to make decisions on our behalf. So you have that whole agency consideration and all the ethical implications, on top of the performance things Sid's mentioning. We also briefly mentioned security, and it's worth noting that these systems have a lot of security vulnerabilities and are easy to hijack and jailbreak; there's a lot of research there on misalignment and emergence. And then the second area, confidentiality, is proving very troublesome for these agentic systems, because this current generation of LLMs just doesn't have the ability to manage it. Well, I have thoughts on whether they're reasoning at all; that's a whole other conversation, maybe another future podcast.
Speaker 3:They don't know what data, like the credit card information or the social security number Sid mentioned, they're allowed to share. Your organization might have very good data classification standards, so you know which environments can use which types of information. However, in the studies researchers are doing, and this is rearing its head more and more in the agentic world, the system doesn't know when it's okay to provide that information or not. It doesn't know when it can provide Sid my social security number or not. And again it comes back to agency: have you given it the right to do that? But then there's just maintaining those state variables and knowing the data classification standards, that this data can never be used in this context but can be used in that context.
Speaker 3:First off, these systems were never built for this. If you go back to the original transformer papers from Google Research and then OpenAI, back when OpenAI was open, these systems weren't built for this type of thing. So you're trying to adapt a system that wasn't built for that use case, no matter how good it is at what it was built for, and then layer on these additional burdens of classification and the like. Those are the things you don't see when you intuitively say, "hey, I'm going to have this multi-step system make decisions for me," and they're the areas you need to look at. So even if you do want to use an LLM for the reasoning component of these systems, you really need to thoughtfully consider: are you sure you don't want deterministic programming wrappers around the confidentiality, security and agency considerations?
Speaker 2:And I think that leaves us with the final bitter pill of trying to govern agentic models, which is that every single task along the journey, every single subtask, needs its own governance. It's not sufficient to use a single governance check and a single performance metric to encapsulate the entire model. If your project has sub-modules, each of these modules must be borne out and proven and evaluated. It's typically not possible to make wide, large, complex modeling-system statements. It's often more realistic to grade and evaluate the individual pieces of these agentic models, or the tasks that these models complete, and then create an aggregate assessment of the model.
Speaker 3:Great point, Sid. And Susan, you mentioned the increased need for objective, independent stress testing of your modeling system; we're huge advocates of that here at Monitaur. You will still have that end-to-end testing, to use engineering terminology, an integration test, an adversarial integration test, and additionally we'll have conceptual review and things like that. We do need that.
Speaker 3:Does the whole system work? That's currently what the majority of agentic evaluations and benchmarks are based on: I put a thing in at the top, this thing comes out the bottom, and we already know that's at about 65% at best, being very generous, on a four-step process. That end-to-end testing is very important. But, as Sid is mentioning, when we're really talking about governing these things in production, to mitigate the confidentiality risks and these bigger risks, you can't just say, yeah, it kind of works well enough when I test end to end. You really need to start running unit tests on the input and output layers before anything passes into the next step. We've talked about this several times, and probably want to again on the podcast: traditional state-based, dynamical systems, like when we had the astrophysicists come on, are traditional multi-step processes where you can evaluate each individual step.
Speaker 3:But now you're adding LLMs, which aren't normally interpretable. You can't understand the states of the system very well, and there's not really a good way of tracking them. What you really need are screeners at those different tiers: you're not allowed to move to the next stage until you pass. You really need unit tests at the level of: did I even import the information correctly or not?
Speaker 3:If not, stop here and restart. You need that governance layer and that monitoring layer asking: how accurate am I? Am I capturing the information correctly? Read out the state for me. Then, if you pass, you go to the reasoning component. If you pass that, and it still makes sense, you didn't insert someone else's credit card number in here, then we can go to the execution and so on. So you essentially have to add these filter layers, if you will. And then it kind of comes back to: well, if we have to do all this on the governance side of the house, because the systems inherently aren't going to be thinking about these middle stages, why use them in the first place? That's what this whole series has been about: there are other alternatives. But in any case, these are the concerns you need to be aware of. You might decide that for the reasoning component in the middle you really do want an LLM, but it also begs the question of whether you should have a really slick UI ingesting the social security numbers and the like, so you can make sure that stuff is tight. If not, you're going to have to add these increased monitoring layers in the middle.
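A minimal sketch of what those per-stage filter layers could look like, assuming a simple linear pipeline. The stage names, stand-in step functions, and validators are illustrative; in practice each validator would be a domain-specific check such as schema validation, a PII screen, or a policy rule.

```python
# A minimal sketch of per-stage "filter layer" gates for an agentic pipeline.
# Each stage must pass its validator before its output is handed to the next step.
from typing import Any, Callable

class StageCheckFailed(Exception):
    pass

def run_pipeline(payload: Any, stages: list[tuple[str, Callable, Callable]]) -> Any:
    """stages is a list of (name, step_fn, validator); halt on the first failed check."""
    for name, step_fn, validator in stages:
        payload = step_fn(payload)
        if not validator(payload):
            raise StageCheckFailed(f"validation failed after stage '{name}'")
        print(f"[governance] stage '{name}' passed; intermediate state logged")
    return payload

# Illustrative usage with trivial stand-in functions.
stages = [
    ("perceive", lambda x: x.strip(),                             lambda x: len(x) > 0),
    ("reason",   lambda x: {"intent": "book_flight", "query": x}, lambda x: "intent" in x),
    ("act",      lambda x: {**x, "booked": True},                 lambda x: x.get("booked") is True),
]
result = run_pipeline("  flight to Boston in June  ", stages)
```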
Speaker 2:And when you're thinking about these monitoring layers, you might ask: okay, you're telling me to govern every step of the process, so what are some of the steps I should expect? I would propose that there are four major types of steps, and every time your model does one of these things, you have a governable event. Anytime your model or agent perceives the world, which could be a text input or something coming in from a camera. Anytime your model reasons, makes a decision, thinks about the world, or evaluates the current state. Anytime your model acts in the world: anytime it executes a function, buys a plane ticket, books a hotel room. And anytime it updates its priors or learns, anytime it has done this perceive-reason-act cycle and decided, okay, this is what I know about the current state of the world, let me integrate that knowledge and make new decisions based on it. That's a governable event. So I would say that perceive, reason, act, and learn are a pretty reasonable way of knowing where these filtering steps need to occur.
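One hypothetical way to instrument that: tag each step of the agent with one of the four governable event types and append an audit record every time it fires. The decorator, record fields, and example step are invented for illustration.

```python
# A minimal sketch of recording the four governable event types described above.
from dataclasses import dataclass, field
from enum import Enum
import time

class GovernableEvent(Enum):
    PERCEIVE = "perceive"
    REASON = "reason"
    ACT = "act"
    LEARN = "learn"

@dataclass
class EventRecord:
    event: GovernableEvent
    detail: str
    ts: float = field(default_factory=time.time)

audit_trail: list[EventRecord] = []

def governed(event: GovernableEvent):
    """Decorator that appends an audit record every time the wrapped step runs."""
    def wrap(fn):
        def inner(*args, **kwargs):
            result = fn(*args, **kwargs)
            audit_trail.append(EventRecord(event, f"{fn.__name__} -> {result!r}"))
            return result
        return inner
    return wrap

@governed(GovernableEvent.PERCEIVE)
def read_user_request(text: str) -> str:
    return text.strip()

read_user_request("  two tickets to Denver  ")
print(audit_trail[-1].event, audit_trail[-1].detail)
```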
Speaker 3:Fully agree, and that lines up nicely with some of the definitions from NVIDIA and IBM and other folks about the stages of an agentic system; it's also where we got the four steps Sid set out there. But we're essentially saying that, to responsibly govern or responsibly use these systems, you have to break them up into those four steps, so it can't really be one opaque end-to-end solution. You are really building four discrete systems. And that also opens you up a little bit more: we're back to really great SaaS platforms around the input and output, you're just updating the database with priors, and then you have some sort of modeling paradigm in the middle.
Speaker 2:And so if we decide to do this the correct way, and we have these kinds of models with definable, understandable steps and modules that we can then govern, we can ask ourselves additional questions that we wouldn't ask of the LLM model, not because we shouldn't, but because we almost fundamentally can't. Questions like understandability: can you interpret what this model has done, act on that, and, if you need to during an audit, share that information? Interpretability and explainability are major pieces of the governance expectations you'll be met with; you will likely fail them on an LLM test, but they are actually feasible with a more traditional state-based agent. And to Andrew's point earlier, with the slick UIs for SSNs and handling confidential information outside of a reasoning LLM, you would then be expected to do compliance correctly. You can tell an LLM in a prompt to do things correctly, and you will find that it sometimes gets it right, but even when it does, you're usually going to take a hit on performance while it's doing it.
Speaker 3:Maybe we should have even started the series with some of these trouble spots, walking through why we came to the opinion that you should be using a simpler system. Hopefully this episode helps explain our thought process a little better: why is this troublesome, why is this not necessarily a great idea? Because a lot of the market right now is saying more is better: throw more data at the problem, throw more compute at the problem, and it'll eventually just figure itself out.
Speaker 3:I don't personally believe the current paradigms are going to be able to work that out, because of a lot of the considerations we're raising here and based on the current research. Again, someday we might have systems that can do this.
Speaker 3:But the current GPT-based systems we have right now struggle in these areas. Maybe you can hot-wire some things around them, maybe you can build some additional scaffolding, but then it comes down to, like we talked about and Sid mentioned earlier, the cost. What is the cost here versus just buckling down and building the system? You can even use AI-assisted tools to help build it, but build the system in a more responsible way that lets you better track these areas. Because if you're using deterministic programming for the inputs and outputs, you don't have to do this type of validation: you know 2 plus 2 always has to equal 4, so you don't need anything more robust than standard unit tests. It will remember the state properly. As long as you have the database secured and locked down, you don't have to worry about social security numbers getting mixed up or ending up in the wrong spot. You know that it is secure and reliable, and we have many, many years of experience doing that at scale.
Speaker 2:And so now you're at the stage where you've identified where you need to do your governance, and you're left with the question of, well, how do I actually do that? So I would task you with some of the following action items. Make sure to log every transaction done by the model. That includes the client asking a question and the model returning a response, and anytime the model takes an action through a tool, in a Toolformer-style setup, that needs to be logged as well.
Speaker 2:Keeping robust logs of all inputs, outputs and reasoning used by the model is an essential action item. You're also expected to monitor this log. Make sure you're checking it on a regular basis; we typically see weekly to monthly as a pretty reasonable frequency. We're looking for things like drift: are clients asking about new things the model wasn't expected to handle before, is the model giving new outputs we haven't asked it to give, and is it staying on track? You'll also be expected, having enumerated assumptions and limitations at the beginning, to enumerate at the end any new ones you've identified with the model and make them very clear to business owners, stakeholders and, if relevant, clients as well. And along with your monitoring for drift, make sure you're monitoring the performance of your model, confirming it's doing well over time and, if degraded performance seems to be the result of stale data, potentially retraining your model.
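As a toy illustration of that kind of periodic review, here is a naive topic-drift check over a request log. The topic labels, counts, and 20% threshold are assumptions made up for the sketch; a real check would also look at outputs and performance.

```python
# A minimal sketch of a periodic topic-drift check over logged agent requests.
# Topic labels, counts, and the alert threshold are illustrative assumptions.
from collections import Counter

def unseen_topic_share(baseline: Counter, recent: Counter) -> float:
    """Fraction of recent requests whose topic never appeared in the baseline window."""
    total = sum(recent.values()) or 1
    unseen = sum(count for topic, count in recent.items() if topic not in baseline)
    return unseen / total

baseline = Counter({"book_flight": 420, "book_hotel": 310, "change_booking": 90})
recent   = Counter({"book_flight": 80, "book_hotel": 55, "buy_a_car": 40})  # off-task use creeping in

share = unseen_topic_share(baseline, recent)
if share > 0.20:
    print(f"Drift alert: {share:.0%} of recent requests fall outside expected topics")
```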
Speaker 1:One question that comes to mind throughout this conversation about governance: earlier we were going through the good steps of governance, and then when we got to agentic you started talking about every single step needing governance. That brought to mind the phenomenon of human-in-the-loop as a governance method for AI. Can you put that together for me?
Speaker 3:Yeah. Sid, I'm really interested in your thoughts as well; I think this could be a great future episode, specifically digging into human-in-the-loop. Maybe that's our next episode, and I'd love to hear from any listeners what they'd like as well. But I think it's a little bit of a fallacy and a crutch at the moment that people are using as a "hey, I don't have to do as much governance because I have a human in the loop." My challenge to that would be: how often do people really not listen to the system? Especially if you're using it in mission-critical areas like claims processing, pricing, healthcare decisions, or, I don't know if they're using it there, FAA decisions about airplanes, anywhere you can think of critical applications of AI, how many people would actually overrule the system, or is your human bias going to be, I want to trust what the system tells me?
Speaker 3:There's actually a story of this happening in the Cold War. At a Russian missile site, I think a new early-warning system's alarms went off and indicated that the US had launched a bunch of missiles at Russia. It hadn't, and thankfully someone second-guessed the system and overruled that response. But in these mission-critical areas, humans have the tendency to just follow the system; you're supposed to, and your bosses say follow the system, trust the system. And then, from a governance side, a lot of companies say, well, I don't want to have to do governance as rigorously, I don't want to have to stress test it, I don't want to have to do all of these additional checks for these systems, so I'm just going to say I have a human in the loop and the human has to make the final decision.
Speaker 3:But from a legal perspective, you'll still be liable if the system goes awry. You can't say, hey, this was discriminatory, but I had a human looking at it and they went along with it. From the FTC's perspective, you still had a human go along with it; you, as a company, still let this thing happen, and this consumer was harmed. You're still not in a great spot. So I think it's a little bit of a misnomer right now that we can just have a human in the loop and then we don't have to actually do all of our AI governance requirements. I want to spend some more time thinking about this, and I think this would be a great podcast to dig into, maybe even with some guests. But those are my initial reactions; I'd love to hear what you both think as well.
Speaker 1:I can't wait for us to dig into that a little bit further too, because it does raise questions about accountability. For example, an engineer can't turn in code and say, well, it came from the AI system or the copilot; the engineer who turned in that code is ultimately responsible. So that's one example: there's human-in-the-loop governance over output as one step. But in agentic systems there's not going to be that single opportunity, and you actually want a little more stability and consistency, I would think, over the governance steps and the process.
Speaker 2:And accountability is still a huge open question in this field. There are questions like: if a Tesla hits you and not someone else, do you go to the line of code and then go to the engineer and say, you wrote this line of code and that's why I got hit? There's the question of where human in the loop ends, and to what extent the model is doing anything. If you say that there's a human in the loop, you still own all of the risk. You're not able to hand any of it off to someone else.
Speaker 2:And if you just say you have a human in the loop, meaning you have a human who looks at the outputs of the model and says, okay, the output looks reasonable, but doesn't check all the intermediate steps in the model, you don't have any reasonable accountability if a step goes wrong halfway through to say, oh well, this module was the issue. Your entire model is now culpable for any mistakes made along the way. Which I think goes to Andrew's point: people have a very natural positivity bias. They want to believe that the systems are correct, and they would rather check a box and say it looks good to me than say, yeah, I don't think this looks right, this looks fishy, and we need to investigate it.
Speaker 1:And Sid, as you mentioned at the top of the episode, humans are the most autonomous beings, and they're going to have their own biases, so that certainly lends itself here. I agree this is an episode that we'll dig into in the future. Listeners are going to have to tune in and send us some thoughts ahead of time.
Speaker 3:Well, thank you both. I've thoroughly enjoyed this series, and I'm looking forward to writing up probably one of the math-heaviest blogs we've done, where we lay out our utility function, how those components work and how it fits into the mechanism design. I'll have some fun writing that up; no, genuinely, I'm not being sarcastic, I'll have fun writing that up. I think this has been a really fun series.
Speaker 3:I like these multi-step series, and I'd love to hear from listeners whether this is a format they enjoy; I'd like to do more of these deep dives into a topic in the future. I definitely think human-in-the-loop is turning into something that would be really great for us to explore, whether that's a one-off or a full series. But thank you both. I've really enjoyed this, probably the most fun set of podcasts we've done yet, at least in my opinion. Again, I try to call out my bias whenever I can: I'm an economist at heart, so I liked that we got to play with a lot of econ here. I really thank you both for walking through this. It's been a lot of fun.
Speaker 1:Yeah, that's up to you guys.
Speaker 2:This has been great. Cool, Andrew, do you have any closing thoughts on agentic AI? If you had to say one little piece, what would you leave people with in this new world where everyone wants to pull up Databricks, use its agent-based modeler, have an agent made for them within 30 minutes, and then deploy it into production?
Speaker 3:Wow, this is a hard one; what's a pithy summary of this area? I think it goes back to this: the more we do in the AI world, the more important it is to take a step back, get perspective, see how history repeats itself, read some classics and understand how stuff works. This is tulip mania from the 1600s; we've had these manias all throughout history, and we even had the AutoML craze a few years ago, and that fizzled as well. This is a variation of that. As with the internet when it came out, it's a productivity enhancer, and the world is much more productive than it was; there haven't been mass job layoffs or replacement of humans, with computers doing all the things. So I really think we need to view AI as something that's going to make us all more productive. It's going to make the world potentially better if we properly govern it; if not, it makes it worse.
Speaker 3:But try and keep perspective on what it is. I know it's very hard not to get caught up in the news cycle, but view it in perspective and see it as a tool, and to no tool should you be giving your agency as a human. These technologies just point blank do not do what they're billed as right now. It's not thinking, and it's not going to become Terminator; this isn't a sci-fi film. R2-D2 or C-3PO is not coming out of OpenAI in the next year; that's not where we are. So don't give your agency to these machines in the sense of "I don't think anymore, I just hit go with whatever you named your tool's agent."
Speaker 3:Really just focus on how you can use this to be more productive and how you fit it in better, while making sure you don't get caught up, that you care about the individual, the people, and that you're doing things responsibly and doing the hard yards. And just realize that, if you zoom out in the context of history, this is one very small blip. It feels big right now, and it could be material overall: the wheel was big, electricity was big, the internet has been big, AI is going to be big. But see it in context for what it is, versus feeling like, oh my goodness, the whole world tomorrow is 100% different and we forget everything we've ever known about how to do things responsibly and learn and grow. Try and keep it in perspective. Those would be my slightly long-winded thoughts, Sid.
Speaker 2:I mean, what's left to say? I'll say this: I think people feel like they might be missing the boat if they don't get on this. Did I miss NFTs? Did I miss Bitcoin? Did I miss out on the latest AI trend? And people are going to do agents because they're out there, but maybe not because they're good. So evaluate your business case, and if you have that golden-ticket business case that actually needs agentic AI, then I would say absolutely go for it. Do the hard yards, do the proper governance, and make great agents. But don't feel like you need to make an agent just because everyone else is making an agent.
Speaker 1:Thank you guys for your final thoughts. And for our listeners, I echo what both Andrew and Sid have said. I hope this series really helps you think it through if you've found that golden-ticket item and you are planning to go the agentic route. Please listen to our series and let us know if you have any questions. We're really here to help you do this right, do it responsibly, and be successful. Thank you for listening. Until next time.