The AI Fundamentalists

Model documentation: Beyond model cards and system cards in AI governance

Dr. Andrew Clark & Sid Mangalik Season 1 Episode 25

What if the secret to successful AI governance lies in understanding the evolution of model documentation? In this episode, our hosts challenge the common belief that model cards marked the start of documentation in AI. We explore model documentation practices, from their crucial beginnings in fields like finance to their adaptation in Silicon Valley. Our discussion also highlights the important role of early modelers and statisticians in advocating for a complete approach that includes the entire model development lifecycle.

Show Notes

Model documentation origins and best practices (1:03)

  • Documenting a model is a comprehensive process that requires giving users and auditors a clear understanding of: 
    • Why was the model built? 
    • What data goes into a model? 
    • How is the model implemented? 
    • What does the model output? 
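
The four questions above could be captured in a simple documentation skeleton. What follows is only a minimal sketch and not anything the hosts presented: the class name `ModelDocumentation`, its field names, and both helper methods are hypothetical, chosen for illustration. It assumes, as discussed in the episode, that assumptions and limitations belong in the full documentation and that a model card is a short summary distilled from that fuller document.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ModelDocumentation:
    """Hypothetical skeleton mirroring the four questions in the show notes."""

    purpose: str = ""          # Why was the model built?
    input_data: str = ""       # What data goes into the model?
    implementation: str = ""   # How is the model implemented?
    outputs: str = ""          # What does the model output?
    assumptions: list[str] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def to_model_card(self) -> dict[str, str]:
        """Distill the full document into a short, consumer-facing summary."""
        return {
            "intended_use": self.purpose,
            "outputs": self.outputs,
            "limitations": "; ".join(self.limitations),
        }


# Illustrative usage: start documenting as soon as development starts,
# then check what a human author still has to write.
doc = ModelDocumentation(purpose="Flag loan applications for manual review")
print(doc.missing_sections())
```

A check like `missing_sections()` can only show that a section is empty. It cannot judge whether what a human wrote there is adequate, which is the part the episode argues should not be automated.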


Model cards - pros and cons (7:33)

  • Model cards for model reporting, Association for Computing Machinery
  • Evolution from this research to Google's definition to today
  • How the market perceives them vs. what they are
  • Why the analogy “nutrition labels for models” needs a closer look


System cards - pros and cons (12:03)

  • To their credit, OpenAI system cards somewhat bridge the gap between proper model documentation and a model card.
  • Contains complex descriptions of evaluation methodologies along with results; extra points for reporting red-teaming results
  • Represents 3rd-party opinions of the social and ethical implications of the release of the model


Automating model documentation with generative AI (17:17)


Improving documentation for AI governance (23:11)

  • As the model expert, engage from the beginning and write the bulk of the model documentation by hand.
  • The exercise of documenting your models solidifies your understanding of the model's goals, values, and methods for the business

What did you think? Let us know.

Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

  • LinkedIn - Episode summaries, shares of cited articles, and more.
  • YouTube - Was it something that we said? Good. Share your favorite quotes.
  • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.

Speaker 1:

The AI Fundamentalists: a podcast about the fundamentals of safe and resilient modeling systems behind the AI that impacts our lives and our businesses. Here are your hosts, Andrew Clark and Sid Mangalik. Hello everybody, welcome to today's episode of the AI Fundamentalists. Our topic today is model documentation, and this is one of those tough topics, because keeping notes and documentation in any profession is often a dreaded task. In addition to creating and maintaining docs, there's often a taboo for model developers and their management on how, where and when the doc will be used in perpetuity and in process. Today, we're going to cover best practices in documentation, including model cards, system cards, their intentions and more. But before we get there, let's start with the origins of model documentation. Like most business practices, it solved a problem at one time. But, Sid, then what? How did we start and where did we go?

Speaker 2:

Yeah, so I don't want to act like model documentation is a brand new idea. I think people have been documenting stuff for a long time. People document their code, people have little internal documents, but in 2018 there was a nice arXiv publication called Model Cards for Model Reporting. This got republished in 2019 in ACM, and this is kind of a big paper in that it really sets the precedent for how people want to think about documenting models.

Speaker 2:

This gave some light guidelines and some recommendations for how we want to do documentation, so I'll give you, you know some of the highlights here and then maybe talk a little about, like, how this influenced the space that we live in.

Speaker 2:

So the paper describes that model cards are short documents accompanying trained machine learning models that provide benchmark evaluation in a variety of conditions, such as race, geographic location, sex, skin type, disclose the context in which models are intended to be used, and detail the performance evaluation procedures. This sounds pretty good, and what they're hoping for is that you will use this idea and create an index card's worth of information, and then you can hand it off to someone and say, here's an index card, this describes my model, at least at a high level. And this gets picked up by Google most famously. I think Google has been doing a lot of model cards. They have a lot of educational resources where they advocate for the use of a model card. We then later on see groups like OpenAI do system cards, which we'll get into later, and how they looked at that. You know, this is a really good start, and I think this put us in a good place to at least have a discussion about this and to think about what kind of things need to be documented.

Speaker 3:

Yeah, and I think in the AI world, it seems like at least a lot of the public perception is that model cards were the start of documenting AI. I would say that's really the opposite of best practice. I know we harp on it way too much on this podcast, but going back to NASA and the Apollo program, the beginning of modeling and things like that, it's really a scientific-method-driven approach, really academic papers. You know, what are the assumptions you're making, what are the limitations? It's a scientific-paper type of rigor. How are you validating that this idea is valid, and who peer reviewed it, and all those things. If you really take a step back, that's really where this industry came from, and how models were originally built was a scientific process. So I think that was where it came from.

Speaker 3:

Then we started with kind of the boom of computer chips and more people doing it, Silicon Valley starting, the internet boom, and all of those things happened. So people started just really moving fast and breaking things, quote unquote. So then I think there's kind of the dearth of, well, nobody was doing anything, and AI was still this new thing that wasn't really regulated. And, by the way, already at this time, based on the Basel Accords and the OCC and all the different ones in the financial industries, you already had to do a lot of these processes. This stuff was not new. It had already happened, like 2010.

Speaker 3:

There were the model white papers, which are essentially these academic papers that are rigorously reviewed. But in Silicon Valley, as we've talked about on this podcast a lot, with statistics and systems engineering and things, none of this stuff is new. However, computer science has kind of taken some of these other things. Some of it has been developed there, some of it has been borrowed, but they've kind of reinvented everything from scratch, or at least marketing-wise, right? So I think you started seeing Silicon Valley starting to realize, hey, we actually need to document something here. And they really came out with model cards, which, to me, is that bare minimum, almost junk-food-style model documentation, if you will.

Speaker 3:

It's very satisfying in the instant, but then it doesn't have that lasting power. So I think we went to even Silicon Valley realizing you need to have something, but that's really not the beginning. That's really just the "oh drat, this has gotten really unwieldy." Based on the concepts of DRY, like don't repeat yourself, and document your code so you can come back a year later and understand it, that type of software engineering, you started seeing that software engineering coming into the modeling, like, hey, models need to at least be documented like software engineering. You need to at least know what you're using with the system, because it's less intuitive than code and there weren't as many standards.

Speaker 2:

Yeah, I think that's totally right, and I think that this is a good opportunity for us to kind of go back and look at what those old modelers were doing, right? What were the actuaries doing? What were those early statisticians doing? What did they think was enough? And then we can see where model cards kind of try to address that but then don't make it all the way.

Speaker 2:

So some of the things that we really wanted to see in a holistic documentation process was kind of this story of, start to finish, how was the model built, right? So the first question you want to ask yourself is why was the model built? What is the purpose? What is the use case? Document that clearly. Document that as soon as you start developing the model, and revise it at the end once you finish the model and you've said, well, it's actually a little bit more narrow than we planned it for. Then there's the intermediate step of what went into the model, right? This is extremely relevant for regulated markets, where we need to know what type of data you were collecting and how it was used.

Speaker 2:

Then we have implementation details, right? This is usually rather high level. Like, you know, we used this model architecture: we used a transformer, we used a decision tree, we used gradient boosting. And then a comparison of what is good about this paradigm and what's not so good about this paradigm, right, why it was well-suited for your use case, but maybe not all use cases. And then, finally, what comes out of the model.

Speaker 2:

Does the model give a single output? Does it give multiple outputs? And then how much those outputs correlate with some type of demographic outcome, right, if we see really good outcomes going to certain groups and not going to other groups, that should be accounted for. This was like part and parcel. This is basically what was expected of people doing old school modeling and what went into them, and so you can see that model cards are trying to kind of get at this idea and trying to hit some of these points.

Speaker 2:

You know, we might feel like the model card is great. Well, it hit all those best practice points, and I think, from the perspective of a consumer or someone that's going to use the model, a model card probably is sufficient for them. Right, they don't need a deep, lengthy explanation. But let's not get too ahead of ourselves, because there are two groups of people who really, really aren't going to be satisfied with a model card, and that's going to be the auditor and then your research team later on, like next year. So if you look at the original paper that describes what a model card is, and you look at some of the examples in there, which they very thankfully put in there, they're only targeting half a page to maybe two pages of content.

Speaker 3:

And I think this is what really, Sid, I love how you're highlighting this. I think also, when this original paper was made, it's like, hey, no one's going to be reading these academic papers. We already have this now, where researchers read academic papers. They have that information. But the genesis of the model card, now I haven't talked to the initial authors myself, but what I imagine, as these are researchers that made the model card, is: hey, the general consumer needs something more consumable, so let's get those salient details and give them that. I don't think it ever, at least it doesn't say in this paper, that you don't need to then be doing the full thing.

Speaker 3:

This is a summarization, the more intelligible to the user, not saying, hey, you don't actually need to have the other information. I think this is where in industry we've kind of gotten it confused, like, oh, a model card can just actually be these couple of bullets. The paper says half-page to two-page model cards, but now you're seeing these little index card model cards being used in industry, and they're like, well, that's enough, versus, no, no, it's a summarization of the larger thing behind the scenes.

Speaker 2:

Yeah, I think that that's exactly right that these model cards are like they're good summaries. And I think for someone who just wants to use the model and just wants like a glimpse of like hey, why am I going to pick this model versus the other model that came out last week? Why is your model better? The model card is enough for that use case. But I don't think that we should get confused and then assume that that means that the model card is documentation or that it'll stand up to rigorous examination or gives you enough information to replicate that model.

Speaker 3:

Agreed. I think what we're really seeing is the confusion on what the purpose is, and thinking that just saying the high level is enough, versus the original idea of model cards, which was the consumer label, not "this is everything and you don't do anything else."

Speaker 2:

That's right, and we're not just trying to hype this up here. We have seen large groups like Google, like IBM, that have stood behind model cards as a form of model governance, and so we're trying to push back against that and say we need to build proper, holistic, full model documentation, and this is going to be on the order of tens of pages long, because that's what you really need to recreate a model and to really explain to someone who is trying to poke holes in your model why your model does what you think it does, why it matches the use case and why you did the right things along the way.

Speaker 3:

And this is where, if we look at other fields, actuaries as an example, they use models, predictive models at times, for specific tasks. It's more of a professional body than data science has, and it's a much more defined use case. Same with economists and things like that. But actuaries specifically have their own standards that define what good model documentation looks like. And they're very specific: what are all the assumptions, what are all the limitations?

Speaker 3:

And really being in depth, outlining all those areas. And that's what we're seeing: if you go to any field that focuses on this type of thing, you're going to see these parallels in the level of depth that you need to really have. Peer review is really this common thread, that scientific method, peer review across fields. And then the model card is just a good way of summarizing that information for the "hey, I have this model built in a company, it's in the inventory, and I'm just seeing one of the models that we have at the company." Model cards are great for that. It's just knowing what their use is and knowing where they fit in the broader context. They're not an end unto themselves, they're not governance themselves, but they can be a good summarization of governance for non-expert consumers.

Speaker 2:

And that kind of brings us to this idea that there's something in between. Right, if on one end you have the model card, which is just the index card, and on the other side you have an academic paper or a white paper, which is just like, here's every single parameter that we used, here are all the outcomes we got, here's a discussion of how it was developed, then in between we have this idea of the system card. And I'll have to give props to OpenAI here for their system cards, which they used for their GPT-4 models. You know, at this point it's an open secret that they're extremely cagey about how their models are built. We don't know how big they are, we don't know what data goes into them. All that is very secret and private. But they did release this system card, and if you have time, I absolutely recommend looking at the GPT-4 system card. We'll have it linked in the show notes, and this really lets you see the deep and complex evaluation that they did against their model.

Speaker 2:

One thing that they were definitely not asked to do but they did, for example, was to do red teaming.

Speaker 2:

Right, they had a team that was given the GPT-4 model and they told them: make the model give some bad outputs, make it give malicious outputs, make it give dangerous outputs, and really bounds-test their model, do that evaluation and then report it.

Speaker 2:

And this type of red teaming is not common, it's not required, it's nowhere in legislation, but it really shows at least some good-faith planning to make a good model. And then also they probably have to do this internally anyway to put this out to the public, so sharing those results is actually very valuable to us. And there's also a really good section there which talks about the social and ethical implications of the model, which was not written internally. They actually got external groups to review the model, look at the implications of it and do a proper third-party write-up on their opinion of what this type of model does to impact the world that we live in. So, you know, good points there. I think the system card is not enough to recreate the model, but it's actually rather good for describing the weaknesses of the model and why the model was built the way it was built.

Speaker 3:

Yeah, we're definitely making history today giving OpenAI props, but they deserve it here. They've really, at least to my knowledge, coined this new phrase of system card, which is essentially what the market wants from a model card and what they think they're getting from a model card. But it's like, hey, it is a proprietary system. Especially with third parties, a lot more companies building models and selling models to companies and things, and for consumer advocacy and things, there's a need for some public information about your model. That's not the peer-reviewed level that you would have for replication, because there is IP there. However, it's some sort of "how do I get comfortable that this thing is what it says it is?" It has a USDA organic seal on it, or what have you. You can trust that they're not just making this thing up and there's been some thought put into it.

Speaker 3:

The system card, to me, is really addressing that need of the market for an easily digestible, high-level understanding of what is going on, what the system can be used for, and whether you can trust it or not. So I really think OpenAI did a great job with this and that independent objective assurance, if you will, even though it's red teaming. That's slightly different. They had, I think, two or three actual firms and then open-source red teaming, so they did a lot of this validation. So to me, the system card is really what the market thinks they're getting in a model card, but with a system card, now you have a company that's actually delivering that, and it's something you can trust: for the general use cases that they define, it can perform at this level, and they're even giving accuracy scores and things, and they're not trying to just say we're 100% accurate either. It's really, really well done, so I'm very impressed with them.

Speaker 3:

I think hopefully this helps with a new trend away from model cards being the end-all, be-all in governance to pushing that next step of, well, you need that objective validation and you need to actually make sure that it's real. Because as much as we want to believe everybody's doing the right thing at all times, having some sort of test or objective assurance makes me, as a consumer, more comfortable that an OpenAI model is doing what they say it is, or a Facebook model or whoever's, that you can actually trust it. Versus, if there's no checking ever, there are some people that might be motivated to maybe push the boundaries a little bit on what's proper.

Speaker 2:

And I think this also helps build out a really nice continuum of model documentation. So we would ask that a model developer first write that really in-depth document that would allow an internal team to recreate that model. From that, take the pieces which are the evaluation, share them in a system report, and then have another team distill that report one level further down into a model card. So we're not trying to say that model cards are useless. In fact, I think that they are part of this three-part solution, where one part is: why should you use this model? That's the model card. Why is this model good in an evaluation sense? That's our system card. And then, what is it going to take to validate this model and then recreate this model? That's the proper white paper or academic report.

Speaker 2:

And I think this takes us to our last topic here, which is, I won't say controversial, but I think it's upcoming. It's kind of undecided, and I think we're making some decisions as an AI community about how we want to do this. The temptation is, if model documentation is very hard, why don't we just automate this? Why don't we just let an LLM or an AI system do all this for us and generate that model documentation for us? So I'm curious to hear, you know, your guys' thoughts. What have you heard, what have you seen, and what are your initial or long-term feelings? Basically, there's going to be pressure to automate this process, and people are going to do it, and people are doing it. We've heard of people doing it. So what do you guys think about that?

Speaker 3:

Unleashing the Kraken on this one. I definitely don't think that's a good idea. This is back to my audit days. It was before AI and LLMs. I mean, not before AI, before LLMs were a common thing.

Speaker 3:

There was always this push, back when I was working in internal audit, for like, oh, internal audit needs to start automating processes and we need to be automating. But it's like, so wait, when I was working in banking and things like that: you have first line automating processes. You have second line making sure that their process is automated. Now third line is going to automate the checking of the first line's automations. It's like, at some point, how many turtles all the way down are you going to keep doing automation, and whatever.

Speaker 3:

The job of internal audit, and internal audit has kind of gone back and forth on this, but overall they agree, is to be the one that spot-checks to make sure it's doing as it should be. So of course you have spell checking and things like that you can do with LLMs. But at some point, if you're starting to say these AI systems are going to be doing more impactful things and automating how the world works, and now you're saying that all these genius computer scientists, PhDs, who are building the systems can't be bothered to explain their thought process, the assumptions and limitations of what they built and why? That's how science has worked throughout human history and how people describe systems. Somebody at some point needs to be accountable for the system. As in the conversation we had with Patrick Hall a while back, if there's nobody accountable, then, I forget exactly his wording around it.

Speaker 1:

If everyone's responsible, nobody's responsible.

Speaker 3:

Exactly. So you get that part of somebody signing off on this model. If they're going to risk their reputation and do something with automation, well, that's on them. But as a general rule, this push of "let's automate governance, my God, make governance easier," same with the "let's just say governance is a model card" type push, means you're trying to use AI irresponsibly and you don't even know the parameters of what good looks like.

Speaker 3:

So I would say that the more we're automating the rest of the world, and the more a company is trying to use AI, the more important the documentation becomes: understanding the assumptions, the limitations, what you're doing and why. Now, of course, there are, and we have them too, utilities and things you can use for, you know, looking at data distributions and things. Yes, you can have repeatable functions that do that kind of thing, and there are lots of automations. You can do continuous monitoring and things like that. Yes, those things can be automated, but somebody, a human being, still needs to be connecting the dots: this is what you're doing and why, here are the assumptions I made, here are the limitations, here's how it should be used, and reviewing those components. So just saying, hey, I'm not going to actually write up my limitations, I'm going to just point an LLM at my GitHub repo, that's the problem as a governance thing. Now, if you're doing that and then thoroughly reviewing the whole thing, well, that can be your personal workflow and nobody's saying you can't do that. But "I am going to be automating governance and taking a human out of the loop of governance"?

Speaker 3:

I think it's sorely misguided. It's not compliant with any regulations that are coming out, specifically the EU's and things like that. But I think it's very misguided, this always searching for the easy way out, versus, if you're trying to use AI, you need to be understanding the limitations, or we're going to get into this weird situation where nobody knows what's going on. And you have all these papers on how LLMs decay over time. So it's very dangerous, I think, to just be like, we're going to automate the governance. There are certain responsible ways, but I think that's a dangerous path. It's nuanced, and it needs to be really kept in focus: what are we trying to accomplish and why? And understanding that there does have to be a human in the loop at some point.

Speaker 2:

I would say that I'm pretty much in agreement with that standpoint. You know, AI is going to be a really useful part of helping you out with this, giving you good nudges, maybe giving you good notes on what you've already written. But this exercise of writing documentation is part of model building. Until you sit down and write down, by yourself and by hand, "the model is going to do this, I put this into it, it has these limitations," you'll never understand how your model works.

Speaker 2:

And if you let this piece happen through an LLM or through an AI system, 100% or 90%, you don't understand how your model works. You'll have a hard time defending it, and if someone comes by later and says, explain how this model works, no one can, because the AI is not going to help you out then. This process of model documentation is part of the model building, is what we're trying to emphasize here. It's not something to skirt and slide away from. You could use help from AI systems to write these types of things, but this is something that you have to do pretty much on your own.

Speaker 3:

Totally agree. You can use tools to help your workflow, no one's disagreeing with that, but it is the model owner's workflow and they need to own that workflow.

Speaker 1:

Yeah, and have the accountability check above that, exactly. Okay, any final thoughts for our listeners today? We've covered a lot on model documentation: the origins of it, where best practices originated from, and then what's happening now with the usefulness of model cards, system cards, and the importance of what you're doing with them and when. Any final recommendations for people who are struggling with the amount of work or maybe the change management it's introducing into their jobs?

Speaker 3:

This is a hard part for data scientists practicing in industry: you need to move fast, you need to do more models, there's always that push. But there's really a need, and we don't necessarily always know the solution for this, for good data science leadership and accountable executives that know what it takes to do these things responsibly, to have long-term models that are performing and meeting business goals. You see all the studies out there, you know, 80% of AI projects fail, things like that. Well, one of the major contributing factors is that nobody understands what's going on, nobody documented it, and that person left.

Speaker 3:

So documentation is so crucial. If you look at how bridges are built, NASA, any of these things, documentation and understanding and really writing things up is a crucial component of all those complex human endeavors. So why would models be any different? So, really trying to reestablish this. And for me personally, anyway, and I'm interested in Sid's take too, defining on paper what you're doing and why really helps clarify and really helps you find the way.

Speaker 3:

Like, have I thought about it this way? Or if I write it down in a way that I had to fully define it and give it to Sid, Sid's probably going to have a lot of good peer review for me. Otherwise you're skipping a lot of the great "how do I make it actually fit what my business wants it to be?" So when you do drive-through, fast-food-style model development, sure, you'll get a quick, cool demo really fast, but you're not going to be fulfilling the need, and we see that based on the studies from Gartner and others: models aren't doing what people think they're going to be doing right now. So could model documentation be the missing link?

Speaker 2:

I definitely think that's one of the components. And on top of just making models right, if we're just talking about speed, if you just want to make models faster, this actually will help you speed up your process, because you're going to build one great model. You're going to build it from the beginning, understand how it works, know what's going on, and that first model will be significantly closer to the right model than if you just went ahead and did the model and then tried to document it post hoc. It's this process of building things the slow and deliberate way which is going to allow you to build proper AI systems. And so it might feel slower to build that first model, but you're going to be building it on much stronger grounds and a much better foundation.

Speaker 1:

Absolutely. We had a phrase in training that was like "train slow to race fast" in rowing, and I think that sounds so applicable here. Like, yeah, it might be a long haul to do that documentation and take those steps, but the ounce of prevention is worth a pound of cure. I'm going to make sure I get all the cliches in there.

Speaker 3:

Sure. Well, I mean, it's exactly like what we've talked about since day one on the podcast, and thanks for bringing that up as well: fundamentals matter. You can't get around the fundamentals. Documentation is a main part of the fundamentals. You go to rowing, running, you name whatever sport, hobby, whatever it is, the fundamentals matter in everything, right? You can't just skirt the fundamentals. Michael Jordan didn't stop doing the fundamentals. The masters of any craft are good at the fundamentals. We're basically taking legs off the stool if we start saying, well, you don't have to document something, you don't have to validate something, it's just going to magically work because it's AI. No, that means you're missing the fundamentals. No one hits peak performance in anything in life without focusing on the fundamentals.

Speaker 1:

Well for everybody listening. We thank you for your time. We hope that this episode on model documentation helps you take a step back and look at your processes a little more closely and really understand what's at stake whenever you're defining what's happening with your models. If you have any questions for the hosts, please leave us a note at wwwmonotarai. Slash the AI fundamentalists Until next time.
