The AI Fundamentalists

Supervised machine learning for science with Christoph Molnar and Timo Freiesleben, Part 1

Dr. Andrew Clark & Sid Mangalik Season 1 Episode 29

Machine learning is transforming scientific research across disciplines, but many scientists remain skeptical about using approaches that focus on prediction over causal understanding. 

That’s why we are excited to have Christoph Molnar return to the podcast with Timo Freiesleben. They are co-authors of "Supervised Machine Learning for Science: How to Stop Worrying and Love your Black Box." We will talk about the perceived problems with automation in certain sciences and find out how scientists can use machine learning without losing scientific accuracy.

• Different scientific disciplines have varying goals beyond prediction, including control, explanation, and reasoning about phenomena
• Traditional scientific approaches build models from simple to complex, while machine learning often starts with complex models
• Scientists worry about using ML due to lack of interpretability and causal understanding
• ML can both integrate domain knowledge and test existing scientific hypotheses
• "Shortcut learning" occurs when models find predictive patterns that aren't meaningful
• Machine learning adoption varies widely across scientific fields
• Ecology and medical imaging have embraced ML, while other fields remain cautious
• Future directions include ML potentially discovering scientific laws humans can understand
• Researchers should view machine learning as another tool in their scientific toolkit

Stay tuned! In part 2, we'll shift the discussion with Christoph and Timo to talk about putting these concepts into practice. 


What did you think? Let us know.

Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

  • LinkedIn - Episode summaries, shares of cited articles, and more.
  • YouTube - Was it something that we said? Good. Share your favorite quotes.
  • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
Speaker 1:

The AI Fundamentalists, a podcast about the fundamentals of safe and resilient modeling systems behind the AI that impacts our lives and our businesses. Here are your hosts, Andrew Clark and Sid Mangalik. Welcome to today's episode of the AI Fundamentalists. Today we'll discuss machine learning for science. This is the first episode in a two-part series on supervised machine learning for science. We are excited to welcome back Christoph Molnar. Christoph is the author of several easy-to-read data science books, including Modeling Mindsets, which some of you may remember from our very first episodes of the podcast. Christoph co-authored Supervised Machine Learning for Science with Timo Freiesleben, who is also on the call today. Timo is a postdoc in the Philosophy and Ethics group at the University of Tübingen and an external member at LMU Munich, where he completed his PhD and worked on explainable AI systems. Christoph, Timo, thanks for joining us.

Speaker 3:

Thanks a lot for having us.

Speaker 4:

Yeah, thanks for the invite.

Speaker 2:

We're super excited to have you today, and we're really excited about this topic, the intersection between scientific modeling and machine learning. We previously had an astrophysicist on the podcast talking to us about how complex systems are modeled by scientists to do things like galactic simulations, and there's a lot of interest in machine learning there. So let's maybe start there. When you look at the needs of scientific modelers, what kinds of problems do they need to solve, how might they originally have been solving them, and how could those methods be improved by using machine learning or supervised learning?

Speaker 3:

First, I think it's important to say that it's very difficult to generalize to all of science, because different scientists have very diverse needs.

Speaker 3:

I think what unifies them is the idea of prediction. If you think back to Karl Popper, there was even a definition of science as something that makes predictions that are falsifiable. So we say something and it might be wrong, but if it's correct, great, then we can make progress. So prediction is a necessary condition, but it seems like we also have other goals. What I like to characterize science with is its goals, and other goals that we listed in the book include control.

Speaker 3:

So you want to enact control over the system, you want to explain a phenomenon, or you want to reason about a phenomenon, and the question is how to achieve these different kinds of goals. If you, for example, ask a social scientist who studies life outcomes and what kinds of studies people will take up, you might find out that working-class children will often not get a master's degree in the end, or not get a bachelor's degree.

Speaker 3:

So you could say, oh, that's a sign that we should invest more in people from academic households, because they're more likely to succeed in these things. But that's actually the wrong kind of reasoning, right? We want to intervene to make the world more equal, to make it a better place.

Speaker 3:

We need to enact a certain amount of control and have some kind of causal understanding, and I think the way that scientists usually gain this kind of understanding is by building up world models, little toy models that represent the world in a certain way. They start with certain assumptions, integrate the data and knowledge at the same time, and at some point they increase complexity again and again until they have better and better models. I think that's the standard picture of how scientists traditionally work.

Speaker 2:

And so when we take that in the context of machine learning, you almost compare prediction to hypothesis reasoning, and there's this idea of natural controls in experiments and this idea of causal modeling. How do you see machine learning as a paradigm that matches these problems well and could help advance scientific research?

Speaker 3:

For some scientists, because of the reasons I just gave, because there are these different kinds of goals, I think machine learning seems like the wrong tool: it doesn't tackle the questions they are actually interested in, or used to be interested in. So I think this extreme prediction focus is a bit of a downside. But there are also very good things about machine learning: the very clear idea of what a prediction is and what performance is, this operationalization of prediction in terms of a benchmark. I think that's a really cool aspect of machine learning. Especially if you face problems where the traditional approach doesn't seem really usable, because we just don't know much about the phenomenon and we're not able to give assumptions that we think are sound.

Speaker 5:

I think in this case machine learning seems to be like an alternative approach.

Speaker 3:

We can say, no, we're not building up models from simple to complex, we're not going the traditional path; we're starting with a complex model, trying out what we can get from it, and getting very good predictions. The key question then is, as you say in English, can we have the cake and eat it too? Can we have super good predictions and at the same time also have causal understanding, control of the process, and the ability to reason about the phenomenon? I think this is the big open question, and our book, at least that was the intention, tries to go this way: we can use machine learning with this prediction focus, but a lot of the things that got lost along the way we can gain back.

Speaker 4:

Maybe to add to that: I think machine learning, if you understand it and can use it, is quite a convenient tool, because you don't have to put too many assumptions into what you're modeling, so you don't need a complex theory. I mean, you've got to know what you want to predict and what to predict it from, so there's a lot of domain knowledge, of course, already going into this. But otherwise, if you have the data, the machine kind of figures it out on its own. So from that perspective it's very convenient. And it depends, of course, on what you use it for. But if you use it where you have used something else before, like a very slow simulation, you suddenly also have a tool that is really fast and takes away a lot of the theoretical work you might have to do.
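The "fast surrogate" idea mentioned here, replacing a slow simulation with a learned model trained on its input/output pairs, can be sketched in a few lines. Everything below is illustrative: the "simulation" is a toy stand-in function, and the model choice is arbitrary.

```python
# A minimal surrogate-model sketch: train a regressor on pairs generated
# once by an expensive simulation, then use it for near-instant predictions.
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def slow_simulation(x):
    time.sleep(0.001)  # pretend this is expensive physics
    return np.sin(3 * x[0]) + 0.5 * x[1] ** 2

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(500, 2))
y = np.array([slow_simulation(x) for x in X])  # costly data generation, once

surrogate = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

X_test = rng.uniform(-1, 1, size=(100, 2))
y_true = np.array([slow_simulation(x) for x in X_test])
y_fast = surrogate.predict(X_test)  # fast approximation of the simulation
print("mean abs error:", np.abs(y_fast - y_true).mean())
```

Once trained, the surrogate can be queried thousands of times per second, which is the trade Christoph describes: you give up the explicit theory inside the simulator in exchange for speed.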

Speaker 4:

So I think that's also just a very good starting point: many scientists just try it out and see, oh, it's quite useful. And then later on you get the questions: it's just modeling associations, what do we do now? Is it really what we want? Do we need maybe a more causal approach? So, yeah, it's also a hyped-up tool, but there's obviously substance behind it. That's the starting point: it's just a super useful tool, and then afterwards you have to start figuring out how to add all the things that are missing from it.

Speaker 2:

Yeah, I think that's a really strong point: any scientist can use these tools to create very strong baselines, basically just information-based baselines, no domain expertise required, and then compare them to their complex modeling systems, which may be agent-based or may have some understanding of the natural world and the physics or interactions we expect between items in the study. I think this can also make scientists a little scared of machine learning techniques: if the model doesn't really understand anything about what's going on under the hood, they may be hesitant to use it. So I want to build up the problem, and then we can work through some of the solutions that you two posit. What are some of the obvious problems and concerns that scientists have with using machine learning that really hold them back from jumping in?

Speaker 4:

So there was actually this survey, I think by Nature, where they asked around 1,600 scientists what their greatest concern was about using machine learning or AI in science, and the top concern was a lack of understanding of the science itself. The more parts and pieces you replace with algorithms that just learn from the data, the more you as a human scientist are disconnected from building a model by hand. Even a statistical model is learned from data, but you're controlling more parts of the model; if you're doing an agent-based simulation, you're more in control of the individual pieces, parameters, and variables. The more machine learning you use, the more you give up that control, and especially if you use very complex models like deep neural networks or gradient-boosted trees, you give up a lot of control.

Speaker 4:

So, yeah, I think it's a reasonable concern to have, at least if you're just getting started with machine learning, because there are of course tools you can use to get more into interpretability, understanding, robustness, and uncertainty. But if you're just going with the defaults, you have one prediction function that gives out your prediction, and you haven't learned much about your prediction problem if you haven't looked deeper into what you're actually doing. And if you compare that with the traditional way to do science, then yeah, it's a lot less hands-on.

Speaker 3:

One thing.

Speaker 3:

I would add is that scientists are often after some form of causal understanding of the process. And the strange thing in machine learning is that if you just keep on adding features, your model gets better and better. So this kind of careful feature selection, picking only the features that offer causal insight, loses its appeal: if you can get predictably better, why not just keep adding features? And if you think of something like medicine or education science, where you only have observational data, sometimes no one knows which feature is causal and which is not. So you can maybe at some point get good predictions, but is that really what you're after?

Speaker 3:

I think this is one of the reasons why, in psychology, in education science, and in medicine, people are sometimes more skeptical.

Speaker 2:

And I think that's all very spot on. So I guess, at a high level, since in part two of this series we'll talk about the specific mechanics: how can scientists use machine learning in a way that embeds their own domain expertise and gives them the causal insights they require in order to publish and basically be taken seriously in their own field?

Speaker 4:

So, yeah, one very trivial answer maybe is that just by defining the problem, you're already pouring a lot of your domain knowledge into it, just by deciding what your target to predict is.

Speaker 4:

Because at least if you're working on a new problem, one that hasn't been solved or attempted with machine learning before, you have to make a lot of these difficult decisions. What's your target? If you have a time dimension, should you aggregate by week or by month, count events, or just record event yes or no in a time window? There are dozens of these decisions, and if you look at it after you've decided it all, it seems obvious, but it's often a very complicated process to get there. And then you also have to decide what features go in. So I think a lot of these decisions, if you do them right, already map your domain knowledge onto the machine learning domain: feature inputs, problem statement, and so on.

Speaker 4:

So I think that's maybe easy to overlook sometimes, but a lot of things are already decided at this point.

Speaker 5:

With us doing this kind of outside-in approach: as you mentioned, Timo, traditionally scientists build small models that get more complex as their understanding grows. But now we're starting with, let's build our domain knowledge in with more inputs, making a complex system, thinking we understand it, and using a little bit of causal analysis to try to say it's the right thing. Because they're starting with the complex model instead of the small one, are you seeing any issues around confirmation bias, assuming there are relationships when there aren't? It's predictive, but you're not actually understanding the system.

Speaker 3:

I mean, the classical examples are these shortcut learning cases. For example, in medical imaging, you have images with marks made by doctors who had already indicated their labels, and so the model was almost perfect on the training dataset and also on the holdout set, because it came from the same dataset.

Speaker 3:

But if you deploy it at another hospital, or even get new data from the same device that doesn't contain these marks, the performance just drops. And I think directly inferring from performance that the model has actually learned something meaningful is very difficult. If I think back historically, when people first found out about adversarial examples, they really thought, maybe this machine is not learning anything meaningful at all. I think we're a bit more optimistic again, because of, for example, this paper by Ilyas et al., where they say, maybe these models learn good predictive features, they just don't use the features that we use when doing these classifications. So small manipulations of features that we don't use, but that are actually predictive, can completely diminish the performance, and maybe that's not as bad as if any perturbation could diminish it.
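The shortcut-learning failure mode described here can be reproduced on synthetic data. This is a hypothetical sketch, not the actual medical-imaging case: a "marker" feature tracks the label in the training hospital's data but is uninformative at a new site, so performance that looks excellent on a same-source holdout collapses on new data.

```python
# Shortcut learning on synthetic data: a spurious "marker" feature is
# present at the training hospital but absent at a new one.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_data(n, leak):
    y = rng.integers(0, 2, n)               # true diagnosis
    signal = y + rng.normal(0, 1.5, n)      # weak genuine signal
    if leak:
        marker = y + rng.normal(0, 0.1, n)  # doctor's mark tracks the label
    else:
        marker = rng.normal(0, 1.0, n)      # no such mark at the new site
    return np.column_stack([signal, marker]), y

X_train, y_train = make_data(2000, leak=True)
X_hold, y_hold = make_data(500, leak=True)   # holdout from the same source
X_new, y_new = make_data(500, leak=False)    # data from another hospital

model = GradientBoostingClassifier().fit(X_train, y_train)
print("same-source holdout:", accuracy_score(y_hold, model.predict(X_hold)))
print("new hospital:       ", accuracy_score(y_new, model.predict(X_new)))
```

The near-perfect holdout score says nothing about whether the model learned anything meaningful; only the distribution shift exposes the shortcut.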

Speaker 4:

Maybe a counterpoint to that. So, yeah, we can have shortcut learning, but we can also have the opposite case. I've been in a Kaggle competition, and of course I'm no domain expert, but I have ideas: okay, this is going to be the next really good feature. I spend a day doing feature engineering, then I put it into the model, run it, evaluate it, and it's maybe a little worse than before. It kind of crushes all your expectations, and in this way it can be the opposite of confirmation bias, because you're confronting your domain knowledge with reality, or at least with your data.

Speaker 4:

Let's stick with an example: these blood values are predictive, maybe, of some disease. You put them into your model as a feature, and it turns out it doesn't help the performance of your model. That's actually quite a humbling experience. It could, of course, be that you made some mistake, but often it can also give a measure of how good your domain knowledge was, at least in this setup. You have to understand, of course, how the machine learning works, and maybe the signal is already covered by other features and so on. But if you understand all these things, then it can be a really good way to evaluate your domain knowledge and counteract confirmation bias.
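The blood-value example above can be sketched as a simple ablation test: train with and without the candidate feature and compare cross-validated performance. The data and feature names below are entirely made up; in this toy world the candidate feature genuinely carries no signal, so adding it should not help.

```python
# Testing a piece of domain knowledge by feature ablation: does adding the
# candidate feature improve cross-validated accuracy?
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 1000
age = rng.normal(50, 10, n)
blood_value = rng.normal(0, 1, n)  # candidate feature from our "theory"
# In this toy world, the disease depends on age only, not the blood value.
disease = (age + rng.normal(0, 5, n) > 52).astype(int)

base = age.reshape(-1, 1)
with_candidate = np.column_stack([age, blood_value])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
acc_base = cross_val_score(clf, base, disease, cv=5).mean()
acc_cand = cross_val_score(clf, with_candidate, disease, cv=5).mean()
print(f"without blood value: {acc_base:.3f}")
print(f"with blood value:    {acc_cand:.3f}")
```

A flat or negative change is the humbling result described above; with a real dataset you would also check whether the candidate's signal is simply redundant with existing features before discarding the theory.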

Speaker 3:

Yeah, that's a really nice point about this chapter that mainly Christoph wrote. There's this two-way street: on the one hand we can integrate domain knowledge, but we can also test our domain knowledge. Not everything that improves performance necessarily means we made progress, but if we think a certain kind of knowledge is valuable and meaningful, it should actually also increase performance.

Speaker 5:

Yeah, I've definitely seen machine learning people go one of two ways on it: either turn the brain off, overfit it, and rock and roll, or overthink it to an extreme. So I really like the balanced approach you two are proposing. I think that makes a lot of sense. Thank you for going into that.

Speaker 2:

Yeah, and I think this highlights an undertone: maybe scientists are doing the right things, but we just don't see it in public. For every paper that gets published where a model gets cheated by a ruler in the image, biasing it toward saying there's lung cancer in a radiology scan, there's an equal number of scientists testing hypotheses in a lab with a machine learning algorithm and saying, wow, my domain expertise does not actually match up with the statistical learning available to us. So we are seeing this interplay, but maybe we're seeing more of the negative examples publicly. What do you feel, then, is the current relationship between science and machine learning? Do you anticipate this changing over time, or do you think we've established some norms, or what norms do you expect to be established in the future?

Speaker 4:

What I'm seeing is that it's very different by field. Some fields are quick to pick up machine learning and are already using it. One field that does it a lot is ecology; I've seen many examples of machine learning being used and established there. In other fields you see some first papers that say, hey, we are used to working on problem X, and here's a way we could do that with machine learning. That's the value of the paper, and maybe that first paper opens the space for more papers that solve the same problem with machine learning.

Speaker 4:

And then there are other fields that don't use it as much. But what we found is that it's really difficult to find a field with zero machine learning papers. So in general we see a pickup of machine learning in every field, really. And I think there's always this accepted range of tools that scientists are using. It's probably not written down anywhere, but each field has its accepted tools. For example, in medical research you often find logistic regression models and general statistical analysis.

Speaker 4:

And if you go to a certain journal, it's clear that some type of analysis is expected to match that. Maybe it's easier if you use frequentist statistics instead of Bayesian, and maybe you cannot get machine learning published in that journal at all, things like this.

Speaker 4:

So this is really a cultural thing, what tools people are using, but also a learning thing. Machine learning is, I mean, arguably quite old, but with the tools we have, it takes time for experts in other fields, not machine learning experts, to adopt the techniques. And if you look at neural networks, they've just become a lot easier to use. I remember playing around with TensorFlow in the beginning; it was really difficult to work with. Now we have much easier tools for working with neural networks, you can just download model weights and get started super easily, and it's a very different field right now. Also, as a researcher, you can now much more easily access these things and use machine learning in your research.

Speaker 3:

Something I've always thought is that, at the moment, we see a lot of cases where there are just new problems that we can approach as scientists. Medical imaging was just a lot harder before, and now we can tackle a completely new kind of problem set. So there's a community that relies almost exclusively on machine learning, and if you look at their research papers, to me they look basically like machine learning papers: they have a benchmark somewhere, they introduce their algorithm, and at some point they discuss, often mainly, the shortcomings of the machine learning algorithm or the modeling approach.

Speaker 3:

Where I think it's getting more interesting is in fields where machine learning is partially substituting for, or offering an alternative approach to, the same problems we tackled before, and a really cool case here is weather forecasting. That's something I'm really interested in. Recently there was this new model called GraphCast from Google, where they showed the model actually outperforms HRES, the best weather forecasting model we have, from the European Centre for Medium-Range Weather Forecasts. Here we see they're tackling the same problem: both are interested in making good weather forecasts, and now the question is how to integrate those tools.

Speaker 3:

How are these different communities tackling the same questions in slightly different ways? I think that's really exciting, and I'm really curious what happens when these two cultures collide. Something your question also raised for me is a more long-term question: is machine learning, in the end, going to automate science? Can we get machine learning to run the whole process? As a philosopher, I find that a very intriguing question, and I think we can already see parts that could be automated, parts of the modeling. But one crucial bit where I would be really keen to see cool cases: human scientists do a lot of feature engineering. They don't just take the raw data; they actually think about what good representations could be.

Speaker 3:

And then they build simple models with these few representations. What machine learning is doing, the alternative approach, is just using a lot of features. But maybe machine learning can also help us in this first step of finding very good features that are also intelligible to humans. And I think that's something machine learning still has to show: that it can find meaningful new scientific laws or representations that we as humans can also use and understand.

Speaker 2:

And I love both those answers, because they get at this idea that we're seeing a growth curve. When scientists first start using machine learning, the title of the paper says "with machine learning"; it's very prominent: this is the model, this is the method. In some fields, like financial sciences and health sciences, we're over this hump: you just write the paper, and machine learning is a tool, the same kind of tool that a Pearson correlation is, just a tool in your toolbox. And I wonder if we're heading toward the point Timo raised. We're seeing starts at this, but we're not really seeing it successfully happen yet: machine learning models creating really good and interesting features on their own, or developing their own hypotheses or their own prediction methods. That could be a potential next step. But I think right now we're in a really strong phase where we can see machine learning as another tool in our toolbox and not as a piece of magic.

Speaker 1:

Well, Christoph and Timo, you've given our listeners many ideas to think about regarding how the sciences can expand their predictive power through machine learning. Everyone, please join our next episode for part two, where we'll shift the discussion to how to put these points into practice. In the meantime, be sure to check out their latest book, Supervised Machine Learning for Science: How to Stop Worrying and Love Your Black Box, available at your favorite bookseller. We'll also provide a link in the show notes. If you have any questions, please contact us. Otherwise, we will see you in part two. Thank you.
