The AI Fundamentalists

AI in practice: LLMs, psychology research, and mental health

Dr. Andrew Clark & Dr. Sid Mangalik Season 1 Episode 35

We’re excited to have Adi Ganesan, a PhD researcher at Stony Brook University, the University of Pennsylvania, and Vanderbilt, on the show. We’ll talk about how large language models (LLMs) are being tested and used in psychology, citing examples from mental health research. Fun fact: Adi was Sid's research partner during his Ph.D. program.

Discussion highlights

  • Language models struggle with certain aspects of therapy including being over-eager to solve problems rather than building understanding
  • Current models are poor at detecting psychomotor symptoms from text alone but are oversensitive to suicidality markers
  • Cognitive reframing assistance represents a promising application where LLMs can help identify thought traps
  • Proper evaluation frameworks must include privacy, security, effectiveness, and appropriate engagement levels
  • Theory of mind remains a significant challenge for LLMs in therapeutic contexts; example: the Sally-Anne test
  • Responsible implementation requires staged evaluation before patient-facing deployment

Resources

To learn more about Adi's research and topics discussed in this episode, check out the following resources:




What did you think? Let us know.

Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

  • LinkedIn - Episode summaries, shares of cited articles, and more.
  • YouTube - Was it something that we said? Good. Share your favorite quotes.
  • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
Speaker 1:

The AI Fundamentalists, a podcast about the fundamentals of safe and resilient modeling systems behind the AI that impacts our lives and our businesses. Here are your hosts, Andrew Clark and Sid Mangalik. Welcome, everybody, to the AI Fundamentalists. In today's episode we're going to be talking about language models for psychology research. This is one of several episodes that we're dedicating to practitioners and the hands-on application of LLMs in real life. With that, today we have Adi Ganesan. He's a PhD researcher at Stony Brook University, the University of Pennsylvania, and Vanderbilt. His research is at the intersection of computational social science and natural language processing, specifically looking into understanding mental health through the lens of language. You can find his work published in the top NLP journals as well as some of the top psychology journals. He's also, until recently, Sid's lab mate.

Speaker 2:

So, Adi, welcome to the show. Hey guys, it's great to be here, and I'm very excited to talk about language models and mental health today, because there's a lot of hype going on outside that these models can do everything except your grocery shopping. Maybe they can also do that in the future. But there is also some trepidation on the other side that all the research opportunities are gone for people like me working in this intersection. I'm here to say that maybe it's neither: these models are not good at doing everything, nor have all the opportunities gone away. Yeah, I'm excited to be here. Thanks for offering this platform.

Speaker 3:

For sure. We're really excited to have you. On this podcast we often make fun of LLMs, or talk about their limitations and try to burst the bubble of the hype a little bit, so I think sometimes it might appear that we're anti-LLM. That's why the research you're doing matters: there are real, viable uses. How LLMs are often talked about, how Sam Altman and folks talk about them, is a little interesting, and that's usually what we're critiquing. However, on the Fundamentalists we genuinely believe there are real use cases for large language models, and you are a cutting-edge researcher doing exactly that, so we're really excited to have you on the show before we hop into our subject.

Speaker 4:

Uh, I guess just a fun question for you guys. What have you guys been reading recently? Anything interesting on your desk?

Speaker 1:

I'm going to throw that one to Adi first.

Speaker 2:

Ah yeah, so I've started reading this book by Douglas Walton. It's just called Abductive Reasoning. Very short, simple title. It's been a very good, engaging and enjoyable book so far, and very relatable to the daily practicum of being a researcher: where do I actually exercise this, and how much of our focus has been steered into just deduction and induction? I also like this concept of framing abduction as a game, a dialogical game where you come up with a hypothesis that can explain an observation, and then the dialogue turns into a commitment-based game: how committed you are to that hypothesis, and trying to unroll how well the hypothesis is supported by other evidence. So it's been a fun read so far.

Speaker 4:

Well, I think we can just hop into it then. So I guess, Adi, tell us a little bit about the research work you do and what you've been up to. Obviously it's very similar to the work that I've been doing, but I'd love to hear about what you're doing and how LLMs play a role in the research that you do.

Speaker 2:

Yeah, so I've been mostly looking at assessments for the last two or three years, getting at this question of how we can build automatic methods to precisely assess the mental health of a person. One of the aspects that has been missing in research, or rather that is in the nascent stages of being addressed, is the longitudinal aspect: an assessment typically doesn't take the longitudinal aspect of lived experience into consideration. That is something I've been working on, building longitudinal transformer models that take people's behaviors contextualized over time, as well as through other behaviors, in order to form a more accurate assessment. But now I'm starting to work on conversational applications: what are the missing pieces in the way these ChatGPT-like models are being trained? One such piece is goals.

Speaker 2:

These models have been trained to serve information readily. But when we are in seeking mode, we would like to explore a little bit, and this lock and key does not really fit well. So I'm starting to look into how this mismatch in training affects us when we have these exploratory or epistemic needs, this search for knowledge, and how we can make the best of it. How can we actually align it for this kind of use case? How can we resolve this mismatch between serving information and seeking information?

Speaker 4:

And I think this keys in on this little zeitgeist we're having where people are using ChatGPT as a therapist, and even in more principled cases, we have researchers trying to build mental health bots, agents that you can speak with to do mental health diagnoses and treatment plans. But, as you're saying, these struggle with the fact that LLMs are meant to be very forthcoming with information, and not all of them are well-tuned to actually work with you and discover your needs over a few weeks before coming up with any determination.

Speaker 2:

That's absolutely true. There was recent work that showed exactly this: these models are over-eager to engage in problem solving, and don't let the patients develop the skill of problem solving themselves. It's very cool work from Tim Althoff's group at the University of Washington. It also examines a lot of other behavioral traits which are characteristic of good therapist behavior in real conversations, and shows in which places models exhibit good therapist behavior and in which places they exhibit bad therapist behavior. So one of the findings was this over-eager mode of jumping to problem solving, not engaging with the depth of the conversation to understand where the person is coming from.

Speaker 2:

Immediately saying "this is how you need to deal with this," rather than spending time to understand what their problems are and where they're coming from. There's also, as we know, a lot of overemphasis on empathetic language use. When a patient and a therapist are in a session and have moved to the point where they would like to engage in problem solving, you don't want the model to keep saying "I am sorry to hear that" at every turn, right? So essentially, knowing when to say what seems to be a big problem, and I think that is the grand challenge here: how do we actually fix that?

Speaker 2:

I think that is something I'm looking forward to in the next two to five or six years: there will be a lot of focus going into when to say what. How do we go beyond just training the model for what to say in response to the immediate query, toward something like: this is the goal state of this conversation, this is the goal state of this relationship with this person, this would be best for keeping the beneficence of the interlocutor, of the person, in mind? So there are broader objectives and goals that go into determining this behavior, the goodness of behavior, and I think those are some of the research horizons that are open and, frankly, very exciting to address in the next few years.

Speaker 4:

Absolutely. And even here in the short term, it feels like we're kind of fighting this problem, right? Maybe you've heard that with the GPT-5 launch, people were saying they finally fixed, well, not fixed, improved, the fact that GPT was very sycophantic.

Speaker 4:

It would keep sucking up to you, it would be very complimentary of you: "wow, that's such a great observation about yourself, you must be really keen and insightful." Which is not very useful for a therapist to keep giving you, to continually validate your viewpoint on things, especially when you're dealing with cognitive distortions. And we've seen some very tragic cases where the model has backed up people's opinions and thought processes on certain things and sent them down very dark holes. But then we see the problem that models which are very straightforward lose a lot of that emotional capacity, or at least the ability to mimic emotional capacity.

Speaker 2:

So I mean, it's not that I'm only talking about the places where it is not good. I think there are places, even within the realm of mental health, where it could be a useful tool. For example, cognitive reframing is one of the really challenging tasks for patients to carry out. In the current setting, what they are given is a worksheet which looks like: if this is the thought trap that you're dealing with, this is what you need to do. It is extremely challenging in that grid-worksheet format.

Speaker 2:

What recent language models have been shown to be useful for is making it easier for people to identify the thought trap, because identifying the thought trap is not as fundamental a step as actually addressing it, right? Some amount of help there goes a long way, especially when you consider that the person is going through a rough patch in life, going through a difficult time; cognitively, they are not that activated, not able to think through these things. This is where some amount of aid can come in. Again, really cool work from Ashish Sharma and Tim Althoff back in 2023: they go through a really nice evaluation framework for using language models to support cognitive reframing, using experts to test the model and evaluate it on a set of valid criteria, and that seems to be a place where it could be really useful.
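For a concrete sense of the identification step described above, here is a minimal, hypothetical Python sketch. It is not the system from the Sharma and Althoff work; the thought-trap list, the prompt wording, and the call_model placeholder are illustrative assumptions, and you would wire call_model to whatever model you are evaluating.

# Hypothetical sketch: ask a model to label a "thought trap" (cognitive
# distortion) before any reframing is attempted. The model call is a stub.

THOUGHT_TRAPS = [
    "all-or-nothing thinking",
    "catastrophizing",
    "mind reading",
    "overgeneralization",
    "labeling",
]

def build_prompt(thought: str) -> str:
    # Constrain the model to identification only, not advice-giving.
    options = "; ".join(THOUGHT_TRAPS)
    return (
        "A person wrote down this automatic thought:\n"
        f'  "{thought}"\n'
        f"Which of these thought traps best fits it: {options}?\n"
        "Answer with one label only. Do not give advice or solutions."
    )

def call_model(prompt: str) -> str:
    # Placeholder for a real LLM call (API client, local model, etc.).
    raise NotImplementedError("wire up a model here")

if __name__ == "__main__":
    thought = "I stumbled over one interview question, so I'll never get hired anywhere."
    print(build_prompt(thought))
    # label = call_model(build_prompt(thought))   # e.g. "catastrophizing"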

Speaker 2:

Another place where we see value is note-taking. If you have a clinician talking to a patient, note-taking is something you can definitely offload, because it frees up some cognitive space for them to focus on the patient. You can reduce the number of tasks that are almost mechanical at this level. So things like this, I think, are still of high impact when integrated into our current pipelines, so that we can focus on what is more important.

Speaker 4:

I really like that. I want to dig in a little bit deeper. So you talked about how these models are useful and have impact. When people are deploying these LLM systems, they're often getting deployed just because they're exciting to deploy, but we're not getting a lot of validation or evaluation of these models. So when you want to claim that an LLM or a chatbot is useful for a use case, how do you think, with a research mindset, we could validate the model's usefulness or impact?

Speaker 2:

Right, this is a very complex question: how do we validate these language models with a research mindset, even within the broad realm of mental health? This is where some amount of scoping would really help, and I've been reading papers that try to get at how to set this up. The first fundamental question is to break this down into the attributes that would be good for an agent in this system to have, right? Of course, there is also the other breakdown of what the different tasks are. It looks like most of the existing evaluation frameworks focus on privacy, on security, and on effectiveness, although effectiveness is a little bit of a difficult metric when you think about it.

Speaker 2:

Is the patient feeling really good after the conversation? Well, that's somewhat effective, but in therapy you would want to build a long-term sense of effectiveness in the patient, right? I'm not saying there aren't works that look at it, but it needs to be more thought out. Some factors that keep getting overlooked are engagement and how much it integrates into the real world. You can imagine a sort of Freudian utopia where you have an assistant in your pocket giving you emotional support, but I can also imagine a case where the person becomes too dependent on the crutch and is not learning the skills that underlie emotional regulation and dealing with problems. So engagement cannot become over-engagement. Right now we are probably dealing a little more with under-engagement, because the model is not grounding, not asking enough questions to understand. But it is also possible, if you are fixing under-engagement and the model doesn't know when to stop and when to let the patient have autonomy over their growth process, for it to tip into over-engagement.

Speaker 2:

These considerations are laid out very well in the recent paper from Elizabeth Stade; it's called the READY framework. I've attached the link, so you can look into that. You can also think about evaluation in three stages. You can do extrinsic evaluations, with this scaffolding inside the evaluation framework, and you can have the agents face patients in a randomized controlled trial and evaluate them. That is going to be the closest measure of how effective it is in the real world.

Speaker 2:

Before that, I suppose, you would do something like having it face experts: simulate the environment and have experts assess how well it does before it starts facing patients. That's still expensive, because you're using experts' valuable time and feedback. Even before that, you can take archived data, simulate the situation with these models, and compare how the models respond versus how expert psychologists, good therapist behaviors, or good assessment models behave, and juxtapose, compare and contrast.

Speaker 2:

I've described these in decreasing order of cost, and how it should be done is in the reverse order. There's a really cool paper, again from Elizabeth Stade a year ago, proposing the steps we can go through for evaluation and development, and they draw parallels to how we thought about self-driving levels L1 through L5. First you keep it at the level of taking notes. Then, once it starts understanding things a little better, you put the model in front of the patient before the interview takes place, so that it does a pre-interview survey. Then you can have it interface with the patient with the psychologist steering. And finally, once it starts passing these different levels and stages, you can be more confident about letting these models interface with patients with a minimal amount of supervision from the psychologist. So I think that is also a really cool paper.

Speaker 3:

Well, thank you for that. One thing I really like from this conversation is the amount of depth you're putting into validation: is it fit for purpose? There's the famous quote from George Box that all models are wrong, but some are useful, right? You're very much looking at this through that lens, in juxtaposition to how the industry talks about Gen AI and LLMs. You're asking: how do I specifically validate it, and how useful is it for this specific application? It's one model, one tool in your toolbox, and it happens to be the most useful one for this application. That's very different: you're focusing on how to validate it for this specific thing, versus saying it's great for everything and can replace all humans. No, it's a great research tool for solving this problem. So how you're approaching it is great, and very much in contrast, I think, to how Gen AI and LLMs are usually viewed.

Speaker 4:

And I think that leads me to want to talk a little bit about your recent research, Adi, which I don't think is out in publication yet, but I think we can talk about it at a high level. The idea is that you can do pure LLM research, right? There is a tier that's even cheaper and easier to do, which is research where we discover and explore the kind of schema these LLMs have about the human mind, where we can see how they think depression is an interconnected set of symptoms and behaviors, and we would like to ground this against known, validated psychological models. We can do this type of pure LLM research when we have a very strict and focused mindset grounded in principles. When we look at models in this focused, principled way, we're able to do research using just LLMs, even in a pre-human scenario. So that's very cheap and readily available research we could be doing, and we're not doing a lot of it.

Speaker 2:

Right, yeah, that's right. We are not doing a lot of that, and that was actually surprising to me, because with so much of the work we have done over the last few decades in going evidence-based and data-driven, we have so much information, and methods as well, to posit a latent structure, and there is so much work that has been done with humans and with experts on this. There is a lot of work that gets at the schema of psychopathology, the schema of our mental map, which brings not only data collection and experiments but also methods and findings. The simplest way to go about it is to put the model in a position where it is drawing inferences, either from language or from any rich behavioral signal that we as humans use to form assessments. Now, suppose this assessment is rich and has some theoretical conceptualization of structure. Let's take depression, for example; there are various models of depression, and it's a much simpler example to start with. There is the PHQ-9 model of assessing depression, which has nine symptoms within it, or you can take the GAD-7 for anxiety, but there are also other models of depression.

Speaker 2:

What you could be doing here is not stopping at evaluating what the model says, such as depression or not depression, but collecting the information and bases which surround that decision. In this case it would be something like: for depression, what are the symptoms, what leads into that decision? We don't just go by what it says; we take the evidence into consideration and triangulate those signals against how humans would have expressed it and how the experts would have judged it, and see how much the internals of a decision are consistent with each other as well as between humans and the machine. This is a good way to triangulate three signals: what did the person say and what was their symptom severity; on the other end, what did the expert infer from what the person said and what did they think the person was going through; and then, on the same side, you have the model looking at the person and making its inference, right?
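To make the triangulation idea concrete, here is a minimal Python sketch comparing three signals for the same set of people: self-reported PHQ-9 totals, expert severity ratings, and model-inferred scores, using pairwise Spearman rank correlations. The numbers are invented purely for illustration; a real study would use validated instruments and much larger samples.

# Illustrative sketch of triangulating three signals about the same people.
# All scores below are toy data, not results from any study.
from itertools import combinations
from scipy.stats import spearmanr

scores = {
    "self_report_phq9": [4, 12, 7, 18, 2, 15, 9, 21],
    "expert_rating":    [3, 14, 6, 17, 3, 13, 10, 20],
    "model_inferred":   [5, 10, 9, 14, 2, 16, 8, 15],
}

# Pairwise rank agreement: where the model disagrees with BOTH human signals,
# its internal "schema" of depression likely diverges from the human one.
for (name_a, a), (name_b, b) in combinations(scores.items(), 2):
    rho, p = spearmanr(a, b)
    print(f"{name_a} vs {name_b}: Spearman rho = {rho:.2f} (p = {p:.3f})")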

Speaker 2:

You can do a three-way comparison of how a behavior, in this case language, relates to these symptomatic markers or latent structures, and being able to compare that tells us where the model fits. Is it able to understand and map ecological behaviors, the behaviors we actually exhibit, to symptom characterization as it occurs in us? One of the cool things we found was that these models turn out to be very poor at detecting psychomotor retardation or agitation, essentially this kind of slowing or nervousness and jitteriness, from language. And it has also been shown in research that it's harder to detect this from language markers alone; it's more behaviorally embedded, and maybe with speech markers we would be able to, right? So this was very validating.

Speaker 2:

And another thing: suicidality is under-represented in language. Suicidality is almost always very latent; when you ask a person how depressed they are feeling, or how they are feeling, they don't directly talk about it in most cases. And the model is actually oversensitive about inferring suicidality from language that is representative of suicidality. This could be coming from a number of things, right? Is it because it is oversensitive, or is it because it is miscalibrated? If it doesn't understand language representative of suicidality, then it is miscalibration. But if it is oversensitive, then it is possible that there are some neurons within the model indicating suicidality, but it is oversensitive to talking about it.

Speaker 2:

This line between being oversensitive to talking about something and being miscalibrated was brought out well in one of the recent theory-of-mind papers. It was a cool finding that these models can't detect faux pas, right? They gave a very simple test where they present a story and ask whether a particular character committed a faux pas, and most of the models said no. But then when they re-questioned the model, asking whether it is more likely to be a faux pas, it said yes, it is possible. It is a case that draws out the difference between oversensitivity and miscalibration.

Speaker 4:

Yes, and let's dig into that a little bit, right? You mentioned theory of mind, so we should break down what that is for the audience and then talk about how it's very relevant to all of us as users of LLMs. When we converse with each other, I know what you know about me, and you know what I know about you, and I know what you know about the world, because I know that you're a PhD researcher and you know that I have a PhD from a very similar lab. So we're not going to talk to each other the same way we would talk to someone we meet at a bar, right? We have this theory of mind, this theory of understanding what you think about the world and what I think about the world.

Speaker 4:

And we've tried to evaluate LLMs on whether they understand this kind of deep context, which would be very useful for things like faux pas. They have to understand: I know the cultural expectations of the world, so if I betray them, then you know that's weird; but if I don't know them, you would think, okay, this person doesn't know what's going on. So let's talk a little bit about this idea of context, right? In psychology, we love context, because there's so much more to a human than just what they're literally telling you. There's so much deeper context to a person. How does that play into theory of mind?

Speaker 2:

That's right. For so many things we are doing right now, we are so context-rich that it is probably being swept under the rug: we are conscious of it, but we are not making everything explicit. At some point it is very intuition-guided, some of the context that we absorb; in this particular setting, who is to talk when, based on these pauses and these signals. If you think about it, these models, if I were to take a guess, are not guided by time or the amount of gap that you leave in a conversation. It can be tuned, but right now language models do not experience, or I shouldn't say experience, language models are not made to learn the stateful nature of information, that there is statefulness to information, or what we experience in rough form as the passage of time.

Speaker 2:

And at the same time, we have made a lot of simplifying assumptions, both while training the model and while treating it, inferencing from it, right? And with these assumptions come a lot of problems, things it is going to be limited by when framing inferences. So one of the things we think about is theory of mind, and why it is important. Well, in very impactful applications, not just this conversation but also mental health and therapy settings, the therapist is trying to form a theory of mind instead of explicitly asking every single question, because, A, the therapy session is time-bound and you can't ask an infinite number of questions, and B, they don't ask the exact thing they want to know directly, but rather form an inference toward it, right?

Speaker 2:

So this is where theory of mind comes into play: an expert in those settings is trying to get a sense of why this person might be on a certain trajectory of behavior. The why is important for being able to build a solution to address it. With this in mind, we want these models to be able to exercise a sense of where the person on the other side is coming from; why is it that we are observing a certain pattern of behavior? It also ties neatly into abduction, if you think about it: abduction is a process where you frame a plausible hypothesis for an observation, right? The way I see it, this is a place where theory of mind needs abduction. And right now, with respect to research, theory of mind mostly gets at: here's a story, told from a narrative perspective, and the model is asked, according to this particular character in the story, what is the state of a certain object?

Speaker 4:

So I'll give the practical example of this, since this can be a little hard to visualize, but this is like a very classic test that we give to children, and we give this test to children to see if they understand theory of mind. So in this experiment we'll have a child watch a video and in the video we have Alice and Bob in the room. Bob leaves the room, so only Alice is in the room. Alice takes a ball that was on the floor and puts it inside of a box and then Alice leaves the room and then Bob returns and they ask the child where is Bob going to look for the ball? And children who have not yet developed this theory of mind will say oh well, they're going to look in the box, but that's not possible because Bob wouldn't know the ball was in the box.

Speaker 4:

Children have this misconception that everyone has the same amount of information about everything at all times; they don't understand that different people or agents have different understandings of the world around them. And so if we want LLMs to have this theory of mind, we want to evaluate their ability to do this kind of task. When we try these kinds of tasks, they're pretty good at them at very simple levels: they can handle small amounts of theory of mind, maybe three or four levels of hiding balls in boxes. But how might this fall apart in the real world, with rich context? How do we differentiate this Sally-Anne type test of theory of mind from real theory of mind?
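As a toy illustration of how a Sally-Anne style item can be turned into an automated check, here is a hypothetical Python sketch. The story wording, the scoring rule, and the call_model placeholder are assumptions for illustration only; real theory-of-mind benchmarks use many more items, rewordings, and controls.

# Toy sketch of a Sally-Anne style false-belief check. The model call is a stub.

STORY = (
    "Alice and Bob are in a room. Bob leaves the room. While he is gone, "
    "Alice moves a ball from the floor into a box. Then Bob returns."
)
QUESTION = "Where will Bob look for the ball first? Answer 'floor' or 'box'."

BELIEF_ANSWER = "floor"   # where Bob last saw the ball (his false belief)
REALITY_ANSWER = "box"    # where the ball actually is now

def call_model(prompt: str) -> str:
    # Placeholder for an actual LLM call.
    raise NotImplementedError("wire up a model here")

def score(answer: str) -> str:
    a = answer.strip().lower()
    if BELIEF_ANSWER in a and REALITY_ANSWER not in a:
        return "tracks Bob's belief (passes this item)"
    if REALITY_ANSWER in a:
        return "answers from reality, not Bob's perspective (fails this item)"
    return "unscorable answer"

if __name__ == "__main__":
    print(score("He will look on the floor."))
    # Real use: print(score(call_model(f"{STORY}\n{QUESTION}")))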

Speaker 2:

I think that is the challenge here: even identifying where, or how much, we can quantify real theory of mind taking place in a very real setting. To me it's still a challenge, because there could be an explicit use of theory of mind, like "where do you think the ball is?", a very explicit use, if that is Bob's goal. But if that is not Bob's goal, if Bob's goal is to, say, shine the ball, that still involves knowing where the ball is. Maybe that's not a great example, but it's something you use as information in order to form a conclusion and move forward.

Speaker 2:

And these are cases which are very hard. We should start with better experiments than just giving it stories and asking it whether it can track the states people hold; we should move from it holding a third-person perspective to a second-person perspective, right? What would theory-of-mind experiments look like in a setting where the model is directly interfacing with the person it has to understand, make decisions about, and respond to in a way that leaves the user satisfied at the end of the day? Sometimes it is as simple as asking them; it could be very basic information.

Speaker 2:

Sometimes it's easier to ask people for that information; sometimes it might not be possible. In a therapy setting there is trust inertia: you can't ask a lot of questions in the first session, right, especially if they are deeper and harder questions. Or you can ask, but you might not be able to elicit the most authentic response. That is a place where you exercise some amount of theory of mind. But I still think it's an open research question to frame these experiments and to devise methods to measure theory of mind in models in more realistic settings.

Speaker 4:

Yeah, absolutely. And I think this extends all the way from our work in psychology research, which fundamentally tests that theory of mind, to business use cases, where you'll give an LLM your entire code base and all of your JIRA documentation and Confluence pages and say, okay, do you understand my business now? And of course it doesn't. It doesn't understand directives, it doesn't understand goals, it doesn't understand the deeper meanings of fields and database tables.

Speaker 4:

So these types of rich contexts are really elusive, and I think they're difficult to scope out at large scales, and even difficult to scope out at smaller scales like building a chatbot for a therapist. Because, like you're saying: does it understand the trust model between the person and this agent? Does it understand what this person even understands about themselves? Does this person understand that this type of sadness is not standard, not normal, that it's crippling? These types of deep and rich understanding through context are very difficult for LLMs. Not impossible, and maybe not intractable, but it poses a real challenge beyond simple QA systems.

Speaker 1:

So you know, sid and Adi, with everything that you just said, what do we say to the people who, like, really operate on a good enough basis when they're either trying to use a chatbot for self-help or in any of these other situations like you were talking about, with other business use cases?

Speaker 2:

Yeah, so understanding the limits of these models, what they are good at, what they are doing, what the effectiveness is going to be in treating the problem, definitely helps. Being aware of the limits of the model is very helpful. If we know that, in this case, the model is not going to spend time learning about me before trying to address my problem, that is somewhat fixable through other measures like prompting: engage with me for a while before we get into the problem, right? In many cases we would like to chat with a person to open the third or the fourth wall; chatting helps with opening up, and opening up helps us see the problem a little more clearly.

Speaker 2:

But some of these problems are addressable by having some amount of awareness of where these models are failing. Clearly, having some help, having these models do more good than harm, is good, given the current situation: the rising trend in anxiety, the growing uncertainty in our day-to-day, and the overall trend of depression and stress increasing over the years. Having these kinds of support and aid is definitely helpful, but knowing their limits is what will make the person using them know when to stop and when to take things more seriously, having a sort of theory of mind of when that information is useful, of when the model is serving something.

Speaker 4:

And I think this good-enough mindset has a very interesting interplay with researchers, because the goal of the researcher is basically to push at the very bleeding edge and ask: is this possible? The good enough almost belongs to the world of the deployer, the developer, who looks at the situation and asks: is this problem unsolvable without an LLM? Is it impossible to read one billion tweets and generate depression scores for a community? Is it impossible to serve every single citizen in America with a therapist on demand? And then we can get into discussions of whether we're generating more net good than net bad. In conversations with therapists, they obviously have very mixed feelings about using LLMs for these things. They have a lot of problems with it, and we've identified a lot of these problems.

Speaker 4:

But if we're under the impression that you know, the mental health system is broken and people don't have access to any resources, is it right then to say that you should not have access to even this one resource?

Speaker 3:

Well, just to generalize that, I think that's really helpful: LLMs can be helpful if you understand them in their context. In this case, you're not replacing a professional; however, it might definitely be better than nothing, right? So it's fine as long as you use it knowing that context.

Speaker 4:

I think this has been a really interesting and valuable discussion on what it looks like to do LLM research on the other side, right? Often people are working with LLMs to solve some specific business use case, but here we're talking about very specific use cases with very specific evaluation frameworks, with goals that are very lofty and unattempted thus far. I guess, if I had to ask both Andrew and Adi: what are some of your overall thoughts about how LLMs slot into the world, what is a good use of them, what are their limitations, and how do you feel about their future in this space?

Speaker 3:

You know, that's why I'm really excited that you took the time to chat with us today, Adi. I really think that it is one valuable tool in the toolbox; it's just about right-sizing where it fits, because it is definitely a valuable tool. And I think your research and Sid's research are really showing there are some very good uses for this, such as analyzing large sets of data and things like that when they're language-based. I think it's a net positive when used properly.

Speaker 3:

But what I really like about both of your research is that you understand the limitations and you're using it to try to solve a problem, knowing that it's not thinking, it's not reasoning, it's not replacing a human. However, it can augment, or fill a gap that's not currently filled. So I really like that, and I hope the systems continue to get better. I think there are some very material gaps now, and a lot of people are trying to use them irresponsibly. But I very much love the research you guys are doing: there actually are some valuable uses if they're used responsibly, and it is a sensitive area. So it's about really putting those guardrails in place, knowing how the system is performing, and making sure it stays a net positive.

Speaker 2:

Right, yeah, I share the same sentiment. It's a useful piece of technology and very promising as well. In fact, I hold the view that it is promising for solving some of the most challenging tasks in our personal day-to-day; it doesn't even have to be business. But just like any other technology, we thought the same about phones or computers: these are going to be disruptive. They came with their own problems, and in fact we adapted ourselves.

Speaker 2:

We do change a lot. We found an understanding of what the capability of a phone was: it was not a supercomputer, it could not solve every task, but it is useful in many other respects.

Speaker 2:

Phones were also thought of as a way to link people, but they also came with problems like spam and predatory behavior, and we were able to address those. We came up with ways to address those challenges, and, like any of those technologies we have lived through, we are going through a similar sort of evolutionary period, except that there's a lot of hype going on claiming that it is already capable of doing many things, when that is not really the case. I am optimistic about its abilities being improved by us over time, but I would not go as far as saying that all our problems will be solved by this technology. We will adapt ourselves, we will build a sense of it, we will be able to calibrate our trust in this technology over this period. I think there are a lot of interesting and difficult problems we have faced that this could be a part of the solution to.

Speaker 4:

Well, thank you for coming on, Adi. We really enjoyed your time. It was really great to talk to you again. Also great to just talk about research and what we do, how it's important and how we were using language models for good, which can feel like it's a very distant dream, but I think that what we were doing in the lab was really important and really potentially helping people.

Speaker 2:

Thanks a lot for having me here. I really enjoyed talking these things through with you, and I'm going to miss being around Sid in the lab. It's always an enriching conversation to have with him about how the nuances that get revealed in research get completely overshadowed by the hype outside, and oftentimes the nuances and the details are what help us reconcile what we can do to make things better. So I will definitely miss doing that in the lab, but I'm sure we'll continue talking about these things. Thanks a lot; it was my pleasure to be here and talk about things that I'm really passionate about and work on every day, and I also learned quite a lot talking with you guys. I hope to see you again in the future.

Speaker 1:

And you can all keep up to date with the latest research from Adi at his Google Scholar profile, which will be linked in the show notes. Thank you, everybody for joining. Until next time.
