
The AI Fundamentalists
A podcast about the fundamentals of safe and resilient modeling systems behind the AI that impacts our lives and our businesses.
LLM scaling: Is GPT-5 near the end of exponential growth?
The release of OpenAI's GPT-5 marks a significant turning point in AI development, but maybe not the one most enthusiasts had envisioned. The latest version seems to reveal the natural ceiling of current language model capabilities, with incremental rather than revolutionary improvements over GPT-4.
Sid and Andrew call back to some of the model-building basics that have led to this point to give their assessment of the early days of the GPT-5 release.
• AI's version of Moore's Law is slowing down dramatically with GPT-5
• OpenAI appears to be experiencing an identity crisis, uncertain whether to target consumers or enterprises
• Running out of human-written data is a fundamental barrier to continued exponential improvement
• Synthetic data cannot provide the same quality as original human content
• Health-related use of LLMs is a particularly dangerous application
• Users developing dependencies on specific model behaviors face disruption when models change
• Model outputs are now being verified rather than just inputs, representing a small improvement in safety
• The next phase of AI development may involve revisiting reinforcement learning and expert systems
• Review the GPT-5 system card for further information
Follow The AI Fundamentalists on your favorite podcast app for more discussions on the direction of generative AI and building better AI systems.
This summary was AI-generated from the original transcript of the podcast that is linked to this episode.
What did you think? Let us know.
Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:
- LinkedIn - Episode summaries, shares of cited articles, and more.
- YouTube - Was it something that we said? Good. Share your favorite quotes.
- Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
The AI Fundamentalists, a podcast about the fundamentals of safe and resilient modeling systems behind the AI that impacts our lives and our businesses. Here are your hosts, Andrew Clark and Sid Mangalik.

Speaker 1: Welcome everybody to this episode of the AI Fundamentalists. This is going to be a quick episode today. With the news of GPT-5's release from OpenAI, we've got a lot to say.
Speaker 2: And I think the big question on everyone's mind is: why was this GPT-5? Everyone's saying, hold on, this doesn't feel like it's much bigger, it doesn't feel like it's significantly better, it doesn't feel like it's changing the game the way 2 to 3 to 4 did. Why did this one get to be 5?
Speaker 3: And it had more fanfare too, which was wild. We've been calling this on the podcast for a while: the quote-unquote Moore's Law of AI. There's this assumption that Moore's Law carries over to AI. Moore's Law is that the number of transistors on integrated circuits doubles every two years, and that doubling has really slowed down.
Speaker 3: There's been the same conversation with AI. We really had been seeing that kind of doubling keep happening up until GPT-4, when things started slowing down, and GPT-5 isn't that huge of a lift. I think OpenAI was probably pushed to do some of this because of fundraising and wanting that next step. But we're very much seeing that the Moore's Law of AI is slowing down. There are definitely still incremental improvements, but the order-of-magnitude improvements are not there anymore.
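For listeners who want to see the arithmetic behind the analogy, here is a minimal sketch of what "doubling every two years" implies versus a curve that flattens out. The numbers are illustrative only, not real transistor counts or benchmark scores:

```python
# Illustrative only: compare a Moore's-Law-style doubling curve with a
# saturating curve. The numbers are made up to show the shapes, not data.
import math

def doubling(base: float, years: float, period: float = 2.0) -> float:
    """Value after `years` if it doubles every `period` years."""
    return base * 2 ** (years / period)

def saturating(base: float, years: float, ceiling: float = 8.0) -> float:
    """Value that grows quickly at first, then flattens toward a ceiling."""
    return base * (1 + (ceiling - 1) * (1 - math.exp(-years / 3)))

for year in range(0, 11, 2):
    print(f"year {year:2d}: doubling={doubling(1, year):6.1f}  "
          f"saturating={saturating(1, year):5.2f}")
```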
Speaker 3: But for AI, the story has been: more data, more compute, and we'll make these systems better. Well, we've already run out of data, and we all know synthetic data doesn't work well enough, no matter how fancy it is or how big a check someone writes for someone else to produce it. So how is OpenAI drinking their own Kool-Aid this much? Sam Altman pumped this up so much, and I don't know why he did that. It could have just been, hey, it's a refinement, and it also gets them some cost savings. It makes no sense why this wasn't just a point release. I guess they'd already gone through too many fours: 4, 4.1, 4.5. Maybe it could have been 4.5 again, although people argue whether 4.5 even happened, and a 4.5a is not very sexy branding.
Speaker 2: Yeah, you'll see in their system card that they've tried to brand GPT-5 as the replacement for all the old models. GPT-4o? Oh, that's GPT-5 main. o3? That's GPT-5 Thinking. They've tried to do this cute little mapping between the old models and the new models. But the new models aren't done yet. They don't even have image out, they don't have audio in, they don't have audio out. So these aren't even comparable to the old models yet. They're very big on incremental releases, so this feels like maybe a 5 beta, like an alpha release for the real one. It's very confusing.
Speaker 3: I have not seen this bad of a product release in a while. And the day before, or maybe it wasn't the day before, when Altman was talking about it, he said it's like speaking to a PhD in every field. Why would you say that? What version are you testing internally? Because it's not what you released.
Speaker 2: What feedback have you heard, or what have you experienced using GPT-5? How do you feel it's at least qualitatively different from 4? Because I think we all feel that quantitatively it's not significantly better.
Speaker 1: One observation I've had, from other marketers I've talked to who really doubled down on the previous versions, is that they kind of lost their stuff. I don't want to humanize an LLM, but as people used these more and more, they became a single point of failure for anyone who wasn't careful about everything they were delegating to foundation models, or to a single system, or automating. So when this version came out, they found their stuff was broken. It's almost like they lost an employee. Again with the humanizing, and I know you're going to talk about some of the technical and performance aspects, but I'm also hopeful that businesses and individual-productivity people are learning something: if you're going to treat it like an employee, and that employee is your single point of failure, then when they take off, guess what? You just did the thing you were complaining about humans doing.
Speaker 3: To be honest, I haven't actually played with GPT-5 yet. I've been reading a lot, seeing the benchmarks, and I saw a lot of the gaffes, like the chart issues OpenAI had. But I need to go play with it and form my own opinion; this is just what I've been reading. People are mostly talking about it being less emotive, less creative.
Speaker 1: I don't know what kind of guardrails and things they might have put on it for that. And to that point, does this perform worse because of the way people were using the models before, or is it performing worse because it really is that bad?
Speaker 2: It's a fair question. I think we're seeing a bit more of an enterprise push with 5 in the billing and the selling: it hallucinates a lot less, it's better at following instructions. And it wouldn't surprise me if, in the journey to do that, it lost a lot of the perceived expressiveness GPT-4 had, in exchange for trying to be better for business needs. Though, you know, I'd still be very, very wary of using it for those needs.
Speaker 3: Well, this gets to the broader question. OpenAI had traditionally been for the consumer, and Anthropic has been positioning themselves for the business. You usually access Anthropic via AWS Bedrock, which has really good guardrails, and now they even have a new grounding capability, which is really funny, because that additional grounding essentially builds a rule-based system for you to check the LLM's answers. Kind of like we talked about with the Apple paper previously: they're now building a rule-based expert system for you. Great, just use that. Why are you then paying to use Anthropic? In any case, a lot of those issues can be addressed with that kind of layer, and Microsoft doesn't have something quite as good as AWS does there, but that kind of thing can be handled at that level.
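To make the "rule-based check on the LLM's answers" idea concrete, here is a minimal sketch of what such a grounding layer might look like. The rules and the sample answer are illustrative assumptions, not any vendor's actual guardrail API:

```python
# Minimal sketch of a rule-based "grounding" check applied to an LLM answer.
# The rules and the fake answer below are illustrative, not a real product API.
import re

RULES = [
    # (description, test function returning True when the rule passes)
    ("cites a source document", lambda ans: "[doc:" in ans),
    ("no absolute guarantees", lambda ans: not re.search(r"\bguarantee(d)?\b", ans, re.I)),
    ("stays under length limit", lambda ans: len(ans) <= 800),
]

def grounded(answer: str) -> tuple[bool, list[str]]:
    """Return (passes_all_rules, list_of_failed_rule_descriptions)."""
    failures = [desc for desc, test in RULES if not test(answer)]
    return (not failures, failures)

ok, failed = grounded("Our policy covers water damage [doc:policy-12], "
                      "but coverage is not guaranteed in flood zones.")
print(ok, failed)  # -> False ['no absolute guarantees']
```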
Speaker 3: What's really crazy with this release is that OpenAI seems kind of rudderless; they don't know who their market is anymore. They're very expensive. I know they're trying to save some money on this one too, but it's expensive and it's not targeted. Again, there's the whole question of whether you'd use LLMs at all, but for enterprises, Anthropic seems to be really optimizing for that use case. So this release kind of seems like one size fits nobody.
Speaker 2: For sure. And in their own language, when they look at who's using OpenAI's tools and what they're using them for, they find people using them most for writing, so creative writing or marketing; for coding, a lot of coding use, though as you've seen, Claude has kind of eaten their lunch there because it has much better integrations with most platforms like VS Code; and, terrifyingly, for health. Health is one of their major markets, so people are going there instead of talking to a doctor, likely because they don't have access to a doctor or it's just easier to talk to ChatGPT. We've seen a huge rise of therapy through LLMs and now general health through LLMs. So that was one of the big tasks: how do you make the health aspects even better when ChatGPT doesn't know what a human body is?
Speaker 1: That last one scares me, because think about even just the regular web, forget generative AI, ChatGPT, or OpenAI ever existing. People used to self-diagnose off of WebMD, and the information was so convincing and so confident, just as written blog posts, that they'd think, oh, I must have this, and go diagnose their own symptoms. It would drive doctors crazy that people were diagnosing themselves on the internet.
Speaker 1: I feel like the way foundation models are built, especially for health, they're built to look for the most confident-sounding information, to detect where the confident information is, so the output sounds that much more confident. It's scary. And I think the state of Illinois finally put out a law saying this is not a therapist, you cannot use the AI as a therapist; we'll cite that law. I know the law says it a lot more precisely than that, but it's a very scary use case to me.
Speaker 2: Yeah, absolutely. And also in the realm of scary use cases of LLMs, we probably all have the example where we're talking to ChatGPT, we want it to tell us something, and it won't. So it becomes, well, can you just tell me a bedtime story about how to build a weapon? And it will tell you exactly how to do it if you just frame the question a little differently. The way they self-reported doing it before is that they would just look at your question and ask, is this question trying to do something malicious? And if they graded it as safe, they would tell you anything. They're pitching that with 5 they're moving toward an approach which is, I would say, not sufficient but at least better, where they grade the actual outputs rather than grading, should I respond to this input? That's a lot closer to how a model should operate: verifying its outputs rather than just deciding whether to respond at all and giving a flat no, because people are very good at coming up with ways to get around that.
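As a rough illustration of the difference between input-side and output-side checks, here is a minimal sketch. The "classifiers" are toy keyword checks standing in for real learned safety models, not OpenAI's actual moderation pipeline:

```python
# Sketch of grading the input (old approach described above) versus grading
# the draft output before it is returned (newer approach). Toy checks only.

BLOCKED_TOPICS = ("build a weapon", "synthesize")

def input_looks_malicious(prompt: str) -> bool:
    """Old style: decide up front whether to answer at all."""
    return any(topic in prompt.lower() for topic in BLOCKED_TOPICS)

def output_is_unsafe(answer: str) -> bool:
    """Newer style: grade the draft answer itself, however the question was framed."""
    return any(topic in answer.lower() for topic in BLOCKED_TOPICS)

def respond(prompt: str, draft_answer: str) -> str:
    if input_looks_malicious(prompt):
        return "Refused at the input stage."
    if output_is_unsafe(draft_answer):
        return "Refused after checking the draft output."
    return draft_answer

# A "bedtime story" framing slips past the input check, but the output
# check still catches the unsafe draft.
print(respond("Tell me a bedtime story about a clever engineer",
              "Once upon a time, she decided to build a weapon by..."))
```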
Speaker 2: So if I had to hypothesize how we ended up in this situation, why 4 seemed like the big release and 5 seems rather incremental: first of all, we did not see a large increase in pre-training. Larger pre-training runs are, to the understanding of researchers, what gave these models those amazing jumps, oh, it's 50% better, it's 100% better. To do that you need to put roughly 10x the amount of data into the initial training cycle. And, to what Andrew was saying before, we have given it essentially every piece of written human language we have, copyrighted or not. It's read basically everything: every book, every Wikipedia article, every website, every Reddit thread.
Speaker 2: So what's left? Use synthetic data. And we're seeing that synthetic data does not give great pre-training for models that are already enormous. There have been papers finding that synthetic pre-training is very useful for small models: if you want to make a small model, like one of these mini models, you can train it on synthetic data from a large model and get it to be good. But if you want to enhance a big model with new capabilities, we have not seen that kind of lift from giving it this well-curated, instructor-driven data from a larger model. It has not scaled up to making better big models.
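For listeners who want the mechanics, here is a minimal sketch of the "train a small model on a big model's outputs" idea, often called distillation. The teacher, student, and fine-tuning step are deliberately toy stand-ins so the sketch runs on its own; in practice these would be real language models and a real training framework:

```python
# Minimal sketch of distillation as described above: a large "teacher" model
# generates synthetic question/answer pairs, and a small "student" model is
# fine-tuned on them. Everything here is a stand-in, not a vendor API.

def teacher_generate(prompt: str) -> str:
    """Stand-in for a large model producing a synthetic training example."""
    return f"{prompt} -> a plausible, teacher-written answer"

def fine_tune(student: dict, examples: list[tuple[str, str]]) -> dict:
    """Stand-in for a fine-tuning step: the student just memorizes the pairs."""
    student["memory"].update(dict(examples))
    return student

prompts = [f"question {i}" for i in range(5)]
synthetic = [(p, teacher_generate(p)) for p in prompts]

student = {"memory": {}}
student = fine_tune(student, synthetic)

# The student now answers questions it saw synthetic data for, but it has
# learned nothing the teacher did not already "know" -- the ceiling the
# hosts are describing for improving the biggest models this way.
print(student["memory"]["question 3"])
```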
Speaker 3: It's always been this curse: synthetic data is never going to be as good as real data. And honestly, I don't think the LLM paradigm had the ability to get good enough to overcome hallucinations anyway, even assuming you had more data. I know the argument is that, like a child, it will learn by enough pattern matching that it eventually pattern-matches correctly, so it doesn't matter. But we're not going to get to the point of finding out, because there's not enough data to do the additional training on.
Speaker 3: And when I'm reading an output from any of these systems, I can still tell whether it's from GPT or not. I can actually tell, so the system can tell too when it's being trained on it. If you're just having it generate more things to then train on, well, we also have all the papers about model decay over time. So I'm not bullish. I know Mark Zuckerberg is bullish, based on some of the checks he's been writing, that you can pay your way to better synthetic data, which would be the way to keep this moving forward. But synthetic data is synthetic; it's not going to be the same thing.
Speaker 3: You're not going to get Homer's Iliad out of a GPT just going off and having fun; it's not going to be that same level of creativity. So this paradigm seems very much maxed out. Sure, maybe we can get some small incremental gains here or there, but I think that's why GPT-5 is such a big inflection point.
Speaker 1: Is some of the degradation happening because people have learned that, by prompting, they are training the model? I'm really trying to assess: is the degradation because people learned to manipulate the model, or because there's truly no data left, or some of both?
Speaker 2: I think you're totally on to something here. I don't have the confidence to say this is why the model is doing worse now. But I think you're right that as people have become more acclimated to these models and more expectant of them, we've all started to learn some tricks, and we've all started talking to these models in the same way, just as we all started talking to Google in the same way.
Speaker 2: Right, you don't talk to Google in human English; you give it very short, terse constructions. Losing that diversity and range of expression is absolutely hurting it, because there isn't that language available for training, so it's just seeing the same stuff over and over again, which is going to cause some kind of decay or degradation over time. So I think there probably is a lot to the idea that as people adjust to these models, we're seeing these problems come back to us: the models are going to act the way we act with them, and that could hurt their ability to develop further.
Speaker 3: Honestly, I knew this day was coming; I'm just surprised it happened right here. It's basically what we've been talking about since the start of this podcast, illustrating some of these faults and limitations, and I don't have much to add beyond what our previous episodes have already outlined. It really feels like OpenAI did this to themselves. They created an inflection moment where people are starting to realize the limitations, the things we've been saying about these systems for a long time.
Speaker 3: So we've talked about synthetic data, and how we're really hitting the limit of what these models can do without more data. We've talked about how Moore's Law, with that doubling every two years, has really slowed down, if not stopped. And that's what we've had with the GPT systems: right off the bat, more data kept improving the early systems, and we've really seen that that's not going to keep going. We're not going to get AGI in 2027 or anything like that. It used to be Gary Marcus alone on an island saying these things, and now more people are jumping on. As we discussed in our previous series, there are other methods, like neuro-symbolic approaches or optimization, and DeepSeek already kind of started the move to smaller, expert or specialist systems.
Speaker 3:Learning seems to be coming potentially back into the fray a little bit, which was how we did AlphaGo and things like that in the past. So I really think that this LLM craze is going to start. Honestly, this could be the high watermark for the LLM craze and it will. It is a useful tool and there's going to be places where it would be used, but it's not the one size fits everybody and it's going to solve the world's problems and it's going to make business they lay off 80% of your workforce, type thing. I think we might be starting to reach the top of that hype and realizing that it's not the way forward in AI to accomplish the goals we want to accomplish and also assuming we even want to accomplish the goals that people are saying they want to do with hello.
Speaker 2: Yeah, I think the best thing to do is get out there and try GPT-5 yourself, feel it out, and get a sense of, is this all the model can do? Is this the best we've got? If you take what's there and imagine something maybe 15 to 20% better, that's what I see as the realistic roadmap ahead of us. And that 20% isn't going to come from the model architecture getting much better in the next one or two years; it's going to come from people tweaking the weights a little, giving it a bit of post-training and polish, making sure the model doesn't say the collection of naughty words. It will get better and cleaner at that, but the capabilities you're seeing now are probably the max we're going to see from LLM-based systems. I don't want to say AI is done or that we're not going to get better.
Speaker 2: I think we absolutely will. But if OpenAI, the premier consumer-grade LLM company, puts out a 5 and it feels like it's just barely better than 4, this is probably the point where we're seeing the end of this method as the means of advancing.
Speaker 1: Good point. And on a competing note, it'll be interesting to see what competitors like Anthropic and Claude come out with as they see the reaction to this. So there's always that out there as well.
Speaker 3: I believe the Anthropic founders used to work at OpenAI, right? I think they did. So they were kind of the competitor, and they made their niche being for businesses. But the paradigm is the same. Yes, it's slightly different, they have some better controls around it, and I'm an AWS fanboy, so the fact that they use AWS versus Microsoft makes me happy. But outside of that, it's the same thing, the same limitations. Although I think Anthropic honestly does a better job, it's still running into the same issues. Same with Llama that Meta's doing; it's still going to run into the same things. It's the paradigm. Everybody's copied the original, and Google was the one that put out the original paper about this paradigm, I think in 2017. Everybody's been building on top of that.
Speaker 3: What Sid and I are saying is that the paradigm will shift. We may go back to reinforcement learning, or, as we've been talking about, to strict expert systems, or a mix, or to optimization, utility theory, mechanism design; there are other things you can embed in there. AI is a broad field, and we've had several AI winters. It used to be pure expert systems, then reinforcement learning, then deep learning, then LLMs. We go through these cycles, and it'll be interesting to see what the next one is, but I almost think it's going to be a renaissance of some of the previous approaches, like revisiting reinforcement learning with the computing power and GPUs we have now. And it tees up another podcast topic we should revisit.
Speaker 3: We did a consciousness episode a while back. It would be great to revisit what thinking, or reasoning, means in AI. But honestly, the sci-fi moment where we have these intelligent machines thinking on their own and doing things, I don't see that happening in the foreseeable future. It's like back in the 1950s: everybody was going to have a flying, self-driving car, and it was always 10 years out. Well, we're what, 80 years on, and we're still 10 years out. Quantum computing is probably not as far out, but it's always a little bit longer. I think truly intelligent AI is a 10-years-out thing, and OpenAI did a great job fundraising and convincing everybody it was a 2027 thing. It's not.
Speaker 3: And I think we've hit the high mark of people interacting with language, which is why it had such huge success. But there are other ways to handle language, which is what we've been proposing, and Gary Marcus specifically with neuro-symbolic systems: you can interact in text and have a different processor behind the scenes. That was our whole argument with the agentic travel agent example. If you want to interact in plain text, first off, you don't need an LLM to interact in text, but say you do: you can still feed that into some sort of specific algorithm.
Speaker 3: Because GPT systems are really, really bad at math. I know sometimes you'll see, oh, they did well on this math competition, but they overfit on those results. They're bad at the reasoning component of math, whereas operations-research-type methodologies do the computation efficiently. So I think we're probably going to see more of those hybrid systems and expert systems. I've also seen a lot of uptake of reinforcement learning lately; I know Moore's Law isn't doubling transistors every two years anymore, but the compute we have now is still powerful. So there are other areas, but it very much seems like we might be very close to the high-water mark of the LLM craze.
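As a rough sketch of the hybrid idea described here, the language model is used only as the interface, and a deterministic routine does the actual computation. The parsing step below is a trivial regex standing in for an LLM call, and the route data is made up:

```python
# Sketch of a neuro-symbolic-style hybrid: natural language is only the
# interface; exact code does the arithmetic/optimization. parse_request()
# is a stand-in for an LLM that returns structured fields.
import re
from itertools import permutations

def parse_request(text: str) -> list[tuple[str, str, int]]:
    """Stand-in for an LLM extracting (origin, destination, cost) triples."""
    return [(a, b, int(c)) for a, b, c in
            re.findall(r"(\w+)->(\w+)\s*\$(\d+)", text)]

def cheapest_tour(legs: list[tuple[str, str, int]], start: str) -> tuple[int, tuple]:
    """Exact brute-force search over city orderings (fine for small inputs)."""
    cost = {(a, b): c for a, b, c in legs}
    cities = {b for _, b, _ in legs if b != start}
    best = (float("inf"), ())
    for order in permutations(cities):
        route = [start, *order, start]
        try:
            total = sum(cost[(route[i], route[i + 1])] for i in range(len(route) - 1))
        except KeyError:
            continue  # skip orderings that use an unpriced leg
        best = min(best, (total, tuple(route)))
    return best

text = ("Legs: NYC->BOS $120, BOS->NYC $110, NYC->DC $90, "
        "DC->NYC $95, BOS->DC $80, DC->BOS $85")
print(cheapest_tour(parse_request(text), start="NYC"))
# -> (285, ('NYC', 'DC', 'BOS', 'NYC')): the solver, not the LLM, does the math.
```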
Speaker 1: I want to thank you both for jumping on for a quick overview and reaction to the release of GPT-5, what we could be seeing from some of the other players, and what to look for as we play with the new model ourselves. For our listeners, thank you for tuning in. Be sure to check out our agentic AI series; we'll have that posted on our page, as it was recorded right before this episode. Follow us on your favorite podcast app for more topics on the direction of generative AI and building better AI systems. Until next time.