
GPT-5 Review: Everything You Need To Know | In The Loop Episode 27


Published by

Jack Houghton
Anna Kocsis

Published on

August 13, 2025
August 14, 2025

Read time

9 min read

Category

Podcast

After what feels like years of waiting, GPT-5 is finally here. About a week on, opinion is still divided. Many are calling it revolutionary, while just as many find it disappointing.

At the launch event, OpenAI showed bar charts that were clearly misleading, and prediction markets started betting against them within minutes.

If you’re sitting there thinking, “I have no idea what’s going on,” that’s probably the most reasonable response you could have.

In today’s episode, you’ll learn everything worth knowing about this new model release — what’s new, what’s improved, and what was simply great marketing. We’ve spent hours gathering research and reviews so you don’t have to.

This is In The Loop with Jack Houghton. I hope you enjoy the show.

Before we get into the episode, it’s important to set the scene and give a bit of context for OpenAI’s GPT-5 release.

For the past year, we’ve been living in what’s basically model chaos. It’s felt like OpenAI has been throwing spaghetti at the wall when it comes to language models. We’ve had GPT-4, 4.5, the O Series with various specialized models—and much of it seemed aimed at justifying a $200 per month pro membership.

Image source: Facebook

People were often confused about which model to use for a specific task, to the point where OpenAI had to release white papers just to explain it. Meanwhile, Anthropic has been steadily eating into their enterprise market. In coding, especially, Claude has become the go-to choice for developers worldwide.

The general pattern is clear: most people use ChatGPT for general purposes, but if you’re doing serious coding, you turn to Claude.

This context frames the GPT-5 release perfectly. It wasn’t just about delivering a better model—it was about cleaning up the chaotic mess of overlapping models and trying to win back the market share they’ve been losing over the past six to twelve months.

With that in mind, let’s get into it.


Feature highlights: How is GPT-5 better?

Let’s take a look at the most important differences compared to previous models and the features that stand out.

Hybrid model

One of the first things you’ll notice with GPT-5 is that it’s a hybrid model. It has both “thinking” and “non-thinking” modes built into the same system.

Previously, unless you specifically selected the O Series or another reasoning model, you wouldn’t get any reasoning in your responses. Free-tier users only had a limited number of reasoning queries, and switching between models was clunky.

With GPT-5, those capabilities are built in, and OpenAI initially deprecated several older models—removing GPT-4.1 and the entire O Series. That sparked such a strong backlash that Sam Altman posted a lengthy thread addressing the decision, and OpenAI brought many of those models back for $200/month subscribers. The emotional attachment to certain models, and the reaction to losing them, could almost be an episode of its own.

Thinking mode short circuit

Another welcome feature is the ability to “short-circuit” thinking mode. If the model goes into an unnecessary reasoning loop for a simple question, you can now stop it and get an immediate answer.

Free user access

GPT-5 is available to everyone, including free users. OpenAI seems to have learned from the DeepSeek saga earlier this year that the best models should be widely accessible. There are three versions—Standard, Mini, and Nano—and if you hit usage limits on one, it will automatically switch you to another.

Big context window

The context window is now 400,000 tokens—about 300,000 words. That’s 600–800 pages of text the model can hold in short-term memory during a conversation.

For comparison, the previous limit was 128,000 tokens (about 96,000 words). This leap makes it far easier to work with large documents without hitting frustrating context limits.

If you listened to the episode on Context Engineering, you know how important this is.
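The token-to-word-to-page figures above are back-of-envelope conversions, and it's worth seeing how they fall out. The sketch below uses the common rules of thumb of roughly 0.75 words per token and 375 to 500 words per printed page; both ratios are approximations, not OpenAI specifications.

```python
# Rough conversions from a token budget to words and pages.
# 0.75 words/token and 375-500 words/page are rules of thumb only.

WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE_DENSE = 500   # densely typeset page
WORDS_PER_PAGE_SPARSE = 375  # loosely typeset page

def context_in_words(tokens: int) -> int:
    """Approximate word capacity of a context window."""
    return int(tokens * WORDS_PER_TOKEN)

def context_in_pages(tokens: int) -> tuple[int, int]:
    """Approximate (min_pages, max_pages) for a context window."""
    words = context_in_words(tokens)
    return words // WORDS_PER_PAGE_DENSE, words // WORDS_PER_PAGE_SPARSE

print(context_in_words(400_000))   # 300000 words
print(context_in_pages(400_000))   # (600, 800) pages
print(context_in_words(128_000))   # 96000 words, the previous limit
```

Plugging in 400,000 tokens reproduces the 300,000-word and 600-800-page figures quoted above, and 128,000 tokens gives the old 96,000-word ceiling.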

Coding, design & front-end creation

A major upgrade is GPT-5’s coding and front-end creation capabilities, clearly aimed at taking market share from Anthropic’s Claude. The model now has a strong grasp of spacing, typography, and white space in design.

It can generate complete user interfaces via code with ease. In OpenAI’s press release—worth reading—they share prompts that can build impressive applications. This is why Sam Altman recently called this the “fast fashion era” of technology.

Affordable on the face of it

Pricing is aggressive. You now pay around $10 per million output tokens—compared to $75 for Claude Opus 4.1. That's roughly seven times cheaper, making it hard to resist, especially for companies.

API usage (where businesses integrate GPT-5 into their software and workflows) reportedly doubled in the first 48 hours after launch and has continued climbing. This is clearly a strategic move to regain share in the coding market, where Anthropic has grown rapidly.
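To make that price gap concrete, here is a quick cost sketch at the list prices quoted above ($10 vs. $75 per million output tokens). The 50-million-token monthly volume is an invented example, and per-token prices change often, so treat this as illustrative rather than current.

```python
# Cost comparison at the output-token prices quoted in the text.
# Prices and the monthly volume below are illustrative assumptions.

GPT5_PER_M_OUTPUT = 10.0   # ~$10 per million output tokens
OPUS_PER_M_OUTPUT = 75.0   # ~$75 per million output tokens

def output_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of generating `tokens` output tokens."""
    return tokens / 1_000_000 * price_per_million

monthly_tokens = 50_000_000  # hypothetical coding assistant's monthly output
gpt5 = output_cost(monthly_tokens, GPT5_PER_M_OUTPUT)
opus = output_cost(monthly_tokens, OPUS_PER_M_OUTPUT)
print(f"GPT-5: ${gpt5:,.0f}  Opus: ${opus:,.0f}  ratio: {opus / gpt5:.1f}x")
```

At these rates the same workload costs $500 on GPT-5 versus $3,750 on Opus, a 7.5x difference, which is why the pricing pressure on coding tools is so strong.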

Unified system & router

Finally, OpenAI has introduced what they call a unified system, essentially a router that decides which model to use for a given task. It considers the task’s complexity, whether a tool like web search is needed, and the type of question being asked.
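OpenAI hasn't published how the router actually works, but the idea is easy to sketch: score an incoming request on a few cheap signals (complexity hints, tool needs, length) and send it to the cheapest tier that can handle it. Everything in this toy version, including the signal words, tier names, and thresholds, is invented for illustration and does not reflect OpenAI's implementation.

```python
# Toy model router: signals, tiers, and thresholds are all invented,
# purely to illustrate the routing concept described in the text.

def route(prompt: str, needs_tools: bool = False) -> str:
    """Pick a model tier from crude complexity signals."""
    reasoning_words = {"prove", "derive", "debug", "plan", "step by step"}
    complex_hint = any(w in prompt.lower() for w in reasoning_words)
    long_prompt = len(prompt.split()) > 200

    if complex_hint or needs_tools:
        return "thinking"      # slow, full-reasoning tier
    if long_prompt:
        return "standard"      # mid tier for large but simple inputs
    return "minimal"           # fast, cheap tier for simple questions

print(route("What's the capital of France?"))           # minimal
print(route("Debug this race condition step by step"))  # thinking
```

The economics of the real system follow the same shape: as long as most traffic routes to the cheap tier, the blended cost per query stays low while the headline capability stays high.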

From the launch video alone—without any background—you might think OpenAI is primarily a coding company. As we move into what the independent benchmarks say, it’s worth keeping that in mind.

GPT-5’s performance according to independent benchmarks

I’m not a big fan of benchmarks. To me, it’s a game everyone plays, and on launch day, they all try to make themselves look as good as possible. Still, they do give us an initial gauge of whether there’s been a meaningful leap.

Artificial Analysis—often considered the gold standard for independent AI benchmarks—found that when GPT-5 used reasoning, it was the best model available. However, without reasoning, GPT-5 performed closer to GPT-4.1.

This matters because it shows that OpenAI isn’t really selling a single model—they’re selling a new packaging and pricing system. The “minimal” version most people will end up using is essentially an older model, while the high-reasoning version (the one that earns all the flashy headlines for topping benchmarks) will be used far less often.

Through clever pricing and packaging, OpenAI is boosting the perceived value across all tiers. For a competitor like Anthropic, which positions Claude as consistently high-quality at a premium price, that's a problem: OpenAI can undercut it by letting the router quietly direct most users to cheaper models.

OpenAI has also announced they’ll soon let you see which model is being used more often—a big win in my view. I’ve often wondered if I’m getting a lower-tier model while still paying for premium access.

On coding benchmarks, Vellum’s analysis found Grok-4, GPT-5, and Claude Opus all performing at a similar level on SWE-bench, though GPT-5 came first in many other coding tests, such as Design Arena.

Still, Grok-4—Elon Musk’s language model—takes the top score on some of the hardest assessments. On the ARC-AGI 2 test, which measures abstract, human-like reasoning on problems the model has never seen before, Grok-4 scored 15.9%, while GPT-5 lagged far behind at 9.9%. That’s a big gap.

In short: GPT-5 is faster, better at coding, and cheaper—but in many reasoning benchmarks, it’s not the best performer.

The market’s reaction to GPT-5

Benchmarks aside, language models are just as much about the vibes. And GPT-5 has generated plenty of conversation—and division—around those vibes.

On Polymarket, a decentralized prediction market, odds before the demo gave OpenAI an 80% chance of having the leading AI model by August 2026, with Google at 20%. But by the time Sam Altman and the OpenAI team walked off stage, those numbers had flipped—Google surged to 85%, and OpenAI dropped to 14%. That’s a huge swing in confidence, and it reflects the split opinions on GPT-5’s release.

During the launch, OpenAI also committed what can only be described as ‘chart crime’. They showed bar charts with mismatched proportions—bars with 69% and 30% results were shown at identical heights, while GPT-5’s 52.8% result looked much bigger. Sam Altman later called it a “mega chart screw-up,” though many joked they must have had GPT generate the visuals.

Image source: The Verge

Reviews from AI thought leaders

Professor Ethan Mollick predicted we’d see very mixed reviews because GPT-5 uses multiple models in its responses—some excellent, some just average. Without transparency on which model you’re getting, the experience can feel inconsistent. That’s been my experience too.

OpenAI also had to push a major patch shortly after release when demand spiked. The automatic model switcher failed, sending everyone to the lowest-quality model.

Matt Shumer summed up the broader reaction well in his blog: when he first used GPT-5, he wasn’t blown away—he felt let down by the hype. His point was that if you use GPT-5 for tasks that older O Series models or Claude Sonnet can already do well, you won’t see much difference. To truly see what it can do, you have to push it into areas AI still struggles with—like coding.

And that’s where OpenAI has clearly focused its effort. The vibe coding revolution is here, and its premise is constant creation: building working software quickly, cheaply, and on a whim.

Matthew Berman, who has 480,000 YouTube subscribers and focuses on AI coding, got early access to GPT-5. He built several applications to test its vibe-coding abilities:

  • Jumping ball runner game – a simple but functional game
  • Pixel art tool – a drawing app with color changes (a bit laggy but worked)
  • Typing speed game – measured accuracy and speed, complete with feedback and scoring
  • Drum simulator – an interactive music app for creating beats
  • Music visualizer for lo-fi – generated visual effects in sync with music

The big shift here is that you no longer have to jump between multiple tools—GPT-5 can build and run these applications from a single prompt. That’s the essence of vibe coding: you describe the app you want, and the model builds it.

One of the most thoughtful reviews came from Latent Space, who described GPT-5 as “the Stone Age.” This wasn’t criticism—it was a reflection on where we are in human civilization.

They see GPT-5 as one of the first models that excels at using tools during conversations, much like humans did in the actual Stone Age. As they put it, the Stone Age marked the dawn of human intelligence because we shaped tools—and our tools shaped us.

GPT-5’s impact on Anthropic

The threat to Anthropic is significant. Reports suggest that Cursor and GitHub Copilot together drive roughly $1.2 billion of Anthropic’s $4 billion revenue—around 30%—just from these two applications using Claude every day.

With GPT-5 offering exceptional coding abilities at a cost roughly seven times lower than Claude’s, it’s hard for enterprise customers to say no. Many coding applications—like Replit, Cursor, and others—have already made GPT-5 their default model.

Part of this shift is because Anthropic released Claude Code, a direct competitor to some of the companies using their model. As a result, several have moved straight to OpenAI. This doesn’t mean developers will abandon Claude entirely, but it’s a major risk to Anthropic’s market share.


Closing thoughts

GPT-5 represents both impressive technical achievements and, perhaps more importantly, an aggressive competitive strategy against Anthropic.

For coding, it’s made a major leap forward. But for other applications, the improvements feel incremental—and, in some cases, underwhelming. This release was as much a business model change as it was a technical upgrade. OpenAI has built a system that delivers cheaper, minimal-reasoning models to most users, while still claiming the crown for “best model” with its high-reasoning tier.

And it’s working. Cursor switched overnight, prediction markets lost confidence in OpenAI, and Anthropic is watching as 30% of its revenue becomes vulnerable to pricing pressure.

The “chart crimes,” mixed reviews, and missed expectations might be symptoms of today’s AI market—where grand, revolutionary narratives don’t land quite like they used to.

The real question is whether GPT-5 can consistently write better code, create better content, and solve real problems for everyday people.

That’s it for today. I hope you enjoyed this deep dive into GPT-5. Thanks for listening, and I’ll see you next week.
