
GPT-5 Review: Everything You Need To Know | In The Loop Episode 27


Published by

Jack Houghton
Anna Kocsis

Published on

August 13, 2025
August 14, 2025

Read time

9 min read

Category

Podcast

After what feels like years of waiting, GPT-5 is finally here. About a week on, opinion is still divided. Many are calling it revolutionary, while just as many find it disappointing.

At the launch event, OpenAI showed bar charts that were clearly misleading, and prediction markets started betting against them within minutes.

If you’re sitting there thinking, “I have no idea what’s going on,” that’s probably the most reasonable response you could have.

In today’s episode, you’ll learn everything worth knowing about this new model release — what’s new, what’s improved, and what was simply great marketing. We’ve spent hours gathering research and reviews so you don’t have to.

This is In The Loop with Jack Houghton. I hope you enjoy the show.

Before we get into the episode, it’s important to set the scene and give a bit of context for OpenAI’s GPT-5 release.

For the past year, we’ve been living in what’s basically model chaos. It’s felt like OpenAI has been throwing spaghetti at the wall when it comes to language models. We’ve had GPT-4, 4.5, the O Series with various specialized models—and much of it seemed aimed at justifying a $200 per month pro membership.

Image source: Facebook

People were often confused about which model to use for a specific task, to the point where OpenAI had to release white papers just to explain it. Meanwhile, Anthropic has been steadily eating into their enterprise market. In coding, especially, Claude has become the go-to choice for developers worldwide.

The general pattern is clear: most people use ChatGPT for general purposes, but if you’re doing serious coding, you turn to Claude.

This context frames the GPT-5 release perfectly. It wasn’t just about delivering a better model—it was about cleaning up the chaotic mess of overlapping models and trying to win back the market share they’ve been losing over the past six to twelve months.

With that in mind, let’s get into it.


Feature highlights: How is GPT-5 better?

Let’s take a look at the most important differences compared to previous models and the features that stand out.

Hybrid model

One of the first things you’ll notice with GPT-5 is that it’s a hybrid model. It has both “thinking” and “non-thinking” modes built into the same system.

Previously, unless you specifically selected the O Series or another reasoning model, you wouldn’t get any reasoning in your responses. Free-tier users only had a limited number of reasoning queries, and switching between models was clunky.

With GPT-5, those capabilities are built in, and OpenAI initially deprecated several older models—removing GPT-4.1 and the entire O Series. That sparked such a strong backlash that Sam Altman posted a lengthy thread addressing the decision, and OpenAI brought many of those models back for $200/month subscribers. The emotional attachment to certain models, and the reaction to losing them, could almost be an episode of its own.

Thinking mode short circuit

Another welcome feature is the ability to “short-circuit” thinking mode. If the model goes into an unnecessary reasoning loop for a simple question, you can now stop it and get an immediate answer.

Free user access

GPT-5 is available to everyone, including free users. OpenAI seems to have learned from the DeepSeek saga earlier this year that the best models should be widely accessible. There are three versions—Standard, Mini, and Nano—and if you hit usage limits on one, it will automatically switch you to another.

Big context window

The context window is now 400,000 tokens—about 300,000 words. That’s 600–800 pages of text the model can hold in short-term memory during a conversation.

For comparison, the previous limit was 128,000 tokens (about 96,000 words). This leap makes it far easier to work with large documents without hitting frustrating context limits.

If you listened to the episode on Context Engineering, you know how important this is.
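The token-to-word-to-page figures above are back-of-envelope conversions, and it's worth seeing how they fall out. The sketch below uses the common rules of thumb of roughly 0.75 words per token and 375 to 500 words per printed page; both ratios are approximations, not OpenAI specifications.

```python
# Rough conversions from a token budget to words and pages.
# 0.75 words/token and 375-500 words/page are rules of thumb only.

WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE_DENSE = 500   # densely typeset page
WORDS_PER_PAGE_SPARSE = 375  # loosely typeset page

def context_in_words(tokens: int) -> int:
    """Approximate word capacity of a context window."""
    return int(tokens * WORDS_PER_TOKEN)

def context_in_pages(tokens: int) -> tuple[int, int]:
    """Approximate (min_pages, max_pages) for a context window."""
    words = context_in_words(tokens)
    return words // WORDS_PER_PAGE_DENSE, words // WORDS_PER_PAGE_SPARSE

print(context_in_words(400_000))   # 300000 words
print(context_in_pages(400_000))   # (600, 800) pages
print(context_in_words(128_000))   # 96000 words, the previous limit
```

Plugging in 400,000 tokens reproduces the 300,000-word and 600-800-page figures quoted above, and 128,000 tokens gives the old 96,000-word ceiling.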

Coding, design & front-end creation

A major upgrade is GPT-5’s coding and front-end creation capabilities, clearly aimed at taking market share from Anthropic’s Claude. The model now has a strong grasp of spacing, typography, and white space in design.

It can generate complete user interfaces via code with ease. In OpenAI’s press release—worth reading—they share prompts that can build impressive applications. This is why Sam Altman recently called this the “fast fashion era” of technology.

Affordable on the face of it

Pricing is aggressive. You now pay around $10 per million output tokens—compared to $75 for Claude Opus 4.1. That's roughly seven times cheaper, making it hard to resist, especially for companies.

API usage (where businesses integrate GPT-5 into their software and workflows) reportedly doubled in the first 48 hours after launch and has continued climbing. This is clearly a strategic move to regain share in the coding market, where Anthropic has grown rapidly.
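To make that price gap concrete, here is a quick cost sketch at the list prices quoted above ($10 vs. $75 per million output tokens). The 50-million-token monthly volume is an invented example, and per-token prices change often, so treat this as illustrative rather than current.

```python
# Cost comparison at the output-token prices quoted in the text.
# Prices and the monthly volume below are illustrative assumptions.

GPT5_PER_M_OUTPUT = 10.0   # ~$10 per million output tokens
OPUS_PER_M_OUTPUT = 75.0   # ~$75 per million output tokens

def output_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of generating `tokens` output tokens."""
    return tokens / 1_000_000 * price_per_million

monthly_tokens = 50_000_000  # hypothetical coding assistant's monthly output
gpt5 = output_cost(monthly_tokens, GPT5_PER_M_OUTPUT)
opus = output_cost(monthly_tokens, OPUS_PER_M_OUTPUT)
print(f"GPT-5: ${gpt5:,.0f}  Opus: ${opus:,.0f}  ratio: {opus / gpt5:.1f}x")
```

At these rates the same workload costs $500 on GPT-5 versus $3,750 on Opus, a 7.5x difference, which is why the pricing pressure on coding tools is so strong.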

Unified system & router

Finally, OpenAI has introduced what they call a unified system, essentially a router that decides which model to use for a given task. It considers the task’s complexity, whether a tool like web search is needed, and the type of question being asked.
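OpenAI hasn't published how the router actually works, but the idea is easy to sketch: score an incoming request on a few cheap signals (complexity hints, tool needs, length) and send it to the cheapest tier that can handle it. Everything in this toy version, including the signal words, tier names, and thresholds, is invented for illustration and does not reflect OpenAI's implementation.

```python
# Toy model router: signals, tiers, and thresholds are all invented,
# purely to illustrate the routing concept described in the text.

def route(prompt: str, needs_tools: bool = False) -> str:
    """Pick a model tier from crude complexity signals."""
    reasoning_words = {"prove", "derive", "debug", "plan", "step by step"}
    complex_hint = any(w in prompt.lower() for w in reasoning_words)
    long_prompt = len(prompt.split()) > 200

    if complex_hint or needs_tools:
        return "thinking"      # slow, full-reasoning tier
    if long_prompt:
        return "standard"      # mid tier for large but simple inputs
    return "minimal"           # fast, cheap tier for simple questions

print(route("What's the capital of France?"))           # minimal
print(route("Debug this race condition step by step"))  # thinking
```

The economics of the real system follow the same shape: as long as most traffic routes to the cheap tier, the blended cost per query stays low while the headline capability stays high.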

From the launch video alone—without any background—you might think OpenAI is primarily a coding company. As we move into what the independent benchmarks say, it’s worth keeping that in mind.

GPT-5’s performance according to independent benchmarks

I’m not a big fan of benchmarks. To me, it’s a game everyone plays, and on launch day, they all try to make themselves look as good as possible. Still, they do give us an initial gauge of whether there’s been a meaningful leap.

Artificial Analysis—often considered the gold standard for independent AI benchmarks—found that when GPT-5 used reasoning, it was the best model available. However, without reasoning, GPT-5 performed closer to GPT-4.1.

This matters because it shows that OpenAI isn’t really selling a single model—they’re selling a new packaging and pricing system. The “minimal” version most people will end up using is essentially an older model, while the high-reasoning version (the one that earns all the flashy headlines for topping benchmarks) will be used far less often.

Through clever pricing and packaging, OpenAI is boosting the perceived value across all tiers. For a competitor like Anthropic, which positions Claude as consistently high-quality at a premium price, that's a problem: OpenAI can undercut it by letting the router quietly direct most users to cheaper models.

OpenAI has also announced they’ll soon let you see which model is being used more often—a big win in my view. I’ve often wondered if I’m getting a lower-tier model while still paying for premium access.

On coding benchmarks, Vellum’s analysis found Grok-4, GPT-5, and Claude Opus all performing at a similar level on SWE-bench, though GPT-5 came first in many other coding tests, such as Design Arena.

Still, Grok-4—Elon Musk’s language model—takes the top score on some of the hardest assessments. On the ARC-AGI 2 test, which measures abstract, human-like reasoning on problems the model has never seen before, Grok-4 scored 15.9%, while GPT-5 lagged far behind at 9.9%. That’s a big gap.

In short: GPT-5 is faster, better at coding, and cheaper—but in many reasoning benchmarks, it’s not the best performer.

The market’s reaction to GPT-5

Benchmarks aside, language models are just as much about the vibes. And GPT-5 has generated plenty of conversation—and division—around those vibes.

On Polymarket, a decentralized prediction market, odds before the demo gave OpenAI an 80% chance of having the leading AI model by August 2026, with Google at 20%. But by the time Sam Altman and the OpenAI team walked off stage, those numbers had flipped—Google surged to 85%, and OpenAI dropped to 14%. That’s a huge swing in confidence, and it reflects the split opinions on GPT-5’s release.

During the launch, OpenAI also committed what can only be described as ‘chart crime’. They showed bar charts with mismatched proportions—bars with 69% and 30% results were shown at identical heights, while GPT-5’s 52.8% result looked much bigger. Sam Altman later called it a “mega chart screw-up,” though many joked they must have had GPT generate the visuals.

Image source: The Verge

Reviews from AI thought leaders

Professor Ethan Mollick predicted we’d see very mixed reviews because GPT-5 uses multiple models in its responses—some excellent, some just average. Without transparency on which model you’re getting, the experience can feel inconsistent. That’s been my experience too.

OpenAI also had to push a major patch shortly after release when demand spiked. The automatic model switcher failed, sending everyone to the lowest-quality model.

Matt Shumer summed up the broader reaction well in his blog: when he first used GPT-5, he wasn’t blown away—he felt let down by the hype. His point was that if you use GPT-5 for tasks that older O Series models or Claude Sonnet can already do well, you won’t see much difference. To truly see what it can do, you have to push it into areas AI still struggles with—like coding.

And that’s where OpenAI has clearly focused its effort. The vibe coding revolution is here, and its premise is constant creation: building working software quickly, cheaply, and on a whim.

Matthew Berman, who has 480,000 YouTube subscribers and focuses on AI coding, got early access to GPT-5. He built several applications to test its vibe-coding abilities:

  • Jumping ball runner game – a simple but functional game
  • Pixel art tool – a drawing app with color changes (a bit laggy but worked)
  • Typing speed game – measured accuracy and speed, complete with feedback and scoring
  • Drum simulator – an interactive music app for creating beats
  • Music visualizer for lo-fi – generated visual effects in sync with music

The big shift here is that you no longer have to jump between multiple tools—GPT-5 can build and run these applications from a single prompt. That’s the essence of vibe coding: you describe the app you want, and the model builds it.

One of the most thoughtful reviews came from Latent Space, who described GPT-5 as “the Stone Age.” This wasn’t criticism—it was a reflection on where we are in human civilization.

They see GPT-5 as one of the first models that excels at using tools during conversations, much like humans did in the actual Stone Age. As they put it, the Stone Age marked the dawn of human intelligence because we shaped tools—and our tools shaped us.

GPT-5’s impact on Anthropic

The threat to Anthropic is significant. Reports suggest that Cursor and GitHub Copilot together drive roughly $1.2 billion of Anthropic’s $4 billion revenue—around 30%—just from these two applications using Claude every day.

With GPT-5 offering exceptional coding abilities at a cost roughly seven times lower than Claude’s, it’s hard for enterprise customers to say no. Many coding applications—like Replit, Cursor, and others—have already made GPT-5 their default model.

Part of this shift is because Anthropic released Claude Code, a direct competitor to some of the companies using their model. As a result, several have moved straight to OpenAI. This doesn’t mean developers will abandon Claude entirely, but it’s a major risk to Anthropic’s market share.


Closing thoughts

GPT-5 represents both impressive technical achievements and, perhaps more importantly, an aggressive competitive strategy against Anthropic.

For coding, it’s made a major leap forward. But for other applications, the improvements feel incremental—and, in some cases, underwhelming. This release was as much a business model change as it was a technical upgrade. OpenAI has built a system that delivers cheaper, minimal-reasoning models to most users, while still claiming the crown for “best model” with its high-reasoning tier.

And it’s working. Cursor switched overnight, prediction markets lost confidence in OpenAI, and Anthropic is watching as 30% of its revenue becomes vulnerable to pricing pressure.

The “chart crimes,” mixed reviews, and missed expectations might be symptoms of today’s AI market—where grand, revolutionary narratives don’t land quite like they used to.

The real question is whether GPT-5 can consistently write better code, create better content, and solve real problems for everyday people.

That’s it for today. I hope you enjoyed this deep dive into GPT-5. Thanks for listening, and I’ll see you next week.
