
Everything You Need To Know About GPT-5.2 In 10 Minutes | In The Loop Episode 43


Published by

Jack Houghton
Anna Kocsis

Published on

December 18, 2025
December 23, 2025

Read time

5 min read

Category

Podcast

Last week we predicted OpenAI would release a new language model codenamed "GPT Garlic." That prediction came true: GPT-5.2 is here, along with an improved image-generation model built on top of it.

This is the final episode of 2025. Instead of hype, let's break down what 5.2 actually means for you as an everyday user and what you should be looking out for.

This is In The Loop with Jack Houghton. I hope you enjoy the show. 

OpenAI's GPT-5.2 release

Earlier in December, Sam Altman issued a Code Red memo, which last week's episode discussed in detail. We covered the real problems OpenAI faces over the next 6–24 months. As predicted, they released GPT Garlic—now called GPT-5.2.

To cut to the chase: they've matched many of Gemini's most significant model updates in benchmark performance and image generation. Whether the press or market feels that yet, I can't say. But from a technical perspective, they've gone beyond what I expected.

GPT-5.2 comes in three versions

  1. Instant is their super-fast model with no thinking mode—just rapid responses. 
  2. Thinking is their standard model with reasoning built in. It pauses, works through the problem, creates a plan, and then responds. 
  3. Pro Extended Thinking is the new tier. Matt Schumer, who had access since November 25th, reported it was thinking for over an hour on some hard problems. Pro is only available on the $200 monthly subscription and frustratingly isn't available via API yet.

There's also a new Reasoning Effort setting within Thinking mode—Standard, High, or Extra High. The higher you set it, the longer it thinks and theoretically the better the output.
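A rough sketch of what selecting a reasoning-effort level might look like programmatically. The model identifier "gpt-5.2" and the mapping from the ChatGPT UI labels to API values are assumptions for illustration only; check OpenAI's API reference for the actual names.

```python
# Hypothetical sketch: picking a reasoning-effort level for a request.
# Model name and effort values are assumptions, not confirmed API details.

# Map the UI labels mentioned above to plausible API-side values.
EFFORT_LEVELS = {"Standard": "medium", "High": "high", "Extra High": "xhigh"}

def build_request(prompt: str, effort: str = "Standard") -> dict:
    """Assemble a chat-completion payload with a reasoning-effort hint."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"Unknown effort level: {effort}")
    return {
        "model": "gpt-5.2",                      # assumed model identifier
        "reasoning_effort": EFFORT_LEVELS[effort],
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Summarise this contract.", effort="High")
print(payload["reasoning_effort"])  # high
```

The trade-off is the one described above: higher effort means longer waits, so it only pays off on genuinely hard problems.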

From a technical perspective, the context window is now 400,000 tokens compared to GPT-5.1's 128,000. That's a substantial uplift. For those wondering what context windows are: it's the amount of information you can give the model before it gets poor, stupid, and annoying, or just says "Sorry, limit reached, move to a new chat."
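To make the jump concrete, here is a minimal sketch of budgeting a document against the two context windows. The 4-characters-per-token ratio is a crude rule of thumb for English text, not an exact tokenizer; use a real tokenizer library for accurate counts.

```python
# Rough context-window budgeting, assuming ~4 characters per token.

CONTEXT_WINDOW = 400_000   # GPT-5.2, per the episode
OLD_WINDOW = 128_000       # GPT-5.1

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: roughly 4 characters per English token."""
    return max(1, len(text) // 4)

def fits(text: str, window: int = CONTEXT_WINDOW, reserve: int = 8_000) -> bool:
    """Check the text fits, leaving 'reserve' headroom for the response."""
    return estimate_tokens(text) + reserve <= window

doc = "word " * 100_000  # ~500,000 characters, ~125,000 tokens
print(fits(doc, OLD_WINDOW))  # False: would overflow GPT-5.1
print(fits(doc))              # True: fits comfortably in 400k
```

In other words, a document that would have forced "move to a new chat" under 5.1 now fits with room to spare.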

There's an auto setting that's supposed to be better at choosing between Instant and Extended Thinking. I'd recommend ignoring it. From what I've read and from my own limited testing (I recently cancelled my ChatGPT license), the model often thinks for only a couple of seconds and produces a poor or wrong answer. You'll need Thinking for most professional work.

How GPT-5.2 is better: what has been improved

With every new model release, of course, the most important questions are what has actually improved and what users say about it.

Spreadsheets and documents

Let's talk spreadsheets and presentations—areas where OpenAI clearly put marketing energy and made real improvements.

Simon Willison said this is the first time ChatGPT has created spreadsheets and presentations that are actually presentable. The YouTube reviewer Skill Leap gave it a web link and asked for a full slideshow. It took 28 minutes, but the output was really impressive—good layouts, information pulled correctly, professional-looking slides. His words: "shockingly good compared to 5.1."

Another tester fed 10,000 rows of spreadsheet data into it and told it to create a PowerPoint. It made an excellent set of slides.

For those doing this work constantly, this is music to your ears. However, there are fantastic tools like Gamma that do much the same thing.

OpenAI's benchmark for this is called GDPval: essentially, well-specified knowledge-work tasks across 44 occupations. They claim 5.2 in Thinking mode beats or ties human experts 70.9% of the time, up from 38.8%.

"Well-specified" is doing a lot of work in that sentence. It means the model gets handed everything up front—super clear instructions, all relevant context, and defined success criteria. Real professional work isn't like that. You often have to figure out what information you need, go find it, make judgment calls, craft good prompts.

That benchmark covers well-specified knowledge work with perfect prompts. Most people aren't giving such well-thought-out, structured prompts. 70.9% doesn't mean GPT-5.2 can suddenly do 71% of a person's job. It means for tasks where everything is perfectly articulated and handed to the model on a plate, it performs at expert level most of the time.

Code generation improvements

As I said last week, they're not trying to win in the code arena. That said, they've made fantastic leaps in coding. Maybe they were working on a better coding model, realized they'd made a better model generally, and released it—because most of their marketing focuses on professional work.

On SWE-bench Pro, which tests software engineering across four programming languages, 5.2 Thinking scored 55.6%, a new state of the art. On the related SWE-bench Verified benchmark, it hit 80%, essentially matching Claude's best model at 80.9%.

Vision and long context

Vision capabilities have improved significantly. On chart understanding from scientific papers, accuracy jumped from 80% to 88%. On user interface understanding—that agent mode of reading your screen and making clicks—it jumped from 64% to 86%. Error rates have been cut in half.

On context windows, there's been massive improvement. With 5.1, accuracy started degrading as the amount of information you gave it grew—around 90% at 8,000 tokens, dropping under 50% at 256,000 tokens. With GPT-5.2, accuracy stays at almost 100% across the entire context window, even when nearly maxed out.

This is one of the first models to achieve near-perfect accuracy on the four-needle challenge: essentially recalling four specific pieces of information scattered across 200,000 words.
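The shape of that test is easy to sketch: plant a handful of facts in a long haystack of filler text, then score whether a response recalls all of them. The needles, filler, and scoring below are illustrative stand-ins; the model call itself is stubbed out, so only the harness logic is shown.

```python
# Minimal sketch of a "four-needle" recall harness (model call stubbed).
import random

NEEDLES = [
    "The access code is 7341.",
    "The meeting is in Oslo.",
    "The budget cap is $90k.",
    "The deadline is March 3.",
]

def build_haystack(filler_sentences: int = 5000, seed: int = 0) -> str:
    """Scatter the four needles at random positions among filler text."""
    rng = random.Random(seed)
    sentences = ["This sentence is uninformative filler."] * filler_sentences
    for needle in NEEDLES:
        sentences.insert(rng.randrange(len(sentences)), needle)
    return " ".join(sentences)

def score_recall(answer: str) -> float:
    """Fraction of the planted facts that appear in the answer."""
    return sum(n in answer for n in NEEDLES) / len(NEEDLES)

haystack = build_haystack()
perfect = " ".join(NEEDLES)  # stand-in for a model that recalls all four
print(score_recall(perfect))                    # 1.0
print(score_recall("The meeting is in Oslo."))  # 0.25
```

Near-perfect accuracy on a test like this is what "usable long context" actually means in practice: the model keeps retrieving specifics no matter where they sit in the window.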

Hallucinations

OpenAI claims they've reduced hallucinations by roughly 30%, from 8.8% in 5.1 to 6.2% in 5.2. Independent benchmarks are more modest, however: Vectara measured an 8.4% hallucination rate for GPT-5.2, which trails DeepSeek at 6.3%.

A massive improvement, but still not a leading model.

What still needs work: speed and writing quality

Speed is still a real problem. Matt Schumer said standard 5.2 Thinking is slow for most questions, even straightforward ones, and that it has changed how he works: quick questions go to Claude Opus, deep reasoning goes to 5.2 Pro. Quite interesting, because it used to be the other way round for me.

For those who do a lot of writing, quality still lags behind Claude.

Dan Shipper's publication Every ran systematic tests and found Claude Opus 4.5 scored 80% on writing quality versus 74% for GPT-5.2. Many testers also noticed big personality changes. Allie Miller, another prominent commentator in the AI space, said a simple question turned into 58 bullet points and numbered lists. Many people have compared 5.2 to a brilliant freelancer who over-formats everything.

As you know, I've said repeatedly that benchmarks aren't the be-all and end-all. Comparing models is getting much harder, and the gains are increasingly incremental. It's important to test this yourself and see whether you like the improvements.


Closing thoughts

OpenAI's messaging on this release has been very focused, which is unusual. Often they've been "we're the best at everything for everyone" with scattered messaging. This time, every executive in every interview and media appearance focused on professional work and economically valuable tasks.

They're clearly not trying to claim AGI breakthroughs. They're trying to win at professional work and going for the enterprise market.

The improvements are real. Structured outputs make this the most capable model OpenAI has ever produced. If your tasks involve slides and spreadsheets, 5.2 is a serious upgrade.

But if you zoom out, you're seeing incremental progress, not massive leaps anymore. Some people still hope for one big flash of inspiration, a single model that conquers it all, but the pattern we're seeing suggests otherwise.

5.2 is a much better tool, but it's not a new era. The fact that OpenAI is marketing better spreadsheets tells us a lot about where we are with AI right now.

That's it for this week and for this year. I hope you found this episode interesting and I look forward to spending 2026 with you. Thank you and see you next year.
