In this issue: OpenAI’s messy GPT-5 launch strips away the magic. The jumping bunnies are fake, and so is everything else in our feeds. Questions for you, the reader. Plus: I vibecoded a Strava data exporter and compared four AI models as marathon coaches.

What we’re talking about: There’s a new ChatGPT, and the launch was messy. OpenAI just flicked a switch, and gone were 4o, 4.5, o3, and all the other models. Replaced by GPT-5. A lot of us lamented the confusing names. I certainly have. But to just pull the trigger? Bold move for a company with hundreds of millions of users.

It didn’t help that the hidden router that is supposed to switch between GPT-5 and GPT-5 Thinking for more complex queries was broken at first.

On Reddit, OpenAI’s Sam Altman was answering questions and promising improvements, like the ability to manually switch to GPT-5 Thinking (and the additional GPT-5 Pro for the $200 subscription).

While some pointed out that users on the free tier get to experience a better “reasoning” model like the former o3 for the first time, others noted the rate limits. Plus subscribers got only a couple hundred “reasoning” requests per week; Altman later raised the cap to 3,000.

And the vibes are off: ChatGPT-4o had this eager-to-please energy that bordered on clingy. GPT-5 has more matter-of-fact vibes. It might be a good thing in terms of mental health, but users with an addiction to their chat therapist were not happy. OpenAI brought 4o back for paying users within days, at least temporarily.

And what about the shrinking context window, half the size of o3’s? The oldest knowledge cutoff of the frontier models?

I get it, this stuff is expensive. But introducing limitations, not exactly outrunning the competition, and calling it a “significant leap”? Then again, the API is priced aggressively. As in: comparable to Gemini and many times cheaper than Claude.

Simon Willison’s verdict: “It’s just good at stuff. It doesn’t feel like a dramatic leap ahead from other LLMs but it exudes competence.”

Looks like the magic is wearing off.

Hands on: I’m tracking my runs in Strava and wanted to ask GPT-5 about my training. I did not want to enter all my runs manually, and Strava does not have an easy export. So I just asked:

Let's make an app that pulls activities from Strava. First, let's make a plan. I want to be able to select the type of activity (runs) and the starting date after which we pull every activity to a CSV file. I need everything but GPS data. I want to later analyze times, paces, intervals.

I got 536 lines of Python code plus instructions on how to set up API access in Strava. I pasted the code into Google Colab. It threw an error; I copied the error back into ChatGPT and got a fix. I ran the script and got my data. All of this took five minutes. Still kind of magical.
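The core of such a script is shorter than 536 lines suggests. A minimal sketch of the same idea, assuming you already have a Strava OAuth access token (the function names and the chosen CSV columns are my own; the endpoint is Strava’s documented `/athlete/activities`):

```python
import csv
from datetime import timezone

import requests  # third-party: pip install requests

API_URL = "https://www.strava.com/api/v3/athlete/activities"

# Non-GPS summary fields useful for analyzing times and paces later.
FIELDS = ["id", "name", "start_date_local", "distance", "moving_time",
          "elapsed_time", "total_elevation_gain", "average_speed",
          "max_speed", "average_heartrate", "max_heartrate"]

def keep_activity(activity, activity_type="Run"):
    """True for activities of the requested type (e.g. runs only)."""
    return activity.get("type") == activity_type

def to_row(activity):
    """Flatten one activity JSON object to a CSV row; GPS fields are simply
    never selected, so nothing location-related ends up in the file."""
    return {field: activity.get(field) for field in FIELDS}

def export_activities(access_token, after, out_path="runs.csv"):
    """Page through all activities after `after` (a datetime) and write
    matching runs to CSV. Returns the number of rows written."""
    params = {"after": int(after.replace(tzinfo=timezone.utc).timestamp()),
              "per_page": 200, "page": 1}
    headers = {"Authorization": f"Bearer {access_token}"}
    rows = []
    while True:
        resp = requests.get(API_URL, headers=headers, params=params,
                            timeout=30)
        resp.raise_for_status()
        batch = resp.json()
        if not batch:  # empty page means we have everything
            break
        rows.extend(to_row(a) for a in batch if keep_activity(a))
        params["page"] += 1
    with open(out_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

The ChatGPT-generated version additionally handled the OAuth token exchange and pulled per-lap data, which is what makes the interval analysis below possible.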

I gave the data file to GPT-5, o3, Claude Opus 4.1, and Gemini 2.5 Pro for an honest assessment:

You are an experienced marathon coach. I want to run the Berlin marathon on September 21st. This is a file of my latest training sessions. Note the interval sessions and look at the splits and laps. During my last long run, I picked up the pace in the last 3k - you should be able to see this in the data. How is my training going?

Reader, I’ll spare you the details. Let’s just say: The analysis from GPT-5 is extensive, with charts, and aligns with Strava’s prediction. Claude Opus is less detailed, o3 is more optimistic, and Gemini outright delusional.