Are the last 3 months the start of an AI acceleration?
While most are debating whether AI has hit a plateau, in Silicon Valley they’re debating whether progress is exponential or superexponential.
Claude Opus 4.6 was released less than 3 months after Opus 4.5, but was clearly better at real world agentic tasks. Then Mythos, with dramatic cyber hacking capabilities, was released just two months after that. It feels like an acceleration.
Anthropic and OpenAI’s stated aim is to automate AI R&D to bring about an acceleration in AI capabilities, causing an intelligence explosion. Has that process already started?
Let’s review the evidence.
In a nutshell
We could be seeing the start of an acceleration driven by Anthropic, but it’s too early to tell:
Mythos might represent an acceleration, especially on agentic tasks, but it’s just a single data point and might be caused by an unusually large increase in training compute that can’t be sustained.
Frontier AI revenue seems to be accelerating due to Anthropic, but now that Anthropic has caught up to OpenAI, its growth rate might slow to that of the field as a whole.
AI has made AI researchers noticeably more productive, but probably not enough to cause a large acceleration in progress.
Compute prices might be trending up, as we’d expect if algorithms were improving rapidly relative to the supply of chips.
1. Benchmark results
An upward curve in benchmark results would be the clearest signal of an acceleration. The Epoch Capabilities Index (ECI) combines 37 benchmarks into a single index. Epoch believes a new, faster trend started in early 2024 (ironically, just when people were saying pretraining was hitting a wall).
But does Mythos represent a break from the faster post-2024 trend? Epoch hasn’t released an official score, but external parties estimate Mythos is on trend on this index.[1]
There’s a complication, though. Anthropic has their own version of ECI, probably using a larger set of internal benchmarks. On the version in the Opus 4.7 system card, Mythos appears to be about 6 months of progress in only 2 months.[2]
Which version of ECI should we trust? I haven’t been able to get a clear explanation of the difference, but the best guess is that Anthropic’s index contains more agentic and coding tasks, while Epoch’s index is more driven by progress on math at the higher end. I think agentic coding skills are more important for starting a feedback loop, so I would watch Anthropic’s index the most.
METR time horizon
If we were to look at just one benchmark, my favourite is still METR’s time horizon, which aims to measure the agentic coding and AI R&D tasks that are especially relevant to starting an algorithmic feedback loop.
Many think this benchmark should eventually go superexponential, since once AI learns the general planning and error-correction skills needed to complete multiweek tasks, it should be able to complete multimonth ones too. It also shows a post-2024 acceleration, but what about the recent releases?
The final dot shows Claude Opus 4.6 was slightly above the trend line for a 50% success rate, but well within confidence intervals (plots below by Alex Barry).
And for an 80% success rate, it looks exactly on trend, or slightly below.
What about Mythos? Results on METR correlate pretty well with Anthropic’s ECI (which makes sense if Anthropic’s ECI is also heavy on agentic coding tasks).
That correlation would suggest a 50% success rate horizon of around 40h – though the longest task in the benchmark is 30h, so this is off the scale.
The 80% success rate horizon should be around 6h, which would also be 6 months of progress in 2 months. Whether Mythos actually hits 6h at 80% success is a key thing to watch in the coming months.
What might explain an acceleration in benchmarks?
Mythos indeed seems to be ahead of trend on agentic coding. What could explain that?
First, it might be a fluke. Given the uncertainties involved, a single data point should have only a minimal effect on the best-guess trend. Anthropic was also lagging on ECI before, so it may have simply caught up.
Second, Anthropic might have increased training compute by an unusually large amount in this round of training. This brings future capabilities into the present, but they won’t be able to continue this rate of increase. (Some evidence for this: Mythos costs about 5x more, suggesting the model is about 5 times larger.)
Third, AI might be successfully learning general agentic skills that will result in superexponential progress on agentic benchmarks. If that’s the case, we should expect the acceleration to continue.
Fourth, AI might be making Anthropic researchers so much more productive that they can now make progress three times as fast, which would make this the start of an algorithmic feedback loop. I’ll discuss why I don’t think this is what’s happening in section 3.
Overall the first and second explanations seem the most plausible to me, but we can’t rule any of them out.
2. Revenue
Revenue is my favourite ‘benchmark’, since it’s the hardest to game. If companies are willing to part with more cold hard cash to use AI, it’s probably doing something more useful for them. (Price can diverge from value, but is most likely to be lower, due to fierce competition from open source.) More revenue also means more money for compute, which keeps the flywheel going.
Here is revenue of frontier companies on a log-chart (excluding Gemini):
The grey line looks pretty linear, which would correspond to steady exponential growth. But if you break it down by year, you find:
2024: 3.2x growth
2025: 4.7x growth
2026: 8x annualised to date
Basically, OpenAI has been growing at 3-4x per year, while Anthropic has been growing at 10x. As Anthropic becomes a larger share of the total, the overall growth rate has been trending towards 10x.
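The mix effect described above can be sketched numerically. This is only an illustration: the revenue shares are hypothetical, and 3.5x is used as a stand-in for OpenAI's 3-4x range. None of the share figures come from the chart.

```python
def blended_growth(anthropic_share, anthropic_growth=10.0, openai_growth=3.5):
    """Combined one-year growth multiple for two companies.

    anthropic_share is Anthropic's share of combined revenue at the START
    of the year (a hypothetical input, not a figure from the post).
    """
    openai_share = 1.0 - anthropic_share
    return anthropic_share * anthropic_growth + openai_share * openai_growth


# Hypothetical starting shares, chosen only to show the direction of the effect.
for share in (0.1, 0.3, 0.5):
    print(f"Anthropic share {share:.0%}: combined growth {blended_growth(share):.2f}x")
```

As the faster-growing company's starting share rises, the combined multiple is pulled from 3.5x towards 10x even though neither company's own growth rate has changed.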
The crucial question: after Anthropic becomes the majority of revenue, will it be able to maintain something closer to its longer-term 10x per year trend, which would be an acceleration for AI as a whole, or will it converge to OpenAI’s growth rate since it can no longer take market share (a continuation of trend)? This is another key indicator to watch over the next 3-6 months.
What about Gemini? It’s hard to disentangle Gemini’s revenue from the rest of Google, but growth in usage has probably been in between the two: faster than OpenAI but slower than Anthropic. If revenue has moved similarly, it would make the case for acceleration stronger.
In the first three months of the year, Anthropic grew revenue at an annualised rate of 81 times, probably the fastest a company of this size has ever grown. It’s unlikely this can be sustained, since there’s not enough compute available (and there’s only so much they will increase prices).
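To make the annualised figure concrete, here is a minimal sketch of the conversion, assuming growth compounds at a constant rate within the year. The function names are illustrative.

```python
def annualised_multiple(period_multiple, months):
    """Growth multiple over `months`, compounded up to a 12-month multiple."""
    return period_multiple ** (12 / months)


def multiple_over_period(annual_multiple, months):
    """Inverse: the multiple over `months` implied by an annualised multiple."""
    return annual_multiple ** (months / 12)


quarterly = multiple_over_period(81, 3)
print(f"81x annualised implies about {quarterly:.1f}x in three months")
print(f"Check: {annualised_multiple(quarterly, 3):.0f}x annualised")
```

In other words, an 81x annualised rate corresponds to roughly tripling revenue in a single quarter, since 3^4 = 81.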
3. AI uplift
AI is making AI researchers more productive — but probably not enough to explain Mythos. Here’s the arithmetic.
In an internal survey of 18 researchers, one thought Anthropic Mythos Preview was already a drop-in replacement for an entry-level Research Scientist or Engineer, and 4 thought it had a 50% chance of qualifying as such with 3 months of scaffolding iteration, while no-one thought that was possible for Opus 4.6. (Though Anthropic say they suspect those numbers would go down if discussed further.)
In February, Anthropic researchers said Opus 4.6 made them 2x more productive at the median, and 2.5x at the mean. For Mythos, the geometric mean was 4x.
This is a rapid rate of progress (~16x per year), but I’m sceptical of the absolute size. A study by METR found that software engineers greatly overestimated how much more productive AI made them.[3] Anthropic’s poll is also only an informal survey, biased towards the respondents who use AI the most.
Redwood Research’s Ryan Greenblatt agrees and estimates the true increase in labour productivity is around 1.6x rather than 4x. The AI Futures team have told me they have a similar estimate.
Since AI progress requires other inputs, especially compute, a 1.6x increase to labour productivity would increase the overall rate of AI progress about 1.2x. That’s just starting to get noticeable, but lower than needed for an intelligence explosion. In the default AI Futures model, it’s another ~2 years from this point to takeoff.[4]
Even if Anthropic’s researchers are indeed 4x more productive, Anthropic estimate this would result in less than a 2x increase in the overall rate of AI progress.
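One way to see how these two figures relate is a simple power-law sketch, in which the rate of progress scales as labour productivity raised to some exponent below 1 (because compute and other inputs are held fixed). This functional form is an illustrative assumption, not the AI Futures model or Anthropic's own method. The exponent is backed out from the 1.6x to 1.2x figure above.

```python
import math


def labour_exponent(labour_multiple, progress_multiple):
    """Exponent alpha such that labour_multiple ** alpha == progress_multiple."""
    return math.log(progress_multiple) / math.log(labour_multiple)


def progress_from_labour(labour_multiple, alpha):
    """Overall progress multiple under the assumed power-law relationship."""
    return labour_multiple ** alpha


# Calibrate on the 1.6x labour -> 1.2x progress estimate quoted in the text.
alpha = labour_exponent(1.6, 1.2)
print(f"Implied labour exponent: {alpha:.2f}")

# Apply the same (assumed) exponent to the 4x survey figure.
print(f"4x labour productivity -> {progress_from_labour(4.0, alpha):.2f}x progress")
```

Under this assumption the exponent comes out a little under 0.4, and a 4x labour uplift maps to roughly 1.7x overall progress, which is at least consistent with the "less than 2x" estimate.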
Either way, the uplift estimates for Opus 4.6 aren’t enough to have caused the acceleration represented by Mythos, which makes me more sceptical it’s part of an algorithmic acceleration.
Of course this is all very uncertain. If the Anthropic employees in the poll are right, then the intelligence explosion could be here much sooner.
4. Compute prices
As AI improves, the price of compute should converge towards the marginal value produced by the marginal AI worker. This could be driven either by extra AI workers being less useful, or the price of compute rising.
My guess is that if a true human-level AI remote worker were created in the next four years, compute would be scarce enough that there wouldn’t be large diminishing returns. (The amount of compute in the world is only enough to produce the output of about 100 million human workers with the abilities of GPT-5.)
The price of compute could therefore trend to the level of typical white collar wages in the US, or about $50/hour. The current cost to rent an H100 GPU is around $2/hour, and it can run about ten GPT-5 level workers, so the price could go up a lot. (In a race to superintelligence, the value of marginal compute might go even higher.)
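The arithmetic behind "could go up a lot" can be spelled out using only the three figures above. It is a rough upper bound that assumes each AI worker fully substitutes for an hour of white collar labour.

```python
# Figures taken from the paragraph above.
H100_RENT_PER_HOUR = 2.0   # current rental price, $/hour
WORKERS_PER_H100 = 10      # GPT-5 level workers one H100 can run
WAGE_PER_HOUR = 50.0       # typical US white collar wage, $/hour

# Current compute cost of one AI worker-hour.
cost_per_worker_hour = H100_RENT_PER_HOUR / WORKERS_PER_H100

# Hourly value of one H100 if each worker it runs were worth a full wage.
value_per_gpu_hour = WORKERS_PER_H100 * WAGE_PER_HOUR

# How far the rental price could rise before reaching that value.
headroom = value_per_gpu_hour / H100_RENT_PER_HOUR

print(f"Cost per AI worker-hour today: ${cost_per_worker_hour:.2f}")
print(f"Implied ceiling per H100-hour: ${value_per_gpu_hour:.0f}")
print(f"Headroom over current price: {headroom:.0f}x")
```

On these numbers, an AI worker-hour costs about $0.20 in compute today, against a ceiling of $500 per H100-hour, or 250 times the current rental price.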
Historically, the price of compute has dropped around 30% per year, as each generation of chips becomes more efficient.
In the last 4 months, however, we’ve seen the first sharp increase: up 30%.
Is this just a blip caused by Claude Code and Cowork (which can do 1h coding tasks for $0.30 that you’d need to pay a human $30 for), or is it the start of an upward trend in the price of compute? That’s another key indicator of a near-term takeoff – one that also enables even greater investment in datacentres, keeping the AI flywheel going.
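To put the recent move in context, the sketch below compares the 30% rise over 4 months with what the historical 30% per year decline would have predicted over the same window. It assumes constant compounding within the year, which is a simplification.

```python
HISTORICAL_ANNUAL_MULTIPLE = 0.70  # prices falling ~30% per year
OBSERVED_MULTIPLE = 1.30           # prices up ~30% over the window
WINDOW_MONTHS = 4

# Price multiple the historical trend would predict over 4 months.
expected_multiple = HISTORICAL_ANNUAL_MULTIPLE ** (WINDOW_MONTHS / 12)

# How far above the old trend prices now sit.
gap_vs_trend = OBSERVED_MULTIPLE / expected_multiple

# Annualised rate if the recent pace were sustained for a full year.
annualised_if_sustained = OBSERVED_MULTIPLE ** (12 / WINDOW_MONTHS)

print(f"Trend-implied 4-month multiple: {expected_multiple:.2f}x")
print(f"Observed price vs old trend: {gap_vs_trend:.2f}x")
print(f"Annualised if sustained: {annualised_if_sustained:.2f}x")
```

Against the old trend, prices would have been expected to fall by about 11% over four months, so a 30% rise leaves them nearly 1.5x above where the trend pointed. Sustained for a year, the recent pace would slightly more than double prices.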
Wrapping up
In short, there are signs of an acceleration driven by Anthropic, but it’s still too early to know for sure. Anthropic may just be catching up in market share, and Mythos might just be a catch-up on certain benchmarks, an outlier, or the result of an unusually large training run. AI researchers are starting to get noticeable uplift from AI, but not enough to cause a big acceleration in benchmark results.
In the next three months, the crucial indicators to watch are:
Where does Mythos fall on the METR time horizon benchmark at 80% reliability?
Are the next 1-2 big model releases also above trend on ECI?
Does Anthropic’s revenue continue on the faster trend, or converge to OpenAI’s trend?
Can we get any better AI uplift estimates?
Do compute prices keep rising?
Even without an acceleration, these trends remain insanely fast. A mere continuation would still likely get us to something like AGI and an intelligence explosion in 3-4 years. An acceleration could get us there in 1-2.
[1] This estimate is based on scaling Anthropic’s ECI data to estimate Epoch ECI. I’ve also been told that this estimate of ~161 for Mythos is likely slightly too high, which would bring it back even closer to trend. This is because Anthropic incorrectly scaled their ECI by setting Sonnet 3.5 (new) to 130 instead of the original Sonnet 3.5, which makes their numbers too high, and this isn’t sufficiently corrected for in the tweet.
[2] Or in 3 months, if we suppose it takes another month for Mythos to be fully released.
[3] Another framing: Opus 4.6 can do 14h tasks with 50% reliability and 1h tasks with 80% reliability on the METR time horizon benchmark. How much should that speed researchers up? These tasks are also relatively well-defined and non-messy compared to a lot of what researchers do. My sense is that these abilities should let researchers automate <50% of their work, which should mean their overall productivity speeds up <2x (unless they can switch to projects that can effectively use huge amounts of basic engineering).
[4] That is, with Daniel Kokotajlo’s median parameters; this statistic depends on the parameter inputs.

Good post!
> Many think this benchmark should eventually go superexponential, since once AI learns the general planning and error-correction skills needed to complete multiweek tasks, it should be able to complete multimonth ones too.
I've updated against this position over the past 6 months, since wouldn't it naively imply that if a junior software developer gets good at completing multiweek tasks (eg. submitting a PR for a major feature with minimal supervision), they would automatically be roughly as good as senior developers at multimonth/multiyear tasks like planning and guiding the development of a 1M LoC major project? But that is clearly not how it works.
> Epoch hasn’t released an official score, but external parties believe Mythos is on trend on this index. […] Though there’s a complication. Anthropic has their own version of ECI, using a probably larger set of internal benchmarks. On the version in the Opus 4.7 system card, Mythos appears to be about 6 months of progress in only 2.2
I think there’s a misunderstanding here. Both of those use the same underlying data - the AECI datapoints from Anthropic. The ECI chart from Ramez Naam simply attempts to convert AECI values to ECI, so that we can contextualize Mythos with models from OpenAI and Google instead of just Anthropic models.