Discussion about this post

Herbie Bradley

Good post!

> Many think this benchmark should eventually go superexponential, since once AI learns the general planning and error-correction skills needed to complete multiweek tasks, it should be able to complete multimonth ones too.

I've updated against this position over the past 6 months. Wouldn't it naively imply that once a junior software developer gets good at completing multiweek tasks (e.g., submitting a PR for a major feature with minimal supervision), they would automatically be roughly as good as senior developers at multimonth or multiyear tasks, like planning and guiding the development of a major 1M LoC project? But that is clearly not how it works.

Point Estimate

> Epoch hasn’t released an official score, but external parties believe Mythos is on trend on this index. […] Though there’s a complication. Anthropic has their own version of ECI, using a probably larger set of internal benchmarks. On the version in the Opus 4.7 system card, Mythos appears to be about 6 months of progress in only 2.2

I think there’s a misunderstanding here. Both of those use the same underlying data - the AECI datapoints from Anthropic. The ECI chart from Ramez Naam simply attempts to convert AECI values to ECI, so that we can contextualize Mythos with models from OpenAI and Google instead of just Anthropic models.
