It drives me insane how very few people can extrapolate these scenarios based on what is happening today.
I have been trying to walk my social circle through these scenarios and I get either some denial pushback or outright dismissiveness. I don't think it comes from a place of pure denial; it's just that they would rather not think about it.
It is really giving me Late February - Early March 2020 vibes.
Everyone is exhausted with extrapolating new trends into the future and trusting the experts. Especially after 2020. And by definition it’s useless to think about what’s after the Singularity anyway.
Important update on this post: https://substack.com/@benjamintodd/note/c-111372048?utm_source=activity_item
How can a young person prepare for this?
Reading the 80,000 Hours career guide would be a good start.
I found this really insightful. It all feels very straightforward but skeptics talk about the compute ceiling as a response. What’s the best criticism of your argument here? Let’s steelman this thing!
The compute ceiling is real - it just probably doesn’t hit until 2030. Full discussion of counterarguments and bottlenecks here: https://benjamintodd.substack.com/p/the-case-for-agi-by-2030
Nice, great.
>You can ask GPT-o1 to solve 100,000 math problems, then take only the correct solutions, and use them to train the next model.
If you already have the solutions (needed to grade the problems), why not train on those in the first place?
You don't have the solutions to start with – you generate the solutions and then verify they're correct. It's easier to verify that a solution is correct than to generate the solution in the first place. (But yes, this method only works if it's relatively easy to verify the solutions.)
Also, my impression is that there's significant value in generating the whole chain of reasoning leading to the solution (even if you already know what the solution is).
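The loop being described is essentially rejection sampling for training data: sample many candidate solutions, keep only the ones a cheap verifier accepts, and train on the kept reasoning chains. Here's a minimal toy sketch of that pipeline; the "model" is a stand-in random guesser and the problems are toy arithmetic, not the actual setup:

```python
import random

def generate_candidate(problem, rng):
    # Stand-in for sampling a chain-of-thought solution from a model.
    # The toy "model" guesses the sum, and is right about half the time.
    a, b = problem
    guess = a + b + rng.choice([-1, 0, 0, 1])
    return {"problem": problem, "reasoning": f"{a} + {b} = {guess}", "answer": guess}

def verify(candidate):
    # Verification is cheap: just re-check the arithmetic.
    a, b = candidate["problem"]
    return candidate["answer"] == a + b

def build_training_set(problems, samples_per_problem=8, seed=0):
    # Generate up to N candidates per problem; keep the first verified one,
    # including its full reasoning chain (not just the final answer).
    rng = random.Random(seed)
    kept = []
    for p in problems:
        for _ in range(samples_per_problem):
            cand = generate_candidate(p, rng)
            if verify(cand):
                kept.append(cand)
                break
    return kept

problems = [(i, i + 3) for i in range(100)]
data = build_training_set(problems)
```

The key asymmetry is exactly the one in the comment above: `verify` is a one-line check, while generating a correct candidate may take many samples, and the kept chains can then be used as fine-tuning data for the next model.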
Would be curious to know what your thoughts are on Figure and Helix.
I think we still have the problem where it aces all the tests but can't yet do the actual thing, right?
Like crushing competitive coding but failing to properly debug a real-world codebase (I believe OpenAI had a paper showing this).
The claim that this thing is at a PhD level at anything is just false as of right now.
Having said that, I do believe it's possible to RL all these different tasks, and eventually it could naturally start to learn to generalize across them. It's probably only a matter of cost, really.
Does that cost $1 trillion, $10 trillion, or $100 trillion?
Furthermore, I think innovation, i.e. genuine PhD-level work, is nothing more than probing the space of concepts. So that should be solvable as well.
In principle I see no walls, but we shouldn't say these things are at PhD level at anything currently.
I'm claiming they're at PhD level at answering these questions. But you're right that PhDs clearly do a lot more than answer well-defined one-hour questions.
What fraction of work is captured by benchmarks is a key uncertainty we face right now.
Also agree we'll be able to make a lot more progress from here, and that's maybe the key thing.