Discussion about this post

John Halstead

Given the predictions about the severity of misalignment risk, what would you have expected to observe over the last 4 years in terms of AI behaviour? The models are used every day by millions of people and are generally pretty aligned and nice to humans, and the examples of misalignment people often point to come from extremely artificial situations. Is the claim that this is what we would have expected to see, conditional on misalignment risk being high? I find that surprising.
