Reward Hacking Reloaded: Concrete Problems in AI Safety Part 3.5

Goodhart's Law, Partially Observed Goals, and Wireheading: some more reasons for
Three different approaches that might help to prevent
Instrumental Convergence:
Scalable Supervision: Why can't we just have humans overseeing our AI systems? The
This is a follow-up to this earlier video:
There's another