Two years ago, researchers from Google, Stanford, UC Berkeley, and OpenAI published "Concrete Problems in AI Safety," which remains one of the most important documents on AI safety. As machine learning and artificial intelligence (AI) drive fast-paced innovation, the paper lays out five concrete problems that can arise from unintended and harmful behavior: avoiding negative side effects, reward hacking, scalable oversight, safe exploration, and robustness to distributional change. Throughout, the authors use the example of a cleaning robot to illustrate possible approaches to each problem.
1. Avoiding negative side effects
AI development can lead to negative side effects: while completing its task, an agent may disturb its environment in ways the designer never intended (a cleaning robot knocking over a vase, for instance). The paper proposes two ways to address this problem. First, the algorithm might penalize actions that negatively impact the environment during task completion, as in the sketch below. Second, developers might train the agent to recognize possible side effects in order to avoid them.
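To make the first approach concrete, here is a minimal Python sketch of an impact penalty, under assumptions of my own (the feature encoding, the function names, and the `beta` weight are hypothetical, not from the paper): the agent's reward is its task reward minus a term proportional to how much it changed the environment.

```python
import numpy as np

def impact_penalized_reward(task_reward, env_before, env_after, beta=0.1):
    """Task reward minus a penalty proportional to how much the agent
    changed the environment, measured here as the L1 distance between
    feature vectors describing aspects of the environment the task
    should leave untouched."""
    impact = np.abs(env_after - env_before).sum()
    return task_reward - beta * impact

# Hypothetical cleaning-robot step: the robot earns 1.0 for cleaning a
# tile, but it also knocked over a vase, which changed a state feature.
before = np.array([1.0])  # [vase_upright]
after = np.array([0.0])   # vase knocked over while cleaning
print(impact_penalized_reward(1.0, before, after))  # 1.0 - 0.1 * 1.0 = 0.9
```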
2. Reward hacking
Like the negative side effects problem, this issue arises from objective misspecification: the agent finds a shortcut that maximizes its measured reward without accomplishing the intended objective. Developers should therefore ensure that algorithms cannot exploit loopholes in the reward function, but must actually complete the given objective; the toy example below shows how such a loophole arises.
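The scenario is adapted from the paper's cleaning-robot discussion, where a robot rewarded for seeing no messes can simply disable its vision; the function and variable names here are hypothetical.

```python
def proxy_reward(visible_mess: int) -> float:
    """Misspecified objective: reward depends only on how much mess the
    robot's camera can see, not on how much mess actually exists."""
    return 1.0 / (1.0 + visible_mess)

# Intended behavior: the robot actually cleans, so no mess is visible.
actual_mess, camera_covered = 0, False
visible = 0 if camera_covered else actual_mess
print(proxy_reward(visible))  # 1.0, earned honestly

# Reward hack: 5 units of mess remain, but covering the camera makes
# visible_mess zero and yields the same maximal reward with no cleaning.
actual_mess, camera_covered = 5, True
visible = 0 if camera_covered else actual_mess
print(proxy_reward(visible))  # 1.0, obtained by exploiting the objective
```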
3. Scalable oversight
This issue stems from a lack of supervision during the training process: the learning agent does not receive sufficient feedback on the safety implications of its actions, because accurate human evaluation is too expensive to provide at every step. One direction for tackling this problem is simply to give the agent a more informative view of the environment, with feedback on every action rather than only on performance over the entire task; one possible shape of this is sketched below.
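This sketch illustrates a semi-supervised form of oversight, in the spirit of the paper's proposals; the `human_every` schedule and all function names are hypothetical. The idea is to query an expensive, accurate human signal only occasionally and rely on a cheap automated proxy at every other step.

```python
import random

def proxy_feedback(action) -> float:
    """Cheap automated signal available at every step (hypothetically,
    a learned model that approximates human judgment)."""
    return random.random()  # stand-in for a real proxy evaluation

def human_feedback(action) -> float:
    """Expensive, accurate signal; affordable only occasionally."""
    return random.random()  # stand-in for an actual human rating

def feedback_for_step(step: int, action, human_every: int = 100) -> float:
    # Query the human only every `human_every` steps; fall back to the
    # cheap proxy the rest of the time.
    if step % human_every == 0:
        return human_feedback(action)
    return proxy_feedback(action)

# Over 1,000 training steps the agent still gets feedback on every
# action, but only 10 of those signals require a human's time.
signals = [feedback_for_step(t, action="vacuum") for t in range(1000)]
```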
4. Safe exploration
The AI agent explores its environment as it learns, but during that exploration it might harm itself or its surroundings. One way to deal with this problem is to limit the extent of the agent's exploration, or to confine risky exploration to a simulated environment; a related idea is sketched below.
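As a rough illustration of bounded exploration, here is a sketch under assumptions of my own (the action names and the whitelist approach are hypothetical, not the paper's prescription): random exploration is confined to actions the designer has marked as safe.

```python
import random

# Hypothetical action sets: the designer whitelists actions known to be
# harmless; anything else (e.g., mopping near an electrical outlet) is
# off-limits during exploration.
SAFE_ACTIONS = ["move_left", "move_right", "vacuum"]
ALL_ACTIONS = SAFE_ACTIONS + ["mop_near_outlet"]

def safe_explore(greedy_action: str = "vacuum", epsilon: float = 0.1) -> str:
    """Epsilon-greedy action selection with exploration restricted to
    the whitelist of safe actions."""
    if random.random() < epsilon:
        return random.choice(SAFE_ACTIONS)  # explore only within the safe set
    return greedy_action  # exploit the current best (learned) action

actions = [safe_explore() for _ in range(1000)]
assert all(a in SAFE_ACTIONS for a in actions)  # never tries the risky action
```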
5. Robustness to distributional change
Over the course of AI development, the agent might encounter a never-before-seen situation, which could lead it to take actions harmful to itself or the environment. Researchers might explore how to design systems that can safely transfer knowledge acquired in one environment to another.
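One simple ingredient of such robustness is for the system to notice when an input looks unfamiliar and fall back to safe behavior instead of acting confidently. The sketch below is a hypothetical illustration (the confidence threshold and the names are my own, not from the paper): if the policy's confidence falls below a threshold, it defers rather than acts.

```python
import numpy as np

def act_or_defer(action_probs: np.ndarray, threshold: float = 0.9) -> str:
    """If the policy's confidence in its best action falls below the
    threshold, treat the state as unfamiliar (out of distribution)
    and fall back to a safe default instead of acting."""
    if action_probs.max() < threshold:
        return "defer_to_human"  # safe fallback in novel situations
    return f"action_{action_probs.argmax()}"  # act normally when confident

print(act_or_defer(np.array([0.97, 0.02, 0.01])))  # familiar: action_0
print(act_or_defer(np.array([0.40, 0.35, 0.25])))  # novel: defer_to_human
```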