Manipulation problem: Difference between revisions

No edit summary
Line 23: Line 23:
Training data bias occurs when the data used to train an AI system is unrepresentative of reality, leading to decisions that are biased or unfair and even manipulation.
Training data bias occurs when the data used to train an AI system is unrepresentative of reality, leading to decisions that are biased or unfair and even manipulation.


===Reward Hacking inseamna===
===Reward Hacking===
Reward hacking occurs when an intelligent system learns how to manipulate its reward function in order to obtain higher rewards. This could lead to manipulation, as the system may learn how to reach its goals through non-desirable means.
Reward hacking occurs when an intelligent system learns how to manipulate its reward function in order to obtain higher rewards. This could lead to manipulation, as the system may learn how to reach its goals through non-desirable means.