Humans have a tendency to learn from mistakes. Even when they fail at a task, they learn from the failure and try to do better next time. A similar trial-and-error technique can be used to teach robots new tasks. The most widely used approach for training robots on a specific task is reinforcement learning. When a robot tries different approaches to accomplish a task and gets closer to the goal, it is rewarded for that approach. The reward reinforces the behavior until the task is accomplished. But humans differ from these robots: they can learn from failures and mistakes. After attempting a task, humans come to know not only how to do it the right way but also what does not work, and they apply both lessons toward the goal. Robots, so far, have not been trained to learn this way.
Identifying the need to teach robots to learn from mistakes, OpenAI, an AI research company based in San Francisco, has released an open-source algorithm known as Hindsight Experience Replay (HER). The algorithm reframes failures as successes so that robots can learn from their mistakes. HER works with what are known as 'sparse rewards', and under this scheme every attempt the robot makes toward a task is treated, in hindsight, as a success at something, even if not at the task originally intended.
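The core trick behind HER can be sketched in a few lines: after a failed episode, pretend that whatever state the robot actually reached was the goal all along, and recompute the sparse rewards accordingly. The sketch below is illustrative only; the function names, episode layout, and the 0/-1 reward convention are assumptions, not OpenAI's actual implementation.

```python
def sparse_reward(achieved, goal):
    """Sparse reward: 0 on success, -1 otherwise (a common convention)."""
    return 0.0 if achieved == goal else -1.0

def her_relabel(episode):
    """Relabel a failed episode in hindsight.

    Each step is a dict with 'state', 'action', and 'achieved' (the state
    actually reached). We substitute the final achieved state as the goal,
    turning the failed trajectory into a successful one the agent can
    learn from.
    """
    hindsight_goal = episode[-1]["achieved"]
    return [
        {
            "state": step["state"],
            "action": step["action"],
            "achieved": step["achieved"],
            "goal": hindsight_goal,
            "reward": sparse_reward(step["achieved"], hindsight_goal),
        }
        for step in episode
    ]

# A toy episode that was aiming for state 5 but only reached state 2:
episode = [
    {"state": 0, "action": 1, "achieved": 1},
    {"state": 1, "action": 1, "achieved": 2},
]
relabeled = her_relabel(episode)
# The last step now earns reward 0.0, because reaching state 2
# counts as success once 2 is declared the goal in hindsight.
```

In a real training loop, both the original (failed) transitions and the relabeled ones would be stored in the replay buffer, so the agent learns something useful from every attempt.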
Most reinforcement learning algorithms use 'dense rewards', in which the robot is rewarded with a cookie whose size depends on how close it gets to accomplishing the task. Dense rewards score individual aspects of a task and can steer the robot to learn in the way the researcher wants, but they are intricate to program and, in many cases, unrealistic for real-world applications.
Most applications are result-focused: either the robot succeeds or it does not. Sparse rewards mean offering the robot a cookie only when it succeeds. This is easier to program, measure, and implement, but it has a limitation: if the robot keeps failing to reach the specified goal, it becomes very difficult to train. This is where HER comes into the picture, offering a cookie for every attempt. Dense rewards need a large number of samples to teach robots. Sparse rewards, on the other hand, are simpler to program, because deciding whether the robot succeeded is easier than figuring out the appropriate dense reward at every step, and they do not require extensive tuning of the reward function.
OpenAI developed HER to make robots learn more like humans. It has released an open-source version along with a set of simulated robot environments based on real robot platforms, such as the Fetch research robot and the Shadow Dexterous Hand. It will be interesting to see how intelligent robots become after being trained through sparse rewards.