Implementations of Hierarchical Reinforcement Learning

谁说我不能喝 提交于 2019-12-11 08:51:57

问题


Can anyone recommend a reinforcement learning library or framework that can handle large state spaces by abstracting them?

I'm attempting to implement the intelligence for a small agent in a game world. The agent is represented by a small two-wheeled robot that can move forward and backwards, and turn left and right. It has a couple sensors for detecting a boundary on the ground, a couple ultrasonic sensors for detecting objects far away, and a couple bump sensors for detecting contact with an object or opponent. It also can do some simple dead reckoning to estimate its position in the world using its starting position as a reference. So all the state features available to it are:

edge_detected=0|1
edge_left=0|1
edge_right=0|1
edge_both=0|1
sonar_detected=0|1
sonar_left=0|1
sonar_left_dist=near|far|very_far
sonar_right=0|1
sonar_right_dist=near|far|very_far
sonar_both=0|1
contact_detected=0|1
contact_left=0|1
contact_right=0|1
contact_both=0|1
estimated_distance_from_edge_in_front=near|far|very_far
estimated_distance_from_edge_in_back=near|far|very_far
estimated_distance_from_edge_to_left=near|far|very_far
estimated_distance_from_edge_to_right=near|far|very_far

The goal is to identify the state where the reward signal is received, and learn a policy to acquire that reward as quickly as possible. In a traditional Markov model, this state space represented discretely would have 2985984 possible values, which is far too much to explore each and every one using something like Q-learning or SARSA.

Can anyone recommend a reinforcement library appropriate for this domain (preferably with Python bindings) or an unimplemented algorithm that I could potentially implement myself?


回答1:


Your actual state is the robot's position and orientation in the world. Using these sensor readings is an approximation, since it is likely to render many states indistinguishable.

Now, if you go down this road, you could use linear function approximation. Then this is just 24 binary features (12 0|1 + 6*2 near|far|very_far). This is such a small number that you could even use all pairs of features for learning. Farther down this road is online discovery of feature dependencies (see Alborz Geramifard's paper, for example). This is directly related to your interest in hierarchical learning.

An alternative is to use a conventional algorithm to track the robot's position and use the position as input to RL.



来源:https://stackoverflow.com/questions/26600905/implementations-of-hierarchical-reinforcement-learning

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!