reinforcement-learning

What is the way to understand Proximal Policy Optimization Algorithm in RL?

扶醉桌前 submitted on 2019-11-29 20:08:19
I know the basics of reinforcement learning, but which terms do I need to understand to be able to read the arXiv PPO paper? What is the roadmap for learning and using PPO? To better understand PPO, it is helpful to look at the main contributions of the paper, which are: (1) the Clipped Surrogate Objective and (2) the use of "multiple epochs of stochastic gradient ascent to perform each policy update". First, to ground these points in the original PPO paper: We have introduced [PPO], a family of policy optimization methods that use multiple epochs of stochastic gradient ascent to perform each policy
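
A minimal, illustrative sketch of the Clipped Surrogate Objective (assuming precomputed advantage estimates and stored old-policy log-probabilities; the function and argument names are made up for illustration):

    import torch

    def clipped_surrogate_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
        # r_t(theta): probability ratio between the current policy and the policy
        # that collected the data.
        ratio = torch.exp(new_log_probs - old_log_probs)
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        # Objective to maximize (negate it to use as a loss); taking the minimum means
        # policy changes that push the ratio outside [1-eps, 1+eps] get no extra credit.
        return torch.min(unclipped, clipped).mean()

In practice the same batch of collected trajectories is then reused for several epochs of minibatch gradient steps on this objective, which is the second contribution mentioned above.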

What is the difference between Q-learning and SARSA?

拈花ヽ惹草 submitted on 2019-11-29 18:57:05
Although I know that SARSA is on-policy while Q-learning is off-policy, when looking at their formulas it's hard (to me) to see any difference between these two algorithms. According to the book Reinforcement Learning: An Introduction (by Sutton and Barto), in the SARSA algorithm, given a policy, the corresponding action-value function Q (in state s and action a, at timestep t), i.e. Q(s_t, a_t), can be updated as follows:

Q(s_t, a_t) = Q(s_t, a_t) + α*(r_t + γ*Q(s_t+1, a_t+1) - Q(s_t, a_t))

On the other hand, the update step for the Q-learning algorithm is the following:

Q(s_t, a_t) = Q(s_t, a_t) + α*(r_t + γ*max_a Q(s_t+1, a) - Q(s_t, a_t))
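
To make the difference concrete, here is a minimal tabular sketch (illustrative, not taken from the question) of the two update steps; the only difference is which next-state value is bootstrapped from: the action actually taken next (SARSA, on-policy) versus the greedy action (Q-learning, off-policy):

    import numpy as np

    def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
        # On-policy: bootstrap from the action the behaviour policy actually takes next.
        td_target = r + gamma * Q[s_next, a_next]
        Q[s, a] += alpha * (td_target - Q[s, a])

    def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
        # Off-policy: bootstrap from the greedy action in the next state,
        # regardless of which action is actually taken there.
        td_target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (td_target - Q[s, a])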

Display OpenAI gym in Jupyter notebook only

戏子无情 submitted on 2019-11-29 07:05:06
I want to play with the OpenAI gyms in a notebook, with the gym being rendered inline. Here's a basic example:

    import matplotlib.pyplot as plt
    import gym
    from IPython import display
    %matplotlib inline

    env = gym.make('CartPole-v0')
    env.reset()
    for i in range(25):
        plt.imshow(env.render(mode='rgb_array'))
        display.display(plt.gcf())
        display.clear_output(wait=True)
        env.step(env.action_space.sample())  # take a random action
    env.close()

This works, and I get to see the gym in the notebook. But it also opens an interactive window that shows precisely the same thing. I don't want this window to be open:
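
One commonly suggested workaround (an assumption on my part, not from the question itself) is to run the notebook under a virtual display, so the window that env.render() creates never appears on screen and only the inline matplotlib image is shown. The sketch below assumes pyvirtualdisplay and Xvfb are installed:

    # pip install pyvirtualdisplay   (also requires Xvfb, e.g. apt-get install xvfb)
    from pyvirtualdisplay import Display

    virtual_display = Display(visible=0, size=(1400, 900))
    virtual_display.start()

    # ...then run the rendering loop from the question unchanged; the render window
    # is drawn to the hidden virtual display instead of the desktop.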

How to use Tensorflow Optimizer without recomputing activations in reinforcement learning program that returns control after each iteration?

不羁岁月 submitted on 2019-11-29 01:41:15
EDIT (1/3/16): corresponding GitHub issue. I'm using Tensorflow (Python interface) to implement a q-learning agent with function approximation, trained using stochastic gradient descent. At each iteration of the experiment, a step function in the agent is called that updates the parameters of the approximator based on the new reward and activation, and then chooses a new action to perform. Here is the problem (in reinforcement learning jargon): the agent computes its state-action value predictions to choose an action, then gives control back to another program which simulates a step in the
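
For context, here is a minimal TF1-style sketch of the kind of agent described (illustrative only, not the asker's code; the linear approximator and all names are assumptions). The forward pass used for action selection and the training op are run in separate sess.run calls, so the activations of the previous state are recomputed when the update is applied, which is the redundancy the question is about:

    import numpy as np
    import tensorflow as tf

    class QAgent:
        def __init__(self, n_features, n_actions, lr=0.01, gamma=0.99, epsilon=0.1):
            self.gamma, self.epsilon, self.n_actions = gamma, epsilon, n_actions
            self.state_ph = tf.placeholder(tf.float32, [1, n_features])
            self.action_ph = tf.placeholder(tf.int32, [])
            self.target_ph = tf.placeholder(tf.float32, [])   # r + gamma * max_a' Q(s', a')
            W = tf.Variable(tf.zeros([n_features, n_actions]))
            self.q_values = tf.matmul(self.state_ph, W)        # the forward pass / activations
            q_sa = tf.gather(self.q_values[0], self.action_ph)
            loss = tf.square(self.target_ph - q_sa)
            self.train_op = tf.train.GradientDescentOptimizer(lr).minimize(loss)
            self.sess = tf.Session()
            self.sess.run(tf.global_variables_initializer())

        def step(self, prev_state, prev_action, reward, state):
            # Forward pass on the new state, used for both the TD target and action selection.
            q_next = self.sess.run(self.q_values, {self.state_ph: state})
            target = reward + self.gamma * np.max(q_next)
            # Separate run for the update: prev_state is fed again, so its activations
            # are recomputed even though they were already computed on the previous call.
            self.sess.run(self.train_op, {self.state_ph: prev_state,
                                          self.action_ph: prev_action,
                                          self.target_ph: target})
            if np.random.rand() < self.epsilon:
                return np.random.randint(self.n_actions)
            return int(np.argmax(q_next))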

Tensorflow and Multiprocessing: Passing Sessions

ⅰ亾dé卋堺 submitted on 2019-11-28 18:01:45
I have recently been working on a project that uses a neural network for virtual robot control. I used tensorflow to code it up and it runs smoothly. So far, I have used sequential simulations to evaluate how good the neural network is; however, I want to run several simulations in parallel to reduce the amount of time it takes to get data. To do this I am importing Python's multiprocessing package. Initially I was passing the sess variable (sess = tf.Session()) to a function that would run the simulation. However, once I get to any statement that uses this sess variable, the process quits without
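
One common workaround (an assumption, not taken from the question) is to avoid sending the session across the process boundary at all: a tf.Session cannot be pickled or safely shared after a fork, so each worker builds its own graph and session, and only plain data (e.g. numpy weight arrays) is passed between processes. A rough sketch:

    import multiprocessing as mp

    def run_simulation(weights):
        # Import and build everything inside the worker so no Session crosses processes.
        import tensorflow as tf
        graph = tf.Graph()
        with graph.as_default():
            # ...rebuild the network here and assign `weights` to its variables...
            with tf.Session(graph=graph) as sess:
                # ...run one simulation with this process-local session...
                pass
        return 0.0  # e.g. the episode return

    if __name__ == '__main__':
        trained_weights = None  # plain numpy arrays are safe to pass to workers
        with mp.Pool(processes=4) as pool:
            returns = pool.map(run_simulation, [trained_weights] * 4)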

How can I apply reinforcement learning to continuous action spaces?

为君一笑 submitted on 2019-11-28 16:00:31
I'm trying to get an agent to learn the mouse movements necessary to best perform some task in a reinforcement learning setting (i.e. the reward signal is the only feedback for learning). I'm hoping to use the Q-learning technique, but while I've found a way to extend this method to continuous state spaces, I can't seem to figure out how to accommodate a problem with a continuous action space. I could just force all mouse movement to be of a certain magnitude and in only a certain number of
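
One simple way to act on the asker's own idea (a fixed magnitude and a fixed number of directions) is to discretize the 2-D mouse movement into a finite action set that plain Q-learning can handle. An illustrative sketch, with arbitrary parameter values:

    import math

    def build_action_set(n_directions=8, magnitudes=(5, 20, 50)):
        # Every combination of an evenly spaced direction and a step size (in pixels),
        # plus a "don't move" action: here 1 + 8 * 3 = 25 discrete actions.
        actions = [(0.0, 0.0)]
        for k in range(n_directions):
            angle = 2 * math.pi * k / n_directions
            for m in magnitudes:
                actions.append((m * math.cos(angle), m * math.sin(angle)))
        return actions

    actions = build_action_set()
    print(len(actions))  # 25 candidate (dx, dy) movements instead of a continuous space

For genuinely continuous control, actor-critic or policy-gradient methods with a parameterized (e.g. Gaussian) policy are the usual alternative to discretization.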

Pytorch: How to create an update rule that doesn't come from derivatives?

元气小坏坏 submitted on 2019-11-28 04:08:16
I want to implement the following algorithm, taken from this book, section 13.6: I don't understand how to implement the update rule in pytorch (the rule for w is quite similar to that for theta). As far as I know, torch requires a loss for loss.backward(). This form does not seem to apply to the quoted algorithm. I'm still certain there is a correct way of implementing such update rules in pytorch. I would greatly appreciate a code snippet of how the w weights should be updated, given that V
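
One way to implement such a rule in pytorch (a sketch under the assumption that the update has the form w ← w + α·δ·∇_w v̂(S,w), as in the one-step actor-critic of that section; all names are illustrative) is to obtain the gradient with torch.autograd.grad and apply the scaling manually, instead of building a loss for loss.backward():

    import torch

    def manual_update(output, params, delta, alpha):
        # Apply p <- p + alpha * delta * grad_p(output), treating delta as a constant
        # that is not differentiated through, so no conventional "loss" is needed.
        grads = torch.autograd.grad(output, params)
        with torch.no_grad():
            for p, g in zip(params, grads):
                p += alpha * delta * g

    # Illustrative usage with a linear value function v(s) = w . s
    w = torch.zeros(4, requires_grad=True)
    s = torch.randn(4)
    v = torch.dot(w, s)
    delta = 0.7          # the TD error, computed elsewhere; treated as a plain number
    manual_update(v, [w], delta, alpha=0.1)

The θ update in the quoted algorithm has the same shape, with log π(A|S,θ) in place of v̂(S,w) and the policy parameters in place of w.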
