reinforcement-learning


How to accumulate and apply gradients for Async n-step DQNetwork update in TensorFlow?

梦想与她 posted on 2020-02-27 22:22:21
Question: I am trying to implement Asynchronous Methods for Deep Reinforcement Learning, and one of the steps requires accumulating the gradient over several steps and then applying it. What is the best way to achieve this in TensorFlow? I got as far as accumulating the gradient, but I don't think this is the fastest way to achieve it (lots of transfers from TensorFlow to Python and back). Any suggestions are welcome. This is my code for a toy NN. It does not model or compute anything; it just exercises the
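
The usual pattern is to keep a running sum of the per-step gradients and call apply_gradients only once per n-step rollout. Below is a minimal sketch assuming TF 2.x / tf.keras (the question targets graph-mode TF1, where the same idea is usually expressed with assign_add on accumulator variables); the toy model, loss, and batch layout are illustrative and not taken from the question's code.

```python
import tensorflow as tf

# Illustrative toy network and optimizer (not the question's code).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.Adam(1e-3)

def accumulate_and_apply(batches):
    """Sum gradients over several n-step batches, then apply them once."""
    accumulated = None
    for x, y in batches:
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.square(model(x) - y))
        grads = tape.gradient(loss, model.trainable_variables)
        if accumulated is None:
            accumulated = [tf.identity(g) for g in grads]
        else:
            accumulated = [a + g for a, g in zip(accumulated, grads)]
    # Single parameter update for the whole rollout.
    optimizer.apply_gradients(zip(accumulated, model.trainable_variables))

# Example call with random data standing in for n-step transitions:
accumulate_and_apply([(tf.random.normal([8, 4]), tf.random.normal([8, 1]))
                      for _ in range(5)])
```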

Is it possible to modify OpenAI environments?

你离开我真会死。 posted on 2020-02-22 08:42:28
Question: There are some things that I would like to modify in the OpenAI environments. If we use the CartPole example, we can edit things that are in the class's init function, but with environments that use Box2D it doesn't seem to be as straightforward. For example, consider the BipedalWalker environment. In this case, how would I edit things like the SPEED_HIP or SPEED_KNEE variables? Thanks Answer 1: Yes, you can modify or create new environments in gym. The simplest (but not recommended) way is to
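
A rough sketch of the quick-and-dirty (not recommended) route is to patch the module-level constants before constructing the environment; the cleaner alternative is to copy the environment's source into your own class and register it under a new id. The values and env-id suffix below are illustrative and depend on the installed gym version.

```python
import gym
from gym.envs.box2d import bipedal_walker  # module that defines SPEED_HIP / SPEED_KNEE

# Patch the module-level constants before the environment is instantiated.
# Values here are illustrative; check the module source for the defaults.
bipedal_walker.SPEED_HIP = 6.0
bipedal_walker.SPEED_KNEE = 8.0

# The version suffix depends on your gym release (older releases use v2).
env = gym.make("BipedalWalker-v3")
obs = env.reset()
```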

Pygame and OpenAI implementation

佐手、 posted on 2020-01-24 14:12:47
Question: My classmate and I have decided to try to implement an AI agent in our own game. My friend has done most of the code, based on previous projects, and I was wondering how PyGame and OpenAI would work together. I have tried to do some research but can't really find any useful information on this specific topic. Some say it is hard to implement, while others say it works. Either way, I'd like your opinion on this project and how you'd approach it if it were you. The game is very
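
One way the two typically fit together is to wrap the pygame game in a gym.Env so any gym-compatible agent can drive it. The sketch below assumes a hypothetical game object exposing reset/step/get_state/is_over methods and uses the classic 4-tuple gym step API; all names and spaces are placeholders for your own game.

```python
import gym
import numpy as np
from gym import spaces

class PygameGameEnv(gym.Env):
    """Hypothetical gym wrapper around a pygame game."""

    def __init__(self, game):
        # `game` is assumed to expose reset(), step(action) -> reward,
        # get_state() -> np.ndarray, and is_over() -> bool.
        self.game = game
        self.action_space = spaces.Discrete(4)  # e.g. up/down/left/right
        self.observation_space = spaces.Box(0.0, 1.0, shape=(84, 84, 3),
                                            dtype=np.float32)

    def reset(self):
        self.game.reset()
        return self.game.get_state()

    def step(self, action):
        reward = self.game.step(action)   # advance the pygame loop one tick
        obs = self.game.get_state()       # e.g. a downscaled screen capture
        done = self.game.is_over()
        return obs, reward, done, {}
```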

Neural Network and Temporal Difference Learning

限于喜欢 posted on 2020-01-24 05:25:07
Question: I have read a few papers and lectures on temporal difference learning (some as they pertain to neural nets, such as Sutton's tutorial on TD-Gammon), but I am having a difficult time understanding the equations, which leads me to my questions. - Where does the prediction value V_t come from? And subsequently, how do we get V_(t+1)? - What exactly is being back-propagated when TD is used with a neural net? That is, where does the error that gets back-propagated come from when using TD? Answer 1:
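
For the first question, V_t is simply the model's own output for state s_t (and V_(t+1) its output for the next state); the quantity back-propagated is the TD error δ_t = r_(t+1) + γ·V_(t+1) − V_t. A toy sketch with a linear value function (feature size, γ, and α are illustrative):

```python
import numpy as np

gamma, alpha = 0.99, 0.01
w = np.zeros(4)  # weights of a linear value function over 4 state features

def V(s, w):
    """The 'prediction' V_t is just the model's output for state s_t."""
    return float(np.dot(w, s))

def td_update(s, r, s_next, w):
    """TD(0): the back-propagated error is r + gamma*V(s') - V(s)."""
    td_error = r + gamma * V(s_next, w) - V(s, w)
    # For a linear model the gradient of V(s, w) w.r.t. w is s itself;
    # with a neural net you back-propagate td_error through the network.
    return w + alpha * td_error * s
```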

Epsilon and learning rate decay in epsilon greedy q learning

浪子不回头ぞ posted on 2020-01-23 00:24:03
Question: I understand that epsilon marks the trade-off between exploration and exploitation. At the beginning, you want epsilon to be high so that you take big leaps and learn things. As you learn about future rewards, epsilon should decay so that you can exploit the higher Q-values you've found. However, does our learning rate also decay with time in a stochastic environment? The posts on SO that I've seen only discuss epsilon decay. How do we set our epsilon and alpha such that values converge? Answer 1:
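
Short answer, as a sketch: yes, alpha is usually decayed as well. For tabular Q-learning the classic convergence conditions are that the learning rates sum to infinity while their squares sum to a finite value (e.g. alpha ∝ 1/t), and epsilon should decay toward zero (or a small floor) while every state-action pair keeps being visited. A hypothetical exponential-decay schedule with floors:

```python
def decayed(start, floor, decay, episode):
    """Exponentially decay a value per episode, never going below `floor`."""
    return max(floor, start * (decay ** episode))

for episode in range(10000):
    epsilon = decayed(1.0, 0.05, 0.999, episode)   # exploration rate
    alpha = decayed(0.5, 0.01, 0.9995, episode)    # learning rate
    # ... run one episode of epsilon-greedy Q-learning with these values ...
```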

AttributeError: module '_Box2D' has no attribute 'RAND_LIMIT_swigconstant'

此生再无相见时 posted on 2020-01-13 17:01:07
Question: I am trying to run a lunar_lander with reinforcement learning, but when I run it, an error occurs. Also, my computer runs macOS. Here is the code of the lunar lander: import numpy as np import gym import csv from keras.models import Sequential from keras.layers import Dense, Activation, Flatten from keras.optimizers import Adam from rl.agents.dqn import DQNAgent from rl.policy import BoltzmannQPolicy, EpsGreedyQPolicy from rl.memory import SequentialMemory import io import sys import csv #
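
For reference, a minimal keras-rl setup along the lines of the imports in the question might look like the sketch below; layer sizes and hyperparameters are illustrative. The AttributeError in the title generally points at how the Box2D/SWIG bindings were installed rather than at the agent code itself.

```python
import gym
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory

env = gym.make("LunarLander-v2")
nb_actions = env.action_space.n

# Simple MLP Q-network; sizes are illustrative.
model = Sequential([
    Flatten(input_shape=(1,) + env.observation_space.shape),
    Dense(64, activation="relu"),
    Dense(64, activation="relu"),
    Dense(nb_actions, activation="linear"),
])

dqn = DQNAgent(model=model, nb_actions=nb_actions,
               memory=SequentialMemory(limit=50000, window_length=1),
               policy=EpsGreedyQPolicy(), nb_steps_warmup=1000)
dqn.compile(Adam(lr=1e-3), metrics=["mae"])  # newer Keras spells this learning_rate=
dqn.fit(env, nb_steps=100000, visualize=False, verbose=1)
```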
