reinforcement-learning

How do neural networks use genetic algorithms and backpropagation to play games?

℡╲_俬逩灬. Submitted on 2019-12-19 17:15:09
Question: I came across this interesting video on YouTube about genetic algorithms. As you can see in the video, the bots learn to fight. I have been studying neural networks for a while and wanted to start learning genetic algorithms, and this somehow combines both. How do you combine genetic algorithms and neural networks to do this? And how does one define the error in this case, which you would use to back-propagate, update your weights, and train the net? And how do you think the program in…
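In the neuroevolution setups such videos usually show, there is no back-propagation at all: a genetic algorithm searches the network's weight space directly, and the game score serves as the fitness, so no error signal is needed. A minimal sketch of that idea, assuming a fixed-topology network whose weights are flattened into a vector and a hypothetical fitness_fn that plays a match and returns the score:

```python
import numpy as np

def init_population(pop_size, n_weights):
    # Each individual is a flat weight vector for a fixed-topology network.
    return [np.random.randn(n_weights) * 0.5 for _ in range(pop_size)]

def evolve(population, fitness_fn, elite_frac=0.2, mutation_std=0.1):
    # Rank individuals by fitness (e.g. the bot's game score); no gradients involved.
    ranked = sorted(population, key=fitness_fn, reverse=True)
    n_elite = max(1, int(len(population) * elite_frac))
    elites = ranked[:n_elite]
    # Refill the population by mutating copies of the elites.
    children = []
    while len(elites) + len(children) < len(population):
        parent = elites[np.random.randint(n_elite)]
        children.append(parent + np.random.randn(parent.size) * mutation_std)
    return elites + children
```

Backpropagation only re-enters the picture in hybrid schemes, for example evolving topologies or initial weights and then fine-tuning the result with gradient descent on a supervised or temporal-difference error.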

Q-Learning values get too high

依然范特西╮ Submitted on 2019-12-19 04:04:48
Question: I've recently made an attempt to implement a basic Q-learning algorithm in Golang. Note that I'm new to reinforcement learning and AI in general, so the error may very well be mine. Here's how I implemented the solution for an m,n,k-game environment: at each given time t, the agent holds the last state-action pair (s, a) and the acquired reward for it; the agent selects a move a' based on an epsilon-greedy policy and calculates the reward r, then proceeds to update the value of Q(s, a) for time t…
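For reference, the tabular update the excerpt describes is Q(s, a) ← Q(s, a) + α·(r + γ·max_a' Q(s', a') − Q(s, a)); runaway values usually come from bootstrapping past terminal states or from a mis-scaled reward rather than from the rule itself. A minimal Python sketch (the question itself uses Go), assuming a dict-backed Q table:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions_next, alpha=0.1, gamma=0.9, terminal=False):
    # Standard Q-learning: bootstrap from the best next action,
    # but drop the bootstrap term entirely when s_next is terminal.
    best_next = 0.0 if terminal else max(Q[(s_next, a2)] for a2 in actions_next)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Q = defaultdict(float)   # unseen (state, action) pairs default to 0.0
```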

How can I register a custom environment in OpenAI's gym?

。_饼干妹妹 Submitted on 2019-12-18 17:53:06
Question: I have created a custom environment as per the OpenAI Gym framework, containing step, reset, action, and reward functions. I aim to run the OpenAI baselines on this custom environment, but prior to that the environment has to be registered with OpenAI Gym. I would like to know how the custom environment can be registered with OpenAI Gym. Also, should I be modifying the OpenAI baselines code to incorporate this? Answer 1: You do not need to modify the baselines repo. Here is a minimal example. Say you…
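The answer is cut off, but the usual pattern with classic gym is to call gym.envs.registration.register once before gym.make; a minimal sketch, where my_package.my_module:MyCustomEnv stands in for the actual import path of the environment class:

```python
from gym.envs.registration import register
import gym

register(
    id='MyCustomEnv-v0',                              # must follow the Name-vN pattern
    entry_point='my_package.my_module:MyCustomEnv',   # "module.path:ClassName" of your env
    max_episode_steps=200,                            # optional; adds a TimeLimit wrapper
)

env = gym.make('MyCustomEnv-v0')   # the registered id can now be passed to baselines too
```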

Running Keras model for prediction in multiple threads

为君一笑 Submitted on 2019-12-18 04:12:38
Question: Similar to this question, I was running an asynchronous reinforcement learning algorithm and needed to run model prediction in multiple threads to gather training data more quickly. My code is based on DDPG-keras on GitHub, whose neural network is built on top of Keras and TensorFlow. Pieces of my code are shown below. Asynchronous thread creation and join: for roundNo in xrange(self.param['max_round']): AgentPool = [AgentThread(self.getEnv(), self.actor, self.critic, eps, self.param['n_step'], self…
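The excerpt stops mid-snippet, but with standalone Keras on a TensorFlow 1.x backend the usual stumbling block is that worker threads do not see the graph the model was built in. A common workaround (a sketch, not the DDPG-keras author's code; PredictThread and its arguments are simplified stand-ins) is to capture the default graph in the main thread and re-enter it in each worker:

```python
import threading
import tensorflow as tf

class PredictThread(threading.Thread):
    """Simplified stand-in for AgentThread: runs one prediction in a worker."""
    def __init__(self, model, graph, state):
        super(PredictThread, self).__init__()
        self.model = model
        self.graph = graph      # graph captured in the main thread
        self.state = state
        self.action = None

    def run(self):
        # Worker threads have no default graph of their own, so re-enter the
        # graph the Keras model was built in before calling predict().
        with self.graph.as_default():
            self.action = self.model.predict(self.state)

# In the main thread, after building/compiling the Keras actor/critic:
# graph = tf.get_default_graph()
# workers = [PredictThread(actor, graph, s) for s in states]
# for w in workers: w.start()
# for w in workers: w.join()
```

Calling the model once in the main thread (a dummy predict, or the often-suggested private model._make_predict_function()) before starting the workers also helps, so the predict function is built once rather than lazily inside a thread.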

keras model.evaluate() does not show loss

眉间皱痕 Submitted on 2019-12-13 14:25:11
Question: I've created a neural network of the following form in Keras: from keras.layers import Dense, Activation, Input from keras import Model input_dim_v = 3 hidden_dims=[100, 100, 100] inputs = Input(shape=(input_dim_v,)) net = inputs for h_dim in hidden_dims: net = Dense(h_dim)(net) net = Activation("elu")(net) outputs = Dense(self.output_dim_v)(net) model_v = Model(inputs=inputs, outputs=outputs) model_v.compile(optimizer='adam', loss='mean_squared_error', metrics=['mse']) Later, I train it on…
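The snippet is cut off before the evaluate() call, but note that model.evaluate() returns the loss and metric values as plain numbers, and model.metrics_names tells you which is which; with loss='mean_squared_error' and metrics=['mse'] the two numbers are also identical, which can make it look as if the loss is missing. A small sketch continuing from the model_v above, with hypothetical held-out arrays X_test and y_test:

```python
import numpy as np

# Hypothetical held-out data matching the 3-dimensional input above
# (assuming a 1-dimensional output here).
X_test = np.random.rand(32, 3)
y_test = np.random.rand(32, 1)

scores = model_v.evaluate(X_test, y_test, verbose=0)   # returns plain numbers
# Pair each returned value with its name; the first entry is always the loss.
print(dict(zip(model_v.metrics_names, scores)))
# e.g. {'loss': 0.08, 'mean_squared_error': 0.08}: identical here because
# the loss function and the 'mse' metric are the same quantity.
```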

Cartpole-v0 loss increasing using DQN

拈花ヽ惹草 Submitted on 2019-12-13 03:47:44
Question: Hi, I'm trying to train a DQN to solve Gym's CartPole problem. For some reason the loss looks like this (orange line). Could you take a look at my code and help with this? I've played around with the hyperparameters a decent bit, so I don't think they're the issue here. class DQN(nn.Module): def __init__(self, input_dim, output_dim): super(DQN, self).__init__() self.linear1 = nn.Linear(input_dim, 16) self.linear2 = nn.Linear(16, 32) self.linear3 = nn.Linear(32, 32) self.linear4 = nn.Linear(32,…
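The excerpt cuts off before the training step, but a rising loss on CartPole very often traces back to how the bootstrap target is built rather than to the hyperparameters: the target should come from a separate (or at least detached) target network and be zeroed out on terminal transitions. A minimal sketch of that target computation, assuming standard replay-buffer tensors (states, actions, rewards, next_states, dones):

```python
import torch
import torch.nn.functional as F

def dqn_loss(policy_net, target_net, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch
    # Q-values the online network assigns to the actions actually taken.
    q_pred = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrap from a slowly-updated target network; mask terminal states
        # so the target for "done" transitions is just the reward.
        q_next = target_net(next_states).max(dim=1)[0]
        q_target = rewards + gamma * q_next * (1.0 - dones)
    # Huber loss is the usual choice for DQN; plain MSE also works.
    return F.smooth_l1_loss(q_pred, q_target)
```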

State dependent action set in reinforcement learning

廉价感情. Submitted on 2019-12-12 13:15:15
Question: How do people deal with problems where the legal actions differ from state to state? In my case I have about 10 actions in total and the legal actions do not overlap, meaning that in certain states the same 3 actions are always legal, and those actions are never legal in other types of states. I'm also interested in seeing whether the solutions would be different if the legal actions did overlap. For Q-learning (where my network gives me the values for state/action pairs), I was thinking…
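One approach the question seems to be heading toward (not the only one): keep a single network with a head for all ~10 actions and mask the illegal ones both when selecting an action and when taking the max over next-state actions in the target. A minimal sketch, assuming the environment can report the currently legal action indices:

```python
import numpy as np

def masked_greedy_action(q_values, legal_actions, n_actions):
    # Assign -inf to illegal actions before the argmax, so the agent can never
    # select (or bootstrap from) an action that is unavailable in this state.
    mask = np.full(n_actions, -np.inf)
    mask[list(legal_actions)] = 0.0
    return int(np.argmax(q_values + mask))
```

Because the legal sets never overlap across state types, an alternative is to train one small network per state type; masking simply keeps everything in a single model.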

DDPG (Deep Deterministic Policy Gradients), how is the actor updated?

你说的曾经没有我的故事 Submitted on 2019-12-11 13:37:52
Question: I'm currently trying to implement DDPG in Keras. I know how to update the critic network (the normal DQN algorithm), but I'm currently stuck on updating the actor network, which uses the deterministic policy-gradient equation dJ/dθ ≈ dQ(s, a)/da · dμ(s|θ)/dθ, evaluated at a = μ(s). So, in order to reduce the loss of the actor network with respect to its weights θ, it uses the chain rule to get dQ/da (from the critic network) times da/dθ (from the actor network). This looks fine, but I'm having trouble understanding how to derive the gradients from those two networks. Could someone perhaps…
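In the TF1-style Keras implementations this is typically done with two tf.gradients calls glued together via grad_ys; a sketch under that assumption (actor_model, critic_model, critic_state_input, critic_action_input, and action_dim are placeholder names, not any particular repo's API):

```python
import tensorflow as tf

# Critic side: symbolic dQ/da, the gradient of Q(s, a) w.r.t. the action input.
dq_da = tf.gradients(critic_model.output, critic_action_input)[0]

# Actor side: feed dQ/da in as grad_ys so tf.gradients applies the chain rule
# dJ/dtheta = dQ/da * da/dtheta; the minus sign turns ascent on Q into descent.
action_grad_ph = tf.placeholder(tf.float32, [None, action_dim])
actor_grads = tf.gradients(actor_model.output,
                           actor_model.trainable_weights,
                           grad_ys=-action_grad_ph)
train_actor = tf.train.AdamOptimizer(1e-4).apply_gradients(
    zip(actor_grads, actor_model.trainable_weights))

# Per training step (TF1 session):
# a = actor_model.predict(states)
# grads = sess.run(dq_da, feed_dict={critic_state_input: states,
#                                    critic_action_input: a})
# sess.run(train_actor, feed_dict={actor_model.input: states,
#                                  action_grad_ph: grads})
```

In TF2/eager code the same chain rule is usually expressed with a single tf.GradientTape around critic(states, actor(states)) and a minus sign on the resulting loss.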

Implementations of Hierarchical Reinforcement Learning

谁说我不能喝 Submitted on 2019-12-11 08:51:57
Question: Can anyone recommend a reinforcement learning library or framework that can handle large state spaces by abstracting them? I'm attempting to implement the intelligence for a small agent in a game world. The agent is represented by a small two-wheeled robot that can move forwards and backwards and turn left and right. It has a couple of sensors for detecting a boundary on the ground, a couple of ultrasonic sensors for detecting objects far away, and a couple of bump sensors for detecting contact with…
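Whatever library ends up being used, a large share of the abstraction can be done by hand before learning starts: collapsing the raw sensor readings into a small discrete state keeps tabular or hierarchical methods tractable. A toy sketch with made-up sensor names and bucket sizes:

```python
def abstract_state(boundary_left, boundary_right, sonar_cm, bumped):
    # Collapse raw sensor readings into a small discrete tuple so that a
    # tabular learner (or options/HRL built on top of it) stays tractable.
    sonar_bucket = min(int(sonar_cm // 20), 4)   # 5 coarse distance bins
    return (bool(boundary_left), bool(boundary_right), sonar_bucket, bool(bumped))
```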

openai gym env.P, AttributeError 'TimeLimit' object has no attribute 'P'

我只是一个虾纸丫 Submitted on 2019-12-11 03:25:57
Question: I'm currently reading Hands-On Reinforcement Learning with Python by Sudharsan Ravichandiran, and in one of the first examples I run into this AttributeError: AttributeError 'TimeLimit' object has no attribute 'P', raised by the following line: for next_sr in env.P[state][action]: I can't find any documentation regarding env.P, but I found a similar example written in Python 2 here: https://gym.openai.com/evaluations/eval_48sirBRSRAapMjotYzjb6w/ I suppose env.P is part of an outdated library…
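This usually isn't an outdated API: gym.make() wraps the raw environment in a TimeLimit wrapper, and the transition table P lives on the inner environment. A minimal sketch, assuming the book's FrozenLake-style example:

```python
import gym

env = gym.make('FrozenLake-v0')

# env is a TimeLimit wrapper; the transition dictionary sits on the raw env,
# so unwrap it first instead of accessing env.P directly.
P = env.unwrapped.P   # P[state][action] -> list of (prob, next_state, reward, done)

state, action = 0, 0
for prob, next_state, reward, done in P[state][action]:
    print(prob, next_state, reward, done)
```

env.env.P also appears in older answers; unwrapped reaches the innermost environment regardless of how many wrappers are stacked.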