reinforcement-learning

How do neural networks use genetic algorithms and backpropagation to play games?

℡╲_俬逩灬. Submitted on 2019-12-19 17:15:09
Question: I came across this interesting video on YouTube about genetic algorithms. As you can see in the video, the bots learn to fight. I have been studying neural networks for a while and wanted to start learning genetic algorithms, and this somehow combines both. How do you combine genetic algorithms and neural networks to do this? And how does one define the error in this case, which you would use to back-propagate, update your weights, and train the net? And how do you think the program in…
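In the neuroevolution setups such videos usually show, there is no back-propagation at all: a genetic algorithm searches the network's weight space directly, and the game score serves as the fitness, so no error signal is needed. A minimal sketch of that idea, assuming a fixed-topology network whose weights are flattened into a vector and a hypothetical fitness_fn that plays a match and returns the score:

```python
import numpy as np

def init_population(pop_size, n_weights):
    # Each individual is a flat weight vector for a fixed-topology network.
    return [np.random.randn(n_weights) * 0.5 for _ in range(pop_size)]

def evolve(population, fitness_fn, elite_frac=0.2, mutation_std=0.1):
    # Rank individuals by fitness (e.g. the bot's game score); no gradients involved.
    ranked = sorted(population, key=fitness_fn, reverse=True)
    n_elite = max(1, int(len(population) * elite_frac))
    elites = ranked[:n_elite]
    # Refill the population by mutating copies of the elites.
    children = []
    while len(elites) + len(children) < len(population):
        parent = elites[np.random.randint(n_elite)]
        children.append(parent + np.random.randn(parent.size) * mutation_std)
    return elites + children
```

Backpropagation only re-enters the picture in hybrid schemes, for example evolving topologies or initial weights and then fine-tuning the result with gradient descent on a supervised or temporal-difference error.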

Q-Learning values get too high

依然范特西╮ Submitted on 2019-12-19 04:04:48
Question: I've recently made an attempt to implement a basic Q-learning algorithm in Golang. Note that I'm new to reinforcement learning and AI in general, so the error may very well be mine. Here's how I implemented the solution for an m,n,k-game environment: at each given time t, the agent holds the last state-action pair (s, a) and the acquired reward for it; the agent selects a move a' based on an epsilon-greedy policy and calculates the reward r, then proceeds to update the value of Q(s, a) for time t…
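For reference, the tabular update the excerpt describes is Q(s, a) ← Q(s, a) + α·(r + γ·max_a' Q(s', a') − Q(s, a)); runaway values usually come from bootstrapping past terminal states or from a mis-scaled reward rather than from the rule itself. A minimal Python sketch (the question itself uses Go), assuming a dict-backed Q table:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions_next, alpha=0.1, gamma=0.9, terminal=False):
    # Standard Q-learning: bootstrap from the best next action,
    # but drop the bootstrap term entirely when s_next is terminal.
    best_next = 0.0 if terminal else max(Q[(s_next, a2)] for a2 in actions_next)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Q = defaultdict(float)   # unseen (state, action) pairs default to 0.0
```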

How can I register a custom environment in OpenAI's gym?

。_饼干妹妹 Submitted on 2019-12-18 17:53:06
Question: I have created a custom environment as per the OpenAI Gym framework, containing step, reset, action, and reward functions. I aim to run the OpenAI baselines on this custom environment, but prior to that the environment has to be registered with OpenAI Gym. I would like to know how the custom environment can be registered with OpenAI Gym. Also, should I be modifying the OpenAI baselines code to incorporate this? Answer 1: You do not need to modify the baselines repo. Here is a minimal example. Say you…
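The answer is cut off, but the usual pattern with classic gym is to call gym.envs.registration.register once before gym.make; a minimal sketch, where my_package.my_module:MyCustomEnv stands in for the actual import path of the environment class:

```python
from gym.envs.registration import register
import gym

register(
    id='MyCustomEnv-v0',                              # must follow the Name-vN pattern
    entry_point='my_package.my_module:MyCustomEnv',   # "module.path:ClassName" of your env
    max_episode_steps=200,                            # optional; adds a TimeLimit wrapper
)

env = gym.make('MyCustomEnv-v0')   # the registered id can now be passed to baselines too
```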

Running Keras model for prediction in multiple threads

为君一笑 Submitted on 2019-12-18 04:12:38
Question: Similar to this question, I was running an asynchronous reinforcement learning algorithm and needed to run model prediction in multiple threads to gather training data more quickly. My code is based on DDPG-keras on GitHub, whose neural network is built on top of Keras and TensorFlow. Pieces of my code are shown below. Asynchronous thread creation and join: for roundNo in xrange(self.param['max_round']): AgentPool = [AgentThread(self.getEnv(), self.actor, self.critic, eps, self.param['n_step'], self…
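The excerpt stops mid-snippet, but with standalone Keras on a TensorFlow 1.x backend the usual stumbling block is that worker threads do not see the graph the model was built in. A common workaround (a sketch, not the DDPG-keras author's code; PredictThread and its arguments are simplified stand-ins) is to capture the default graph in the main thread and re-enter it in each worker:

```python
import threading
import tensorflow as tf

class PredictThread(threading.Thread):
    """Simplified stand-in for AgentThread: runs one prediction in a worker."""
    def __init__(self, model, graph, state):
        super(PredictThread, self).__init__()
        self.model = model
        self.graph = graph      # graph captured in the main thread
        self.state = state
        self.action = None

    def run(self):
        # Worker threads have no default graph of their own, so re-enter the
        # graph the Keras model was built in before calling predict().
        with self.graph.as_default():
            self.action = self.model.predict(self.state)

# In the main thread, after building/compiling the Keras actor/critic:
# graph = tf.get_default_graph()
# workers = [PredictThread(actor, graph, s) for s in states]
# for w in workers: w.start()
# for w in workers: w.join()
```

Calling the model once in the main thread (a dummy predict, or the often-suggested private model._make_predict_function()) before starting the workers also helps, so the predict function is built once rather than lazily inside a thread.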

keras model.evaluate() does not show loss

眉间皱痕 Submitted on 2019-12-13 14:25:11
Question: I've created a neural network of the following form in Keras: from keras.layers import Dense, Activation, Input from keras import Model input_dim_v = 3 hidden_dims=[100, 100, 100] inputs = Input(shape=(input_dim_v,)) net = inputs for h_dim in hidden_dims: net = Dense(h_dim)(net) net = Activation("elu")(net) outputs = Dense(self.output_dim_v)(net) model_v = Model(inputs=inputs, outputs=outputs) model_v.compile(optimizer='adam', loss='mean_squared_error', metrics=['mse']) Later, I train it on…
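The snippet is cut off before the evaluate() call, but note that model.evaluate() returns the loss and metric values as plain numbers, and model.metrics_names tells you which is which; with loss='mean_squared_error' and metrics=['mse'] the two numbers are also identical, which can make it look as if the loss is missing. A small sketch continuing from the model_v above, with hypothetical held-out arrays X_test and y_test:

```python
import numpy as np

# Hypothetical held-out data matching the 3-dimensional input above
# (assuming a 1-dimensional output here).
X_test = np.random.rand(32, 3)
y_test = np.random.rand(32, 1)

scores = model_v.evaluate(X_test, y_test, verbose=0)   # returns plain numbers
# Pair each returned value with its name; the first entry is always the loss.
print(dict(zip(model_v.metrics_names, scores)))
# e.g. {'loss': 0.08, 'mean_squared_error': 0.08}: identical here because
# the loss function and the 'mse' metric are the same quantity.
```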

Cartpole-v0 loss increasing using DQN

拈花ヽ惹草 Submitted on 2019-12-13 03:47:44
Question: Hi, I'm trying to train a DQN to solve Gym's CartPole problem. For some reason the loss looks like this (orange line). Could you take a look at my code and help with this? I've played around with the hyperparameters a decent bit, so I don't think they're the issue here. class DQN(nn.Module): def __init__(self, input_dim, output_dim): super(DQN, self).__init__() self.linear1 = nn.Linear(input_dim, 16) self.linear2 = nn.Linear(16, 32) self.linear3 = nn.Linear(32, 32) self.linear4 = nn.Linear(32,…
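The excerpt cuts off before the training step, but a rising loss on CartPole very often traces back to how the bootstrap target is built rather than to the hyperparameters: the target should come from a separate (or at least detached) target network and be zeroed out on terminal transitions. A minimal sketch of that target computation, assuming standard replay-buffer tensors (states, actions, rewards, next_states, dones):

```python
import torch
import torch.nn.functional as F

def dqn_loss(policy_net, target_net, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch
    # Q-values the online network assigns to the actions actually taken.
    q_pred = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrap from a slowly-updated target network; mask terminal states
        # so the target for "done" transitions is just the reward.
        q_next = target_net(next_states).max(dim=1)[0]
        q_target = rewards + gamma * q_next * (1.0 - dones)
    # Huber loss is the usual choice for DQN; plain MSE also works.
    return F.smooth_l1_loss(q_pred, q_target)
```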

State dependent action set in reinforcement learning

廉价感情. Submitted on 2019-12-12 13:15:15
Question: How do people deal with problems where the legal actions differ from state to state? In my case I have about 10 actions in total and the legal actions do not overlap, meaning that in certain states the same 3 actions are always legal, and those actions are never legal in other types of states. I'm also interested in seeing whether the solutions would be different if the legal actions did overlap. For Q-learning (where my network gives me the values for state/action pairs), I was thinking…
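One approach the question seems to be heading toward (not the only one): keep a single network with a head for all ~10 actions and mask the illegal ones both when selecting an action and when taking the max over next-state actions in the target. A minimal sketch, assuming the environment can report the currently legal action indices:

```python
import numpy as np

def masked_greedy_action(q_values, legal_actions, n_actions):
    # Assign -inf to illegal actions before the argmax, so the agent can never
    # select (or bootstrap from) an action that is unavailable in this state.
    mask = np.full(n_actions, -np.inf)
    mask[list(legal_actions)] = 0.0
    return int(np.argmax(q_values + mask))
```

Because the legal sets never overlap across state types, an alternative is to train one small network per state type; masking simply keeps everything in a single model.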

DDPG (Deep Deterministic Policy Gradients), how is the actor updated?

你说的曾经没有我的故事 Submitted on 2019-12-11 13:37:52
Question: I'm currently trying to implement DDPG in Keras. I know how to update the critic network (the normal DQN algorithm), but I'm currently stuck on updating the actor network, which uses the deterministic policy-gradient equation dJ/dθ ≈ dQ(s, a)/da · dμ(s|θ)/dθ, evaluated at a = μ(s). So, in order to reduce the loss of the actor network with respect to its weights θ, it uses the chain rule to get dQ/da (from the critic network) times da/dθ (from the actor network). This looks fine, but I'm having trouble understanding how to derive the gradients from those two networks. Could someone perhaps…
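In the TF1-style Keras implementations this is typically done with two tf.gradients calls glued together via grad_ys; a sketch under that assumption (actor_model, critic_model, critic_state_input, critic_action_input, and action_dim are placeholder names, not any particular repo's API):

```python
import tensorflow as tf

# Critic side: symbolic dQ/da, the gradient of Q(s, a) w.r.t. the action input.
dq_da = tf.gradients(critic_model.output, critic_action_input)[0]

# Actor side: feed dQ/da in as grad_ys so tf.gradients applies the chain rule
# dJ/dtheta = dQ/da * da/dtheta; the minus sign turns ascent on Q into descent.
action_grad_ph = tf.placeholder(tf.float32, [None, action_dim])
actor_grads = tf.gradients(actor_model.output,
                           actor_model.trainable_weights,
                           grad_ys=-action_grad_ph)
train_actor = tf.train.AdamOptimizer(1e-4).apply_gradients(
    zip(actor_grads, actor_model.trainable_weights))

# Per training step (TF1 session):
# a = actor_model.predict(states)
# grads = sess.run(dq_da, feed_dict={critic_state_input: states,
#                                    critic_action_input: a})
# sess.run(train_actor, feed_dict={actor_model.input: states,
#                                  action_grad_ph: grads})
```

In TF2/eager code the same chain rule is usually expressed with a single tf.GradientTape around critic(states, actor(states)) and a minus sign on the resulting loss.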

Implementations of Hierarchical Reinforcement Learning

谁说我不能喝 Submitted on 2019-12-11 08:51:57
Question: Can anyone recommend a reinforcement learning library or framework that can handle large state spaces by abstracting them? I'm attempting to implement the intelligence for a small agent in a game world. The agent is represented by a small two-wheeled robot that can move forwards and backwards and turn left and right. It has a couple of sensors for detecting a boundary on the ground, a couple of ultrasonic sensors for detecting objects far away, and a couple of bump sensors for detecting contact with…
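Whatever library ends up being used, a large share of the abstraction can be done by hand before learning starts: collapsing the raw sensor readings into a small discrete state keeps tabular or hierarchical methods tractable. A toy sketch with made-up sensor names and bucket sizes:

```python
def abstract_state(boundary_left, boundary_right, sonar_cm, bumped):
    # Collapse raw sensor readings into a small discrete tuple so that a
    # tabular learner (or options/HRL built on top of it) stays tractable.
    sonar_bucket = min(int(sonar_cm // 20), 4)   # 5 coarse distance bins
    return (bool(boundary_left), bool(boundary_right), sonar_bucket, bool(bumped))
```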

openai gym env.P, AttributeError 'TimeLimit' object has no attribute 'P'

我只是一个虾纸丫 Submitted on 2019-12-11 03:25:57
Question: I'm currently reading Hands-On Reinforcement Learning with Python by Sudharsan Ravichandiran, and in one of the first examples I run into this AttributeError: AttributeError 'TimeLimit' object has no attribute 'P', raised by the following line: for next_sr in env.P[state][action]: I can't find any documentation regarding env.P, but I found a similar example written in Python 2 here: https://gym.openai.com/evaluations/eval_48sirBRSRAapMjotYzjb6w/ I suppose env.P is part of an outdated library…
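This usually isn't an outdated API: gym.make() wraps the raw environment in a TimeLimit wrapper, and the transition table P lives on the inner environment. A minimal sketch, assuming the book's FrozenLake-style example:

```python
import gym

env = gym.make('FrozenLake-v0')

# env is a TimeLimit wrapper; the transition dictionary sits on the raw env,
# so unwrap it first instead of accessing env.P directly.
P = env.unwrapped.P   # P[state][action] -> list of (prob, next_state, reward, done)

state, action = 0, 0
for prob, next_state, reward, done in P[state][action]:
    print(prob, next_state, reward, done)
```

env.env.P also appears in older answers; unwrapped reaches the innermost environment regardless of how many wrappers are stacked.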