reinforcement-learning

Tensorflow 2 ValueError: Shapes (20, 1) and (20, 2) are incompatible in gym environment

心不动则不痛 submitted on 2021-01-28 05:34:43
Question: Just for learning, I wanted to test this code, but there is a problem in it that I do not understand. It says ValueError: Shapes (20, 1) and (20, 2) are incompatible , raised from the line loss = network.train_on_batch(states, discounted_rewards) . Maybe something has changed in TensorFlow since the code was written. The code is from this website: https://adventuresinmachinelearning.com/policy-gradient-tensorflow-2/ import gym import tensorflow as tf from tensorflow import keras
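A minimal sketch of one possible fix, assuming the tutorial's two-action softmax network compiled with categorical_crossentropy (the data below are placeholders): turn the chosen actions into one-hot targets so the target shape (20, 2) matches the network output, and weight each sample by its discounted return.

import numpy as np
import tensorflow as tf

num_actions = 2
states = np.random.rand(20, 4).astype(np.float32)          # placeholder state batch
actions = np.random.randint(0, num_actions, size=20)        # actions actually taken
discounted_rewards = np.random.rand(20).astype(np.float32)  # placeholder returns

network = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(num_actions, activation='softmax'),
])
network.compile(optimizer='adam', loss='categorical_crossentropy')

# One-hot targets have shape (20, 2), matching the softmax output.
one_hot_actions = tf.keras.utils.to_categorical(actions, num_actions)
loss = network.train_on_batch(states, one_hot_actions,
                              sample_weight=discounted_rewards)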

tf_agents custom time_step_spec

a 夏天 submitted on 2021-01-27 19:10:31
Question: I'm tinkering with tf-agents but I'm having trouble making a custom time_step_spec . I'm trying to train a tf-agent on gym 'Breakout-v0'. I've made a function to preprocess the observation (game pixels) and now I want to modify the time_step and time_step_spec to reflect the new data. The original time_step_spec.observation() is: BoundedTensorSpec(shape=(210, 160, 3), dtype=tf.uint8, name='observation', minimum=array(0, dtype=uint8), maximum=array(255, dtype=uint8)) and mine would be:
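A minimal sketch, assuming the preprocessed frames are 84x84 single-channel floats (the exact shape is an assumption, since the question is truncated): build a new observation spec and let tf-agents assemble the surrounding TimeStep spec (step_type, reward, discount, observation) from it.

import tensorflow as tf
from tf_agents.specs import tensor_spec
from tf_agents.trajectories import time_step as ts

# Hypothetical observation spec for the preprocessed game pixels.
new_obs_spec = tensor_spec.BoundedTensorSpec(
    shape=(84, 84, 1), dtype=tf.float32, name='observation',
    minimum=0.0, maximum=1.0)

# time_step_spec() wraps the observation spec with the standard
# step_type / reward / discount specs.
new_time_step_spec = ts.time_step_spec(new_obs_spec)
print(new_time_step_spec)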

neural network does not learn (loss stays the same)

我与影子孤独终老i submitted on 2021-01-27 13:14:35
Question: My project partner and I are currently facing a problem in our latest university project. Our mission is to implement a neural network that plays the game Pong. We feed the ball position, the ball speed and the positions of the paddles to our network, which has three outputs: UP, DOWN, DO_NOTHING. After a player has 11 points we train the network with all states, the decisions made and the reward for those decisions (see reward_cal()). The problem we are facing is that the loss is
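Since reward_cal() is not shown, the sketch below is only an illustrative assumption: a discounted-and-normalized return calculation, one common ingredient to check when a policy-gradient-style network's loss stops moving.

import numpy as np

def discount_rewards(rewards, gamma=0.99):
    # Turn per-step rewards into discounted returns, then normalize them
    # so the training signal is neither vanishingly small nor huge.
    returns = np.zeros(len(rewards), dtype=np.float32)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    returns -= returns.mean()
    returns /= (returns.std() + 1e-8)   # avoid division by zero
    return returns

print(discount_rewards([0.0, 0.0, 1.0, 0.0, -1.0]))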

Low GPU utilisation when running Tensorflow

寵の児 submitted on 2021-01-27 07:10:20
Question: I've been doing deep reinforcement learning using TensorFlow and OpenAI Gym. My problem is low GPU utilisation. Googling this issue, I understood that it's wrong to expect much GPU utilisation when training small networks (e.g. for MNIST), but my neural network is not that small, I think. The architecture is similar to the one given in the original DeepMind paper (more or less). The architecture of my network is summarized below: Convolution layer 1 (filters=32, kernel_size=8x8, strides=4)
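The question text is truncated after the first layer; the sketch below reconstructs a DQN-style convolutional network in tf.keras, with the remaining layer sizes taken from the original DeepMind paper as an assumption rather than from the question itself.

import tensorflow as tf

def build_q_network(num_actions, input_shape=(84, 84, 4)):
    # Conv sizes follow the DeepMind DQN paper: 32x8x8/4, 64x4x4/2, 64x3x3/1.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 8, strides=4, activation='relu',
                               input_shape=input_shape),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation='relu'),
        tf.keras.layers.Conv2D(64, 3, strides=1, activation='relu'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dense(num_actions),
    ])

build_q_network(num_actions=4).summary()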

How does DQN work in an environment where reward is always -1

删除回忆录丶 submitted on 2021-01-05 07:14:05
Question: Given that the OpenAI Gym environment MountainCar-v0 ALWAYS returns -1.0 as a reward (even when the goal is achieved), I don't understand how DQN with experience replay converges, yet I know it does, because I have working code that proves it. By working, I mean that when I train the agent, it quickly (within 300-500 episodes) learns how to solve the mountain car problem. Below is an example from my trained agent. It is my understanding that ultimately there needs to be a "sparse reward"
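As an illustrative sketch (not taken from the question's code), the standard DQN bootstrap target shows why a constant -1 reward can still distinguish states: with discounting, states closer to the goal accumulate a less negative return than states far from it.

gamma = 0.99

def q_target(reward, done, max_next_q):
    # Standard DQN target: r + gamma * max_a' Q(s', a'), or just r at episode end.
    return reward if done else reward + gamma * max_next_q

# Hypothetical Q-values for a state near the goal vs. one far from it:
print(q_target(-1.0, False, max_next_q=-4.0))    # -4.96, close to the goal
print(q_target(-1.0, False, max_next_q=-40.0))   # -40.6, far from the goal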

python OpenAI gym monitor creates json files in the recording directory

只愿长相守 submitted on 2021-01-02 07:56:32
Question: I am implementing value iteration on the gym CartPole-v0 environment and would like to record a video of the agent's actions to a video file. I have been trying to implement this using the Monitor wrapper, but it generates json files instead of a video file in the recording directory. This is my code: env = gym.make('FrozenLake-v0') env = gym.wrappers.Monitor(env, 'recording', force=True) env.seed(0) optimalValue = valueIteration(env) st = time.time() policy = cal_policy(optimalValue) policy
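A minimal sketch of how video recording is usually expected to work with the Monitor wrapper, assuming ffmpeg is installed and an environment that supports rgb_array rendering; note that FrozenLake-v0 (used in the code above) is text-based, which is one common reason only the .json stats files appear.

import gym

env = gym.make('CartPole-v0')
# video_callable=lambda episode_id: True asks Monitor to record every episode.
env = gym.wrappers.Monitor(env, 'recording', force=True,
                           video_callable=lambda episode_id: True)

obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()   # finalizes the .mp4 files in the 'recording' directory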

Can tf.agent policy return probability vector for all actions?

倾然丶 夕夏残阳落幕 submitted on 2020-12-31 07:44:45
Question: I am trying to train a reinforcement learning agent using TF-Agents, following the TF-Agents DQN Tutorial. In my application, I have 1 action with 9 possible discrete values (labeled from 0 to 8). Below is the output from env.action_spec(): BoundedTensorSpec(shape=(), dtype=tf.int64, name='action', minimum=array(0, dtype=int64), maximum=array(8, dtype=int64)) I would like to get the probability vector containing all actions calculated by the trained policy, and do further processing in other applications
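One possible approach, sketched under the assumption that the agent is the DQN agent from the tutorial (whose trained policy is greedy rather than stochastic): evaluate the agent's underlying Q-network on a batched observation and turn the Q-values into a probability vector with a softmax. Here q_net and observation are placeholders for the objects built in the tutorial.

import tensorflow as tf

def action_probabilities(q_net, observation):
    # QNetwork expects a batch dimension and returns (q_values, network_state).
    obs_batch = tf.expand_dims(tf.convert_to_tensor(observation), axis=0)
    q_values, _ = q_net(obs_batch)
    # Softmax over the 9 action values gives a (1, 9) probability vector.
    return tf.nn.softmax(q_values, axis=-1)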