reinforcement-learning

Tensorflow 2 ValueError: Shapes (20, 1) and (20, 2) are incompatible in gym environment

心不动则不痛 submitted on 2021-01-28 05:34:43
Question: Just for learning, I wanted to test this code, but there is a problem in it that I do not understand. It says ValueError: Shapes (20, 1) and (20, 2) are incompatible , raised from the line loss = network.train_on_batch(states, discounted_rewards) . Maybe something has changed in TensorFlow since the code was written. The code is from this website: https://adventuresinmachinelearning.com/policy-gradient-tensorflow-2/ import gym import tensorflow as tf from tensorflow import keras
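A minimal sketch of one possible fix, assuming the tutorial's two-action softmax network compiled with categorical_crossentropy (the data below are placeholders): turn the chosen actions into one-hot targets so the target shape (20, 2) matches the network output, and weight each sample by its discounted return.

import numpy as np
import tensorflow as tf

num_actions = 2
states = np.random.rand(20, 4).astype(np.float32)          # placeholder state batch
actions = np.random.randint(0, num_actions, size=20)        # actions actually taken
discounted_rewards = np.random.rand(20).astype(np.float32)  # placeholder returns

network = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(num_actions, activation='softmax'),
])
network.compile(optimizer='adam', loss='categorical_crossentropy')

# One-hot targets have shape (20, 2), matching the softmax output.
one_hot_actions = tf.keras.utils.to_categorical(actions, num_actions)
loss = network.train_on_batch(states, one_hot_actions,
                              sample_weight=discounted_rewards)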

tf_agents custom time_step_spec

a 夏天 submitted on 2021-01-27 19:10:31
Question: I'm tinkering with tf-agents but I'm having trouble making a custom time_step_spec . I'm trying to train a tf-agent on gym 'Breakout-v0'. I've made a function to preprocess the observation (game pixels) and now I want to modify the time_step and time_step_spec to reflect the new data. The original time_step_spec.observation() is: BoundedTensorSpec(shape=(210, 160, 3), dtype=tf.uint8, name='observation', minimum=array(0, dtype=uint8), maximum=array(255, dtype=uint8)) and mine would be:
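A minimal sketch, assuming the preprocessed frames are 84x84 single-channel floats (the exact shape is an assumption, since the question is truncated): build a new observation spec and let tf-agents assemble the surrounding TimeStep spec (step_type, reward, discount, observation) from it.

import tensorflow as tf
from tf_agents.specs import tensor_spec
from tf_agents.trajectories import time_step as ts

# Hypothetical observation spec for the preprocessed game pixels.
new_obs_spec = tensor_spec.BoundedTensorSpec(
    shape=(84, 84, 1), dtype=tf.float32, name='observation',
    minimum=0.0, maximum=1.0)

# time_step_spec() wraps the observation spec with the standard
# step_type / reward / discount specs.
new_time_step_spec = ts.time_step_spec(new_obs_spec)
print(new_time_step_spec)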

neural network does not learn (loss stays the same)

我与影子孤独终老i submitted on 2021-01-27 13:14:35
Question: My project partner and I are currently facing a problem in our latest university project. Our mission is to implement a neural network that plays the game Pong. We feed the ball position, the ball speed and the positions of the paddles to our network, which has three outputs: UP, DOWN, DO_NOTHING. After a player has 11 points we train the network with all states, the decisions made and the reward for those decisions (see reward_cal()). The problem we are facing is that the loss is
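Since reward_cal() is not shown, the sketch below is only an illustrative assumption: a discounted-and-normalized return calculation, one common ingredient to check when a policy-gradient-style network's loss stops moving.

import numpy as np

def discount_rewards(rewards, gamma=0.99):
    # Turn per-step rewards into discounted returns, then normalize them
    # so the training signal is neither vanishingly small nor huge.
    returns = np.zeros(len(rewards), dtype=np.float32)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    returns -= returns.mean()
    returns /= (returns.std() + 1e-8)   # avoid division by zero
    return returns

print(discount_rewards([0.0, 0.0, 1.0, 0.0, -1.0]))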

Low GPU utilisation when running Tensorflow

寵の児 submitted on 2021-01-27 07:10:20
Question: I've been doing deep reinforcement learning using TensorFlow and OpenAI Gym. My problem is low GPU utilisation. Googling this issue, I understood that it's wrong to expect much GPU utilisation when training small networks (e.g. for MNIST), but my neural network is not that small, I think. The architecture is similar to the one given in the original DeepMind paper (more or less). The architecture of my network is summarized below: Convolution layer 1 (filters=32, kernel_size=8x8, strides=4)
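The question text is truncated after the first layer; the sketch below reconstructs a DQN-style convolutional network in tf.keras, with the remaining layer sizes taken from the original DeepMind paper as an assumption rather than from the question itself.

import tensorflow as tf

def build_q_network(num_actions, input_shape=(84, 84, 4)):
    # Conv sizes follow the DeepMind DQN paper: 32x8x8/4, 64x4x4/2, 64x3x3/1.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 8, strides=4, activation='relu',
                               input_shape=input_shape),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation='relu'),
        tf.keras.layers.Conv2D(64, 3, strides=1, activation='relu'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dense(num_actions),
    ])

build_q_network(num_actions=4).summary()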

How does DQN work in an environment where reward is always -1

删除回忆录丶 submitted on 2021-01-05 07:14:05
Question: Given that the OpenAI Gym environment MountainCar-v0 ALWAYS returns -1.0 as a reward (even when the goal is achieved), I don't understand how DQN with experience replay converges, yet I know it does, because I have working code that proves it. By working, I mean that when I train the agent, it quickly (within 300-500 episodes) learns how to solve the mountain car problem. Below is an example from my trained agent. It is my understanding that ultimately there needs to be a "sparse reward"
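As an illustrative sketch (not taken from the question's code), the standard DQN bootstrap target shows why a constant -1 reward can still distinguish states: with discounting, states closer to the goal accumulate a less negative return than states far from it.

gamma = 0.99

def q_target(reward, done, max_next_q):
    # Standard DQN target: r + gamma * max_a' Q(s', a'), or just r at episode end.
    return reward if done else reward + gamma * max_next_q

# Hypothetical Q-values for a state near the goal vs. one far from it:
print(q_target(-1.0, False, max_next_q=-4.0))    # -4.96, close to the goal
print(q_target(-1.0, False, max_next_q=-40.0))   # -40.6, far from the goal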

python OpenAI gym monitor creates json files in the recording directory

只愿长相守 submitted on 2021-01-02 07:56:32
Question: I am implementing value iteration on the gym CartPole-v0 environment and would like to record a video of the agent's actions to a video file. I have been trying to implement this using the Monitor wrapper, but it generates json files instead of a video file in the recording directory. This is my code: env = gym.make('FrozenLake-v0') env = gym.wrappers.Monitor(env, 'recording', force=True) env.seed(0) optimalValue = valueIteration(env) st = time.time() policy = cal_policy(optimalValue) policy
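A minimal sketch of how video recording is usually expected to work with the Monitor wrapper, assuming ffmpeg is installed and an environment that supports rgb_array rendering; note that FrozenLake-v0 (used in the code above) is text-based, which is one common reason only the .json stats files appear.

import gym

env = gym.make('CartPole-v0')
# video_callable=lambda episode_id: True asks Monitor to record every episode.
env = gym.wrappers.Monitor(env, 'recording', force=True,
                           video_callable=lambda episode_id: True)

obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()   # finalizes the .mp4 files in the 'recording' directory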

Can tf.agent policy return probability vector for all actions?

倾然丶 夕夏残阳落幕 submitted on 2020-12-31 07:44:45
Question: I am trying to train a reinforcement learning agent using TF-Agents, following the TF-Agents DQN Tutorial. In my application, I have 1 action with 9 possible discrete values (labeled from 0 to 8). Below is the output from env.action_spec(): BoundedTensorSpec(shape=(), dtype=tf.int64, name='action', minimum=array(0, dtype=int64), maximum=array(8, dtype=int64)) I would like to get the probability vector containing all actions calculated by the trained policy, and do further processing in other applications
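One possible approach, sketched under the assumption that the agent is the DQN agent from the tutorial (whose trained policy is greedy rather than stochastic): evaluate the agent's underlying Q-network on a batched observation and turn the Q-values into a probability vector with a softmax. Here q_net and observation are placeholders for the objects built in the tutorial.

import tensorflow as tf

def action_probabilities(q_net, observation):
    # QNetwork expects a batch dimension and returns (q_values, network_state).
    obs_batch = tf.expand_dims(tf.convert_to_tensor(observation), axis=0)
    q_values, _ = q_net(obs_batch)
    # Softmax over the 9 action values gives a (1, 9) probability vector.
    return tf.nn.softmax(q_values, axis=-1)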