reinforcement-learning

Can tf.agent policy return probability vector for all actions?

Submitted by 爷,独闯天下 on 2020-12-31 07:37:32
Question: I am training a Reinforcement Learning agent by following the TF-Agents DQN Tutorial. In my application I have one action with 9 possible discrete values (labeled 0 to 8). Below is the output of env.action_spec():

    BoundedTensorSpec(shape=(), dtype=tf.int64, name='action', minimum=array(0, dtype=int64), maximum=array(8, dtype=int64))

I would like to get the probability vector over all actions computed by the trained policy, and do further processing in another application.
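
One way to recover a per-action vector is to query the Q-network behind the DQN policy and turn its Q-values into probabilities with a softmax. This is a minimal sketch, assuming the agent, q_net and train_env objects created in the TF-Agents DQN tutorial; the softmax conversion is an illustrative choice, not something the tutorial itself does:

    # Sketch: convert the Q-network's Q-values into an action-probability vector.
    # Assumes `q_net` and `train_env` exist as in the TF-Agents DQN tutorial.
    import tensorflow as tf

    time_step = train_env.reset()
    # A QNetwork call returns (q_values, network_state).
    q_values, _ = q_net(time_step.observation, time_step.step_type)
    action_probs = tf.nn.softmax(q_values, axis=-1)  # shape: (batch_size, 9)
    print(action_probs.numpy())

Note that the default trained DQN policy is greedy, so agent.policy puts all probability mass on the argmax action; reading the raw Q-values is a common way to recover a soft preference over all 9 actions.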

Slow training on CPU and GPU in a small network (tensorflow)

Submitted by 怎甘沉沦 on 2020-12-13 03:11:47
Question: Here is the original script I am trying to run on both CPU and GPU. I expected training to be much faster on the GPU, but it takes almost the same time. I made the following modification to main() (the first 4 lines) because the original script does not activate / use the GPU. Suggestions?

    def main():
        physical_devices = tf.config.experimental.list_physical_devices('GPU')
        if len(physical_devices) > 0:
            tf.config.experimental.set_memory_growth(physical_devices[0], True)
            print('GPU activated')
        ...
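
A quick way to check whether the GPU is actually doing the work is to turn on device-placement logging. This is a small stand-alone sketch, independent of the script above; the matrices and their sizes are only illustrative:

    # Sketch: verify that TensorFlow sees the GPU and places ops on it.
    import tensorflow as tf

    print(tf.config.list_physical_devices('GPU'))  # should list at least one GPU
    tf.debugging.set_log_device_placement(True)    # log the device chosen for each op

    a = tf.random.normal((1000, 1000))
    b = tf.random.normal((1000, 1000))
    c = tf.matmul(a, b)  # the log should mention .../device:GPU:0 if the GPU is used

For a very small network, the per-step overhead of launching GPU kernels and copying data between host and device can dominate the actual compute, so similar CPU and GPU timings are not unusual.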

OpenAI Gym: Understanding `action_space` notation (spaces.Box)

Submitted by 微笑、不失礼 on 2020-12-01 08:18:23
Question: I want to set up an RL agent on the OpenAI CarRacing-v0 environment, but before that I want to understand the action space. In the code on GitHub, line 119 says:

    self.action_space = spaces.Box(np.array([-1, 0, 0]), np.array([+1, +1, +1]))  # steer, gas, brake

How do I read this line? Although my problem is concrete with respect to CarRacing-v0, I would like to understand the spaces.Box() notation in general.

Answer 1: Box means that you are dealing with real-valued quantities. The first array, np.array([-1, 0, 0]), holds the lowest accepted values and the second, np.array([+1, +1, +1]), the highest, for steer, gas and brake respectively.
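
To see the general spaces.Box() behaviour without launching the simulator, here is a small stand-alone sketch using a Box with the same bounds (not the CarRacing-v0 instance itself):

    # Sketch: inspecting a gym.spaces.Box built with the same low/high bounds.
    import numpy as np
    from gym import spaces

    box = spaces.Box(np.array([-1, 0, 0], dtype=np.float32),
                     np.array([+1, +1, +1], dtype=np.float32))
    print(box.low)       # per-dimension lower bounds: [-1. 0. 0.]
    print(box.high)      # per-dimension upper bounds: [1. 1. 1.]
    print(box.shape)     # (3,)
    print(box.sample())  # a random valid action within the bounds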

'UnityEnvironment' object has no attribute 'behavior_spec'

Submitted by ╄→尐↘猪︶ㄣ on 2020-06-28 03:42:20
Question: I followed this link to the docs to create an environment of my own. But when I run this:

    from mlagents_envs.environment import UnityEnvironment
    env = UnityEnvironment(file_name="v1-ball-cube-game.x86_64")
    env.reset()
    behavior_names = env.behavior_spec.keys()
    print(behavior_names)

the game window pops up and then the terminal shows this error:

    Traceback (most recent call last):
      File "index.py", line 6, in <module>
        behavior_names = env.behavior_spec.keys()
    AttributeError: 'UnityEnvironment' object has no attribute 'behavior_spec'
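
In recent mlagents_envs releases the mapping is exposed as behavior_specs (plural) and is only populated after the first reset(); the exact attribute name depends on the installed ml-agents version. A minimal sketch under that assumption:

    # Sketch: query behavior specs via the plural attribute (newer mlagents_envs).
    from mlagents_envs.environment import UnityEnvironment

    env = UnityEnvironment(file_name="v1-ball-cube-game.x86_64")
    env.reset()                                    # populates the behavior specs
    behavior_names = list(env.behavior_specs.keys())
    print(behavior_names)
    env.close()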

Problem Using Keras Sequential Model for “reinforcelearn” Package in R

Submitted by 大城市里の小女人 on 2020-05-30 12:19:38
Question: I am trying to use a keras (version 2.2.50) neural network / sequential model to create a simple agent in a reinforcement learning setting with the reinforcelearn package (version 0.2.1), following this vignette: https://cran.r-project.org/web/packages/reinforcelearn/vignettes/agents.html. This is the code I use:

    library('reinforcelearn')
    library('keras')
    model = keras_model_sequential() %>%
      layer_dense(units = 10, input_shape = 4, activation = "linear") %>%
      compile(optimizer = optimizer...