reinforcement-learning

Reinforcement Learning Using Multiple Stock Tickers' Datasets?

老子叫甜甜 submitted on 2021-02-19 07:49:05

Question: Here's a general question that maybe someone could point me in the right direction on. I'm getting into Reinforcement Learning with Python 3.6/TensorFlow, and I have found/tweaked my own model to train on historical data from a particular stock. My question is: is it possible to train this model on more than just one stock's dataset? Every single machine learning article I've read on time series prediction and RL uses one dataset for training and testing, but my goal is to train a model on a…
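
A minimal sketch of one way to approach this, assuming a hypothetical StockTradingEnv environment and agent with the usual reset/step/act interface (none of these names come from the question): sample a different ticker's price series each episode, so a single model is trained across several instruments.

import random
import numpy as np

# Stand-ins for real historical closing prices, keyed by ticker.
price_data = {
    "AAPL": np.random.rand(1000),
    "MSFT": np.random.rand(1000),
    "GOOG": np.random.rand(1000),
}

def run_episode(agent, env):
    """Roll out one episode and return the total reward."""
    state, done, total = env.reset(), False, 0.0
    while not done:
        action = agent.act(state)
        state, reward, done = env.step(action)
        total += reward
    return total

# Hypothetical training loop: a new instrument is drawn every episode, so the
# agent never sees only a single ticker's history.
# for episode in range(10_000):
#     ticker = random.choice(list(price_data))
#     env = StockTradingEnv(price_data[ticker])   # same observation/action spec for every ticker
#     run_episode(agent, env)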

Image to Text - Pytesseract struggles with digits on windows

岁酱吖の submitted on 2021-02-11 12:03:28

Question: I'm trying to preprocess frames of a game in real time for an ML project. I want to extract numbers from the frame, so I chose Pytesseract, since it looked quite good with text. However, no matter how clear I make the text, it won't read it correctly. My code looks like this:

section = process_screen(screen_image)[1]
pixels = rgb_to_bw(section)   # makes the image grayscale
pixels[pixels < 200] = 0      # makes all non-white pixels black
tess.image_to_string(pixels)  # => 'ye ml)'

At best it outputs "ye ml…
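
A sketch of Tesseract settings that often help with isolated digits, assuming a grayscale NumPy array like the pixels above (the array here is a synthetic placeholder): whitelist digits, tell Tesseract to expect a single text line, and make sure the digits end up dark on a light background.

import numpy as np
import pytesseract
from PIL import Image

pixels = np.zeros((40, 120), dtype=np.uint8)  # placeholder for the real frame section
pixels[10:30, 10:110] = 255                   # pretend these are white digits on black

img = Image.fromarray(255 - pixels)           # invert: Tesseract prefers dark text on light background
text = pytesseract.image_to_string(
    img,
    config="--psm 7 -c tessedit_char_whitelist=0123456789",  # single line, digits only
)
print(text)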

Reward function for learning to play Curve Fever game with DQN

会有一股神秘感。 submitted on 2021-02-11 10:40:41

Question: I've made a simple version of Curve Fever, also known as "Achtung, die Kurve!". I want the machine to figure out how to play the game optimally. I copied and slightly modified an existing DQN from some Atari game examples that is built with Google's TensorFlow. I'm trying to figure out an appropriate reward function. Currently, I use this reward setup: 0.1 for every frame it does not crash; -500 for every crash. Is this the right approach? Do I need to tweak the values? Or do I need a completely…
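
As a rule of thumb with DQN, per-step rewards are usually kept small and roughly within [-1, 1] so the Q-targets stay numerically stable; a hedged sketch of a rescaled version of the setup above (the exact values are illustrative only):

def reward(crashed: bool) -> float:
    if crashed:
        return -1.0   # terminal penalty on the same order of magnitude as the step reward
    return 0.01       # small living bonus for every frame without a crash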

Soft actor critic with discrete action space

[亡魂溺海] submitted on 2021-02-08 10:27:22

Question: I'm trying to implement the soft actor critic algorithm for a discrete action space and I'm having trouble with the loss function. Here is the link to SAC with continuous action space: https://spinningup.openai.com/en/latest/algorithms/sac.html I do not know what I'm doing wrong. The problem is that the network does not learn anything on the CartPole environment. The full code is on GitHub: https://github.com/tk2232/sac_discrete/blob/master/sac_discrete.py Here is my idea of how to calculate the loss for…
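
For reference, a minimal PyTorch sketch of the discrete-action SAC losses (all tensors assumed to have shape [batch, n_actions], except rewards and dones, which are [batch]); unlike the continuous version linked above, the expectation over actions can be computed exactly from the policy's probabilities instead of via the reparameterization trick:

import torch

def critic_target(next_probs, next_log_probs, q1_targ, q2_targ,
                  rewards, dones, gamma=0.99, alpha=0.2):
    """Soft Bellman target: r + gamma * E_a[ min Q_targ(s', a) - alpha * log pi(a|s') ]."""
    with torch.no_grad():
        next_q = torch.min(q1_targ, q2_targ)
        next_v = (next_probs * (next_q - alpha * next_log_probs)).sum(dim=1)
        return rewards + gamma * (1.0 - dones) * next_v

def actor_loss(probs, log_probs, q1, q2, alpha=0.2):
    """Minimize E_a[ alpha * log pi(a|s) - min Q(s, a) ], expectation taken over pi."""
    q = torch.min(q1, q2).detach()
    return (probs * (alpha * log_probs - q)).sum(dim=1).mean()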

Keras model: Input shape dimension error for RL agent

谁说胖子不能爱 submitted on 2021-02-08 09:20:07

Question: My goal is to develop a DQN agent that will choose its action based on a certain strategy/policy. I previously worked with OpenAI Gym environments, but now I wanted to create my own RL environment. At this stage, the agent shall either choose a random action or choose its action based on the predictions given by a deep neural network (defined in the class DQN). So far, I have set up both the neural net model and my environment. The NN shall receive states as its input. These states represent…
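
One frequent cause of input-shape errors in this setting is feeding a 1-D state where Keras expects a leading batch dimension; a minimal sketch assuming a flat state of size state_dim (the sizes below are made up):

import numpy as np
from tensorflow.keras import layers, models

state_dim, n_actions = 4, 2
model = models.Sequential([
    layers.Dense(24, activation="relu", input_shape=(state_dim,)),
    layers.Dense(n_actions, activation="linear"),
])

state = np.array([0.1, 0.0, -0.2, 0.05])        # shape (state_dim,)
q_values = model.predict(state.reshape(1, -1))  # add the batch dimension: (1, state_dim)
action = int(np.argmax(q_values[0]))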

Understanding the total_timesteps parameter in stable-baselines' models

别等时光非礼了梦想. submitted on 2021-02-07 06:26:04

Question: I'm reading through the original PPO paper and trying to match it up to the input parameters of the stable-baselines PPO2 model. One thing I do not understand is the total_timesteps parameter in the learn method. The paper mentions "One style of policy gradient implementation... runs the policy for T timesteps (where T is much less than the episode length)", while the stable-baselines documentation describes the total_timesteps parameter as "(int) The total number of samples to train on"…
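
A short sketch of how the two quantities relate, assuming default PPO2 behaviour: total_timesteps is the total number of environment samples collected over the whole run, while n_steps is the per-update rollout length T from the paper, so the number of update phases is roughly total_timesteps / (n_steps * n_envs).

from stable_baselines import PPO2
from stable_baselines.common.cmd_util import make_vec_env

env = make_vec_env("CartPole-v1", n_envs=4)
model = PPO2("MlpPolicy", env, n_steps=128)  # T = 128 steps per environment per update
model.learn(total_timesteps=100_000)         # ~100_000 / (128 * 4) ≈ 195 update phases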

How to effectively make use of a GPU for reinforcement learning?

走远了吗. submitted on 2021-02-06 08:51:05

Question: Recently I looked into reinforcement learning, and there was one question bugging me that I could not find an answer for: how is training done effectively using GPUs? To my understanding, constant interaction with an environment is required, which seems like a huge bottleneck to me, since this task is often non-mathematical / non-parallelizable. Yet, for example, AlphaGo uses multiple TPUs/GPUs. So how are they doing it?

Answer 1: Indeed, you will often have interactions with the environment in…
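
A sketch of the usual pattern (assuming the older Gym API where reset() returns only the observation): environment stepping stays on the CPU, but many copies run in parallel so the policy's forward and backward passes can be batched on the GPU instead of being called once per tiny observation.

import gym
import numpy as np

envs = [gym.make("CartPole-v1") for _ in range(16)]   # 16 workers (stepped sequentially here for simplicity)
states = np.stack([env.reset() for env in envs])      # batch of observations, shape (16, obs_dim)

def policy_batch(batch_states):
    # placeholder for a single GPU forward pass over the whole batch
    return np.random.randint(0, 2, size=len(batch_states))

actions = policy_batch(states)                        # one batched inference call instead of 16 small ones
results = [env.step(int(a)) for env, a in zip(envs, actions)]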

Error: `callbacks` must be a callable method that returns a subclass of DefaultCallbacks, got <class 'ray.rllib.agents.callbacks.DefaultCallbacks'>

亡梦爱人 submitted on 2021-01-29 18:30:36

Question: When I run some code (DDPG - Deep Deterministic Policy Gradient), this error occurred: ValueError: `callbacks` must be a callable method that returns a subclass of DefaultCallbacks, got <class 'ray.rllib.agents.callbacks.DefaultCallbacks'>. My code is here:

import json

def load_policy():
    log_dir = "/root/ray_results/DDPG_SimpleSupplyChain_2020-07-15_02-37-48j2fjk67_"  # this path needs to be set manually
    checkpoint_id = "200"
    with open(f"{log_dir}/params.json", "r") as read_file:
        config = json…
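
One hedged guess at a workaround (not confirmed by the excerpt): params.json stores the callbacks entry as a plain string, so after loading the config it can be replaced with the actual DefaultCallbacks class, which is the "callable that returns a subclass of DefaultCallbacks" the validator asks for.

import json
from ray.rllib.agents.callbacks import DefaultCallbacks

log_dir = "/root/ray_results/DDPG_SimpleSupplyChain_2020-07-15_02-37-48j2fjk67_"  # set manually, as above
with open(f"{log_dir}/params.json", "r") as read_file:
    config = json.load(read_file)

config["callbacks"] = DefaultCallbacks   # overwrite the serialized string with the real class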