reinforcement-learning

Reinforcement Learning Using Multiple Stock Tickers' Datasets?

老子叫甜甜 submitted on 2021-02-19 07:49:05

Question: Here's a general question that maybe someone could point me in the right direction on. I'm getting into Reinforcement Learning with Python 3.6/TensorFlow, and I have found/tweaked my own model to train on historical data from a particular stock. My question is: is it possible to train this model on more than just one stock's dataset? Every single machine learning article I've read on time series prediction and RL uses one dataset for training and testing, but my goal is to train a model on a…
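
A minimal sketch of one way to approach this, assuming a hypothetical StockTradingEnv environment and agent with the usual reset/step/act interface (none of these names come from the question): sample a different ticker's price series each episode, so a single model is trained across several instruments.

import random
import numpy as np

# Stand-ins for real historical closing prices, keyed by ticker.
price_data = {
    "AAPL": np.random.rand(1000),
    "MSFT": np.random.rand(1000),
    "GOOG": np.random.rand(1000),
}

def run_episode(agent, env):
    """Roll out one episode and return the total reward."""
    state, done, total = env.reset(), False, 0.0
    while not done:
        action = agent.act(state)
        state, reward, done = env.step(action)
        total += reward
    return total

# Hypothetical training loop: a new instrument is drawn every episode, so the
# agent never sees only a single ticker's history.
# for episode in range(10_000):
#     ticker = random.choice(list(price_data))
#     env = StockTradingEnv(price_data[ticker])   # same observation/action spec for every ticker
#     run_episode(agent, env)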

Image to Text - Pytesseract struggles with digits on windows

岁酱吖の submitted on 2021-02-11 12:03:28

Question: I'm trying to preprocess frames of a game in real time for an ML project. I want to extract numbers from the frame, so I chose Pytesseract, since it looked quite good with text. However, no matter how clear I make the text, it won't read it correctly. My code looks like this:

section = process_screen(screen_image)[1]
pixels = rgb_to_bw(section)   # makes the image grayscale
pixels[pixels < 200] = 0      # makes all non-white pixels black
tess.image_to_string(pixels)  # => 'ye ml)'

At best it outputs "ye ml…
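
A sketch of Tesseract settings that often help with isolated digits, assuming a grayscale NumPy array like the pixels above (the array here is a synthetic placeholder): whitelist digits, tell Tesseract to expect a single text line, and make sure the digits end up dark on a light background.

import numpy as np
import pytesseract
from PIL import Image

pixels = np.zeros((40, 120), dtype=np.uint8)  # placeholder for the real frame section
pixels[10:30, 10:110] = 255                   # pretend these are white digits on black

img = Image.fromarray(255 - pixels)           # invert: Tesseract prefers dark text on light background
text = pytesseract.image_to_string(
    img,
    config="--psm 7 -c tessedit_char_whitelist=0123456789",  # single line, digits only
)
print(text)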

Reward function for learning to play Curve Fever game with DQN

会有一股神秘感。 submitted on 2021-02-11 10:40:41

Question: I've made a simple version of Curve Fever, also known as "Achtung, die Kurve!". I want the machine to figure out how to play the game optimally. I copied and slightly modified an existing DQN from some Atari game examples that is built with Google's TensorFlow. I'm trying to figure out an appropriate reward function. Currently, I use this reward setup: 0.1 for every frame it does not crash; -500 for every crash. Is this the right approach? Do I need to tweak the values? Or do I need a completely…
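
As a rule of thumb with DQN, per-step rewards are usually kept small and roughly within [-1, 1] so the Q-targets stay numerically stable; a hedged sketch of a rescaled version of the setup above (the exact values are illustrative only):

def reward(crashed: bool) -> float:
    if crashed:
        return -1.0   # terminal penalty on the same order of magnitude as the step reward
    return 0.01       # small living bonus for every frame without a crash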

Soft actor critic with discrete action space

[亡魂溺海] submitted on 2021-02-08 10:27:22

Question: I'm trying to implement the soft actor critic algorithm for a discrete action space and I'm having trouble with the loss function. Here is the link to SAC with continuous action space: https://spinningup.openai.com/en/latest/algorithms/sac.html I do not know what I'm doing wrong. The problem is that the network does not learn anything on the CartPole environment. The full code is on GitHub: https://github.com/tk2232/sac_discrete/blob/master/sac_discrete.py Here is my idea of how to calculate the loss for…
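
For reference, a minimal PyTorch sketch of the discrete-action SAC losses (all tensors assumed to have shape [batch, n_actions], except rewards and dones, which are [batch]); unlike the continuous version linked above, the expectation over actions can be computed exactly from the policy's probabilities instead of via the reparameterization trick:

import torch

def critic_target(next_probs, next_log_probs, q1_targ, q2_targ,
                  rewards, dones, gamma=0.99, alpha=0.2):
    """Soft Bellman target: r + gamma * E_a[ min Q_targ(s', a) - alpha * log pi(a|s') ]."""
    with torch.no_grad():
        next_q = torch.min(q1_targ, q2_targ)
        next_v = (next_probs * (next_q - alpha * next_log_probs)).sum(dim=1)
        return rewards + gamma * (1.0 - dones) * next_v

def actor_loss(probs, log_probs, q1, q2, alpha=0.2):
    """Minimize E_a[ alpha * log pi(a|s) - min Q(s, a) ], expectation taken over pi."""
    q = torch.min(q1, q2).detach()
    return (probs * (alpha * log_probs - q)).sum(dim=1).mean()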

Keras model: Input shape dimension error for RL agent

谁说胖子不能爱 submitted on 2021-02-08 09:20:07

Question: My goal is to develop a DQN agent that will choose its action based on a certain strategy/policy. I previously worked with OpenAI Gym environments, but now I wanted to create my own RL environment. At this stage, the agent shall either choose a random action or choose its action based on the predictions given by a deep neural network (defined in the class DQN). So far, I have set up both the neural net model and my environment. The NN shall receive states as its input. These states represent…
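
One frequent cause of input-shape errors in this setting is feeding a 1-D state where Keras expects a leading batch dimension; a minimal sketch assuming a flat state of size state_dim (the sizes below are made up):

import numpy as np
from tensorflow.keras import layers, models

state_dim, n_actions = 4, 2
model = models.Sequential([
    layers.Dense(24, activation="relu", input_shape=(state_dim,)),
    layers.Dense(n_actions, activation="linear"),
])

state = np.array([0.1, 0.0, -0.2, 0.05])        # shape (state_dim,)
q_values = model.predict(state.reshape(1, -1))  # add the batch dimension: (1, state_dim)
action = int(np.argmax(q_values[0]))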

Understanding the total_timesteps parameter in stable-baselines' models

别等时光非礼了梦想. submitted on 2021-02-07 06:26:04

Question: I'm reading through the original PPO paper and trying to match it up to the input parameters of the stable-baselines PPO2 model. One thing I do not understand is the total_timesteps parameter in the learn method. The paper mentions "One style of policy gradient implementation... runs the policy for T timesteps (where T is much less than the episode length)", while the stable-baselines documentation describes the total_timesteps parameter as "(int) The total number of samples to train on"…
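
A short sketch of how the two quantities relate, assuming default PPO2 behaviour: total_timesteps is the total number of environment samples collected over the whole run, while n_steps is the per-update rollout length T from the paper, so the number of update phases is roughly total_timesteps / (n_steps * n_envs).

from stable_baselines import PPO2
from stable_baselines.common.cmd_util import make_vec_env

env = make_vec_env("CartPole-v1", n_envs=4)
model = PPO2("MlpPolicy", env, n_steps=128)  # T = 128 steps per environment per update
model.learn(total_timesteps=100_000)         # ~100_000 / (128 * 4) ≈ 195 update phases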

How to effectively make use of a GPU for reinforcement learning?

走远了吗. submitted on 2021-02-06 08:51:05

Question: Recently I looked into reinforcement learning, and there was one question bugging me that I could not find an answer for: how is training done effectively using GPUs? To my understanding, constant interaction with an environment is required, which seems like a huge bottleneck to me, since this task is often non-mathematical / non-parallelizable. Yet, for example, AlphaGo uses multiple TPUs/GPUs. So how are they doing it?

Answer 1: Indeed, you will often have interactions with the environment in…
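
A sketch of the usual pattern (assuming the older Gym API where reset() returns only the observation): environment stepping stays on the CPU, but many copies run in parallel so the policy's forward and backward passes can be batched on the GPU instead of being called once per tiny observation.

import gym
import numpy as np

envs = [gym.make("CartPole-v1") for _ in range(16)]   # 16 workers (stepped sequentially here for simplicity)
states = np.stack([env.reset() for env in envs])      # batch of observations, shape (16, obs_dim)

def policy_batch(batch_states):
    # placeholder for a single GPU forward pass over the whole batch
    return np.random.randint(0, 2, size=len(batch_states))

actions = policy_batch(states)                        # one batched inference call instead of 16 small ones
results = [env.step(int(a)) for env, a in zip(envs, actions)]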

Error: `callbacks` must be a callable method that returns a subclass of DefaultCallbacks, got <class 'ray.rllib.agents.callbacks.DefaultCallbacks'>

亡梦爱人 submitted on 2021-01-29 18:30:36

Question: When I run some code (DDPG - Deep Deterministic Policy Gradient), this error occurred: ValueError: `callbacks` must be a callable method that returns a subclass of DefaultCallbacks, got <class 'ray.rllib.agents.callbacks.DefaultCallbacks'>. My code is here:

import json

def load_policy():
    log_dir = "/root/ray_results/DDPG_SimpleSupplyChain_2020-07-15_02-37-48j2fjk67_"  # this path needs to be set manually
    checkpoint_id = "200"
    with open(f"{log_dir}/params.json", "r") as read_file:
        config = json…
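
One hedged guess at a workaround (not confirmed by the excerpt): params.json stores the callbacks entry as a plain string, so after loading the config it can be replaced with the actual DefaultCallbacks class, which is the "callable that returns a subclass of DefaultCallbacks" the validator asks for.

import json
from ray.rllib.agents.callbacks import DefaultCallbacks

log_dir = "/root/ray_results/DDPG_SimpleSupplyChain_2020-07-15_02-37-48j2fjk67_"  # set manually, as above
with open(f"{log_dir}/params.json", "r") as read_file:
    config = json.load(read_file)

config["callbacks"] = DefaultCallbacks   # overwrite the serialized string with the real class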