reinforcement-learning

Q-learning in neural network not 'learning'

南笙酒味 submitted on 2019-12-11 00:56:39
Question: I have made a simple Tron game in C++ and an MLP with one hidden layer. I have implemented Q-learning in this neural network; however, it is not causing the agent to win more games over time (even after 1 million games). I will try to explain in text what I did; hopefully someone can spot a mistake that might cause this problem. At every state there are four possible moves (north, east, south, west), and the rewards come only at the end of the game (-1 for a loss, 0 for a draw, 1 for a win). I initialise …
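
For reference, a minimal sketch of the standard Q-learning (TD) target such a network is usually regressed onto. This is not the poster's code; the function name and the discount factor are assumptions:

import numpy as np

GAMMA = 0.9  # discount factor; an assumed value

def q_target(reward, done, q_next):
    # TD target for the move actually taken.
    # reward: -1 loss, 0 draw/ongoing, +1 win
    # q_next: network Q-estimates for the next state, shape (4,)
    if done:
        return float(reward)
    return reward + GAMMA * float(np.max(q_next))

# Non-terminal step followed by Q-estimates for north/east/south/west:
print(q_target(0.0, False, np.array([0.1, -0.3, 0.4, 0.0])))  # 0 + 0.9 * 0.4

A common pitfall with this setup is regressing all four outputs toward the target instead of only the output for the action that was actually taken.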

Python binning data for OpenAI Gym

南楼画角 submitted on 2019-12-08 10:04:16
Question: I am attempting to create a custom environment for reinforcement learning with OpenAI Gym. I need to represent all possible values that the environment will see in a variable called observation_space. There are 3 possible actions for the agent, held in action_space. To be more specific, the observation_space is a temperature sensor which will see possible values ranging from 50 to 150 degrees, and I think I can represent all of this as follows (EDIT: I had the action_space numpy array wrong): import numpy …
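
A minimal sketch of how those two spaces are commonly declared with gym.spaces; the class name and the meaning of the three actions are assumptions, but the bounds mirror the 50-150 degree range described above:

import numpy as np
import gym
from gym import spaces

class TempEnv(gym.Env):
    # Toy environment: one temperature reading, three discrete actions.

    def __init__(self):
        # Observations: a single temperature between 50 and 150 degrees
        self.observation_space = spaces.Box(low=np.array([50.0]),
                                            high=np.array([150.0]),
                                            dtype=np.float32)
        # Actions: 3 possibilities (e.g. heat, cool, do nothing)
        self.action_space = spaces.Discrete(3)

    def reset(self):
        # A real environment would track state; sampling keeps the sketch short
        return self.observation_space.sample()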

Named entity recognition with a small data set (corpus)

﹥>﹥吖頭↗ submitted on 2019-12-07 07:55:31
Question: I want to develop a named entity recognition system for the Persian language, but we only have a small NER-tagged corpus for training and testing. We may have a better and bigger corpus in the future, so I need a solution whose performance improves incrementally whenever new data is added, without merging the new data with the old data and training from scratch. Is there any solution?

Answer 1: Yes. With your help: it is a work in progress. It is JS and "No training ...". Please see https://github.com/redaktor/nlp_compromise/ ! It is a fork where I worked on NER during the last days, and it will be …
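
The property being asked for here is usually called online (incremental) learning: each new labelled sentence updates the model in place. A minimal sketch with a perceptron-style tagger, unrelated to the linked JS fork; every name below is hypothetical:

from collections import defaultdict

class OnlinePerceptronTagger:
    # Tiny online NER tagger: it keeps learning from new sentences one at a
    # time, without retraining on the old corpus.

    def __init__(self, tags):
        self.tags = tags
        self.w = defaultdict(float)  # (feature, tag) -> weight

    def features(self, tokens, i):
        return [("word", tokens[i]), ("prev", tokens[i - 1] if i else "<s>")]

    def predict(self, tokens, i):
        return max(self.tags, key=lambda t: sum(
            self.w[(f, t)] for f in self.features(tokens, i)))

    def update(self, tokens, gold_tags):
        # One incremental step: nudge weights toward the gold tags
        for i, gold in enumerate(gold_tags):
            guess = self.predict(tokens, i)
            if guess != gold:
                for f in self.features(tokens, i):
                    self.w[(f, gold)] += 1.0
                    self.w[(f, guess)] -= 1.0

tagger = OnlinePerceptronTagger(["O", "PER", "LOC"])
tagger.update(["Ali", "lives", "in", "Tehran"], ["PER", "O", "O", "LOC"])
print(tagger.predict(["Ali", "went"], 0))  # "PER" after the single update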

The huge number of states in a Q-learning calculation

不问归期 submitted on 2019-12-06 13:01:35
I implemented a 3x3 OX game with Q-learning (it works perfectly in AI vs. AI and AI vs. human), but I can't go one step further to a 4x4 OX game, since it eats up all of my PC's memory and crashes. Here is my current problem: Access violation in huge array?

In my understanding, a 3x3 OX game has a total of 3 (space, white, black) ^ 9 = 19,683 possible states (the same pattern at a different angle still counts). For a 4x4 OX game, the total will be 3^16 = 43,046,721 states. For a regular Go game on a 15x15 board, the total will be 3^225 ≈ 2.25 × 10^107 states.

Q1. I want to know whether my calculation is correct or not. (for …
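
Those counts are easy to check directly; a quick sketch (Python integers are arbitrary precision, so 3**225 is computed exactly before formatting):

# 3 values per cell (empty/white/black) gives 3^(side*side) raw states
for side in (3, 4, 15):
    cells = side * side
    print(f"{side}x{side}: 3**{cells} = {3 ** cells:.4e}")
# 3x3:   3**9   = 1.9683e+04
# 4x4:   3**16  = 4.3047e+07
# 15x15: 3**225 = 2.2505e+107

Note these are raw upper bounds: they count every assignment of the three cell values, including positions that can never occur in actual play.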

Tensorflow gradient with respect to matrix

房东的猫 submitted on 2019-12-06 03:07:10
Just for context, I'm trying to implement a gradient descent algorithm with Tensorflow. I have a 2x4 matrix X,

X = [ x1 x2 x3 x4 ]
    [ x5 x6 x7 x8 ]

which I multiply by a 4x1 feature vector Y = [ y1 y2 y3 y4 ]^T to get the 2x1 vector Z = X Y = [ z1 z2 ]^T. I then put Z through a softmax function and take the log; I'll refer to the output matrix as W. All this is implemented as follows (a little boilerplate added so it's runnable):

import tensorflow as tf

sess = tf.Session()
num_features = 4
num_actions = 2
policy_matrix = tf.get_variable("params", (num_actions, num_features))
state_ph = tf.placeholder("float", (num_features, 1)) …
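
A hedged, self-contained completion of the computation described above, assuming TF 1.x to match the snippet; the matmul/softmax/log lines and the tf.gradients call are my reconstruction, not the poster's code:

import numpy as np
import tensorflow as tf  # TF 1.x API, as in the question

sess = tf.Session()
num_features = 4
num_actions = 2
policy_matrix = tf.get_variable("params", (num_actions, num_features))
state_ph = tf.placeholder("float", (num_features, 1))

z = tf.matmul(policy_matrix, state_ph)  # Z = X * Y, shape (2, 1)
w = tf.log(tf.nn.softmax(z, axis=0))    # W = log(softmax(Z)), column-wise
# d(W[0,0]) / dX: gradient of one log-probability w.r.t. the whole matrix
grads = tf.gradients(w[0, 0], [policy_matrix])

sess.run(tf.global_variables_initializer())
print(sess.run(grads, feed_dict={state_ph: np.ones((num_features, 1))}))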

How to accumulate and apply gradients for Async n-step DQNetwork update in Tensorflow?

蓝咒 submitted on 2019-12-05 04:04:13
I am trying to implement Asynchronous Methods for Deep Reinforcement Learning, and one of the steps requires accumulating the gradient over several steps and then applying it. What is the best way to achieve this in TensorFlow? I got as far as accumulating the gradient, but I don't think this is the fastest way to achieve it (lots of transfers from TensorFlow to Python and back). Any suggestions are welcome. This is my code for a toy NN; it does not model or compute anything, it just exercises the operations that I want to use.

import tensorflow as tf
from model import *

graph = tf.Graph()
with graph …
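
One common pattern, sketched below under assumed names (not the poster's code), keeps the accumulation entirely on the graph: store a running sum per variable in a non-trainable accumulator, add each step's gradients to it, and apply the sums once every n steps, so nothing round-trips through Python:

import numpy as np
import tensorflow as tf  # TF 1.x API, as in the question

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, (None, 4))
    w = tf.get_variable("w", (4, 1))
    loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))

    opt = tf.train.GradientDescentOptimizer(0.01)
    grads_and_vars = opt.compute_gradients(loss)

    # One on-graph accumulator per trainable variable
    accums = [tf.Variable(tf.zeros_like(v), trainable=False)
              for _, v in grads_and_vars]
    accumulate = [a.assign_add(g) for a, (g, _) in zip(accums, grads_and_vars)]
    apply_step = opt.apply_gradients(
        [(a, v) for a, (_, v) in zip(accums, grads_and_vars)])
    reset = [a.assign(tf.zeros_like(a)) for a in accums]
    init = tf.global_variables_initializer()

with tf.Session(graph=graph) as sess:
    sess.run(init)
    for _ in range(5):                       # n-step accumulation
        sess.run(accumulate, {x: np.ones((2, 4))})
    sess.run(apply_step)                     # apply the summed gradient once
    sess.run(reset)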

Reinforcement Learning With Variable Actions

≡放荡痞女 submitted on 2019-12-04 19:24:20
Question: All the reinforcement learning algorithms I've read about are usually applied to a single agent that has a fixed number of actions. Are there any reinforcement learning algorithms for making a decision while taking into account a variable number of actions? For example, how would you apply an RL algorithm in a computer game where a player controls N soldiers, and each soldier has a random number of actions based on its condition? You can't formulate a fixed number of actions for a global decision …
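
For what it's worth, one common workaround (not from the post; every name below is assumed) is to value state-action pairs individually, so the agent simply takes an argmax over whatever actions happen to be legal at this step:

import random

def choose_action(q, state, legal_actions, epsilon=0.1):
    # Epsilon-greedy choice over a variable-sized action set.
    # q: dict mapping (state, action) -> estimated value
    # legal_actions: whatever actions are available right now
    if random.random() < epsilon:
        return random.choice(legal_actions)
    return max(legal_actions, key=lambda a: q.get((state, a), 0.0))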