reinforcement-learning

Parallel processes in distributed TensorFlow

时光怂恿深爱的人放手 submitted on 2019-12-04 10:59:25
I have a neural network in TensorFlow with trained parameters; it is the "policy" for an agent. The network is updated in a training loop in the main TensorFlow session in the core program. At the end of each training cycle I need to pass this network to a few parallel processes ("workers"), which will use it to collect samples from interactions of the agent's policy with the environment. I need to do this in parallel because simulating the environment takes most of the time and runs on only a single core. So, a few parallel sampling processes are needed. I am struggling with how to structure this in distributed TensorFlow. What I
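Not a full answer to the distributed-TensorFlow question, but a minimal sketch of one common way to structure the sampling step, assuming it is acceptable to ship the trained weights (as NumPy arrays) to plain Python multiprocessing workers that rebuild the policy locally; `run_episode`, the dummy weights, and the worker count are hypothetical placeholders for the asker's actual rollout code:

```python
# Minimal sketch: ship the current policy weights to sampler processes with
# the standard library's multiprocessing module. run_episode() is a stand-in
# for "rebuild the policy locally and roll out one episode".
import multiprocessing as mp
import numpy as np

def run_episode(weights):
    # Placeholder rollout: a real implementation would rebuild the policy
    # network from `weights`, step the single-core simulator, and return
    # the collected (state, action, reward) transitions.
    rng = np.random.default_rng()
    return [(rng.standard_normal(4), 0, 1.0)]

def worker(task_q, result_q):
    # Each worker waits for a fresh set of weights, collects one batch of
    # samples with them, and sends the samples back to the trainer.
    while True:
        weights = task_q.get()
        if weights is None:                  # poison pill -> shut down
            break
        result_q.put(run_episode(weights))

def collect_parallel(weights, n_workers=4):
    task_q, result_q = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(task_q, result_q))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for _ in range(n_workers):               # one rollout request per worker
        task_q.put(weights)
    batches = [result_q.get() for _ in range(n_workers)]
    for _ in range(n_workers):
        task_q.put(None)
    for p in procs:
        p.join()
    return batches

if __name__ == "__main__":
    dummy_weights = [np.zeros((4, 2))]       # stand-in for trained parameters
    print(len(collect_parallel(dummy_weights)))
```

Each worker owns its own single-core simulator, so rollouts run in parallel while the main process keeps the only training session.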

Training a Neural Network with Reinforcement learning

不羁的心 submitted on 2019-12-04 07:23:27
Question: I know the basics of feedforward neural networks and how to train them using the backpropagation algorithm, but I'm looking for an algorithm that I can use for training an ANN online with reinforcement learning. For example, the cart-pole swing-up problem is one I'd like to solve with an ANN. In that case, I don't know what should be done to control the pendulum; I only know how close I am to the ideal position. I need to have the ANN learn based on reward and punishment. Thus, supervised
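Since the question asks for an online, reward-driven training rule rather than backpropagation against known targets, here is a minimal REINFORCE-style sketch in plain NumPy; ToyEnv, the layer sizes, and the +1-per-step reward are made-up stand-ins for a real cart-pole swing-up simulator:

```python
# A minimal REINFORCE-style sketch (plain NumPy): a one-hidden-layer policy
# network trained online from episode rewards only, with no supervised labels.
# ToyEnv is a hypothetical placeholder for a real cart-pole environment.
import numpy as np

rng = np.random.default_rng(0)

class ToyEnv:                                      # placeholder environment
    def reset(self):
        self.t = 0
        return rng.standard_normal(4)
    def step(self, action):
        self.t += 1
        reward = 1.0                               # e.g. +1 per step "upright"
        done = self.t >= 50
        return rng.standard_normal(4), reward, done

n_obs, n_hidden, n_actions = 4, 16, 2
W1 = rng.standard_normal((n_obs, n_hidden)) * 0.1
W2 = rng.standard_normal((n_hidden, n_actions)) * 0.1

def policy(obs):
    h = np.tanh(obs @ W1)
    logits = h @ W2
    p = np.exp(logits - logits.max())
    return h, p / p.sum()

def train_episode(env, gamma=0.99, lr=1e-3):
    global W1, W2
    obs, done, traj = env.reset(), False, []
    while not done:
        h, probs = policy(obs)
        a = rng.choice(n_actions, p=probs)
        obs2, r, done = env.step(a)
        traj.append((obs, h, probs, a, r))
        obs = obs2
    G = 0.0
    for obs, h, probs, a, r in reversed(traj):     # discounted return
        G = r + gamma * G
        dlogits = -probs
        dlogits[a] += 1.0                          # d log pi(a|s) / d logits
        dh = (W2 @ dlogits) * (1 - h ** 2)         # backprop to hidden layer
        W2 += lr * G * np.outer(h, dlogits)        # gradient ascent on return
        W1 += lr * G * np.outer(obs, dh)

env = ToyEnv()
for _ in range(10):
    train_episode(env)
```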

Markov Model decision process in Java

一曲冷凌霜 submitted on 2019-12-04 06:03:01
I'm writing an assisted learning algorithm in Java. I've run into a mathematical problem that I can probably solve, but because the processing will be heavy I need an optimal solution. That being said, if anyone knows an optimized library that would be totally awesome, but the language is Java, so that will need to be taken into consideration. The idea is fairly simple: objects will store combinations of variables such as ABDC, ACDE, DE, AE. The maximum number of combinations will be based on how many I can run without slowing down the program, so theoretically, let's say, 100. The decision process will

Using Reinforcement Learning for Classification Problems

不打扰是莪最后的温柔 submitted on 2019-12-03 13:48:51
Question: Can I use reinforcement learning for classification, such as human activity recognition? And how? Answer 1: There are two types of feedback. One is evaluative feedback, which is used in reinforcement learning methods; the second is instructive feedback, which is used in supervised learning and is what classification problems mostly rely on. When supervised learning is used, the weights of the neural network are adjusted based on the correct labels provided in the training dataset. So, on selecting a wrong class,
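To make the evaluative-versus-instructive distinction concrete, here is a small made-up sketch on a linear softmax classifier: one weight matrix is trained with the true labels (supervised), while the other only ever observes a scalar reward for the class it actually picked (contextual-bandit style):

```python
# Instructive vs. evaluative feedback on the same linear softmax classifier.
# W_sup is trained with the true label; W_rl only sees a reward for the class
# it sampled, never the correct label itself. Data are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_feat, n_class, lr = 5, 3, 0.1
W_sup = np.zeros((n_feat, n_class))   # instructive (supervised) feedback
W_rl = np.zeros((n_feat, n_class))    # evaluative (reward-only) feedback

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

for _ in range(1000):
    x = rng.standard_normal(n_feat)
    y = rng.integers(n_class)                          # dummy ground-truth label

    # Instructive: the gradient uses the correct label directly.
    p = softmax(x @ W_sup)
    W_sup -= lr * np.outer(x, p - one_hot(y, n_class))

    # Evaluative: pick a class, observe only a scalar reward for that choice.
    p = softmax(x @ W_rl)
    a = rng.choice(n_class, p=p)
    reward = 1.0 if a == y else 0.0                    # says nothing about other classes
    W_rl += lr * reward * np.outer(x, one_hot(a, n_class) - p)
```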

Training only one output of a network in Keras

痞子三分冷 submitted on 2019-12-03 08:06:47
I have a network in Keras with many outputs; however, my training data only provides information for a single output at a time. At the moment my method for training has been to run a prediction on the input in question, change the value of the particular output that I am training, and then do a single batch update. If I'm right, this is the same as setting the loss for all outputs to zero except the one that I'm trying to train. Is there a better way? I've tried class weights, where I set a zero weight for all but the output I'm training, but it doesn't give me the results I expect. I'm using
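One possible pattern (a hedged tf.keras sketch with made-up layer sizes and output names, not necessarily the best answer to the question above): build one single-output model per head on top of shared trunk layers, and on each batch train only the sub-model whose labels you actually have:

```python
# Sketch: several single-output models that share the same trunk layers.
# Training sub-model A only back-propagates through out_a and the shared
# layers, which has the effect of "training only one output" at a time.
import numpy as np
import tensorflow as tf

inp = tf.keras.Input(shape=(10,))
trunk = tf.keras.layers.Dense(32, activation="relu")(inp)   # shared layers
out_a = tf.keras.layers.Dense(1, name="out_a")(trunk)
out_b = tf.keras.layers.Dense(1, name="out_b")(trunk)

model_a = tf.keras.Model(inp, out_a)      # shares trunk weights with model_b
model_b = tf.keras.Model(inp, out_b)
model_a.compile(optimizer="adam", loss="mse")
model_b.compile(optimizer="adam", loss="mse")

x = np.random.randn(8, 10).astype("float32")
y_a = np.random.randn(8, 1).astype("float32")

# Only labels for output A are available in this batch, so only model_a steps.
model_a.train_on_batch(x, y_a)
```

Note that each sub-model gets its own optimizer state here; whether that matters depends on how often the heads take turns.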

What is the difference between Q-learning and Value Iteration?

我怕爱的太早我们不能终老 submitted on 2019-12-03 04:56:27
Question: How is Q-learning different from value iteration in reinforcement learning? I know Q-learning is model-free and training samples are transitions (s, a, s', r). But since we know the transitions and the reward for every transition in Q-learning, is it not the same as model-based learning, where we know the reward for a state-action pair and the transitions for every action from a state (be it stochastic or deterministic)? I do not understand the difference. Answer 1: You are 100% right that if
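To make the contrast concrete, here is a small tabular sketch on a made-up two-state MDP: value iteration sweeps over all states using the known model (P and R below), while Q-learning only updates the (s, a) entry of the transition it just sampled; in the Q-learning loop, P and R merely play the role of the environment that generates those samples:

```python
# Made-up two-state, two-action MDP. Value iteration uses P and R as a known
# model; the Q-learning loop treats them only as the environment it samples from.
import numpy as np

n_s, n_a, gamma = 2, 2, 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],      # P[s, a, s'] transition probabilities
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],                    # R[s, a] expected immediate reward
              [0.0, 2.0]])

# Value iteration: full sweeps over ALL states with the Bellman optimality backup.
V = np.zeros(n_s)
for _ in range(100):
    V = np.max(R + gamma * P @ V, axis=1)

# Q-learning: model-free, one (s, a) entry updated per sampled transition.
rng = np.random.default_rng(0)
Q = np.zeros((n_s, n_a))
alpha, s = 0.1, 0
for _ in range(5000):
    a = rng.integers(n_a)                    # exploratory (here: uniform) action
    s2 = rng.choice(n_s, p=P[s, a])          # the "environment" samples s'
    r = R[s, a]                              # the "environment" emits a reward
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = s2

print(V, Q.max(axis=1))                      # compare the two value estimates
```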

When to use a certain Reinforcement Learning algorithm?

笑着哭i submitted on 2019-12-03 02:22:23
Question: I'm studying reinforcement learning and reading Sutton's book for a university course. Besides the classic DP, MC, TD, and Q-learning algorithms, I'm reading about policy gradient methods and genetic algorithms for solving decision problems. I have never had experience with this topic before, and I'm having problems understanding when one technique should be preferred over another. I have a few ideas, but I'm not sure about them. Can someone briefly explain or tell me a source where I can

What is the difference between value iteration and policy iteration?

烈酒焚心 submitted on 2019-12-03 00:07:03
Question: In reinforcement learning, what is the difference between policy iteration and value iteration? As far as I understand, in value iteration you use the Bellman equation to solve for the optimal policy, whereas in policy iteration you randomly select a policy π and find the reward of that policy. My doubt is: if you are selecting a random policy π in policy iteration, how is it guaranteed to be the optimal policy, even if we are choosing several random policies? Answer 1: Let's look at them side by side
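As a companion to the (truncated) side-by-side answer, here is a minimal policy-iteration sketch on a made-up two-state MDP: exact policy evaluation by solving a linear system, then greedy policy improvement, repeated until the policy stops changing. Value iteration would instead fold the max over actions directly into each backup:

```python
# Minimal policy iteration on a made-up tabular MDP: evaluate the current
# policy exactly, then improve it greedily, until the policy is stable.
import numpy as np

n_s, n_a, gamma = 2, 2, 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],      # P[s, a, s']
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],                    # R[s, a]
              [0.0, 2.0]])

pi = np.zeros(n_s, dtype=int)                # start from an arbitrary policy
while True:
    # Policy evaluation: solve V = R_pi + gamma * P_pi V as a linear system.
    P_pi = P[np.arange(n_s), pi]             # (n_s, n_s) rows for chosen actions
    R_pi = R[np.arange(n_s), pi]             # (n_s,) rewards for chosen actions
    V = np.linalg.solve(np.eye(n_s) - gamma * P_pi, R_pi)
    # Policy improvement: act greedily with respect to the evaluated V.
    new_pi = np.argmax(R + gamma * P @ V, axis=1)
    if np.array_equal(new_pi, pi):
        break                                # policy stable -> optimal
    pi = new_pi

print("policy:", pi, "state values:", V)
```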

What is a policy in reinforcement learning? [closed]

狂风中的少年 submitted on 2019-12-02 17:43:42
I've seen statements such as: "A policy defines the learning agent's way of behaving at a given time. Roughly speaking, a policy is a mapping from perceived states of the environment to actions to be taken when in those states." But I still don't fully understand it. What exactly is a policy in reinforcement learning? The definition is correct, though not instantly obvious if you see it for the first time. Let me put it this way: a policy is an agent's strategy. For example, imagine a world where a robot moves across the room and the task is to get to the target point (x, y), where it gets a reward.
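To ground the robot-in-a-room example, here is a toy illustration (grid size, target, and action names are made up): a deterministic policy is just a function, or equivalently a lookup table, from each perceived state to an action:

```python
# A policy as a mapping from states to actions in a tiny grid world.
GRID = [(x, y) for x in range(3) for y in range(3)]   # all perceived states
TARGET = (2, 2)                                       # cell that gives a reward

def policy(state):
    """Deterministic policy: move right until x matches the target, then up."""
    x, y = state
    if x < TARGET[0]:
        return "right"
    if y < TARGET[1]:
        return "up"
    return "stay"

# Tabular view of the same policy: one action prescribed for every state.
policy_table = {s: policy(s) for s in GRID}
print(policy_table[(0, 0)])   # "right"
```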