Q-learning in neural network not 'learning'
Question: I have made a simple Tron game in C++ and an MLP with one hidden layer. I have implemented Q-learning in this neural network; however, the agent does not win more games over time (even after 1 million games). I will try to explain in text what I did; hopefully someone can spot a mistake that might be causing this problem. In every state there are four possible moves (north, east, south, west), and rewards are given only at the end of the game (-1 for a loss, 0 for a draw, 1 for a win). I initialise