I'm implementing the REINFORCE with baseline algorithm, but I have a question about the discounted reward function.
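For context, the discounted return that REINFORCE usually relies on follows the recursion G_t = r_t + gamma * G_{t+1}. Below is a minimal sketch of that standard definition only, not necessarily the implementation discussed here; the function name `discount_rewards` and the default `gamma=0.99` are assumptions made for illustration.

```python
def discount_rewards(rewards, gamma=0.99):
    """Return the discounted return G_t for every step of one episode.

    `rewards` is the list of per-step rewards; `gamma` is an assumed default.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it:
    # G_t = r_t + gamma * G_{t+1}
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


print(discount_rewards([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

Iterating backwards keeps the computation linear in the episode length, since each return is built from the one after it.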
I implemented the discount reward functi