Short answer: Yes.
Detailed answer: yes but it's an overkill. Reinforcement learning is useful when you don't have labeled dataset to learn the correct policy, so you need to develop correct strategy based on the rewards. This also allows to backpropagate through non-differentiable blocks (which I suppose is not your case). The biggest drawback of reinforcement learning methods is that thay are typically took a VERY large amount of time to converge. So, if you possess labels, it would be a LOT more faster and easier to use regular supervised learning.