I am working on a reinforcement learning project and chose the Q-learning algorithm. In this algorithm there are:
state (s) and action (a)
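
For context, here is a minimal sketch of how a tabular Q-learning update uses s and a. It is only an illustration: the number of states, the two action labels "off" and "on", the learning rate `alpha`, the discount factor `gamma`, and the helper name `q_update` are all placeholder assumptions, not part of my actual project.

```python
# Hypothetical setup: 3 states and 2 actions (index 0 = "off", 1 = "on").
ACTIONS = ["off", "on"]
N_STATES = 3

# Q-table: one row per state s, one column per action a, initialised to 0.
Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]


def q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new q[s][a].

    alpha (learning rate) and gamma (discount factor) are assumed values.
    """
    # Target: immediate reward plus discounted best value of the next state.
    td_target = reward + gamma * max(q[s_next])
    # Move the current estimate a fraction alpha toward the target.
    q[s][a] += alpha * (td_target - q[s][a])
    return q[s][a]


# Example step: in state 0, take action "on", get reward 1.0, land in state 1.
new_value = q_update(Q, s=0, a=ACTIONS.index("on"), reward=1.0, s_next=1)
```

With every entry starting at zero, this single step only changes the entry for state 0 and action "on"; all other entries stay at 0 until those state-action pairs are visited.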
If I choose on and off as a