RL algorithm

The steps involved in typical RL algorithm are as follows:

  1. First, the agent interacts with the environment by performing an action
  2. The agent performs an action and moves from one state to another
  3. And then the agent will receive a reward based on the action it performed
  4. Based on the reward, the agent will understand whether the action was good or bad
  5. If the action was good, that is, if the agent received a positive reward, then the agent will prefer performing that action or else the agent will try performing an other action which results in a positive reward. So it is basically a trial and error learning process