What is the reward in Reinforcement Learning?

In summary, it seems that reinforcement learning algorithms rely on external rewards to motivate the agent to continue learning. However, I'm not sure how this could be programmed into an algorithm, and even if it could, how would it know to respond as a biological entity would.
  • #1
ngrunenberg
I know I'm not that bright and I realize that this is a silly question to anyone in the field, but I was curious what the reward is in reinforcement learning algorithms.

I understand the concept behind reinforcement learning, though I am unsure of how you could program a reward into a program. There is no limbic system that would respond positively because it has been rewarded with an influx of dopamine, and even if we could program this into an algorithm, how would it know to respond as a biological entity would? I imagine that, devoid of a biological "purpose" to perpetuate one's genes, there would be no real reward that would bring the agent closer to said purpose.

Again, apologies for my ignorance and thanks in advance for taking the time to reply.
 
  • #2
Here’s a blog on reinforcement learning

https://vinodsblog.com/2018/04/16/reinforcement-learning-reward-for-learning/

My view is reinforcement learning is like course correction as you drive a car. You feel a sense of accomplishment if you stay within the lines. The reward is staying within the lines. But if your car veers left or right then you adjust to compensate and stay within the lines.

The algorithm does the same. While it gets things right, the evaluation continues to apply rewards/adjustments so that it continues in that mode, but if the evaluation decreases then a negative reward/adjustment is applied. The evaluation indicates whether things are getting better, staying the same or getting worse, and the algorithm adjusts its weights accordingly. The reward algorithm evaluates how well things stayed within the bounds of the system. It is a learn-as-you-go scheme, aka continuous learning.
 
  • Like
Likes QuantumQuest and ngrunenberg
  • #3
One key point made in the blog referenced above is that there are three kinds of learning systems:

1) supervised learning where you train it and test it until it works well and then it goes into production and no changes are made until the next update.

It's good for identifying known patterns

2) unsupervised where it identifies new patterns using statistics to locate interesting clusters of data

It's good for finding hidden trends.

3) reinforcement learning where the system is training itself continuously so it continually gets better and better at doing the task at hand

It's good for learning a behavior. As an example, you might have an RL cruise control with various inputs and a driver. The driver sets the speed and turns on the system. The system tries to match the speed noting motor RPMs, uphill downhill positioning, LIDAR ... whatever cool gadget you can think of. However, every so often the driver does a correction that the system notes and it learns from it to match the driver's style of driving while at the same time maintaining the speed. So its reward is the driver leaves it alone and the punishment is when the driver brakes or hits the gas...

kind of like when you're driving on a long and lonely road while your spouse is sleeping peacefully in the passenger seat next to you and then you hit a bump... What happened? are you okay? Did you fall asleep? Can I drive? When will we get there? and then the kids wake up we're hungry... are we there yet?

The joys of driving!
 
  • Like
Likes ngrunenberg
  • #4
Google used to provide a service. I think it was called Google 411. It was a yellow pages service that worked with dumb flip phones or land line phones, not the Internet. You might say, "auto repair in scotia new york" or "quick, get me a lawyer" and it would give you a phone number. Google's goal was to learn to recognize speech from any user, regardless of voice or accent. That technology is heavily used in today's smart speakers.

If the user accepts the first suggestion and asks to be connected to that number, that is positive reinforcement.
If the user hangs up, that is negative reinforcement.
If the user asks for more suggestions, that is intermediate reinforcement.

That doesn't sound mysterious at all.
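
In code, those three cases are nothing more than numbers. Here is a minimal sketch, assuming made-up outcome labels and reward values (none of this is taken from Google's actual system), of how an outcome could be turned into a reward and used to nudge a running score for a suggestion:

```python
# Hypothetical outcome labels and reward values, chosen for illustration only.
REWARDS = {
    "connected": 1.0,         # user accepted the first suggestion
    "more_suggestions": 0.0,  # user asked for other options
    "hung_up": -1.0,          # user gave up
}


def reward(outcome: str) -> float:
    """Map an observed user outcome to a plain number."""
    return REWARDS[outcome]


def update_score(score: float, outcome: str, step_size: float = 0.1) -> float:
    """Move a suggestion's running score a small step toward the reward it earned."""
    return score + step_size * (reward(outcome) - score)
```

Nothing in the program "feels" the reward. The number only changes which suggestion gets ranked first next time.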
 
  • Like
Likes QuantumQuest, jedishrfu and ngrunenberg
  • #5
Thank you for the explanation and the link to the blog, that definitely cleared a few things up.

anorlunda said:
If the user accepts the first suggestion and asks to be connected to that number, that is positive reinforcement.
If the user hangs up, that is negative reinforcement.
If the user asks for more suggestions, that is intermediate reinforcement.

That doesn't sound mysterious at all.

It's not the process itself that is mysterious, I understand the concept; what is "mysterious" to me is how you can reward something that has no subjective interpretation of what a reward is. It makes sense in children and animals; the cessation of pain is a reward when learning to not walk into fire for example, but I fail to see the analogue in a system that has no reason to avoid mistakes. I'm not sure if I've articulated my issue well enough so sorry in advance for that.
 
  • #6
This is just a naming convention. We can relate to rewards and punishments as positive and negative but with a more visceral feeling. There are similar notions in electrical systems defining plugs and sockets in terms of male and female connectors but clearly there’s no procreation involved.
 
  • Like
Likes ngrunenberg
  • #7
ngrunenberg said:
It's not the process itself that is mysterious, I understand the concept; what is "mysterious" to me is how you can reward something that has no subjective interpretation of what a reward is. It makes sense in children and animals; the cessation of pain is a reward when learning to not walk into fire for example, but I fail to see the analogue in a system that has no reason to avoid mistakes. I'm not sure if I've articulated my issue well enough so sorry in advance for that.

You're trying to anthropomorphize it.

Many of these systems are neural nets with many adjustable parameters. A set of adjustments that work well, we keep. Those that fail, we discard. Then repeat with new test data. Continue until the machine works almost always. That is a way to apply reward/punishment to machines. It is only an analogy to human reward/punishment.
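
The keep-or-discard loop can be sketched in a few lines. This is a simple random-search hill climber on a two-parameter line fit, not how real neural nets are usually trained (they typically use gradient descent), and it reuses one fixed data set for brevity. The names `score` and `hill_climb` and all the numbers are assumptions made for illustration:

```python
import random


def score(params, data):
    """Higher is better: negative squared error of the line y = a*x + b."""
    a, b = params
    return -sum((a * x + b - y) ** 2 for x, y in data)


def hill_climb(data, steps=2000, noise=0.1, seed=0):
    """Randomly perturb the parameters, keeping only changes that score better."""
    rng = random.Random(seed)
    best = [0.0, 0.0]
    best_score = score(best, data)
    for _ in range(steps):
        # Try a small random adjustment of every parameter.
        trial = [p + rng.gauss(0.0, noise) for p in best]
        trial_score = score(trial, data)
        if trial_score > best_score:
            # "Reward": the adjustment worked, so keep it.
            best, best_score = trial, trial_score
        # Otherwise "punishment": the adjustment is simply thrown away.
    return best, best_score
```

The "reward" here is just the comparison `trial_score > best_score`; the machine needs no reason to avoid mistakes beyond the fact that failed adjustments are not kept.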
 
  • Like
Likes ngrunenberg and jedishrfu
  • #8
Reward is something that the system receives at the end of the task that provides some information as to how well the task has been completed.

The subjective notion of reward still comes from the human designer of the system, since it is the human designer that specifies "how well the task has been completed".

In this respect, reinforcement learning is not any different from supervised learning, since it is the human designer that specifies "how well the task has been completed". What is different in reinforcement learning is that information about how well the task has been completed is provided with not so much detail (just various degrees of good or bad), and we do not know exactly which action performed some time before the reward was obtained was good or not.
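
One common way to deal with not knowing which earlier action deserved the credit is to pass the final reward backwards, shrinking it at each step. The sketch below assumes a discount factor `gamma`, which is a standard device but is my addition and not something stated above:

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step sees the (discounted) rewards that followed it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Reward arrives only at the end of a four-step task.
episode_rewards = [0.0, 0.0, 0.0, 1.0]
print(discounted_returns(episode_rewards, gamma=0.5))
# [0.125, 0.25, 0.5, 1.0]
```

Every action gets some share of the final reward, with actions closer to the end getting more. Whether that share is fair to any individual action is exactly the uncertainty described above.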
 
  • #9
You may like to look at how reward helps to drive learning in the Rescorla-Wagner model, a fairly successful mathematical model describing biological reinforcement learning.
https://en.wikipedia.org/wiki/Rescorla–Wagner_model

The "reward prediction error" or "surprise" of the Rescorla-Wagner model is a simple form of the "reward prediction error" or "temporal difference error" that is a better description of some biological reinforcement learning, and is also used in machine reinforcement learning (e.g. in Tesauro's backgammon player).
https://en.wikipedia.org/wiki/Temporal_difference_learning
https://medium.com/jim-fleming/before-alphago-there-was-td-gammon-13deff866197
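
As a rough sketch of both ideas: the first function below is a simplified, single-stimulus form of the Rescorla-Wagner update (the full model sums predictions over all stimuli present and uses separate rate parameters), and the second computes a one-step temporal difference error. The function names and default values are illustrative assumptions:

```python
def rescorla_wagner(rewards, alpha=0.5, v0=0.0):
    """Track the predicted reward V over trials: V <- V + alpha * (r - V)."""
    v = v0
    history = []
    for r in rewards:
        surprise = r - v      # reward prediction error for this trial
        v += alpha * surprise
        history.append(v)
    return history


def td_error(r, v_s, v_next, gamma=0.9):
    """Temporal difference error: delta = r + gamma * V(s') - V(s)."""
    return r + gamma * v_next - v_s


print(rescorla_wagner([1.0, 1.0, 1.0]))
# [0.5, 0.75, 0.875]
```

As the prediction approaches the actual reward, the surprise shrinks and learning slows down. In both cases the "reward" enters only as a number in an update rule.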
 
  • Like
Likes ngrunenberg, QuantumQuest and anorlunda
  • #10
Thank you all for clearing up my confusion! I appreciate the help; especially the links.
 

Related to What is the reward in Reinforcement Learning?

1. What is the definition of reward in Reinforcement Learning?

The reward in Reinforcement Learning refers to a numerical value that is given to an agent based on its actions and the environment it is in. It serves as a measure of the agent's performance and is used to guide the agent towards achieving its goal.

2. How is the reward calculated in Reinforcement Learning?

The reward is calculated by a reward function, which the designer specifies as part of the environment rather than the learning algorithm itself. It takes into account the state of the environment, the action taken by the agent, and the desired outcome. The algorithm then uses the rewards it receives to estimate a value for each action and updates those estimates based on the success or failure of the action.

3. What is the role of reward in Reinforcement Learning?

The reward serves as the incentive for the agent to take certain actions that will lead to a desired outcome. It guides the agent towards making decisions that will result in the maximum possible reward and helps the agent learn from its experiences.

4. Can the reward function be modified in Reinforcement Learning?

Yes, the reward function can be modified in Reinforcement Learning to change the behavior of the agent. For example, a higher reward can be given to actions that lead to a faster achievement of the goal, or a penalty can be given for actions that result in negative consequences.
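
As a small illustration, assuming a hypothetical task that ends when a goal is reached, the sketch below compares a goal-only reward with a variant that also charges a small penalty on every step. The function names and the penalty size are made up for the example:

```python
def goal_only(reached_goal: bool) -> float:
    """Reward 1 for reaching the goal, 0 otherwise."""
    return 1.0 if reached_goal else 0.0


def goal_with_step_penalty(reached_goal: bool, penalty: float = 0.01) -> float:
    """Same goal reward, minus a small cost charged on every step."""
    return goal_only(reached_goal) - penalty


def total_reward(reward_fn, n_steps: int) -> float:
    """Sum rewards over an episode that reaches the goal on its last step."""
    return sum(reward_fn(step == n_steps - 1) for step in range(n_steps))
```

Under `goal_only`, a 5-step and a 50-step episode earn the same total, so the agent has no reason to hurry. Under `goal_with_step_penalty`, the shorter episode earns more, which pushes the agent toward faster solutions.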

5. How does the reward impact the learning process in Reinforcement Learning?

The reward plays a crucial role in the learning process of Reinforcement Learning. It helps the agent determine which actions are beneficial and which are not, and guides the agent towards making better decisions in the future. The ultimate goal of the agent is to maximize the cumulative reward over time.
