The Death of a Beautiful Theory? Dopamine And Reward Prediction Error

Very early in the history of artificial intelligence research, it was apparent that cognitive agents needed to be able to maximize reward by changing their behavior. But this leads to a "credit-assignment" problem: how does the agent know which of its actions led to the reward?
An early solution was to select the behavior with the maximal predicted reward, and later to adjust the likelihood of that behavior according to whether it ultimately led to the anticipated reward. These "temporal-difference" errors in reward prediction were first implemented in a 1950s checkers-playing program, before exploding in popularity some 30 years later.
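To make the select-then-adjust idea concrete, here is a minimal sketch in Python. It is a deliberately simplified prediction-error update on a two-choice task, not a full temporal-difference learner and not a reconstruction of the checkers program. The function name `run_bandit`, the reward probabilities, the learning rate `alpha`, and the exploration rate `epsilon` are all illustrative assumptions.

```python
import random

def run_bandit(reward_probs, steps=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Learn a predicted reward for each action from prediction errors.

    All parameter values here are hypothetical choices for illustration.
    """
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)  # predicted reward for each action
    for _ in range(steps):
        # Mostly pick the action with the highest predicted reward;
        # occasionally explore so that every action gets sampled.
        if rng.random() < epsilon:
            action = rng.randrange(len(values))
        else:
            action = max(range(len(values)), key=lambda a: values[a])
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        error = reward - values[action]   # reward prediction error
        values[action] += alpha * error   # nudge the prediction toward the outcome
    return values

values = run_bandit([0.2, 0.8])
print(values)
```

Under these assumptions, the prediction for the more frequently rewarded action should end up higher, so the agent comes to choose it most of the time.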

This repopularization seemed to originate from a tantalizing discovery: the brain's most ancient structures were releasing dopamine in exactly the way predicted by temporal-difference learning algorithms. First, dopamine release in the ventral tegmental area (VTA) decreases in response to stimuli that are repeatedly presented without a reward, as though dopamine levels "dip" to signal the overprediction (and under-delivery) of a reward. Second, dopamine release abruptly spikes in response to stimuli that are suddenly paired with a reward, as though dopamine is signaling the underprediction (and over-delivery) of a reward. Finally, when a previously rewarded stimulus is no longer rewarded, dopamine levels dip, again suggesting overprediction and under-delivery of reward.
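The spike and dip patterns described above fall out of even a very small temporal-difference model. The sketch below is a toy under stated assumptions, not a model of real VTA recordings: each trial has a single cue followed by an outcome, the pre-cue baseline is assumed to predict nothing, and the function name `simulate`, the learning rate `alpha`, and the discount `gamma` are hypothetical choices.

```python
def simulate(rewards, alpha=0.2, gamma=1.0):
    """Return (cue_error, outcome_error) for each trial.

    Assumes one cue per trial and a baseline state whose value is zero.
    """
    v_cue = 0.0  # learned reward prediction while the cue is on
    errors = []
    for r in rewards:
        # Cue onset: the baseline predicted nothing and no reward arrives yet,
        # so the error equals the newly predicted value.
        cue_error = gamma * v_cue - 0.0
        # Outcome: the trial ends here, so the next state's value is zero.
        outcome_error = r + gamma * 0.0 - v_cue
        v_cue += alpha * outcome_error  # temporal-difference update
        errors.append((cue_error, outcome_error))
    return errors

trials = [1.0] * 50 + [0.0]  # 50 rewarded trials, then one omitted reward
errors = simulate(trials)
first, trained, omitted = errors[0], errors[49], errors[50]
print("first trial:   ", first)
print("after training:", trained)
print("reward omitted:", omitted)
```

In this toy setting the first, unpredicted reward produces a large positive error; after training the positive error has moved to the cue and the error at the reward is close to zero; and omitting the expected reward produces a large negative error, the "dip."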

Thus, a beautiful computational theory was garnering support from some unusually beautiful data in neuroscience. Dopamine appeared to rise for items that predicted a reward, to drop for items that predicted an absence of reward, and to show no response to neutral stimuli. But as noted by Thomas Huxley, in science "many a beautiful theory has been destroyed by an ugly fact."

Read full story in Developing Intelligence

