{"pk":30561,"title":"A Dual Back-Propagation Scheme for Scalar Reward Learning","subtitle":null,"abstract":"Explicit supervised learning rules [e.g. the delta rule] require that each unit of the output network receive a training signal indicating the \"correct\" response value; the unit can then adjust its parameters, so that its future response to the same stimulus is closer to the desired value. A much more realistic assumption for the nature of a supervisory signal is a single scalar \"goodness-of-response\" or \"reward\" signal. This credit assignment problem is handled here by a supervisory network which monitors the activities of both the sensory and effector units, and learns to predict the value of the reward signal using the generalized delta rule of Rumelhart, Hinton, and Williams (1986). The activity of a particular \"predictor unit\" thus comes to be associated with the expected reward. Having learned to mimic the environment's reward criteria, the supervisory network can provide each effector unit, by way of a back-propagation scheme, with an individualized correction signal that will lead to increased activity in the predictor. The actual reward is hence enhanced to the extent that the predicted reward is reliable.","language":"eng","license":{"name":"","short_name":"","text":null,"url":""},"keywords":[],"section":"Connectionism I","is_remote":false,"remote_url":null,"frozenauthors":[{"first_name":"Paul","middle_name":"","last_name":"Munro","name_suffix":"","institution":"University of Pittsburgh","department":""}],"date_submitted":null,"date_accepted":null,"date_published":"1987-01-01T18:00:00Z","render_galley":null,"galleys":[{"label":"PDF","type":"pdf","path":"https://journalpub.escholarship.org/cognitivesciencesociety/article/30561/galley/20410/download/"}]}