{"pk":31367,"title":"A Connectionist Architecture for Sequential Decision Learning","subtitle":null,"abstract":"A connectionist architecture and learning algorithm for sequential decision learning are presented. The architecture provides representations for probabilities and utilities. The learning algorithm provides a mechanism to learn from long-term rewards/utilities while observing information available locally in time. The mechanism is based on gradient ascent on the current estimate of the long-term reward in the weight space defined by a \"policy\" network. The learning principle can be seen as a generalization of previous methods proposed to implement \"policy iteration\" mechanisms with connectionist networks. The algorithm is simulated for an \"agent\" moving in an environment described as a simple one-dimensional random walk. Results show the agent discovers optimal moving strategies in simple cases and learns how to avoid short-term suboptimal rewards in order to maximize long-term rewards in more complex cases.","language":"eng","license":{"name":"","short_name":"","text":null,"url":""},"keywords":[],"section":"Posters","is_remote":true,"remote_url":"https://escholarship.org/uc/item/16237234","frozenauthors":[{"first_name":"Yves","middle_name":"","last_name":"Chauvin","name_suffix":"","institution":"Stanford University","department":""}],"date_submitted":null,"date_accepted":null,"date_published":"1992-01-01T18:00:00Z","render_galley":null,"galleys":[{"label":"PDF","type":"pdf","path":"https://journalpub.escholarship.org/cognitivesciencesociety/article/31367/galley/22436/download/"}]}