Associative Reinforcement Learning using Linear Probabilistic Concepts

Authors

  • Naoki Abe
  • Philip M. Long
Abstract

We consider the problem of maximizing the total number of successes while learning about a probability function determining the likelihood of a success. In particular, we consider the case in which the probability function is represented by a linear function of the attribute vector associated with each action/choice. In the scenario we consider, learning proceeds in trials. In each trial, the algorithm is given a number of alternatives to choose from, each having an attribute vector associated with it, and for the alternative it selects it gets either a success or a failure, with probability determined by applying a fixed but unknown linear success probability function to the attribute vector. Our algorithms consist of a learning method like the Widrow-Hoff rule and a probabilistic selection strategy, which work together to resolve the so-called exploration-exploitation trade-off. We analyze the performance of these methods by proving bounds on the worst-case regret, that is, how many fewer successes they expect to get compared to the ideal (but unrealistic) strategy that knows the target probability function. Our analysis shows that the worst-case (expected) regret for our methods is almost optimal: the upper bounds grow with the number $m$ of trials and the number $n$ of alternatives like $O(m^{3/4} n^{1/2})$ and $O(m^{4/5} n^{2/5})$, and the lower bound is
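The trial loop described in the abstract (pick one alternative, observe a success or failure, update a linear estimate) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the uniform-exploration selection rule, the fixed learning rate, the way attribute vectors are sampled, and all function and parameter names (`predict`, `widrow_hoff_update`, `select`, `run`, `explore_prob`, `lr`) are assumptions made for illustration only.

```python
import random


def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def predict(w, x):
    """Estimated success probability: linear in x, clipped to [0, 1]."""
    return min(1.0, max(0.0, dot(w, x)))


def widrow_hoff_update(w, x, outcome, lr):
    """One Widrow-Hoff (LMS) step: w <- w + lr * (outcome - w.x) * x."""
    err = outcome - dot(w, x)
    return [wi + lr * err * xi for wi, xi in zip(w, x)]


def select(w, alternatives, explore_prob, rng):
    """Assumed selection rule: with probability explore_prob pick uniformly
    at random (explore); otherwise pick the highest estimate (exploit)."""
    if rng.random() < explore_prob:
        return rng.randrange(len(alternatives))
    return max(range(len(alternatives)),
               key=lambda i: predict(w, alternatives[i]))


def run(target, trials, n_alternatives, lr=0.1, explore_prob=0.1, seed=0):
    """Simulate the trials. `target` is the hidden linear success function;
    its entries are assumed nonnegative with sum <= 1, and attributes are
    drawn from [0, 1], so target.x is always a valid probability."""
    rng = random.Random(seed)
    w = [0.0] * len(target)
    successes = 0
    for _ in range(trials):
        alternatives = [[rng.random() for _ in target]
                        for _ in range(n_alternatives)]
        x = alternatives[select(w, alternatives, explore_prob, rng)]
        outcome = 1 if rng.random() < dot(target, x) else 0
        successes += outcome
        w = widrow_hoff_update(w, x, outcome, lr)
    return w, successes
```

A call such as `run([0.5, 0.3], trials=1000, n_alternatives=5)` returns the learned weight vector and the number of successes. Regret in the abstract's sense would compare that count with the expected successes of a strategy that always picks the alternative maximizing the true `target` value.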


Similar resources

A Unifying Probabilistic View of Associative Learning

Two important ideas about associative learning have emerged in recent decades: (1) Animals are Bayesian learners, tracking their uncertainty about associations; and (2) animals acquire long-term reward predictions through reinforcement learning. Both of these ideas are normative, in the sense that they are derived from rational design principles. They are also descriptive, capturing a wide rang...


A Biologically Inspired Method for Conceptual Imitation Using Reinforcement Learning

Levels Observations are categorized into concepts with respect to some principles that depend on physical and/or functional characteristics of the items. From this perspective, Zentall et al. have categorized concepts into three levels of abstraction (Zentall et al. 2002) (see Figure 1): Perceptual: These concepts are formed solely by measuring similarity of instances in perceptual space. Such data...


Associative Neural Models for Biomimetic Multi-Modal Learning in a Mirror Neuron-based Robot

By using neurocognitive evidence on mirror neuron system concepts the MirrorBot project has developed neural models for intelligent robot behaviour. These models employ diverse learning approaches such as reinforcement learning, self-organisation and associative learning to perform cognitive robotic operations such as language grounding in actions, object recognition, localisation and docking. ...


Associative Reinforcement Learning - A Proposal to Build Truly Adaptive Agents and Multi-agent Systems

In this position paper we propose to enhance learning algorithms, reinforcement learning in particular, for agents and for multi-agent systems, with the introduction of concepts and mechanisms borrowed from associative learning theory. It is argued that existing algorithms are limited in that they adopt a very restricted view of what “learning” is, partly due to the constraints imposed by the M...


Reinforcement Learning using Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution

Reinforcement learning is a sub-area of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward (Sutton & Barto, 1998). Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states. Temporal Difference (TD) learning is one of the re...




Publication date: 1999