Jordan Pollack
Brandeis University
Abstract
We work on learning agents that face an environment (a teacher,
training set, or fitness function) which is dynamically modulated as
the agent adapts, often as a result of other adapting agents. We use
the term "co-evolution" to refer to an inspiring process from nature
that supports our goal of growing systems of great complexity. Many
researchers have worked on this in different applications, such as
prisoner's dilemmas, predator/prey scenarios, code warriors, virtual
robots, and self-playing game learners, with widely varying success.
We will present some of our results in game learning, pattern
classification, and language induction. In particular, I focus on a
simple and surprising hill-climbing replication of Tesauro's 1992
self-learning backgammon player. Although learning often fails through
early convergence to unexpectedly stable situations, for backgammon
the simplest possible learning procedure apparently succeeded.
To understand why, we describe the teacher and learner as agents in a
learning game, which we call the "meta-game of learning" (MGL). The
teacher iteratively tests the student, and the student can adapt or
modify itself (within some mutation space) to maximize its utility.
Assuming the payoff for the student is simply high scores, the payoff
for the teacher is quite problematic: if the teacher is paid for the
student's correct answers, the teacher may give easy tests, and
learning halts. On the other hand, if the teacher is paid for the
student's failures, the teacher may give tests which are unlearnable
within the mutation space of the student, and learning halts again!
Either way the system settles into a mediocre stable state (MSS).
When teacher and student are the same, as in co-evolution or
self-play, MSSs are abundant.
My conclusions focus on features of backgammon, and perhaps of
life itself, that prevent MSSs in the MGL.
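For readers unfamiliar with the hill-climbing setup, here is a minimal
sketch of the champion/challenger self-play loop described above. The toy
strength function, the match size, and the blending fraction are
illustrative stand-ins; in the backgammon work the players are neural
networks evaluated by playing actual games against each other.

import math
import random

def mutate(weights, sigma=0.05):
    # perturb every weight with small Gaussian noise (the "mutation space")
    return [w + random.gauss(0.0, sigma) for w in weights]

def challenger_wins_match(champion, challenger, n_games=20):
    # Stand-in for a self-play match.  Here a player's "strength" is a toy
    # function of its weights; in the backgammon work this step is a series
    # of real games between the two networks.
    def strength(w):
        return -sum((wi - 1.0) ** 2 for wi in w)
    p = 1.0 / (1.0 + math.exp(strength(champion) - strength(challenger)))
    wins = sum(random.random() < p for _ in range(n_games))
    return wins > n_games // 2

champion = [0.0] * 10
for generation in range(500):
    challenger = mutate(champion)
    if challenger_wins_match(champion, challenger):
        # move the champion a small step toward the winner rather than
        # replacing it, which damps the noise in the match outcome
        champion = [0.95 * c + 0.05 * m for c, m in zip(champion, challenger)]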
Oliver Guenther, Tad Hogg and Bernardo A. Huberman
Abstract
Embedding microscopic sensors, computers and actuators into materials
allows physical systems to actively monitor and respond to their
environments. This leads to the possibility of creating smart matter,
i.e., materials whose properties can be changed under program control to
suit varying constraints. A key difficulty in realizing the potential of
smart matter is developing the appropriate control programs. One
approach to this problem is a multiagent control system, which can
provide robust and rapid response to environmental changes. To improve
the performance of such systems, we describe how the agents'
organization can be adapted through simple learning mechanisms. As a
specific example, we consider maintaining a physical system near an
unstable configuration, a particularly challenging application for smart
matter. This shows how the system's organization can adapt to the local
physical structure to improve performance.
The PostScript version will not be available on our web pages until July. For now
you can access
ftp://parcftp.xerox.com/pub/dynamics/smart.html
which has the relevant background material on multiagent control of smart
matter.
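As a purely illustrative sketch (not the authors' system), the snippet below
gives the flavor of the control problem: several agents push on a
one-dimensional unstable system and adapt their local feedback gains with a
simple try-and-keep rule. All names and constants here are assumptions made
for the example.

import random

a, dt = 0.5, 0.1               # a > 0 makes the uncontrolled system unstable
n_agents = 4
gains = [0.1] * n_agents       # each agent's local feedback gain

def episode_cost(gains, steps=200):
    x, cost = 1.0, 0.0
    for _ in range(steps):
        # each agent reads a noisy local sensor and applies a restoring force
        u = sum(-k * (x + random.gauss(0.0, 0.01)) for k in gains)
        x += dt * (a * x + u)
        cost += x * x
    return cost

# adapting the organization: one agent at a time perturbs its gain and keeps
# the change only if the overall episode cost drops
for epoch in range(100):
    before = episode_cost(gains)
    i = random.randrange(n_agents)
    trial = list(gains)
    trial[i] += random.choice([-0.05, 0.05])
    if episode_cost(trial) < before:
        gains = trial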
Tucker Balch
Abstract
This paper describes research investigating behavioral specialization
in learning robot teams. Each agent is provided a common set of
skills (motor schema-based behavioral assemblages) from which it
builds a task-achieving strategy using reinforcement learning. The
agents learn individually to activate particular behavioral
assemblages given their current situation and a reward signal. The
experiments, conducted in robot soccer simulations, evaluate the
agents in terms of performance, policy convergence, and behavioral
diversity. The results show that in many cases, robots will
automatically diversify by choosing heterogeneous behaviors. The
degree of diversification and the performance of the team depend on
the reward structure. When the entire team is jointly rewarded or
penalized (global reinforcement), teams tend towards heterogeneous
behavior. When agents are provided feedback individually (local
reinforcement), they converge to identical policies.
URL: http://www.cc.gatech.edu/grads/b/Tucker.Balch/papers/learningroles.ps.Z
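The global/local reinforcement distinction can be made concrete with a small
sketch. The task, behaviors, and reward values below are hypothetical; only
the wiring of the reward signal reflects the distinction drawn in the
abstract.

import random

N_AGENTS, N_BEHAVIORS = 3, 2          # hypothetical: 0 = "chase ball", 1 = "defend goal"
Q = [[0.0] * N_BEHAVIORS for _ in range(N_AGENTS)]
alpha, epsilon = 0.1, 0.1
GLOBAL_REINFORCEMENT = True           # flip to False for local reinforcement

def team_score(choices):
    # made-up team payoff: the team does best with exactly one defender
    return 1.0 if choices.count(1) == 1 else 0.2

def local_score(agent, choices):
    # made-up individual payoff: each agent is credited only for chasing the ball
    return 1.0 if choices[agent] == 0 else 0.2

for episode in range(2000):
    choices = [random.randrange(N_BEHAVIORS) if random.random() < epsilon
               else max(range(N_BEHAVIORS), key=lambda b: Q[i][b])
               for i in range(N_AGENTS)]
    for i in range(N_AGENTS):
        r = team_score(choices) if GLOBAL_REINFORCEMENT else local_score(i, choices)
        Q[i][choices[i]] += alpha * (r - Q[i][choices[i]])

# under the global reward the agents can settle into different roles
# (heterogeneous); under the local reward every agent converges to the same
# policy (homogeneous), mirroring the result reported above.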
Edwin de Jong
Abstract
Agents in multi-agent systems can coordinate their actions by
communicating. We investigate a minimal form of communication, in which
the signals that agents send represent evaluations of the behavior of the
receiving agent. Learning to act according to these signals is a typical
reinforcement learning problem. A backpropagation neural network is used
to predict the reward that will follow an action. The first results made
clear that a mechanism for balancing exploitation and exploration was
needed. We introduce the Exploration Buckets algorithm, a method that
favors both actions with high prediction errors and actions that have
been ignored for some time. The method is not restricted to a single
learning algorithm, and its main characteristics are its insensitivity to
large (or even continuous) state spaces and its suitability for online
learning: the exploration/exploitation balance does not depend on
properties external to the system, such as time.
Available as: http://arti.vub.ac.be/~edwin/publications/Exploration.ps
or http://arti.vub.ac.be/~edwin/publications/Exploration.ps.gz
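Since the Exploration Buckets mechanism is the core technical idea, here is a
rough sketch of how such a scheme might look. The bucket update rule, the
action set, and the constants are my assumptions, not the author's exact
formulation; only the intent (favor high prediction error and long-ignored
actions) comes from the abstract.

import random

ACTIONS = ["left", "right", "signal"]          # hypothetical action set
predicted = {a: 0.0 for a in ACTIONS}          # reward predictions (stand-in for the network)
bucket = {a: 1.0 for a in ACTIONS}             # exploration credit per action
alpha, idle_fill = 0.2, 0.05

def true_reward(action):
    # stand-in environment: in the paper the rewards are evaluative
    # signals sent by other agents
    return {"left": 0.1, "right": 0.8, "signal": 0.3}[action] + random.gauss(0, 0.05)

for step in range(500):
    # pick the action whose predicted reward plus exploration credit is highest
    action = max(ACTIONS, key=lambda a: predicted[a] + bucket[a])
    r = true_reward(action)
    error = abs(r - predicted[action])
    predicted[action] += alpha * (r - predicted[action])
    # buckets: the chosen action is refilled in proportion to its prediction
    # error, while every ignored action's bucket keeps growing
    bucket[action] = error
    for a in ACTIONS:
        if a != action:
            bucket[a] += idle_fill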
Jose Vidal and Edmund H. Durfee
Abstract
We provide a framework for the study of learning in certain types of
multi-agent systems (MAS) that divides an agent's knowledge about
others into different "types". We use concepts from computational
learning theory to calculate the relative sample complexities of
learning the different types of knowledge, given either a supervised
or a reinforcement learning algorithm. These results apply only to
the learning of a fixed target function, which would probably not
exist if the other agents are also learning. We then show how a
changing target function affects the learning behaviors of the agents,
and how to determine the advantages of having lower sample
complexity. Our results can be used by the designer of a learning agent
in a MAS to determine which knowledge to build into the agent and
which knowledge the agent should learn for itself.
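For orientation, the kind of quantity being compared is a sample-complexity
bound. A generic PAC-style bound from computational learning theory (a
standard result, not a claim about this paper's specific framework) has the
form

    m \geq \frac{1}{\epsilon}\left(\ln |H| + \ln \frac{1}{\delta}\right),

giving a number of examples m that suffices for a consistent learner over a
finite hypothesis class H to reach error at most \epsilon with probability
at least 1 - \delta; the framework above compares such quantities across the
different types of knowledge an agent can hold about other agents.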
Peter Stone and Manuela Veloso
Abstract
Although Decision Trees are widely used for classification tasks, they
are typically not used for agent control. This paper presents a novel
technique for agent control in a complex multiagent domain based on
the confidence factors provided by the C4.5 Decision Tree algorithm.
Using Robotic Soccer as an example of such a domain, this paper
incorporates a previously-trained Decision Tree into a
full multiagent behavior that is capable of controlling agents
throughout an entire game. Along with using Decision Trees for
control, this behavior also makes use of the ability to reason about
action-execution time to eliminate options that would not have
adequate time to be executed successfully. The newly created behavior
is tested empirically in game situations.
http://www.cs.cmu.edu/afs/cs/user/pstone/public/papers/97aaai/final-workshop.ps.gz
http://www.cs.cmu.edu/afs/cs/usr/pstone/public/papers/97aaai/final-workshop.html
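As a rough illustration (not the authors' code), confidence factors from a
trained decision tree can drive control as follows: the agent queries the
tree for each candidate action, filters out options whose estimated
execution time is too long, and takes the most confident remainder. The
query interface, option names, and timing numbers below are assumptions.

def tree_confidence(option):
    # stand-in for querying the trained C4.5 tree: the tree's confidence
    # that this option (e.g. a pass to a given teammate) will succeed
    return {"pass_to_2": 0.7, "pass_to_5": 0.4, "dribble": 0.55}[option]

def execution_time(option):
    # estimated time (in simulator cycles) needed to carry out the option
    return {"pass_to_2": 12, "pass_to_5": 30, "dribble": 8}[option]

def choose_action(options, time_available):
    # discard options that cannot be completed in time, then take the
    # option the decision tree is most confident about
    feasible = [o for o in options if execution_time(o) <= time_available]
    if not feasible:
        return "clear_ball"          # hypothetical fallback behavior
    return max(feasible, key=tree_confidence)

print(choose_action(["pass_to_2", "pass_to_5", "dribble"], time_available=15))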
Claudia V. Goldman and Jeffrey S. Rosenschein
Abstract
We investigate how agents can learn to become experts, and eventually
organize themselves appropriately for a range of tasks. Our aim is to
look at evolutionary processes that lead to organizations of experts.
The distributed artificial intelligence (DAI) community has dealt with
multiagent systems that organize themselves in order to achieve a specific
shared goal. Various organizations can arise that will effectively
balance the load on the agents and improve their performance.
Here we look at the emergence of an organization as a step that takes
place prior to the execution of a task, and as a general process
applicable to a range of problems in a domain.
To explore these ideas, we designed and implemented a testbed based on
the game of Life. We present experimental results showing different
organizational patterns that can evolve in a multiagent system.
URL: http://www.cs.huji.ac.il/~clag
David C Parkes and Lyle H Ungar
Abstract
The goal of a self-interested agent within a multiagent system is to
maximize its utility over time. In a situation of strategic
interdependence, where the actions of one agent may affect the
utilities of other agents, the optimal behavior of an agent must be
conditioned on the expected behaviors of the other agents in the
system. Standard game theory assumes that the rationality and
preferences of all the agents are common knowledge: each agent is then
able to compute the set of possible equilibria and, if there is a
unique equilibrium, choose a best response to the actions that the
other agents will play.
Real agents acting within a multiagent system face multiple problems:
the agents may have incomplete information about the preferences and
rationality of the other agents in the game, computing the equilibria
can be computationally complex, and there might be many equilibria
from which to choose. An alternative explanation of the emergence of a
stable equilibrium is that it arises as the long-run outcome of a
repeated game, in which boundedly rational agents adapt their strategies
as they learn about the other agents in the system. We review some
possible models of learning for games, and then show the pros and cons
of using learning in a particular game, the Compensation
Mechanism, a mechanism for the efficient coordination of actions
within a multiagent system.
http://www.cis.upenn.edu/~dparkes/final.ps
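One of the standard learning models alluded to above is fictitious play, in
which each agent best-responds to the empirical frequencies of its
opponents' past actions. The two-player payoff matrix below is a made-up
coordination game used only to show the update rule; it is not the
Compensation Mechanism itself.

# payoffs[row_action][col_action] -> (row payoff, column payoff)
payoffs = [[(2, 2), (0, 0)],
           [(0, 0), (1, 1)]]

counts = [[1, 1], [1, 1]]   # counts[player][action]: observed play of the *other* player

def best_response(player):
    # best-respond to the opponent's empirical action frequencies
    total = sum(counts[player])
    freqs = [c / total for c in counts[player]]
    def expected(a):
        if player == 0:
            return sum(freqs[b] * payoffs[a][b][0] for b in range(2))
        return sum(freqs[b] * payoffs[b][a][1] for b in range(2))
    return max(range(2), key=expected)

for t in range(50):
    a0, a1 = best_response(0), best_response(1)
    counts[0][a1] += 1      # player 0 records what player 1 did
    counts[1][a0] += 1
# in this game, repeated best responses settle on the (0, 0) equilibrium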
M V Nagendra Prasad, and Victor R Lesser
Abstract
The work presented here deals with ways to improve the problem-solving
control skills of cooperative agents. We propose situation-specific
learning as a way to learn cooperative control knowledge in complex
multi-agent systems. We demonstrate the power of situation-specific
learning in the context of dynamically choosing a coordination strategy
based on the problem-solving situation. We present a learning system,
called COLLAGE, that uses meta-level information, in the form of an
abstract characterization of the coordination problem instance, to
learn to choose an appropriate coordination strategy from among a
class of strategies.
ftp://dis.cs.umass.edu/pub/coop-learn-aaai97.ps
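Schematically, situation-specific learning of coordination can be cast as
learning a mapping from abstract situation features to the best-performing
strategy. The features, strategy names, and quality model below are
illustrative stand-ins, not COLLAGE's actual representation.

import random
from collections import defaultdict

STRATEGIES = ["tight_coordination", "loose_coordination", "no_coordination"]

def situation_features():
    # abstract characterization of a coordination problem instance (made-up features)
    return (random.choice(["high", "low"]),     # task interdependence
            random.choice(["many", "few"]))     # shared resources

def observed_quality(features, strategy):
    # stand-in for actually running the problem solvers: tight coordination
    # pays off only when interdependence is high
    good = (features[0] == "high") == (strategy == "tight_coordination")
    return (0.9 if good else 0.4) + random.gauss(0.0, 0.05)

stats = defaultdict(lambda: [0.0, 0])   # (features, strategy) -> [mean quality, count]
for episode in range(2000):
    f = situation_features()
    if random.random() < 0.1:
        s = random.choice(STRATEGIES)                        # occasional exploration
    else:
        s = max(STRATEGIES, key=lambda c: stats[(f, c)][0])  # best known strategy here
    q = observed_quality(f, s)
    mean, n = stats[(f, s)]
    stats[(f, s)] = [mean + (q - mean) / (n + 1), n + 1]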
Maria Gordin, Narendra Puppala, and Sandip Sen
Abstract
Multi-agent systems require coordination of sources with distinct
expertise to perform complex tasks effectively. In this paper, we use
a co-evolutionary approach in which genetic algorithms (GAs) evolve
multiple individuals that can effectively cooperate to solve a common
problem. We concurrently run a GA for each individual in the
group. We experiment with a room-painting domain which requires the
cooperation of two agents. We have used two mechanisms for evaluating
an individual in one population: (a) pair it randomly with members
of the other population, or (b) pair it with members of the other
population stored in a shared memory containing the best pairs found
so far. Both approaches succeed in generating optimal behavior
patterns; however, our preliminary results show a slight edge for
the shared-memory approach.
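The two evaluation mechanisms can be sketched as below. The fitness
function, the stand-in for the room-painting task, and the memory size are
assumptions; only the pairing logic corresponds to mechanisms (a) and (b).

import random

def fitness(a, b):
    # stand-in for evaluating how well two agents paint the room together;
    # here cooperation works best when the two (scalar) behaviors are complementary
    return 1.0 - abs((a + b) - 1.0)

def evaluate(individual, other_population, shared_memory, use_shared_memory):
    if use_shared_memory and shared_memory:
        # (b) pair with partners from the memory of best pairs found so far
        partners = [p for _, _, p in shared_memory]
    else:
        # (a) pair randomly with members of the other population
        partners = random.sample(other_population, min(3, len(other_population)))
    best = max(partners, key=lambda p: fitness(individual, p))
    score = fitness(individual, best)
    shared_memory.append((score, individual, best))
    shared_memory.sort(reverse=True)
    del shared_memory[5:]            # keep only the top pairs
    return score

pop_a = [random.random() for _ in range(10)]
pop_b = [random.random() for _ in range(10)]
memory = []
scores = [evaluate(x, pop_b, memory, use_shared_memory=True) for x in pop_a]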
Sandip Sen and Neeraj Arora
Abstract
Agents that learn about other agents and can exploit this information possess a distinct
advantage in competitive situations. Games provide stylized adversarial environments to
study agent learning strategies. Researchers have developed game playing programs that
learn to play better from experience. We have developed a learning program that does not
learn to play better, but learns to identify and exploit the weaknesses of a particular
opponent by repeatedly playing it over several games. We propose a scheme for learning
opponent action probabilities and a utility maximization framework that exploits this
learned opponent model. We show that the proposed expected utility maximization strategy
generalizes the traditional maximin strategy, and allows players to benefit by taking
calculated risks that are avoided by the maximin strategy. Experiments in the popular
board game of Connect-4 show that a learning player consistently outperforms a
non-learning player when pitted against another automated player using a weaker
heuristic. Though our proposed mechanism does not improve the skill level of a computer
player, it does allow the player to play more effectively against a weaker opponent.
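The contrast between maximin and the learned-model strategy can be shown on
a small payoff matrix. The numbers and the frequency-count opponent model
are illustrative assumptions, not the paper's Connect-4 model.

# my_payoff[my_move][opponent_move]; a hypothetical 2x2 game
my_payoff = [[ 1, -4],
             [ 0,  0]]

opponent_counts = [9, 1]   # learned from past games: opponent plays move 0 most of the time

def maximin_move():
    # assume the worst-case reply to each of my moves
    return max(range(2), key=lambda m: min(my_payoff[m]))

def expected_utility_move():
    total = sum(opponent_counts)
    probs = [c / total for c in opponent_counts]
    return max(range(2), key=lambda m: sum(p * v for p, v in zip(probs, my_payoff[m])))

print(maximin_move())          # 1: the safe move, guaranteed payoff 0
print(expected_utility_move()) # 0: the calculated risk, expected payoff 0.9*1 + 0.1*(-4) = 0.5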
Thomas Haynes
Abstract
We have integrated the distributed search of genetic programming-based
systems with a collective memory to form a collective adaptation search
method. Such a system significantly improves search as problem
complexity increases. However, there is still considerable scope
for improvement. In collective adaptation, search agents gather
knowledge of their environment and deposit it in a central information
repository. Process agents are then able to manipulate that focused
knowledge, exploiting the exploration of the search agents. We
examine the utility of increasing the capabilities of the centralized
process agents.
Caroline Claus and Craig Boutilier
Abstract
Reinforcement learning can provide a robust and natural means
for agents to learn how to coordinate their action choices in
cooperative multiagent systems. We examine some of the factors
that can influence the dynamics of the learning process in such
a setting. We first distinguish reinforcement learners that are
unaware of (or ignore) the presence of other agents from those
that explicitly attempt to learn the value of joint actions and
the strategies of their counterparts. We study Q-learning in
cooperative multiagent systems under these two perspectives, focusing
on the influence of partial action observability, game structure, and
exploration strategies on convergence to (optimal and suboptimal)
Nash equilibria and on learned Q-values. We also consider variants
of the usual exploration strategies that can induce convergence to
optimal equilibria in cases where they might not otherwise be
attained.
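The two kinds of learners can be made concrete in a repeated two-agent,
single-stage cooperative game. The game matrix and constants below are
illustrative (similar in spirit to the coordination games studied in this
setting, but not necessarily the paper's exact experiments); the point of
the sketch is only the shape of the two update rules.

import random

# a made-up two-action cooperative game: both agents receive the same reward
reward = [[10, 0],
          [ 0, 5]]
alpha, epsilon = 0.1, 0.1

Q_ind = [[0.0, 0.0], [0.0, 0.0]]     # independent learners: Q over own actions only
Q_joint = [[0.0, 0.0], [0.0, 0.0]]   # joint-action learner (agent 0): Q over joint actions
counts_1 = [1, 1]                    # agent 0's empirical model of agent 1's play

def eps_greedy(values):
    if random.random() < epsilon:
        return random.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])

for t in range(5000):
    # actions are chosen here with the independent Q-values
    a0, a1 = eps_greedy(Q_ind[0]), eps_greedy(Q_ind[1])
    r = reward[a0][a1]
    # independent learners treat the other agent as part of the environment
    Q_ind[0][a0] += alpha * (r - Q_ind[0][a0])
    Q_ind[1][a1] += alpha * (r - Q_ind[1][a1])
    # the joint-action learner learns Q(a0, a1) plus a model of the other agent
    Q_joint[a0][a1] += alpha * (r - Q_joint[a0][a1])
    counts_1[a1] += 1

# the joint-action learner values its own actions by averaging over its model of agent 1
total = sum(counts_1)
ev = [sum(counts_1[b] / total * Q_joint[a][b] for b in range(2)) for a in range(2)]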