Jordan Pollack
Brandeis University
Abstract
We work on learning agents that face an environment (a teacher,
training set, or fitness function) which is dynamically modulated as
the agent adapts, often as a result of other adapting agents. We use
the term "co-evolution" to refer to an inspiring process from nature
that supports our goal of growing systems of great complexity. Many
researchers have worked on this in different applications, such as
prisoner's dilemmas, predator/prey scenarios, code warriors, virtual
robots, and self-playing game learners, with widely varying success.
We will present some of our results in game learning, pattern
classification, and language induction. In particular, I focus on a
simple and surprising hill-climbing replication of Tesauro's 1992
self-learning backgammon player. Although learning often fails through
early convergence to unexpectedly stable situations, for backgammon
the simplest possible learning procedure apparently succeeded.
To understand why, we describe the teacher and learner as agents in a
learning game, which we call the "meta-game of learning" (MGL). The
teacher iteratively tests the student, and the student can adapt or
modify itself (within some mutation space) to maximize its utility.
Assuming the payoff for the student is simply high scores, the payoff
for the teacher is quite problematic: if the teacher is paid for the
student's correct answers, the teacher may give easy tests, and
learning halts. On the other hand, if the teacher is paid for the
student's failures, the teacher may give tests which are unlearnable
within the mutation space of the student, and learning halts again!
Either way the system settles into a mediocre stable state (MSS).
When teacher and student are the same, as in co-evolution or
self-play, MSSs are abundant.
My conclusions focus on features of backgammon, and perhaps of
life itself, that prevent MSSs in the MGL.
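For readers unfamiliar with the hill-climbing setup, here is a minimal
sketch of the champion/challenger self-play loop described above. The toy
strength function, the match size, and the blending fraction are
illustrative stand-ins; in the backgammon work the players are neural
networks evaluated by playing actual games against each other.

import math
import random

def mutate(weights, sigma=0.05):
    # perturb every weight with small Gaussian noise (the "mutation space")
    return [w + random.gauss(0.0, sigma) for w in weights]

def challenger_wins_match(champion, challenger, n_games=20):
    # Stand-in for a self-play match.  Here a player's "strength" is a toy
    # function of its weights; in the backgammon work this step is a series
    # of real games between the two networks.
    def strength(w):
        return -sum((wi - 1.0) ** 2 for wi in w)
    p = 1.0 / (1.0 + math.exp(strength(champion) - strength(challenger)))
    wins = sum(random.random() < p for _ in range(n_games))
    return wins > n_games // 2

champion = [0.0] * 10
for generation in range(500):
    challenger = mutate(champion)
    if challenger_wins_match(champion, challenger):
        # move the champion a small step toward the winner rather than
        # replacing it, which damps the noise in the match outcome
        champion = [0.95 * c + 0.05 * m for c, m in zip(champion, challenger)]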
Oliver Guenther, Tad Hogg and Bernardo A. Huberman
Abstract
Embedding microscopic sensors, computers and actuators into materials
allows physical systems to actively monitor and respond to their
environments. This leads to the possibility of creating smart matter,
i.e., materials whose properties can be changed under program control to
suit varying constraints. A key difficulty in realizing the potential of
smart matter is developing the appropriate control programs. One
approach to this problem is a multiagent control system, which can
provide robust and rapid response to environmental changes. To improve
the performance of such systems, we describe how the agents'
organization can be adapted through simple learning mechanisms. As a
specific example, we consider maintaining a physical system near an
unstable configuration, a particularly challenging application for smart
matter. This shows how the system's organization can adapt to the local
physical structure to improve performance.
The PostScript version will not be available on our web pages until July. For now
you can access
ftp://parcftp.xerox.com/pub/dynamics/smart.html
which has the relevant background material on multiagent control of smart
matter.
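As a purely illustrative sketch (not the authors' system), the snippet below
gives the flavor of the control problem: several agents push on a
one-dimensional unstable system and adapt their local feedback gains with a
simple try-and-keep rule. All names and constants here are assumptions made
for the example.

import random

a, dt = 0.5, 0.1               # a > 0 makes the uncontrolled system unstable
n_agents = 4
gains = [0.1] * n_agents       # each agent's local feedback gain

def episode_cost(gains, steps=200):
    x, cost = 1.0, 0.0
    for _ in range(steps):
        # each agent reads a noisy local sensor and applies a restoring force
        u = sum(-k * (x + random.gauss(0.0, 0.01)) for k in gains)
        x += dt * (a * x + u)
        cost += x * x
    return cost

# adapting the organization: one agent at a time perturbs its gain and keeps
# the change only if the overall episode cost drops
for epoch in range(100):
    before = episode_cost(gains)
    i = random.randrange(n_agents)
    trial = list(gains)
    trial[i] += random.choice([-0.05, 0.05])
    if episode_cost(trial) < before:
        gains = trial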
Tucker Balch
Abstract
This paper describes research investigating behavioral specialization
in learning robot teams. Each agent is provided a common set of
skills (motor schema-based behavioral assemblages) from which it
builds a task-achieving strategy using reinforcement learning. The
agents learn individually to activate particular behavioral
assemblages given their current situation and a reward signal. The
experiments, conducted in robot soccer simulations, evaluate the
agents in terms of performance, policy convergence, and behavioral
diversity. The results show that in many cases, robots will
automatically diversify by choosing heterogeneous behaviors. The
degree of diversification and the performance of the team depend on
the reward structure. When the entire team is jointly rewarded or
penalized (global reinforcement), teams tend towards heterogeneous
behavior. When agents are provided feedback individually (local
reinforcement), they converge to identical policies.
URL: http://www.cc.gatech.edu/grads/b/Tucker.Balch/papers/learningroles.ps.Z
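The global/local reinforcement distinction can be made concrete with a small
sketch. The task, behaviors, and reward values below are hypothetical; only
the wiring of the reward signal reflects the distinction drawn in the
abstract.

import random

N_AGENTS, N_BEHAVIORS = 3, 2          # hypothetical: 0 = "chase ball", 1 = "defend goal"
Q = [[0.0] * N_BEHAVIORS for _ in range(N_AGENTS)]
alpha, epsilon = 0.1, 0.1
GLOBAL_REINFORCEMENT = True           # flip to False for local reinforcement

def team_score(choices):
    # made-up team payoff: the team does best with exactly one defender
    return 1.0 if choices.count(1) == 1 else 0.2

def local_score(agent, choices):
    # made-up individual payoff: each agent is credited only for chasing the ball
    return 1.0 if choices[agent] == 0 else 0.2

for episode in range(2000):
    choices = [random.randrange(N_BEHAVIORS) if random.random() < epsilon
               else max(range(N_BEHAVIORS), key=lambda b: Q[i][b])
               for i in range(N_AGENTS)]
    for i in range(N_AGENTS):
        r = team_score(choices) if GLOBAL_REINFORCEMENT else local_score(i, choices)
        Q[i][choices[i]] += alpha * (r - Q[i][choices[i]])

# under the global reward the agents can settle into different roles
# (heterogeneous); under the local reward every agent converges to the same
# policy (homogeneous), mirroring the result reported above.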
Edwin de Jong
Abstract
Agents in multi-agent systems can coordinate their actions by
communicating. We investigate a minimal form of communication, in which
the signals that agents send represent evaluations of the behavior of the
receiving agent. Learning to act according to these signals is a typical
reinforcement learning problem. A backpropagation neural network is used
to predict the reward that will follow an action. The first results made
clear that a mechanism for balancing exploitation and exploration was
needed. We introduce the Exploration Buckets algorithm, a method that
favors both actions with high prediction errors and actions that have
been ignored for some time. The method is not restricted to a single
learning algorithm, and its main characteristics are its insensitivity to
large (or even continuous) state spaces and its suitability for online
learning: the exploration/exploitation balance does not depend on
properties external to the system, such as time.
Available as: http://arti.vub.ac.be/~edwin/publications/Exploration.ps
or http://arti.vub.ac.be/~edwin/publications/Exploration.ps.gz
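Since the Exploration Buckets mechanism is the core technical idea, here is a
rough sketch of how such a scheme might look. The bucket update rule, the
action set, and the constants are my assumptions, not the author's exact
formulation; only the intent (favor high prediction error and long-ignored
actions) comes from the abstract.

import random

ACTIONS = ["left", "right", "signal"]          # hypothetical action set
predicted = {a: 0.0 for a in ACTIONS}          # reward predictions (stand-in for the network)
bucket = {a: 1.0 for a in ACTIONS}             # exploration credit per action
alpha, idle_fill = 0.2, 0.05

def true_reward(action):
    # stand-in environment: in the paper the rewards are evaluative
    # signals sent by other agents
    return {"left": 0.1, "right": 0.8, "signal": 0.3}[action] + random.gauss(0, 0.05)

for step in range(500):
    # pick the action whose predicted reward plus exploration credit is highest
    action = max(ACTIONS, key=lambda a: predicted[a] + bucket[a])
    r = true_reward(action)
    error = abs(r - predicted[action])
    predicted[action] += alpha * (r - predicted[action])
    # buckets: the chosen action is refilled in proportion to its prediction
    # error, while every ignored action's bucket keeps growing
    bucket[action] = error
    for a in ACTIONS:
        if a != action:
            bucket[a] += idle_fill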
Jose Vidal and Edmund H. Durfee
Abstract
We provide a framework for the study of learning in certain types of
multi-agent systems (MAS) that divides an agent's knowledge about
others into different "types". We use concepts from computational
learning theory to calculate the relative sample complexities of
learning the different types of knowledge, given either a supervised
or a reinforcement learning algorithm. These results apply only to
the learning of a fixed target function, which would probably not
exist if the other agents are also learning. We then show how a
changing target function affects the learning behaviors of the agents,
and how to determine the advantages of having lower sample
complexity. Our results can be used by the designer of a learning agent
in a MAS to determine which knowledge to build into the agent and
which knowledge the agent should learn for itself.
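For orientation, the kind of quantity being compared is a sample-complexity
bound. A generic PAC-style bound from computational learning theory (a
standard result, not a claim about this paper's specific framework) has the
form

    m \geq \frac{1}{\epsilon}\left(\ln |H| + \ln \frac{1}{\delta}\right),

giving a number of examples m that suffices for a consistent learner over a
finite hypothesis class H to reach error at most \epsilon with probability
at least 1 - \delta; the framework above compares such quantities across the
different types of knowledge an agent can hold about other agents.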
Peter Stone and Manuela Veloso
Abstract
Although Decision Trees are widely used for classification tasks, they
are typically not used for agent control. This paper presents a novel
technique for agent control in a complex multiagent domain based on
the confidence factors provided by the C4.5 Decision Tree algorithm.
Using Robotic Soccer as an example of such a domain, this paper
incorporates a previously-trained Decision Tree into a
full multiagent behavior that is capable of controlling agents
throughout an entire game. Along with using Decision Trees for
control, this behavior also makes use of the ability to reason about
action-execution time to eliminate options that would not have
adequate time to be executed successfully. The newly created behavior
is tested empirically in game situations.
http://www.cs.cmu.edu/afs/cs/user/pstone/public/papers/97aaai/final-workshop.ps.gz
http://www.cs.cmu.edu/afs/cs/usr/pstone/public/papers/97aaai/final-workshop.html
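As a rough illustration (not the authors' code), confidence factors from a
trained decision tree can drive control as follows: the agent queries the
tree for each candidate action, filters out options whose estimated
execution time is too long, and takes the most confident remainder. The
query interface, option names, and timing numbers below are assumptions.

def tree_confidence(option):
    # stand-in for querying the trained C4.5 tree: the tree's confidence
    # that this option (e.g. a pass to a given teammate) will succeed
    return {"pass_to_2": 0.7, "pass_to_5": 0.4, "dribble": 0.55}[option]

def execution_time(option):
    # estimated time (in simulator cycles) needed to carry out the option
    return {"pass_to_2": 12, "pass_to_5": 30, "dribble": 8}[option]

def choose_action(options, time_available):
    # discard options that cannot be completed in time, then take the
    # option the decision tree is most confident about
    feasible = [o for o in options if execution_time(o) <= time_available]
    if not feasible:
        return "clear_ball"          # hypothetical fallback behavior
    return max(feasible, key=tree_confidence)

print(choose_action(["pass_to_2", "pass_to_5", "dribble"], time_available=15))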
Claudia V. Goldman and Jeffrey S. Rosenschein
Abstract
We investigate how agents can learn to become experts, and eventually
organize themselves appropriately for a range of tasks. Our aim is to
look at evolutionary processes that lead to organizations of experts.
The distributed artificial intelligence (DAI) community has dealt with
multiagent systems that organize themselves in order to achieve a specific
shared goal. Various organizations can arise that will effectively
balance the load on the agents and improve their performance.
Here we look at the emergence of an organization as a step that takes
place prior to the execution of a task, and as a general process
applicable to a range of problems in a domain.
To explore these ideas, we designed and implemented a testbed based on
the game of Life. We present experimental results showing different
organizational patterns that can evolve in a multiagent system.
URL: http://www.cs.huji.ac.il/~clag
David C Parkes and Lyle H Ungar
Abstract
The goal of a self-interested agent within a multiagent system is to
maximize its utility over time. In a situation of strategic
interdependence, where the actions of one agent may affect the
utilities of other agents, the optimal behavior of an agent must be
conditioned on the expected behaviors of the other agents in the
system. Standard game theory assumes that the rationality and
preferences of all the agents are common knowledge: each agent is then
able to compute the set of possible equilibria and, if there is a
unique equilibrium, choose a best response to the actions that the
other agents will play.
Real agents acting within a multiagent system face multiple problems:
the agents may have incomplete information about the preferences and
rationality of the other agents in the game, computing the equilibria
can be computationally complex, and there might be many equilibria
from which to choose. An alternative explanation of the emergence of a
stable equilibrium is that it arises as the long-run outcome of a
repeated game, in which boundedly rational agents adapt their strategies
as they learn about the other agents in the system. We review some
possible models of learning for games, and then show the pros and cons
of using learning in a particular game, the Compensation
Mechanism, a mechanism for the efficient coordination of actions
within a multiagent system.
http://www.cis.upenn.edu/~dparkes/final.ps
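One of the standard learning models alluded to above is fictitious play, in
which each agent best-responds to the empirical frequencies of its
opponents' past actions. The two-player payoff matrix below is a made-up
coordination game used only to show the update rule; it is not the
Compensation Mechanism itself.

# payoffs[row_action][col_action] -> (row payoff, column payoff)
payoffs = [[(2, 2), (0, 0)],
           [(0, 0), (1, 1)]]

counts = [[1, 1], [1, 1]]   # counts[player][action]: observed play of the *other* player

def best_response(player):
    # best-respond to the opponent's empirical action frequencies
    total = sum(counts[player])
    freqs = [c / total for c in counts[player]]
    def expected(a):
        if player == 0:
            return sum(freqs[b] * payoffs[a][b][0] for b in range(2))
        return sum(freqs[b] * payoffs[b][a][1] for b in range(2))
    return max(range(2), key=expected)

for t in range(50):
    a0, a1 = best_response(0), best_response(1)
    counts[0][a1] += 1      # player 0 records what player 1 did
    counts[1][a0] += 1
# in this game, repeated best responses settle on the (0, 0) equilibrium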
M V Nagendra Prasad, and Victor R Lesser
Abstract
The work presented here deals with ways to improve the problem-solving
control skills of cooperative agents. We propose situation-specific
learning as a way to learn cooperative control knowledge in complex
multi-agent systems. We demonstrate the power of situation-specific
learning in the context of dynamically choosing a coordination strategy
based on the problem-solving situation. We present a learning system,
called COLLAGE, that uses meta-level information, in the form of an
abstract characterization of the coordination problem instance, to
learn to choose an appropriate coordination strategy from among a
class of strategies.
ftp://dis.cs.umass.edu/pub/coop-learn-aaai97.ps
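Schematically, situation-specific learning of coordination can be cast as
learning a mapping from abstract situation features to the best-performing
strategy. The features, strategy names, and quality model below are
illustrative stand-ins, not COLLAGE's actual representation.

import random
from collections import defaultdict

STRATEGIES = ["tight_coordination", "loose_coordination", "no_coordination"]

def situation_features():
    # abstract characterization of a coordination problem instance (made-up features)
    return (random.choice(["high", "low"]),     # task interdependence
            random.choice(["many", "few"]))     # shared resources

def observed_quality(features, strategy):
    # stand-in for actually running the problem solvers: tight coordination
    # pays off only when interdependence is high
    good = (features[0] == "high") == (strategy == "tight_coordination")
    return (0.9 if good else 0.4) + random.gauss(0.0, 0.05)

stats = defaultdict(lambda: [0.0, 0])   # (features, strategy) -> [mean quality, count]
for episode in range(2000):
    f = situation_features()
    if random.random() < 0.1:
        s = random.choice(STRATEGIES)                        # occasional exploration
    else:
        s = max(STRATEGIES, key=lambda c: stats[(f, c)][0])  # best known strategy here
    q = observed_quality(f, s)
    mean, n = stats[(f, s)]
    stats[(f, s)] = [mean + (q - mean) / (n + 1), n + 1]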
Maria Gordin, Narendra Puppala, and Sandip Sen
Abstract
Multi-agent systems require coordination of sources with distinct
expertise to perform complex tasks effectively. In this paper, we use
a co-evolutionary approach in which genetic algorithms (GAs) evolve
multiple individuals that can effectively cooperate to solve a common
problem. We concurrently run a GA for each individual in the
group. We experiment with a room-painting domain which requires the
cooperation of two agents. We have used two mechanisms for evaluating
an individual in one population: (a) pair it randomly with members
of the other population, or (b) pair it with members of the other
population stored in a shared memory containing the best pairs found
so far. Both approaches succeed in generating optimal behavior
patterns; however, our preliminary results show a slight edge for
the shared-memory approach.
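The two evaluation mechanisms can be sketched as below. The fitness
function, the stand-in for the room-painting task, and the memory size are
assumptions; only the pairing logic corresponds to mechanisms (a) and (b).

import random

def fitness(a, b):
    # stand-in for evaluating how well two agents paint the room together;
    # here cooperation works best when the two (scalar) behaviors are complementary
    return 1.0 - abs((a + b) - 1.0)

def evaluate(individual, other_population, shared_memory, use_shared_memory):
    if use_shared_memory and shared_memory:
        # (b) pair with partners from the memory of best pairs found so far
        partners = [p for _, _, p in shared_memory]
    else:
        # (a) pair randomly with members of the other population
        partners = random.sample(other_population, min(3, len(other_population)))
    best = max(partners, key=lambda p: fitness(individual, p))
    score = fitness(individual, best)
    shared_memory.append((score, individual, best))
    shared_memory.sort(reverse=True)
    del shared_memory[5:]            # keep only the top pairs
    return score

pop_a = [random.random() for _ in range(10)]
pop_b = [random.random() for _ in range(10)]
memory = []
scores = [evaluate(x, pop_b, memory, use_shared_memory=True) for x in pop_a]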
Sandip Sen and Neeraj Arora
Abstract
Agents that learn about other agents and can exploit this information possess a distinct
advantage in competitive situations. Games provide stylized adversarial environments to
study agent learning strategies. Researchers have developed game playing programs that
learn to play better from experience. We have developed a learning program that does not
learn to play better, but learns to identify and exploit the weaknesses of a particular
opponent by repeatedly playing it over several games. We propose a scheme for learning
opponent action probabilities and a utility maximization framework that exploits this
learned opponent model. We show that the proposed expected utility maximization strategy
generalizes the traditional maximin strategy, and allows players to benefit by taking
calculated risks that are avoided by the maximin strategy. Experiments in the popular
board game of Connect-4 show that a learning player consistently outperforms a
non-learning player when pitted against another automated player using a weaker
heuristic. Though our proposed mechanism does not improve the skill level of a computer
player, it does allow the player to play more effectively against a weaker opponent.
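The contrast between maximin and the learned-model strategy can be shown on
a small payoff matrix. The numbers and the frequency-count opponent model
are illustrative assumptions, not the paper's Connect-4 model.

# my_payoff[my_move][opponent_move]; a hypothetical 2x2 game
my_payoff = [[ 1, -4],
             [ 0,  0]]

opponent_counts = [9, 1]   # learned from past games: opponent plays move 0 most of the time

def maximin_move():
    # assume the worst-case reply to each of my moves
    return max(range(2), key=lambda m: min(my_payoff[m]))

def expected_utility_move():
    total = sum(opponent_counts)
    probs = [c / total for c in opponent_counts]
    return max(range(2), key=lambda m: sum(p * v for p, v in zip(probs, my_payoff[m])))

print(maximin_move())          # 1: the safe move, guaranteed payoff 0
print(expected_utility_move()) # 0: the calculated risk, expected payoff 0.9*1 + 0.1*(-4) = 0.5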
Thomas Haynes
Abstract
We have integrated the distributed search of genetic programming-based
systems with a collective memory to form a collective adaptation search
method. Such a system significantly improves search as problem
complexity increases. However, there is still considerable scope
for improvement. In collective adaptation, search agents gather
knowledge of their environment and deposit it in a central information
repository. Process agents are then able to manipulate that focused
knowledge, exploiting the exploration of the search agents. We
examine the utility of increasing the capabilities of the centralized
process agents.
Caroline Claus and Craig Boutilier
Abstract
Reinforcement learning can provide a robust and natural means
for agents to learn how to coordinate their action choices in
cooperative multiagent systems. We examine some of the factors
that can influence the dynamics of the learning process in such
a setting. We first distinguish reinforcement learners that are
unaware of (or ignore) the presence of other agents from those
that explicitly attempt to learn the value of joint actions and
the strategies of their counterparts. We study Q-learning in
cooperative multiagent systems under these two perspectives, focusing
on the influence of partial action observability, game structure, and
exploration strategies on convergence to (optimal and suboptimal)
Nash equilibria and on learned Q-values. We also consider variants
of the usual exploration strategies that can induce convergence to
optimal equilibria in cases where they might not otherwise be
attained.
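The two kinds of learners can be made concrete in a repeated two-agent,
single-stage cooperative game. The game matrix and constants below are
illustrative (similar in spirit to the coordination games studied in this
setting, but not necessarily the paper's exact experiments); the point of
the sketch is only the shape of the two update rules.

import random

# a made-up two-action cooperative game: both agents receive the same reward
reward = [[10, 0],
          [ 0, 5]]
alpha, epsilon = 0.1, 0.1

Q_ind = [[0.0, 0.0], [0.0, 0.0]]     # independent learners: Q over own actions only
Q_joint = [[0.0, 0.0], [0.0, 0.0]]   # joint-action learner (agent 0): Q over joint actions
counts_1 = [1, 1]                    # agent 0's empirical model of agent 1's play

def eps_greedy(values):
    if random.random() < epsilon:
        return random.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])

for t in range(5000):
    # actions are chosen here with the independent Q-values
    a0, a1 = eps_greedy(Q_ind[0]), eps_greedy(Q_ind[1])
    r = reward[a0][a1]
    # independent learners treat the other agent as part of the environment
    Q_ind[0][a0] += alpha * (r - Q_ind[0][a0])
    Q_ind[1][a1] += alpha * (r - Q_ind[1][a1])
    # the joint-action learner learns Q(a0, a1) plus a model of the other agent
    Q_joint[a0][a1] += alpha * (r - Q_joint[a0][a1])
    counts_1[a1] += 1

# the joint-action learner values its own actions by averaging over its model of agent 1
total = sum(counts_1)
ev = [sum(counts_1[b] / total * Q_joint[a][b] for b in range(2)) for a in range(2)]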