IJCAI-95 Workshop on Adaptation and Learning in Multiagent Systems

Sandip Sen

Background and motivations for the workshop

Research in the area of multiagent systems has produced techniques for allowing multiple agents sharing common resources to coordinate their actions so that individually rational actions do not adversely affect overall system efficiency (Bond & Gasser, 1988). Whereas previous research efforts looked at the offline design of agent organizations, behavioral rules, negotiation protocols, etc., it was recognized that agents operating in open, dynamic environments must be able to adapt flexibly to changing demands and opportunities (Lesser, 1995). In particular, individual agents are forced to engage with other agents of varying goals, abilities, composition, and lifespan. To effectively exploit the opportunities presented and avoid pitfalls, agents need to learn about other agents and adapt their local behavior based on group composition and dynamics. Though standard supervised, unsupervised, and reinforcement learning techniques can serve as starting points for exploring effective learning techniques in multiagent settings, they need to be augmented to match environmental demands and agent characteristics. For example, multiple agents learning at the same time present unique challenges for learning and adaptation techniques.

The goal of this workshop was to focus on research addressing the unique requirements of agents that must learn and adapt to work in the presence of other agents. The primary research issues we wanted authors to address were recognizing the applicability and limitations of current machine learning research as applied to multiagent problems, and developing new learning and adaptation mechanisms targeted specifically at this class of problems. The call for papers particularly welcomed new insights into this class of problems from related disciplines and emphasized the interdisciplinary nature of the workshop; papers from a range of perspectives were sought and received.

The need for learning in multiagent environments has recently been recognized in both the distributed AI and machine learning communities. As mentioned above, there is ample motivation to study coordination mechanisms that allow agents to incrementally build models of the environment and of other agents, and that enable them to perform more effectively in a dynamic environment than they could using static coordination knowledge. The organizers, contributors, and participants felt that the timing of the workshop was right for focusing discussion on research issues that will enable multiagent systems researchers to develop useful applications over the next few years.

The workshop was motivated by the above concerns, and about 45 attendees brought their own unique perspectives to bear on these engaging and critical issues. The workshop schedule consisted of 10 oral and 6 poster presentations. The oral sessions consisted, with one exception, of 3 presentations on a common theme, followed by a panel discussion in which core issues were raised and relations between the different approaches were analyzed. Audience participation was exemplary, and this contributed to the overall success of the workshop as measured by participant satisfaction.

Workshop sessions

In the first session, David Carmel and Shaul Markovitch discussed an approach to modeling opponents with an eye to developing optimal interaction strategies. They assume that agents' strategies can be modeled as finite automata. Their paper presents both a heuristic algorithm for inferring an agent's model from its input/output behavior and a method for finding an optimal interaction strategy against the inferred model. This work builds on Angluin's work (Angluin, 1978) on inferring an automaton model with an oracle that answers membership and equivalence queries. Goldman and Rosenschein discussed a more cooperative scenario, in which two agents mutually supervise each other to evolve better coordination. In their work, agents exchange labeled samples, with or without the intervention of a mediator, and try to infer an approximately correct description of the other agent's behavior from these samples. The probabilistic concept learning schemes used in the paper are motivated by the model of Kearns and Schapire (Kearns & Schapire, 1990). The paper by Haynes and Sen addressed the issue of evolving coordination strategies for a group of agents using the genetic programming paradigm. They experiment with the well-known predator-prey domain (Benda et al., 1985), in which four predator agents try to surround and capture a prey moving on a toroidal grid world. They also address the interesting issue of co-evolving both cooperative and antagonistic agents.
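To make the opponent-modeling setting concrete, the following sketch (in Python, purely illustrative) represents an opponent as a small deterministic finite automaton, here a tit-for-tat-like strategy over a cooperate/defect alphabet, and infers a crude response table from observed interaction history. The strategy, action alphabet, and inference routine are assumptions chosen for illustration; they are not the heuristic inference algorithm or the query-based method described in the papers.

```python
# Illustrative sketch only: an opponent represented as a small deterministic
# finite automaton (a tit-for-tat-like strategy), plus a naive routine that
# infers a response table from observed interaction history. The strategy,
# alphabet, and inference step are assumptions, not the workshop algorithms.

COOPERATE, DEFECT = "C", "D"

class AutomatonOpponent:
    """Tit-for-tat as a two-state automaton: the state is the last action seen."""
    def __init__(self):
        self.state = COOPERATE           # start by cooperating

    def act(self):
        return self.state                # output depends only on the current state

    def observe(self, their_action):
        self.state = their_action        # transition on the observed action

def infer_model(history):
    """Infer a response table from a list of (our_action, opponent_action) pairs.

    Maps our action in round t to the opponent's action in round t+1 --
    a crude stand-in for full automaton inference with queries.
    """
    model = {}
    for (our_prev, _), (_, their_next) in zip(history, history[1:]):
        model[our_prev] = their_next
    return model

if __name__ == "__main__":
    opponent = AutomatonOpponent()
    history = []
    for ours in [COOPERATE, DEFECT, DEFECT, COOPERATE]:
        history.append((ours, opponent.act()))
        opponent.observe(ours)
    print(infer_model(history))          # {'C': 'C', 'D': 'D'} for tit-for-tat
```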

The second session started with Ciara Byrne and Peter Edwards' paper on refining the knowledge bases of individual group members to improve the effectiveness of the entire group. They use a refinement facilitator agent that employs KQML messages to coordinate refinements that benefit the group. Nagendra Prasad, Lesser, and Lander's paper deals with agents learning about their roles in an organization and about the local and joint search spaces in group decision making. They use several supervised learning schemes, including a form of instance-based learning (Aha et al., 1991), to build a group of agents that learn to design artifacts effectively. They conclude that although learning by itself does not allow the agents to produce the same solution quality as direct negotiation, it does provide significant savings in communication cost. Sen and Sekaran addressed the dilemma an agent faces in deciding whether or not to help another agent in the environment. They showed that agents using a probabilistic reciprocity mechanism can form stable groups that perform at the optimal level. This suggests interesting possibilities for designing agent societies in which optimal system performance can be obtained even though individual agents are self-motivated (which is more representative of open systems than assuming that all agents are cooperative or benevolent by design).
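The flavor of a probabilistic reciprocity mechanism can be conveyed with a small sketch: an agent keeps a running balance of help given to and received from each peer, and agrees to help with a probability that shrinks as that balance tips against it. The sigmoid form and the parameters beta and tau below are assumptions made for illustration, not the exact formulation from Sen and Sekaran's paper.

```python
# Illustrative sketch of a probabilistic reciprocity rule (assumed form, not
# necessarily the mechanism from the workshop paper): an agent tracks, for
# each peer, the net cost of help given minus help received, and agrees to
# help with a probability that falls as that net cost grows.
import math
import random
from collections import defaultdict

class ReciprocalAgent:
    def __init__(self, beta=1.0, tau=0.0):
        self.balance = defaultdict(float)  # peer -> cost given minus cost received
        self.beta = beta                   # steepness of the decision curve
        self.tau = tau                     # tolerated net cost before reluctance

    def help_probability(self, peer, cost):
        # Sigmoid in (projected balance - tolerance): generous toward peers
        # that have helped us more than we have helped them.
        x = self.balance[peer] + cost - self.tau
        return 1.0 / (1.0 + math.exp(self.beta * x))

    def decide_to_help(self, peer, cost):
        if random.random() < self.help_probability(peer, cost):
            self.balance[peer] += cost     # record the favor we are granting
            return True
        return False

    def received_help(self, peer, cost):
        self.balance[peer] -= cost         # the peer has earned goodwill

# Example: each favor granted to a free rider lowers future willingness to help.
agent = ReciprocalAgent()
for _ in range(5):
    print(round(agent.help_probability("freeloader", cost=1.0), 2))
    agent.balance["freeloader"] += 1.0     # assume the favor was granted
```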

In a one-of-a-kind paper, Larry Glicoes, Rich Staats, and Michael Huhns described their design of an intelligent-agent-based distribution system for the US Department of Defense to move personnel, equipment, and supplies. A system of static and mobile agents uses historical data and real-time data communicated via satellite to push shipments through to meet deadlines. The agents must learn to adjust their preferences for other agents, as well as for modes of transportation, under different system and environmental conditions so that efficient transportation of personnel and goods is achieved both for routine operations and for unforeseen contingencies.

The last session of oral presentations involved multiagent systems utilizing reinforcement learning techniques. The first paper in the group, by Sen and Sekaran, evaluated the classifier system approach based on genetic algorithms (Holland, 1986) and found it to be at least as effective as the more popular Q-learning approach (Watkins, 1989) on domains with varying agent coupling and feedback delays. In this work, the authors assume that agents learn from environmental feedback only and are not even aware of the presence of other agents. These assumptions, together with the fact that multiple agents are learning concurrently, make it very difficult for individual agents to find optimal policies even after repeated interactions. The experiments presented, however, show that performance very close to optimal can be achieved under certain assumptions about agent coupling and feedback delays. Tuomas Sandholm and Robert Crites investigated the use of the Q-learning algorithm in the Iterated Prisoner's Dilemma game. The learning agent was able to develop optimal strategies against opponents with static strategies, but when both players were learning concurrently, the learners were less effective. These two papers highlight the problem posed to traditional machine learning approaches by the non-stationarity of the environment created by concurrent learning by multiple agents. Maja Mataric also stressed the inadequacy of assumptions made in the traditional reinforcement learning literature when an agent must cope with a real world that has noisy perception and action and inconsistent reinforcement, particularly in the presence of other agents. She argued for the effective use of existing domain knowledge for designing heterogeneous reward functions and goal-specific progress estimators to speed up reinforcement learning in situated domains. Her presentation also included a video of groups of robots learning to solve cooperative tasks.
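The non-stationarity problem highlighted by the Sandholm and Crites work can be seen in a minimal sketch: two independent tabular Q-learners play the Iterated Prisoner's Dilemma against each other, each conditioning only on the joint actions of the previous round. The payoff matrix, learning rate, discount factor, and epsilon-greedy exploration below are standard textbook choices assumed for illustration, not the settings used in the paper; because each learner's environment includes the other, still-changing learner, neither faces a stationary learning problem.

```python
# Minimal sketch: two independent tabular Q-learners playing the Iterated
# Prisoner's Dilemma. Payoffs, learning rate, discount, and exploration are
# standard assumed values, not those used in the workshop paper.
import random
from collections import defaultdict

ACTIONS = ["C", "D"]
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def choose(Q, state):
    if random.random() < EPSILON:
        return random.choice(ACTIONS)      # explore
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(Q, state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

Q1, Q2 = defaultdict(float), defaultdict(float)
state = ("C", "C")                          # state = joint actions of the previous round
for _ in range(50000):
    a1, a2 = choose(Q1, state), choose(Q2, state[::-1])
    r1, r2 = PAYOFF[(a1, a2)]
    next_state = (a1, a2)
    # Each learner updates as if the other were part of a fixed environment,
    # but both Q-tables keep shifting, so neither faces a stationary problem.
    update(Q1, state, a1, r1, next_state)
    update(Q2, state[::-1], a2, r2, next_state[::-1])
    state = next_state

print({s: max(ACTIONS, key=lambda a: Q1[(s, a)]) for s in PAYOFF})
```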

The poster session in the early afternoon was informal but informative and provided ample opportunities for attendees to discuss mutual interests and ideas. Pan Gu and Anthony Maddox's poster presented a distributed reinforcement learning framework (DRLM) in which agents share experience and provide feedback to peers. DRLM is used by distributed agents in a real-time environment to process interrelated tasks. Anupam Joshi presented a scientific computing scenario, the PYTHIA project, in which agents use both supervised and unsupervised (using epistemic utility theory) learning mechanisms. Two noteworthy aspects of the paper were a multiagent extension to the previously existing single-agent system and a characterization of when agents in the PYTHIA system should or should not use learning mechanisms. Britta Lenzmann and Ipke Wachsmuth presented an application of the VIENA (Virtual Environments and Agents) system in which agents learn user preferences for a 3D environment from direct feedback. The overall behavior of the system is determined by how the agents, representing different perspectives on the environment, organize themselves based on feedback from the user. Yishay Mor, Claudia Goldman, and Jeff Rosenschein's poster analyzed the complexity of learning an opponent's model in game-theoretic negotiations. Although learning the best response to an opponent's static strategy, using a finite automata model, can take exponential time, they found a polynomial-time learning algorithm for a restricted class of simple automata. Takuya Ohko, Kazuo Hiraki, and Yuichiro Anzai's poster presented the LEMMING learning system, which reduces communication cost in the Contract-Net Protocol (Smith, 1980), used for task allocation in multiagent systems. Using case-based reasoning (Kolodner, 1993), the LEMMING system learns to send information selectively to relevant agents and thus avoids the wasted communication cost of broadcasting. Andrea Schaerf, Yoav Shoham, and Moshe Tennenholtz's poster investigated a loosely coupled system in which agents concurrently adapt to each other and to a changing environment. The paper analyzed the effects of adaptive behavior parameters and of communication on system efficiency when a group of reinforcement learning agents try to balance the load in a distributed system.

The workshop concluded on a positive note, with the attendees voicing the need for similar workshops in the future. A significant portion of the attendees expressed a desire to attend the 1996 AAAI Spring Symposium on Adaptation, Co-evolution and Learning in Multiagent Systems, to be held at Stanford University, March 25-27, 1996.

Revised versions of selected papers from the workshop, as well as additional material (including an introductory chapter and an extensive bibliography of work in the area), will be published by Springer-Verlag in their Lecture Notes in Computer Science series. The volume is edited by Gerhard Weiss and Sandip Sen and is scheduled to appear in the spring of 1996. The schedule of the workshop as well as abstracts of the presented papers can be accessed on the web at the following address: http://euler.mcs.utulsa.edu/~sandip/wshop/schedule.html.

What's next?

The workshop helped bring into focus several key issues in multiagent learning research. The following list presents some of the issues we need to understand better before significant progress can be made in this nascent area of research:
Individual versus cooperative learning:
Agents can either individually try to model others, using their own experience and perception and with the purpose of personal gain, or they may actively share information and participate in constructing a group model and plan of activities that benefits the entire group. Very different learning schemes will be suited to each of these two learning modes.
Concurrent versus staggered learning:
The number of agents learning/adapting at the same time will influence the rate of convergence of the learning processes used by individual agents.
Agent interactions:
Agents can interact frequently or infrequently; their interactions can be regulated and anticipated (as in a fixed organization) or completely unpredictable (as in open systems). The flux in agent groups and the length of the period over which agents interact determine how effectively agents can adapt to others.
Agent relationships:
Some agents may have more control than others over group activities or shared resources, and can force situations that aid their own learning process.
Agent modeling:
Assumptions about the behavioral complexity of other agents, or about limitations in their cognitive abilities, will constrain what an agent can learn or suggest which learning schemes it should use.
Environmental feedback:
The rate and nature of environmental feedback are key to the kind of learning mechanisms that can be used.

References