The goal of this workshop was to focus on research addressing the unique requirements of agents that learn and adapt while working in the presence of other agents. The primary research issues we wanted authors to address were recognizing the applicability and limitations of current machine learning research when applied to multiagent problems, and developing new learning and adaptation mechanisms targeted specifically at this class of problems. The call for papers particularly welcomed new insights into this class of problems from related disciplines, and emphasized the interdisciplinary nature of the workshop. Papers addressing these issues from a variety of perspectives were sought and received.
The workshop was motivated by the above concerns, and its roughly 45 attendees brought their own unique perspectives to bear on these engaging and critical issues. The workshop schedule consisted of 10 oral and 6 poster presentations. With one exception, each oral session consisted of 3 presentations on a common theme followed by a panel discussion in which core issues were raised and the relationships between the different approaches were analyzed. Audience participation was exemplary and contributed greatly to the overall success of the workshop, as measured by participant satisfaction.
The second session started with Ciara Byrne and Peter Edwards' paper on refining the knowledge bases of individual group members to improve the effectiveness of the entire group. They utilize a refinement facilitator agent that uses KQML messages to coordinate refinements that benefit the group. Nagendra Prasad, Lesser, and Lander's paper deals with agents learning about their roles in an organization and about the local and joint search spaces in group decision making. They use different supervised learning schemes, including a form of instance-based learning (Aha et al., 1991), to build a group of agents that learn to effectively design artifacts. They conclude that even though learning by itself does not allow the agents to produce the same solution quality as direct negotiation, it does provide significant savings in communication cost. Sen and Sekaran addressed the dilemma an agent faces in deciding whether or not to help another agent in the environment. They showed that agents using a probabilistic reciprocity mechanism can form stable groups that perform at the optimal level. This suggests interesting possibilities for designing agent societies in which optimal system performance can be obtained even though individual agents are self-motivated, a setting more representative of open systems than one in which all agents are assumed cooperative or benevolent by design.
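To make the reciprocity idea concrete, the following is a minimal sketch of how a probabilistic reciprocity decision rule could look: an agent helps a requester with a probability that grows with the balance of past help received from that requester and shrinks with the cost of the requested favor. The sigmoid form and the beta and tau parameters here are illustrative assumptions, not the exact formulation used in the paper.

```python
import math
import random
from collections import defaultdict


class ReciprocalAgent:
    """Illustrative sketch of a probabilistic reciprocity decision rule.

    balance[j] tracks (help received from j) - (help given to j); a
    positive balance makes this agent more willing to incur cost for j.
    The sigmoid form and the beta/tau parameters are assumptions made
    for illustration only.
    """

    def __init__(self, beta=1.0, tau=2.0):
        self.beta = beta          # weight on the accumulated balance
        self.tau = tau            # temperature: higher = more random decisions
        self.balance = defaultdict(float)

    def prob_help(self, requester, cost):
        # Helping becomes less likely as the cost of the favor rises and
        # more likely as the requester's past help to this agent grows.
        x = (self.beta * self.balance[requester] - cost) / self.tau
        return 1.0 / (1.0 + math.exp(-x))

    def decide(self, requester, cost, rng):
        help_out = rng.random() < self.prob_help(requester, cost)
        if help_out:
            self.balance[requester] -= cost   # cost invested in the relationship
        return help_out

    def record_help_received(self, helper, saving):
        self.balance[helper] += saving        # favors received raise the balance


if __name__ == "__main__":
    rng = random.Random(0)
    agent = ReciprocalAgent()
    agent.record_help_received("b", 3.0)
    print(agent.prob_help("b", cost=1.0))   # more likely to help a past helper
    print(agent.prob_help("c", cost=1.0))   # stranger: closer to the base rate
```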
In a one-of-a-kind paper, Larry Glicoes, Rich Staats, and Michael Huhns described their design of an intelligent-agent-based distribution system for the US Department of Defense to move personnel, equipment, and supplies. A system of static and mobile agents uses historical data and real-time data communicated via satellites to push shipments through to meet deadlines. The agents must learn to adjust their preferences for other agents, as well as for modes of transportation, under different system and environmental conditions so that personnel and goods are transported efficiently both in routine operations and in unforeseen contingencies.
The last session of oral presentations involved multiagent systems utilizing reinforcement learning techniques. The first paper in the group, by Sen and Sekaran, evaluated the classifier system approach based on genetic algorithms (Holland, 1986) and found it to be at least as effective as the more popular Q-learning approach (Watkins, 1989) on domains with varying agent coupling and feedback delays. In this work, the authors assume that agents learn from environmental feedback only and are not even aware of the presence of other agents. These assumptions, together with the fact that multiple agents are learning concurrently, make it very difficult for individual agents to find optimal policies even after repeated interactions. The experiments presented, however, show that near-optimal performance can be produced under certain assumptions about agent coupling and feedback delays. Tuomas Sandholm and Robert Crites investigated the use of the Q-learning algorithm in the Iterated Prisoner's Dilemma game (sketched below). The learning agent was able to develop optimal strategies against opponents with static strategies, but when both players were learning concurrently, the learners were less effective. These two papers highlight the problem that the non-stationary environment created by concurrent learning of multiple agents poses for traditional machine learning approaches. Maja Mataric also stressed the inadequacy of the assumptions made in the traditional reinforcement learning literature when an agent must cope with a real world characterized by noisy perception and action and inconsistent reinforcement, particularly in the presence of other agents. She argued for the effective use of existing domain knowledge in designing heterogeneous reward functions and goal-specific progress estimators to speed up reinforcement learning in situated domains. Her presentation also included a video of groups of robots learning to solve cooperative tasks.
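The Sandholm and Crites setting can be illustrated with a small experiment: a tabular Q-learning agent playing the Iterated Prisoner's Dilemma against a fixed Tit-for-Tat opponent, with the state given by the previous round's pair of moves. This is only a sketch under standard Prisoner's Dilemma payoffs; the state encoding, exploration scheme, and parameters are assumptions rather than the authors' actual experimental setup.

```python
import random

# Standard Prisoner's Dilemma payoffs for the learner: C = cooperate, D = defect.
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}
ACTIONS = ['C', 'D']


def q_learning_vs_tit_for_tat(steps=20000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning in the Iterated Prisoner's Dilemma against a fixed
    Tit-for-Tat opponent. The state is the pair of moves (learner, opponent)
    played on the previous round. Illustrative sketch only."""
    rng = random.Random(seed)
    q = {}
    state = ('C', 'C')           # assume mutual cooperation before the first round
    for _ in range(steps):
        # epsilon-greedy action selection
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        opp_action = state[0]    # Tit-for-Tat repeats the learner's previous move
        reward = PAYOFF[(action, opp_action)]
        next_state = (action, opp_action)
        best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
        state = next_state
    return q


if __name__ == "__main__":
    q = q_learning_vs_tit_for_tat()
    for s in [('C', 'C'), ('C', 'D'), ('D', 'C'), ('D', 'D')]:
        greedy = max(ACTIONS, key=lambda a: q.get((s, a), 0.0))
        print(s, "->", greedy)
```

Against a static opponent such as Tit-for-Tat the environment is stationary, so the usual Q-learning convergence reasoning applies; when the opponent is itself learning, the effective environment changes over time, which is exactly the non-stationarity problem the session highlighted.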
The poster session in the early afternoon was informal but informative, and provided ample opportunities for attendees to discuss mutual interests and ideas. Pan Gu and Anthony Maddox's poster presented a distributed reinforcement learning framework (DRLM) in which agents share experience and provide feedback to peers. DRLM is used by distributed agents to process interrelated tasks in a real-time environment. Anupam Joshi presented a scientific computing scenario, based on the PYTHIA project, in which agents use both supervised and unsupervised (using epistemic utility theory) learning mechanisms. Two noteworthy aspects of the paper were a multiagent extension of the previously existing single-agent system and a characterization of when agents in the PYTHIA system should or should not use learning mechanisms. Britta Lenzmann and Ipke Wachsmuth presented an application of the VIENA (Virtual Environments and Agents) system in which agents learn user preferences for a 3D environment from direct feedback. The overall behavior of the system is determined by how the agents, representing different perspectives of the environment, organize themselves based on feedback from the user. Yishay Mor, Claudia Goldman, and Jeff Rosenschein's poster analyzed the complexity of learning an opponent's model in game-theoretic negotiations. Even though learning the best response to a static opponent strategy, modeled as a finite automaton, can take exponential time, a polynomial-time learning algorithm was found for a restricted class of simple automata. Takuya Ohko, Kazuo Hiraki, and Yuichiro Anzai's poster presented the LEMMING learning system, which reduces communication cost in the Contract-Net Protocol (Smith, 1980) used for task allocation in multiagent systems. Using case-based reasoning (Kolodner, 1993), LEMMING learns to send task announcements selectively to relevant agents, thus avoiding the wasted communication cost of broadcasting (a schematic sketch of this idea appears below). Andrea Schaerf, Yoav Shoham, and Moshe Tennenholtz's poster investigated a loosely coupled system in which agents concurrently adapt to each other and to a changing environment. The paper analyzed the effects of adaptive behavior parameters and of communication on system efficiency when a group of reinforcement learning agents try to balance the load in a distributed system.
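The selective-addressing idea behind LEMMING can be sketched as follows: rather than broadcasting every task announcement, as plain Contract-Net would, the manager remembers which agents were awarded similar past tasks and announces new tasks only to them. The feature-overlap similarity measure and the fallback-to-broadcast rule below are illustrative assumptions, not the actual case-based reasoning machinery of LEMMING.

```python
class SelectiveAnnouncer:
    """Sketch: remember which agents won similar past tasks and announce
    new tasks only to them, instead of broadcasting to every agent.
    The Jaccard similarity and fallback rule are illustrative assumptions."""

    def __init__(self, all_agents, threshold=0.5):
        self.all_agents = list(all_agents)
        self.threshold = threshold
        self.cases = []               # (task feature set, winning agent) pairs

    @staticmethod
    def similarity(f1, f2):
        # Jaccard overlap between two sets of task features.
        if not f1 and not f2:
            return 1.0
        return len(f1 & f2) / len(f1 | f2)

    def recipients(self, task_features):
        # Announce only to agents that won sufficiently similar past tasks;
        # fall back to broadcast when no stored case is close enough.
        matched = {agent for feats, agent in self.cases
                   if self.similarity(feats, task_features) >= self.threshold}
        return sorted(matched) if matched else self.all_agents

    def record_award(self, task_features, agent):
        self.cases.append((frozenset(task_features), agent))


if __name__ == "__main__":
    announcer = SelectiveAnnouncer(["a1", "a2", "a3", "a4"])
    announcer.record_award({"transport", "heavy"}, "a2")
    print(announcer.recipients({"transport", "heavy", "urgent"}))  # ['a2']
    print(announcer.recipients({"compute"}))                       # broadcast fallback
```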
The workshop concluded on a positive note, with the attendees voicing the need for similar workshops to be held in the future. A significant portion of the attendees expressed a desire to attend the 1996 AAAI Spring Symposium on Adaptation, Co-evolution and Learning in Multiagent Systems, to be held at Stanford University, March 25-27, 1996.
Revised versions of selected papers from the workshop, as well as additional material (including an introductory chapter and an extensive bibliography of work in the area), will be published by Springer-Verlag in their Lecture Notes in Computer Science series. The volume is edited by Gerhard Weiss and Sandip Sen and is scheduled to appear in the spring of 1996. The schedule of the workshop as well as abstracts of the presented papers can be accessed on the web at the following address: http://euler.mcs.utulsa.edu/~sandip/wshop/schedule.html.