To appear in Proc. of PRICAI'96, Cairns, Australia, pp. 570-579 (Aug. 1996)

Learning Cooperative Behavior in Multi-agent Environment
- a case study of choice of play-plans in soccer -

NODA Itsuki, MATSUBARA Hitoshi and HIRAKI Kazuo
Electrotechnical Laboratory, Tsukuba, Ibaraki 305, Japan

Abstract. Soccer, association football, is a typical team game, and is considered a standard problem for multi-agent systems and cooperative computation. We are developing Soccer Server, a simulator of soccer, which provides a common test-bench for evaluating various multi-agent systems and cooperative algorithms. We are working on learning cooperative behavior in a multi-agent environment using the server. In this article, we report the results of a case study of learning the selection of play-plans in a multi-agent environment.

Keywords: Multi-agent System, Machine Learning, Neural Networks

1 Introduction

Soccer, association football, is a typical team game in which each player is required to play cooperatively. Moreover, soccer is a real-time game in which the situation changes dynamically. Because of these features, soccer can be considered a standard problem for multi-agent systems and cooperative algorithms. We have developed Soccer Server, a simulator of soccer games, to provide a common test-bench for evaluating various multi-agent systems and cooperative algorithms. Using the server, a team of players written in any programming system that supports UDP/IP can play a match against a team written in another kind of system. We are working on learning cooperative behavior in a multi-agent environment using the server. In this article, we show a result of learning the selection of play-plans in a simple two-on-one situation in front of a goal on the server.

2 Soccer Game as a Standard Problem

From the standpoint of multi-agent systems, soccer (association football), which is just one of many typical team sports, makes a good example of a moderately abstracted real-world problem. Multi-agent systems provide us with research subjects such as cooperation protocols by distributed control and effective communication, while offering the following advantages:

- efficiency of cooperation
- adaptation
- robustness
- real-time operation

Soccer has the following characteristics:

- Robustness is more important than elaboration. A team should take fail-safe strategies, because play-plans and strategies may fail by accident.
- Adaptability is required for dynamically changing plans according to the operations of the opposing team.
- Team play is advantageous.
- A match is uninterrupted. Each player is required to plan in real-time, because it is difficult to make a whole plan before play.
- As precise communication by language cannot be expected, effectiveness must be achieved by combining simple signals with the situation.

These characteristics show that soccer is an appropriate example for the evaluation of multi-agent systems. Various examples have been used to evaluate the performance of multi-agent systems, but such examples are simple and small. Soccer is complex and large, so it is suitable for evaluating systems at a large scale. Moreover, because soccer has a long history and is familiar to many people, there are many techniques and strategies that may also be useful for multi-agent systems. Recently, soccer has been used as an example on real robots as well as in software simulators [1, 4, 3, 5]. Many of those experiments, however, have their own experimental settings, which makes it difficult to compare them. To satisfy the need for a common setting, we are developing Soccer Server [2]. Adopting this soccer server as a common setting makes it possible to compare various algorithms for multi-agent systems in the form of a game. Soccer Server is being used by many researchers, and was chosen as the official simulator for RoboCup-97, an international competition of robotic soccer held at IJCAI-97 in Nagoya.

3 Soccer Server

Soccer Server provides a virtual soccer field, in which players controlled by clients run and kick a ball. Figure 1 and Figure 2 show its window images. Soccer Server consists of 3 modules: a field simulator module, a referee module and a message-board module (Figure 3). The field simulator module calculates the movements of objects on the field and checks collisions among them. The referee module controls the game according to the rules. The message-board module manages communication among the clients. A client connects to the server by a UDP socket. The server assigns a player to the client. Using the socket, the client sends commands to control its player and receives information from the player's sensors. Basically, each client can control only one player*, so that the server connects with 22 clients at most**. All communication between the server and each client is done using ASCII strings.

* Technically, it is easy to cheat the server, so this is a gentleman's agreement.
** In order to test various kinds of systems, we may permit a client to control multiple players when the control modules of the players are separated logically from each other within the client.

3.1 Simulator

The soccer field and all objects on it are 2-dimensional. The size of the field is decided according to the official rules of human soccer: the length is 105 and the width is 68 (in meters). The width of the goals is doubled to 14.64, because 2-dimensional simulation makes it difficult to score goals. The simulation of movement is simplified: movements of objects are simulated stepwise, one by one. Noise is added to each movement according to the speed of the object. If an object overlaps another object after its movement, that is, collides with it, the object is moved back until it does not overlap other objects. Then its velocity is multiplied by -0.1.
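As a minimal sketch of this stepwise simulation (the object representation, the noise model and the collision resolution below are our own simplifications for illustration, not the server's actual code):

```python
import random

class SimObject:
    """A 2-dimensional object on the field: position, velocity and a radius."""
    def __init__(self, x, y, vx=0.0, vy=0.0, size=1.0):
        self.x, self.y = x, y
        self.vx, self.vy = vx, vy
        self.size = size

def overlaps(a, b):
    """True if two circular objects overlap."""
    return ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5 < a.size + b.size

def step(obj, others, noise_rate=0.05):
    """Advance one object by one simulation step."""
    # Noise is added in proportion to the object's speed (assumed noise model).
    obj.x += obj.vx * (1.0 + random.uniform(-noise_rate, noise_rate))
    obj.y += obj.vy * (1.0 + random.uniform(-noise_rate, noise_rate))
    if any(overlaps(obj, other) for other in others):
        # Collision: move back (here simply undoing the whole step) ...
        obj.x -= obj.vx
        obj.y -= obj.vy
        # ... and multiply the velocity by -0.1, as described above.
        obj.vx *= -0.1
        obj.vy *= -0.1
```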

3.2 Protocol

A unit of communication is an S-expression. The protocol of communication between the server and a client is as follows.

Control Command: A client can send the following commands to control its player.

- (turn Moment)
  Change the direction of the player according to Moment, which should be -180 to 180. The actual change of direction is reduced when the player is moving fast.
- (dash Power)
  Increase the velocity of the player toward its direction according to Power, which should be -30 to 100.
- (kick Power Direction)
  Kick the ball by Power to the Direction if the ball is near enough. Power should be -30 to 100, and Direction should be -180 to 180.
- (say Message)
  Broadcast Message to all players. Message is conveyed immediately to clients using the (hear ...) format described below.
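For illustration, a minimal client in Python might look as follows. The host, the port number 6000 and the (init TeamName) connection command are assumptions here; the actual handshake should be checked against the server distribution.

```python
import socket

SERVER = ("localhost", 6000)  # assumed address of a running Soccer Server

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(b"(init MyTeam)", SERVER)      # assumed connection command
reply, server_addr = sock.recvfrom(4096)   # later messages may come from a new port
print(reply.decode())

# Control the assigned player with the commands described above.
sock.sendto(b"(turn 90)", server_addr)     # turn by Moment = 90
sock.sendto(b"(dash 80)", server_addr)     # dash with Power = 80
sock.sendto(b"(kick 100 0)", server_addr)  # kick with Power = 100, Direction = 0
sock.sendto(b"(say pass)", server_addr)    # broadcast "pass" to all players
```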

Sensor Information: A client gets two kinds of sensor information about the field from the server: visual and auditory. They are conveyed by see and hear messages respectively.


- (see Time ObjInfo ObjInfo ...)
  Conveys visual information. Time indicates the internal time (step-cycles of the simulation). ObjInfo is information about a visible object, whose format is:

    (ObjName Distance Direction)
    ObjName ::= (player Teamname UNum) | (goal Side) | (ball)
              | (flag [l|c|r] [t|b]) | (line [l|c|r|t|b])

  As the distance to a player increases, more information about the player is lost: UNum is lost beyond a certain distance, and Teamname is lost when the player is very far away. This message is sent 2 times per second. (The frequency may be changed.)

- (hear Time Direction Message)
  Conveys auditory information. This message is sent immediately when a client sends a (say Message) command. Direction is the direction of the sender, and Time indicates the current time. Judgements of the referee are also conveyed using this form; in this case, Direction is `referee'.
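Since every message is an S-expression, a client needs only a small parser. The following is a sketch we wrote for illustration (it is not part of the server distribution), shown on a simplified see message:

```python
def parse_sexp(s):
    """Parse an S-expression string into nested Python lists of tokens."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    stack, current = [], []
    for tok in tokens:
        if tok == "(":
            stack.append(current)
            current = []
        elif tok == ")":
            finished, current = current, stack.pop()
            current.append(finished)
        else:
            current.append(tok)
    return current

# A simplified visual message, for illustration:
msg = "(see 120 ((ball) 13.5 20) ((goal r) 42.1 -5))"
parsed = parse_sexp(msg)[0]
time = int(parsed[1])                # internal simulation time
for obj_name, dist, direction in parsed[2:]:
    print(obj_name, float(dist), float(direction))
```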

3.3 Coach Mode

In order to make it easy to set up a certain situation, Soccer Server has a coach mode. In this mode, a special client called a `coach' can connect to the server and move any players and the ball. This mode is useful for learning and for debugging client programs.

3.4 Implementation

Soccer Server is implemented using g++ and the X window system with the Athena widget-set. Currently we support SunOS 4.1.x, Solaris 2 and DEC OSF/1, and we are planning to support other OSs and architectures. The programs of the soccer server are available by FTP:

ftp://ci.etl.go.jp/pub/soccer/server/sserver-2.70.tar.gz

The home page of the soccer server is:

http://ci.etl.go.jp/~noda/soccer/server.html

We also have a mailing list about RoboCup:

[email protected]

4 Learning Choice of Plan

4.1 Selection of Play in Soccer

In a dynamic environment like soccer, an agent must select a plan from many candidates reactively. Furthermore, in a multi-agent system, the agent must consider the behaviors and performance of the other agents. In such cases, learning is necessary, because it is difficult to write down all the rules for selecting plans in such a complex environment.

Fig. 1. Window Image of Soccer Server

Let us consider a situation in which an offensive player attacks the opponent goal with a teammate, against one opponent player (Figure 4). This is a simple but typical situation that requires cooperative play. The offensive player can shoot the ball at the goal directly, or can pass the ball to the teammate. Generally, reasonable rules for selecting between these plans are as follows:

- If the opponent player cuts the shoot-courses, the offensive player should pass the ball to the teammate.
- If the opponent player cuts the pass-courses, the offensive player should shoot the ball directly.

However, it remains a problem how to define the conditions `cutting shoot-courses' and `cutting pass-courses', which requires complex models of the players. In order to acquire such conditions and models, we applied learning by neural networks.

Fig. 2. Close-up of Players and a Ball

Fig. 3. Overview of Soccer Server (the field simulator, referee and message-board modules, connected to clients by UDP/IP sockets and to an X window display)

4.2 Experiment

We carried out an experiment as follows:

- The three players are set randomly in a penalty area.
- The opponent player is programmed to keep the position between the ball and the goal.
- The teammate is programmed to wait for a pass and shoot the ball.
- The offensive player is programmed as follows:
  1. Collect information about the positions of all objects (players, ball and goal).
  2. Input the information to the neural network.
  3. Receive the output of the network.
  4. Choose the `pass' or `shoot' plan according to the output of the network.
- The neural network consists of 8 input units, 30 hidden units and 2 output units. The network receives the relative positions of the objects, and outputs two values, O_pass and O_shoot, which indicate the expected success rates of the pass and shoot plans. Initially, the weights of the connections in the network are set randomly, so that the network outputs arbitrary values before learning. (A sketch of the network follows this list.)
- The offensive player chooses the pass or shoot plan according to the ratio of O_pass and O_shoot: the probability of choosing the pass-plan is O_pass / (O_pass + O_shoot).
- A coach (teacher) informs `nice' to all players when the offending team gets a goal, and `fail' when the ball goes out of the field or time is over.
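A minimal sketch of this 8-30-2 network and the stochastic plan choice is shown below (NumPy; the sigmoid activations and the weight initialization range are our assumptions, as the paper does not specify them):

```python
import numpy as np

rng = np.random.default_rng(0)

# 8 inputs (relative positions), 30 hidden units, 2 outputs (O_pass, O_shoot).
W1 = rng.uniform(-0.5, 0.5, (30, 8))
b1 = np.zeros(30)
W2 = rng.uniform(-0.5, 0.5, (2, 30))
b2 = np.zeros(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """Return the two outputs (O_pass, O_shoot) and the hidden activations."""
    h = sigmoid(W1 @ x + b1)
    return sigmoid(W2 @ h + b2), h

def choose_plan(x):
    """Choose 'pass' with probability O_pass / (O_pass + O_shoot)."""
    (o_pass, o_shoot), _ = forward(x)
    return "pass" if rng.random() < o_pass / (o_pass + o_shoot) else "shoot"

situation = rng.uniform(-1.0, 1.0, 8)  # a dummy situation vector for illustration
print(choose_plan(situation))
```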

Initially, we ran the players 1000 times with the initialized neural networks, and recorded data of the situations, including the positions of the objects and the final judgement of the coach (`nice' or `fail'). Using these data, we trained the networks by the back-propagation method to output the expected success rates of both plans correctly. Finally, we ran the players again with the trained neural network.

Table 1 shows the rates of successes and failures of both plans before and after learning. We can see that the success rates of both plans are remarkably improved.

Figure 5 shows the changes of the outputs of the neural network over the direction to the opponent player. In this graph, the `shoot' curve goes down when the opponent player is near the goal. On the other hand, the `pass' curve goes down when the opponent player is on the side of the teammate. These responses implicitly reflect the rules for selecting plans described above: the area where the `shoot' curve goes down is the situation where the opponent player cuts the shoot-courses, while the area where the `pass' curve goes down is the situation where the opponent player cuts the pass-courses or marks the receiver. Because the network acquired such conditions suitably, the offensive team improved its success rate.

Moreover, we found that the network also reflects models of the other players suitably. Figure 6 shows the change of the `shoot' output over the distance and direction to the opponent player. In this graph, the surface forms a valley along the direction of the goal, and the width of the bottom of the valley becomes narrower as the distance to the opponent player increases. This means that the opponent player has a fixed cover area, whose apparent angle becomes small when the opponent player is far from the offensive player.
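To make the training stage concrete, the sketch below continues the network above (W1, b1, W2, b2, sigmoid and forward are reused). Each recorded episode supplies a situation vector, the plan that was executed, and the coach's judgement; back-propagation of the squared error pushes the output of the executed plan toward 1 for `nice' and 0 for `fail'. Updating only the executed plan's output is our reading of the procedure, not a detail stated in the paper.

```python
PLAN_INDEX = {"pass": 0, "shoot": 1}

def train_step(x, plan, judgement, lr=0.1):
    """One back-propagation step on one recorded episode."""
    global W1, b1, W2, b2
    out, h = forward(x)
    target = out.copy()
    # Only the output of the plan that was actually executed gets a target.
    target[PLAN_INDEX[plan]] = 1.0 if judgement == "nice" else 0.0
    # Gradients of the squared error through the sigmoid units.
    delta_out = (out - target) * out * (1.0 - out)
    delta_h = (W2.T @ delta_out) * h * (1.0 - h)
    W2 -= lr * np.outer(delta_out, h)
    b2 -= lr * delta_out
    W1 -= lr * np.outer(delta_h, x)
    b1 -= lr * delta_h

# Hypothetical training loop over the 1000 recorded episodes:
# for x, plan, judgement in recorded_episodes:
#     train_step(x, plan, judgement)
```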

Fig. 4. A Situation of 2 on 1

Table 1. Success Rates of the Shoot and Pass Plans

Before Learning:
         pass          shoot         total
  nice   247 (55.6%)   231 (41.5%)   478 (47.8%)
  fail   197 (44.4%)   325 (58.5%)   522 (52.2%)
  total  444           556           1000

After Learning:
         pass          shoot         total
  nice   327 (74.3%)   353 (63.0%)   680 (68.0%)
  fail   113 (25.7%)   207 (37.0%)   320 (32.0%)
  total  440           560           1000

5 Conclusion

In this article, we reported on Soccer Server and an experiment on learning the selection of play-plans on the server. The result of the experiment showed that a simple neural-network learning technique is useful for acquiring models of other agents

and of the environment, and for improving the total performance. It is difficult to use the acquired models directly for symbolic planning of sequences of plays, because such models are represented implicitly. It is also a problem that it is difficult to apply such neural networks to higher levels of decision making. However, these problems may be overcome by combining symbolic and neural planning, using back-propagation as feedback [7, 6]. The experiment described in this article was done as a first step in a study of learning cooperative behavior. Cooperation was not a major part of this experiment. We, however, designed the experiment to be easy to extend to cooperative learning, and we are planning further experiments in which all players learn their plan-selection mechanisms simultaneously.

Fig. 5. Change of Network Output for Direction to Opponent Player (the `shoot' and `pass' outputs plotted against the direction to the opponent player, with the direction to the teammate marked)

Fig. 6. Change of Shoot-Output for Distance and Direction to Opponent Player (a 3-D surface of the `shoot' output over the distance and the direction to the opponent player, with the direction to the goal marked)

References

1. Minoru Asada, Shoichi Noda, and Koh Hosoda. Non-physical intervention in robot learning based on LFE method. In Proc. of Machine Learning Conference Workshop on Learning from Examples vs. Programming by Demonstration, 1995.
2. Hiroaki Kitano, Minoru Asada, Yasuo Kuniyoshi, Itsuki Noda, and Eiichi Osawa. RoboCup: The Robot World Cup initiative. In Working Notes of IJCAI Workshop: Entertainment and AI/Alife, pages 19-24, Aug. 1995.
3. M. K. Sahota. Reactive deliberation: an architecture for real-time intelligent control in dynamic environments. In Proc. of AAAI-94, pages 1303-1308, 1994.
4. Michael K. Sahota. Real-time intelligent behaviour in dynamic environments: Soccer-playing robots. Master's thesis, Department of Computer Science, The University of British Columbia, Aug. 1993.
5. P. Stone and M. Veloso. Learning to pass or shoot: collaborate or do it yourself. Unpublished manuscript, 1995.
6. J. Tani and N. Fukumura. Learning goal-directed sensory-based navigation of a mobile robot. Neural Networks, 7(3):553-563, 1994.
7. Yasuhiro Wada and Mitsuo Kawato. A neural network model for arm trajectory formation using forward and inverse dynamics models. Neural Networks, 6:919-932, 1993.

This article was processed using the LaTeX macro package with the LLNCS style.