Visualising Tierra’s tree of life using Netmap

Russell K. Standish¹ and John Galloway²,³

¹ Mathematics, University of New South Wales, 2052 Australia
[email protected], http://parallel.hpc.unsw.edu.au/rks
² Netmap Analytics, 16/39 Herbert St, St Leonards, 2065 Australia
[email protected], http://www.netmap.com
³ University of Technology, Sydney

Abstract

We report on some preliminary results of a project using Netmap to visualise the network of genotypes generated in an evolutionary run of Tierra (the tree of life). Netmap is a sophisticated tool for visualising and analysing any dataset containing a network or graph structure. It has been successfully used for fraud detection in a number of commercial datamining projects in the insurance and retail sectors. It has the potential for detecting unusual relationships in the wide variety of networks generated in Complex Systems.

Introduction

Scientific visualisation is an important tool for making sense of the large quantities of data generated by high performance and often highly complex computer models (Fosdick et al. 1996, Chapter 10). It couples the human visual system, a supreme pattern recogniser honed by millions of years of evolution, with the analysis process. Non-visual techniques, by contrast, always require a hypothesis, which is then tested by analytical techniques on the data. Consequently, non-visual analysis will miss what is not suspected.

Network graphs are a crucial component in the understanding of complex systems (Green, Newth, & Kirley 2000). They are cartoon-like models that formally capture the important properties of the system. Graphs have a rich set of properties that is still being actively explored (Albert & Barabási 2002). However, little work has been done to date on visualising network graphs generated by complex systems, aside from visualising Internet traffic¹ and work in the social network community².

Netmap³ is a tool for laying bare patterns and coincidences within large datasets of loosely related data. It has been used successfully for a number of years in detecting unusual patterns that may be evidence of fraud in datasets belonging to the insurance industry and the retail sector. The purpose of this project was to examine some datasets generated by complex systems models for interesting patterns that may have been missed by conventional analysis. As an initial foray into this subject, we examined a genebank dataset generated by the well known Tierra artificial life system.

¹ See http://www.caida.org for a discussion and links.
² See, for instance, the network visualisation session in the Sunbelt series of conferences: http://www.inf.unikonstanz.de/algo/visone/SunbeltXX
³ http://www.netmap.com

Netmap

Netmap displays network data in a number of ways on a zoomable, pannable and rotatable canvas. In the first mode of use, the network nodes are arranged around the circumference of a circle, with the edges drawn between the nodes. If a group of nodes is more interconnected within the group than with the rest of the network, it is split from the main ring and displayed in a subsidiary ring outside it. Such groups are termed emergent networks, and indicate something interesting. More precisely, each node within an emergent network has more than 50% of its links to other members of the emergent network. An example of this display is shown in figure 2. The subsidiary rings in that case are not emergent networks, but rather genotypes sorted according to the maximum population reached.

A second visualisation method involves selecting a particular node and displaying the nodes connected to it, the nodes connected to those, and so on. The nodes within a particular column all have the same hop count from the selected node. Figure 3 shows an example of this kind of display.
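The two display rules described above can be illustrated with a short Python sketch. This is not Netmap's implementation, which is not described here. The function names `is_emergent` and `hop_columns` and the adjacency-dictionary representation are assumptions made for illustration. The first function only checks the stated 50% criterion for a candidate group (it does not search for groups), and the second uses a breadth-first search to arrange nodes into columns by hop count.

```python
from collections import deque


def is_emergent(group, adjacency):
    """Check the stated criterion: every node in `group` has more than
    50% of its links going to other members of `group`.

    `adjacency` maps each node to the set of its neighbours (undirected).
    """
    group = set(group)
    for node in group:
        neighbours = adjacency.get(node, set())
        if not neighbours:
            return False  # an isolated node has no links inside the group
        inside = sum(1 for n in neighbours if n in group)
        if 2 * inside <= len(neighbours):  # not strictly more than half
            return False
    return True


def hop_columns(start, adjacency):
    """Return a list of columns; column k holds the nodes k hops from `start`."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    columns = {}
    for node, hop in hops.items():
        columns.setdefault(hop, []).append(node)
    return [sorted(columns[hop]) for hop in sorted(columns)]


# Toy undirected network (hypothetical node names).
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
print(is_emergent({"a", "b", "c"}, toy))
print(hop_columns("a", toy))
```

In the toy network, the group {a, b, c} satisfies the criterion because node c keeps two of its three links inside the group, whereas {d, e} does not, since node d has exactly half of its links inside.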

Tierra

Tierra (Ray 1991) is an artificial life system in which short self-replicating programs running on a virtual CPU evolve and compete for survival. Part of the Tierra system, called the Genebanker, records the organism’s program code (the genotype) as each new genotype arises. A genotype is written to disk once it has exceeded a user-specified threshold, either as a percentage of the total memory of the computer, or as a certain number of individuals. The thresholds can be set to zero so that every genotype is written to disk, including transitional forms between adaptive species; however, this is usually not done, as the I/O then dominates the total runtime. A wide variety of behaviours evolves, with organisms making use of other organisms’ copy loops (parasites) or stealing their CPU resources (hyperparasites), for example.

    bits: 3 EX TC TP MF MT MB
    genotype: 0035aaa genetic: 0,35 parent genotype: 0044aaf
    1st_daughter_a: inst: 259 instP: 259
    1st_daughter_b: flags: 0 mov_daught: 35 breed_true: 1
    2nd_daughter_a: inst: 260 instP: 260
    2nd_daughter_b: flags: 0 mov_daught: 35 breed_true: 1
    Origin: InstExe: 4775,573754 clock: 1000789270 Tue Sep 18 15:01:10 2001
    MaxPropPop: 0.3859 MaxPropInst: 0.1174 MaxPop: 220 mpp_time: 4800,242218
    MaxCpus: 0 ploidy: 1

Figure 1: Genebank record header for 0035aaa. It originated from 0044aaf at 4775,573754 instructions.

Figure 2: Ancestor-Descendent network. Genotypes are arranged anti-clockwise around the circle starting at the 3 o’clock position, according to the maximum population count (MaxPop field of the genotype header). A line connects a genotype with its parent genotype. Neutral mutations are shown in black, but it is hard to make them out in this figure. The outlying rings show nodes that belong to a group with the same MaxPop count.

ALife VIII: Workshop proceedings

Standish (1999; 2003) has developed a means of measuring the phenotypic properties of Tierran organisms, and in particular can determine whether two Tierran genotypes (programs) correspond to the same phenotype (have the same behaviour when executed). Two genotypes with the same phenotype are said to be neutrally equivalent. He has found that around each genotype in genotype space, there are a large number of neutrally equivalent genotypes. It is therefore expected that neutral evolution (Reidys, Kopp, & Schuster 1997), that is, evolution of the genotype without phenotypic change, would be very important.

In Tierra, with the amount of CPU resource allocated evenly to all organisms, selection favours shorter organisms that are less expensive to copy. Clearly, this effect limits the evolutionary potential of the system. By setting a parameter called SlicePow, CPU resources are allocated proportional to $\ell^{\mathrm{SlicePow}}$, where $\ell$ is the organism’s length. SlicePow=1 corresponds to no selection bias as a function of length, and organisms can grow in size, but seemingly not in informational complexity (Standish 2003). In practice, organisms rapidly grow in size until a single organism occupies more than half of the soup, at which time it can no longer replicate, unless SlicePow is significantly less than 1. In this paper, we report on the analysis of a single dataset, with SlicePow=0.95. The largest genotype in the dataset was 6712 instructions long, but the most complex genotype was just over 100 instructions in complexity⁴, so evolution seemed to mostly involve adding “junk” code.
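The effect of SlicePow on length selection can be made concrete with a small numerical sketch. It assumes only what is stated above, namely that CPU time is allocated in proportion to length raised to the power SlicePow, plus the illustrative assumption that the cost of copying an organism is proportional to its length. The function names are hypothetical, and the code does not reproduce Tierra's actual scheduler.

```python
def cpu_shares(lengths, slice_pow):
    """Fraction of CPU time given to each organism when the allocation
    is proportional to length ** slice_pow."""
    weights = [length ** slice_pow for length in lengths]
    total = sum(weights)
    return [weight / total for weight in weights]


def time_per_instruction(length, slice_pow):
    """Relative CPU time available per instruction of genome,
    assuming (for illustration) that copy cost is proportional to length."""
    return length ** (slice_pow - 1)


# Two hypothetical organisms, 40 and 80 instructions long.
for slice_pow in (0, 0.95, 1):
    shares = cpu_shares([40, 80], slice_pow)
    bias = time_per_instruction(80, slice_pow) / time_per_instruction(40, slice_pow)
    print(slice_pow, shares, bias)
```

With SlicePow=0 both organisms receive equal shares, so the longer one has half as much time per instruction and is at a disadvantage. With SlicePow=1 the ratio is exactly 1, matching the statement that there is no selection bias as a function of length. Values slightly below 1, such as the 0.95 used here, leave a mild bias towards shorter organisms.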

⁴ Information Based Complexity is usually reported in bits; in Tierra, one instruction corresponds to 5 bits.

Results

This project has only just started, and the results are only preliminary. We analysed the data from a single Tierra run of about 6 × 10^9 executed instructions, with SlicePow=0.95, which resulted in 4848 genotypes recorded in the Genebanker database. Phenotype analysis showed that there were 1376 distinct phenotypes amongst these. Each record in the Genebanker database has a header, which records (amongst other things) the immediate ancestor of that genotype and the time of creation. An example header is shown in figure 1. Using the parent genotype field, a network was constructed linking the genotypes together. However, because Genebanker does not record all genotypes generated, but only those crossing a particular threshold, the network is not fully connected. In the future, we will probably restrict our analysis to genebank records where the threshold is set to zero so that the network is continuous.

Netmap initially threw up a few oddities: quite a few node pairs were listed as being parents to each other, and there was even one case (shown in figure 5) of a triplet, with 0169aae begetting 0123aau begetting 0123aav begetting 0169aae. At this stage, it is not clear how this situation came to pass, or whether it is a bug in the Genebanker code or a design feature.

Figure 2 shows all the genotypes classified by their maximum population count (MaxPop), starting at the 3 o’clock position and progressing anti-clockwise. The outlying rings represent a sorting of the genotypes according to the maximum population reached by each genotype, and are coloured to indicate the different MaxPop classes. The first red ring contains those genotypes with a zero value of MaxPop, namely those mentioned in the parent genotype field but not otherwise represented in the Genebanker record. The next ring has a MaxPop of 1, and so on. At approximately the 12 o’clock position, a marked discontinuity occurs, indicating a threshold above which genotypes are adaptive rather than interim forms. Figure 4 shows a piece of this graph at the 3 o’clock position, zoomed in.

Figure 3: One of the most highly radiated family trees in the dataset. The ancestral genotype is 0367aab, shown at the top of the figure.

Figure 4: Zoomed piece of figure 2 at the 3 o’clock position, showing 0080aaa and 0035aaa. Neutral mutations are marked in green, whereas non-neutral ones are shown in blue.
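The construction of the ancestor-descendant network from the parent genotype field, and a check for parent cycles such as the pairs and the triplet reported above, can be sketched in Python as follows. This is a minimal illustration, not the code used in the project. The regular expression assumes that headers follow the field layout of the single example in figure 1, and the function names are hypothetical.

```python
import re

# Assumed layout, based only on the figure 1 example:
# "genotype: <name> genetic: <x,y> parent genotype: <name>"
HEADER_RE = re.compile(
    r"genotype:\s*(\S+)\s+genetic:\s*\S+\s+parent genotype:\s*(\S+)"
)


def parent_link(header_text):
    """Return (genotype, parent genotype) from a header, or None if absent."""
    match = HEADER_RE.search(header_text)
    return (match.group(1), match.group(2)) if match else None


def build_parent_map(headers):
    """Map each recorded genotype to its parent genotype."""
    parents = {}
    for text in headers:
        link = parent_link(text)
        if link is not None:
            parents[link[0]] = link[1]
    return parents


def find_cycles(parents):
    """Follow parent links from every genotype and return each cycle once."""
    cycles = []
    resolved = set()  # genotypes whose ancestry has already been walked
    for start in parents:
        path, position = [], {}
        node = start
        while node in parents and node not in resolved and node not in position:
            position[node] = len(path)
            path.append(node)
            node = parents[node]
        if node in position:  # the walk returned to a node on its own path
            cycles.append(path[position[node]:])
        resolved.update(path)
    return cycles


# The figure 1 header, plus the three-way cycle reported in the text.
header = ("bits: 3 EX TC TP MF MT MB genotype: 0035aaa genetic: 0,35 "
          "parent genotype: 0044aaf")
parents = build_parent_map([header])
parents.update({"0123aau": "0169aae", "0123aav": "0123aau", "0169aae": "0123aav"})
print(find_cycles(parents))
```

Because each genotype has exactly one recorded parent, a walk along parent links either ends at an unrecorded genotype or returns to a node already on the path, so a single pass over the genotypes is enough to list every cycle.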
The most successful genotype is the ancestor 0080aaa, with a MaxPop of 277, and the next is 0035aaa, with a MaxPop of 220; both can be seen in figure 4. Figure 2 shows the neutral mutations in a different colour from the non-neutral mutations, but it is rather hard to make out the neutral mutations. Figure 6 shows just the neutral mutations. What is surprising is the very small role played by neutral evolution in this particular example. Most of the neutral evolution is centred on one genotype, 0035aaa, shown at the 3 o’clock position in figure 6. This genotype appears in figure 3, three hops from the root, and has given rise to a large number of different genotypes, many of which are neutral. The neutral mutations are again shown in green.
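The distinction between neutral and non-neutral mutations used in these plots can be expressed as a simple edge classification. The sketch below assumes that a phenotype label is already available for each genotype (for example, from the phenotype analysis mentioned earlier). The mapping `phenotype_of`, the function names and the toy data are all hypothetical, and the code is an illustration rather than the analysis actually performed.

```python
from collections import Counter


def classify_edges(parents, phenotype_of):
    """Label each parent -> child link as neutral (same phenotype) or not.

    Links whose parent or child has no known phenotype are skipped.
    """
    labelled = []
    for child, parent in parents.items():
        if child in phenotype_of and parent in phenotype_of:
            neutral = phenotype_of[child] == phenotype_of[parent]
            labelled.append((parent, child, neutral))
    return labelled


def neutral_offspring_counts(labelled):
    """Count, for each parent, how many of its recorded offspring are neutral."""
    return Counter(parent for parent, _child, neutral in labelled if neutral)


# Toy data with hypothetical genotype names and phenotype labels.
toy_parents = {"b": "a", "c": "a", "d": "a", "e": "b"}
toy_phenotypes = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

edges = classify_edges(toy_parents, toy_phenotypes)
counts = neutral_offspring_counts(edges)
neutral_fraction = sum(1 for _p, _c, neutral in edges if neutral) / len(edges)
print(counts.most_common(1), neutral_fraction)
```

Applied to a full genebank, the most common entry in the counter would identify the genotype on which neutral evolution is centred, and the neutral fraction would quantify how small a role neutral mutations play overall.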

Figure 5: The three-way cycle where 0169aae begets 0123aau begets 0123aav begets 0169aae again. Note that 0123aau and 0123aav are neutrally equivalent.
Figure 6: A plot showing just those nodes undergoing neutral evolution.

Conclusion

This is still a very preliminary report, and the various oddities raised by Netmap need further investigation. It is surprising how little neutral evolution occurred in this example, but this could have been due to the high SlicePow value. Further Tierra runs with different parameter settings need to be studied to draw any real conclusions. However, it is clear that Netmap is a useful tool for throwing up unusual features of datasets created by artificial evolution.

References

Albert, R., and Barabási, A.-L. 2002. Statistical mechanics of complex networks. Reviews of Modern Physics 74:47.

Fosdick, L. D.; Jessup, E. R.; Schauble, C. J. C.; and Domik, G. 1996. An Introduction to High Performance Scientific Computing. Cambridge, MA: MIT Press.

Green, D. G.; Newth, D.; and Kirley, M. 2000. Connectivity and catastrophe – towards a general theory of evolution. In Bedau, M. A.; McCaskill, J. S.; Packard, N. H.; and Rasmussen, S., eds., Alife VII: Proceedings of the Seventh International Conference, 153–161. Cambridge, MA: MIT Press.

Ray, T. 1991. An approach to the synthesis of life. In Langton, C. G.; Taylor, C.; Farmer, J. D.; and Rasmussen, S., eds., Artificial Life II. New York: Addison-Wesley. 371.

Reidys, C.; Kopp, S.; and Schuster, P. 1997. Evolutionary optimization of biopolymers and sequence structure maps. In Langton, C., and Shimohara, K., eds., Artificial Life V, 379. MIT Press.

Standish, R. K. 1999. Some techniques for the measurement of complexity in Tierra. In Floreano, D.; Nicoud, J.-D.; and Mondada, F., eds., Advances in Artificial Life: 5th European Conference, ECAL 99, volume 1674 of Lecture Notes in Computer Science, 104. Berlin: Springer.

Standish, R. K. 2003. Open-ended artificial evolution. International Journal of Computational Intelligence and Applications. (accepted).
