Evolving inductive generalization via genetic self-assembly

Rudolf M. Füchslin*, Thomas Maeke, Uwe Tangen & John S. McCaskill

Ruhr-Universität Bochum, Biomolecular Information Processing (BioMIP), Schloss Birlinghoven, D-53754 Sankt Augustin, Germany.
*Corresponding author: [email protected]

Summary Sentence: Self-assembly of genetically encoded units enables the evolution of inductive generalization in functional structures such as multiplier circuits.

Keywords: Self-assembly, inductive generalization, evolvable logic, circuit design, genetic algorithm, evolution, multiplier.

Abstract

We propose that genetic encoding of self-assembling components greatly enhances the evolution of complex systems and provides an efficient platform for inductive generalization, i.e. the inductive derivation of a solution to a problem with a potentially infinite number of instances from a limited set of test examples. We exemplify this in simulations by evolving scalable circuitry for several problems. One of them, digital multiplication, has been intensively studied in recent years, where hitherto the evolutionary design of only specific small multipliers was achieved. The fact that this and other problems can be solved in full generality employing self-assembly sheds light on the evolutionary role of self-assembly in biology and is of relevance for the design of complex systems in nano- and bionanotechnology.

1 Introduction

Understanding the autonomous design of self-assembling complex systems1 is vital for further developments in nanoscience. Despite much progress in harnessing evolutionary processes2,3 and in particular genetic algorithms4,5, the conditions for evolving general solutions to problems, applicable to a combinatorially complex variety of distinct problem instances, remain unresolved. We are specifically interested in those cases where this variety prohibits an evolutionary construction of a general solution by accumulating specific solutions to individual problem instances. In such a situation, a general solution has to be constructed by evolutionarily detecting and exploiting the underlying structural properties of the whole task under consideration. Such general solutions are economical and useful as modules for building complex systems. Natural instances abound at the nanoscale: for example general sequence replication solved by polymerases and base pairing, general protein biosynthesis solved by translation with the ribosomal apparatus (for the evolution of the genetic code see6,7), and general pathogen recognition8 solved by mRNA splicing between sets of sequence modules to create antibody diversity. In this article, we show by explicit simulation that complex circuit design problems can be solved by exploiting the properties of self-assembling, genetically encoded components. We posit that natural systems have evolved general solutions to environmental tasks efficiently by making use of a similar modular genetic encoding of self-assembling units. To elucidate the evolutionary capabilities of self-assembling systems, we discuss in detail the evolution of scalable digital multipliers. The multiplication problem can serve as a prototype of a complex convolution in the genotype-phenotype mapping because the interior bits of multiplication products are notoriously convoluted functions of the inputs. They even find application in random number generation9,10. 
Scalability in this context means that by employing one and the
same set of self-assembling components, arbitrarily large n × n-bit functional circuits will be constructed given sufficient resources of space and numbers of component copies. Full scalability necessarily implies that the components, by virtue of their internal logic and the patterns they form by self-assembly, embody the logical structure of multiplication in abstract generality. The methods presented are not restricted to multiplication. Other scalable circuit designs, such as general arithmetic logic units (ALU), adders or a binary to Gray code converter have also been successfully evolved. Self-assembly is a process by which the local interaction between components (e.g. based on shape complementarity), determines their assembly into larger structures11 and is vital for biological systems. It is a structuring process complementing catalytic rate control and still operating near equilibrium, which, when genetically tuned, allows macroscopic objects to be constructed reliably under varying conditions. The diversity and precision of nanoscale biological function leads one to expect that complex engineering structures such as nanoscale circuits12,13,14 may also be assembled from components equipped with analogous recognition elements. Self-assembly is also important in the design of self-replicating molecules15,16, supramolecular chemistry17, natural and artificial cells11, and in molecular computation18,19,20. Fuelled by progress at the microscale21,22,23, the use of self-assembling nanostructures holds the promise of surmounting the physical limits to lithographic instruction24,25. Microscopic planar self-assembly of electronic components, with subsequent regular wiring completion (e.g. by electroplating) has been demonstrated in the laboratory26, and this would also allow physical wiring completion of the self-assembling functional architecture investigated here. 
Recently, three-dimensional multicomponent self-assembly of electronic components has also been shown27. These developments argue that the evolution of self-assembling components can have an immediate impact on nanotechnology. Physical models for evolution involving self-assembly either explicitly28 or implicitly as in the quasispecies theory29, have only addressed the evolution of
specific solutions to survival problems, and hence have not revealed an evolution-enabling role for general problem solution via self-assembly. Indeed, in the case of complex systems, autonomous evolutionary design has foundered on three features of the evolutionary process: the ruggedness and sparseness of good solutions in the genotype-to-phenotype mapping30, the small fraction of the physical environment that an individual experiences31, and non-monotonic optimization as a result of frequency-dependent selection. The current work points to potential advantages in biasing evolutionary search to favor general solutions for complex problems. Practically, de novo circuit design has proved a formidable barrier for artificial evolution for two reasons. First, it has been practically impossible to evolve all but the simplest digital circuits using examples of correct behavior. Second, structures evolved in this way proved to be idiosyncratic and irregular, and difficult to use in a modular way or to generalize. Self-assembling components overcome both of these difficulties. Larger digital multiplier circuits, constructed using information on only the correct output from limited multiplication examples, have been beyond the limits of evolutionary design. Even rationally designing a multiplier circuit from primitive components is provably hard for minimal resources. Unbiased genetic algorithms32,33,34,35 have found only special circuits for multiplying very small numbers, and searching general feed-forward circuits to find even a non-minimal multiplier has proved tractable only for similarly small problems34,35. In fact, the largest binary multiplier circuit found with unconstrained search achieved the multiplication only of 4-bit numbers and employed special logic that does not generalize to larger numbers. We report that scalable circuits can be designed when self-assembling genetic units are introduced into the evolutionary design process.
Fig. 1, referring to a general multiplier, gives an overview of this process that will be detailed further in the following. The figure shows self-assembling logic blocks (SLBs), to be discussed in detail in Sec. 2.1. A genome encodes the computational logic and the recognition sites of a limited number (typically six to ten) of different types of
SLBs. These SLBs spontaneously aggregate to form complete logic circuits, whereby it is assumed that sufficient identical copies of each SLB are available. In this work, the recognition sites determine the docking of components to form a two-dimensional electrically connected array. This is achieved in simulation by providing (virtual) square substrate boards onto which the SLBs assemble. The self-assembly principle remains valid in free solution, so the board is not a necessary feature of the presented method. The self-assembly process and the details of the recognition mechanism and the initialization of the substrate are presented in Sec. 2.2. We emphasize that the evolved circuits are scalable: the local matching rules allow multipliers of any desired overall size to be assembled by simply using more component copies (and correspondingly large substrate boards). The intrinsic possibility for a (potentially complex) global geometric regularity of self-assembled circuits is the basis for the evolution of scalable solutions to problems which are logically complicated but feature abstract internal regularity. In an autonomous evolutionary design process, one has to compare circuits representing solutions (or partial solutions) by referring exclusively to their outputs and not by externally qualified internal structures. Additionally, the fitness function has to be chosen to meet the requirements of scalable designs, see Sec. 2.3. One has to consider that evolutionary progress can result from two basic effects: structural improvements that lead to an enhanced performance on all (or a subclass of all) possible problem instances, or erratic improvements resulting from adaptation to specific problem instances.
In the case of multiplication, an example of the former is a circuit that realizes multiplications by powers of two via bit shifts (in binary notation, a multiplication by a power of two is only a bit shift) and is therefore able to deal with all possible multiplications of the form a*2^n. On the other hand, an erratic improvement would result from accidentally acquiring the ability to reproduce the correct result of, say, 37*16 without an increase in performance on other multiplications. For the evolution of scalable solutions, only structural improvements are desired: erratic progress is not only nugatory, but may even lead a population into a hard-to-escape local maximum of fitness. Scalable circuits are necessarily based on structural improvements; adaptation to specific problem instances without exploiting underlying logical structures is intrinsically limited, simply because the genome encoding the circuit components is of finite size.

In order to exploit the intrinsic possibility for structural improvements resulting from employing self-assembling designs, we set up an autonomously regulated evolution scheme. This is achieved in the following way: besides the logical functions and recognition sites for a set of SLBs, each individual genome encodes a small evolvable set of problem instances (the “test vector”). The algorithm exploits frequency-dependent selection: when individuals contest with respect to multiplication ability, one individual’s circuit is scored on its opponent’s test vector in pair-wise tournament selection, see Fig. 1e and Sec. 2.4. This differs from twin population co-evolutionary optimization36, in that offspring have to cope with the test vectors of their siblings. Even small test vectors (e.g. size 16) proved to be sufficient for the evolution of arbitrarily large multipliers or ALUs. The genetically linked “co-evolution” of test vectors and circuit designs drove the population of test vectors automatically at a manageable rate towards the most convoluted multiplication tasks.

2 Methods

Genetic self-assembly involves four aspects for which we detail our method below:

1. The self-assembling logic block (SLB) and its corresponding gene, encoding both inter-block recognition sites and logical functionality.
2. The self-assembly and circuit synthesis.
3. The evaluation of these circuits using test vectors (lists of problem instances, here either a single number or a pair of numbers, as e.g. in the case of multiplication). A fitness function is provided that yields a modular quantification of partial success; this is a requirement for evolving scalable circuitry.
4. The evolutionary dynamics of populations of interacting proliferating individuals, including structured mechanisms of variation for encoded SLBs and test vectors.

The interplay of encoding, self-assembly and evolutionary dynamics is shown in Fig. 2. Note that the genome of an individual carries information for three different types of entities used at distinct stages of the evaluation process: recognition patterns determining the self-assembly process, logic structures defining the functionality of the SLBs and finally a test vector, used in a tournament selection process. The fact that these different entities are encoded on the same genome leads to a coupling of their evolution. The specific model choices and parameters we discuss in the following seem to be the most natural and were chosen on the basis of simplicity, but some of them did prove critical for achieving rapid evolutionary optimization. In order to stress the distinction between basic properties of self-assembly and specific technical model choices, several of the latter are discussed in the appendix. This split also emphasizes the fact that the more general aspects of self-assembly are of fundamental relevance for successful evolution, whereas most of the technical conventions proved to be convenient or beneficial with respect to efficient evolution but not critical for success as such. In consequence, Fig. 3 to 5 refer jointly to Sec. 2 and the appendix. A complete list of all parameters and variables is given in Table 1.

2.1 Self Assembling Logic Blocks (SLB)

The structure of an SLB is shown in Fig. 3a. The computational functionality is determined by four outputs (o0 to o3), each of which gives a signal that is a function of the four input signals (i0 to i3). It would be possible to calculate each of the outputs using a four-bit function generator, making simple signal transfer (an output is directly connected to an input) rather sparse in the space of genotypes. This difficulty can be overcome by an encoding representing a phenotypical function as given in Fig. 3b, which leads to a natural bias towards signal transfer. The details of this encoding are not critical, only the fact that it establishes a
balance between function and routing. For the implementation chosen in this work see the appendix. The heterophilic recognition sites, represented in Fig. 3a by sequences of sockets (on the left and the upper edge) and plugs (on the right and the lower edge), are the basic structures determining the self-assembly process of the SLBs into a rectangular array. Our investigations showed that the recognition mechanism should exhibit the following features (for our implementation see the appendix). First, the balance between variability of self-assembly patterns and evolutionary efficiency is critical. In our simulations, this balance is controlled by the length of the recognition sequence and the size of the alphabet in use. Second, there should be an evolutionary freedom to make positions in the recognition sequence promiscuous (in our implementation, a position in the recognition sequence may be equipped with no plug or socket). From an abstract point of view, this means that the evolution of a pattern can be achieved employing two different mechanisms, possibly in combination. First, a pattern can be defined by constructing appropriate recognition sequences; second, a pattern can be established starting from promiscuity by progressively excluding matches between specific types of components. The genome usually contains information for four to ten different SLBs with fully evolvable logic and recognition sites (each requiring 96 bits, see the appendix). Additionally, it encodes one auxiliary default block, which has only evolvable plugs and a fixed simplest logical functionality, namely just transmitting inputs to outputs, see Fig. 3d. Note that because the socket recognition sites of this default block remain empty by definition over the whole course of evolution, this special SLB matches any combination of plugs and therefore ensures complete self-assembly for any genome.
In all the results presented, the logical functionality of this default block is not evolvable. However, this turned out not to be critical. The encoded SLBs may differ both in their internal logic functions and in their recognition patterns. In order to restrict attention initially to feed-forward circuits,
we consider logical components with only inputs on two edges (top and left) and outputs on the other two edges. Thus, the square tiles are not invested with rotational degrees of freedom in this simple case. Hexagonal or other shaped tiles could also have been chosen. Having just two connections per edge was found to provide a suitable granularity for assembling complex digital processing. A simpler structure with one input or output per edge only allows two 2-input combinatorial functions per SLB and, while many such blocks can emulate the functionality of SLBs with two connections per edge, this does not provide a good balance between routing resources and logic. It turned out that restricting the maximal amount of logic in the SLBs may speed up evolution. This was implemented by requiring that only a given number nFG < 4 of the outputs of an SLB deliver a signal from a four-bit function generator, whereas the remaining outputs are connected either to ground or directly to an input. In the case of multiplication, nFG ≤ 2 (at most two function generators per SLB, see Sec. 2.4) proved beneficial (although not crucial), whereas ALUs could only be evolved by allowing maximal use of function generators, nFG = 4. The generalization of the presented structural elements and mechanisms to three dimensions is straightforward.

2.2 Self-assembly process and circuit synthesis

Both the logical interconnect and the overall logical circuit are specified uniquely by the block self-assembly, mediated by the recognition/binding mechanism and taking place on a square board (serving as a substrate) that determines the overall size of the circuit and provides the interface to the environment. Fig. 4 shows a board for only a small circuit, suitable for 2×2-bit functions. Such a function f(x,y)→z has a 4-bit output, denoted by Z0-Z3. In order to simplify routing, the input signals X0-X1 and Y0-Y1 are provided redundantly and remaining open inputs are connected to ground. Note that for nbit × nbit problems we employ a board of size 4nbit × 4nbit, as required by the presented interface to the input signals.

For initial concreteness and computation speed, the self-assembly process is chosen to be completely deterministic, whereby a SLB can be appended if the left and upper plugs do not mismatch (see above and Fig. 1c and 4). To initiate the self-assembly, the board provides two outer rims with evolvable, repetitive binding sites, acting as initialization for the self-assembly of the SLBs, which themselves are assumed to be available in as many copies as are necessary to complete the circuit. The recognition sites, as they are given in a simplified form in Fig. 4, lead to unequivocal matching of the SLBs but this is not the generic case. In order to resolve ambiguities, a binding energy is employed, i.e. if different SLBs match with a given plug-structure, the one leading to the higher binding energy is taken. The binding energy or binding quality is given by the number of truly matching plug-socket pairs, whereby promiscuous matches are not counted. If there still remains an ambiguity, the SLB encoded at the largest distance from the start of the genome is taken. If no fully evolvable SLB matches, the genome’s default SLB is plugged in; this is always possible due to the fact that its socket recognition region is fully promiscuous. Up to this point, the outer rims of the board have no functionality and their evolvability is restricted to a single repeated recognition site. A straightforward generalization is given by encoding a finite number of additional edge blocks on the genome and allowing edge self-assembly, illustrated in Fig. 5 and detailed in the appendix. Self-assembly of the edges proved beneficial for flexible evolution and was necessary e.g. in case of ALUs. In addition to establishing more complex recognition patterns on the rims of the board, the interface to the input signals is extended by edge blocks carrying some (simple) evolvable functionality. 
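The deterministic assembly rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `SLB`, the functions `binding_energy`, `choose_block` and `assemble`, the row-by-row filling order, and the use of symbol equality in place of heterophilic plug/socket complementarity are all hypothetical simplifications. Edge self-assembly and the logic inside the blocks are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A recognition site is a sequence of symbols; None marks an empty
# (promiscuous) position. Matching by equality is an assumption made here.
Site = Tuple[Optional[int], ...]


@dataclass(frozen=True)
class SLB:
    name: str
    left_socket: Site
    top_socket: Site
    right_plug: Site
    bottom_plug: Site


def binding_energy(plug: Site, socket: Site) -> Optional[int]:
    """Count truly matching plug-socket pairs; return None on any mismatch."""
    energy = 0
    for p, s in zip(plug, socket):
        if p is None or s is None:
            continue  # promiscuous position: allowed, but not counted
        if p != s:
            return None
        energy += 1
    return energy


def choose_block(genome: List[SLB], default: SLB,
                 left_plug: Site, top_plug: Site) -> SLB:
    """Pick the matching SLB with the highest binding energy.

    Ties go to the block encoded later in the genome; if no evolvable
    block matches, the default block is used.
    """
    best, best_energy = default, -1
    for block in genome:
        e_left = binding_energy(left_plug, block.left_socket)
        e_top = binding_energy(top_plug, block.top_socket)
        if e_left is None or e_top is None:
            continue
        if e_left + e_top >= best_energy:  # ">=" lets later genes win ties
            best, best_energy = block, e_left + e_top
    return best


def assemble(genome: List[SLB], default: SLB,
             rim_left: Site, rim_top: Site, size: int) -> List[List[SLB]]:
    """Fill a size x size board row by row, starting from the two rims."""
    board: List[List[SLB]] = []
    for r in range(size):
        row: List[SLB] = []
        for c in range(size):
            left_plug = rim_left if c == 0 else row[c - 1].right_plug
            top_plug = rim_top if r == 0 else board[r - 1][c].bottom_plug
            row.append(choose_block(genome, default, left_plug, top_plug))
        board.append(row)
    return board


# Toy genome: A and B alternate along each row.
A = SLB("A", left_socket=(0,), top_socket=(None,), right_plug=(1,), bottom_plug=(None,))
B = SLB("B", left_socket=(1,), top_socket=(None,), right_plug=(0,), bottom_plug=(None,))
D = SLB("D", left_socket=(None,), top_socket=(None,), right_plug=(None,), bottom_plug=(None,))

small = assemble([A, B], D, rim_left=(0,), rim_top=(None,), size=4)
large = assemble([A, B], D, rim_left=(0,), rim_top=(None,), size=8)
print(["".join(b.name for b in row) for row in small])
```

The same toy genome fills a 4×4 and an 8×8 board with the same regular pattern, which is the sense in which local matching rules make an assembled pattern scalable.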
Based on the observation that the input structure and its evolvability is relevant for evolution speed, we plan in future to allow the system to assemble its own inputs and outputs at arbitrary positions, via the generic inter-component recognition mechanism, treating inputs and outputs as blocks with their own recognition patterns.

Examples for a scalable multiplier and an ALU are given in Fig. 6. Fig. 6a represents the different types of SLBs employed for the circuit, whereas Fig. 6c shows the self-assembly process. The scalability of the circuits was shown by identifying the detailed logical functionality of each type of SLB and analyzing the inductive properties of the assembled pattern. Note that checking for scalability turned out to be rather simple for the circuits we investigated: this may not necessarily hold for other cases. However, besides scalability analysis, the circuits presented in this work have been tested exhaustively for the indicated input size. Fig. 6d gives the SLBs leading to an ALU, the corresponding circuit is shown in Fig. 6e. The lowest bits of the inputs, x and y, (x0, y0), form the operator selection (x0y0) for the ALU (00 = addition, 01 = XOR, 10 = AND, 11 = OR). The examples presented are taken from a large variety of circuits that were evolved to handle the respective task. It is emphasized that the individual evolved multipliers differ considerably in their self-assembly patterns as well as in the logical functionality of their SLBs. The same holds for the ALUs.

2.3 Circuit evaluation

For a particular run, the size of the board n = 4nbit was held fixed at a large enough value (e.g. n = 24, 32 or 48) to deal with a whole range of different size multiplication problems up to a maximum size. Importantly, and in contrast with other evolutionary approaches, the evolution time (measured in circuit evaluations) proved independent of board size above a minimum threshold (about four input bits for multiplication, entailing a 16×16 board). In this way, we have been able to evolve solutions, tested during the evolution on up to 8×8-bit multiplication (4×4-bit usually suffices though), which also scale by the self-assembly process to multiply correctly 16×16-bit, 32×32-bit and larger multiplication tasks. Board-size independence demands a fitness function that is also independent of n. Before going into detail, two remarks explaining the underlying ideas are given. Firstly, board-size independence requires that the fitness of a specific circuit not be evaluated exhaustively (by considering all possible 2^(2n) inputs),
because then the fitness for boards of different size would be calculated with respect to different sets of problems. Instead, the fitness is evaluated with respect to subsets of problems (the test vectors). These test vectors are lists of input pairs and have fixed length nTV. Secondly, an n×n-bit function has in general a 2n-bit result string. Taking into consideration all of these 2n bits would again introduce an n-dependence to the fitness function. The fitness function we devised intrinsically determines the length of a substring (always starting at the least significant bit, Z0), which is then compared with the corresponding string of the correct result. The actual fitness value of a given circuit is calculated according to the following scheme:

1. Calculate the result strings (Z0-Z2n) for all elements of the test vector and compare them with the correct results.
2. Determine the number nbonus of consecutive bits (starting from Z0) that are correct for all elements of the test vector under consideration.
3. Starting from Z0 and going up to Zp, with p = nbonus + ninitial award - 1, evaluate the total number ncorrect of correctly calculated bits for all nTV input pairs in the test vector.

The fitness f is then given by

f = nbonus + ncorrect / (p · nTV)

For a visualization of this see Fig. 7. This scoring provides a graceful biasing of the n-bit × n-bit task towards sub-problem completion and allows progress to be made on large tasks. The evaluation function thus has some features in common with the much studied blocked or “royal road” fitness function37. Independently of problem size, individuals have to both connect up external inputs with outputs and compute the appropriate logical mapping (e.g. multiplication). While rewarding the correct completion of all lower bits of a given task first, no problem-specific assistance was provided.
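The three-step scoring scheme can be written out as a short Python sketch. It follows the formula as printed, f = nbonus + ncorrect / (p · nTV), and makes several assumptions that are not in the text: circuits are modelled as plain integer functions, the value 2 for the initial award is arbitrary, the scored window is clipped at the width of the result string, and the function and parameter names are hypothetical.

```python
from typing import Callable, List, Tuple

Pair = Tuple[int, int]


def fitness(circuit: Callable[[int, int], int],
            target: Callable[[int, int], int],
            test_vector: List[Pair],
            n_result_bits: int,
            n_initial_award: int = 2) -> float:
    """Score a circuit on a test vector: f = n_bonus + n_correct / (p * n_TV).

    n_initial_award = 2 is an assumed value; it must be at least 2 here so
    that p = n_bonus + n_initial_award - 1 is never zero.
    """
    if n_initial_award < 2:
        raise ValueError("n_initial_award must be >= 2 in this sketch")
    n_tv = len(test_vector)

    # Step 1: compare results bitwise; a 0 bit in the XOR marks a correct bit.
    errors = [circuit(x, y) ^ target(x, y) for x, y in test_vector]

    # Step 2: consecutive bits from Z0 that are correct for ALL test pairs.
    n_bonus = 0
    while n_bonus < n_result_bits and all(((e >> n_bonus) & 1) == 0 for e in errors):
        n_bonus += 1

    # Step 3: count correct bits Z0..Zp over all pairs (window clipped at
    # the result width, which is an assumption of this sketch).
    p = n_bonus + n_initial_award - 1
    top = min(p, n_result_bits - 1)
    n_correct = sum(((e >> k) & 1) == 0 for e in errors for k in range(top + 1))

    return n_bonus + n_correct / (p * n_tv)


def multiply(x: int, y: int) -> int:
    return x * y


pairs = [(3, 3), (5, 7), (15, 15)]
exact = fitness(multiply, multiply, pairs, n_result_bits=8)
low2 = fitness(lambda x, y: (x * y) & 0b11, multiply, pairs, n_result_bits=8)
print(exact, low2)
```

A circuit that only gets the lowest product bits right scores well below an exact multiplier, while still receiving partial credit for the bits just above its fully correct prefix.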

2.4 Evolutionary dynamics

To complete our description of genetic self-assembly, the evolutionary process of variation and selection in a population of individuals needs to be specified. Firstly, single and independent multiple bit mutation was allowed at different rates for the logical and recognition portions of the genes. In addition, on switching between the four functional categories of logic (function generator, arbitrary MUX, through connection or constantly zero), the logic was smoothed by choosing the closest matching functionality upon mutation. For example, on varying from a MUX to a function generator, the function generator encoding that particular MUX was chosen. This procedure proved reasonably efficient for evolutionary optimization, but is not deemed critical to our success. Secondly, we included a variation mechanism involving SLB gene duplication (overwriting an existing SLB gene in the process to conserve sequence length). Thirdly, a general subclass of double mutations in the recognition portion of the genome was chosen at enhanced frequency. These mutations involve twin changes of opposing recognition bits in juxtaposed edge pairs on two different, randomly chosen SLBs. This structured variation mechanism proved very effective in accelerating evolutionary optimization by inducing frequent changes of the self-assembled pattern. In consequence, four parameters determine the rate of variation: the bit-normalized mutation rates for recognition and logic, the gene duplication rate and the rate of twin changes of recognition sites. For the multiplier evolution we usually also restricted mutations in the multiplexer bits which activate the function generators, so that an SLB had at most two function generators in use at a time (see Sec. 2.1 above). This was not necessary but also sped up the evolutionary process. In the ALU example this restriction was not employed.
In order to deal with the problem of exponentially increasing test vector sets as the bit length of multiplicands increases, without being restricted to a constant subset, we let the test vectors co-evolve with the circuits. The variation rate for
test vectors, the population size and the number of test vectors are then the remaining parameters characterizing the simulation.
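The pair-wise tournament with genetically linked test vectors can be illustrated by the following Python sketch. It is a simplified stand-in, not the authors' algorithm: the names `Individual`, `score`, `mutate_test_vector` and `tournament` are hypothetical, the score simply counts correct product bits instead of using the fitness function of Sec. 2.3, circuits are plain functions rather than assembled SLB arrays, and only the test vector of the offspring is mutated.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Pair = Tuple[int, int]


@dataclass
class Individual:
    circuit: Callable[[int, int], int]  # stand-in for the assembled circuit
    test_vector: List[Pair]             # problem instances carried on the genome


def score(ind: Individual, test_vector: List[Pair], n_bits: int) -> int:
    """Stand-in score: number of correct product bits over a test vector."""
    mask = (1 << (2 * n_bits)) - 1
    wrong = sum(bin((ind.circuit(x, y) ^ (x * y)) & mask).count("1")
                for x, y in test_vector)
    return len(test_vector) * 2 * n_bits - wrong


def mutate_test_vector(tv: List[Pair], rate: float, n_bits: int,
                       rng: random.Random) -> List[Pair]:
    """Replace each input pair by a random one with probability `rate`."""
    return [(rng.randrange(1 << n_bits), rng.randrange(1 << n_bits))
            if rng.random() < rate else pair
            for pair in tv]


def tournament(a: Individual, b: Individual, n_bits: int, r_tv: float,
               rng: random.Random) -> Tuple[Individual, Individual]:
    """Score each contestant on its OPPONENT's test vector.

    Returns (winner, offspring); the offspring inherits the winner's circuit
    together with a mutated copy of the winner's test vector.
    """
    score_a = score(a, b.test_vector, n_bits)
    score_b = score(b, a.test_vector, n_bits)
    winner = a if score_a >= score_b else b
    child = Individual(winner.circuit,
                       mutate_test_vector(winner.test_vector, r_tv, n_bits, rng))
    return winner, child


rng = random.Random(0)
exact = Individual(lambda x, y: x * y, [(3, 5), (7, 7)])
zero = Individual(lambda x, y: 0, [(1, 1), (2, 3)])
winner, child = tournament(exact, zero, n_bits=4, r_tv=0.0, rng=rng)
```

Because each circuit is judged on test cases it does not carry itself, a test vector that is easy for its own circuit gives no direct advantage, which is one way to read the frequency-dependent selection described in the text.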

3 Results

3.1 Inductive Generalization

The main result presented in the paper is the proof that the use of genetically encoded self-assembling components enables the evolutionary design of scalable circuits from examples of correct functionality by inductive generalization. In our simulations, scalable circuits were evolved without information about the structure of the task, using functionally unconstrained logic building blocks, in less than 24 hrs on a PC. Only 16 test tasks and 6 genes for SLBs were required per individual (in a population of 32 individuals), for the circuit to evolve the general ability to solve the posed problem, independently of problem instance size. The complexity of the logic employed in the different evolved circuits varies significantly, but self-assembly seems to provide a natural bias towards more regular logic arrangements, matching our intuition about simplicity. We have not had to introduce any evolutionary constraints favoring minimal or simple circuits. To get more insight into this phenomenon, we analyze in this section in detail the case of multiplication. Several types of circuit construction problems for multipliers do become formally hard (in NP), for minimal circuit resources38, and no scalable solutions are then expected. However, without any additional requirements, multiplication has hitherto become increasingly difficult to evolve in larger circuits, because correct solutions are lost in a large search space. We employed a fitness function depending cooperatively on predictions of individual bits in the test products, one that will work for variable length binary products (see Sec. 2.3). The fitness function does not optimize the circuits for compactness, nor does it provide any problem-specific assistance. Instead, we have taken pains to establish a generic unbiased set of logical primitives in order to demonstrate de novo evolution of the desired functionality.

Fig. 8 describes the process of inductive generalization for a scalable multiplier circuit. In previous work on smaller multipliers, most optimization time is spent fitting the (drifting) last outstanding multiplications into the almost perfect circuit. By contrast, with genetic self-assembly, there is a time point at which the system captures general features of the ability to multiply (see Fig. 8a). This is accompanied by a sudden increase in the number of completely correctly predicted product bits. Statistical analysis of this phenomenon revealed a clear peak, the “inductive hill” (see Fig. 9a), in waiting times for successive perfection on bits 6 to 8 of the products, with waiting times decreasing to zero for higher bits. The zero waiting time for higher bits reflects the fact that if the genome encodes components which self-assemble to a circuit that correctly solves the lowest eight bits, the system has mastered the task of multiplication in all generality. No time is needed for the evolutionary solution of higher bits, because the circuit is scalable, indicating successful inductive generalization. Fig. 9b shows the phenomenon of the inductive pass by giving a statistical measure for the waiting time needed for completing multiplication up to a number of s bits as a function of the mutation rate rTV for the test vector portion of the genome. These waiting times were derived from time series such as in Fig. 8a, giving the maximal fitness in the population. In order to provide a statistically significant picture of the phenomenon of inductive generalization, we defined the time t_s to be the first time at which an individual in the population correctly calculated all the lowest s bits of the problem instances posed by its opponent in a tournament. The waiting times are then given by the difference between t_s and t_(s-1).

Two further peculiarities of Fig. 9b have to be noted. Firstly, the runs we investigated were of limited length (20 million individual tournaments) and not all of them yielded a solution. In that case, the uncompleted waiting times were set equal to the total length of the run, which explains the plateau for very high and very low rTV. Secondly, the data in Fig. 9 represent the third quartile of the corresponding waiting times; this statistic was derived from 40 simulation runs for each value of rTV.
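The waiting-time statistic described above can be illustrated with a minimal Python sketch. It assumes a hypothetical time series `best_bits`, where `best_bits[t]` is the largest s such that some individual predicts all lowest s product bits correctly at time t; the function names, and the choice of taking the start of the run as the reference point for the first bit, are assumptions made for illustration and are not taken from the paper.

```python
from statistics import quantiles


def first_passage_times(best_bits, max_s):
    """Return {s: first time index at which best_bits >= s, or None if never reached}."""
    return {
        s: next((t for t, b in enumerate(best_bits) if b >= s), None)
        for s in range(1, max_s + 1)
    }


def waiting_times(best_bits, max_s):
    """Waiting time for bit s = (first time s bits are correct) - (same for s-1 bits).

    Uncompleted waiting times are set to the total run length,
    mirroring the convention stated in the text.
    """
    run_length = len(best_bits)
    passage = first_passage_times(best_bits, max_s)
    waits = {}
    previous = 0  # assumption: the run start serves as the reference for s = 1
    for s in range(1, max_s + 1):
        if passage[s] is None:
            waits[s] = run_length  # censored: never completed within the run
        else:
            waits[s] = passage[s] - previous
            previous = passage[s]
    return waits


def third_quartile(values):
    """Third quartile, using the default method of statistics.quantiles."""
    return quantiles(values, n=4)[2]


# Toy series: bits 3, 4 and 5 are completed in the same step (zero waiting time).
example = waiting_times([0, 1, 1, 2, 2, 2, 5, 5], max_s=6)
```

In the toy series, bits 4 and 5 have zero waiting time because they are solved at the same moment as bit 3, which is the signature the text associates with a scalable circuit.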

As already stated, these waiting times are defined with respect to time series as shown in Fig. 8. This means that they refer only to those few multiplications coded in the employed test vector, which may appear to be only a weak indication of having solved the problem of multiplication completely. However, the probability of randomly predicting n products for multiplicands of bit length r approaches $2^{-2nr}$ for large r, which for n=4 and r=6 is already vanishingly small compared with the number of samples drawn during a simulation. Initial concerns that fluctuations would make evolution difficult were allayed by the population dynamics of the test vectors, which served to significantly dampen fluctuations in the evaluation process. Note further that the correctness of the circuits presented in this work is not only justified by the above probability argument; it has been tested exhaustively, and the scalability of the circuits has been shown by an analysis of their internal logic. Similar results can be seen for the evolutionary design of ALUs or other scalable circuits, such as adders or binary-to-Gray code converters.
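As a rough numerical check of this estimate, one can treat every output bit as an independent fair coin flip (a simplifying assumption) and note that the product of two r-bit multiplicands occupies 2r bits. For n = 4 and r = 6 this gives

\[
P = \left(2^{-2r}\right)^{n} = 2^{-2nr} = 2^{-48} \approx 3.6 \times 10^{-15}.
\]

If the 20 million tournaments of a run are taken as a crude upper bound on the number of samples (again an assumption), the expected number of such chance successes is of the order of $10^{-7}$.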

3.2 Evolutionary Dynamics

To investigate the surprising potential of genetic self-assembly further, we traced the time course of the test vectors, multiplication success and structural self-assembly in Fig. 8b and 8c. That multiplication can be performed recursively is well known, but this has hitherto proved difficult to detect from examples. In binary form, the induction may be expressed in terms of the initial conditions a.) and b.) and recursive relationship c.) as in equation A below:

A. Standard recursive multiplication:
a.) a * 0 := 0
b.) a * 1 := a
c.) a * b := (a * (b mod 2)) + (2a * [b/2])

B. Recursive no carry multiplication:
a.) a *nc 0 := 0
b.) a *nc 1 := a
c.) a *nc b := (a * (b mod 2)) XOR (2a *nc [b/2])

where * is the multiplication operation, := represents a definition and the square brackets indicate the “integer part of”. In c.), the first term corresponds to either definition a.) or b.), since (b mod 2) is 0 or 1. The second term is the product of two factors, 2a and [b/2], obtained by multiplication by 2 and integer division by 2, respectively. These operations are binary shift operations. Although the first factor gets larger, the recursion still terminates, because the second factor eventually decreases to either 1 or 0. Of the operations, the only one that is non-local in the binary representation is the addition in c.): it involves carry propagation in general. In order to further dissect the inductive principle, which self-assembly enables our evolutionary process to discover, we investigated a subclass of multiplications in which a purely local processing of information suffices. The exclusive or (XOR) operation captures the essence of addition without carry, and hence multiplication without carry as shown in B above. The classification used in Fig. 8 corresponds directly to a dissection of this induction. Multiplications by zero are collected in class I, by unity in class II and by powers of two in class III. Multiplication pairs (a, b) for which the calculation above is the same on replacing + by bitwise XOR do not require carry operations and are in class IV. These classes can be understood in terms of circuit sophistication: in order to solve problems in class I, a uniform (zero) output is sufficient, class II needs transfer of the input over the circuit to the outputs, class III requires shift operations, class IV a local form of addition, and finally class V involves carry logic. The evolutionary process discovers the inductive principle for multiplying vectors in the simpler classes I-III, then IV, and finally the non-local class V. These classes are first solved with circuits that only work for small examples and then in full generality.
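The two recursions and the resulting classification can be written out as a short Python sketch. The function names are hypothetical, and the sketch assumes that a pair belongs to classes I-III if either operand is zero, one or a power of two; the text does not state which operand is meant.

```python
def mul(a, b):
    """Recursion A: standard multiplication via shift and add."""
    if b == 0:
        return 0
    if b == 1:
        return a
    # (a * (b mod 2)) + (2a * [b/2]); doubling and halving are bit shifts
    return (a if b & 1 else 0) + mul(a << 1, b >> 1)


def mul_nc(a, b):
    """Recursion B: no-carry multiplication, with + replaced by XOR."""
    if b == 0:
        return 0
    if b == 1:
        return a
    return (a if b & 1 else 0) ^ mul_nc(a << 1, b >> 1)


def is_power_of_two(x):
    return x > 0 and x & (x - 1) == 0


def classify(a, b):
    """Assign a multiplication pair to one of the classes I-V (symmetry assumed)."""
    if a == 0 or b == 0:
        return "I"    # uniform zero output suffices
    if a == 1 or b == 1:
        return "II"   # input is transferred to the output
    if is_power_of_two(a) or is_power_of_two(b):
        return "III"  # pure shift
    if mul(a, b) == mul_nc(a, b):
        return "IV"   # local addition without carry
    return "V"        # carry logic required
```

For example, 5 × 3 falls into class IV because 101 XOR 1010 = 1111 equals the ordinary product 15, whereas 3 × 3 needs a carry (the no-carry result is 5 rather than 9) and is in class V.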
We have shown that the potential of genetically encoded self-assembly for inductive generalization can be exploited by evaluating the fitness of a circuit on a very limited number of mutually exchanged test problems in each step. This procedure, besides being computationally efficient, leads to a co-evolutionary coupling between circuit designs and test vectors (see Fig. 8c), driving the test vector population towards logically convoluted problems. That such a coupling can be observed is no surprise, because a genome confronting its adversary in the tournament selection process with a “difficult” problem is likely to have an advantage. The question arises whether this coupling is only an artifact or whether the problem exchange procedure fosters the evolution of structured designs. Fig. 9 and Fig. 10 indicate the latter to be true by presenting statistics for the waiting time for inductive generalization depending on the mutation rate of the test vectors rTV. For each value of rTV, at least 30 individual runs were performed. For reasons of CPU-time cost and in contrast to Fig. 8, the statistics refer to 6×6-bit multipliers instead of 8×8-bit or larger multipliers. However, several random samples were tested to scale up to 8 bits and no exception to scalability was encountered. One observes that if the co-evolutionary dynamics is disrupted by too large a value of rTV, the waiting times increase strongly. This behavior is also found for very low rTV, resulting in the “inductive pass” observed in Fig. 9b. In order to understand this, one has to consider that there are two basic strategies for coping with the problem instances in the test vector population. One is to find structured solutions (they may be partial and only be valid for subclasses of the complete set of problems) and the other is to adapt to the actual members of the problem population. The latter strategy may lead to a fast increase of fitness at an early stage or fast adaptation to occasionally emerging new problem instances, but is more vulnerable to fluctuations in the test vector population. This means that if rTV is too low, the evolution of structured solutions is hindered by the relative success of special-case solutions. This interpretation is corroborated by the results shown in Fig.
11, which basically represents the statistics of the ratio C of relative success in classes I-IV over that in class V. This relative success is calculated as the number of correct bits divided by the total number of result bits for all m × m-bit multiplications in the corresponding set of classes. (We emphasize that the ratio C is not taken only over those problem instances in the test vector population but over all possible 6×6-bit problems.) The number C of course varies over an evolutionary run; in order to get statistics over several runs, we took the median value of each individual run and depicted the resulting data set for around 30 runs for each value of rTV in Fig. 11 using box plots. The basic idea is that in the case of sole adaptation to specific multiplication tasks, this ratio is expected to be close to one (single instance multiplication is, with the exception of multiplication by zero, of approximately equal difficulty for all bit strings), whereas the evolution of generalizing circuits (with earlier success for classes I-IV than V) is expected to yield a C significantly above one. This can in fact be observed in Fig. 11. The median is used as a statistic because, for the time course of evolutionary processes, it provides a better characterization than the mean, as expected for exponentially distributed waiting times in innovative processes39. We conclude that a value of rTV small enough to allow the establishment of co-evolutionary coupling of circuit designs and test vectors but sufficiently large to devalue adaptation to specific problem instances supports the efficient evolution of inductive generalization in self-assembling structures.
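A minimal sketch of how the ratio C might be computed for a candidate circuit is given below. It is an illustration under stated assumptions, not the authors' implementation: `circuit` is a hypothetical callable mapping two m-bit operands to a predicted product, and classes I-IV are identified as exactly those pairs whose ordinary product equals the carry-less (XOR) product, which holds for multiplications by zero, one and powers of two as well as for class IV.

```python
from itertools import product


def clmul(a, b):
    """Carry-less product: XOR of the shifted copies of a selected by the bits of b."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def correct_bits(predicted, expected, width):
    """Number of positions among the lowest `width` bits where both values agree."""
    diff = (predicted ^ expected) & ((1 << width) - 1)
    return width - bin(diff).count("1")


def ratio_c(circuit, m):
    """Relative success in classes I-IV divided by that in class V, over all m x m-bit pairs.

    Raises ZeroDivisionError if the circuit gets no class V bit right.
    """
    width = 2 * m
    hits = {False: 0, True: 0}   # key: does the pair require carries (class V)?
    total = {False: 0, True: 0}
    for a, b in product(range(1 << m), repeat=2):
        needs_carry = a * b != clmul(a, b)
        hits[needs_carry] += correct_bits(circuit(a, b), a * b, width)
        total[needs_carry] += width
    success_easy = hits[False] / total[False]
    success_hard = hits[True] / total[True]
    return success_easy / success_hard


# A circuit that has mastered classes I-IV but lacks carry logic yields C > 1.
c_without_carry = ratio_c(clmul, 3)
# A perfect multiplier yields C = 1.
c_perfect = ratio_c(lambda a, b: a * b, 3)
```

The carry-less multiplier serves here as a stand-in for a partially generalizing circuit: it is perfect on classes I-IV by construction and only partly correct on class V, so its ratio lies above one.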

4 Discussion

Self-assembly encoded structures yield a rather general but biased sampling of possible functions. This was evidenced by the ability of genetic encoding to solve a range of problems, including finding a scalable ALU. In order to distinguish self-assembly genetic guidance from effects due to component genetic encoding, we investigated several different encoding schemes for components. Recognition patterns must be sufficiently diverse to provide a rich set of self-assembly patterns, and the number of connections between SLBs must be large enough to allow efficient routing. Encodings in which routing connections (wires) had to be realized as special cases of multi-input logic functions increased evolution time significantly. Restricting the maximal number of combinatorial functions per SLB from four to two proved to be beneficial in the case of multiplier evolution but not for ALUs; both, however, profited from coding the functions to give a bias for direct I/O connections (routing). This is achieved via in-built genetic biasing, see Sec. 2 and Fig. 3; e.g. simple transfer of an input signal need not be realized by a four-input function generator, since the evolvable multiplexer allows direct routing. Similarly, the variation mechanism has a strong potential influence on the sampling of new structures. Besides single bit (point) mutations, the addition of duplication events on the genes for individual SLBs was evaluated. This was seen to foster smooth differentiation of component recognition for the assembly process and similarly diversification of logical functionality from simple routing connections. Mutation rates also change the sampling of new components. Whereas the genetic self-assembly process was relatively robust towards changes in mutation rates for circuit components, the mutation rate of test vectors showed a distinct optimal value (see Fig. 9 and 10). The optimal size of the test vector also turned out to be small (16 problem instances), and complete multipliers were obtained within the available computation time only for sizes between 4 and 64. Finally, the choice of a modular and scalable fitness function giving a bias towards perfection on subtasks turned out to be important; the fitness function reported proved applicable for all problems investigated. The simple self-assembly process employed in this paper is a mere caricature of complex regulated physical self-assembly. In particular, there is an additional redundancy and robustness required in physical self-assembly, which is an error-prone process. The simplification adopted here, in which self-assembly is only allowed to proceed if both neighbors are already in place, and in which exact matching (taking promiscuous symbols into account) is required, makes the self-assembly algorithm deterministic. In fact, in the present form, the two-dimensional build-up is formally equivalent to the time course of a one-dimensional cellular automaton (CA) rule, in which the next state is dictated by the two neighboring cells on the previous diagonal.
If we restrict attention to the self-assembly of the recognition patterns, ignoring differences of content, the number of such rules can be readily calculated. Ignoring promiscuity symbols, for a binary pattern of length p this number is equal to the number of possible exposed recognition patterns at an assembly site ($2^{2p}$) raised to the power of the number of possible input patterns on the two binding edges of an assembling block (also $2^{2p}$). The result is $4^4 = 256$ for p=1 and $16^{16} \approx 2 \cdot 10^{19}$ for p=2, demonstrating the rapid rise in self-assembly variation as building block edge diversity increases to typical nucleotide levels. For p=2, the restricted number s of SLBs in the genome provides a stronger limitation for s
