Metaheuristics applied to Bioinformatics problems

Jean-Michel Richer [email protected] http://www.info.univ-angers.fr/pub/richer

IBSS 2011 - Tanger, Maroc

Aim

See how Metaheuristics are used to solve different kinds of problems in bioinformatics:
- Multiple Sequence Alignment
- Phylogenetic Reconstruction
- other problems

Outline

1. Multiple Alignment
2. Phylogenetic Reconstruction
3. Other problems
4. Conclusion
5. Bibliography

Multiple Alignment


What is an alignment?

Definition (alignment): given a set $S = \{S_1, \ldots, S_k\}$ of sequences, find a matrix $M(k, n)$ such that
- $\max_i |S_i| \le n \le \sum_{i=1}^{k} |S_i|$
- each character $M[i, j]$ is a residue or a gap (−)
- there is no column in which all characters are gaps
- $M[i] = S_i$ once all the inserted gaps are removed
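The conditions of the definition can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function name `is_valid_alignment` and the use of `-` as the gap symbol are assumptions made for illustration.

```python
def is_valid_alignment(sequences, rows, gap="-"):
    """Check the conditions of the alignment definition for the rows of M."""
    if not rows or len(rows) != len(sequences):
        return False
    n = len(rows[0])
    # All rows of the matrix M(k, n) must have the same length n.
    if any(len(row) != n for row in rows):
        return False
    # max |S_i| <= n <= sum |S_i|
    lengths = [len(s) for s in sequences]
    if not (max(lengths) <= n <= sum(lengths)):
        return False
    # No column may consist only of gaps.
    if any(all(c == gap for c in column) for column in zip(*rows)):
        return False
    # Removing the inserted gaps from row i must give back S_i.
    return all(row.replace(gap, "") == s for row, s in zip(rows, sequences))
```

For instance, `is_valid_alignment(["ATTTC", "ATTC"], ["ATTTC", "ATT-C"])` accepts the first alignment of Example 1.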


What is a good alignment?

Main question: given a set of sequences S, what is a good alignment for S?

Answer: nobody can tell!



Example 1 - ATTTC and ATTC

ATTTC    ATTTC    ATTTC
ATT-C    AT-TC    A-TTC

All three alignments are equivalent.



Example 2 - ATTTTC, ATC and CTC

ATTTTC    ATTTTC
AT---C    ATC---
CT---C    CTC---

ATT = Ile, ATC = Ile, CTC = Leu, TTC = Phe


Example 3 - but what if the sequences are ATTTTG, ATC and CTC?

ATTTTG    ATTTTG    ATTTTG
AT---C    ATC---    ATC---
CT---C    CTC---    ---CTC

Read codon by codon, the last two alignments give the amino acid alignments:

IL    IL
I-    I-
L-    -L

ATT = Ile, ATC = Ile, CTC = Leu, TTG = Leu


Quality of an alignment

Sum of pairs: a function to assess the quality of an alignment M, based on
- a substitution matrix (PAM, BLOSUM, GONNET, ...)
- a gap cost model (linear, affine, concave, ...)

$$SP(M) = \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \left( \sum_{r=1}^{n} w(M[i, r], M[j, r]) \right)$$
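The Sum of Pairs formula translates directly into three nested loops. The sketch below is illustrative: `simple_w` is a hypothetical scoring function (match 1, mismatch 0, gap −1, gap against gap ignored) standing in for a real substitution matrix and gap cost model.

```python
from itertools import combinations


def sum_of_pairs(rows, w):
    """SP(M): sum of w over every pair of rows (i < j) and every column r."""
    total = 0
    for row_i, row_j in combinations(rows, 2):
        for a, b in zip(row_i, row_j):
            total += w(a, b)
    return total


def simple_w(a, b, match=1, mismatch=0, gap=-1):
    """Hypothetical scoring function, not a real substitution matrix."""
    if a == "-" and b == "-":
        return 0  # a gap facing a gap is ignored in this sketch
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


rows = ["ATTTC", "ATT-C", "A-TTC"]
print(sum_of_pairs(rows, simple_w))
```

Because every pair of rows is visited once and `simple_w` is symmetric, the result does not change when the rows are reordered.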


Sum of pairs

ATTCTCTTATATA...
ATTGTGTTATTTT...
CTTCTCTTATTCT...
CTACTCTTATTCT...

The evaluation does not depend on the order of the sequences.


Pairwise Sequence Alignment

Definition (Pairwise Sequence Alignment - PSA): given two sequences S and T, find an alignment $M(2, n)$ such that
- $\max(|S|, |T|) \le n \le |S| + |T|$
- $SP(M)$ is optimal for the Sum of Pairs function


How to solve this optimization problem?
- Metaheuristics: search for the best alignment among all alignments of 2 sequences of length n, for a given objective function f (a maximization problem)
- Dynamic Programming: decomposition into subproblems


How to solve this optimization problem with Metaheuristics?

Number of alignments for 2 sequences of length n:

$$\sum_{k=0}^{n} C_{n+k}^{k} \times C_{n}^{k} = 1 + C_{2n}^{n} = 1 + \frac{(2n)!}{(n!)^2}$$

approximately:

$$\sim \frac{2^{2n}}{\sqrt{\pi n}}$$

For two sequences of length n = 100, there are $9.05 \times 10^{58}$ alignments.
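The order of magnitude quoted for n = 100 can be checked numerically. This small sketch only evaluates the binomial term $(2n)!/(n!)^2$ and its approximation $2^{2n}/\sqrt{\pi n}$; it makes no claim about the exact counting convention used for alignments.

```python
from math import comb, pi, sqrt

n = 100
exact = comb(2 * n, n)                # (2n)! / (n!)^2
approx = 2 ** (2 * n) / sqrt(pi * n)  # 2^(2n) / sqrt(pi * n)

print(f"{exact:.3e}")
print(f"{approx:.3e}")
```

Both values are of the order of $9 \times 10^{58}$, which rules out an exhaustive enumeration.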


Dynamic Programming

Definition (Dynamic Programming):
- introduced by Richard Bellman, 1954 [3]
- a method for solving complex problems by breaking them down into simpler subproblems, where the best decisions have to be found one after another
- the word "programming" referred to the use of the method to find an optimal program


PSA with Dynamic Programming

Recursive formula: the best alignment between $S[1..i]$ and $T[1..j]$ is

$$\mathrm{align}(S[1..i], T[1..j]) = \max \begin{cases} \mathrm{align}(S[1..i-1], T[1..j-1]) + w(S[i], T[j]) \\ \mathrm{align}(S[1..i], T[1..j-1]) + w(-, T[j]) \\ \mathrm{align}(S[1..i-1], T[1..j]) + w(S[i], -) \end{cases}$$


[Figure: the three ways to extend an alignment of S[1..i] and T[1..j]: (1) substitution or match of S_i with T_j, (2) and (3) gap insertion in one of the two sequences]



Complexity: $O(n^2)$

However, what to do in case of equivalences?

A    A-    -A
T    -T    T-


PSA - Example

Example (Parameters):
- S = ACAGTC
- T = CATTGC
- match: $w(a, a) = 1$
- substitution: $w(a, b) = 0$
- linear gap penalty: $g_o = 0$


Initialization of matrix M

$$M[0, 0] = 0$$
$$M[i, 0] = M[i-1, 0] + g_o \quad \forall i \in [1, N]$$
$$M[0, j] = M[0, j-1] + g_o \quad \forall j \in [1, P]$$


Recurrence relation

M[i, j] is computed from its three neighbors M[i − 1, j − 1] (↖), M[i − 1, j] (↑) and M[i, j − 1] (←):

$$M[i, j] = \max \begin{cases} M[i-1, j-1] + w(x_i, y_j) \\ M[i-1, j] + g_o \\ M[i, j-1] + g_o \end{cases}$$
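The initialization and the recurrence can be sketched in a few lines. This is an illustrative implementation under the parameters of the example (match 1, substitution 0, linear gap penalty 0); the function name `fill_matrix` and its keyword arguments are hypothetical, and the traceback step is not included.

```python
def fill_matrix(s, t, match=1, mismatch=0, gap=0):
    """Fill the dynamic programming matrix M for sequences s and t."""
    n, p = len(s), len(t)
    m = [[0] * (p + 1) for _ in range(n + 1)]
    # Initialization of the first column and the first row.
    for i in range(1, n + 1):
        m[i][0] = m[i - 1][0] + gap
    for j in range(1, p + 1):
        m[0][j] = m[0][j - 1] + gap
    # Recurrence: diagonal (match/substitution), up and left (gaps).
    for i in range(1, n + 1):
        for j in range(1, p + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            m[i][j] = max(m[i - 1][j - 1] + w,
                          m[i - 1][j] + gap,
                          m[i][j - 1] + gap)
    return m


m = fill_matrix("ACAGTC", "CATTGC")
print(m[-1][-1])  # score of an optimal alignment
```

The two nested loops visit each cell once, which gives the quadratic complexity for two sequences of length n.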


Initialization

  i S/T     C  A  T  T  G  C
  0       0  0  0  0  0  0  0
  1   A   0
  2   C   0
  3   A   0
  4   G   0
  5   T   0
  6   C   0
      j   0  1  2  3  4  5  6


Recurrence on M

  i S/T     C  A  T  T  G  C
  0       0  0  0  0  0  0  0
  1   A   0  0  1  1  1  1  1
  2   C   0  1  1  1  1  1  2
  3   A   0  1  2  2  2  2  2
  4   G   0  1  2  2  2  3  3
  5   T   0  1  2  3  3  3  3
  6   C   0  1  2  3  3  3  4
      j   0  1  2  3  4  5  6


Traceback from M


Alignments

There are 5 optimal alignments:

-CATTG-C    -CAT-TGC    -CATTGC    -CA-TTGC    -CA-TTGC
ACA--GTC    ACA-GT-C    ACAGT-C    ACAGT--C    ACAG-T-C

Which one is the best?


PSA and Dynamic Programming

Different kinds of alignments:
- global with linear gap penalty: Needleman and Wunsch, 1970 [21]
- global with affine gap penalty: Gotoh, 1982 [14]
- local: Smith and Waterman, 1981 [30]


Multiple Sequence Alignment

Definition (Multiple Sequence Alignment - MSA): given a set S of k sequences, find an alignment $M(k, n)$ such that
- $\max_i |S_i| \le n \le \sum_{i=1}^{k} |S_i|$
- $SP(M)$ is optimal for the Sum of Pairs function


Complexity of MSA
- intractability proved by Wang and Jiang, 1994 [33]
- complexity of the Dynamic Programming extension: $O(k^2 \times 2^k \times l^k)$ or $O(2^k \times l^k)$ for a set of k sequences of length l


Trick



MSA and Carrillo Lipman

Carrillo and Lipman trick to reduce complexity:
- Carrillo and Lipman, 1988 [4]
- decrease complexity by computing only part of the matrix


[Figure: Carrillo and Lipman trick to reduce complexity, shown in 2 dimensions: only a part of the matrix is computed]


A first heuristic: Clustal


Clustal: a heuristic method

1. generate a guide tree (UPGMA, NJ, ...)
2. align profiles (∼ consensus) along the tree branches


Characteristics of Clustal:
- influence of the guide tree
- loss of precision with profiles
- complexity: $O(k^2 \times n^2)$


Muscle

- MUltiple Sequence Comparison by Log-Expectation, Edgar, 2004 [7]
- uses a much faster, but somewhat more approximate, method to compute distances
- uses UPGMA for a better progressive alignment, because it forces the most similar sequences to be aligned first
- improves the alignment afterwards


Muscle - first step: k-mer clustering to build a tree
- does not construct an alignment
- counts the number of short subsequences (known as k-mers) that two sequences have in common
- around 3,000 times faster than Clustal's method, but the trees will generally be less accurate
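Counting shared k-mers needs no alignment at all, which explains the speed. The sketch below only illustrates the counting idea; the function names are hypothetical and this is not the distance formula actually used by Muscle.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every subsequence of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_kmers(a, b, k=3):
    """Number of k-mers that two sequences have in common (with multiplicity)."""
    common = kmer_counts(a, k) & kmer_counts(b, k)  # multiset intersection
    return sum(common.values())


print(shared_kmers("ATTCTC", "ATTGTC", k=3))
```

A pairwise distance derived from such counts can then be fed to a clustering method such as UPGMA to build the tree.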


Muscle - second step
- use the tree to construct a progressive alignment
- proceed as Clustal


Muscle - third step: build a tree from the alignment and compare it to the initial tree
- compute the pairwise identity of each pair of sequences
- obtain a distance matrix and build a new tree
- if the trees are identical, there is nothing to do; otherwise build a new alignment
- repeat this tree refinement until the tree stabilizes or until a specified maximum number of iterations has been reached


Modification of the fitness function


Modification of the fitness function: COFFEE

- Consistency based Objective Function For alignmEnt Evaluation
- introduced by Notredame, Holm and Higgins, 1998 [24]
- describes the quality of a multiple protein sequence alignment
- first evaluate a library, then compute COFFEE for a given alignment



COFFEE: build the library
- the library is made of all pairwise alignments of the k sequences
- for example: use Clustal to obtain the pairwise alignments



COFFEE: evaluate
- compare each pair of aligned residues of the MSA to the library
- simple version: number of pairs of aligned residues found in both the MSA and the library, divided by the total number of pairs of aligned residues in the MSA
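The simple version of the score can be sketched as a ratio of set sizes. The code below is a hedged illustration, not the COFFEE implementation: the function names are hypothetical, a residue pair is represented as (sequence index, residue position, sequence index, residue position), and the library is built here from a hand-written pairwise alignment rather than from Clustal output.

```python
from itertools import combinations


def aligned_pairs(i, j, row_i, row_j, gap="-"):
    """Residue pairs (i, pos_i, j, pos_j) that share a column in two rows."""
    pairs, pos_i, pos_j = set(), 0, 0
    for a, b in zip(row_i, row_j):
        if a != gap and b != gap:
            pairs.add((i, pos_i, j, pos_j))
        pos_i += a != gap  # advance only on residues
        pos_j += b != gap
    return pairs


def simple_coffee(msa, library):
    """Fraction of the MSA's aligned residue pairs also found in the library."""
    msa_pairs = set()
    for i, j in combinations(range(len(msa)), 2):
        msa_pairs |= aligned_pairs(i, j, msa[i], msa[j])
    return len(msa_pairs & library) / len(msa_pairs) if msa_pairs else 0.0


# Hypothetical library from one pairwise alignment of ATTTC and ATTC.
library = aligned_pairs(0, 1, "ATTTC", "AT-TC")
print(simple_coffee(["ATTTC", "ATT-C"], library))
```

An MSA identical to the library alignments scores 1.0; each residue pair placed differently lowers the score.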


Metaheuristics for MSA


SAGA: a metaheuristic approach to MSA

SAGA - Sequence Alignment by Genetic Algorithm:
- designed by Notredame and Higgins, 1996 [23]
- a population of alignments is submitted to a GA



SAGA - variation operators
- 19 mutation operators (add, delete, move gaps)
- 6 crossover operators (modify or combine gap regions)
- use of an OS (Operator Scheduling) strategy to select which operator to apply



SAGA - Operator Scheduling strategy
- each operator is assigned a probability (initially equal)
- operators are rewarded when they create better individuals in the population
- motivation: it is difficult to know in which order to apply the operators
- a kind of natural selection for operators


MSA-EA - an improvement of Clustal

- introduced by Thomsen, Fogel and Krink, 2003 [32]
- a Genetic Algorithm
- uses the solution of Clustal as a seed (1.2 length + gaps at end)
- applies variation operators (BlockShuffle, LocalShuffle, ...)


Variation operators: example of LocalShuffle
- picks a random amino acid (AA) from a randomly chosen row (sequence)
- checks whether one of its neighbors is a gap
- if so, swaps them (if both neighbors are gaps, one of them is picked randomly)

Discussion of LocalShuffle:
- no biological meaning
- no optimized choice
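The LocalShuffle steps can be sketched as follows. This is an illustration of the description above, not the MSA-EA code: the function name, the `rng` parameter and the gap symbol `-` are assumptions, and the sketch does not check whether the move creates a column made only of gaps.

```python
import random


def local_shuffle(rows, rng=random, gap="-"):
    """Swap one randomly chosen residue with a neighboring gap, if any."""
    rows = [list(r) for r in rows]
    row = rng.choice(rows)  # randomly chosen sequence
    residues = [c for c, ch in enumerate(row) if ch != gap]
    if residues:
        c = rng.choice(residues)  # random residue of that row
        neighbors = [d for d in (c - 1, c + 1)
                     if 0 <= d < len(row) and row[d] == gap]
        if neighbors:
            d = rng.choice(neighbors)  # random pick if both sides are gaps
            row[c], row[d] = row[d], row[c]
    return ["".join(r) for r in rows]


print(local_shuffle(["AT-C", "ATTC"], random.Random(0)))
```

Whatever the random choices, the move preserves the underlying sequences and the alignment length, so the result can be evaluated directly with the fitness function.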


Conclusion for MSA

Resolution of MSA:
- Metaheuristics take too much time
- iterative Dynamic Programming seems sufficient and efficient enough: ClustalW
- who really knows what a correct or good alignment is?


Some results on BAliBASE: SPS score computed with bali score

Software   Set 1  Set 2  Set 3  Set 4  Set 5  Time (s)
CLUSTAL    0.809  0.932  0.723  0.834  0.858       120
MAFFT      0.829  0.931  0.812  0.947  0.978        98
MUSCLE     0.821  0.935  0.784  0.841  0.972        75
PROBCONS   0.849  0.943  0.817  0.939  0.974       711
TCOFFEE    0.814  0.928  0.739  0.852  0.943      1653
MALINBA    0.811  0.911  0.752  0.899  0.942       343

Phylogenetic Reconstruction


Methods for Phylogenetic Reconstruction

- distance based, ∼ $O(n^3)$: UPGMA, WPGMA, NJ, BioNJ (Gascuel, 1997 [8])
- character based: Maximum Parsimony, Maximum Likelihood


Maximum Parsimony

- character-based approach that relies on the work of the German entomologist Willi Hennig (1913-1976)
- based on Occam's razor (William of Ockham, 1285–1349), or principle of economy


Maximum Parsimony
- input: a multiple sequence alignment of n sequences of m residues
- algorithm: find the tree with minimum changes
- output: a tree of minimum score, i.e. with minimum changes
- a minimization problem


Small parsimony problem

Definition (Small parsimony problem): given a multiple alignment of length m of a set L of n sequences and a tree T whose leaves are labelled with the sequences of L, find the parsimony score of T.
- complexity: $O(n \times m)$


Large parsimony problem

Definition (Large parsimony problem): given a multiple alignment of length m of a set L of n sequences, find a most parsimonious tree T, i.e. a tree with minimum parsimony score.
- complexity: factorial


Number of trees

number of taxa   # of unrooted trees            # of rooted trees
10               2.0e+06                        3.4e+07
20               2.2e+20                        8.2e+21
30               8.6e+36                        4.9e+38
40               1.3e+55                        1.0e+57
50               2.8e+74                        2.7e+76
80               2.1e+137                       3.4e+139
n                $\prod_{i=3}^{n} (2i - 5)$     $\prod_{i=2}^{n} (2i - 3)$
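The two product formulas of the last row can be evaluated exactly with integer arithmetic. A minimal sketch, with hypothetical function names:

```python
from math import prod


def unrooted_trees(n):
    """Number of unrooted binary trees on n taxa: product of (2i - 5), i = 3..n."""
    return prod(2 * i - 5 for i in range(3, n + 1))


def rooted_trees(n):
    """Number of rooted binary trees on n taxa: product of (2i - 3), i = 2..n."""
    return prod(2 * i - 3 for i in range(2, n + 1))


for n in (10, 20, 30):
    print(n, f"{unrooted_trees(n):.1e}", f"{rooted_trees(n):.1e}")
```

The growth is faster than exponential, which is why exhaustive enumeration is limited to a very small number of taxa.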


Phylogenetic reconstruction with Maximum Parsimony

Maximum Parsimony is a combinatorial optimization problem (a minimization problem) for which we must find efficient methods.


Methods
- branch and bound
- local search methods
- genetic algorithms
- memetic algorithms
- other optimization techniques


Branch and bound
- introduced by Hendy and Penny, 1982 [15]
- generate a first tree, whose score is used as an upper bound
- then create trees by iteratively adding new taxa, keeping only the trees whose score stays under the upper bound


[Figure: branch and bound search tree. Image from Mikael Thollesson (c)]


Drawback of Branch and bound:
- too many trees are generated
- maximum number of taxa: 20


LS descent

Algorithm 1 descent(S, f, N)
  s ← choose or generate an initial configuration ∈ S
  for a given number of iterations do
    find s′ ∈ N(s) such that f(s′) < f(s), or return s
    s ← s′
  end for
  return s
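Algorithm 1 can be written generically, independently of trees. In this sketch the names `descent`, `neighbors` and `max_iterations` are illustrative, the first improving neighbor is accepted, and the toy problem (minimizing a quadratic over the integers) only stands in for a parsimony score over trees.

```python
def descent(initial, f, neighbors, max_iterations=1000):
    """Strict descent: move to an improving neighbor until none exists."""
    s = initial
    for _ in range(max_iterations):
        better = next((t for t in neighbors(s) if f(t) < f(s)), None)
        if better is None:
            return s  # local optimum: no improving neighbor
        s = better
    return s


# Toy example: minimize (x - 7)^2 with neighbors x - 1 and x + 1.
best = descent(0, lambda x: (x - 7) ** 2, lambda x: [x - 1, x + 1])
print(best)
```

For Maximum Parsimony, `initial` would be a tree, `f` its parsimony score and `neighbors` one of the tree neighborhoods (NNI, SPR, TBR).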


LS: generation of the initial configuration
- random (sometimes too far from the optimum)
- branch and bound
- stepwise addition: each new taxon is tried on all branches and the most parsimonious tree is kept


Acceptance of new configuration
- strict descent: only improving configurations
- side-walk descent: improving or equivalent neighbors
- random walk: possibility to accept deteriorating neighbors
- simulated annealing: a specific random walk with a non-constant probability


Escape from local optimum

Noising techniques:
- Iterated Local Search: perturbation of the current configuration
- Parsimony Ratchet: modification of the fitness function


Parsimony Ratchet
- introduced by Nixon, 1999 [22]; see also Horovitz, 1999 [17]
- when a local optimum s⋄ is reached, the Ratchet adds noise to the evaluation function: the weights of a proportion of the characters (10-15%) can be increased, or some characters can be eliminated
- a LS is performed from s⋄ using the noised evaluation function f′
- the LS then continues with the original objective function f


Neighborhoods for MP

A configuration is a tree; three main neighborhoods exist:
- NNI (Nearest Neighbor Interchange) [34]: small size
- SPR (Subtree Pruning Regrafting) [31]: average size
- TBR (Tree Bisection Reconnection) [31]: large size



NNI consists in swapping two subtrees which are separated by a branch.
- size: $2n - 6$
- extension: p-ECR, which shuffles p adjacent branches. In particular, 1-ECR is equivalent to NNI.



SPR cuts a branch and creates two separate trees: the clipped tree and the residual tree. The clipped tree can then be regrafted on each branch of the residual tree.
- size: $2(n - 3)(2n - 7)$



TBR breaks the tree into two subtrees and reconnects the re-rooted clipped tree to any branch of the residual tree.
- size: at most $(2n - 3)(n - 3)^2$



Comparison of NNI, SPR and TBR
- NNI: $O(n)$
- SPR: $O(n^2)$
- TBR: $O(n^3)$
- NNI ⊆ SPR ⊆ TBR



Variable Neighborhoods
- NNI, STEP, SPR (where STEP is an SPR in which only leaves are pruned), proposed by Ribeiro
- SPR + 2-SPR: Ribeiro, Vianna, 2005 [25] (where k-SPR is the composition of k SPR transformations)
- Parametric Progressive Neighborhood (PPN): Goëffon, Richer, Hao, 2007 [11]: SPR → NNI



Parametric Progressive Neighborhood
- other methods: increase the neighborhood size
- PPN: decrease the neighborhood size


[Figure: evolution of the score of the current tree (y-axis, 1500 to 6000) over the iterations (x-axis, 0 to 50000) for SPR, NNI and PPN, with a 300-100 random instance, starting from a random tree]


Other LS algorithms
- Tabu Search: Yu-Min, Shu-Cherng, Jeffrey, 2007 [36]
- Simulated Annealing: LVB, Barker, 2004 [2]
- GRASP + VNS: Ribeiro et al. [1, 25]


Genetic Algorithm

Algorithm 2 GeneticAlgorithm(S, f, x)
  P ← { choose or generate n individuals ∈ S }
  for a given number of crossovers x do
    p, q ← select-parents(P)
    r ← crossover(p, q)
    mutation(r)
    if selection(r) then
      replace(P, r)
    end if
  end for


Genetic Algorithms for MP [18, 5, 6, 26]
- tree crossover operators follow the subtree cutting and regrafting strategy
- but crossover and mutation should be tailored to the target problem, in order to integrate problem-specific constraints and thus improve the search


A specific crossover operator: Distance-Based Information Preservation (DiBIP) crossover
- introduced by Goëffon, Richer, Hao, 2006 [10]
- based on the notion of topological distance between two leaves
- aims to preserve the properties common to both parents in terms of topological distance between taxa


Topological distance

Definition (Topological distance): let i and j be two taxa of a tree T. The topological distance $\delta_T(i, j)$ between i and j is defined as the number of edges of the path between the parents of i and j, minus 1 if the path contains the root of the tree.


[Figure: tree T with leaves A to F and internal nodes f, g, h, j, k]

Distance $\delta_T$:

     B  C  D  E  F
A    0  1  3  3  2
B       1  3  3  2
C          2  2  1
D             0  1
E                1


DiBIP crossover algorithm

Algorithm 3 DiBIP(T1, T2, δT, ∆, ⊕, Λ)
  Di ← ∆(Ti) for i = 1, 2
  D∗ ← D1 ⊕ D2
  T∗ ← Λ(D∗)
  return T∗


DiBIP example

[Figure: Tree 1, with leaves drawn in the order D I A K J B L N G C M F E H]


Tree 1 - topological matrix D1

    A   B   C   D   E   F   G   H   I   J   K   L   M   N
A   -   6   5   1   5   5   5   5   0   5   2   7   5   7
B       -   3   5   5   5   3   5   6   1   4   1   5   1
C           -   4   4   4   0   4   5   2   3   4   4   4
D               -   4   4   4   4   1   4   1   6   4   6
E                   -   2   4   0   5   4   3   6   2   6
F                       -   4   2   5   4   3   6   0   6
G                           -   4   5   2   3   4   4   4
H                               -   5   4   3   6   2   6
I                                   -   5   2   7   5   7
J                                       -   3   2   4   2
K                                           -   5   3   5
L                                               -   6   0
M                                                   -   6
N                                                       -


[Figure: Tree 2, with leaves drawn in the order M B F L J K A E D H C G I N]


Tree 2 - topological matrix D2

    A   B   C   D   E   F   G   H   I   J   K   L   M   N
A   -   8   4   1   0   9   4   2   6   7   4   9   6   6
B       -   6   7   8   1   6   6   4   1   4   1   2   4
C           -   3   4   7   0   2   4   5   2   7   4   4
D               -   1   8   3   1   5   6   3   8   5   5
E                   -   9   4   2   6   7   4   9   6   6
F                       -   7   7   5   2   5   0   3   5
G                           -   2   4   5   2   7   4   4
H                               -   4   5   2   7   4   4
I                                   -   3   2   5   2   0
J                                       -   3   2   1   3
K                                           -   5   2   2
L                                               -   3   5
M                                                   -   2
N                                                       -


D∗ = D1 + D2

    A   B   C   D   E   F   G   H   I   J   K   L   M   N
A   -  14   9   2   5  14   9   7   6  12   6  16  11  13
B       -   9  12  13   6   9  11  10   2   8   2   7   5
C           -   7   8  11   0   6   9   7   5  11   8   8
D               -   5  12   7   5   6  10   4  14   9  11
E                   -  11   8   2  11  11   7  15   8  12
F                       -  11   9  10   6   8   6   3  11
G                           -   6   9   7   5  11   8   8
H                               -   9   9   5  13   6  10
I                                   -   8   4  12   7   7
J                                       -   6   4   5   5
K                                           -  10   5   7
L                                               -   9   5
M                                                   -   8
N                                                       -


T∗ = Λ(D∗), with Λ = UPGMA

[Figure: resulting tree T∗, with leaves drawn in the order M F B L J N H E D A K I C G]


Memetic algorithm

Memetic algorithm (MA) for MP:
- combination of a GA helped by a LS improver [19]
- implementations: [16, 26]
- Hydra (Goëffon, Richer, Hao, 2007 [11]) is an implementation of a MA


Other optimization methods
- Sectorial Search: Goloboff, 1999 [13]
- Disc-Covering Methods (DCM) [20, 29]
- Fast character optimization techniques [12, 9, 28, 35]
- Multi-character optimization techniques [28, 27]


Sectorial Search
- focus on a part of the tree
- decomposition into subproblems: divide and conquer


Fast character optimization
- a tree modification does not require recomputing the whole tree
- a complex method, but very effective
- TNT (Tree analysis using New Technology), Goloboff, 1999 [13]: billions of trees in a few seconds


Multi-character optimization
- use of the vector registers of modern CPUs to compute in parallel
- first version: Ronquist, 2000 [28], for PowerPC
- Richer, 2008 [27]: release of the code for Intel and AMD processors with SSE instructions


Multi-character optimization algorithm

Algorithm 4 fitch(x, y, z : array[1..m] of bytes) : int
  changes ← 0
  for i ← 1 to m do
    z[i] ← x[i] ∩ y[i]
    if (z[i] == 0) then
      z[i] ← x[i] ∪ y[i]
      changes ← changes + 1
    end if
  end for
  return changes



Code each nucleic acid by a power of 2:

char  value  power  binary
A         1  2^0    00001
C         2  2^1    00010
G         4  2^2    00100
T         8  2^3    01000
-        16  2^4    10000

A ∪ C = 3 = 00011
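With this power-of-2 coding, the set intersection and union of Algorithm 4 become the bitwise AND and OR operators. The sketch below is a scalar Python illustration of that idea (one character per loop iteration); the names `CODE`, `encode` and `fitch` are illustrative, and it is not the vectorized SSE code.

```python
CODE = {"A": 1, "C": 2, "G": 4, "T": 8, "-": 16}


def encode(seq):
    """Code each character of a sequence by a power of 2."""
    return [CODE[c] for c in seq]


def fitch(x, y):
    """Return (z, changes) for two encoded sequences of equal length."""
    z, changes = [], 0
    for a, b in zip(x, y):
        state = a & b      # intersection of the two state sets
        if state == 0:
            state = a | b  # empty intersection: take the union
            changes += 1   # and count one change
        z.append(state)
    return z, changes


z, changes = fitch(encode("ACGT"), encode("AGGA"))
print(z, changes)
```

A vector register applies the same AND/OR to 16 bytes at once, which is where the speed-up of the multi-character version comes from.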



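Intel and AMD processors

[Figure: an SSE register is 16 bytes long. Example on three bytes: xmm1 = (1, 4, 6) and xmm2 = (3, 2, 5) give xmm4 = xmm1 AND xmm2 = (1, 0, 4) and xmm3 = xmm1 OR xmm2 = (3, 6, 7), which are then combined]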



Multi-character optimization for Intel and AMD

Time in seconds, compiled with nasm for Linux:

CPU                   -O2     SSE     %
Pentium-M 2.0 GHz     48.28    7.38   -84
Pentium 4 2.8 GHz     60.15   15.61   -74
Athlon 64 2.2 GHz     43.87   11.86   -72
Intel Q6600 2.4 GHz   40.91    2.47   -93
Core i7 860 2.8 GHz   30.58    1.89   -93⋆

⋆ : -95% with the use of POPCNT

Other problems

- DNA fragment assembly: build a DNA sequence from thousands of overlapping fragments
- Gene expression profiling: find the smallest subset of genes that regulate other genes or are involved in diseases
- Structure prediction: determine the 2D or 3D structure of a protein
- Docking: find the best candidate for a substrate
- ...

Conclusion


Reusability and genericity of Metaheuristics
- Metaheuristics can be applied to a wide range of problems, and especially to problems in bioinformatics
- but the components of Metaheuristics must be tailored to the problem


Decrease complexity by Parallelism
- calculations are carried out simultaneously
- on the same processor (multicore)
- on different processors: cluster
- use of n processors: breaks complexity by a factor of n; divides time by a factor < n


Decrease complexity by Cloud / Grid computing
- a different way to obtain the power of a cluster
- provision of computational resources on demand via a network


Questions and answers

Bibliography

[1] A.A. Andreatta and C.C. Ribeiro. Heuristics for the phylogeny problem. Journal of Heuristics, 8:429–447, 2002.
[2] D. Barker. Parsimony and simulated annealing in the search for phylogenetic trees. Bioinformatics, 20:274–275, 2004.
[3] Richard Bellman. The theory of dynamic programming. Bulletin of the American Mathematical Society, 60:503–516, 1954.
[4] H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology. SIAM Journal on Applied Mathematics, 48(5):1073–1082, 1988.
[5] C.B. Congdon. Gaphyl: An evolutionary algorithms approach for the study of natural evolution. Proceedings of the 6th Joint Conference on Information Science, 2002.
[6] C.B. Congdon and K.J. Septor. Phylogenetic trees using evolutionary search: Initial progress in extending Gaphyl to work with genetic data. Proceedings of the 2003 Congress on Evolutionary Computation, pages 320–326, 2003.
[7] R.C. Edgar. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5):1792–1797, 2004.


[8] O. Gascuel. BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol Biol Evol, 14(7):685–695, 1997.
[9] D.S. Gladstein. Efficient character optimization. Cladistics, 13:21–26, 1997.
[10] A. Goëffon, J-M. Richer, and J.K. Hao. A distance-based information preservation tree crossover for the maximum parsimony problem. Lecture Notes in Computer Science, 4193:761–770, 2006.
[11] A. Goëffon, J-M. Richer, and J.K. Hao. Progressive tree neighborhood applied to the maximum parsimony problem. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 5(1):136–145, 2008.
[12] P.A. Goloboff. Character optimization and calculation of tree lengths. Cladistics, 9:433–436, 1993.
[13] P.A. Goloboff. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics, 15:415–428, 1999.
[14] O. Gotoh. An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162:705–708, 1982.


[15] M.D. Hendy and D. Penny. Branch and bound algorithms to determine minimal evolutionary trees. Mathematical Biosciences, 59:277–290, 1982.
[16] Tobias Hill, Andor Lundgren, Robert Fredriksson, and Helgi B. Schiöth. Genetic algorithm for large-scale maximum parsimony phylogenetic analysis of proteins. Biochimica et Biophysica Acta (BBA) - General Subjects, 1725(1):19–29, 2005.
[17] I. Horovitz. A report on one day symposium on numerical cladistics. Cladistics, 15:177–182, 1999.
[18] A. Moilanen. Searching for most parsimonious trees with simulated evolutionary optimization. Cladistics, 15(3):39–50, 1998.
[19] P. Moscato. Memetic algorithms: a short introduction. Chapter in New Ideas in Optimization. McGraw-Hill, 1999.
[20] L. Nakhleh, U. Roshan, K. St John, J. Sun, and T. Warnow. Designing fast converging phylogenetic methods. Bioinformatics Supplement, 17:190–198, 2001.
[21] S.B. Needleman and C.D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443–453, 1970.


[22] K.C. Nixon. The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics, 15:407–414, 1999.
[23] C. Notredame and D. Higgins. SAGA: sequence alignment by genetic algorithm. Nucleic Acids Research, 24(8):1515–1524, 1996.
[24] C. Notredame, L. Holm, and D.G. Higgins. COFFEE: an objective function for multiple sequence alignments. Bioinformatics, 14(5):407–422, 1998.
[25] C.C. Ribeiro and D.S. Vianna. A GRASP/VND heuristic for the phylogeny problem using a new neighborhood structure. International Transactions in Operational Research, 12:1–14, 2005.
[26] C.C. Ribeiro and D.S. Vianna. A genetic algorithm for the phylogeny problem using an optimized crossover strategy based on path-relinking. Proceedings of the 2nd Brazilian Workshop on Bioinformatics, pages 97–102, 2003.
[27] J-M. Richer. Three new techniques to improve phylogenetic reconstruction with maximum parsimony. Technical report, LERIA, 2008.
[28] F. Ronquist. Fast Fitch-parsimony algorithms for large data sets. Cladistics, 14:387–400, 2000.


[29] U. Roshan, B.M.E. Moret, T.L. Williams, and T. Warnow. Rec-I-DCM3: A fast algorithmic technique for reconstructing large phylogenetic trees. Proceedings of the IEEE Computational Systems Bioinformatics Conference (CSB 04), pages 98–109, 2004.
[30] T.F. Smith and M.S. Waterman. Identification of common molecular sequences. Journal of Molecular Biology, 147:195–197, 1981.
[31] D.L. Swofford and G.J. Olsen. Molecular Systematics. D.M. Hillis and C. Moritz, 1990.
[32] René Thomsen, Gary B. Fogel, and Thiemo Krink. Improvement of Clustal-derived sequence alignments with evolutionary algorithms. In IEEE Congress on Evolutionary Computation (1)'03, pages 312–319, 2003.
[33] L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1:337–348, 1994.
[34] M.S. Waterman and T.F. Smith. On the similarity of dendrograms. Journal of Theoretical Biology, 73:789–800, 1978.
[35] M. Yan and D.A. Bader. Fast character optimization in parsimony phylogeny reconstruction. Technical Report TR-CS-2003-53, University of New Mexico, Albuquerque, NM, USA, 2003.


[36] L. Yu-Min, F. Shu-Cherng, and T.L. Jeffrey. A tabu search algorithm for maximum parsimony phylogeny inference. European Journal of Operational Research, 176(3):1908–1917, February 2007.
