Methodology Results Conclusions More information
Parallel Computing System for efficient computation of Molecular Similarity based on Negative Electrostatic Potential: First results
Raul Torres
Grupo de Química Teórica - Universidad Nacional de Colombia
Research Seminar, 2009
TARIS Method
Figure: Each node saves: A. Isopotential surface size B. Isopotential value
Data set
Classification
The similarity matrix obtained with the GPU computing process (CUDA) was analyzed by means of hierarchical clustering using the average linkage method (R statistical package).
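The average linkage step can be illustrated with a minimal C++ sketch. It is not the R routine used in the study: it is a naive agglomeration that assumes a symmetric dissimilarity matrix as input (how the kernel similarities were converted to dissimilarities is not stated here), and the names `Merge` and `averageLinkage` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// One merge step: the ids of the two clusters joined and their average distance.
struct Merge {
    int first;
    int second;
    double distance;
};

// Naive average-linkage agglomeration over a symmetric dissimilarity matrix.
// Original items have ids 0..n-1; each merge creates a new id n, n+1, ...
std::vector<Merge> averageLinkage(const std::vector<std::vector<double>>& dist) {
    const std::size_t n = dist.size();
    for (const auto& row : dist) assert(row.size() == n);  // matrix must be square

    std::vector<std::vector<std::size_t>> members;  // items in each live cluster
    std::vector<int> ids;                           // id of each live cluster
    for (std::size_t i = 0; i < n; ++i) {
        members.push_back({i});
        ids.push_back(static_cast<int>(i));
    }

    std::vector<Merge> merges;
    int nextId = static_cast<int>(n);
    while (members.size() > 1) {
        double best = std::numeric_limits<double>::infinity();
        std::size_t bestA = 0, bestB = 1;
        for (std::size_t a = 0; a < members.size(); ++a) {
            for (std::size_t b = a + 1; b < members.size(); ++b) {
                // Average of all pairwise distances between the two clusters.
                double sum = 0.0;
                for (std::size_t p : members[a])
                    for (std::size_t q : members[b]) sum += dist[p][q];
                double average = sum / static_cast<double>(members[a].size() * members[b].size());
                if (average < best) { best = average; bestA = a; bestB = b; }
            }
        }
        merges.push_back({ids[bestA], ids[bestB], best});
        members[bestA].insert(members[bestA].end(),
                              members[bestB].begin(), members[bestB].end());
        ids[bestA] = nextId++;
        members.erase(members.begin() + static_cast<std::ptrdiff_t>(bestB));
        ids.erase(ids.begin() + static_cast<std::ptrdiff_t>(bestB));
    }
    return merges;
}
```

The returned merge list plays the role of a dendrogram: reading it in order gives the sequence in which molecules and groups of molecules are joined.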
General representation of molecules
Every node is represented by the characters [].
When a node has children, each child is placed inside the [] of its parent: [ [][] ]
If a weight is associated with a non-leaf node, its value is written after the first [: [45,889[78,76[][]][987,5[][]]]
Leaf nodes have no associated weight.
We propose a canonical representation:
Sub-trees with more nodes are listed first.
Next, sub-trees with more levels are listed first.
Next, sub-trees with greater weight are listed first.
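The canonical ordering and the bracket notation can be sketched in C++ as follows. This is an illustration under stated assumptions, not the code used in the study: `TreeNode`, `canonicalize`, and `toBracketString` are hypothetical names, each node carries a single numeric weight, and weights are printed with a decimal point.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical tree node: a weight (ignored for leaves) and its children.
struct TreeNode {
    double weight = 0.0;
    std::vector<TreeNode> children;
};

// Number of nodes in the sub-tree rooted at n.
int countNodes(const TreeNode& n) {
    int total = 1;
    for (const TreeNode& c : n.children) total += countNodes(c);
    return total;
}

// Number of levels in the sub-tree rooted at n (a leaf has one level).
int countLevels(const TreeNode& n) {
    int deepest = 0;
    for (const TreeNode& c : n.children) deepest = std::max(deepest, countLevels(c));
    return deepest + 1;
}

// Orders children recursively: more nodes first, then more levels, then greater weight.
void canonicalize(TreeNode& n) {
    for (TreeNode& c : n.children) canonicalize(c);
    std::stable_sort(n.children.begin(), n.children.end(),
        [](const TreeNode& a, const TreeNode& b) {
            int na = countNodes(a), nb = countNodes(b);
            if (na != nb) return na > nb;
            int la = countLevels(a), lb = countLevels(b);
            if (la != lb) return la > lb;
            return a.weight > b.weight;
        });
}

// Writes the tree as a bracket string; only non-leaf nodes print a weight.
std::string toBracketString(const TreeNode& n) {
    std::ostringstream out;
    out << '[';
    if (!n.children.empty()) out << n.weight;
    for (const TreeNode& c : n.children) out << toBracketString(c);
    out << ']';
    return out.str();
}
```

Canonicalizing before serializing means that two trees differing only in the order of their children produce the same string, which is what allows sub-trees to be compared as sub-strings.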
Proposed Kernel
Simple kernel:
$$k(x, y) = \sum_{s \in B} \mathrm{num}_s(x)\,\mathrm{num}_s(y)$$
A more complex kernel:
$$k_w(x, y) = \sum_{s \in B} \mathrm{num}_s(x)\,\mathrm{num}_s(y)\,w_s$$
where $w_x$ and $w_y$ are the respective weights of the $x$ and $y$ trees, and
$$w_s = \begin{cases} 1 & \text{if } w_x = 0 \text{ and } w_y = 0 \\ w_x / w_y & \text{if } w_x \le w_y \\ w_y / w_x & \text{otherwise} \end{cases}$$
$B$ is the set of balanced sub-strings [...]
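The weight factor for one shared sub-tree can be sketched as a small C++ function. It reads the definition as "1 when both weights are zero, otherwise the smaller weight divided by the larger", and it assumes the weights are non-negative magnitudes so that the division is always defined. The name `weightFactor` is hypothetical.

```cpp
#include <cassert>

// Weight factor w_s for one shared sub-tree, given its weight in each tree.
// Assumption: both weights are non-negative, so the denominator below is
// always strictly positive when it is used.
double weightFactor(double wx, double wy) {
    assert(wx >= 0.0 && wy >= 0.0);
    if (wx == 0.0 && wy == 0.0) return 1.0;   // unweighted case
    return (wx <= wy) ? wx / wy : wy / wx;    // smaller over larger, in [0, 1]
}
```

Under this reading, two sub-trees with identical weights contribute with factor 1, and the contribution shrinks as the weights diverge.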
Proposed Kernel
The weights can be calculated in 9 different ways.
The process reduces to finding the number of balanced sub-strings found in both molecules; each such sub-string corresponds to a sub-tree.
Leaf nodes are not counted as sub-trees.
The whole string is considered a sub-string.
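A minimal C++ sketch of the simple (unweighted) kernel follows. It assumes weight-free bracket strings, treats every sub-string that runs from a `[` to its matching `]` as one sub-tree, skips leaves (`[]`), and includes the whole string. The function names `countSubtrees` and `simpleKernel` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Counts every non-leaf sub-tree of a bracket string, keyed by its sub-string.
std::map<std::string, int> countSubtrees(const std::string& tree) {
    std::map<std::string, int> counts;
    std::vector<std::size_t> openPositions;   // stack of unmatched '[' positions
    for (std::size_t i = 0; i < tree.size(); ++i) {
        if (tree[i] == '[') {
            openPositions.push_back(i);
        } else if (tree[i] == ']') {
            assert(!openPositions.empty());   // input must be balanced
            std::size_t start = openPositions.back();
            openPositions.pop_back();
            std::string sub = tree.substr(start, i - start + 1);
            if (sub != "[]") ++counts[sub];   // leaves are not counted
        }
    }
    assert(openPositions.empty());            // every '[' must be closed
    return counts;
}

// Sum over shared sub-strings of (occurrences in x) * (occurrences in y).
long simpleKernel(const std::string& x, const std::string& y) {
    const std::map<std::string, int> cx = countSubtrees(x);
    const std::map<std::string, int> cy = countSubtrees(y);
    long k = 0;
    for (const auto& entry : cx) {
        auto match = cy.find(entry.first);
        if (match != cy.end()) k += static_cast<long>(entry.second) * match->second;
    }
    return k;
}
```

For instance, `simpleKernel("[[[][]][]]", "[[[][]][[][]]]")` finds the sub-tree `[[][]]` once in the first string and twice in the second, so the two trees share a kernel value of 2.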
Proposed Kernel
Tree representation: The red and green circles denote sub-trees that appear in both trees. In this case, the simple kernel is 2.
Proposed Kernel
String representation: The gray area corresponds to the green circle in the previous figure, and the red square corresponds to the red circle.
Parallel programming considerations
The general process is executed on the CPU. Host code: C++
The string comparison process is performed in parallel. Device code: CUDA for C
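How the comparisons are distributed over parallel workers is not described here. One plausible decomposition, sketched in plain C++ rather than CUDA, assigns one pair of molecules to each work item, so that a flat work-item index (for example a GPU thread index) selects one entry of the upper triangle of the similarity matrix. The names `pairCount` and `pairFromIndex` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Number of pairs (i, j) with i <= j among n molecules: the upper triangle
// of the similarity matrix, diagonal included.
std::size_t pairCount(std::size_t n) {
    return n * (n + 1) / 2;
}

// Maps a flat work-item index to the pair (i, j), i <= j, walking the upper
// triangle row by row. Row i holds n - i entries.
std::pair<std::size_t, std::size_t> pairFromIndex(std::size_t index, std::size_t n) {
    assert(index < pairCount(n));
    std::size_t i = 0;
    std::size_t rowLength = n;
    while (index >= rowLength) {
        index -= rowLength;
        --rowLength;
        ++i;
    }
    return {i, i + index};
}
```

Because the kernel is symmetric, computing only the upper triangle roughly halves the number of comparisons, and every pair is independent of the others, which is what makes the problem suitable for a GPU.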
Experimental configurations
Variables (used to construct the weight):
Isopotential surface size: ISS
Isosurface value: IV
Factorial design (3^2 = 9 experiments); each variable takes one of three states:
(N) Don't use
(A) Accumulated: the sum of the values of all nodes in the sub-tree gives the weight of the sub-tree
(S) Simple: the value of the root node of the sub-tree gives the weight of the sub-tree
Experimental configurations
W0: There are no weights. Only the structure is important. (N)ISS x (N)IV
W1: Accumulated isopotential surface size. (A)ISS x (N)IV
W2: Accumulated isopotential surface size times accumulated isosurface value. (A)ISS x (A)IV
W3: Simple isopotential surface size. (S)ISS x (N)IV
W4: Accumulated isosurface value. (N)ISS x (A)IV
Experimental configurations
W5: Simple isosurface value. (N)ISS x (S)IV
W6: Accumulated isopotential surface size times simple isosurface value. (A)ISS x (S)IV
W7: Simple isopotential surface size times simple isosurface value. (S)ISS x (S)IV
W8: Simple isopotential surface size times accumulated isosurface value. (S)ISS x (A)IV
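The nine configurations can be expressed as one function of two modes, sketched below in C++. The sketch assumes that "accumulated" means the sum over every node of the sub-tree, that two active variables are combined as the product of their individual values (so W2 is the sum of ISS times the sum of IV), and that W0 is encoded as weight 0. `Mode`, `SurfaceNode`, and `subtreeWeight` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// State of each variable in the factorial design.
enum class Mode { None, Accumulated, Simple };

// Hypothetical node holding the two values stored per node.
struct SurfaceNode {
    double iss = 0.0;   // isopotential surface size
    double iv = 0.0;    // isosurface value
    std::vector<SurfaceNode> children;
};

// Sum of one field over every node of the sub-tree rooted at n.
double accumulatedValue(const SurfaceNode& n, double SurfaceNode::*field) {
    double total = n.*field;
    for (const SurfaceNode& c : n.children) total += accumulatedValue(c, field);
    return total;
}

// Contribution of one variable under the chosen mode.
double factor(const SurfaceNode& root, double SurfaceNode::*field, Mode mode) {
    switch (mode) {
        case Mode::Accumulated: return accumulatedValue(root, field);
        case Mode::Simple:      return root.*field;
        case Mode::None:        break;
    }
    return 1.0;   // an unused variable does not change the product
}

// Weight of a non-leaf sub-tree for one of the nine configurations W0..W8.
double subtreeWeight(const SurfaceNode& root, Mode issMode, Mode ivMode) {
    assert(!root.children.empty());   // leaf nodes carry no weight
    if (issMode == Mode::None && ivMode == Mode::None) return 0.0;   // W0
    return factor(root, &SurfaceNode::iss, issMode) *
           factor(root, &SurfaceNode::iv, ivMode);
}
```

For example, `subtreeWeight(root, Mode::Simple, Mode::Accumulated)` corresponds to W8, and `subtreeWeight(root, Mode::Accumulated, Mode::None)` to W1.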
Experiment W3
Experiment W8
Experiment W7
Experiment W2
Experiment W0
Execution time
In general, the execution time is approximately 2 seconds.
Conclusions
We do not need to accumulate the sizes of the children.
In general terms, the kernel method used achieves a good classification.
The prior assumption that a string kernel can be applied to tree-structured data was verified.
Future work will focus on applying a kernel that uses co-rooted trees and a more robust string representation, the suffix tree.
Using CUDA as a programming environment has allowed us to perform several concurrent operations faster than in the serial paradigm.
Without the need for a cluster of computers, GPU computing offers tremendous computational power at low cost.