IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 18, NO. 3, JUNE 2010


Computing With Words With the Ontological Self-Organizing Map

Timothy C. Havens, Student Member, IEEE, James M. Keller, Fellow, IEEE, and Mihail Popescu, Senior Member, IEEE

Abstract: This paper addresses the computing-with-words paradigm by presenting an ontological self-organizing map (OSOM), which produces visualization and summarization information about datasets composed of words, namely, ontological data. The specific data that are used in this paper are the Gene Ontology (GO) annotations of genes and gene products. The OSOM is an extension of the SOM, which was initially developed by Kohonen. We adapt the SOM by integrating ontology-based similarity measures and relational-clustering distance measures. We also develop a novel prototype update. We present results on two datasets composed of GO annotations of genes and gene products. An OSOM-based summarization, which produces the term-based summarizations of the trained OSOM network, is also demonstrated. The results show that the OSOM-based visualization method correctly shows the cluster tendency of the genes and gene products and that the summarization provides useful information about the mapped groups of genes and gene products.

Index Terms: Bioinformatics, computing with words (CW), fuzzy logic, ontologies, self-organizing maps (SOMs).

I. INTRODUCTION

THE BIRTH of the linguistic variable by Zadeh [1] was the start of the concept of computing with words (CW), which was at first widely panned. Since 1973, fuzzy logic, which is the underpinning of CW, has become widely accepted, and as a result, CW is becoming more of a reality than a theory. As of 2009, there were more than 100 articles on “computing with words” on IEEE Xplore alone. CW is considered by many to be the precisiation of natural language coupled with approximate reasoning in order to translate antecedents, which are in the form of words, to consequents, which are also in the form of words. Zadeh presented CW as the bridge between fuzzy sets and the computational theory of perceptions (CTP) [2], where CTP is the expression of “propositions in a natural language.” Examples of perceptions included in [2] are as follows:

Manuscript received May 12, 2009; revised October 29, 2009 and January 14, 2010; accepted March 26, 2010. Date of publication April 12, 2010; date of current version May 25, 2010. T. C. Havens and J. M. Keller are with the Department of Electrical and Computer Engineering, University of Missouri, Columbia, MO 65211 USA (e-mail: [email protected]; [email protected]). M. Popescu is with the Department of Health Management and Informatics, University of Missouri, Columbia, MO 65211 USA (e-mail: popescum@missouri.edu). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TFUZZ.2010.2048113

1) Robert is highly intelligent.
2) Hans loves wine.
3) Overeating causes obesity.

The traditional CW system takes inputs in the form of natural-language propositions and translates these into outputs that are also natural-language propositions. For example, consider the propositions “Steph likes to exercise” and “exercise is healthy.” If we consider the query “Is Steph healthy?”, the answer of an approximate reasoning system might be “Steph is likely healthy.” There have been other interpretations of the mechanisms of CW, most notably the concept of the perceptual computer [3]–[5]. The key components are that inputs are words, computation is performed by soft-computing methodologies that aggregate the word interpretations, and then, words are produced as outputs.

In this paper, we take a different view toward applying CW principles by using ontologies: collections of words organized in hierarchical taxonomies. Much like Zadeh’s CW, our system has words as inputs and produces a linguistic output. However, unlike the standard definition of CW, the uncertainty is not in the language itself but in the relationships between the words and, more importantly, how those words are used to describe objects. Words that are connected in the taxonomy are related by phrases such as “is a,” “part of,” and “adjacent to.” Hence, the ontology itself is composed of a constrained (albeit crisp) collection of linguistic perceptions. For example, in the Gene Ontology (GO), collagen (GO:0005581) is a part of proteinaceous extracellular matrix (GO:0005578), which is an extracellular matrix part (GO:0044420) [6]. Furthermore, objects, such as genes and gene products, are annotated by the terms from the ontology. Thus, propositions can be developed.
For example, the human Collagen alpha-I chain (COL1A1) gene product is annotated by the GO term plasma membrane (GO:0005886), which indicates that this collagen’s function is related to “the membrane surrounding a cell that separates the cell from its external environment” [6]. However, this proposition, among others that can be developed from the GO annotations of COL1A1, is not crisp. This annotation could indicate that this collagen is located in the plasma membrane, or interacts with the plasma membrane, or has been shown to cause changes in the plasma membrane, etc. Thus, soft-computing tools are well suited to relating two gene products to one another based on their GO annotations because, although two gene products may be annotated by the same term, the reason they were annotated could be vastly different [7]. Some annotations are also more reliable (say, those based on direct biological experiments) than others that might be inferred from other genes with similar sequences. We believe that this is a completely valid


TABLE I. CHARACTERISTICS OF THE GP D194_{12.10.03} DATASET EXTRACTED FROM ENSEMBL [14]

Fig. 1. OSOM training block diagram.

instantiation of CW. Using the terminology from [2], the initial dataset is a collection of propositions, which take the form of the GO annotations of a group of gene products. This initial dataset is transformed into a terminal dataset, which is a set of soft relations between gene products. Grouping and summarization of these relations is specifically our aim in this paper.

The algorithm proposed in this paper is designed to work with ontological data. These are typically composed of hundreds of dimensions (i.e., individual words or phrases) and can also be very large, on the order of 10 000 samples. One way that researchers have dealt with conventional high-dimensional datasets is by employing self-organizing maps (SOMs), which were initially proposed by Kohonen [8]. The SOM allows these types of data to be effectively visualized in two or three dimensions by combining the goals of both projection and clustering algorithms [9]. We apply a novel extension to the SOM that allows us to use the SOM with ontological data. Ontological data are unique in that the data samples are composed of collections of terms or words taken from a predefined corpus. Unlike conventional object data, the samples do not have a numerical location. Examples of ontological data include web sites, medical-record annotations, and publications.

In this paper, we apply our ontological SOM (OSOM) to produce cluster visualization and functional summarization of annotated gene products in the GO. The relational data of the gene products are produced by GO similarity measures, as described in [10]; however, any quantitative similarity measure could be used, such as those described in [11] and [12]. Fig. 1 shows the block diagram of the OSOM training algorithm. The inputs are the ontological data and the pairwise term-similarity matrix.
The OSOM itself operates very much like a conventional SOM: 1) A random test signal is chosen; 2) the winning prototype is selected; and 3) all prototypes are moved toward the test signal according to a predefined network topology. Section III describes the training procedure in more detail and also proposes a batch version of the OSOM, which is based on the batch SOM [13].

Section III-C describes how we utilize the OSOM to produce cluster visualization of the ontological data. The visualization method maps the ontological profiles (i.e., the OSOM prototypes) of the OSOM network to a 2-D toroidal grid (although any predefined network topology could be chosen). Cluster tendency is shown by the relations between neighboring ontological prototypes on the grid, which are displayed as gray levels: black represents no relation and white represents highly related.

Summarization of each ontological prototype (e.g., gene or gene-product cluster) is achieved as a direct result of our formulation of the OSOM. The OSOM prototypes are represented by a vector of weights, where each element of the weight vector is associated with a term from the corpus. The values of these weights are the memberships of the associated terms in the description of the ontological prototype. Thus, the summarization of each prototype is the term(s) with the largest corresponding weight-vector element(s). If the ontological data are genes represented by their GO annotations, then this summarization describes the shared function of the groups of genes. Section III-D describes our summarization method in more detail.

Throughout the algorithm description in Section III, we present illustrative results computed on a set of 194 sequences of human gene products. These gene products were retrieved on December 10, 2003, using the ENSEMBL browser [14]. Table I shows the attributes of the families present in the dataset according to Markov clustering [15]. We have used this gene-product dataset in past publications, and for comparative purposes, we use it to illustrate the results of our current method. We call this set GP D194_{12.10.03}. Section IV presents more detailed results on the GP D194 data, as well as other datasets. We also compare the OSOM with the self-organizing semantic map (SOSM) [16] and a batch-relational version of the SOM [17].

A. Related Work

Since Kohonen developed the SOM [8], it has been adapted to many types of data, including document data. Ritter and Kohonen developed the SOSM [16], which represents the semantic features of objects by a binary-valued vector. However, the SOSM does not incorporate the similarities between the semantic features. Hence, correlated semantic dimensions are considered to be independent. The WEBSOM algorithm [18] addresses this weakness of the SOSM by first computing a word-category map from a given corpus and then using this to create a document map. The similarities of words or terms are encoded in the word-category map by imputing similarity from their relative placement or context in documents: words that are often used together are more similar. Next, the WEBSOM encodes documents by mapping them word-by-word onto the


word-category map. The documents’ histograms of “hits” on the word-category map are the input to a second SOM, where document-to-document similarity can be visualized. Although we can imagine that WEBSOM could be modified for use with directed acyclic graph (DAG)-based ontological data, we do not focus on that in this paper. Most recently, Hasenfuss and Hammer [17] developed a relational variant of the SOM by extending the relational duals of c-means clustering algorithms [19] to topographic maps. Hence, one could compute the SOM of ontological data by using the relational SOM (RSOM) on the dissimilarity matrix of the objects: genes, documents, etc. The drawback of using this method is that the prototypes do not directly encode the terms in ontological data; hence, linguistic summarization is not straightforward— additional postprocessing is necessary to produce summarizations that are akin to the output of the OSOM. Additionally, the relational SOM is not generalized to the soft-similarity measures that are presented in Section II, which further reduces the ability to generalize the relational SOM to ontological data, specifically the GO. Our method is similar in spirit to the relational SOM and there are noticeable parallels between the methods, but we are able to directly encode the ontological data in the OSOM. Brameier and Wiuf [20] proposed a clustering and visualization method using both gene-expression data and GO terms. However, Brameier and Wiuf mapped the gene-product annotations to a reduced vocabulary of generalized GO terms. We establish a similarity-measure-based method to utilize the full set of ontological data, thereby preserving the specificity of the terms in the training data. Also, we utilize only the ontological information (e.g., the GO or MeSH terms) to produce the cluster and visualization information. Hence, our method allows for knowledge discovery within the term-based database itself. 
There are many freely available tools to visualize the relationships of, or to cluster, the GO representations of genes and gene products (see http://geneontology.org/GO.tools.shtml). GOToolBox [21] uses a crisp-clustering algorithm, which is based on the Czekanowski–Dice distance, to cluster genes based on their annotations. However, as shown in [11], set-based distance measures are unable to show the relations between genes that do not share terms (this is a common deficiency in many GO analysis tools, including Cluster Assignment For biological Inference (CLASSFI) [22]). Other tools, such as CLuster ENriCHment (CLENCH) [23] and ClueGO [24], produce cluster summarization based on the GO terms for a set of genes but do not find the clusters. Still other tools [25] combine GO annotations with other sources of information (such as sequences and microarrays) to analyze groups of genes, but these tools cannot be used on the ontological data alone. Gene Semantic Similarity Analysis and Measurement Tools (G-SESAME) [26] presents a semantic similarity measure and validates this measure by using single-linkage clustering to produce hierarchical clusters of genes based on their GO terms. However, the spirit of this method is in the similarity measure and not the clustering itself. Next, we describe the similarity measures that we use to determine the proximity of GO terms and, hence, genes represented by GO terms.


II. GENE-ONTOLOGY SIMILARITY MEASURES

Information about gene products and how they are similar to one another is of great importance in bioinformatics. Traditional approaches use the DNA sequence as well as the expression values from microarray experiments. However, additional information, which is more symbolic in nature, is available about genes and gene products. This symbolic information comprises the GO terms [6] and index terms in publications about gene products [27]. We use these symbolic data to build visualizations and functional summarizations of the genes or gene products.

Previously, we developed methods to compute the similarity of two gene products that are annotated by GO terms. These similarity measures are described in detail in [10] and [28]. To summarize, each gene product $G_i$ is represented by a collection of terms $G_i = \{T_{i1}, \ldots, T_{in}\}$, where $T_{i1}$ is the first annotation of $G_i$ (e.g., GO:0016740, transferase activity). Based on these sets of terms, a similarity between two gene products can be found by performing an aggregation on the pairwise similarities among each set of terms. For example, the average is computed as

$$ s(G_i, G_j) = \frac{\sum_{k=1}^{n} \sum_{l=1}^{m} R_{kl}}{mn} \tag{1} $$

where $G_i$ is annotated by $n$ terms, $G_j$ is annotated by $m$ terms, and $R_{kl}$ is the pairwise similarity between the $k$th term in $G_i$ and the $l$th term in $G_j$. The pairwise term similarity $s_{kl}$ (or dissimilarity) can be computed in many ways; shortest-path-based similarity and information-theoretic constructs are the most widely used [29], [30].

The similarity measure in (1) can be cast as a matrix–vector multiplication by representing each gene product $G_i$ as a binary vector $g_i$. First, the union set of terms from a set of gene-product annotations is produced. Let us assume that there are $N_T$ terms in this union set. Thus, $g_i \in \{0, 1\}^{N_T}$, where $g_{ik} = 1$ indicates that the gene product is annotated by the $k$th term in the set of unique terms, and $g_{ik} = 0$ indicates otherwise (where now $1 \le k \le N_T$).
The pairwise similarity matrix between each of the $N_T$ terms is denoted by $R$, where $R_{kl}$ is the similarity between the $k$th term and the $l$th term. One can now see that the similarity measure in (1) is given by

$$ s(G_i, G_j) = \frac{g_i^T R g_j}{\|g_i\|_1 \|g_j\|_1}. \tag{2} $$
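The equivalence between the double sum in (1) and the matrix–vector form in (2) can be checked numerically. The following is a minimal sketch, not the authors' implementation: the four-term vocabulary, the entries of `R`, and the helper names `encode`, `avg_similarity_sum`, and `avg_similarity_matrix` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def encode(annotations, vocabulary):
    """Binary vector g in {0,1}^{N_T}: g[k] = 1 if term k annotates the gene product."""
    index = {term: k for k, term in enumerate(vocabulary)}
    g = np.zeros(len(vocabulary))
    for term in annotations:
        g[index[term]] = 1.0
    return g

def avg_similarity_sum(G_i, G_j, vocabulary, R):
    """Equation (1): mean of all pairwise term similarities."""
    index = {term: k for k, term in enumerate(vocabulary)}
    total = sum(R[index[a], index[b]] for a in G_i for b in G_j)
    return total / (len(G_i) * len(G_j))

def avg_similarity_matrix(g_i, g_j, R):
    """Equation (2): g_i^T R g_j / (||g_i||_1 ||g_j||_1)."""
    return (g_i @ R @ g_j) / (g_i.sum() * g_j.sum())

# Hypothetical vocabulary and term-similarity matrix (illustrative values only).
vocabulary = ["T1", "T2", "T3", "T4"]
R = np.array([[1.0, 0.8, 0.1, 0.0],
              [0.8, 1.0, 0.2, 0.1],
              [0.1, 0.2, 1.0, 0.6],
              [0.0, 0.1, 0.6, 1.0]])

G_a, G_b = ["T1", "T2"], ["T3", "T4"]   # two gene products with no shared terms
g_a, g_b = encode(G_a, vocabulary), encode(G_b, vocabulary)

s_sum = avg_similarity_sum(G_a, G_b, vocabulary, R)
s_mat = avg_similarity_matrix(g_a, g_b, R)
```

Both forms give the same nonzero value for two gene products that share no terms, because the off-diagonal entries of `R` still contribute.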

We also use a distance measure in this paper, the generalized outer product (GOP), which has not previously been used for GO-based gene similarity. We discovered this distance measure in our work on generalizing cluster validity indexes to relational data [31], [32]. Essentially, the GOP distance is the well-known A-norm ($\|x - y\|_A$), which is generalized to relational data. Hasenfuss and Hammer [17] used this distance in their relational SOM. First, let us assume that $g_i$ is the binary-vector representation of $G_i$ and $D_{kl} = 1 - R_{kl}$ is the normalized distance between the $k$th and $l$th terms. The GOP distance is given by

$$ d^{(GOP)}(G_i, G_j) = \tilde{g}_i^T D \tilde{g}_j - 0.5\, \tilde{g}_i^T D \tilde{g}_i - 0.5\, \tilde{g}_j^T D \tilde{g}_j \tag{3} $$

where $\tilde{g} = g / \|g\|_1$, and $D = 1 - R$.
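A short numerical sketch of (3) follows. It is an illustration under stated assumptions rather than the authors' code: the function name `gop_distance` is hypothetical, and for the sanity check `D` is filled with squared Euclidean distances between random 2-D points (not with $1 - R$ from an ontology), in which case the GOP distance should reduce to the squared distance between the two weighted centroids.

```python
import numpy as np

def gop_distance(g_i, g_j, D):
    """Equation (3), evaluated on the L1-normalized vectors."""
    a = g_i / np.abs(g_i).sum()
    b = g_j / np.abs(g_j).sum()
    return a @ D @ b - 0.5 * (a @ D @ a) - 0.5 * (b @ D @ b)

# Assumption for this check only: five "terms" embedded as random 2-D points,
# with D holding their pairwise squared Euclidean distances.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
diff = X[:, None, :] - X[None, :, :]
D = (diff ** 2).sum(axis=-1)

g_a = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
g_b = np.array([0.0, 0.0, 1.0, 1.0, 1.0])
d_gop = gop_distance(g_a, g_b, D)

# Squared distance between the centroids of the two term groups.
centroid_a = X[:2].mean(axis=0)
centroid_b = X[2:].mean(axis=0)
d_euclid_sq = ((centroid_a - centroid_b) ** 2).sum()
```

Under this assumption `d_gop` and `d_euclid_sq` agree up to floating-point error, and the distance from a vector to itself is zero.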


A detailed discussion on the derivation of this distance measure in the context of relational clustering and medoids is given in [32]. We will show in Section IV that the OSOM and the standard SOM are related when the $d^{(GOP)}$ distance measure is used.

We combine all the terms from a set of gene products into one large set and compute a pairwise GO-term-similarity matrix $R$ using the Lin information-theoretic similarity measure [29], which is calculated as

$$ s_{Lin}(T_i, T_j) = \frac{2 \log p_{NCA}(T_i, T_j)}{\log p(T_i) + \log p(T_j)} \tag{4} $$

where $p(T)$ is the probability of term $T$ and all its children in the GO and is given by

$$ p(T) = \frac{\mathrm{count}(T + \mathrm{children}(T))}{\mathrm{count}(\text{all terms in GO})} $$

and $p_{NCA}(T_i, T_j)$ is the probability of the nearest common ancestor (which is also called the lowest common parent) of terms $T_i$ and $T_j$. The Lin similarity measure is effective at computing the similarity of two GO terms because it considers both the relative closeness of two terms as well as the depth (specificity) of the terms in the hierarchy. The GP D194 dataset [10] contains 64 total unique GO terms; thus, the Lin-based similarity matrix $R$ is 64 × 64. The precomputed similarity matrix allows us to quickly compute similarities by casting many of the operations in the OSOM as matrix–vector multiplications.

III. ONTOLOGICAL SELF-ORGANIZING MAP

The SOM is a two-layer lateral-feedback neural network that topologically maps itself to the training data. The network structure is often set to a 2-D square, toroidal, or hexagonal grid, where each network node, or prototype, is laterally connected to its neighbors. The network-learning algorithm is as follows.

1) Randomly draw a sample $x_d$ from the training data.

2) Find the closest SOM prototype $p$ according to a chosen distance metric

$$ p = \arg\min_i \{ \| x_d - a_i^{(old)} \| \}. \tag{5} $$

3) Update SOM prototypes by

$$ a_i^{(new)} = a_i^{(old)} + \epsilon(t) \cdot h_{ip} \cdot (x_d - a_i^{(old)}) \tag{6} $$

where $\epsilon(t)$ is the learning rate, and $h_{ip}$ is the neighborhood function, which is defined as

$$ h_{ip}(t) = \exp\left( -\frac{|a_i - a_p|^2}{\sigma^2(t)} \right) \tag{7} $$

where $a_i$ is the location of the SOM prototype in the predefined neighborhood (e.g., square or hexagonal grid).

This algorithm is repeated until a maximum number of iterations or convergence is reached. Typically, the learning rate $\epsilon(t)$ and the width of the neighborhood function $\sigma^2(t)$ are reduced during iteration, with the effect that late iterations apply only small updates to network prototypes that are local to the winning prototype $p$.
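The three steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the exponential decay schedule for the learning rate and neighborhood width, the default `sigma` values, and the random initialization are all assumptions made here, since the text only states that both quantities are reduced during iteration.

```python
import numpy as np

def grid_sq_dist(grid, p, side, toroidal=False):
    """Squared grid distance from every node to node p (wraps around if toroidal)."""
    delta = np.abs(grid - grid[p])
    if toroidal:
        delta = np.minimum(delta, side - delta)
    return (delta ** 2).sum(axis=1)

def som_step(prototypes, grid, side, x, eps, sigma, toroidal=False):
    """One online update; modifies `prototypes` in place and returns the winner index."""
    p = int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))        # step 2, eq. (5)
    h = np.exp(-grid_sq_dist(grid, p, side, toroidal) / sigma ** 2)   # eq. (7)
    prototypes += eps * h[:, None] * (x - prototypes)                 # step 3, eq. (6)
    return p

def train_som(data, side, n_iter=2000, eps=(0.5, 0.005), sigma=(3.0, 0.1),
              toroidal=True, seed=0):
    """Train a side x side SOM; eps and sigma are (initial, final) pairs."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(side) for c in range(side)], dtype=float)
    prototypes = rng.random((side * side, data.shape[1]))  # assumed random init
    for t in range(n_iter):
        frac = t / max(n_iter - 1, 1)
        # Assumed schedule: exponential decay from the initial to the final value.
        eps_t = eps[0] * (eps[1] / eps[0]) ** frac
        sigma_t = sigma[0] * (sigma[1] / sigma[0]) ** frac
        x = data[rng.integers(len(data))]                  # step 1: random sample
        som_step(prototypes, grid, side, x, eps_t, sigma_t, toroidal)
    return prototypes, grid
```

Because each update is a convex combination of the old prototype and the sample whenever $\epsilon(t) h_{ip} \le 1$, prototypes initialized inside the data range stay inside it.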

A. Prototype Representation

The algorithm that we propose as the OSOM is an adaptation of the standard SOM to ontological data. First, we construct an ontological weight vector for each node in the OSOM grid. This weight vector is a fuzzy-membership representation of all the terms present in the training data. For example, the GP D194 dataset contains a total of 64 terms among all the gene products combined; thus, the OSOM weight vector has a length of 64. Each weight-vector element is associated with one term, and the value of the weight is the membership of the associated term in the description of the ontological prototype. We denote the OSOM weight vectors as $w_i \in [0, 1]^{N_T}$.

Second, we replace the distance metric in step 2 of the SOM with a similarity measure. The measures that we use are vector-matrix-multiplication-based operations that are simple extensions of the measures described in Section II, [10], and [32]. What makes these similarity measures different is that they are the similarity between an OSOM prototype and a gene or gene product; the similarity measures in Section II were for two genes or gene products. In practice, one could choose any similarity measure that measures the similarity of two sets of terms; there are many measures that fit this description. However, we recommend using similarity measures that perform some aggregation on the pairwise similarity matrix $R$. Set-based similarity measures such as the Cosine and Jaccard index are ill-suited to this problem (for information on this topic, see [10]) as these measures basically assume that $R$ is diagonal. We adapt gene–gene similarity measures, such as the average in (2), for use with the OSOM as follows.

1) GOP:

$$ s^{(GOP)}(w_i, g_j) = 1 - \tilde{w}_i^T D \tilde{g}_j + 0.5\, \tilde{w}_i^T D \tilde{w}_i + 0.5\, \tilde{g}_j^T D \tilde{g}_j \tag{8} $$

where $\tilde{w}_i = w_i / |w_i|$, $\tilde{g}_j = g_j / |g_j|$, and $D = 1 - R$.

2) Average (AVG):

$$ s^{(AVG)}(w_i, g_j) = \frac{w_i^T R g_j}{|w_i| |g_j|} \tag{9} $$

3) Ordered weighted average (OWA) [33]:

$$ l = (R w_i) \mathbin{.*} g_j $$

where $.*$ represents element-by-element multiplication. The vector $l$ is then sorted in descending order, $l_{(1)} > l_{(2)} > \ldots > l_{(N_T)}$, and the OWA similarity is computed by

$$ s^{(OWA)}(w_i, g_j) = \frac{1}{N_T^2} \sum_{k=1}^{N_T} b_k l_{(k)} \tag{10} $$

where $N_T$ is the number of terms, and $b_k$ is the weight of the $k$th term in the OWA. In this paper, we use $b_k = 1$,


$k \le 4$, and $b_k = 0$, $k > 4$. However, one could certainly choose a different set of $b_k$'s, e.g., $b_1 = 1$ and $b_k = 0$, $k > 1$ is the maximum single-term similarity between $w_i$ and the gene-product-term vector $g_j$.

$s^{(AVG)}$ and $s^{(OWA)}$ are adaptations of the soft-similarity measures in [10], which were developed specifically for ontologies. $s^{(GOP)}$ is adapted from relational clustering and the distance measure $d^{(GOP)}$. An important point is that with the $s^{(GOP)}$ similarity measure, the OSOM can be shown to be equivalent to Kohonen’s SOM (see the first example in Section II). However, unlike the SOM, the OSOM can also be used with ontological data.

B. Prototype Update

The OSOM can use the standard weight-vector update [see (6)] of the SOM by substituting $g_d$ for $x_d$

$$ w_i^{(new)} = w_i^{(old)} + \epsilon(t) \cdot h_{ip} \cdot (g_d - w_i^{(old)}). \tag{11} $$
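Before an update such as (11) is applied, the winning prototype is chosen with one of the prototype-to-gene similarity measures (8)-(10). The sketch below illustrates those three measures under explicit assumptions: $|\cdot|$ is taken to be the sum of the (nonnegative) vector elements, the OWA normalization by $N_T^2$ follows the form of (10) as written above, and the function names and the toy matrix `R` are hypothetical.

```python
import numpy as np

def s_gop(w, g, R):
    """Equation (8): one minus the GOP distance between normalized w and g."""
    D = 1.0 - R
    wn, gn = w / w.sum(), g / g.sum()      # assumption: |.| is the element sum
    return 1.0 - wn @ D @ gn + 0.5 * (wn @ D @ wn) + 0.5 * (gn @ D @ gn)

def s_avg(w, g, R):
    """Equation (9): weighted average of term similarities."""
    return (w @ R @ g) / (w.sum() * g.sum())

def s_owa(w, g, R, b=None):
    """Equation (10): sort l = (R w) .* g in descending order and weight it by b."""
    n_t = len(g)
    if b is None:
        b = np.zeros(n_t)
        b[:4] = 1.0                        # b_k = 1 for k <= 4, b_k = 0 otherwise
    l_sorted = np.sort((R @ w) * g)[::-1]
    return (b @ l_sorted) / n_t ** 2

def winner(W, g, R, sim=s_gop):
    """Index of the prototype (row of W) that is most similar to g."""
    return int(np.argmax([sim(w, g, R) for w in W]))

# Hypothetical term-similarity matrix and two prototypes (illustrative values only).
R = np.array([[1.0, 0.8, 0.1, 0.0],
              [0.8, 1.0, 0.2, 0.1],
              [0.1, 0.2, 1.0, 0.6],
              [0.0, 0.1, 0.6, 1.0]])
W = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
g = np.array([1.0, 0.0, 0.0, 0.0])         # gene product annotated by the first term only

best = {sim.__name__: winner(W, g, R, sim) for sim in (s_gop, s_avg, s_owa)}
```

In this toy case all three measures select the first prototype, whose weights sit on the terms most similar to the single annotation in `g`.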

Let us recall that $g$ is binary; hence, this update simply moves the prototype toward the corresponding corner of the $N_T$-dimensional hypercube. This update, however, ignores the term–term similarities. We can also replace the standard form of the weight-vector update equation with a similarity-based update. In order to create a similarity-based update equation, we defined the following two axioms.

1) At each iteration, the weight-vector elements that correspond to the terms in the test signal $g_d$ must increase, as in (11).

2) At each iteration, the weight-vector elements that are similar to the terms in $g_d$, as evidenced by $R$, must also increase.

With these axioms in mind, we created the following update equation:

$$ w_i^{(new)} = w_i^{(old)} + \epsilon(t) \cdot h_{ip}(t) \cdot \left( F(R, g_d) - w_i^{(old)} \right) \quad \forall i \tag{12} $$

where $p$ denotes the closest OSOM prototype to the randomly chosen training vector $g_d$, and $(F(R, g_d) - w_i^{(old)})$ is the update operator. As shown below, the update operator is computed from the columns of the similarity matrix that correspond to nonzero elements of the training vector $g_d$. These columns of the similarity matrix represent the similarity between the terms in $g_d$ and all other terms (e.g., $R_{ij}$ is the similarity of the $i$th and $j$th terms). Hence, the update operator, i.e., $F(R, g_d)$, computes a row aggregation on the columns of the similarity matrix $R$ that correspond to the terms in the training vector $g_d$. The operator $F$ can be modeled after any aggregation operator [34], e.g., one can define $F$ as one of the following:

1) AVG:

$$ F^{(AVG)}(R, g_d) = \frac{R g_d}{|g_d|}. \tag{13} $$

2) Maximum (MAX):

$$ F_k^{(MAX)}(R, g_d) = \max_i \{ R_{ki} \} \tag{14} $$

where $i = \{ l \in \mathbb{N} \mid l \le N_T; (g_d)_l = 1 \}$, $k = 1, \ldots, N_T$, and $R_{ki}$ is the $i$th column of the $k$th row of the similarity matrix $R$.

The operator chosen for $F$ determines the convergence behavior of the values of the prototype weight vectors $\{w_i\}$. For example, $F^{(AVG)}$ causes the weight vectors to have maximum values around 0.5, as the operator averages the similarity values for all terms in the training vectors. Contrastively, $F^{(MAX)}$ causes the maximum weight-vector values to tend to a value of 1, as there are exactly $|g_d|$ terms equal to 1 in the matrix–vector multiplication $R g_d$ (each term in $g_d$ has similarity of 1 to itself). Simply put, $F^{(MAX)}$ pushes the OSOM prototypes toward the terms present in $g_d$ and, additionally, pushes the prototypes toward all terms represented in $R$ that are similar to any one of the terms in $g_d$. Both (13) and (14) are one form of the generalized mean, where $F^{(AVG)}$ is $M_1$ and $F^{(MAX)}$ is $M_\infty$. The $i$th element of $M_p$ is

$$ M_p(R, g_d)_i = \left( \frac{1}{|g_d|} \sum_{j=1}^{N_T} \left( R_{ij} (g_d)_j \right)^p \right)^{1/p}. $$
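The aggregation operators (13) and (14), the generalized mean $M_p$, and the similarity-based update (12) can be sketched in a few lines. This is an illustrative sketch only: the function names, the toy matrix `R`, and the neighborhood values `h` are assumptions, and $|g_d|$ is taken to be the number of nonzero entries of the binary vector.

```python
import numpy as np

def f_avg(R, g):
    """Equation (13): average similarity of every term to the terms present in g."""
    return (R @ g) / g.sum()

def f_max(R, g):
    """Equation (14): maximum similarity of every term to any term present in g."""
    return R[:, g == 1].max(axis=1)

def generalized_mean(R, g, p):
    """M_p: p = 1 reproduces f_avg, and large p approaches f_max."""
    return (((R * g) ** p).sum(axis=1) / g.sum()) ** (1.0 / p)

def osom_update(W, h, g, R, eps, F=f_max):
    """Equation (12): move every prototype toward F(R, g), scaled by eps and h."""
    return W + eps * h[:, None] * (F(R, g) - W)

# Hypothetical term-similarity matrix and training vector (illustrative values only).
R = np.array([[1.0, 0.8, 0.1, 0.0],
              [0.8, 1.0, 0.2, 0.1],
              [0.1, 0.2, 1.0, 0.6],
              [0.0, 0.1, 0.6, 1.0]])
g = np.array([1.0, 1.0, 0.0, 0.0])

target_avg = f_avg(R, g)     # elements for terms 1 and 2 are 0.9, not 1
target_max = f_max(R, g)     # elements for terms 1 and 2 are exactly 1

W = np.zeros((2, 4))         # two prototypes, initially empty
h = np.array([1.0, 0.5])     # assumed neighborhood values (winner and one neighbor)
W_new = osom_update(W, h, g, R, eps=0.5)
```

The example also shows the behavior described in the text: the terms present in `g` receive the largest targets, and terms that are merely similar to them (here, the third and fourth) receive smaller but nonzero targets.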

Algorithm 1 outlines the standard OSOM algorithm. The parameters, such as the learning rates and maximum iterations, are set according to the problem (viz., just like the original SOM, use what works for you). For this paper, we use a toroidal grid-based network because the neighborhood of each prototype is consistent, which avoids the well-known edge effects seen in standard square-grid topologies. The learning rates are set to $\{\epsilon_0 = 0.5, \epsilon_f = 0.005\}$, the widths of the lateral influence function in (7) are $\{\sigma_0 = N_{net} + 1, \sigma_f = 0.1\}$, and the maximum number of iterations is $t_{max} = 2000$. The width of the network $N_{net}$ is adjusted depending on the size of the dataset, in this case, the number of genes or gene products. The illustrative results in this section use an 8 × 8 toroidal network topology (i.e., 64 prototypes) to map the 194 gene products in GP D194. Algorithm 2 outlines the batch version of the OSOM. The strength of the batch SOM, in general, is that it is proven to converge in a finite number of steps [35]. In Section IV, we


Fig. 2. Colormap used in visualizations shows relative similarity between network prototypes.

will compare the different forms of the OSOM and describe the resulting trained networks.

C. Cluster Visualization

The visualization method that we propose is composed of two distinct steps. First, the objects (e.g., gene products, articles, etc.) are mapped to the trained OSOM network by the nearest prototype rule: for each object $g$, find the best-match prototype $p = \arg\max_i S(w_i, g)$. The prototype $p$ is then annotated with the object information of $g$ (e.g., the gene product id, the GO annotations). This groups similar objects (i.e., gene products) into cluster-like arrangements, where each OSOM prototype essentially represents a cluster (sometimes an empty cluster). Second, the similarity between neighboring OSOM prototype nodes is mapped into a gray-scale or color image [9]; for this paper, red indicates very similar, and blue indicates very dissimilar. The colormap used in this paper is shown in Fig. 2.

Fig. 9(a) illustrates this mapping for GP D194 using the similarity $s^{(GOP)}$ [see (8)] and the batch OSOM update. The red regions correspond to groups of similar gene products, while the blue regions show the boundaries between dissimilar regions. Please note that because of the toroidal network topology, the top and bottom, as well as the sides, of the images in Fig. 9 wrap around. Also, because GP D194 contains multiple (but different) sequences of each gene product, the gene-product labels (e.g., FGFR1) can appear in more than one place on the OSOM map. We compute the similarity between nodes with a GOP operator, as in [17], as follows:

$$ s(w_i, w_j) = \sqrt{1 - \tilde{w}_i^T D \tilde{w}_j + 0.5\, \tilde{w}_i^T D \tilde{w}_i + 0.5\, \tilde{w}_j^T D \tilde{w}_j} \tag{15} $$

where $\tilde{w} = w / |w|$ and $D = 1 - R$. The reason that we use a square root in this calculation is so that the colormap appears more linear with distance (recall that the GOP distance is equivalent to squared Euclidean distance), which we have found is more effective. One could include the square root in the $s^{(GOP)}$ calculation if desired; however, because $s^{(GOP)}$ is only used to find the closest prototype to the random test signal, the square root is unnecessary. This similarity is calculated between each connected node of the OSOM network. Thus, for the toroidal grid, each prototype node has four surrounding pixels that correspond to its relation to its neighboring nodes. The colormap is set such that red corresponds to $\max_{\forall i, \forall j}[s(w_i, w_j)]$, and dark blue corresponds to $\min_{\forall i, \forall j}[s(w_i, w_j)]$ for a given network. The color grid is then upsampled by cubic interpolation to achieve a visually pleasing map.

As a result of this coloring scheme, regions that are red represent groups of similar objects, while blue and cyan regions signify boundaries or objects that are dissimilar to the surrounding groups. In addition, the degree of similarity can be inferred from the color intensity of the regions. For example, in Fig. 9(a), the red islands indicate groups of similar gene products (e.g., the red region centered at (4,6) is a group of collagens), while the surrounding blue–cyan regions signify boundaries. Additionally, the dark-blue region at the map location (1,4) (which is labeled FGFR1) denotes a dissimilar gene product. The three GP D194 families can be seen in Fig. 9(a) as spatially grouped gene products. The collagen alpha chains (COL) are located in the lower part of the image, with one group mapped to the top right. Recall that the grid is toroidal; hence, these regions are connected. The myotubularins (MTMRs) are located at the upper center.
Finally, the receptor precursors (i.e., fibroblast growth factor receptor (FGFR), tyrosine kinase endothelial (TEK), tie-like receptor tyrosine kinase (TIE), and ret oncogene (RET)) are scattered throughout the red islands on the upper left and upper right, with a group also in the lower right. Note that the MTMR island is connected to the FGFR islands. This shows that these gene products are related to a degree. These relations are corroborated by our other work with these gene products [36].

D. Cluster Summarization

Cluster summarization, i.e., the potential output of a CW engine, of the ontological prototypes is achieved by examining the OSOM prototype weight vectors. For the case of genes or gene products, this summarization is a functional summarization of each group. The ontological content of each OSOM prototype is represented by a weight vector, as discussed in Section III. Each element of the weight vector can be viewed as the relative influence of a specific annotation in defining the profile of its associated OSOM prototype. Thus, high values in a weight vector signify a high likelihood that the objects mapped to a location are annotated by the associated term(s). We define the most-representative term (MRT) of an ontological prototype as the term that has the highest associated weight in the OSOM prototype weight vector. If there is more than one maximum weight-vector element, then the MRT is defined as the


Fig. 3. Prototype weight vectors for map locations (a) (2,8) and (b) (1,6) in Fig. 9(a).

term with the highest information content [36]. This provides a simple linguistic output for a potentially complex organization of the ontological description of groups of genes.

In general, the OSOM weight vectors represent the set of linguistic descriptions of the mapped objects because each element of the weight vector is the strength of an ontology term. The relative value of each vector element is the degree to which an ontology term or word describes the objects mapped to a specific location. For example, Fig. 3 illustrates the prototype weight values for the map locations (2,8) and (1,6) in Fig. 9(a). For the batch OSOM, the prototypes are normalized; hence, only the relative values within each prototype are important.

The OSOM weight vectors can be used to construct a linguistic proposition about each map location and the gene products that are mapped to that location. Take, for example, the network prototype weight vectors plotted in Fig. 3. The gene products mapped to prototype (2,8) are the collagens indexed 168–184. The gene products mapped to prototype (1,6) are the receptor precursors that are indexed 76, 93, and 96. Fig. 3(a) shows that the weight vector of prototype (2,8) has six nonzero elements. These nonzero elements are the memberships of the respective terms that define that prototype; thus, they can be used as inputs to a fuzzy-rule system, which computes a linguistic summarization. Fig. 4 shows the rule base for the example we show here. Note that the ordinate of the output is a linguistic proposition in the form of a hedge. We also add the ontology from which each summarizing term comes (i.e., molecular function, biological process, or cellular component) to the linguistic summarization, shown in bold in this example.
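As a minimal sketch of the summarization step described above, the following Python fragment selects the MRT of a prototype (highest weight, with ties broken by information content) and maps each nonzero weight to a linguistic hedge. The hedge thresholds, the scaling of each weight by the largest weight in the prototype, the example values, and all function names are illustrative assumptions; the actual rule base is the one shown in Fig. 4.

```python
import math
from typing import Dict

# Hypothetical thresholds on the weight relative to the largest weight in
# the prototype; the real rule base is the one in Fig. 4.
MOSTLY_ABOVE = 2.0 / 3.0
SOMEWHAT_ABOVE = 1.0 / 3.0


def most_representative_term(weights: Dict[str, float],
                             info_content: Dict[str, float]) -> str:
    """Term with the highest weight; ties are broken by information content."""
    top = max(weights.values())
    tied = [t for t, w in weights.items() if math.isclose(w, top)]
    return max(tied, key=lambda t: info_content[t])


def hedge(relative_weight: float) -> str:
    """Map a relative weight in (0, 1] to a linguistic hedge (assumed cut points)."""
    if relative_weight > MOSTLY_ABOVE:
        return "MOSTLY"
    if relative_weight > SOMEWHAT_ABOVE:
        return "SOMEWHAT"
    return "SLIGHTLY"


def summarize(weights: Dict[str, float]) -> Dict[str, str]:
    """Attach a hedge to every term with a nonzero weight."""
    nonzero = {t: w for t, w in weights.items() if w > 0.0}
    top = max(nonzero.values())
    return {t: hedge(w / top) for t, w in nonzero.items()}


# Illustrative prototype with three equally weighted terms and one unused term.
prototype = {
    "extracellular matrix structural constituent": 1.0 / 3.0,
    "collagen type IV": 1.0 / 3.0,
    "extracellular matrix organization": 1.0 / 3.0,
    "ATP binding": 0.0,
}
# Hypothetical information-content values, used only to break the tie.
ic = {
    "extracellular matrix structural constituent": 5.1,
    "collagen type IV": 7.4,
    "extracellular matrix organization": 4.8,
    "ATP binding": 2.0,
}

for term, label in summarize(prototype).items():
    print(f"{label} {term}")
print("MRT:", most_representative_term(prototype, ic))
```

Because only relative weights are used, a prototype whose nonzero weights are all equal yields the same hedge for every term, which mirrors the redundancy noted in the text for equally weighted prototypes.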
The output linguistic proposition for (1,6) is given by: The collagen gene products indexed 168–184 are summarized by the following perceptions: 1) The molecular function is MOSTLY extracellular matrix structural constituent; 2) the cellular component is MOSTLY collagen type IV; 3) the biological process is MOSTLY extracellular matrix organization. For this example, the linguistic propositions are redundant because the weight values are all equal. However, for the gene products


Fig. 4. Fuzzy-rule-based system; the input is the normalized prototypes of the trained OSOM network. $T_i$ indicates the associated GO term description for prototype element $w_i$.

Fig. 5. Zoomed-in view of lower left portion of Fig. 9(a) that shows mapping of GPD194 gene products and MRT of each location.

mapped to location (2,8), the weight values are not equal. Thus, the linguistic outputs define the relative strength of each term in the proposition. For example, the receptor precursors, which are indexed 76, 93, and 96, are summarized by the following perceptions: 1) The molecular functions are SLIGHTLY protein serine/threonine kinase activity, MOSTLY protein tyrosine kinase activity, SOMEWHAT receptor activity, MOSTLY ATP binding, and SOMEWHAT transferase activity; 2) the biological process is SOMEWHAT protein amino-acid dephosphorylation. Fig. 5 shows the MRTs for a zoomed-in portion of the trained OSOM network shown in Fig. 9(a). Table II contains the MRTs for all prototypes in the trained OSOM network shown in Fig. 9(a).

IV. RESULTS

A. Two Gaussian Clouds

This example illustrates how the OSOM and Kohonen’s SOM are equivalent for the s(GOP) similarity measure and the standard prototype update [see (11)]. The dots in Fig. 6 show the object data T for this example, i.e., two 2-D Gaussian-distributed clouds. The object data represent the terms. Table III outlines the properties of each cloud of data. The pairwise term-similarity

matrix is

$$R_{ij} = 1 - \frac{\|x_i - x_j\|}{\max_{k,l} \|x_k - x_l\|}.$$

TABLE II MRTS OF OSOM NETWORK TRAINED ON GPD194 (FULL MAP SHOWN IN FIG. 9(a))

TABLE III TWO GAUSSIAN CLOUDS PROPERTIES

Fig. 6. Two Gaussian clouds.

Fig. 7. Comparison of (a) OSOM and (b) SOM trained on two Gaussian clouds data shows identical results.

Assume that the training data are composed of randomly drawn normalized binary combinations of the data in each cloud, much like the gene products are combinations of terms. The circles in Fig. 6 represent the 60 training data points, i.e., the genes or gene products. Both the OSOM and the SOM are initialized to the same starting points with a 10 × 10 toroidal grid. Identical initialization is accomplished by randomly initializing the OSOM weight vectors $\vec{w}_i \in [0, 1]^2$ and then computing the corresponding weights of the SOM prototypes as follows:

$$a_i = \sum_{j=1}^{100} (\tilde{w}_i)_j T_j.$$

Each algorithm was run for 1000 iterations with equivalent learning rates and neighborhood functions. Fig. 7 shows the comparison of the OSOM and SOM mappings of the two Gaussian clouds data. The results are identical.
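The setup of this experiment can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the cloud means and spreads are placeholders (the actual properties are those of Table III), the 100 terms are assumed to be split evenly between the two clouds, each OSOM weight vector is assumed to hold one weight per term and to be normalized to sum to one, and the helper names `term_similarity`, `random_gene`, and `som_prototypes` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two 2-D Gaussian clouds of terms (placeholder means and spreads).
N_PER_CLOUD = 50
terms = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=1.0, size=(N_PER_CLOUD, 2)),
    rng.normal(loc=(5.0, 5.0), scale=1.0, size=(N_PER_CLOUD, 2)),
])


def term_similarity(x: np.ndarray) -> np.ndarray:
    """R_ij = 1 - ||x_i - x_j|| / max_{k,l} ||x_k - x_l||."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return 1.0 - d / d.max()


def random_gene(cloud: int, k: int = 5) -> np.ndarray:
    """Normalized binary combination of k terms drawn from one cloud."""
    g = np.zeros(2 * N_PER_CLOUD)
    idx = rng.choice(N_PER_CLOUD, size=k, replace=False) + cloud * N_PER_CLOUD
    g[idx] = 1.0 / k
    return g


def som_prototypes(w_tilde: np.ndarray, x: np.ndarray) -> np.ndarray:
    """a_i = sum_j (w_tilde_i)_j x_j, one row per prototype."""
    return w_tilde @ x


R = term_similarity(terms)

# 60 training objects, alternating between the two clouds.
genes = np.array([random_gene(c % 2) for c in range(60)])

# 10 x 10 grid: 100 prototypes, random weights normalized per prototype.
w = rng.random((100, terms.shape[0]))
w_tilde = w / w.sum(axis=1, keepdims=True)
a = som_prototypes(w_tilde, terms)

print(R.shape, genes.shape, a.shape)
```

With normalized weights, each SOM prototype `a[i]` is a convex combination of the term locations, so both networks start from the same points in the object space.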

B. GPD194 Gene Products

To show the strength of the similarity-measure-based OSOM, we compared it against the SOSM [16] and the batch-relational SOM [17]. As in [16], we use a dot-product similarity measure for the SOSM, which is given by

$$s^{(\mathrm{SOSM})}(\vec{w}_i, g_j) = \vec{w}_i \cdot g_j. \tag{16}$$

The SOSM update equation is equivalent to (11) and is given by

$$\vec{w}_i^{(\mathrm{new})} = \vec{w}_i^{(\mathrm{old})} + \epsilon(t) \cdot h_{ip} \cdot \left(g_d - \vec{w}_i^{(\mathrm{old})}\right) \quad \forall i. \tag{17}$$

The batch-relational SOM trains on objects represented by dissimilarity data. However, this algorithm is unable to directly encode the ontological data as a binary vector of terms; thus, a pairwise training-object dissimilarity matrix must be computed. We computed the dissimilarity matrix by $D^{\mathrm{genes}}_{ij} = d^{(\mathrm{GOP})}(G_i, G_j)\ \forall i, j$. The batch-relational SOM trains in a similar fashion to the relational duals of the c-means clustering algorithms [19].

Fig. 8. OSOM network mapping of GPD194 using the F(MAX) update [see (14)]. (a) OSOM with s(GOP). (b) OSOM with s(AVG). (c) OSOM with s(OWA).

Fig. 9. Batch OSOM network mapping of GPD194. (a) Batch OSOM with s(GOP). (b) Batch OSOM with s(AVG). (c) Batch OSOM with s(OWA).

Fig. 8 shows the visualizations of the trained OSOM networks using the F(MAX) prototype update equation for the three different similarity measures. All three visualizations show that the OSOM is able to correctly show the groupings of the three gene-product families, i.e., the collagens, receptor precursors, and myotubularins (the three families are described in Table I). Interestingly, the OWA and AVG similarity measures produced smaller (but more populated) groups of gene products from each family. All three visualizations show that the OSOM mapped the

myotubularins (MTMR) family onto a tightly grouped region. This family is annotated by nearly identical GO terms; hence, this follows our expectation. We also expected that the F(MAX) update would tend to group objects together, as the update moves the prototypes toward all terms that are similar to the terms in the test signal. This produces less-specific prototypes.

Fig. 9 shows the visualizations of the trained batch OSOM networks for the three different similarity measures. In contrast with the F(MAX) networks, as shown in Fig. 8, the batch OSOM produces a more spread-out mapping of the gene products (with the exception of the OWA similarity). This is because the update only moves the prototypes toward the terms present in the gene products. Thus, the prototypes are more specific. The batch OSOM using s(GOP), as shown in Fig. 9(a), produces the most pleasing result from an informational standpoint. The families are grouped in connected regions on the map, but the color visualization shows that there is substructure within the families. For example, the collagen gene products are separated into three red islands; the islands are centered at (5,6), (7,8), and (7,5). This substructure has been identified in our other work [36] and is also corroborated by the experiments described in [37]. Another pleasing aspect of this map is that gene products mapped to the locations (4,8), (1,8), and (1,4) are known outliers. The FGFR1, TEK, and COL21A1 gene products mapped to those locations are known to have incorrect or incomplete annotations that cause them to be erroneously grouped (note that these annotation errors have since been corrected, but we use these data for consistency and validation purposes).

Fig. 10. (a) SOSM and (b) batch-relational SOM network mappings of GPD194.

Fig. 10 shows the results of the SOSM and batch-relational SOM on the GPD194 data. The SOSM visualization, as shown in Fig. 10(a), is perhaps pleasing from the standpoint that it clearly shows the delineation between the three families, including the substructure in the collagens and the known outliers at locations (8,4) and (7,4). However, the SOSM, as shown in Fig. 10(a), fails to capture the underlying structure of the similarity between the three families. This family separation occurs because the SOSM does not consider term-based similarity; it only calculates the similarity based on a dot product. The three families do not share any terms between them; thus, their locations on the SOSM map are widely separated. The families do, however, share similar terms, and this similarity is not captured in the SOSM.

The results of the batch-relational SOM, as shown in Fig. 10(b), are pleasing. The map is well-populated, the family structure is somewhat evident, and the underlying similarities between certain members of different families are shown. As expected, this map is very similar to the batch OSOM map shown in Fig. 9(a). Both of these maps are computed using equivalent similarity measures, namely, the GOP measure, and are updated using a batch algorithm. There are differences between the OSOM and batch-relational SOM, however. First, the batch-relational SOM map does not show the boundaries between the families. It looks as though all prototypes are fairly similar to each other, except for the outliers at (8,8), (8,2), and (2,4). Also, the strength of the OSOM formulation is that the linguistic content of each prototype is directly encoded in the prototype weight vector: each weight-vector element represents a term in the GO. Thus, the summarizations of the network locations are directly encoded in the map. In contrast, the batch-relational SOM prototypes encode the weight of each object (in this case, gene product) in defining the map. Hence, each weight-vector element is the contribution of a training object, rather than of the terms themselves. Term-based summarizations could be computed from the batch-relational SOM by aggregation or counting methods, such as those described in [36].

C. Cell-Apoptosis-Related Genes

Table IV outlines 30 genes that are known to be related to cell apoptosis, that is, programmed cell death [7]. Genes 1–10 are known to be antiapoptotic, i.e., they prevent cell death. Genes 11–19 are proapoptotic, i.e., they are involved in initiating cell death. Genes 20–30 are involved in apoptosis, but the GO annotations do not define them to be either pro- or antiapoptotic. These genes are important for understanding the mechanisms of several cell-related diseases. Some cancer-causing viruses, including human papillomavirus (HPV), prevent cell apoptosis in the compromised cells [38]. Other diseases, such as acquired immune deficiency syndrome (AIDS) and Alzheimer’s, cause cell death [39]. Thus, it is important to understand the role(s) that genes play in cell apoptosis.

Fig. 11 shows the results of training the batch OSOM, the SOSM, and the batch-relational SOM with the cell-apoptosis genes in Table IV. Again, the SOSM shows the obvious result: all the antiapoptosis genes are mapped to the same location, all the proapoptosis genes are mapped to two locations in the lower left, and most “unknown” genes (i.e., 20–30) are mapped to the top center. Note that the “known” genes (i.e., 1–19) are located on a connected red island, while most of the “unknown” genes (i.e., 20–30) are separated from the rest by a dark-blue boundary. As shown with the GPD194 dataset, the SOSM captures the obvious relationships of genes, but shows none of the underlying similarities. Additionally, the functions of most of the “unknown” genes (i.e., 20–30) cannot be inferred from this mapping.

In contrast, both the batch OSOM in Fig. 11(a) and the batch-relational SOM in Fig. 11(c) show some interesting mixing of the genes. Note that several of the antiapoptotic genes (4,6,8,10) are mapped to location (3,3) in the OSOM map. However, one proapoptotic gene (19) and two of the “unknown” set (23,25) are mapped to this location as well.
Although the MRT for this location is GO:0006916, antiapoptosis, this term is not shared by genes 19, 23, and 25; the other term-based similarities map these genes into this location. We found that a couple of these specific associations shown by the batch OSOM are supported by the biomedical literature. BAG1 (4) and GAS2 (23) are involved

Fig. 11. Network mapping of cell-apoptosis-related genes. (a) Batch OSOM with s(GOP). (b) SOSM. (c) Batch-relational SOM.

TABLE IV CELL-APOPTOSIS-RELATED GENES [7]

in cocaine-induced changes in fetuses [40], and BAG1 (4) and BAD (19) have been associated with the survival of neuronal cells [41]. Another interesting map location in the batch OSOM is (1,1). Three antiapoptotic genes (3,5,7), four proapoptotic genes (12–14,16), and one of the unknown set (29) are mapped there. All of these genes share the term GO:0005515, protein binding, as well as other protein-binding terms. These types of relationships are important for biologists who wish to determine the links between genes that may not share annotations but do share similar terms (as measured by the ontology-based similarity measures).

The batch-relational SOM mapping of the cell-apoptosis-related genes is effective in showing the relationships between the genes and is very similar in appearance to the OSOM. Upon further inspection, there are similar groupings in the batch-relational SOM and OSOM maps. However, recall that the batch-relational SOM does not directly encode the term content in the network prototypes. Thus, cluster summarization is not straightforward, as it is with the OSOM. One must also go through the additional step of computing the pairwise gene-similarity matrix, upon which the batch-relational SOM operates.

TABLE V MRTS OF OSOM MAP OF CELL-APOPTOSIS-RELATED GENES

Table V contains the MRTs for each map location in the batch OSOM map of the cell-apoptosis-related genes. Two of the groups, at OSOM locations (1,3) and (3,3), have an MRT that suggests antiapoptotic- or proapoptotic-related function. The locations (2,3) and (3,2) have an MRT that is apoptosis-related, but these apoptosis annotations are specific and only suggest that the genes are apoptosis-related.

TABLE VI SUMMARY OF ALGORITHMS’ BEHAVIORS

Table VI provides summarizing remarks about the variants of the OSOM. Overall, we found the most pleasing results to be the batch OSOM with the s(GOP) similarity measure, both for the quality of the visualization and for the linguistic summarizations.
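The SOSM behavior reported in this section follows from its dot-product similarity (16) and prototype update (17), which the sketch below illustrates. It is not the implementation used for the experiments: the function names, the argument names `rate` (learning rate) and `h` (neighborhood values for the winning node), and the toy term vectors are assumptions made for illustration.

```python
import numpy as np


def sosm_similarity(w: np.ndarray, g: np.ndarray) -> float:
    """Dot-product similarity between a prototype and a gene-product vector, as in (16)."""
    return float(np.dot(w, g))


def sosm_update(prototypes: np.ndarray, g: np.ndarray,
                rate: float, h: np.ndarray) -> np.ndarray:
    """One step of (17): move every prototype toward g, scaled by rate * h[i]."""
    return prototypes + rate * h[:, None] * (g[None, :] - prototypes)


# Two toy gene products over four terms that share no terms.
g_a = np.array([1.0, 1.0, 0.0, 0.0])
g_b = np.array([0.0, 0.0, 1.0, 1.0])

# With no shared terms the dot product is zero, regardless of how
# similar the individual terms may be in the ontology.
print(sosm_similarity(g_a, g_b))

# Update two prototypes toward g_a; the second node is a neighbor of the
# winner and receives half of the winner's neighborhood weight.
prototypes = np.zeros((2, 4))
updated = sosm_update(prototypes, g_a, rate=0.5, h=np.array([1.0, 0.5]))
print(updated)
```

Because the dot product only rewards shared vector elements, objects annotated by different but ontologically similar terms receive zero similarity, which is consistent with the family separation seen in the SOSM maps.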


V. CONCLUSION

The results in Section IV show that the OSOM is a powerful tool to visualize the relationships between objects composed of ontological data. Because these data are represented as collections of terms, the standard SOM is ill-equipped for these data. The results of the SOSM show that it is able to show the obvious relationships between the genes and gene products. However, the substructure and, more importantly, the relationships between gene products that do not share identical terms are not shown. Section IV illustrated that the OSOM with the GOP similarity measure is equivalent to the SOM for object data. However, the SOM cannot take ontological data as input. Thus, the OSOM can do everything the SOM is able to do, but it can also analyze ontological data.

The OSOM encodes the ontological data directly and computes a visualization of the gene products that shows how they are related to one another. Additionally, the weight values in the OSOM prototypes are the relative strength of each GO term in defining the genes and gene products mapped to that prototype. Each prototype is essentially a sentence that describes the mapped genes and/or gene products.

Similar to the OSOM, the batch-relational SOM provided accurate and meaningful visualizations of the genes and gene products. However, this is achieved at an additional cost: the pairwise gene-dissimilarity matrix must be precomputed. Additionally, it is not straightforward to compute a term-based summarization of each prototype.

The drawback to the OSOM is that it requires the ontology-based objects to be described as vectors, where each element represents a specific term. The GO, as of 2009, has approximately 30 000 terms. Thus, if one were to generalize the OSOM to the entire GO, each gene or gene-product vector would have 30 000 elements, with only a few being nonzero. For this case, the OSOM would be computationally expensive.
To combat this drawback, we are currently developing algorithms that operate within the ontology tree itself.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their insightful suggestions for improving this paper. Tim would like to thank Steph for her scrutiny and editing.

REFERENCES

[1] L. Zadeh, “Outline of a new approach to the analysis of complex systems and decision processes,” IEEE Trans. Syst., Man, Cybern., vol. SMC-3, no. 1, pp. 28–44, Jan. 1973.
[2] L. Zadeh, “From computing with numbers to computing with words—From manipulation of measurements to manipulation of perceptions,” Int. J. Appl. Math. Comput. Sci., vol. 12, no. 3, pp. 307–324, 2002.
[3] J. Mendel, “The perceptual computer: An architecture for computing with words,” in Proc. FUZZ-IEEE, Melbourne, Vic., Australia, Dec. 2001, pp. 35–38.
[4] J. Mendel, “Computing with words and its relationships with fuzzistics,” Inf. Sci., vol. 177, no. 4, pp. 988–1006, Feb. 2007.
[5] J. Mendel, “Computing with words: Zadeh, Turing, Popper, and Occam,” IEEE Comput. Intell. Mag., vol. 2, no. 4, pp. 10–17, Nov. 2007.
[6] The Gene Ontology Consortium, “The Gene Ontology (GO) database and informatics resource,” Nucl. Acids Res., vol. 32, pp. D258–D261, 2004.

[7] D. Xu, J. Keller, M. Popescu, and R. Bondugula, Applications of Fuzzy Logic in Bioinformatics. London, U.K.: Imperial College, 2008.
[8] T. Kohonen, “Automatic formation of topological maps of patterns in a self-organizing system,” in Proc. SCIA, E. Oja and O. Simula, Eds. Helsinki, Finland, 1981, pp. 214–220.
[9] S. Kaski and T. Kohonen, “Exploratory data analysis by the self-organizing map: Structures of welfare and poverty in the world,” in Proc. 3rd Int. Conf. Neural Netw. Cap. Markets, London, U.K., 1996, pp. 498–507.
[10] J. Keller, M. Popescu, and J. Mitchell, “Taxonomy-based soft similarity measures in bioinformatics,” in Proc. IEEE Int. Conf. Fuzzy Syst. Budapest, Hungary: IEEE Press, Jul. 2004, pp. 23–30.
[11] M. Popescu, J. Keller, and J. Mitchell, “Fuzzy measures on the Gene Ontology for gene product similarity,” IEEE Trans. Comput. Biol. Bioinformatics, vol. 3, no. 3, pp. 263–274, Jul.–Sep. 2006.
[12] S. Altschul, T. Madden, A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. Lipman, “Gapped BLAST and PSI-BLAST: A new generation of protein database search programs,” Nucl. Acids Res., vol. 25, pp. 3389–3402, 1997.
[13] T. Kohonen, Self-Organizing Maps (Information Sciences Series 30). Berlin, Germany: Springer-Verlag, 2001.
[14] T. Hubbard, B. Aken, S. Ayling, B. Ballester, K. Beal, E. Bragin, S. Brent, Y. Chen, P. Clapham, L. Clarke, G. Coates, S. Fairley, S. Fitzgerald, F. J. Banet, L. Gordon, S. Graf, S. Haider, M. Hammond, R. Holland, K. Howe, A. Jenkinson, N. Johnson, A. Kahari, D. Keefe, S. Keenan, R. Kinsella, F. Kokocinski, E. Kulesha, D. Lawson, I. Longden, K. Megy, P. Meidl, B. Overduin, A. Parker, B. Pritchard, D. Rios, M. Schuster, G. Slater, D. Smedley, W. Spooner, G. Spudich, S. Trevanion, A. Vilella, J. Vogel, S. White, S. Wilder, A. Zadissa, E. Birney, F. Cunningham, V. Curwen, R. Durbin, F. X. M. Suarez, J. Herrero, A. Kasprzyk, G. Proctor, J. Smith, S. Searle, and P. Flicek, “Ensembl 2009,” Nucl. Acids Res., vol. 37, pp. D690–D697, 2009.
[15] A. Enright, S. Van Dongen, and C. Ouzounis, “An efficient algorithm for large-scale detection of protein families,” Nucl. Acids Res., vol. 30, no. 7, pp. 1575–1584, 2002.
[16] H. Ritter and T. Kohonen, “Self-organizing semantic maps,” Biol. Cybern., vol. 61, no. 4, pp. 241–254, Aug. 1989.
[17] A. Hasenfuss and B. Hammer, “Relational topographic maps,” in Advances in Intelligent Data Analysis VII. Berlin, Germany: Springer-Verlag, 2007, pp. 93–105.
[18] T. Honkela, S. Kaski, K. Lagus, and T. Kohonen, “Exploration of full-text databases with self-organizing maps,” in Proc. ICNN, Washington, DC, vol. 1, Jun. 1996, pp. 56–61.
[19] R. Hathaway, J. Davenport, and J. Bezdek, “Relational duals of the c-means clustering algorithms,” Pattern Recognit., vol. 22, no. 2, pp. 205–212, 1989.
[20] M. Brameier and C. Wiuf, “Co-clustering and visualization of gene expression data and gene ontology terms for Saccharomyces cerevisiae using self-organizing maps,” J. Biomed. Inf., vol. 40, no. 2, pp. 160–173, Apr. 2007.
[21] D. Martin, C. Brun, E. Remy, P. Mouren, D. Thieffry, and B. Jacq, “GOToolBox: Functional analysis of gene datasets based on gene ontology,” Genome Biol., vol. 5, no. 12, pp. R101-1–R101-8, 2004.
[22] J. Lee, R. Sinkovits, D. Mock, E. Rab, J. Cai, P. Yang, B. Saunders, R. Hsueh, S. Choi, S. Subramaniam, R. Scheuermann, and in collaboration with the Alliance for Cellular Signaling, “Components of the antigen processing and presentation pathway revealed by gene expression microarray analysis following B cell antigen receptor (BCR) stimulation,” BMC Bioinformat., vol. 7, no. 237, pp. 1–19, 2006.
[23] N. Shah and N. Fedoroff, “Clench: A program for calculating cluster enrichment using the gene ontology,” Bioinformat., vol. 20, no. 7, pp. 1196–1197, May 2004.
[24] G. Bindea, B. Mlecnik, H. Hackl, P. Charoentong, M. Tosolini, A. Kirilovsky, W.-H. Fridman, F. Pages, Z. Trajanoski, and J. Galon, “ClueGO: A Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks,” Bioinformat., vol. 25, no. 8, pp. 1091–1093, Apr. 2009.
[25] C. Henegar, R. Cancello, S. Rome, H. Vidal, K. Clément, and J. Zucker, “Clustering biological annotations and gene expression data to identify putatively co-regulated biological processes,” J. Bioinformat. Comput. Biol., vol. 4, no. 4, pp. 833–852, Aug. 2006.
[26] J. Wang, Z. Du, R. Payattakool, P. Yu, and C. Chen, “A new method to measure the semantic similarity of GO terms,” Bioinformat., vol. 23, pp. 1274–1281, 2007.
[27] S. Raychaudhuri and R. Altman, “A literature-based method for assessing the functional coherence of a gene group,” Bioinformat., vol. 19, no. 3, pp. 396–401, 2003.


[28] J. Keller, J. Bezdek, M. Popescu, N. Pal, J. Mitchell, and J. Huband, “Gene ontology similarity measures based on linear order statistics,” Int. J. Uncertainty, Fuzziness Knowl.-Based Syst., vol. 14, no. 6, pp. 639–661, 2006.
[29] P. Lord, R. Stevens, A. Brass, and C. Goble, “Semantic similarity measure as a tool for exploring the gene ontology,” in Proc. Pac. Symp. Biocomput., 2003, pp. 601–612.
[30] J. Jiang and D. Conrath, “Semantic similarity based on corpus statistics and lexical ontology,” presented at the Int. Conf. Res. Comput. Linguistics X, Taipei, Taiwan, 1997.
[31] I. Sledge, T. Havens, J. Bezdek, and J. Keller, “Relational generalizations of cluster validity indexes,” IEEE Trans. Fuzzy Syst., doi: 10.1109/TFUZZ.2010.2048114, 2010.
[32] I. Sledge, J. Bezdek, T. Havens, and J. Keller, “Relational duals of cluster validity functions for the c-means family,” IEEE Trans. Fuzzy Syst., to be published.
[33] R. Yager and J. Kacprzyk, The Ordered Weighted Averaging Operators: Theory and Applications. Amsterdam, The Netherlands: Springer-Verlag, May 1997.
[34] G. J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Upper Saddle River, NJ: Prentice-Hall, 1995.
[35] M. Cottrell, B. Hammer, A. Hasenfuss, and T. Villmann, “Batch and median neural gas,” Neural Netw., vol. 19, pp. 855–863, 2006.
[36] M. Popescu, J. Keller, J. Mitchell, and J. Bezdek, “Functional summarization of gene product clusters using Gene Ontology similarity measures,” in Proc. 2004 ISSNIP, Piscataway, NJ: IEEE Press, pp. 553–559.
[37] J. Myllyharju and K. Kivirikko, “Collagens, modifying enzymes, and their mutation in humans, flies, and worms,” Trends Genet., vol. 20, no. 1, pp. 33–43, 2004.
[38] T. Garnett and P. Duerksen-Hughes, “Modulation of apoptosis by human papillomavirus (HPV) oncoproteins,” Arch. Virol., vol. 151, no. 12, pp. 2321–2335, Dec. 2006.
[39] R. Cameron and G. Feuer, Eds., Apoptosis and Its Modulation by Drugs (Handbook of Experimental Pharmacology Series). New York: Springer-Verlag, 2000.
[40] S. I. Novikova, F. He, J. Bai, I. Badan, I. A. Lidow, and M. S. Lidow, “Cocaine-induced changes in the expression of apoptosis-related genes in the fetal mouse cerebral wall,” Neurotoxicol. Teratol., vol. 27, pp. 3–14, 2005.
[41] R. Götz, S. Wiese, S. Takayama, G. C. Camarero, W. Rossoll, U. Schweizer, J. Troppmair, S. Jablonka, B. Holtmann, J. C. Reed, U. R. Rapp, and M. Sendtner, “Essential role of Bag-1 in differentiation and survival of hematopoietic and neuronal cells,” Nat. Neurosci., vol. 8, no. 9, pp. 1169–1178, 2005.

Timothy C. Havens (S’06) received the B.S. and M.S. degrees in electrical engineering from Michigan Technological University, Houghton, in 1999 and 2000, respectively. He is currently working toward the Ph.D. degree in electrical and computer engineering at the University of Missouri, Columbia. He was with the Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, where he specialized in the simulation and modeling of directed energy and global positioning systems. His current research interests include clustering in relational data and ontologies, fuzzy logic, bioinformatics, and pattern recognition.


James M. Keller (F’00) received the Ph.D. degree in mathematics from the University of Missouri, Columbia, in 1978. He holds the Curators Professorship with the Department of Electrical and Computer Engineering and the Department of Computer Science, University of Missouri, Columbia, where he is also the R. L. Tatum Professor with the College of Engineering. His industrial and government funding sources include the Electronics and Space Corporation, the Union Electric, the Geo-Centers, the National Science Foundation, the Administration on Aging, The National Institutes of Health, The National Aeronautics and Space Administration/Johnson Space Center, the Air Force Office of Scientific Research, the Army Research Office, the Office of Naval Research, the National Geospatial Intelligence Agency, the Leonard Wood Institute, and the Army Night Vision and Electronic Sensors Directorate. He has authored or coauthored more than 300 technical publications. He is currently an Associate Editor of the International Journal of Approximate Reasoning and a member of the Editorial Board of Pattern Analysis and Applications, Fuzzy Sets and Systems, the International Journal of Fuzzy Systems, and the Journal of Intelligent and Fuzzy Systems. His research interests include computational intelligence: fuzzy-set theory and fuzzy logic, neural networks, and evolutionary computation with a focus on problems in computer vision, pattern recognition, and information fusion, including bioinformatics, spatial reasoning in robotics, geospatial intelligence, sensor and information analysis in technology for eldercare, and landmine detection. Prof. Keller is a Fellow of the International Fuzzy Systems Association, a Distinguished Lecturer of the IEEE Computational Intelligence Society, a National Lecturer for the Association for Computing Machinery from 1993 to 2007, and a Past President of the North American Fuzzy Information Processing Society (NAFIPS).
He received the 2007 Fuzzy Systems Pioneer Award from the IEEE Computational Intelligence Society. He has had a full six-year term as Editor-in-Chief of the IEEE TRANSACTIONS ON FUZZY SYSTEMS. He was the Vice President for Publications of the IEEE Computational Intelligence Society from 2005 to 2008 and is currently an elected Adcom member. He was the Conference Chair of the 1991 NAFIPS Workshop, a Program Cochair of the 1996 NAFIPS meeting, a Program Cochair of the 1997 IEEE International Conference on Neural Networks, and the Program Chair of the 1998 IEEE International Conference on Fuzzy Systems. He was the General Chair of the 2003 IEEE International Conference on Fuzzy Systems.

Mihail Popescu (SM’08) received the M.S. degree in medical physics in 1995, the M.S. degree in electrical engineering in 1997, and the Ph.D. degree in computer science in 2003 from the University of Missouri, Columbia. He is currently an Assistant Professor with the Department of Health Management and Informatics, University of Missouri. His research interests include eldercare technologies, fuzzy logic, and ontological pattern recognition.
