Supplementary Information: Tutorial

The Gene Ontology Specificity Quantifier Database supports both simple and more complex (compound) queries. It was written with a PHP front-end to a MySQL database. Figure T1 shows an example of a more complex query. In this example, the search targets membrane-related genes and is restricted to more general terms (information less than 4 bits). The search is further narrowed to the cellular component branch of the Gene Ontology.

Figure T1. Gene Ontology Database Advanced Search Option
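
The compound query described above can be sketched as a single filtered SELECT. This is a minimal illustration, not the database's actual implementation: the tutorial's system uses PHP and MySQL, whereas this sketch uses Python's built-in sqlite3 module so that it is self-contained. The table name `go_term`, its columns, the function `search_terms`, and every row of data (identifiers and bit values) are hypothetical.

```python
import sqlite3

def search_terms(conn, keyword, max_bits, branch):
    """Compound query: keyword match AND information ceiling AND branch filter."""
    sql = (
        "SELECT term_id, name, info_bits FROM go_term "
        "WHERE name LIKE ? AND info_bits < ? AND branch = ? "
        "ORDER BY info_bits"
    )
    # Parameters are bound by the driver rather than interpolated into the SQL.
    return conn.execute(sql, (f"%{keyword}%", max_bits, branch)).fetchall()

# Hypothetical schema and toy rows; the real schema is not given in the tutorial.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE go_term ("
    "term_id TEXT PRIMARY KEY, name TEXT, branch TEXT, info_bits REAL)"
)
conn.executemany(
    "INSERT INTO go_term VALUES (?, ?, ?, ?)",
    [
        ("T1", "membrane", "cellular_component", 2.1),
        ("T2", "plasma membrane", "cellular_component", 3.4),
        ("T3", "organelle membrane", "cellular_component", 5.2),
        ("T4", "transport", "biological_process", 3.0),
    ],
)

# Membrane-related terms, less than 4 bits, cellular component branch only.
results = search_terms(conn, "membrane", 4.0, "cellular_component")
for term_id, name, bits in results:
    print(term_id, name, bits)
```

Each condition in the WHERE clause corresponds to one field of the search form, so dropping a condition widens the search in the same way as leaving that field blank.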

As shown in Figure T2, information can be projected onto a scale from general terms (small number of bits) to highly specific terms (large number of bits). Using this information, gene sets can be partitioned into unbiased sets in which information is distributed more evenly across the set than when the Gene Ontology’s graphical structure is used as a proxy for specificity.

Figure T2. Gene Ontology Terms on an Information Theoretic Scale
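
The tutorial does not state the formula behind the bit values. One common information-theoretic definition, assumed here purely for illustration, is I(t) = -log2 p(t), where p(t) is the fraction of annotated genes that carry term t. Under that assumption, a term annotating every gene carries 0 bits and rarer terms carry more. The function name and the counts below are hypothetical.

```python
import math

def information_bits(term_count, total_count):
    """Information content in bits, assuming I(t) = -log2(term_count / total_count)."""
    if not 0 < term_count <= total_count:
        raise ValueError("term_count must be in the range 1..total_count")
    return -math.log2(term_count / total_count)

# Toy counts (not taken from the database): rarer terms score higher.
for count in (1024, 256, 64, 1):
    print(count, information_bits(count, 1024))
```

On this scale, the "less than 4 bits" filter used in the search example would keep terms that annotate more than one sixteenth of the genes.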

The results are shown in Figure T3. From here, the user can see which partitions (ref) a particular GO term belongs to, as well as the various information-theoretic metrics used to determine this. The user can then print the results or export them to a variety of formats, including XML, Excel, CSV, and Word (see Figure T4 and Figure T5). Figure T6 shows an example of exporting to Excel.

Figure T3. Gene Ontology Database Results

Figure T4. Gene Ontology Database Print Option

Figure T5. Gene Ontology Database Export Option

Figure T6. Gene Ontology Database Export Option to Excel

Figure T7 shows an example of how GO can be partitioned into a set of nodes with similar information (8 nodes in this case); the numbers in the figure are the GO identifiers of the unselected nodes. Figure T8 shows an example of the improvement gained by selecting uniform partitions. It compares histograms of the information in GO level nodes with that in GO partition nodes: the partition-based information has a tighter distribution than the information of nodes selected by level in the graph structure.

Figure T7. An 8-node GO Partition

Figure T8. Histogram of GO Level 2 Versus GO Partitions Level 2
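
A rough way to quantify the "tighter distribution" seen in the histograms is to bin the bit values and compare their standard deviations. The sketch below assumes this simple measure; the tutorial does not say how spread was assessed, and the two lists of bit values are invented toy numbers, not data from Figure T8.

```python
from collections import Counter
from statistics import pstdev

def histogram(bits, bin_width=1.0):
    """Count values per bin; each key is the integer index of a bin of width bin_width."""
    return Counter(int(b // bin_width) for b in bits)

def spread(bits):
    """Population standard deviation of the information values, in bits."""
    return pstdev(bits)

# Toy values only: nodes taken from one graph level vs. nodes from one partition.
level_bits = [0.5, 1.2, 3.8, 6.4, 9.1]
partition_bits = [3.6, 3.9, 4.1, 4.3, 4.0]

print(sorted(histogram(level_bits).items()))
print(sorted(histogram(partition_bits).items()))
print(spread(level_bits), spread(partition_bits))
```

A smaller standard deviation for the partition corresponds to the narrower histogram: its nodes sit in adjacent bins, while the level-based nodes are scattered along the scale.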

An example of using partitions for analysis is described next. This example uses the “HOX_LIST_JP” set from the Gene Set Enrichment Analysis (GSEA) dataset. This set contains HOX proteins involved in hematopoiesis. Figure T9 shows the 6-node GO term partition applied to this set. Several findings are apparent: “regulation of metabolism” and “transcription” are highly enriched, and their p-values are confirmed to be 1.38 x 10^-38 and 7.91 x 10^-38, respectively. However, several smaller clusters of proteins also exist within this set; they correspond to annotation with the “response to stimulus,” “organismal physiological process,” and “biopolymer metabolism” nodes. Thus, Figure T9 demonstrates what appear to be several functional subclassifications of HOX genes.

[Figure T9 network diagram: HOX protein nodes, labeled by protein accession number, linked to the six GO partition nodes: biopolymer metabolism, organismal physiological process, response to stimulus, transport, regulation of metabolism, and transcription.]

Figure T9. Using a 6-node GO term partition, visual gene enrichment for regulation of metabolism and transcription is evident in these HOX proteins involved in hematopoiesis.
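
The tutorial reports enrichment p-values for the partition nodes but does not say which test produced them. A common choice for this kind of analysis, assumed here only as an illustration, is the one-sided hypergeometric test: the probability of seeing at least the observed number of annotated proteins in a set drawn at random from the background. The function name, parameter names, and numbers below are hypothetical and are not the HOX_LIST_JP counts.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the upper tail of the hypergeometric distribution using exact integers.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Toy example: 10 background genes, 4 annotated, 3 drawn, at least 2 annotated.
p = hypergeom_enrichment_p(population=10, annotated=4, sample=3, hits=2)
print(p)
```

Summing exact binomial coefficients before dividing avoids floating-point underflow, which matters when the tail probability is very small.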