Decision Making Biases in using Sequence Logo ...

4 downloads 702 Views 228KB Size Report
2012 Southeast Asian Network of Ergonomics Societies Conference (SEANES). Decision ... sequence logo as an evaluation metric for transcription factor.
2012 Southeast Asian Network of Ergonomics Societies Conference (SEANES)

Decision Making Biases in using Sequence Logo Visualization Nung Kion, Lee1, Yin Bee, Oon2 Faculty of Cognitive Sciences and Human Development Universiti Malaysia Sarawak Kota Samarahan, 94300 Sarawak {1nklee, 2yinbee}@fcs.unimas.my Abstract— Sequence Logo is a visualization method for displaying conservations characteristics of a sequence (DNA, RNA, proteins) motif profile obtained from either wet-lab or computational analysis. Usage of visualization in decision making carries some elements of subjectivity. In addition, people’s decisions are often biased in favor of their proposed hypotheses. The objectives of this paper were to examine the biases in using sequence logo as an evaluation metric for transcription factor analysis and identify some critical weaknesses in sequence logo for possible future improvements. Document analysis and subject matter expert interviews method were used for information gathering. We found that sequence logo has been frequently misused to support the results obtained from computational transcription factor analysis. In addition, we suggest that current sequence logo can be improved in several aspects to support various users' needs and minimize elements of subjectivity in decision making.

I.

INTRODUCTION

In scientific studies, visualization is important for communicating methodology or empirical result to readers. It aims to improve the quality and clarity of information presentation. Sequence logo [1] has been a popular and widely accepted method for visualizing motif composition of biological sequences in the past 20 years. A motif is the characteristic feature of transcription factor (TF) protein binding sites or short segments in protein/RNA sequences that have specific functional roles in cells. In this paper, we focus our study on DNA motif. A probabilistic motif profile of a TF, which is constructed from a set of real or putative binding sites, represents the specificity of binding sites it bound. The sequence logo shows the conservation property and relative frequencies of nucleotide symbols (i.e., A, C, G, T) in each position on a multiple-aligned binding sequences. This can facilitate the identification of a motif’s characteristic signature. A sequence logo has typically used for several purposes in leading journal articles: (a) to illustrate motif characteristics obtained from wet-lab or computational analysis; (b) to compare and contrast motifs obtained in an evaluation study; or (c) as performance evaluation metric in various experimental studies including wet-lab or computational tools. Despite being useful, the sequence logo has some apparent weaknesses. Since a logo only visualizes the summarized information in a motif, it has some associated risks when used

Both authors contributed equally in this study.

978-1-4673-1734-4/12/$31.00 ©2012 IEEE

Figure 1. An example of a motif sequence logo. Shown is the NFIL3 motif profile obtained from JASPAR. The top sub-figure is the PFM of the multiple aligned sites whereas the bottom one is its sequence logo.

for objective comparison purposes. The interpretation of a logo might be subject to the researcher’s confirmation biases [4]. For example, in computational motif prediction, the raw binding sites used to generate the sequence logo are not known to a reader. As such, performance comparison of computational motif prediction tools based on logo could lead to inaccurate claims. In addition, the use of sequence logos for comparison purposes is relied on individual perception and experiences and therefore the interpretation can be very subjective [6]. As a result, some conclusion deduced from a visualized motif logo can be misleading and portray inaccuracies when use in some contexts of studies. Other than that, there could be mismatch in terms of the amount of information convey to the reader and how the information is perceived. That is, visualizations may appear more convincing and sound than they really are [6]. For example, the use of colors in a sequence logo can impress the reader more than the actual quality of the motifs presented. These weaknesses can, to a certain extent, jeopardize the quality and precision needed in scientific result publications. Several novel and improved versions, based on the original sequence logo visualization methods, have been proposed. Reference [7] proposed CorreLogo to visualize RNA/DNA logos in 3D. RNAlogo was proposed to visualize RNA motif [8]. Two Sample Logo has extended the original sequence logo to include visualizing of statistical differences between two sets of sequence alignment [9]. Reference [10] proposed BerryLogo, a motif visualization method based on the log-odd scores instead of the information content used in the sequence logo. enoLogo allows visualization of logo based on the

2012 Southeast Asian Network of Ergonomics Societies Conference (SEANES) standard count matrix as well as binding energy matrix [11]. Despite many improvements recorded in the literature, the original sequence logo remains as the most widely used method in publications. The aims of this paper are (a) to identify biases associated with the use of sequence logo for information communication and decision making; (b) to identify the user needs on the use of the sequence logo; (c) to propose some possible improvements on the sequence logo. In this study, we focused our analysis on DNA motif profiles but our results are applicable to other types of sequences as well. This paper is organized into five sections. Section II presents the background on DNA motif profile visualization using sequence logo; Section III provides some interpretations on the sequence logo; Section IV present our document analysis and subject matter experts interview results; Section V gives the conclusion and possible future work. II.

BACKGROUND

The task of transcription factor analysis is to identify binding sites, i.e. short DNA sequences of 6-20 base-pairs (bp) in length located in the upstream or downstream of genes they regulate that are bound by TF proteins. Locating binding sites is critical because proteins-DNA interactions are involved in the regulation of the gene expression for the production of proteins. A motif profile of a TF represents the specificity of binding sites it recognizes (bound). Once some binding sites of a TF are available, its profile can be represented by using a Position Frequency Matrix (PFM) (a n × k matrix, where k is the motif length and n is the number of symbols) [12]. A PFM is obtained by counting the occurrences of symbols in each column of multiple alignment of a set of binding sites. Reference [3] showed the detail steps on how a motif profile is computed and how the sequence logo is generated. Figure 1 illustrates an example of a sequence logo obtained from the JASPAR database. In transcription factor analysis, scientists are interested to identify the specificity of a TF protein–signature of the binding sites it recognizes [12] and the locations of these binding sites in the genome under study. Motif signatures can be identified with relatively ease using the sequence logo compared to PFM. A. Elements in a Sequence Logo Table I describes the core elements in a sequence logo. B. Usages of Sequence Logo A sequence logo can be directly or indirectly used for many purposes in scientific studies related to transcription analysis. In the following discussion we summarize its usages. 1) Tools evaluation. To evaluate the effectiveness of a newly proposed algorithm for solving a DNA motif discovery problem, researchers often use the ability to discover known motif profiles in benchmark datasets as measurement of a tool’s success rates (e.g., in [5, 13]). A predicted motif is typically considered as true positive if the discovered motif profile, to a certain degree, is “similar” to an annotated profile.

TABLE I. Nucleotides

Color

Height

Axes

ELEMENTS OF A SEQUENCE LOGO

The symbols A,C,G,T correspond to the four types of nucleotides (Adenine, Cytosine, Guanine, Thymine) that form a DNA sequence. These symbols are shaped to become the “bar” in the chart. Each nucleotide symbol is associated with a distinct color (users can choose to use gray scale as well). Primary colors are usually employed. There is no formal standard for the coloring scheme used. The total height of the four stacked symbols indicates the conservation level measured by using the information content concept (max 2 bits) at a particular alignment column. Each symbol height represents its relative frequency. The horizontal axis shows the actual positions relative the genes where they are located or simply showing positions in the multiple alignment columns. The vertical axis marks the relative frequencies and conservation level of symbols.

2) To support the usefulness of a biological methodology (e.g., ChIP-ChIP or microarray gene expression analysis) or computational framework (e.g., footprinting), sequence logo is used to visualize the motif profile obtained (e.g., in [14-15]). 3) Motif characteristic analysis. Sequence logo has been used to compare and contrast characteristics of different binding site specificities of the same TF. It is also being simply used to show the conservation property of a novel motif profile discovered by computational or biological analysis. Motif profile characteristic analysis will enable identification of regions in a motif profile that are preferences to be in contact with protein. Also, it helps in evolutionary rates analysis in different positions of a motif [16]. From those usages, it can be seen that sequence logo plays an important role as a method to support hypotheses, claims and decisions made in a research study. Therefore, proper usages are needed to ensure the decision obtained is unbiased. III.

INTERPRETATION OF SEQUENCE LOGO

A. Biases in decision making Scientists in biotechnology or medical fields often rely on evidence to interpret certain results either in comparing the species relationships or identifying certain diseases. It is critical to have correct decision without biases. In the decision making process, some factors such as experience, time pressure, unusual situation, number of cues could influence the user into certain decision [17]. There are several types of human biases which might lead to unintended decisions. Reference [18] mentioned a few common biases in making judgments confirmation bias, availability bias and excess focus on certainty. Sometimes users tend to pay attention to a limited number of cues in the visualization that leads to ignoring of possible important information. In addition, confirmation bias might occur causing the user then to focus only on the cues that are confirmed. The disconfirmed cues could be somehow important for the visualization [17]. Reference [19] studied the de-biasing approaches of using a graphical layout rather than with a text baseline to help in

2012 Southeast Asian Network of Ergonomics Societies Conference (SEANES) decision making for intelligence analysis; for example in medical and military decisions. By using graphical visualization, experts are able to provide a more balanced and less biased selection [19]. It emphasized and confirmed the needs and importance of using graphical visualization rather than only showing complex information for decision making. B. Usability of the output In Bioinformatics, the ultimate goal is to produce accurate and usable visual representations that help the users to understand the complex relationships efficiently and allocate their attention to the critical information [20]. Usability is essential in three areas: (a) presentation of data; (b) interaction with data; and (c) usability of the data. Data set clustering, viewpoint manipulations and additional control for details were suggested to improve the usability [21]. Reference [22] also investigated the usability aspects of visualization techniques on user decision making. The usability of Human-System Interpretation tools taxonomy was derived. Information visibility, interface logical flow, memory load, human errors, and so on, contributed to the overall usability of the system. Visualization displays should not solely depend on static information. The user should be able to interact with the data with more controls, for example, adding and reducing information display, categorizing information, comparing visualization outputs, allowing collaboration, and use interactive output details [22]. IV.

METHODS

A. Document Analysis We wanted to understand how researchers used sequence logos in published articles. We randomly selected 20 quality journal articles published in year 2000 onwards. They were loosely grouped into computational or biological analysis articles. We reviewed the following items for each article: (a) kind of research study; (b) purpose of study; (c) purpose of logo use; (d) claims derived from logo; and (e) whether additional information was used to support claims. From our reviews, Figure 2 illustrates a typical process of a sequence logo being used in making decisions or claims regarding the hypothesis in the research study. As can be seen, the sequence logo was generated from the motif profiles obtained from biological wet-lab analysis or a computational tool. This illustrates how a sequence logo may serve as a critical tool for decision making. From sequence logos the claims/decisions/conclusions derived were used to support the hypothesis in the methodology used. The following are our major findings from the document analysis. 1) We found that the use of sequence logos for result presentation in biological analysis articles is more accurate because the researchers are usually focused on a single TF analysis in their study (e.g., [14-15]). The researchers usually have expert knowledge regarding which characteristics of motifs to expect. Therefore, the claims made are based on the clear analysis of the motif profiles they discovered (e.g., the type of motif profile to expect

Figure 2. Decision making showing use of sequence logo

should conform to the protein binding domains). Also, in addition to visualization, texts are included in all reviewed articles to discuss the justification of their judgement. Quality of the reported motif profile is often supported by statistical significance value or additional biological data [15]. 2) In computational analysis, we found frequent biases in the comparison of results using sequence logo to visualize predicted motif profiles. The bias was caused by the use of sequence logo to claim by article authors that their proposed tool performed better than others in evaluation study. The claims were made after they found their tool had higher success rates in discovering motif profiles that are similar to the annotated profiles. From our document analysis, the following words were used by researchers to indicate the similarity between two sequence logos: resemblance, corresponded, like, similar, as (previously indicated), agreement, and identical. Adjectives such as “strong”, “relatively”, “near”, “good”, and “remarkably” were used to indicate the degree of similarity. Biases in decision making happened because a sequence logo does not carry adequate information so objective claims could not be made. For example, in [23], the authors claimed that different tools AMADEUS, WEEDER and TRAWLER performed equally well because they accurately discovered the CREB-like motif profile. But closer examination reveals flaws in this claim. First, the sequence logo does not “show the binding sites” predicted by each tool and therefore cannot determine the

2012 Southeast Asian Network of Ergonomics Societies Conference (SEANES) quality of the motif profile obtained. For instance, if one tool returned 100 sites and another 50, and both show quite similar motif profiles, which one has performed better? From the visualized logos, we can only claim that a tool can discover a true motif profile but cannot claim on its effectiveness. Secondly, the quality of the predicted motif profile was not further evaluated in any other way. The sequence logo was not designed to allow easy comparison of similarity between two logos without some expert knowledge regarding the motif under study. Hence, decision based on visual comparison is prone to error and subjectivity. In addition, different users have rather distinct “sets of heuristic rules” when comparing two sequence logos. Therefore, the success rates of a tool based on visual inspection is flawed. Figure 3 shows an example of two logos reported in [5]. The authors claimed that their proposed tool has successfully discovered the real motif profile. However, closer examination of these two profiles reveals that, despite having the same most frequent symbol in most of their aligned positions, they have quite distinct conservation levels. Therefore, this illustrates that define a successful discovery of true motif profile can be subjected to bias interpretation. Also the use of “true sequence logo” discovery rates as demonstration of the effectiveness of a tool is a very weak support criterion because it correlates weakly to the actual sensitivity and specificity rates. For example, in Figure 3, we can see that the predicted motif profile has apparently clear high false positive or negative hits. 3) Non-comparable comparisons. We found that some articles compared sequence logo to non-compatible motif representation methods such as consensus string or frequencies plot [5, 24-25]. This kind of comparison could be misleading because they carry different amount and type of information. Also, authors are unable to fully utilize information in a sequence logo for comparison purposes. 4) Expert knowledge is needed. In computational analysis, some salient features of a motif might not be known to researchers. It seems in computational analysis, researchers’ analyses are mostly based on the most frequent symbols in each position of a motif profile to make justification that it is similar to the annotated ones.

Other less apparent features could be of importance as well but were often ignored. 5) Supporting confirmation. People’s visual perception is far from complete. Hence, comparing two visualized sequence logo is a very subjective task and depends on experience, background knowledge and perception. Scientific findings that are based solely on visual interpretations are not reliable nor systematic. It is necessary to support those arguments with quantitative data. From our analysis, a majority of the articles did support quality of the discovered motif profile using some additional information such as statistical significance test or foot- printing data. In biological transcription factor analysis, they usually employed a wet-lab technique to verify the functional of some of the predicted binding sites. Such verification is desired because it strongly supports the usefulness of a proposed analysis methodology. B. Interview with Subject Matter Experts Interview sessions were conducted with subject matter experts (SME) in Bio-technology to further understand the user needs and existing problems of sequence logo visualization. 1) Demographics of Interviewed Users Two SMEs participated in the interviews, in two one-to-one sessions for each SME. Each session was roughly about 1.5 hours. The first SME is a researcher specializing in the forest genomics and tree improvement. He has been involved in the field for more than 15 years. Currently he is studying the genome of a tree species which produces best wooden surface for furniture. The species will be selected for cultivation later. The second SME specializes in identifying RNA signatures of harmful algae systems and toxicology. She has been involved in the field for the past 10 years. Her research contribution is more towards studying the harmful algae species that could produce toxin in the sea or other living species in the sea. 2) User needs The SMEs, in different fields appeared to have different needs for the sequence visualization. SME 1s goal was to find the best gene through the comparison of different DNA in a species. Detailed information on genes alignments, bits, bases, and the like, is important to select the best genes. Normally if a set of good visualizations is provided, comparison analysis could be performed within one to two minutes. Thus, more information is needed compare to the sequence logo example that we provided for discussion (as shown in Figure 1). SME 2s goal was to understand the RNA molecular relationship for poison algae. A bundle of sequence logos will be compared in a glance. Therefore, the SME 2 did not need to have much information displayed. In fact, the output sequence logo needed to be as simple as possible so that the signatures of a motif could be observed easily. Based on the reports generated, we observed that the SME 2 group needed to compile and rearrange the results manually as the sequence logo system did not provide such an option.

Figure 3. Comparison of two motif sequence logos. The top logo is the annotated motif profile whereas the bottom logo was predicted by Trawler.

3) Existing problems occurring Throughout the interview sessions, some problems found in relation to the research needs were documented: inflexibility in

2012 Southeast Asian Network of Ergonomics Societies Conference (SEANES) information customization: Existing sequence logo might not be able to suit the user needs as different users would require different specification for the information displayed. (b) lack of the information needed: Certain information users would still need to manually key in the report relevant to the sequence visualization generated. This could be a tedious job for the user. (c) difficulties for novice users: SME 2 mentioned her students were always facing problems in aligning the RNA sequences to generate the correct sequence logo; and (d) lack of interactivity: Not much interactivity provided for the user on this existing system. V.

DISCUSSIONS AND FUTURE WORK

Sequence logo has been very useful to visualize the signature of a TF motif. However, our document analysis revealed that researchers have a tendency to misuse it in making judgments on results obtained from computational transcription factor analysis. Two main reasons caused the biased judgments: (a) a sequence logo can only display summarized information of a motif to ease motif signature pattern identification but this does not imply anything about the quality and quantity of binding sites used in constructing a motif nor does it show the quality of the motif when one has no expert knowledge of the motif under study; (b) researcher interpretation tends to be biased toward favouring their empirical results. Therefore, they would misuse the sequence logo as a tool to make very subjective claims.

height differentiation, color standards, layout, etc., including design and development of new enhanced interface for the users. ACKNOWLEDGMENT We would like to thank Wong SS for help with some information gathering. REFERENCES [1]

[2] [3]

[4] [5]

[6]

[7]

As stated in [26], making people aware of biases associated with decision making will not mitigate them. However, requiring people to offer alternative hypotheses for an observation would reduce the degree of biases [26]. Hence, it is necessary to use a suitable decision model when employing sequence logo to support decision making or claims in a scientific study. Our study also found that current sequence logo has some limitations in its usefulness to support more objective analysis activity such as comparing the motif signatures or highlighting the similarities or differences between two logos. Definitely, improvements on motif profile visualization are necessary in order to reduce the biases in serving as decision making tool. This study has gathered indepth understanding and the special needs for targeted users. Some of the criteria below will be investigated and input into the next stage of study to propose a newly enhanced interface to the users: (a) the system should provide flexibility in features or output information customization to cater to most users; (b) the system needs to provide more options especially for important information, e.g., position, species name, etc. to be displayed. Besides, automated report generation for comparison study will be more helpful to the user than the existing system whereby significant manual compiling works needs to be done; (c) the system (if possible) should provide alerts to the user, especially the novice user on the comparison position to avoid making doing wrong comparisons; (d) the system could add extra- interactivity features to enrich the user interaction e.g. pop-up details for certain bases.

[8]

This study has provided us with a preliminary input for the existing usability issues found on the sequence visualization system. The next stage of study will involve more detailed feature comparison on interface design itself, e.g. optimal

[17]

[9]

[10]

[11]

[12] [13]

[14]

[15]

[16]

T. D. Schneider and R. M. Stephens, “Sequence logos: a new way to display consensus sequences.” Nucleic Acids Research, vol. 18, no. 20, pp. 6097–6100, 1990. G. D. Stormo, “DNA binding sites: representation and discovery,” Bioinformatics, vol. 16, no. 1, pp. 16–23, 2000. W. W. Wasserman and A. Sandelin, “Applied bioinformatics for the identification of regulatory elements,” Nature Reviews Genetics, vol. 5, no. 4, pp. 276–287, 2004. R. S. Nickerson, “Confirmation bias: A ubiquitous phenomenon in many guises,” Review of General Psychology, vol. 2, pp. 175–220, 1998. L. Ettwiller, B. Paten, M. Ramialison, E. Birney, and J. Wittbrodt, “Trawler:de novo regulatory motif discovery pipeline for chromatin immunoprecipitation,” Nature Biotechnology, vol. 4, no. 7, pp. 563–565, 2007. S. Bresciani and M. J. Eppler, “The risks of visualization,” Institute for Corporate Communicatin, Faculty of Communication Sciences, University of Lugano (USI), ICA Working Paper No 1/2008, 2008. E. Bindewald, T. D. Schneider, and B. A. Shapiro, “Correlogo: an online server for 3d sequence logos of rna and dna alignments.” Nucleic Acids Research, vol. 34, no. Web Server issue, pp. W405–W411, 2006. T.-H. Chang, J.-T. Horng, and H.-D. Huang, “Rnalogo: a new approach to display structural rna alignment.” Nucleic Acids Research, vol. 36, no. Web Server issue, pp. W91–W96, 2008. V. Vacic, L. M. Iakoucheva, and P. Radivojac, “Two sample logo: a graphical representation of the differences between two sets of sequence alignments.” Bioinformatics, vol. 22, no. 12, pp. 1536–1537, 2006. C. Berry, S. Hannenhalli, J. Leipzig, and F. D. Bushman, “Selection of target sites for mobile dna integration in the human genome.” PLoS Computational Biology, vol. 2, no. 11, p. e157, 2006. C. T. Workman, Y. Yin, D. L. Corcoran, T. Ideker, G. D. Stormo, and P. V. Benos, “enologos: a versatile web tool for energy normalized sequence logos.” Nucleic Acids Res, vol. 33, no. Web Server issue, pp. W389–W392, 2005. G. D. Stormo, “DNA binding sites: representation and discovery,” Bioinformatics, vol. 1, pp. 16–23, 2000. L. Li, “Gadem: A genetic algorithm guided formation of spaced dyads coupled with an em algorithm for motif discovery,” Journal of Computational Biology, vol. 16, no. 2, pp. 317–329, 2009. J. S. Carroll, C. A. Meyer, J. Song, W. Li, T. R. Geistlinger, J. Eeckhoute, A. S. Brodsky, E. K. Keeton, K. C. Fertuck, G. F. Hall, Q. Wang, S. Bekiranov, V. Sementchenko, E. A. Fox, P. A. Silver, T. R. Gingeras, X. S. Liu, and M. Brown, “Genome-wide analysis of estrogen receptor binding sites,” Nature Genetics, vol. 38, no. 11, pp. 1289–1297, 2006. F. Vallania, D. Schiavone, S. Dewilde, E. Pupo, S. Garbay, R. Calogero, M. Pontoglio, P. Provero, and V. Poli, “Genome-wide discovery of functional transcription factor binding sites by comparative genomics: the case of stat3.” Proceedings of the National Academy of Sciences, vol. 106, no. 13, pp. 5117–5122, 2009. A. Moses, D. Chiang, M. Kellis, E. Lander, and M. Eisen, “Position specific variation in the rate of evolution in transcription factor binding sites,” BMC Evolutionary Biology, vol. 3, no. 1, p. 19, 2003. C. D. Wickens, J. Lee, Y. D. Liu, and S. Gordon-Becker, Introduction to Human Factors Engineering, 2nd ed. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 2003.

2012 Southeast Asian Network of Ergonomics Societies Conference (SEANES) [18] D. Kahneman and A. Tversky, “On the psychology of prediction,” Psychological Review, vol. 80, no. 4, pp. 237–251, 1973. [19] M. B. Cook and H. S. Smallman, “Human factors of the confirmation bias in intelligence analysis: decision support from graphical evidence landscapes.” Human Factors, vol. 50, no. 5, pp. 745–754, 2008. [20] J. Scholtz, “Beyond usability: Evaluation aspects of visual analytic environments,” Symposium On Visual Analytics Science And Technology, vol. 0, pp. 145–150, 2006. [21] C. M. D. S. Freitas, P. R. G. Luzzardi, R. A. Cava, M. Winckler, M. S. Pimenta, and L. P. Nedel, “On evaluating information visualization techniques,” in Proceedings of the Working Conference on Advanced Visual Interfaces, ser. AVI ’02. NY, USA: ACM, 2002, pp. 373–374. [22] J. F. Coley, “Visualizing subjective decision criteria for system evaluation,” Master’s thesis, MIT Electrical Engineering and Computer Science, 2010. [23] Y. Haudry, M. Ramialison, B. Paten, J. Wittbrodt, and L. Ettwiller, “Using trawler standalone to discover overrepresented motifs in dna and

rna sequences derived from various experiments including chromatin immunoprecipitation.” Nat Protoc, vol. 5, no. 2, pp. 323–334, 2010. [24] P. Hatzis, L. G. van der Flier, M. A. van Driel, V. Guryev, F. Nielsen, S. Denissov, I. J. Nijman, J. Koster, E. E. Santo, W. Welboren, R. Versteeg, E. Cuppen, M. van de Wetering, H. Clevers, and H. G. Stunnenberg, “Genome-wide pattern of tcf7l2/tcf4 chromatin occupancy in colorectal cancer cells.” Molecular Cell Biology, vol. 28, no. 8, pp. 2732–2744, 2008. [25] G.-H. Wei, G. Badis, M. F. Berger, T. Kivioja, K. Palin, M. Enge, M. Bonke, A. Jolma, M. Varjosalo, A. R. Gehrke, J. Yan, S. Talukder, M. Turunen, M. Taipale, H. G. Stunnenberg, E. Ukkonen, T. R. Hughes, M. L. Bulyk, and J. Taipale, “Genome-wide analysis of ets-family dnabinding in vitro and in vivo.” EMBO Journal, vol. 29, no. 13, pp. 2147– 2160, 2010. [26] P. E. Lehner, L. Adelman, R. J. DiStasio, M. C. Erie, J. S. Mittel, and S. L. Olson, “Confirmation bias in the analysis of remote sensing data,” IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 39, no. 1, pp. 218–226, 2009.