Provenance-Enabled Data Exploration and Visualization ... - inf.ufrgs

0 downloads 0 Views 4MB Size Report
The tutorial will also discuss uses of provenance that go beyond the ability to .... for capturing, modeling, storing and querying provenance. (e.g., [54], [55], [50], ... access for the attendees. IV. TUTORIAL .... [10] “Microsoft Workflow Foundation,”.
Provenance-Enabled Data Exploration and Visualization with VisTrails Sibgrapi 2010 — Tutorial Submission Cl´audio Silva, Juliana Freire, Emanuele Santos and Erik Anderson Scientific Computing and Imaging Institute & School of Computing University of Utah Salt Lake City, USA Email: {csilva, juliana, emanuele, eranders}@sci.utah.edu

Tutorial Type: Both regular and hands-on lab. Level: Elementary to advanced. Abstract—Scientists are now faced with an incredible volume of data to analyze. To explore and understand the data, they need to assemble complex workflows (pipelines) to manipulate the data and create insightful visual representations. Provenance is essential in this process. The provenance of a digital artifact contains information about the process and data used to derive the artifact. This information is essential for preserving the data, for determining the data’s quality and authorship, for both reproducing and validating results – all important elements of the scientific process. Provenance has shown to be particularly useful for enabling comparative visualization and data analysis. This tutorial will inform computational and visualization scientists, users and developers about different approaches to provenance and the trade-offs among them. Using the VisTrails project as a basis, we will cover different approaches to acquiring and reusing provenance, including techniques that attendees can use for provenance-enabling their own tools. The tutorial will also discuss uses of provenance that go beyond the ability to reproduce and share results. Keywords-visualization system; provenance; comparative visualization;

I. T UTORIAL D ESCRIPTION Computing has been an enormous accelerator to science and has led to an information explosion in many different fields. To analyze and understand scientific data, complex computational processes must be assembled, often requiring the combination of loosely-coupled resources, specialized libraries, and grid and Web services. These processes may generate even more final and intermediate data products, adding to the overflow of information scientists need to deal with. Ad-hoc approaches to data exploration (e.g., Perl scripts) have been widely used in the scientific community, but have serious limitations. In particular, scientists and engineers need to expend substantial effort managing data (e.g., scripts that encode computational tasks, raw data, data products, and notes) and recording provenance information so that basic questions can be answered, such as: Who created this data product and when? When was it modified and by whom? What was the process used to create the data product? Were two data products derived from the same raw

data? Not only is the process time-consuming, but also errorprone. In the visualization domain the situation is not different. Although scientific visualization has matured and now is an indispensable tool for scientists and engineers, they still lack mechanisms to help them not only share visualizations but also share and reuse information about the process used to generate those visualizations. Examples of widely used visualization tools are ParaView [1] and VisIt [2]. ParaView [1] is an an open-source, multi-platform application designed to visualize data sets of size varying from small to very large. It is very popular and it has been used by researchers and engineers in both industry and academia. VisIt [2] is another free visualization and graphical analysis tool for viewing scientific data. Recently, workflow systems [3], [4], [5], [6], [7], [8], [9], [10], [11], [12] have emerged as an alternative to these ad-hoc approaches. Not only do they support the automation of repetitive tasks, but they can also capture complex analysis processes at various levels of detail and systematically capture provenance information for the derived data products [13]. The provenance (also referred to as audit trail, lineage, and pedigree) of a data product contains information about the process and data used to derive the data product. It provides important documentation that is key to preserving the data, to determining the data’s quality and authorship, and to reproduce as well as validate the results. These are all important elements of the scientific process. Provenance in scientific exploration and visualization is thus of paramount and increasing importance, as evidenced by recent specialized workshops [14], [15], [13], [16], [17], [18] and surveys [19], [20], [21], [22]. Provenance has also shown to be particularly useful for enabling comparative visualization and data analysis [23], [24], [25]. Our goal with this tutorial is to inform computational and visualization scientists, users and developers about different approaches to provenance and the trade-offs among them. The instructors will use the open-source VisTrails system1 (Figure 1) and a series of examples from real applications to 1 www.vistrails.org

demonstrate different approaches to acquiring and reusing provenance, including techniques that attendees can use for provenance-enabling their own tools. VisTrails is a provenance management and scientific workflow system that was designed to support the scientific discovery process. VisTrails provides unique support for data analysis and visualization, a comprehensive provenance infrastructure, and a user-centered design. The system combines and substantially extends useful features of visualization and scientific workflow systems. A beta version of VisTrails was first released in January 2007. Since then, the system has been downloaded over fifteen thousand times. The VisTrails wiki has had over 339,000 page views, and Google Analytics reports that visitors to the site come from 61 different countries. VisTrails has been adopted in several scientific projects, both nationally and internationally, and in different areas, including Environmental Sciences [26], [27], [28], [29], [30], Psychiatry [31], Astronomy [32], Cosmology [25], HighEnergy Physics [24], molecular modeling [33]. VisTrails is being combined with CDAT (Climate Data Analysis Tools) [30], a popular platform for accessing and analyzing climate data2 available in major portals such as the Earth System Grid.3 VisTrails has been successfully used as a tool for teaching [34], [35] and it has been adopted at universities in the United States and abroad. VisTrails has recently been featured as an NSF Discovery.4 In this tutorial we will also discuss uses of provenance that go beyond the ability to reproduce and share results. We will we will cover different approaches to capture and reuse provenance. We will show how the availability of provenance supports reflective reasoning as users perform exploratory tasks, including the ability to navigate through versions of analysis and visualization pipelines in an intuitive way, to undo changes but not lose any results, to visually compare different pipelines and their results, and to examine the actions that led to a result. We will also present a series of operations and user interfaces that leverage provenance simplify pipeline design, use and sharing, as well as the publication of documented, reproducible results. Last, but not least, we will discuss techniques and present an API that attendees can use to provenance-enable their own tools. A. Tutorial History This tutorial is a modified version of five other tutorials, focusing on specific aspects of provenance that are relevant for visualization scientists, users and developers: • Provenance-Enabled Data Exploration and Visualization Emanuele Santos, Cl´audio Silva, Juliana Freire, Erik Anderson and David Koop. Tutorial at the 2 http://www-pcmdi.llnl.gov/software-portal/cdat 3 https://www.earthsystemgrid.org 4 http://www.nsf.gov/discoveries/disc summ.jsp?org=NSF& cntn id=114322&preview=false

Figure 1.

Provenance in the VisTrails system.

IEEE VisWeek 2009. Atlantic City, October 2009. http://www.vistrails.org/index.php/Tutorials/Vis2009. • Streamlining Data Exploration and Visualization through Scientific Workflows and Provenance. Juliana Freire, Emanuele Santos and Cl´audio Silva. Tutorial at the 4th IEEE International Conference on e-Science. Indianapolis, Indiana, December 2008. • Visualization and Data Analysis with VisTrails. Cl´audio Silva and Carlos Scheidegger. Tutorial at the Scientific Discovery through Advanced Computing Program (SciDAC), Seattle, Washington, July 2008. • Provenance and Scientific Workflows: Challenges and Opportunities. Susan Davidson and Juliana Freire. Tutorial at the ACM International Conference on Management of Data (SIGMOD), Vancouver, Canada, June 2008. http://www.vistrails.org/index.php?title=Tutorials/SIGMOD2008

• Provenance Management: Challenges and Opportunities. Juliana Freire and Cl´audio Silva. Invited tutorial at Brazilian Symposium on Databases. Jo˜ao Pessoa, Brazil, October 2007. http://www.vistrails.org/index.php?title=Tutorials/SBBD2007

New material in this tutorial includes: interactive demos and hands-on material with a larger focus on data exploration and visualization; provenance-enabling third-party tools and new applications that are enabled by the availability of provenance. II. C ONTENTS AND S CHEDULE This 1-day tutorial will consist of four blocks, each of about 90 minute length. Module I: Motivation and Provenance Basics • What is Provenance? Why is it important? How can Provenance be leveraged? • Background: Visualization Systems • Background: Scientific Workflow Basics

• Provenance in Scientific Workflow Systems • Provenance in Databases • The Open Provenance Model [36] We discuss the motivation for and importance of provenance for computation and visualization tasks. We also present real scenarios in different application domains (including environmental observatories, forecasting systems, and clinical studies) to illustrate the need for and benefits of provenance. We then review work on visualization systems (e.g., [1], [2], [8], [37], [5], [38], [39]) and provenance for data products that are derived by complex computational tasks modeled as workflows (e.g., [40], [41], [42], [43], [44], [45], [46], [47], [48], [49], [50]), and contrast these to recent work on provenance for structured data [51], [52], [53]. We survey approaches to provenance adopted by scientific workflow systems. We present and compare different proposals for capturing, modeling, storing and querying provenance (e.g., [54], [55], [50], [56], [57], [22], [21]). Module II: Provenance for Data Exploration and Visualization

Figure 2.

Provenance-enabled VisIt.

• Provenance-enabling your own tool We show how to wrap libraries and command-line tools in VisTrails so they can benefit from VisTrails’ provenance mechanism. We also show how existing interactive tools can be (easily) extended to support provenance. We will present case studies on adding provenance to two distinct tools: VisIt [2] (see Figure 2) and ParaView [1] (see Figure 3). We will also describe guidelines on how to add provenance to user-custom tools. Module IV: New Applications

• Building visualizations • Operations that support exploration: scalable exploration of large parameter spaces; comparison of data products and their corresponding workflows [58], [59]. • Operations that support visualization design: pipeline refinement by analogy [54]; a recommendation system for visualization pipelines [60]. • Provenance analytics: Techniques to support provenance mining and illustration through examples of useful information that can be extracted from provenance ([61], [62]). • Provenance for collaboration [63]. We show how provenance can be used to simplify exploratory processes. In particular, we give examples that show how to build pipelines and combine visualization with other tasks [32]. We also present mechanisms that allow the flexible re-use of visualization pipelines; scalable exploration of large parameter spaces; and comparison of data products as well as their corresponding pipelines [58], [59]. We also show that useful knowledge is embedded in provenance which can be re-used to simplify the construction of visualization through pipeline refinement by analogy [54] and an auto-completion mechanism for visualization pipelines [60]. Last, but not least, we present techniques to support provenance analytics and illustrate through examples, useful information that can be extracted from provenance (see e.g., [61], [62]).

• Science 2.0: collaborative, open science • Provenance-rich publications • Visualization Mashups • Provenance in teaching We describe applications that are enabled by the availability of provenance. We demonstrate how provenancerich objects can be included in documents. We also show how how the VisTrails infrastructure can be used to create provenance-aware custom visualization applications called visualization mashups [64]. We also explore the benefits of provenance in teaching [34].

Module III: Provenance-Enabling Existing Libraries and Tools

V. I NSTRUCTORS BACKGROUND

• Provenance-enabling libraries and command-line tools: how to write VisTrails packages • Provenance-enabling ParaView • Provenance-enabling VisIt

III. P RESENTATION REQUIREMENTS Equipment and software: a digital projector for the instructors and computers with VisTrails installed and Internet access for the attendees. IV. T UTORIAL R ESOURCES This tutorial will provide the slides used on each module, examples and datasets used on the hands-on parts and a survey paper. We will also make available the related articles. We will maintain a Wiki page at http://www.vistrails.org/index.php/Tutorials/Sibgrapi2010 that will contain the most up-to-date information about the tutorial. For an example of slides used in previous versions of this tutorial, please visit the url above. Cl´audio Silva received the BS degree in mathematics from the Federal University of Ceara, Brazil, in 1990, and the PhD degree in computer science from the State University of New York at Stony Brook in 1996. He is a Professor of computer science and a faculty member of the Scientific

workflow and provenance management system.

Figure 3.

Provenance-enabled ParaView.

Computing and Imaging (SCI) Institute at the University of Utah. Before joining Utah in 2003, he worked in industry (IBM and AT&T), government (Sandia and LLNL), and academia (Stony Brook and OGI). He has co-authored over 140 technical papers and 7 patents, primarily in visualization, geometric computing, and related areas. He has served as papers co-chair for IEEE Visualization conference in 2005 and 2006. He received IBM Faculty Awards in 2005, 2006 and 2007. Dr. Silva a co-creator of VisTrails (www.vistrails.org), an open-source scientific workflow and provenance management system. Juliana Freire is an Associate Professor at the School of Computing, University of Utah. Before joining the University of Utah, she was member of technical staff at the Database Systems Research Department at Bell Laboratories (Lucent Technologies) and an Assistant Professor at OGI/OHSU. She has co-authored over 100 technical publications and holds 4 U.S. patents. She has chaired or co-chaired several workshops and conference, and she has participated as a program committee member in over 60 events. Dr. Freire’s research has focused on extending traditional database technology and developing techniques to address new data management problems introduced by the Web and scientific applications. Within scientific data management, she is best known for her work on provenance management and for being a co-creator of the open-source VisTrails system. In 2008, Dr. Freire received an NSF CAREER Award and an IBM Faculty Award. Her research has been funded by grants from the National Science Foundation, Department of Energy and the University of Utah. Emanuele Santos is a research assistant and graduate student at the University of Utah. She received M.S. and B.S. degrees in computer science from the Federal University of Ceara in Brazil. Between 2002 and 2005, she lectured college-level computer science courses in Brazil. Her research interests include scientific data management, comparative visualization and provenance-rich publications. She is a Fulbright scholar and one of the main developers of VisTrails (www.vistrails.org), an open-source scientific

Erik W. Anderson is a research assistant and graduate student at the University of Utah. He received a B.S. degree in computer science and a B.S. degree in electrical and computer engineering from Northeastern University. His research interests include scientific visualization, signal processing, computer graphics, and multimodal visualization. He is regularly holding seminars on neuroscience analysis and visualization at the SCI Institute. He is also one of the main developers of VisTrails (www.vistrails.org), an open-source scientific workflow and provenance management system. R EFERENCES [1] Kitware, “Paraview,” http://www.paraview.org. [2] “VisIt Visualization Tool,” https://wci.llnl.gov/codes/visit. [3] “The Kepler Project,” http://kepler-project.org. [4] “The Taverna Project,” http://taverna.sourceforge.net. [5] S. G. Parker and C. R. Johnson, “SCIRun: a scientific programming environment for computational steering,” in Supercomputing, 1995, p. 52. [6] “The Triana Project,” http://www.trianacode.org. [7] “VDS - The GriPhyN Virtual Data System,” http://www.ci.uchicago.edu/wiki/bin/view/VDS/ VDSWeb/WebMain. [8] “The VisTrails Project,” http://www.vistrails.org. [9] E. Deelman, G. Singh, M.-H. Su, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K. Vahi, G. B. Berriman, J. Good, A. Laity, J. C. Jacob, and D. S. Katz, “Pegasus: a Framework for Mapping Complex Scientific Workflows onto Distributed Systems,” Scientific Programming Journal, vol. 13, no. 3, pp. 219–237, 2005. [10] “Microsoft Workflow Foundation,” http://msdn2.microsoft.com/en-us/netframework/ aa663322.aspx. [11] I. Foster, J. Voeckler, M. Wilde, and Y. Zhao, “Chimera: A virtual data system for representing, querying and automating data derivation,” in Proceedings of SSDBM, 2002, pp. 37–46. [12] Y. L. Simmhan, B. Plale, D. Gannon, and S. Marru, “Performance evaluation of the karma provenance framework for scientific workflows,” in International Provenance and Annotation Workshop (IPAW), Chicago, IL, ser. Lecture Notes in Computer Science, L. Moreau and I. T. Foster, Eds., vol. 4145. Springer, 2006, pp. 222–236. [Online]. Available: http://www.cs.indiana.edu/ ysimmhan/l/pubs/simmhanlncs-2006.pdf [13] E. Deelman and Y. Gil, “NSF Workshop on Challenges of Scientific Workflows,” NSF, Tech. Rep., 2006, http://vtcpc.isi.edu/wiki/index.php/Main Page.

[14] R. Bose, I. Foster, and L. Moreau, “Report on the International Provenance and Annotation Workshop,” SIGMOD Rec., vol. 35(3), pp. 51–53, 2006. [15] J. Freire, L. Moreau, and D. Koop, Eds., Provenance and Annotation of Data - International Provenance and Annotation Workshop, vol. 5272. Springer-Verlag, 2008, to appear. [16] D. Gannon et al., “A Workshop on Scientific and Scholarly Workflow Cyberinfrastructure: Improving Interoperability, Sustainability and Platform Convergence in Scientific And Scholarly Workflow.” NSF and Mellon Foundation, Tech. Rep., 2007, https://spaces.internet2.edu/display/SciSchWorkflow. [17] “First provenance http://twiki.ipaw.info/bin/view/Challenge/ nanceChallenge, 2006, s. Miles, and (organizers).

challenge,” FirstProveL. Moreau

provenance challenge,” [18] “Second http://twiki.ipaw.info/bin/view/Challenge/ SecondProvenanceChallenge, 2007, j. Freire, S. Miles, and L. Moreau (organizers).

[28] B. Howe, P. Lawson, R. Bellinger, E. Anderson, E. Santos, J. Freire, C. Scheidegger, A. Baptista, and C. Silva, “End-toend escience: Integrating workflow, query, visualization, and provenance at an ocean observatory,” in IEEE International Conference on e-Science 2008, 2008, pp. 127–134. [29] A. Baptista, B. Howe, J. Freire, D. Maier, and C. T. Silva, “Scientific exploration in the era of ocean observatories,” Computing in Science and Engineering, vol. 10, no. 3, pp. 53–58, 2008. [30] “CDAT Newsletter: CDAT v5.0 - Highlights,” http://www-pcmdi.llnl.gov/softwareportal/Newsletter/Vol3/news.html, June 2007. [31] E. W. Anderson, G. A. Preston, and C. T. Silva, “Towards development of a circuit based treatment for impaired memory: A multidisciplinary approach,” in IEEE EMBS Neural Engineering, 2007. [32] J. E. Tohline, J. Ge, W. Even, and E. Anderson, “A customized python module for cfd flow analysis within vistrails,” Computing in Science and Engineering, vol. 11, no. 3, pp. 68–73, 2009.

[19] J. Freire, D. Koop, E. Santos, and C. T. Silva, “Provenance for computational tasks: A survey,” Computing in Science and Engineering, vol. 10, no. 3, pp. 11–21, 2008.

[33] R. Heiland, M. Swat, B. Zaitlen, J. A. Glazier, and A. Lumsdaine, “Workflows for Parameter Studies of Multi-Cell Modeling,” in Proceedings of the HPC 2010 Symposium of the Spring Simulation Multiconference 2010, 2010, pp. 140–145.

[20] S. B. Davidson, S. C. Boulakia, A. Eyal, B. Lud¨ascher, T. M. McPhillips, S. Bowers, M. K. Anand, and J. Freire, “Provenance in scientific workflow systems,” IEEE Data Eng. Bull., vol. 30, no. 4, pp. 44–50, 2007.

[34] C. Silva, E. Anderson, E. Santos, and J. Freire, “Using vistrails and provenance for teaching scientific visualization,” in Proceedings of Eurographics 2010 Education Program, L. Kjelldahl and G. Baronoski, Eds., 2010, (to appear).

[21] R. Bose and J. Frew, “Lineage retrieval for scientific data processing: a survey,” ACM Computing Surveys, vol. 37, no. 1, pp. 1–28, 2005.

[35] M. C. van Langeveld and R. Kessler, “Educational impact of digital visualization and auditing tools on a digital character production course,” in FDG, 2009, pp. 316–323.

[22] Y. L. Simmhan, B. Plale, and D. Gannon, “A survey of data provenance in e-science,” SIGMOD Record, vol. 34, no. 3, pp. 31–36, 2005.

[36] L. Moreau, J. Freire, J. Futrelle, R. E. McGrath, J. Myers, and P. Paulson, “The open provenance model: An overview,” in IPAW, 2008, pp. 323–326.

[23] J. E. Tohline, “Scientific Visualization: A Necessary Chore,” Computing in Science & Engineering, vol. 9, no. 6, pp. 76– 81, 2007.

[37] Kitware, “The http://www.kitware.com.

[24] A. Dolgert, L. Gibbons, C. D. Jones, V. Kuznetsov, M. Riedewald, D. Riley, G. J. Sharp, and P. Wittich, “Provenance in high-energy physics workflows,” Computing in Science & Engineering, vol. 10, no. 3, pp. 22–29, 2008. [25] E. Anderson, C. Silva, J. Ahrens, K. Heitmann, and S. Habib, “Provenance in comparative analysis: A study in cosmology,” Computing in Science & Engineering, vol. 10, no. 3, pp. 30– 37, May-June 2008. [26] B. Howe, C. Silva, and J. Freire, “A science cloud on your desktop: Vistrails + gridfields,” 2009, http://clue.cs.washington.edu. [27] “NSF Center for Coastal Margin Observation and Prediction (CMOP),” http://www.stccmop.org.

visualization

toolkit,”

[38] IBM, “OpenDX,” http://www.research.ibm.com/dx. [39] C. Upson, J. Thomas Faulhaber, D. Kamins, D. H. Laidlaw, D. Schlegel, J. Vroom, R. Gurwitz, and A. van Dam, “The Application Visualization System: A Computational Environment for Scientific Visualization,” IEEE Computer Graphics and Applications, vol. 9, no. 4, pp. 30–42, 1989. [40] Y. L. Simmhan, B. Plale, and D. Gannon, “Karma2: Provenance management for data driven workflows,” International Journal of Web Services Research, Idea Group Publishing, vol. 5, p. 1, 2008, to Appear. [Online]. Available: http://www.cs.indiana.edu/ ysimmhan/l/pubs/simmhanjwsr-2008.pdf [41] S. Cohen, S. C. Boulakia, and S. B. Davidson, “Towards a model of provenance and user views in scientific workflows,” in DILS, 2006, pp. 264–279.

[42] S. Callahan, J. Freire, E. Santos, C. Scheidegger, C. Silva, and H. Vo, “Managing the evolution of dataflows with vistrails (Extended Abstract),” in IEEE Workshop on Workflow and Data Flow for Scientific Applications (SciFlow), 2006.

[55] S. Cohen-Boulakia, O. Biton, S. Cohen, and S. Davidson, “Addressing the provenance challenge using zoom,” Concurrency and Computation: Practice and Experience, vol. 20, no. 5, pp. 497–506, 2008.

[43] Y. Zhao, M. Wilde, and I. Foster, “A virtual data provenance model by yong zhao, michael wilde, and ian foster,” in Proceedings of the International Provenance and Annotation Workshop (IPAW), 2006, pp. 148–161.

[56] L. Moreau, Ed., Concurrency and Computation: Practice and Experience– Special Issue on the First Provenance Challenge, 2008.

[44] J. Kim, E. Deelman, Y. Gil, G. Mehta, and V. Ratnakar, “Provenance trails in the wings/pegasus system,” Concurrency and Computation: Practice and Experience, vol. 20, no. 5, pp. 587–597, 2008. [45] J. Golbeck and J. Hendler, “A semantic web approach to tracking provenance in scientific workflows,” Concurrency and Computation: Practice and Experience, vol. 20, no. 5, pp. 431–439, 2008. [46] I. Altintas, O. Barney, and E. Jaeger-Frank, “Provenance collection support in the kepler scientific workflow system,” in Proceedings of the International Provenance and Annotation Workshop (IPAW), 2006, pp. 118–132. [47] T. Oinn, M. Greenwood, M. Addis, M. N. Alpdemir, J. Ferris, K. Glover, C. Goble, A. Goderis, D. Hull, D. Marvin, P. Li, P. Lord, M. R. Pocock, M. Senger, R. Stevens, A. Wipat, and C. Wroe, “Taverna: lessons in creating a workflow environment for the life sciences: Research articles,” Concurrency and Computation: Practice & Experience, vol. 18, no. 10, pp. 1067–1100, 2006. [48] B. Clifford, I. Foster, M. Hategan, T. Stef-Praun, M. Wilde, and Y. Zhao., “Tracking provenance in a virtual data grid,” Concurrency and Computation: Practice and Experience, vol. 20, no. 5, pp. 565–575, 2008. [49] R. S. Barga and L. A. Digiampietri, “Automatic capture and efficient storage of escience experiment provenance,” Concurrency and Computation: Practice and Experience, vol. 20, no. 5, pp. 419–429, 2008. [50] S. Bowers, T. McPhillips, and B. Ludaescher, “A provenance model for collection-oriented scientific workflows,” Concurrency and Computation: Practice and Experience, vol. 20, no. 5, pp. 519–529, 2008. [51] P. Buneman and W.Tan, “Provenance in databases,” in Proceedings of ACM SIGMOD, 2007, pp. 1171–1173. [52] W. C. Tan, “Provenance in databases: Past, current, and future,” IEEE Data Eng. Bull., vol. 30, no. 4, pp. 3–12, 2007. [53] O. Benjelloun, A. D. Sarma, A. Y. Halevy, and J. Widom, “Uldbs: Databases with uncertainty and lineage.” in VLDB, 2006, pp. 953–964. [54] C. Scheidegger, D. Koop, H. Vo, J. Freire, and C. Silva, “Querying and creating visualizations by analogy,” IEEE Transactions on Visualization and Computer Graphics, vol. 13, no. 6, pp. 1560–1567, 2007, papers from the IEEE Visualization Conference 2007.

[57] L. Moreau and I. Foster, Eds., Provenance and Annotation of Data - International Provenance and Annotation Workshop, vol. 4145. Springer-Verlag, 2006. [58] J. Freire, C. T. Silva, S. P. Callahan, E. Santos, C. E. Scheidegger, and H. T. Vo, “Managing rapidly-evolving scientific workflows,” in International Provenance and Annotation Workshop (IPAW), ser. LNCS 4145, 2006, pp. 10–18, invited paper. [59] C. Silva, J. Freire, and S. P. Callahan, “Provenance for visualizations: Reproducibility and beyond,” Computing in Science & Engineering, vol. 9, no. 5, pp. 82–89, 2007. [60] D. Koop, C. Scheidegger, S. Callahan, J. Freire, and C. Silva, “Viscomplete: Data-driven suggestions for visualization systems,” IEEE Transactions on Visualization and Computer Graphics, vol. 14, no. 6, pp. 1691–1698, 2008, papers from the IEEE Visualization Conference 2008. [61] L. Lins, D. Koop, E. W. Anderson, S. P. Callahan, E. Santos, C. E. Scheidegger, J. Freire, and C. T. Silva, “Examining statistics of workflow evolution provenance: A first study,” in Proceedings of SSDBM, 2008, pp. 573–579. [62] E. Santos, L. Lins, J. P. Ahrens, J. Freire, and C. T. Silva, “A first study on clustering collections of workflow graphs,” in IPAW, 2008, pp. 160–173. [63] T. Ellkvist, D. Koop, E. W. Anderson, J. Freire, and C. T. Silva, “Using provenance to support real-time collaborative design of workflows,” in IPAW, 2008, pp. 266–279. [64] E. Santos, L. Lins, J. Ahrens, J. Freire, and C. Silva, “Vismashup: Streamlining the creation of custom visualization applications,” IEEE Transactions on Visualization and Computer Graphics, vol. 15, no. 6, pp. 1539–1546, 2009.