Jacobs University Bremen – School of Engineering and Science
Ph. D. Thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science
Semantic Web Collaboration on Semiformal Mathematical Knowledge
Christoph Lange
Submitted: DD. MM. 2010
Dissertation Committee:
Prof. Dr. Michael Kohlhase, Jacobs University Bremen (supervisor)
Prof. Dr. Peter Baumann, Jacobs University Bremen
Prof. Dr. Stefan Decker, National University of Ireland, Galway
I hereby declare that this thesis has been written independently except where sources and collaborations are acknowledged and has not been submitted at another university for the conferral of a degree.
@MK: references to publications needed?

As the chapters are organized by topic and their sections cover diverse subtopics, reviews of the state of the art and of related work by others closely precede the descriptions of my own research on the respective topic, but are marked accordingly. I report work that I have done myself in the first person singular. Work done in collaboration with others is reported in the first person plural; the exact contributions of collaborators are acknowledged at the end of each chapter.

Christoph Lange

Notes for readers of the draft: The level of detail in the table of contents is intended to facilitate navigation. This is not the final layout.
@MK: copied from FR – OK?
Contents

1 Introduction . . . 1
1.1 Definitions . . . 2
1.1.1 Degrees of Formality . . . 2
1.1.2 Mathematical Knowledge . . . 3
1.2 Formal and Informal Mathematics on the Web and its Applications . . . 3
1.2.1 Mathematical Knowledge Management . . . 4
1.3 A Semantic Web for Mathematics . . . 6
1.3.1 Earlier Approaches . . . 6

2 Representing Semiformal Mathematical Knowledge . . . 9
2.1 Foundations of the Semantic Web (State of the Art) . . . 9
2.1.1 URIs . . . 9
2.1.2 XML . . . 10
2.1.2.1 Syntax/Semantics . . . 10
2.1.2.2 Presentation . . . 11
2.1.2.3 Authoring/Processing Tools . . . 11
2.1.3 RDF . . . 12
2.1.4 Ontologies . . . 15
2.2 Structures of Mathematical Knowledge . . . 15
2.2.1 Logical and Functional Structures . . . 15
2.2.2 Rhetorical Structures . . . 16
2.2.3 Document Structures . . . 17
2.2.4 Graph and Tree Models of Mathematical Structures . . . 17
2.2.5 Presentational Information: Notation of Symbols . . . 18
2.2.6 Metadata . . . 18
2.2.6.1 Categories of Metadata . . . 19
2.2.6.2 Usage of Metadata in Mathematical Knowledge . . . 19
2.2.6.3 Metadata Vocabularies . . . 19
2.2.7 Environmental Structures . . . 22
2.3 Formats for Semiformal Mathematical Knowledge Representation (State of the Art) . . . 22
2.3.1 MathML . . . 22
2.3.2 OpenMath . . . 23
2.3.3 OMDoc . . . 25
2.3.4 Notation Definitions in MathML, OpenMath, and OMDoc . . . 27
2.3.4.1 XSLT . . . 28
2.3.4.2 Pattern Matching . . . 28
2.3.4.3 Declarative Notation Definitions . . . 29
2.3.4.4 Pattern Matching vs. Declarations . . . 30
2.3.4.5 Brackets and Operator Precedences . . . 30
2.3.5 MathLang . . . 31
2.3.6 CNXML . . . 32
2.3.7 LaTeX . . . 32
2.3.7.1 sTeX . . . 33
2.3.8 RDF . . . 34
2.3.9 Formats for Technical Manuals . . . 35
2.3.10 Formats for Formalized Mathematics . . . 35
2.4 Making Semiformal Markup Semantic . . . 35
2.4.1 Ontologies . . . 37
2.4.1.1 Logical and Functional Structures . . . 38
2.4.1.2 Rhetorical and Document Structures . . . 45
2.4.1.3 Metadata . . . 45
2.4.1.4 Environmental Structures . . . 46
2.4.1.5 Upper Ontologies . . . 46
2.4.2 Extracting Structures from Markup . . . 46
2.4.3 OMDoc as an ontology language . . . 47
2.4.3.1 Motivation . . . 48
2.4.3.2 Correspondences between OMDoc and semantic web ontology languages . . . 50
2.4.3.3 Knowledge Representation . . . 50
2.4.3.4 Connecting OMDoc and Semantic Web URIs . . . 51
2.4.3.5 Reasoning . . . 53
2.4.3.6 Documentation and Presentation . . . 54
2.4.4 Markup for Metadata . . . 55
2.4.4.1 State of the Art . . . 55
2.4.4.2 Metadata in OMDoc 1.2 (State of the Art) . . . 56
2.4.4.3 The new OMDoc Metadata Framework . . . 58
2.4.5 Preserving Semantic Structures when Publishing . . . 65
2.4.5.1 Preserving Object-Level Structures . . . 65
2.4.5.2 Preserving Statement-/Theory-/Document-Level Structures . . . 70

3 Services for Mathematical Knowledge Management . . . 71
3.1 Browsing . . . 71
3.1.1 Rendering . . . 71
3.1.1.1 Rendering Formulæ . . . 71
3.1.1.2 Rendering Notation Definitions . . . 72
3.1.1.3 Rendering Non-Formula Markup . . . 73
3.1.2 Navigation . . . 74
3.1.3 Interactive Exploration . . . 74
3.2 Arguing . . . 74
3.2.1 State of the Art . . . 75
3.2.1.1 Bug and Issue Tracking . . . 75
3.2.1.2 Argumentation Ontologies . . . 76
3.2.2 The SIOC Argumentation Module . . . 77
3.2.2.1 Usage Recommendations . . . 79
3.2.3 Domain-specific Extensions . . . 80
3.2.3.1 A Survey on Issues in Mathematics . . . 81
3.2.4 Automated Assistance . . . 83
3.2.5 Manifestation of Discourse into Documents . . . 83
3.3 Editing . . . 83
3.3.1 State of the Art . . . 83
3.3.1.1 Raw Access with Support . . . 84
3.3.1.2 Custom Input Syntax . . . 84
3.3.1.3 WYSIWYG/WYSIWYM . . . 86
3.3.1.4 Editing Large Structured Documents . . . 86
3.3.2 Statements, Theories, Rhetorics, and Document Sections . . . 87
3.3.3 Formulæ . . . 87
3.3.4 Notation Definitions . . . 89
3.3.5 Metadata . . . 90
3.4 Validating . . . 90
3.4.1 Metadata . . . 91
3.5 Searching and Querying . . . 92
3.6 Integrating Services into Interactive Documents . . . 92
3.6.1 Related Work and Motivation . . . 93
3.6.2 The JOBAD Architecture . . . 94
3.6.2.1 Service Advertisement . . . 95
3.6.3 User Interface . . . 95
3.6.4 In-Document Services . . . 96
3.6.4.1 Folding Subterms and Undoing Interactions . . . 97
3.6.4.2 Flexible Elision and Display of Reading Aids . . . 97
3.6.5 Symbol-Based Services . . . 101
3.6.5.1 Definition Lookup . . . 101
3.6.5.2 Interactive Notation Switching . . . 105
3.6.6 Expression-Based Services . . . 105
3.6.6.1 Rendering . . . 106
3.6.6.2 Unit Conversion . . . 106
3.6.7 Services Beyond Formulæ . . . 107
3.6.8 Integrated Backends and Environments . . . 107
3.6.8.1 Proxy to Third-Party Web Services . . . 108
3.6.8.2 Integrated Backend Implementations . . . 108
3.6.9 Further Possible Services . . . 108
3.7 Integration with Knowledge Bases . . . 108
3.7.1 Storage Backends . . . 108
3.7.2 Import/Export Interface . . . 109
3.7.2.1 Translating between different Knowledge Representations . . . 110
3.7.2.2 Splitting and Reassembling Documents . . . 110
3.7.2.3 An Advanced Import/Export Infrastructure . . . 111
3.7.3 Extracting Structures from Semantic Markup . . . 111
3.7.3.1 Krextor – an extensible XML→RDF extraction framework . . . 112
3.7.3.2 Usage . . . 114
3.7.3.3 Related Work . . . 115

4 SWiM – An Integrated Collaboration Environment . . . 117
4.1 Tools for Math Collaboration (State of the Art) . . . 117
4.1.1 PlatΩ . . . 117
4.1.2 PlanetMath . . . 117
4.1.3 Connexions . . . 118
4.1.4 vdash . . . 118
4.1.5 Proof Wiki . . . 118
4.1.6 Logiweb . . . 118
4.1.7 ASciencePad . . . 118
4.1.8 (web)Mathematica . . . 118
4.1.9 SlugMath . . . 118
4.1.10 Math-Net . . . 118
4.2 Wikis (State of the Art) . . . 119
4.2.1 Reporting and Discussing Issues . . . 119
4.2.2 Semantic Wikis . . . 121
4.2.2.1 Semantic Discussion Threads with SIOC . . . 122
4.3 Architecture . . . 123
4.3.1 IkeWiki, the Underlying System . . . 123
4.3.2 Storage Backend . . . 123
4.3.2.1 Document Storage . . . 123
4.3.2.2 Other (Notations, RDF) . . . 124
4.3.2.3 Subversion Integration . . . 125
4.3.3 Structure Extraction . . . 126
4.4 User Interface . . . 127
4.4.1 Browser . . . 127
4.4.2 Editor . . . 127
4.4.3 Argumentation . . . 127

5 Case Studies . . . 133
5.1 OpenMath Content Dictionary Wiki . . . 133
5.1.1 Setup . . . 133
5.1.2 Traditional Ways of Working on CDs . . . 133
5.1.2.1 Minor Edits . . . 135
5.1.2.2 Discussing and Implementing Revisions . . . 135
5.1.2.3 Editing and Verifying Notations . . . 136
5.1.3 How SWiM Supports the CD Maintenance Use Cases . . . 136
5.1.3.1 Minor Edits . . . 136
5.1.3.2 Discussing and Implementing Revisions . . . 137
5.1.3.3 Editing and Verifying Notations . . . 138
5.1.4 Evaluation . . . 138
5.1.4.1 Quantitative Analysis of the Argumentation Support . . . 138
5.1.4.2 User Survey . . . 139
5.1.4.3 Personal Experiments . . . 139
5.2 Semantic Web Ontology Engineering . . . 140
5.3 Metadata . . . 142
5.3.1 RDFa Compatibility . . . 142
5.4 Miscellaneous . . . 143
5.4.1 JOBAD . . . 143
5.4.2 Argumentation . . . 144
5.4.3 Related work . . . 145

6 Conclusion . . . 147
6.1 RDF to XML . . . 147
6.2 Metadata . . . 147
6.3 JOBAD . . . 148
6.4 Argumentation . . . 148
6.5 Future Work . . . 149
6.5.1 Editing . . . 149
6.5.2 Knowledge Representation . . . 150
6.5.3 Ontology Engineering . . . 150
6.5.4 Scientific Publishing . . . 151
6.5.5 Argumentation . . . 152
6.5.6 Change Management . . . 153
6.5.7 Social Software . . . 153
6.5.8 Integration with other Mathematical Systems . . . 153
6.5.8.1 JOBAD Future . . . 154

A Namespace Prefixes . . . 155

B Surveys . . . 157
B.1 Reporting and Solving Issues with Mathematical Knowledge Items . . . 157

Bibliography . . . 158
List of Figures

1.1 TODO: this is my challenge for combining SemWeb and MKM . . . 7
1.2 TODO: the following chapters in a big picture; solid: done; dashed: partly done; dotted: future work . . . 8

2.1 The (in)famous Semantic Web Layer Cake . . . 10
2.2 Definition of the plus symbol within the arith1 CD . . . 26
2.3 Main classes and properties of the SIOC Core ontology [Ber+09] . . . 46
2.4 Parallel markup: Presentation markup elements point to content markup elements. The light gray range is the user’s selection, with the start and end node in bold face. We first look up their closest common ancestor that points to content markup, and then look up its corresponding content markup – here: E.2 . . . 67

3.1 Rendered notation definitions for the arith1#plus symbol of OpenMath, from the OpenMath wiki at http://wiki.openmath.org/?title=ntn:arith1 [Lan]. . . . 73
3.2 The SIOC argumentation module . . . 78
3.3 The SIOC/OMDoc argumentation ontology . . . 82
3.4 Editing a document in the extended TinyMCE, formulæ marked yellow. . . . 88
3.5 The formula editor window, when editing three different formulæ. The Variables palette allows to declare variables as functions. All symbols have Unicode and ASCII variants (∞/inf), and outermost parentheses do not need to be complete as seen in the bottom example. . . . 89
3.6 JOBAD Architecture. Note the central role of the rendering service, which both generates JOBAD-compliant documents and is needed for many other services. . . . 94
3.7 Demo of bracket and type elision with global visibility threshold control and color depending on elision levels . . . 99
3.8 Looking up a definition (left: selecting the action, right: the result); example taken from our lecture notes; cf. Sect. ??. . . . 103
3.9 Krextor’s extraction process and modules . . . 113
3.10 A hCalendar extraction module . . . 115

4.1 Finding pages (depicted as stacks of nodes) affected by changes to a notation definition. Both sym and the symDef s are instances of the class SymbolDefinition. . . . 125
4.2 Navigation links for an OpenMath symbol definition . . . 128
4.3 A complete discourse (mind the chronological order when reading!) . . . 129
4.4 Warning about an issue and the offer to solve it . . . 130
4.5 Editing the newly created example . . . 131
4.6 RDF graph of the sample discussion (cf. figure 4.3) . . . 131

5.1 The arith1 OpenMath CD in SWiM . . . 134
5.2 Part of a discussion page from the OpenMath wiki. Notice the post types and the specialised reply buttons. . . . 137
5.3 . . . 140

6.1 The Exhibit timeline view of fig. 4.6, starting at a hard theorem and ending at a helpful example. . . . 149
Todo list @MK: copied from FR – OK? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . @MK: references to publications needed? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Notes for readers of the draft: The level of detail in the table of contents is intended to facilitate navigation. This is not the final layout. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Big picture/vision? (bigger than this thesis) My contribution to that? Consider the thesis a workarea of the big plan, the chapters are workpackages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . continue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OK? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . define collaboration? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ref mathematica . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . exact ref . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . web service architectures: MathWeb-SB, MONET . . . . . . . . . . . . . . . . . . . . . . . . rewrite wrt. MKM section . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . cite TBL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . linkeddata.org . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . early approaches to MKM on the SemWeb: mathweb-sb. HELM; mostly formal. Guidi TR 3.1.2 comparison mathweb vs HELM. Outline how they failed to address my four problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . previous work on semiformal math knowledge: OMDoc . . . . . . . . . . . . . . . . . . . . MMT: linked data for formal math . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . better word . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . also includes “collaboration with yourself ” . . . . . . . . . . . . . . . . . . . . . . . . . . . . Layered Cake aus Jürgen Zimmers Doktorarbeit . . . . . . . . . . . . . . . . . . . . . . . . . do we have “proof ” anywhere? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . move CURIEs here . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . also draw graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ref http://purl.org/vocab/bio/0.1/ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . maybe introduce some running example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . correct term? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . more exactly? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . account for rigorous informality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
ii ii
ii
1 2 2 3 3 4 5 6 6 6
6 6 6 7 7 7 9 9 12 12 15 16 16 16
ix
List of Figures
say that I don’t cover automatic translation here? . . . . . . . . . . . . . . . . . . . . . . . . . 16 Wiedijk: statistics of formal math libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 maybe elaborate into longer example, use Ştefan’s thesis on applicable theorem search, Zinn’s figure 4.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 maybe use term “mereology” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 refer to some CoP literature? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 ODRL for non-free/general-purpose licenses: http://odrl.net/, http://www.w3.org/TR/odrl/ 21 maybe concrete example? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 introduce simple content markup here (explaining important element names), parallel markup 23 but say that writing CDs will only be covered in section 2.3.2 . . . . . . . . . . . . . . . . . . 23 parallel markup example (maybe use the same as for JOBAD’s lookup, but with CMML) . . 23 citations needed? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 switch to lstlisting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 Mention metatheories and uncommittedness to a particular logical foundation (useful when documenting ontologies in OMDoc) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 ActiveMath variant forked off OMDoc 1.1 with different notations and metadata [Lib; Man+06] 27 exact ref sentido . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 maybe move to editing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 say later (with backref here) that an editor should support this . . . . . . . . . . . . . . . . . 30 verständlich, oder Beispiel? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
30 aspects: CGa: step, expression; weak types “the goal of CGa’s type system is not to ensure full correctness, but merely to check whether the reasoning parts of a document are coherently built in a sensible way.” TSa: natural language ↔ symbolic formulas, souring . . . . . . . . . . . . . . . . . . . 31 texmacs for authoring and presentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 schöner formatieren . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 @MK/HS: anything more concrete from arXMLiv? . . . . . . . . . . . . . . . . . . . . . . . 33 weitere? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 connect to section 2.2.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 RDF(a) on stmt/thy level; presentation trivial = XHTML. No RDFa specified for MathML specified . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 challenged by HTML5 microdata [TL09] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 OPS, DocBook, DITA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 Flyspeck? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 mention at least Isabelle, as we mention is elsewhere. Isar’s literate programming, embedding LATEX, however only for presentation . . . . . . . . . . . . . . . . . . . . . . . . . 35 right? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 maybe more specific section . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 view this as an ontology engineering methodology, compare SIOC ME . . . . . . . . . . . . 36 @MOLE: OOR mostly on ontology-level metadata, we are fine-grained; EXPRESS doesn’t have a formal semantics?! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
37 related: TEI, CIDOC CRM: [OE09] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
x
List of Figures
general remarks on reasoning complexity: transitive closures: try once with DL-lite simplification of the ontology (on realistic database, e.g. OM). Maybe it only performed badly because of OWL-DL, or because of Pellet? logical programming, e. g. as in TRIPLE http://triple.semanticweb.org or KAON2 http://kaon2.semanticweb.org as an alternative; OWA vs. CWA, local closed world review HELM, MoWGLI, MKM-NET [GMA03]. . . . . . . . . . . . . . . . . . . . . . . . . for informal mathematical structures there is nothing so far – to what extent is MathLang related? – quote from [KWZ08]: “At least one choice of degree of formality should be both inexpensive and useful” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . sell this advantage of an ontology elsewhere, too . . . . . . . . . . . . . . . . . . . . . . . . . dependency relations (partly expressible within this ontology); transitivity . . . . . . . . . . CMP/@xml:lang → dc:language? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . We have not (yet?) modeled the truth/falsity of assertions, e.g. that a conjecture neither has a proof nor counter example, or that a false conjecture has been proven false, or that a proof should be true . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . rule: proof relies on ? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . reuse and update existing graph of the ontology . . . . . . . . . . . . . . . . . . . . . . . . . the ontology captures lots of relations that are too informal for being meaningful to, e. g., an automated theorem prover. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OWL 2 QL (or other profile) compliance?[Mot+09] . . . . . . . . . . . . . . . . . . . . . . . 
Florian’s relations: X is theory/theory-inclusion defined in document Y X is symbol/axiom/import declared in theory Y X is import from theory Y X is imported by Y; X has preimage Z (here Y is an import, which imports the symbol, axiom, or import Z declared in the source theory of Y; this generates the symbol, axiom, or import X in the target theory) X depends on Y (here "depends on" means for a symbol X: Y occurs in the type or definition of X for an axiom X: Y occurs in the formula of X for an import X: Y occurs in the morphism of X) . . . . . . . . . . . . . . . . . . . . . OMDoc vs. OpenMath: Cruz’ ontology integration approach for XML schemas [CX05] . . same as for 2.4.1.1, but extend it beyond OMDoc 1.2, where the semantics of rhetorical structures was not really well-defined (just inspired by RST, but otherwise “obvious meanings”) Reuse SALT [Gro+07] rhetorical blocks and RST-like phrase structures [MT]. . . . . (sub)sections and cross-refs follow the SALT approach and how to link mathematical/rhetorical structures to document structures as annotations ([Gro+07]). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ODRL: XML Schema is normative, RDF(a) under way . . . . . . . . . . . . . . . . . . . . . Upper ontologies [MCR07] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
List of Figures
Cruz/Xiao09: “does not solve the essential problem of query answering across heterogeneous sources”
make up some “theoretical” part of Krextor
rewrite
duplicate above
explain better
more exact
differentiable/continuous in OWL, Matthias?
maybe connect with what we say about little theories in the OMDoc intro
ref fig swim/PIC/sentido-owl
RDF to OMDoc
ref
ref FOAF screenshot
ref navigation
FOAF evaluation: section 5
formalized math languages
relocate to rhetorics
relocate to rhetorics ontology
HTML5 itemscope
cite mail thread
metadata element now optional
morphism dct↔OMV
Also there is potential for interaction rules between DC and CC, e.g. if BY(D) and dc:creator(D,A) then ....) – Interesting. Yes, why not. But then I vote for the following plan: 1. first do this as a part of the OMDoc spec to learn how it works; 2. but then don’t keep it within the OMDoc standard, but try to convince the CC developers; 3. only if they don’t like our approach, keep it in the OMDoc standard, otherwise contribute it to CC and refer to it from OMDoc.
align with 2.4.4.3
model formally in DL (give example): insection ○ creator ⊑ contributor
@inherits, compare ActiveMath
new contribution: can also add metadata from RDF ontologies to OM terms (as attributions)
require importing ontologies when used as CURIEs?
introduce this notation somewhere
Paul’s example: select b + c inside a + b + c
elaborate
explain wide scope of MKM
change
JOMDoc, Krextor, RDFa
focus on web browsing, but maybe say a few words about other output formats (LaTeX, PDF)
Where am I? What’s here? Where can I go? (Veen 2001, The Art and Science of Web Design)
offer interactive services that work on the preserved semantic markup from 2.4.5 (JOBAD integration)
Particularly, services that rewrite a formula should retain the previous state of the formula as an alternative to which the user can switch back.
http://terrytao.wordpress.com/, John Baez
temporal reasoning on versioned metadata would be helpful for retracing discussions that led to knowledge item revisions.
check ref
for future work (not done so far): how to make previous argumentations persistent as e. g. rhetorically structured documents in the knowledge base
what would we like to edit? Formulæ, statement/theory level, metadata – we want a dedicated editor for each; edit locally
how do we make the editor accessible in situations where we need it (cf. MCS notation article)
short summary of the different approaches
related: Lurch (search e-mail)
verify with Alberto
QMath also partly acts as an input syntax for whole OMDoc
LaTeX2OQMath: LaTeXML→*.tex.xml→heuristic XSLT yields OQMath (OMDoc with embedded QMath), depends on ActiveMath integration [And]
ref
formula editors: No linear input syntax is used, but the formula is composed, not only previewed, two-dimensionally, and
introduce term earlier, at WYSIWYG
Word 2007?
State of the art: e. g. Connexions’s edit-in-place
An editor can treat statements, theories, and rhetorical structures like ordinary sections of a document – except that they carry additional annotations.
Concrete use case: how do symbol and notation definitions evolve?
cut a part on qmath
Treat notation definitions separately? After all, they cover both formulæ and statements. Thus they are partly covered above, but they also have different properties
Reuse parts of MCS notation article (“maintaining/editing notations”) – what information is needed?
State of the art: forms, maybe mention semantic forms (SMW extension)
pragmatic vs. strict
forward-ref SWiM
I have not actually done anything about it and won’t get it integrated into SWiM, but it’s too important to be neglected
formal correctness is neither verified nor enforced, but we stay interoperable with more formal tools
MMT in TNTBase
cite same source as elsewhere
what has actually been done?
[GM05]
Except SPARQL, this is not my contribution, but it deserves being mentioned, as it is an important part of an integrated environment; forward-ref to OpenMath wiki queries, maybe another example from Flyspeck here
text search with indexing of math markup: not yet done, but maybe easy?
SPARQL [PS08] search: useful examples from OpenMath wiki; link math, discussion, etc.
formula search: not my business, but mention MWS (structural indexing), vs. (e.g.) ActiveMath (reduction to Lucene text search), MML with LSI (Cairns
Active Essays (Yamamiya)
this is actually future work
any study supporting this claim?
how to express this?
find out which one of the two is more readable!
file under “future work”
later: subsection
any useful reference to the discussion on DefMPs?
ref validity
maybe a bit more detail on that, so that we can compare to SPARQL
//o:symbol doesn’t yet work for same symbol name in nested theory
ref
cite ref
Is this unification? How hard is it?
importance of conversion, Mars orbiter
cite CM, or ref sect
Rhetorics visualization: based on markup for SALT/RST-like structures integrated into OMDoc; translated to XHTML + microformats plus JavaScript by XSLT [Gic08], demo available at [Job]. Next steps: integrate into JOBAD, use RDFa
OK, or elaborate?
elaborate; e. g. transitivity in OWL and why one shouldn’t do this with large ABoxes
Maybe mention semantic desktop scenario
ref XML document model
maybe graphics
metadata handling
TNTBase wouldn’t require splitting/reassembling, as we can access fragments by their XPath. Or otherwise get the same split/merge effect using virtual files (for fragments of a larger document, or for collections of smaller fragments) → more flexibility, easier to use, not hard-coded but accessible to the user
RDFa output in JOMDoc
mention LaTeXML’s XMath for LaMaPUn [Gin+09]
adapt
WEESA [RGJ05]; also compare concerning complexity; push vs. pull
also reuse development pages of Krextor Trac for the choice of programming language
Editing (exact ref): Alberto González Palomo [LGP08]
Metadata validation 3.4.1: Michael Kohlhase
maybe Siarhei for Krextor OWL
Say sth. about added value considerations. Wikipedia’s motivations (fun, benefitting from the large community) are not valid here, as the knowledge is less general-purpose, and the community is small.
TODO: more non-wiki systems
Adessoweb, Active Essays
reuse that section from the TR?
compare Kūkākūkā [SX02]
semantic features in mainstream wikis, e.g. phpwiki, tikiwiki
How do semantic wikis improve on the above-mentioned things?
cite SWiM TR for reasons for preferring IkeWiki
wiki-style granularity, look into OM09 and SemWiki08 papers
my mail on notation management in SWiM vs. TNTBase/JOMDoc
relation to queries (SemWiki08)
ref some section, maybe JOBAD
text vs. XML-based diff/patch/merge
metadata handling
Future: TNTBase on database level
Krextor integration into SWiM (cf. section 3.7.3)
credit to IkeWiki
open subsection as a page
The same holds for the revision history. (not yet implemented)
maybe also survey replies
say sth. about MOLE/OMDoc editor future
specific subsection
specific subsection
align terminology with section 2.2
why?
give bad real examples
ref or refactor
ref
ref notation editing section
give exact ref
ref
partly move to editing section
maybe elaborate
incorporate
adapt/revise model accordingly
summarize results of qualitative evaluation of the 3 use cases by questionnaire (some 15 participants)
gave useful feedback, but only one feature (= argumentation) had really been used – made tests with test persons to get more focused feedback on the three use cases (following section)
3 use cases in personal experiments; replay them and think aloud. 7 persons done so far (coincidentally disjoint with survey participants, all familiar with one or more of OMDoc, OpenMath, SWiM or SemWikis in general), more to come
procedure for each use case: 1. let them read a one-page description of the feature; 2. gave additional explanations (e. g. about the OpenMath setting) as appropriate, when people were not familiar with it; 3. let them do the task (edit description of a symbol, change rendering of an operator, discuss about something); let them explore, but gave hints if they didn’t know what to do; 4. thinking aloud; feedback collected: what am I trying to do, how am I trying to accomplish it, what do I think about the interface I’m using, how would I expect it to be
most of the feedback was about the user interface, taking the knowledge model and the task to be performed for granted, but (suggestion by Andrea) certain feedback also reveals insights about the knowledge model and the reasonability of the tasks
the content of this section is complete, but the integration into the thesis still has to be done
summarize Siarhei’s RDF→OMDoc translation; use this to translate FOAF
move this example here
fix
maybe drop this completely; still too early for research-oriented evaluation (SAMSDocs)
properly integrate
very preliminary results: feedback from a few persons
preliminary coverage evaluation; no user evaluation done
now they are in, but not considered for assistance
Bring up this statement once more near the front
compare SIOC ME
New possibilities opened up: 1. There is a novel RDFa browser: one would only have to teach it a single peculiarity of the RDFa integration in OMDoc, and one could already browse OMDoc with it. 2. There is a new ontology, e.g. for certification or learning. Then we (as authors) first annotate OMDoc with this ontology in strict RDFa syntax, because initially that is the only way. Later, once certain practices of using this ontology in OMDoc have emerged, we extend OMDoc by a pragmatic syntax for this ontology.
left out MoC, as we do not yet have it for OMDoc docs
order properly
Therefore, SWiM needs better support for creating new CDs
discuss whether it was reasonable to go the RDF path for embedding OWL into OMDoc. Alternative: functional representation
more?
keep aligned with MOLE WPs
Acknowledgments

Detailed contributions will be acknowledged in the respective sections.

Serge Autexier, John Bateman, Uldis Bojārs, John Breslin, Matthias Bröcheler, Stéphane Corlosquet, Richard Cyganiak, Cătălin David, Anca Dumitrache, Alexander García Castro, Alberto González Palomo, Jana Gičeva, Maja Grintal, Tudor Groza, Siegfried Handschuh, Tuukka Hastrup, Andrei Ioniţă, Jan-Willem Knopper, Andrea Kohlhase, Siarhei Kuryla, Bastian Laubner, Paul Libbrecht, Dimitar Mišev, Knud Möller, Christine Müller, Normen Müller, Immanuel Normann, Florian Rabe, Gordan Ristovski, Sebastian Schaffert, Thomas Schandl, Marvin Schiller, Heinrich Stamerjohanns, Christoph Tempich, Jakob Ücker, Max Völkel, Marc Wagner, Vera Zegers, Vyacheslav Zholudev,
1 Introduction

Big picture/vision? (bigger than this thesis) My contribution to that? Consider the thesis a workarea of the big plan, the chapters are workpackages

There is an enormous amount of mathematical knowledge on the web. This comprises both formal and informal knowledge; compare the libraries of formalized mathematics for automated theorem provers, such as the Mizar Mathematical Library [Miz], with general-purpose encyclopedias like Wikipedia [Wik]. The knowledge is used in a variety of applications: formalizations drive computation and automated verification, whereas informal content is mainly used for educational purposes. There are, however, four problems with this state of the art:

1. The large gap between informal and formal knowledge complicates workflows of manual, semi-automated or automated formalization of informal representations of mathematical knowledge and impairs the accessibility of formalized mathematics to humans, as informal explanations cannot easily be added manually or generated automatically. It also complicates the exchange and reuse of knowledge at different levels of formalization.
2. The isolation of the knowledge bases from each other, even at the same level of formalization, also makes it hard to reuse knowledge, and forces people and applications using them to commit to a particular one instead of combining knowledge from different sources.
3. The specialization of each individual knowledge base, which has usually been designed to serve a single purpose (e. g. driving a computer algebra system), together with the above-mentioned lack of integration, often leads to poor reusability for other purposes, even though the knowledge in itself may be valuable.
4. The lack of web collaboration tools for contributing to knowledge bases: the more formal a knowledge base is, the more its collaborators rely on editors specialized for its particular formalization and on applications serving its primary purpose. At the same time, these tools become less accessible to a casual contributor browsing the project’s web site, they do less to enable communication among the contributors, and it takes more work to enhance them with third-party services available on the web.

This thesis presents an approach to overcome these deficiencies. I investigate languages for representing semiformal mathematical knowledge that still have the potential to cover the whole informal–formal range (cf. chapter 2). Chapter 3 covers services for creating or utilizing knowledge in these languages – both individual services and ways of integrating them. Finally, an environment that integrates many of these services is presented in chapter 4. This environment, as well as some of the individual knowledge representations and services, has been evaluated in several case studies, which are presented in chapter 5.

The rest of this chapter is organized as follows: After defining a few central terms in section 1.1, I will review types and applications of mathematical knowledge in more detail in section 1.2, and give a short introduction to semantic web technologies, which I consider helpful for addressing the above-mentioned problems, in section 1.3.

continue
1.1 Definitions

1.1.1 Degrees of Formality

The notions of formal vs. informal representations of mathematical knowledge, with semiformality in between, are fundamental to the research presented here. In the context of this thesis, informal means “using natural language”, whereas formal means “using a symbolic language”. One should, however, differentiate further. While a sloppy proof sketch is certainly informal, informality does not necessarily contradict rigor, as pointed out by Gow and Cairns [GC07]. In fact, mathematics, like any other science, has developed its own rigorous natural language; consider structures like “Let M be …. Suppose …. Then …, provided ….” [Trz95]. Efforts have even been made to develop controlled natural languages, often called “mathematical vernacular” [Bru87], that are immediately machine-understandable. On the other hand, a mathematical formula is not necessarily formal. It may use ad hoc symbols, symbols that have not been defined, or ambiguous notations. Therefore, I define formal as rigorous, symbolic, and unambiguous.

Often, the term “formalized” or “computerized” is used to denote mathematical knowledge that has been encoded in a formal language, usually to make it amenable to symbolic computation, such as computer algebra or automated theorem proving; i. e. “formalized” means a representation that is both formal and immediately machine-processable on a symbolic level.

In the context of this thesis, semiformal does not denote a formalization level strictly between formal and informal, but a pragmatic compromise, possibly including both. I follow the definition given by Kohlhase [Koh]:

Semiformalizations are representations of knowledge making use of informal (i. e. appealing to a human reader) and formal (i. e. supporting syntax-driven reasoning processes) means. Semiformalizations are usually realized as documents in representation formats that flexibly support both formal and informal modes of representation. Note that as defined here, the class of semiformalizations is very broad, it includes arbitrary (informal) documents, datasets, and logical axiomatizations. We will pragmatically restrict the set of completely informal documents to those that are written with eventual (semi)-formalization in mind. We include those as the starting points of a step-wise formalization process, first adding methodical and mathematical rigor, and then marking up formal elements. Pragmatically, the class of semiformalizations includes specifications from program verification, semantically annotated course materials, textbooks in the “hard sciences”, etc.
OK?
1.1.2 Mathematical Knowledge

While the term “mathematical knowledge” will predominantly be used throughout this thesis, the findings presented are not strictly limited to the domain of mathematics. Mathematics is the foundation of science and engineering, and thus solutions for managing mathematical knowledge also solve many of the problems of these domains. While science and engineering do not know the concept of a proof, which is the central means of establishing evidence in mathematics, they have other structured ways of establishing evidence or refuting false claims. From an authoring and publishing point of view, documents in all of these domains have in common that they contain formulæ. Hilf et al. pointed out structural similarities of mathematical and physical knowledge in further detail [HKS06]. Besides general aspects of mathematics, most of which also occur in related domains, applications in ontology engineering are a particular recurring theme in this thesis.

define collaboration?
1.2 Formal and Informal Mathematics on the Web and its Applications

Computer applications for formal mathematics date back to the 1960s, with computer algebra, automated reasoning, and program verification being important fields. An important milestone was the proof of the four color theorem, which Appel and Haken accomplished in 1976 with the help of a computer [Gon08]. Computer libraries of formalized mathematics have been developed for many such mathematical tools, particularly for automated theorem provers. While their main purpose is to be used with a particular tool, many of them have also been published on the web for a long time, though they are still edited and maintained offline. For example, the Journal of Formalized Mathematics, which publishes proofs from the Mizar Mathematical Library [Miz] that have been checked using the Mizar proof checking system, has existed since 1990 – on paper, and, for most of that time, also on the web [For]. Each such formal library exists independently from the others. Translating entries of one library for reuse in another is non-trivial due to differences in syntax (i. e. different languages for representing axioms, theorems, and proofs) and semantics (i. e. different logical foundations).

Informal mathematical knowledge on the web mostly serves human readers as a source of information, particularly in education. Many mathematics professors put their lecture notes online, and there are also more structured, searchable and browsable knowledge collections. Two contrasting examples are Wolfram MathWorld and Wikipedia (a more detailed survey of open- and closed-content mathematical encyclopedias on the web is given in [Lan07a]). Wolfram MathWorld is a collection of about 13,000 hyperlinked and categorized entries on mathematical topics, which has been maintained since 1999 by the Wolfram employee Eric E. Weisstein, with contributions from a larger community [Wei]. For about a quarter of all MathWorld entries (an estimate based on 200 random entries), related files (“notebooks”) for the Mathematica computer algebra system can be downloaded. Wikipedia is a general-purpose encyclopedia, which also features a large section on mathematics [Wik09c]. In contrast to MathWorld, the content can be edited directly on-site, it is not controlled by a central authority, and all of it is available under an open content license. On the other hand, there is no connection to mathematical software.

ref mathematica

Formal and informal mathematical knowledge need not be strictly separated. Formal libraries need human-comprehensible documentation, targeting external developers who reuse them to get symbolic computations done or to get proofs checked, as well as the developers who maintain the libraries themselves. This is similar to the field of software engineering, where the importance of documentation is undisputed and authoring documentation is supported by many development tools. Research on program understanding (also known as program comprehension) confirms the importance of documentation for understanding software [VMS99; Sto06] – which comprises source code documentation, external documentation, and records of developers’ communication about the code, such as e-mail discussions and bug reports. Source code documentation usually extends down to the level of classes and methods (in object-oriented languages). In the more radical paradigm of Literate Programming, the program structure is governed by the flow of a natural language explanation of the program logic, interspersed with fragments of source code [Knu92]. From a literate program, both compilable source code and human-readable documentation can be generated. Besides making formalizations comprehensible to human readers, different kinds of informal representation have also successfully been generated for applying text-based information retrieval methods in mathematical knowledge bases (cf. section 3.5).

Collections of informal mathematical knowledge are not of much use if machines do not understand anything about their structure. Consider the Wikipedia article about the Pythagorean theorem [Wik09d]. It contains the central formula in the form $a^2 + b^2 = c^2$ and has been filed into the categories “Articles containing proofs” and “Mathematical theorems”.
The presentation-oriented LaTeX representation of the formulae in Wikipedia does not easily allow for searching formulae by their functional structure. Putting aside the fact that formulae cannot be searched at all, a search for the equivalent expression $x^2 + y^2 = z^2$ would not yield the theorem, and certain more complex rewritings, such as $c = \sqrt{a^2 + b^2}$, would probably only be retrievable because they explicitly occur in the article. From the categorization it is not clear to a machine (just very likely for a human) whether the article contains a proof of the theorem that it contains – or just any other, unrelated proof. Moreover, it is not clear whether the proof is correct. MathWorld offers Mathematica notebooks for symbolic computation for download, but they have been hand-crafted and do not directly correspond to the informal content of the encyclopedia entries, i. e. this feature does not reuse the human-readable content. The only machine-understandable information in the entries is their metadata, using Dublin Core and the Mathematics Subject Classification (cf. section 2.2.6.3).
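The difference between searching presentation strings and searching by functional structure can be illustrated with a minimal sketch. It assumes a hypothetical encoding of content-oriented formulae as nested Python tuples (operator first, then arguments); neither this encoding nor the function name `canonical` stems from any of the systems discussed here. Variables are renamed in order of first occurrence, so that two formulae that differ only in their variable names become equal.

```python
# Hypothetical content-oriented encoding: (operator, arg1, arg2, ...).
# Variables are plain strings, numbers are ints.
pythagoras = ("eq", ("plus", ("power", "a", 2), ("power", "b", 2)), ("power", "c", 2))
query = ("eq", ("plus", ("power", "x", 2), ("power", "y", 2)), ("power", "z", 2))


def canonical(term, names=None):
    """Rename variables to v0, v1, ... in order of first occurrence."""
    names = {} if names is None else names
    if isinstance(term, str):          # a variable
        return names.setdefault(term, f"v{len(names)}")
    if isinstance(term, tuple):        # an operator applied to arguments
        head, *args = term
        return (head, *(canonical(arg, names) for arg in args))
    return term                        # a number


# A purely textual comparison of the presentation strings fails ...
print("a^2 + b^2 = c^2" == "x^2 + y^2 = z^2")
# ... whereas the structural comparison succeeds.
print(canonical(pythagoras) == canonical(query))
```

This sketch only covers renaming of variables. Retrieving rewritings such as $c = \sqrt{a^2 + b^2}$, or accounting for the commutativity of addition, would require additional normalization or reasoning.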
1.2.1 Mathematical Knowledge Management
The research field of mathematical knowledge management (MKM) addresses the challenges of creating, maintaining, and using digital collections of mathematical knowledge:³

Mathematical Knowledge Management (MKM) is a new interdisciplinary field of research in the intersection of mathematics, computer science, library science, and scientific publishing. The objective of MKM is to develop new and better ways of managing mathematical knowledge using sophisticated software tools. MKM is expected to serve mathematicians, scientists, and engineers who produce and use mathematical knowledge; educators and students who teach and learn mathematics; publishers who offer mathematical textbooks and disseminate new mathematical results; and librarians and mathematicians who catalog and organize mathematical knowledge. – William M. Farmer [Far04]

³ A note to prevent misunderstandings: This definition assumes a wide notion of the term “knowledge management” as “dealing with knowledge in any desirable way”. The traditional definition of knowledge management narrows this down to “a range of practices used in an organisation to identify, create, represent, distribute and enable adoption of insights and experiences. Such insights and experiences comprise knowledge, either embodied in individuals or embedded in organisational processes or practice.” [Wik09b]

Research on mathematical knowledge management has addressed the four problems pointed out initially in the following ways:

Concerning informal vs. formal knowledge (problem 1), there has always been a bias towards formal representations [MKM09], which is no surprise, given that a large share of the MKM community comes from fields like computer algebra or automated theorem proving. Frequently investigated problems that originate from or result in informal knowledge representations are handwriting recognition, interactive exercises, and scientific publishing. Gow and Cairns suggested closing the gap between formal and informal representations by making parts of a formalized representation (Mizar in their case) available in an environment focusing on informal aspects (a digital library software in their case) and concluded from an initial case study that a lot more effort would be necessary for making the knowledge comprehensible to general readers [GC07]. Languages covering the whole range between informal and formal knowledge have been developed (cf. section 2.3), most notably OMDoc, but they lack a clear semantics for non-logical and non-functional structures, and there is still no widely agreed-upon ontology for representing mathematical knowledge.

Mathematical knowledge bases are rarely interlinked (problem 2).
This is partly a consequence of problem 1, as informal knowledge bases, such as the above-mentioned Wolfram MathWorld, and formal ones, such as the Mizar Mathematical Library, do not speak a common language. Knowledge bases on the same level of formalization, such as the libraries of different theorem provers, are, however, also barely interconnected. There are a few proofs of concept for translating between different representations of formalized mathematics (cf. section 2.4 and [Rab08, section 1.1.3.3]), which will not be covered here in further detail. Nothing comparable is known for semiformal representations. So far, knowledge has barely been reused across knowledge bases (problem 3).

Collaboration on informal mathematical knowledge bases often takes place in open communities using specialized web content management systems (cf. section 4.1). The MKM community has done little research on such projects, but has been interested in improving collaboration on formalized mathematics (problem 4) using similar technologies. Formalized mathematics is still mainly available in repositories whose contents are published on the web, but not contributed and maintained via a web interface. Instead, collaborators have to install software locally to work on their contribution, which they then commit to the repository or feed into a manual review process, which is, e. g., the case with the Mizar Mathematical Library [Miz]. The community is well aware of the web changing established patterns of scientific collaboration and scientific publishing (see, e. g., [BS05]), but the proposed solutions, some of which will be reviewed in section 4.1, are still in a prototypical stage.
1.3 A Semantic Web for Mathematics
Ongoing research on the semantic web addresses the four problems pointed out initially. The objective of the semantic web effort is to enrich the existing world wide web with machine-readable data enabling intelligent retrieval and inference services [BLHL01]. The semantic web bridges the gap between informal and formal knowledge (1) by annotating informal content (e. g. HTML documents) with terms whose meaning has been formalized in ontologies, usually in certain subsets of description logic (cf. section 2.1). It overcomes the isolation of knowledge bases (2) by introducing URIs (uniform resource identifiers) for globally addressing knowledge items⁴. In the special case of a URI being a URL (uniform resource locator), it employs the hypertext transfer protocol HTTP for retrieving a knowledge item, given its URL. Knowledge retrievable from URLs is preferably represented in a self-describing, machine-understandable RDF encoding (Resource Description Framework), drawing on ontologies providing background knowledge. RDF not only allows for describing, but also for linking resources. If these links point to retrievable resources, one speaks of linked data. Over the past few years, large datasets that had previously not been accessible in a uniform, machine-understandable way have been exposed as linked data. Point (3) is addressed by semantic web services and automated agents, which access knowledge bases and utilize web services from various places on the web, combining knowledge from different sources, drawing their own inferences, and ultimately delivering added value to users. Many technical approaches to this exist, ranging from heavyweight architectures for finding self-describing web services that accomplish parts of the job to be done and orchestrating them, to lightweight mashups that combine, aggregate, and transform data and web services [O’R05; Ank+08].
Finally, a development called “Web 2.0” or “social web” has led to new ways of creating web content and collaborating on it (4). Initially, this emerged independently from the semantic web, but the two developments are now being connected more and more closely, leading to a “social semantic web”, sometimes called “Web 3.0”. All of this is now considered the state of the art in semantic web research, but it has been applied to mathematics to a much lesser extent. Widely used solutions are still missing.⁵
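To make the notion of describing and linking resources (point 2 above) more tangible, the following minimal sketch models RDF statements as plain subject–predicate–object triples. All `http://example.org/...` URIs and the `proves` property are made up for illustration; only the `rdf:type` URI is real. The example picks up the Wikipedia case from section 1.2: once a proof is explicitly linked to the theorem it proves, a machine can answer the question that the categories alone leave open.

```python
# Triples as (subject, predicate, object); all example.org URIs are hypothetical.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/math/"

triples = {
    (EX + "pythagoras", RDF_TYPE, EX + "Theorem"),
    (EX + "pythagoras-proof", RDF_TYPE, EX + "Proof"),
    (EX + "pythagoras-proof", EX + "proves", EX + "pythagoras"),
}


def subjects(predicate, obj):
    """All resources that are linked to `obj` via `predicate`."""
    return {s for s, p, o in triples if p == predicate and o == obj}


# Which resources are proofs of the Pythagorean theorem?
print(subjects(EX + "proves", EX + "pythagoras"))
```

In an actual linked data setting, the URIs would be retrievable via HTTP, and the triples would be obtained from the retrieved RDF documents rather than from a hard-coded set.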
1.3.1 Earlier Approaches
• early approaches to MKM on the SemWeb: MathWeb-SB, HELM; mostly formal
• Guidi TR 3.1.2: comparison MathWeb vs. HELM
• outline how they failed to address my four problems
• previous work on semiformal math knowledge: OMDoc
⁴ I will henceforth refer to a piece of knowledge about any distinct subject of interest as a “knowledge item”.
⁵ “The structured representation of mathematical formulæ using MathML in the internet of today still plays a subordinate role, particularly when considering the potential of subsequent processing, multimedial presentation, and the cross-linking of formal expressions. We wonderfully represent common speech in the internet, we work with hypertext, but not with ‘hyperformulæ’.” [SGR09]; original German source: „… die strukturierte Darstellung mathematischer Formeln im heutigen Netz mit Hilfe von MathML spielt immer noch eine untergeordnete Rolle, insbesondere wenn man an das Potential der Weiterverarbeitung, an multimediale Darstellung und die Vernetzung formaler Ausdrücke denkt. Wir bilden die Umgangssprache wunderbar im Netz ab, wir arbeiten mit Hypertext, aber nicht mit ‚Hyperformeln‘.“
Figure 1.1: TODO: this is my challenge for combining SemWeb and MKM [axes: investment, return; labels: WWW, Semantic Web (today), Formal Methods, Our challenge, break-even point]
BegOP(1)
Most of the mathematical knowledge on the web has been created and still is maintained in a collaborative effort. Contributing informal knowledge to a shared knowledge base is possible at a low cost, as the author “just” has to write down what he has on his mind, but, on the other hand, the return is low as well. The author’s contribution may or may not be found and read by other interested users; beyond that, nothing else can immediately be done with it. Contributing formal knowledge to a shared knowledge base requires a high initial investment, such as learning a formal language and making sure that the new contribution is integrated with existing knowledge and compatible with it. But then, due to its amenability to symbolic computation, formal knowledge enables high-level services that humans would not be able to perform quickly and reliably, e. g. automatic software verification.

The goal of this work is to enable and facilitate collaboration on semiformal knowledge with a flexible degree of formality. By starting at a rather informal entry level, we lower the requirements and the inhibition threshold for making an initial contribution. Nevertheless, we will provide assistants that allow authors to gradually enrich their contributions with more and more formal structure. For every such additional formal aspect, we will integrate intelligent services into the authoring environment that make use of it, in order to share the benefits with the author. Some of these services will again power the authoring process, thus generating a feedback loop that facilitates the creation of high-quality content.
EndOP(1)
¹ Old Part: reuse parts
Acknowledgments
Figure 1.2: TODO: the following chapters in a big picture; solid: done; dashed: partly done; dotted: future work [labels: OpenMath wiki case study, evaluation, SWiM, frontend/backend integration, browse, query, interactive documents, enabled, annotate, stimulate discussion, write, knowledge repr. structures, annotation, stabilize, improve, argumentation, subject of]
2 Representing Semiformal Mathematical Knowledge
This chapter deals with the representation of semiformal mathematical knowledge. Here, I only care about how to represent structural aspects of mathematical knowledge, particularly the ones that will be needed for interaction (cf. chapter 3) and collaboration (cf. chapter 4), but not yet about how to produce, maintain, and utilize such representations. Completely formalized representations will largely be omitted, except for a comparison to semiformal representations. The chapter starts with a summary of foundational semantic web technologies that will be required for representing semiformal mathematical knowledge (section 2.1). I will continue with an overview of the structures that mathematical knowledge has (section 2.2) and then review state-of-the-art knowledge representation languages. A central deficiency of these languages, which hampers the development of services and collaboration environments – to be covered in chapters 3 and 4 –, is their lack of explicit semantics for semiformal structures. The core contribution of this chapter is made in section 2.4, where I will develop several complementary ways of adding more such semantics to existing mathematical knowledge representation languages, particularly OMDoc and OpenMath.

2.1 Foundations of the Semantic Web (State of the Art)
Most representations of semiformal mathematical knowledge that will be presented in the remainder of this chapter are based on XML, RDF, and ontologies. To facilitate understanding the following sections, I will briefly introduce those foundational technologies of the semantic web. Figure 2.1 shows the well-known and hotly debated layer cake architecture of the semantic web (see [GMB08] for a survey of its different incarnations). This section focuses on URIs, XML, RDF, and the basics of ontologies. Unicode, another foundation, will not be introduced in detail here. It is a set of characters and encodings for them, covering most human writing systems of all cultures, including mathematical notation [Uni]. Several other blocks of the layer cake will be covered in other chapters and sections: queries in sections 3.7.1, 3.6.5.1, and 5.1.3.2, and user interfaces in sections 3.1, 3.2, 3.3, 3.6, and 4.4.
Figure 2.1: The (in)famous Semantic Web Layer Cake

2.1.1 URIs
URIs (uniform resource identifiers) allow for identifying pieces of knowledge and linking to them [BLFM05]. URLs (uniform resource locators) are a special case of URIs; they do not merely name resources but also make them retrievable for HTTP clients. Writing URIs is usually facilitated by defining abbreviations for common prefixes, resulting in a namespace:localname syntax. With the prefix h bound to the XHTML namespace URI http://www.w3.org/1999/xhtml, one could abbreviate the html element of XHTML, a common XML-based language (cf. section 2.1.2), as h:html. In the context of RDF (cf. section 2.1.3), there is even the notion that terms from vocabularies have URIs themselves, which are simply formed by concatenating the namespace URI and the local name. With the prefix rdf bound to http://www.w3.org/1999/02/22-rdf-syntax-ns#, the abbreviation rdf:type would expand to http://www.w3.org/1999/02/22-rdf-syntax-ns#type. Here, and in the remainder of this thesis, namespace prefix ↦ URI bindings will be omitted for readability where they can easily be inferred from the context. Common namespace prefixes and URIs are listed in table A.1.
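The expansion of abbreviated names in the RDF sense, as described above, amounts to a simple concatenation. The following minimal sketch shows this; the function name `expand` and the dictionary `PREFIXES` are illustrative choices and not part of any specification.

```python
# Prefix bindings as used in the text above.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(abbreviated, prefixes=PREFIXES):
    """Expand prefix:localname by concatenating namespace URI and local name."""
    prefix, _, localname = abbreviated.partition(":")
    return prefixes[prefix] + localname


print(expand("rdf:type"))
# http://www.w3.org/1999/02/22-rdf-syntax-ns#type
```

Note that this concatenation is the convention for RDF vocabulary terms. For XML element names such as h:html, the prefix merely selects the namespace that the local name belongs to.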
2.1.2 XML
Most of the semantic markup languages that will be introduced in section 2.3 are based on XML (eXtensible Markup Language [Bra+08]). Within the XML family of specifications, languages, and tools, it is relatively easy to implement a new semantic markup language, as one can draw on extensive tool support for parsing, presenting, authoring, validating, querying, etc.

2.1.2.1 Syntax/Semantics
The data model of XML, also called the XML Information Set (infoset [CT04]), is an ordered tree with labeled nodes. Actually, this tree only forms the backbone of the data model, as its nodes can be given identifiers and then referenced from other nodes, even across documents. This is not part of the foundational infoset specification; more high-level specifications, such as xml:id [MVW05] or XPointer [Gro+03], add this aspect, which is highly important for semantic markup. The vocabulary of an XML-based language¹ consists of elements and attributes. Elements can carry attributes and contain child elements or text nodes or both. Element and attribute names are actually URIs, but always abbreviated as namespace:localname or even just localname. In the XPath query language for selecting nodes from an XML tree [Ber+07], elements are simply addressed by their name, whereas attribute names are prefixed with @; I will also use this syntax to distinguish attributes from elements. In some places, more complex XPath constructs will be used, such as element[@attribute='value'] to denote those elements whose @attribute has a certain value.

¹ From now on, I will use the shorter term “XML language”.

Any XML document has to respect certain fundamental syntactic rules that make it well-formed. This means that, for example, namespaces are declared before using them, that elements have a start and an end tag (<element>...</element>) or consist of an empty tag (<element/>), and that special characters are escaped when they occur in strings, e. g. < as &lt;.
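The XPath notation and the well-formedness rules just introduced can be tried out with the limited XPath support of the `xml.etree.ElementTree` module from the Python standard library. This is only a sketch; the element and attribute names (`theory`, `symbol`, `role`, `note`) are made up for this example and do not belong to any of the languages discussed in this thesis.

```python
import xml.etree.ElementTree as ET

# A small well-formed document; "<" inside the text is escaped as "&lt;".
doc = ET.fromstring(
    '<theory name="arith">'
    '<symbol name="plus" role="application"/>'
    '<symbol name="zero" role="constant"/>'
    "<note>1 &lt; 2</note>"
    "</theory>"
)

# element[@attribute='value']: those symbol elements whose @role is "application"
apps = doc.findall(".//symbol[@role='application']")
print([s.get("name") for s in apps])

# The parser undoes the escaping when delivering the text node.
print(doc.find("note").text)

# A start tag without a matching end tag makes the document ill-formed.
try:
    ET.fromstring("<theory><symbol></theory>")
except ET.ParseError as err:
    print("not well-formed:", err)
```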
particular type system, so it allows for types of symbols to be specified in separate files parallel to the CD, one per type system. The most common type system in the OpenMath community is, however, Davenport’s Small Type System (STS [Dav99]); types in that system would be given in a file named number-theory.sts. In the traditional application of OpenMath as a CAS interchange format, the semantics of symbols is fixed by translating OpenMath objects into respective CAS-internal representations: each OpenMath-aware CAS can choose to support a number of CDs, and then has to specify a phrasebook that translates between the symbols of these CDs and an internal representation understood by the CAS. The result of this translation must satisfy all FMPs declared for the symbols involved.
2.3.3 OMDoc
OMDoc (Open Mathematical Documents [Koh06b]) integrates and extends MathML and OpenMath, which have been introduced in the preceding sections. Above the layer of objects, it adds knowledge representation for statements, theories, and documents, as detailed in section 2.2. Most of the research presented here is based on OMDoc 1.2, which was released in 2006.

A major problem with OMDoc 1.2 is that its semantics is not completely formally defined. Compared to OpenMath CDs, more of the semantics of symbols can be expressed within OMDoc itself; therefore, OMDoc does not have to rely on phrasebooks. Still, the semantics of formal mathematical statements, such as symbol declarations, axioms, and proofs, is only defined in a phrasebook-like way, i. e. by translation to languages for formal proofs. The Ωmega [SBA05] and VeriFun [WS03] systems use OMDoc for communication, the latter even as its native file format [Mül05; Mül06]. Further interfaces to CASL [BM04a], PVS [ORS92], TWELF [Pfe01], TPTP [SSY94], the Mizar language [BK07], and OWL¹⁵ exist, in varying states of completeness. Some of them have been bundled in the Heterogeneous Tool Set (Hets [MML07]). The underlying logics have been implemented as theories in OMDoc [Koh06c]. Translations from OMDoc to the target languages and back have usually been implemented using OMDoc’s presentation framework; translations from non-XML languages to OMDoc are made by hooking into the parsers of existing tools and making them output OMDoc [Koh06b, section 25.2]. For the rather semiformal structures of documents, including informal mathematical statements, rhetorical structures, document structures, and metadata, no formal semantics has been specified at all. In this thesis, it will be established by way of a translation to ontologies (cf. section 2.4).

At the time of this writing, work on a completely revised OMDoc version, called 1.6, was in progress.
The key improvement of version 1.6 over 1.2 will be a completely revised formal core, building on the Module system for Mathematical Theories (MMT) [RK09; RK08; Rab08] developed by Florian Rabe. The syntax of MMT heavily relies on the Curry-Howard correspondence of proofs and terms, which makes it formally clean but not as close to mathematical vernacular as OMDoc 1.2. This motivated the developers to adopt the idea of a pragmatic and a strict syntax, which they had developed for MathML before. The pragmatic syntax will largely correspond to OMDoc 1.2, and its semantics will be defined by translation to the strict syntax, which will largely be a concrete XML syntax for the MMT abstract syntax.

¹⁵ Kutz et al. have integrated read support for OWL into the Hets framework mentioned below [Kut+08]. Hets can also read and write OMDoc, but it can only translate between OMDoc and CASL, not yet between OMDoc and OWL. Independently from that, I have developed a translation from OMDoc to the RDF representation of OWL and back (cf. sections 2.4.3.5 and 3.7.3.1).

Figure 2.2: Definition of the plus symbol within the arith1 CD
  CD name: arith1
  CD base: http://www.openmath.org/cd
  CD URL: http://www.openmath.org/cd/arith1.ocd
  dates: 2004-03-30, 2006-03-30
  status: official; version 3, revision 0
  description: common arithmetic functions
  symbol: plus (role: application)
  symbol description: The symbol representing an n-ary commutative function plus.
  CMP: for all a,b | a + b = b + a
  FMP: β(quant1#forall, a, b, @(relation1#eq, @(arith1#plus, a, b), @(arith1#plus, b, a)))
  ...

For addressing symbols, OMDoc adopts the syntax of OpenMath and MathML, but reinterprets and extends its semantics. A symbol is identified by cd and name, where cd is the name of an imported theory and the name is local to that theory. The cdbase, i. e. the base URI of the theory graph, is usually not explicitly given for each symbol reference, but can be reconstructed by following the import. MMT extends this by named imports; MMT URIs allow for referencing reused symbols by relative URIs constructed from their import paths.
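The reconstruction of the cdbase by following an import can be sketched as follows. The `Theory` class, the restriction to direct imports, and the theory named `my-theory` are assumptions made purely for illustration; the resulting URI follows the OpenMath convention of writing a symbol as cdbase/cd#name and does not reproduce MMT URIs.

```python
from dataclasses import dataclass, field


@dataclass
class Theory:
    """Hypothetical, much simplified model of a theory in a theory graph."""
    name: str
    cdbase: str
    imports: list = field(default_factory=list)  # directly imported theories


def symbol_uri(home, cd, name):
    """Resolve a (cd, name) symbol reference that occurs inside theory `home`."""
    for theory in [home, *home.imports]:
        if theory.name == cd:
            # OpenMath convention: cdbase/cd#name
            return f"{theory.cdbase}/{cd}#{name}"
    raise LookupError(f"theory {cd!r} is not imported into {home.name!r}")


arith1 = Theory("arith1", "http://www.openmath.org/cd")
mine = Theory("my-theory", "http://example.org/theories", imports=[arith1])

# The reference gives only cd="arith1" and name="plus"; the cdbase is
# taken from the imported theory.
print(symbol_uri(mine, "arith1", "plus"))
# http://www.openmath.org/cd/arith1#plus
```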
OMDoc 1.2 comes with an elaborate framework for presenting semantic markup in human-readable formats ([Koh06b, chapter 19], see also section 2.2.5). As will be outlined in sections 2.3.4 and 3.1.1, the OMDoc 1.2 syntax and implementation for defining presentation will be replaced by a more modern approach, to which I have contributed. OMDoc 1.2 allows for using metadata from a fixed vocabulary. This will be discussed in detail in section 2.4.4.2, and a new, extensible RDFa-based metadata framework will be presented in section 2.4.4.3. A first usable implementation of the new notations was released in 2008 (cf. [KMR08]), and the core of the new metadata framework was settled in mid-2009; both are already being used. However, the effort of redesigning the formal core of OMDoc and formally specifying the pragmatic→strict translation, both features planned for OMDoc 1.6, was still ongoing. Therefore, the core OMDoc developers decided to release an intermediate version: OMDoc 1.3 is essentially OMDoc 1.2, but with the new pattern-matching notations and the new RDFa metadata framework. The old metadata syntax will be retained as a pragmatic syntax, but its translation into the new, strict syntax will not yet be given formally, only in natural language.
BegOP(2)
• Different levels of formality:
  – “→” elaborate informal sketch into fully formalized representation
  – “←” add informal explanations to full formalization
  – literate programming (example: listing 2.7)
  – seamless transitions
EndOP(2)
ActiveMath variant forked off OMDoc 1.1 with different notations and metadata [Lib; Man+06]
2.3.4 Notation Definitions in MathML, OpenMath, and OMDoc
Many state-of-the-art knowledge representations for mathematics allow for defining notations of symbols. There is not yet a single standard for defining notations that satisfies all requirements for presenting mathematics on the web. In the absence of a consensus among the developers of different approaches to formula rendering¹⁶, the MathML specification does not prescribe any particular language for defining notations; it does not even define default notations for its standard symbol vocabulary, i. e. there is no default mapping from pragmatic Content MathML to Presentation MathML.

² Old Part: integrate
¹⁶ Even in the research group of the author, which is responsible for maintaining the OMDoc standard, there are currently two competing approaches: pattern matching (cf. section 2.3.4.2), specified for OMDoc 1.3 and implemented in the JOMDoc library, and a declarative syntax (cf. section 2.3.4.3), specified for MMT and implemented in the MMT library.

In the following, I will review the existing approaches to defining notations, mainly those that focus on rendering. There are, however, also approaches that support the opposite direction, i. e. parsing human-readable/writable input into a functional, content-oriented representation, or approaches that support both directions. Defining notations in XSLT, as well as some of the pattern-matching and declarative approaches, and some ways of handling operator precedences are state of the art. Our research group has made significant contributions beyond the state of the art, which will also be mentioned below.

2.3.4.1 XSLT
Notations for symbols are needed for rendering documents, and rendering algorithms for mathematical knowledge represented in XML (which this thesis deals with) have often been implemented in XSLT (cf. section 2.1.2.2). Therefore, notations have traditionally been directly defined in XSLT (see, e. g., listing 2.5).

Listing 2.5: An XSLT template for rendering the arith1#divide symbol of OpenMath

The advantages of XSLT are its expressivity and the fact that it is a widely accepted standard for XML→XML transformations, for which many efficient implementations exist. However, XSLT has been found hard to maintain and far away from the practice of mathematicians introducing symbols. Moreover, it is a Turing-complete programming language and thus much more expressive than would be required for the content→presentation transformation. The semantics of XSLT is specified informally [Kay07], but there is a formal semantics of its sublanguage XPath and the closely related XQuery language [Dra+07]. Obviously, either semantics has been designed for general XML languages and therefore does not take the specific characteristics of mathematical notation into account.
Many proposals have been made for notation definition languages that address these disadvantages of XSLT; Manzoor et al. review some of the early ones [Man+06]. There are basically two alternatives, pattern matching and declarations, which will be discussed in the following sections.

2.3.4.2 Pattern Matching
When following a pattern matching approach, one gives a pattern of content markup and defines a fragment of presentation markup with placeholders for structures matched by the pattern. The presentation markup fragment is similar to the body of an XSLT template, as shown above. The pattern matching, however, is usually not done in a query language like XPath above, but using literal XML, as in listing 2.6. This is the syntax that the JOMDoc renderer (cf. [Jom]) accepts and that will be part of OMDoc 1.3. It replaces the possibility to embed XSLT fragments or to use a simplified XSLT-like language, which was given in OMDoc 1.2 (cf. [Koh06b, section 19.2]). The OMDoc 1.3 syntax is inspired by Isabelle’s syntax for defining mixfix operators¹⁷ [Wen+09, chapter 7], as well as a syntax that had earlier been developed for the ActiveMath variant of OMDoc [Man+06]. For an abstract syntax closely related to the concrete syntax used in OMDoc 1.3, a formal semantics has been specified by mapping notation definitions to a formal rendering algorithm [KMR08].

Listing 2.6: A pattern matching notation definition for the arith1#divide symbol of OpenMath

2.3.4.3 Declarative Notation Definitions
Declarative notation definitions are well known from programming languages that allow for defining custom operators, as well as from formal mathematical languages. In the former case, they are only used for making the source code more readable and editable (e. g. in SWI-Prolog; cf. [Wie, section 4.24]), whereas in the latter case they are additionally used for rendering human-readable output (e. g. in Isabelle; cf. [Wen+09, chapter 7]). In mathematical markup, they have, to the best of my knowledge, first been introduced with the presentation module of OMDoc 1.2 (cf. [Koh06b, section 19.3]) and the QMath OpenMath/OMDoc preprocessor [GP06a]. In OMDoc, they are only used for rendering. QMath originally used them for parsing, but the QMath-based Sentido formula editor also uses them for rendering (cf. section).
Instead of giving content markup patterns, declarative notation definitions semantically describe the role in which an operator occurs, the most frequent ones being a constant without arguments, the application to arguments, or, as a variant of application, the occurrence as a binder for variables in a subterm. Instead of giving presentation markup fragments, they describe presentational properties of the operator: the presentational symbol, its fixity (prefix, postfix, or infix), and a set of properties governing bracket elision, i. e. when brackets around subterms are redundant. The latter will be covered below, as they are also used in pattern matching notation definitions. An evolution of the declarative syntax of OMDoc 1.2 has been implemented as a part of the MMT language, the formal core of the upcoming OMDoc 1.6 (cf. section 2.3.3); a semiformal semantics of these notation definitions is specified in [Rab09].

¹⁷ A mixfix operator is the general case of an n-ary operator, which renders as an alternating sequence of – possibly all different – symbols and its arguments. Well-known examples are the typing judgment $\Gamma \vdash t : \alpha$ and the if–then–else operator.

2.3.4.4 Pattern Matching vs. Declarations
Declarative notation definitions are structurally similar to the declaration of formal properties of symbols and can be given in the same run when introducing new symbols, whereas pattern-based notation definitions are structurally similar to concrete occurrences of symbols in content or presentation markup. An author can copy fragments of content and presentation markup that already exist in documents, and simply link them together by constructs for matching (here: expr) and rendering subterms (here: render). Declarative notation definitions are concise and minimally redundant: Suppose that an operator has the same appearance in both the constant and the application role (which is the case for most operators). Then, the definition of the notation for the application role can focus on fixity and bracketing but refer to the “constant” notation definition for determining how the operator symbol itself is to be rendered. On the other hand, there are notations that do not fit into the strict patterns of declarations, most prominently non-compositional notations like $\sin^2 x$ for $(\sin x)^2$, which cannot be handled by composing the respective notation definitions for the sine and power operators, but require pattern matching.
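Why a non-compositional notation calls for pattern matching can be illustrated with a minimal sketch. It uses a made-up encoding of content terms as nested Python tuples and a made-up list of rules tried in order (most specific first); it is not the OMDoc 1.3 or JOMDoc syntax, and the class `Var` and the functions `match` and `render` are hypothetical names.

```python
class Var:
    """Placeholder in a pattern; matches any subterm and binds it to a name."""
    def __init__(self, name):
        self.name = name


def match(pattern, term, env):
    if isinstance(pattern, Var):
        env[pattern.name] = term
        return True
    if isinstance(pattern, tuple):
        return (isinstance(term, tuple) and len(term) == len(pattern)
                and all(match(p, t, env) for p, t in zip(pattern, term)))
    return pattern == term


# (content pattern, presentation template with placeholders)
RULES = [
    (("power", ("sin", Var("x")), 2), "sin^2 {x}"),   # non-compositional case
    (("power", Var("b"), Var("e")), "({b})^{e}"),
    (("sin", Var("x")), "sin {x}"),
]


def render(term):
    for pattern, template in RULES:
        env = {}
        if match(pattern, term, env):
            return template.format(**{k: render(v) for k, v in env.items()})
    return str(term)   # leaves: variables and numbers


print(render(("power", ("sin", "x"), 2)))   # sin^2 x
print(render(("power", ("sin", "x"), 3)))   # (sin x)^3
```

The first rule inspects the structure of an argument of the power operator, which is exactly what a purely declarative definition for the power operator alone cannot express.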
The use of content markup leads to further, less obvious cases of non-compositionality: Suppose we introduced the one-dimensional integral as an operator that binds one variable and takes as arguments a set and a lambda abstraction representing a function in this variable, e. g. ∫_S λx. f(x) dx. Even though this subsumes the concept of an integral over an interval, mathematicians usually do not write the latter as ∫_[a,b], but as ∫_a^b, and they would not write the function as λx. f(x) but simply as f(x). A notation definition for that case depends on the “set” argument being an interval and thus is non-compositional. Ultimately, OMDoc 1.6 will feature a unified notation definition syntax that allows for using both declarations and patterns, each where appropriate. The more expressive pattern-based syntax will be the strict syntax, whereas the declarative syntax will be the pragmatic one (cf. section 2.3.3).
2.3.4.5 Brackets and Operator Precedences
Both declarative and pattern matching notation definitions usually control the redundancy of brackets in the same declarative way. Brackets around a subterm are redundant when its constructing operator binds stronger than the operator of the enclosing term (consider ax + y vs. (ax) + y).18 The relative binding strength of operators can be elegantly modeled as a partial order on operators (e. g. multiplication binds stronger than addition) and sets of operators (e. g.
18 The inner operator in this case is the invisible times operator. This is not an elision, but a rendering for the multiplication operator that merely results in a small amount of whitespace.
maybe move to editing; say later (with backref here) that an editor should support this. Comprehensible, or add an example?
2 Representing Semiformal Mathematical Knowledge
arithmetical operators bind stronger than logical operators). Autexier et al. have implemented such a partial order in their extension of the TeXmacs scientific editor [Aut+07]. All other known implementations simplify this model to a total order on operators using numeric precedence values, which is both easier to implement and less prone to the accidental introduction of cycles, which would make a partial order collapse locally; therefore I will assume a total order from now on. Whenever the operator g constructing the subterm g(b1, . . . , bm) binds stronger than the operator f of the enclosing term f(a1, . . . , an) binds its argument ai = g(b1, . . . , bm), brackets around the inner subterm are redundant. Binding strength is determined by comparing the numeric value of the i-th input precedence of f to the numeric value of the output precedence of g, where our algorithm assumes low numeric values for strong binding.19
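The precedence comparison described above can be illustrated by a minimal sketch. It is not the algorithm of [KLR07]; the class names, the restriction to binary infix operators, and the concrete precedence values are assumptions made for illustration. Brackets are emitted only when the output precedence of the inner operator is numerically greater (i. e. weaker) than the input precedence of the argument position it occupies, which is how footnote 19 treats the right-associative function type constructor.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Op:
    """A binary infix operator with hypothetical precedence values."""
    symbol: str
    output_prec: int              # precedence of a term built with this operator
    input_precs: Tuple[int, int]  # precedence expected at each argument position


@dataclass(frozen=True)
class App:
    """Application of a binary operator to two subterms."""
    op: Op
    left: "Term"
    right: "Term"


Term = Union[str, App]  # a plain string stands for an atomic term


def render(term: Term, input_prec: Optional[int] = None) -> str:
    """Render a term, bracketing it only where the enclosing operator requires it."""
    if isinstance(term, str):
        return term  # atoms never need brackets
    left = render(term.left, term.op.input_precs[0])
    right = render(term.right, term.op.input_precs[1])
    text = f"{left} {term.op.symbol} {right}"
    # Low values mean strong binding: brackets are needed only if this subterm
    # binds more weakly than the argument position of the enclosing operator.
    if input_prec is not None and term.op.output_prec > input_prec:
        return f"({text})"
    return text


# Assumed values: the function type constructor has output precedence p = 10,
# input precedence p - 1 for the first and p for the second argument.
ARROW = Op("→", 10, (9, 10))
PLUS = Op("+", 5, (5, 5))
TIMES = Op("*", 2, (2, 2))

print(render(App(ARROW, "A", App(ARROW, "B", "C"))))  # A → B → C
print(render(App(ARROW, App(ARROW, "A", "B"), "C")))  # (A → B) → C
print(render(App(PLUS, App(TIMES, "a", "x"), "y")))   # a * x + y
print(render(App(TIMES, "a", App(PLUS, "x", "y"))))   # a * (x + y)
```

Giving each argument position its own input precedence is what lets a single comparison handle associativity; with only one precedence per operator, A → (B → C) and (A → B) → C could not be distinguished.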
2.3.5 MathLang
MathLang [KWZ08] offers “an approach for computerising mathematical texts which is flexible enough to connect the different approaches to computerisation, which allows various degrees of formalisation, and which is compatible with different logical frameworks (e. g., set theory, category theory, type theory, etc.) and proof systems”. It is listed in the XML section for technical reasons, as it has an XML encoding that is used for most processing tasks. Due to its verbosity, however, the XML encoding is not used for authoring and presentation (see below). MathLang starts with informal, but highly conventionalised mathematical vernacular and allows for annotating mathematical symbols and statements as well as logical and rhetorical structures in text [KWZ08]. This is similar to OMDoc, but MathLang puts an even higher emphasis on a stepwise formalisation of mathematical vernacular that does not disrupt the way mathematicians would commonly write things down. MathLang does not facilitate reuse by modularity. It can be translated into full formalisations comprehensible to several different proof assistants, but does not support heterogeneity within one document.
All of these languages are used for publishing mathematics on the web, for data exchange among computer algebra systems and proof assistants (the mathematical counterparts to semantic web reasoners), and in computer-based education.
19 Simple rendering algorithms assume one numeric precedence per operator. More sophisticated algorithms, like ours, distinguish between the output precedence of an operator and the input precedence per argument of an operator. This allows for an elegant handling of left and right associativity. The function type constructor →, for example, is right associative, which means that the function type A → (B → C) can be written as A → B → C, which is different from (A → B) → C, as the operator is not fully associative. This mode of bracket elision can be controlled by giving that operator an output precedence of p, an input precedence of p − 1 in the first argument, and an input precedence of p in the second argument. When the first argument is another function type, we get output precedence = p > p − 1 = input precedence and thus have to bracket the first argument. A second argument of output precedence p does not have to be bracketed. See [KLR07] for details.
aspects: CGa: step, expression; weak types “the goal of CGa’s type system is not to ensure full correctness, but merely to check whether the reasoning parts of a document are coherently built in a sensible way.” TSa: natural language ↔ symbolic formulas, souring texmacs for authoring and presentation
2.3.6 CNXML
BegOP(3)
collXML indeed offers some highly relevant features. As with CNXML, I’m not so impressed about how they have done certain things, but let me provide some short comments in random order while skimming the spec (https://trac.rhaptos.org/trac/rhaptos/wiki/TitleIndex then anything that looks like “collection”):
• package structure: They have thought about how to package a document plus related files (images, additional metadata) into a ZIP archive – interesting
• licensing information by an unstructured XML comment – ugh :-(
• well thought-out exporting/generating/flattening documents
• they used to version collections independently from the documents they contain and therefore ran into trouble. No problem with an svn-like whole-repository versioning
• RDF: apparently they represent certain information about collections in RDF – will find out which ones. Nice how they generate RDF (using the Python macro language Tal), but its use is limited: they generate RDF/XML only, just the collection metadata and outline, and apparently it’s only used for export.
• PDF: They seem to have PDF export implemented, or at least good ideas for it
• version history markup generated on export, however, again a hard-coded vocabulary for this
• featured links: nice idea, but these should rather be generated by the system, based on user preferences
EndOP(3)
2.3.7 LATEX
(La)TeX is largely a presentation-oriented markup language from my point of view. It offers most semantic constructs on the statement level, such as \begin{definition} . . . \end{definition} and other similar environments that directly correspond to OMDoc. On the theory level, it does not offer any semantic markup, and on the object level, it offers some semantic markup, like \frac{num}{den} or \binom{n}{k}, but a lot of markup is presentation-oriented. This is partly due to bad practice of users (e. g. not realizing that fun should not be used to denote a function named “fun”, as it is interpreted as f ⋅ u ⋅ n; not using the AMS packages that offer more semantic macros; general laziness to produce semantically correct code when it just looks right), but to a larger extent due to missing semantic macros. There is, for example, no explicit way of putting an “invisible times” operator. Example: What does O(n² + 1) – in LaTeX syntax: O(n^2+1) – mean?20
format more nicely
@MK/HS: anything more concrete from arXMLiv? Others?
Landau symbol: the set of all functions that asymptotically grow at most as fast as n², where the +1 is actually superfluous
Function application: the application of some function named O – which need not be the Landau set constructor function – to n² + 1
Invisible times: O (e. g. some variable) multiplied with n² + 1, where the multiplication operator is invisible
Thanks to its macro-processing abilities, semantic extensions to (La)TeX are easy to define. One notable extension that does not target mathematics is SALT (Semantically Annotated LaTeX [Gro+07]), which allows for marking up rhetorical structures and claim citations.
connect to section 2.2.2
2.3.7.1 sTeX
sTeX is a semantic LaTeX extension for mathematical documents developed by Kohlhase [Koh08d]. It is essentially a TeX syntax for OMDoc, with a few minor differences mainly owed to a loose alignment of the two development roadmaps. It is intended as an “invasive technology” in the sense that it allows for gently migrating non-semantic (La)TeX documents into semantically structured documents, and that it brings OMDoc to users of “legacy” tex(t) editors, which is successfully demonstrated by an sTeX-aware extension of the AUCTeX Emacs mode (cf. section 3.3.1.1). In a process called “semantic preloading”, the author of an sTeX document defines semantic macros for the mathematical symbols to be used, which expand into presentational TeX. The following snippet introduces a symbol with one argument [Koh08d]:
\symdef{uminus}[1]{\prefix{-}{#1}}
For more complex operators, sTeX has its own declarative notation definition macros. In contrast to a mere LaTeX \newcommand, sTeX’s symbol definitions are scoped to theories (called “modules” in sTeX). Beyond providing a semantically structured input syntax and rendering presentational TeX, they also understand operator precedences and thus precedence-based bracket elision (cf. section 3.6.4.2). For statement, theory, and document level markup, sTeX predefines a large collection of macros that closely correspond to their OMDoc counterparts. The LaTeXML TeX→XML converter is used to generate OMDoc from sTeX. For each (s)TeX package, a LaTeXML binding is provided – a set of Perl declarations or functions that map TeX macros to XML elements, in this case sTeX macros like \symdef, or the statement-, theory-, and document-level macros, to OMDoc elements. The resulting XML document is a mixture of LaTeXML’s internal XML language, which is still close to the TeX input, and the desired output
3 Old Part: integrate
20 adapted from a presentation given by Bastian Laubner
language (here: OMDoc). By a further post-processing step, which can be implemented in XSLT, the document is fully transformed to the desired output language. This conversion path works reliably on Michael Kohlhase’s lecture notes, a collection of 1800 sTeX modules introducing 2171 symbols.21 The reverse direction, however, has not been implemented; see section 3.3.1.2 for a detailed elaboration.
2.3.8 RDF
Marchiori has proposed a straightforward approach of representing MathML formulæ in RDF, using the above-mentioned ordered sets [Mar03]. Compared to MathML, where one gets the ordered-tree structure for free, authoring the corresponding RDF graph is quite cumbersome, as one has to make more of the structure explicit. The advantage, however, is, as Marchiori points out, that RDF allows for making references to bound variables much more explicit than the XML syntax of MathML. One does not gain more formal semantics, though. Marchiori does not back his RDF representation by a formal ontology but directly reuses the URIs of the MathML XML namespace, for which no formal semantics has been specified. Even if one tried to implement such an ontology, its utility would be limited: The common RDF-based ontology languages (like OWL) rely on decidable subsets of first order logic, whereas large areas of mathematics require full first order logic or even higher order logic. Therefore, one would still need reasoners (then rather called “automated theorem provers”) for these logics, which usually do not accept RDF-encoded input. Many of them do, however, support OpenMath and thus MathML. To the best of my knowledge, Marchiori’s RDF representation of mathematical formulæ has never been adopted. In two other efforts of developing ontologies for mathematical knowledge, the responsibility for utilizing the full semantics of formulæ was left to specialized tools, such as computer algebra systems or automated theorem provers. Certain retrieval, matching, and other management jobs for formulæ can, however, be performed on the RDF level. Two approaches are known that do not fully represent formulæ in RDF but focus on symbols in relevant positions and use a different XML-based representation otherwise.
In the MONET problem ontology, there is a property problem:openmath_head of the problem:Problem class, which explicitly represents the symbol in the head position of an OpenMath formula in RDF, i. e. the operator or constructor at the root of the functional representation of a formula as a tree. That was found to sufficiently describe the computational problem represented by the formula, whereas the rest of the formula is only represented in OpenMath XML. For example, a formula with the oms:calculus1#defint head symbol represents a definite integration problem [CDT04]. The HELM system generates from an original formalized representation in a non-XML language (cf. section 2.3.10) both an XML representation and a stand-off RDF graph containing a structural outline of properties that are relevant for searching. For any object (here: a definition, axiom, theorem, or proof, for example), occurrences of references to other objects (e. g. definitions of symbols used in an axiom) are represented. For each occurrence, its position in the object is represented; certain exposed positions have been found useful for query answering [Sch02; GSC03]. Given the example theorem ∀a.∀b.∀c.a ≤ b ∧ b ≤ c ⇒ a ≤ c, these are:
h:MainHypothesis: the head symbol of the hypothesis; here: ∧
h:InHypothesis: any other symbol anywhere else in the hypothesis; here, e. g., ≤
h:MainConclusion: the head symbol of the conclusion; here: ≤
h:InConclusion: any other symbol anywhere else in the conclusion; here: nothing else
h:InBody: any symbol in the proof of the theorem
21 figures of December 1, 2009
BegOP(4)
For supporting editing and navigation, shallow links to symbols already prove useful: for computing dependent CDs (CDUses; see above), and for reflecting changes to notation definitions, as we will show in section ??. RDF(a) on stmt/thy level; presentation trivial = XHTML. No RDFa specified for MathML
EndOP(4)
RDFa [Adi+08]: a standard for flexibly embedding metadata into X(HT)ML documents; challenged by HTML5 microdata [TL09]
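The exposed positions that HELM records for the example theorem ∀a.∀b.∀c. a ≤ b ∧ b ≤ c ⇒ a ≤ c can be approximated by the following sketch. It is not HELM’s implementation, which operates on its own term language; the tuple encoding of terms, the symbol names, and all function names are assumptions made for illustration. h:InBody is omitted because the example carries no proof.

```python
from typing import Dict, Optional, Set, Tuple, Union

# Hypothetical encoding: a term is a variable name (str) or a tuple
# (symbol, argument, ...). Quantifiers are ("forall", variable, body).
Term = Union[str, Tuple]


def symbols(term: Term) -> Set[str]:
    """All symbols occurring in a term, including its head."""
    if isinstance(term, str):
        return set()
    found = {term[0]}
    for arg in term[1:]:
        found |= symbols(arg)
    return found


def head(term: Term) -> Optional[str]:
    """The symbol at the root of a term, or None for a variable."""
    return None if isinstance(term, str) else term[0]


def below_head(term: Term) -> Set[str]:
    """All symbols occurring strictly below the root of a term."""
    found: Set[str] = set()
    if not isinstance(term, str):
        for arg in term[1:]:
            found |= symbols(arg)
    return found


def exposed_positions(theorem: Term) -> Dict[str, Set[str]]:
    """Split a theorem into hypothesis and conclusion and collect symbol positions."""
    body = theorem
    while not isinstance(body, str) and body[0] == "forall":
        body = body[2]  # skip the quantifier prefix
    if not isinstance(body, str) and body[0] == "implies":
        hypothesis, conclusion = body[1], body[2]
    else:
        hypothesis, conclusion = None, body
    positions = {
        "h:MainHypothesis": set(),
        "h:InHypothesis": set(),
        "h:MainConclusion": {head(conclusion)} - {None},
        "h:InConclusion": below_head(conclusion),
    }
    if hypothesis is not None:
        positions["h:MainHypothesis"] = {head(hypothesis)} - {None}
        positions["h:InHypothesis"] = below_head(hypothesis)
    return positions


THEOREM = ("forall", "a", ("forall", "b", ("forall", "c",
           ("implies",
            ("and", ("leq", "a", "b"), ("leq", "b", "c")),
            ("leq", "a", "c")))))

for position, found in exposed_positions(THEOREM).items():
    print(position, sorted(found))
```

For the example, the sketch reports "and" as the main hypothesis symbol, "leq" inside the hypothesis and as the main conclusion symbol, and nothing else in the conclusion, matching the list above.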
2.3.9 Formats for Technical Manuals
OPS, DocBook, DITA
2.3.10 Formats for Formalized Mathematics
Flyspeck? mention at least Isabelle, as we mention it elsewhere. Isar’s literate programming, embedding LaTeX, however only for presentation
2.4 Making Semiformal Markup Semantic
Having reviewed structures of mathematical knowledge in section 2.2 and languages for representing semiformal aspects of mathematical knowledge in section 2.3, I will now deal with giving them a stronger semantics. While some of the presented languages have a strong semantics, it only applies to the logical and functional structures. Where contemporary applications have support for other structures, it is mostly hard-coded and based on human-readable informal specifications, such as [Koh06b; Bus+04]. Consider, for example, OpenMath: According to the specification, OpenMath objects receive their semantics after translating them to the language of a computer algebra system or automated theorem prover using a phrasebook, but several immediate semantics for OpenMath objects have also been proposed [KR09; Str04]. However, for the OpenMath Content Dictionary language, no formal semantics has been specified at all. Even though a subset of it is compatible with Dublin Core, this has not been specified. Or consider OMDoc: While OMDoc is more expressive and thus more self-describing than OpenMath, the semantics of mathematical statements and theories is only established by translation to languages having a native formal semantics, usually based on model or type theory. In OMDoc 1.6, more of this semantics can be made explicit by way of the logical meta-language MMT, which allows for modeling theories within a logical framework.
4 Old Part: integrate
right?
However, this only applies to the formal MMT core of OMDoc. It neither applies to rhetorical structures, nor document structures, nor to metadata. Languages for formalized mathematics have a strong formal semantics for logical and functional structures, but as these languages are usually committed to a particular logical foundation, one does not benefit from this semantics in cross-system settings. Where translations exist, they have usually been hard-coded for a pair of two specific languages or logics (cf. [Rab08, section 1.1.3.3]). While some languages for formalized mathematics also support document structures and metadata, they do not specify a semantics for them. The need for a formal semantics for the non-logical structures of mathematical knowledge is apparent, but how can it be satisfied? The scenario of a mathematical semantic web outlined in chapter 1 implies the following requirements for a solution:
maybe more specific section
1. All structures of mathematical knowledge have to be covered – particularly those whose description is usually embedded into mathematical knowledge items. An integration both with the strong formal semantics that often already exists for the logical structures, and with the environment of the mathematical knowledge items (cf. section 2.2.7) should also be facilitated.
2. A description of these structures should be possible across representation languages; it should be integratable into any representation language.
3. Authors of mathematical documents should not have to change their workflows too much; the languages and tools they are used to should still be supported.
The solution I pursued and will present in the following subsections allows authors to continue using existing markup languages, but gives them additional semantics based on RDF and ontologies. The RDF data model is sufficiently flexible for modeling all structures of interest and enabling interlinking of knowledge items to their environment, as well as across applications. RDF graphs can be obtained from structured markup (e. g. XML) by translation, and, conversely, they can also be embedded into markup (e. g. as RDFa), to allow for modeling structures that the original markup language does not natively support. The RDF graphs are given their semantics by ontologies that model the relevant structures of mathematical knowledge. This general approach has been successfully pursued by others before: Cruz and Xiao show how interoperation between instances (i. e. documents) of different XML schemas, which represent data with similar semantics, can be enabled [CX05]: The structures of each document are represented in RDF, and the RDF graphs are integrated via a common ontology.
They showed how this particularly facilitates a uniform representation of metadata, global conceptualization, and writing high-level queries (whose “formulation does not require awareness of particular source schemas”).22 Concerning representation formats, I will focus on XML languages. Most of the formats reviewed in section 2.3 are based on XML anyway, and most of the remaining ones can be translated to XML by existing tools. In section 2.3.8, I mentioned a direct representation of semiformal mathematical knowledge in RDF. However, I will still stick to XML markup as a more
22 The details of how Cruz and Xiao achieved these goals differ from the approach that I chose, but the general approach of translating XML to RDF and developing ontologies for the relevant semantic structures is the same.
view this as an ontology engineering methodology, compare SIOC ME
natural format for editing and presenting mathematical knowledge in a document-oriented way and also rely on the fact that many more MKM domain experts are familiar with XML than RDF. (Even for knowledge that is not strictly document-oriented, it may pay off to develop a concise XML syntax; compare the direct XML serialisation of OWL [MPPS09] to its RDF/XML serialisation.) I will first introduce the ontologies that implement the structural semantics of mathematical knowledge in an RDF-compatible way in section 2.4.1, then describe how I extract RDF from semantic markup in section 2.4.2. By applying a variant of this XML→RDF extraction, OMDoc can be turned into a language for implementing ontologies (cf. section 2.4.3). This allows for extending and refining our supply of ontologies in a self-contained way, without leaving the familiar languages for representing mathematical knowledge and the tools that support them. Thus provided with a rich and extensible collection of ontologies, we still have to be able to make use of it in mathematical documents. In section 2.4.4, I will present a new way of embedding arbitrary additional metadata into mathematical documents beyond those natively supported by the respective markup language. I conclude with remarks on preserving semantic structures in published documents (section 2.4.5), which will be required for the interactive services presented in chapter 3.
BegOP(5)
Grüninger:
• Logical languages have both a formal syntax and a model-theoretic semantics (e. g. RDF, OWL, Common Logic)
• semiformal languages have a formal syntax but lack a model-theoretic semantics (e. g. XML, EXPRESS)
• numerous ontologies whose terms and definitions are specified only in natural languages
@MOLE: OOR mostly on ontology-level metadata, we are fine-grained; EXPRESS doesn’t have a formal semantics?!
related: TEI, CIDOC CRM: [OE09]
EndOP(5)
2.4.1 Ontologies
Having introduced the structures that mathematical knowledge can have, or that occur in the environment around mathematical knowledge, in section 2.2, I will now present ontologies for representing these structures. Representations for the logical and functional structures of mathematical knowledge already exist in most of the languages introduced in section 2.3, with a more or less formal semantics. As mentioned above, this semantics is usually established by translation to languages having a native formal, model- or type-theoretical semantics, whereas for some (sub-)languages, such as the content dictionary language of OpenMath, it is only specified informally. There have been few attempts at an ontology-based semantics for the logical and functional structures; therefore, I developed a new one (cf. section 2.4.1.1). The semantics of symbol notation definitions, which are not strictly part of the logical structure, is commonly specified in a similar way – either formally by giving functions or algorithms equivalent to the notation definitions, or informally in a manual. However, as the services presented
5 Old Part: use somehow?
in chapter 3 will mainly rely on a semantic description of structures of mathematical knowledge for applications like navigation, validation, searching and querying, supporting user interaction, and knowledge base management, I will not consider the algorithmic semantics of notation definitions in further detail, but instead focus on aspects like the relation of notation definitions to symbols and top-level properties of notation definitions, such as fixity and arity. These aspects will be modeled in the same way as for logical structures and thus also be covered in section 2.4.1.1. Rhetorical structures and document structures have often been specified informally but also been implemented as ontologies in a few cases, often covering both kinds of structures at once (cf. section 2.4.1.2). Metadata usually have a direct RDF semantics; many metadata vocabularies have been specified informally but also implemented as ontologies (cf. section 2.4.1.3), which are directly reusable for our purposes.
general remarks on reasoning complexity: transitive closures: try once with DL-lite simplification of the ontology (on realistic database, e. g. OM). Maybe it only performed badly because of OWL-DL, or because of Pellet? logical programming, e. g. as in TRIPLE (http://triple.semanticweb.org) or KAON2 (http://kaon2.semanticweb.org) as an alternative; OWA vs. CWA, local closed world
2.4.1.1 Logical and Functional Structures
Earlier ontologies of logical structures of mathematical knowledge have been defined in the HELM, MoWGLI and MKM-NET projects but are no longer in use. I will review them and then present my own redesigned ontology, reusing features from the former ontologies as appropriate.
review HELM, MoWGLI, MKM-NET [GMA03]. For informal mathematical structures there is nothing so far – to what extent is MathLang related? – quote from [KWZ08]: “At least one choice of degree of formality should be both inexpensive and useful”
OMDoc
Considering that the OMDoc language (cf. section 2.3.3) has the desired expressivity for representing arbitrary logical structures of mathematical knowledge, I based the ontology for logical structures on OMDoc. The basic vocabulary of the ontology is directly derived from the RELAX NG XML schema for OMDoc. Then, abstractions from the XML syntax were added, capturing more of the semantics of the respective logical structures. The human-readable OMDoc specification was taken as the normative source of the intended semantics. The hierarchy of concepts and relations was initially modeled using RDFS classes and properties, but then enhanced by certain OWL features that were deemed useful for capturing further semantic aspects. In a first step, concepts were introduced for most of the elements on the statement and theory levels. (I will not explain their intended mathematical semantics in detail but instead refer to the informal descriptions in the OMDoc specification.) Then, they were grouped into a hierarchy as shown in table 2.1. Abstract superclasses were added, following the OMDoc specification in how it groups related concepts. Subclasses were added where elements can be specialized further by @type attributes. A different approach was chosen to capture OMDoc’s support of flexible degrees of formality. On statement level, a distinction is usually made between a formal statement-type element and
an informal omtext[@type=‘statement-type’] element. Within statements, formal mathematical properties (FMP) are distinguished from informal ones (CMP); these element names are borrowed from OpenMath (cf. section 2.3.2). On the object level, which is not covered here, there can be formal content markup or informal presentation markup. All of these can be combined flexibly; for example, the outer structure of a proof can be given as a “formal” proof element, whereas its steps can be “informal” omtext elements, which can contain content- or presentation-markup objects. In the ontology, structures with the same semantics are represented by the same class, regardless of their degree of formality. The latter is modeled by a property formalityDegree whose range is FormalityDegree; so far, this class has the three instances Informal, Formal, and Computerized, the latter of which is currently only used for completely formalized proof objects. Thus, an informal definition, written as an omtext element with @type=‘definition’ in OMDoc, will be represented as follows in RDF:
a oo:Definition ;
oo:formalityDegree oo:Informal .
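The mapping from OMDoc elements to ontology classes and formality degrees can be sketched as follows. This is only an illustration under stated assumptions, not the extraction procedure of section 2.4.2: the function names and the subject URI are hypothetical, the lookup tables cover only an excerpt of table 2.1, and XML namespaces are simply stripped.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Excerpt of table 2.1: @type values of assertion elements and their classes.
ASSERTION_TYPES = {
    "assumption": "AssumptionAssertion",
    "conjecture": "Conjecture",
    "corollary": "Corollary",
    "lemma": "Lemma",
    "theorem": "Theorem",
}
# Excerpt of table 2.1: @type values of informal omtext elements and their classes.
OMTEXT_TYPES = {
    "axiom": "Axiom",
    "definition": "Definition",
    "assertion": "Assertion",
    "example": "Example",
    "proof": "Proof",
    **ASSERTION_TYPES,
}
# Excerpt of table 2.1: formal elements and their classes.
FORMAL_ELEMENTS = {
    "theory": "Theory",
    "axiom": "Axiom",
    "definition": "Definition",
    "symbol": "Symbol",
    "example": "Example",
    "proof": "Proof",
}


def classify(element: ET.Element) -> Tuple[str, str]:
    """Return (class name, formality degree) for an OMDoc statement element."""
    tag = element.tag.rsplit("}", 1)[-1]  # drop an XML namespace, if present
    kind = element.get("type")
    if tag == "omtext":
        if kind not in OMTEXT_TYPES:
            raise ValueError(f"omtext type not covered by this sketch: {kind!r}")
        return OMTEXT_TYPES[kind], "Informal"
    if tag == "proofobject":
        return "Proof", "Computerized"
    if tag == "assertion":
        return ASSERTION_TYPES.get(kind, "Assertion"), "Formal"
    if tag in FORMAL_ELEMENTS:
        return FORMAL_ELEMENTS[tag], "Formal"
    raise ValueError(f"element not covered by this sketch: {tag!r}")


def to_turtle(element: ET.Element, subject: str) -> str:
    """Serialize the class and formality degree of an element as Turtle."""
    cls, degree = classify(element)
    return f"<{subject}> a oo:{cls} ;\n    oo:formalityDegree oo:{degree} ."


# Hypothetical input and subject URI, for illustration only.
informal_definition = ET.fromstring('<omtext type="definition"/>')
print(to_turtle(informal_definition, "http://example.org/doc#def1"))
```

A formal definition element and an informal omtext[@type=‘definition’] thus end up in the same class and differ only in the value of the formality degree property.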
Class
Table 2.1: Class hierarchy of the OMDoc ontology corresponding OMDoc OMDoc XML element specification section(s)
MathKnowledgeItem Theory2324 Statement ConstitutiveStatement Axiom Hypothesis
Definition Import Symbol NonconstitutiveStatement AlternativeDefinition Assertion AssumptionAssertion27
F
theory25
axiom, I omtext[@type=‘axiom’] hypothesis, I omtext[@type=‘hypothesis’] F definition, I omtext[@type=‘definition’] F imports26 F symbol F F
alternative assertion, I omtext[@type=‘assertion’] F assertion[@type=‘assumption’], I omtext[@type=‘assumption’] F F
15.6 15 15.2 15.2.2 17.1 15.2.1 15.6.1 15.2.1 15.3, 15.4 15.3.3 15.3.1
23 All classes are in the OMDoc ontology namespace; the common prefix oo: is omitted here.
24 Indentation denotes a subClassOf relationship.
25 An F superscript in front of an XML element name denotes that the corresponding RDF resource will be assigned a formality degree of Formal, C denotes Computerized, whereas I denotes Informal. Informal statement elements (omtext) are generally covered in section 14.3 of the OMDoc specification, but they have the same semantics as their formal counterparts.
26 will be renamed to import in OMDoc 1.6
27 name chosen to disambiguate from Assumption (a part of a sequent)
Table 2.1: Class hierarchy of the OMDoc ontology (continued)
Class | corresponding OMDoc XML element | OMDoc specification section(s)
Conjecture | F assertion[@type=‘conjecture’], I omtext[@type=‘conjecture’] |
Corollary | F assertion[@type=‘corollary’], I omtext[@type=‘corollary’] |
FalseConjecture | F assertion[@type=‘false-conjecture’], I omtext[@type=‘false-conjecture’] |
Formula | F assertion[@type=‘formula’], I omtext[@type=‘formula’] |
Lemma | F assertion[@type=‘lemma’], I omtext[@type=‘lemma’] |
Obligation | F assertion[@type=‘obligation’], I omtext[@type=‘obligation’] |
Postulate | F assertion[@type=‘postulate’], I omtext[@type=‘postulate’] |
Proposition | F assertion[@type=‘proposition’], I omtext[@type=‘proposition’] |
Rule | F assertion[@type=‘rule’], I omtext[@type=‘rule’] |
Theorem | F assertion[@type=‘theorem’], I omtext[@type=‘theorem’] |
Example | F example, I omtext[@type=‘example’] | 15.4
Proof | C proofobject, F proof, I omtext[@type=‘proof’] | 17
TypeAssertion | F type | 15.3.2
ProofStep | | 17.1
DerivationStep | I omtext[@type=‘derive’] |
DerivedConclusion28 | F derive[@type=‘conclusion’] |
Gap | F derive[@type=‘gap’] |
Hypothesis29 | F hypothesis, I omtext[@type=‘hypothesis’] |
ProofLocalDefinition | F definition, I omtext[@type=‘definition’] |
ProofLocalSymbol | F symbol |
ProofText | I omtext |
Property | F FMP, I CMP | 14.1, 14.2
SequentPart | | 14.2
Assumption | F assumption |
Conclusion | F conclusion |
28 name chosen to disambiguate from Conclusion (a part of a sequent)
29 also listed above as a subclass of Axiom
The properties of the ontology abstract from the XML schema to a greater extent than the classes. Most properties represent part-of relationships derived from parent-child containments in the XML tree, or they represent other relationships derived from attributes pointing to [the URIs of] other elements. Purely logical part-of relationships exist between theories and their statements, and between statements and their substatements. Other part-of relations, often but not necessarily coinciding with the logical ones, can be found in the document structure and will be covered in the following section. The primary interrelation on theory level is the import relation. On statement level, there is a great diversity of interrelations. In the XML syntax most of them are established by a single URI-valued attribute that always has the same name – @for. The intended semantics can partly be deduced from the types of statements linked that way, but I chose to make it more explicit by introducing separate properties for the @for attributes of separate elements, again relying on the informal descriptions of the respective elements in the OMDoc specification. Even if such interrelations were represented by differently-named XML attributes, their consistency, in contrast to the consistency of part-of relationships represented by parent-child containments, cannot be modeled in an XML schema language: The value of a @for attribute can only be restricted syntactically to a URI, but XML schema languages lack the expressivity to demand that the URI must be the URI of a particular element type. Another notable relationship exists between different levels of formalization of the same (sub)statement: informal statement elements, such as omtext, can point to formal ones, which they verbalize.
dependency relations (partly expressible within this ontology); transitivity
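The kind of consistency check that an XML schema cannot express, but that becomes possible once separate properties with declared ranges exist, can be sketched as follows. This is a simplified closed-world check written for illustration, not an OWL reasoner (which would rather infer types from range declarations); the function names and resource identifiers are hypothetical, and the ranges and subclass links are an excerpt in the spirit of tables 2.1 and 2.2.

```python
from typing import Dict, List, Optional, Tuple

# Excerpt of declared ranges: each of these properties must point to an Assertion.
RANGES = {"proves": "Assertion", "corroborates": "Assertion", "refutes": "Assertion"}

# Excerpt of the class hierarchy: assertion types are subclasses of Assertion.
SUBCLASS_OF = {"Theorem": "Assertion", "Lemma": "Assertion", "Conjecture": "Assertion"}

Triple = Tuple[str, str, str]  # (subject, property, object)


def is_a(cls: Optional[str], ancestor: str) -> bool:
    """Check whether cls equals ancestor or is one of its (transitive) subclasses."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def range_violations(types: Dict[str, str], links: List[Triple]) -> List[Triple]:
    """Return all links whose target is not known to be in the property's range."""
    return [
        (subject, prop, target)
        for subject, prop, target in links
        if prop in RANGES and not is_a(types.get(target), RANGES[prop])
    ]


# Hypothetical resources: a proof pointing to a theorem and, wrongly, to a definition.
types = {"#pf": "Proof", "#thm": "Theorem", "#def": "Definition"}
links = [("#pf", "proves", "#thm"), ("#pf", "proves", "#def")]
print(range_violations(types, links))
```

In the XML syntax both links would be plain @for attributes holding a URI and would validate equally well; only the typed property makes the second one detectable as suspicious.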
Table 2.2: Properties of the OMDoc ontology

Property | corresponding OMDoc XML element/attribute | domain | range | OMDoc specification section(s)
dependsOn³⁰ | | MathKnowledgeItem | MathKnowledgeItem |
hasPart | | MathKnowledgeItem | MathKnowledgeItem |
formalityDegree³¹ | | Statement ⊔ Property ⊔ ProofStep | FormalityDegree |

30 All classes are in the OMDoc ontology namespace, commonly prefixed oo:.
31 See the I, F, and C annotations in table 2.1.
sell this advantage of an ontology elsewhere, too
2 Representing Semiformal Mathematical Knowledge
Table 2.2: Properties of the OMDoc ontology (continued)

Property | corresponding OMDoc XML element/attribute | domain | range | OMDoc specification section(s)
verbalizes = formalizes⁻¹ | @verbalizes | MathKnowledgeItem | MathKnowledgeItem | 14.3, 14.4, 14.6
imports³² = importedBy⁻¹ | theory/imports/@from³³ | Theory | Theory | 15.6.1, 18.1
homeTheory = homeTheoryOf⁻¹ ³⁴ | parent::theory | Statement ⊔ Theory ⊔ Document ⊔ DocumentUnit³⁵ | Theory | 15.6.1, 18.1
hasImport³⁶ | theory/imports | Theory | Import | 15.6.1, 18.1
importsFrom | imports/@from | Import | Theory | 15.6.1, 18.1
defines = hasDefinition⁻¹ | definition/@for³⁷, omtext[@type='definition']/@for, CMP//term[@role='definiendum'] | Definition ⊔ Property ⊓ formalityDegree∶Informal | Symbol | 15.2.1, 15.2.4
exemplifies = exemplifiedBy⁻¹ | example/@for, omtext[@type='example']/@for | Example | Symbol ⊔ Definition ⊔ Axiom ⊔ Assertion | 15.4
corroborates = corroboratedBy⁻¹ | example[@type='for']/@for | Example | Assertion | 15.4
refutes = refutedBy⁻¹ | example[@type='against']/@for | Example | Assertion | 15.4
proves = provedBy⁻¹ | proof/@for, omtext[@type='proof']/@for | Proof | Assertion | 17.1

32 subproperty of dependsOn
33 will be renamed to import in OMDoc 1.6
34 subproperty of hasPart
35 TODO see document ontology
36 subproperty of homeTheoryOf
37 TODO omdoc 1.6
Table 2.2: Properties of the OMDoc ontology (continued)

Property | corresponding OMDoc XML element/attribute | domain | range | OMDoc specification section(s)
hasStep³⁸ | proof/* | Proof | ProofStep | 17.1, 17.2
justifiedBy | derive/proof, derive/proofobject, derive/method/premise/@xref | DerivationStep | ProofStep ⊔ Definition ⊔ Axiom ⊔ Assertion | 17.2
hasProperty³⁹ | */CMP|*/FMP | Statement | Property | 14.1, 14.2
assumes | FMP/assumption | Property | Assumption | 14.2
concludes | FMP/conclusion | Property | Conclusion | 14.2
usesSymbol | om:OMS⁴⁰, CMP//term[@role='definiens'] | Property ⊔ SequentPart⁴¹ | Conclusion | 14.2

[Margin note: CMP/@xml:lang → dc:language?]
We have not (yet?) modeled the truth/falsity of assertions, e. g. that a conjecture has neither a proof nor a counterexample, or that a false conjecture has been proven false, or that a proof should be true.
[Margin notes: rule: proof relies on ? · OWL 2 QL (or other profile) compliance? [Mot+09] · reuse and update existing graph of the ontology]

The ontology captures many relations that are too informal to be meaningful to, e. g., an automated theorem prover.

The current version of the ontology has been implemented in OWL, using the Turtle syntax for the RDF representation of OWL. To facilitate maintenance, it is split into several modules that roughly correspond to the modules of the OMDoc XML schema; there is, for example, one module for proof structures. For compatibility with a wider range of OWL tools, we export the ontology as a single RDF/XML file at its namespace http://omdoc.org/ontology#.⁴² The Protégé [Pro] and Swoop [Kal+06] ontology development environments were used for validation and debugging.

38 subproperty of hasPart
39 subproperty of hasPart
40 as a descendant of, e. g., a statement or property
41 TODO anything else?
42 The merging has been implemented using the Jena RDF API for Java. The implementation of the into.kwarc.semweb.OWLMerge class, which is executable from the command line, is currently shipped with SWiM, but should actually be made available separately. It should probably also be ported to the OWL API [Owl], which offers functions specialized to merging OWL ontologies.
Florian’s relations:
X is theory/theory-inclusion defined in document Y
X is symbol/axiom/import declared in theory Y
X is import from theory Y
X is imported by Y; X has preimage Z (here Y is an import, which imports the symbol, axiom, or import Z declared in the source theory of Y; this generates the symbol, axiom, or import X in the target theory)
X depends on Y (here "depends on" means, for a symbol X: Y occurs in the type or definition of X; for an axiom X: Y occurs in the formula of X; for an import X: Y occurs in the morphism of X)
OpenMath

The OpenMath ontology models classes and properties for all structural entities found in OpenMath’s CD groups, CDs, type signatures, and notation definitions. Properties from common ontologies like Dublin Core were reused where appropriate⁴³. Consider, for example, the CDUses element, which refers to a list of CDs whose symbols are used by the current one. The XML schema can only restrict it to a list of strings that would be allowed for naming CDs, but it does not convey the semantics that any such string is a by-name pointer to an actual CD⁴⁴.
[Margin note: OMDoc vs. OpenMath: Cruz’ ontology integration approach for XML schemas [CX05]]

When managing notation definitions (e. g. in the database where they are authored, or in the data structures of the renderer that uses them), one has to know what symbol a notation definition applies to. A pattern matching notation definition obviously applies to the symbol(s) that occur in its content markup pattern. A declarative notation definition would instead point to its symbol by URI reference.

MONET

Quite a different ontology representing the OpenMath content dictionaries had been developed for the MONET web service architecture. It does not reflect the logical structure of the content dictionaries, but is intended for relating OpenMath objects to web services operating on them; for example, one can specify that there is a web service for computing definite integrals, which can operate on any object that applies the oms:calculus1#defint symbol to certain arguments [CDT04]. Consider, for example, the following description of an integration problem, using the MONET problem ontology and given in the OWL functional-style syntax [MPSP09]:

    SubClassOf(problem:definite_integration problem:Problem)
    SubClassOf(problem:definite_integration gams:GamsH2a)
    SubClassOf(problem:definite_integration
      ObjectIntersectionOf(
        ObjectAllValuesFrom(problem:openmath_head oms:calculus1#defint)
        ObjectSomeValuesFrom(problem:openmath_head oms:calculus1#defint)))

43 The idiosyncratic metadata vocabulary of OpenMath 2 is likely to be replaced by Dublin Core (DC) in OpenMath 3. Anticipating this change, we map e. g. the Name of a symbol definition to the dc:identifier property, and Description to dc:description.
44 Note that by adding a rule to the ontology it is also possible to compute the CDs used by some CD by looking up the CDs of the symbols occurring in its FMPs and examples.
[Margin notes: Old Part 6: treat as simplification of OMDoc · Old Part 7: integrate]
The fact that OpenMath symbols are represented as classes – subclasses of mom:OpenMathSymbol – and not as instances, which would have allowed for simplifying the third axiom listed above to

    SubClassOf(problem:definite_integration
      ObjectHasValue(problem:openmath_head oms:calculus1#defint))

is owed to technical restrictions, which the underlying reasoner imposes for scalability reasons; the authors concede that it would have been more appropriate to use instances [CDT04].

2.4.1.2 Rhetorical and Document Structures

same as for 2.4.1.1, but extend it beyond OMDoc 1.2, where the semantics of rhetorical structures was not really well-defined (just inspired by RST, but otherwise “obvious meanings”). Reuse SALT [Gro+07] rhetorical blocks and RST-like phrase structures [MT]. (sub)sections and cross-refs follow the SALT approach and how to link mathematical/rhetorical structures to document structures as annotations ([Gro+07]).
[Margin note: semantics for document structures: [Ren+02]]

2.4.1.3 Metadata

Most of the metadata vocabularies introduced in section 2.2.6 have been implemented as ontologies. For some metadata vocabularies, the authoritative specification is given abstractly in a manual, but an officially endorsed ontology exists, possibly as one out of many implementations. This is, e. g., the case with Dublin Core [Nil+08] and Learning Object Metadata [IEE02b; NPB03]. Usually, in these cases the alternative implementation to an ontology is an XML schema, which makes less of the semantics explicit, though. For other vocabularies, such as the ccREL vocabulary [Abe+08], the ontology itself is the normative implementation.

Classification schemes can be used in conjunction with ontologies in two ways. Either the identifiers of their categories are used as literal values of metadata fields like dc:subject, or, if one wants to make more explicit what classification scheme has been used, one can introduce refined subproperties of dc:subject, such as mnp:primarySubject or mnp:secondarySubject from the MathNet ontology [Mata], for which it is recommended to have an MSC value (cf. section 2.2.6.3). However, implementing a classification scheme as a proper ontology, where each category is a resource of its own, has further advantages: The hierarchy of categories can be made explicit, and URIs can be used more flexibly in SPARQL queries. In the course of the MONET project, an ontology for the problems of GAMS has been implemented [Mon]. Dolog et al. have turned the ACM Computing Classification into an ontology [Dol+03], drawing on the Learning Object Metadata “classification” vocabulary.⁸
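The benefit of an explicit category hierarchy can be illustrated with a minimal sketch. It assumes a hypothetical dictionary BROADER holding a tiny excerpt of the MSC hierarchy (11A41 below 11Axx below 11-XX) and hypothetical document identifiers; a real deployment would query a triple store rather than Python dictionaries.

```python
# Hypothetical excerpt of a classification hierarchy: category -> broader category.
BROADER = {
    "11A41": "11Axx",  # Primes -> Elementary number theory
    "11Axx": "11-XX",  # Elementary number theory -> Number theory
}


def ancestors(category, broader=BROADER):
    """Return all broader categories of `category`, nearest first."""
    chain = []
    while category in broader:
        category = broader[category]
        chain.append(category)
    return chain


def classified_under(subjects, category, broader=BROADER):
    """Return the documents whose subject is `category` or any narrower one.

    `subjects` maps a document identifier to its (single) subject category.
    """
    return sorted(
        doc
        for doc, subject in subjects.items()
        if subject == category or category in ancestors(subject, broader)
    )


if __name__ == "__main__":
    docs = {"doc1": "11A41", "doc2": "26A15"}
    print(classified_under(docs, "11-XX"))
```

With categories as mere string literals, the query for everything below 11-XX would have to rely on string patterns; with categories as resources, it follows the hierarchy itself.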
Old Part: integrate; this is my contribution
ODRL: XML Schema is normative, RDF(a) under way
Figure 2.3: Main classes and properties of the SIOC Core ontology [Ber+09]

Parts of the OpenMath CD metadata vocabulary correspond to other, standard vocabularies (e. g. Description ↔ dc:description), whereas others are specific to OpenMath (e. g. CDStatus).

2.4.1.4 Environmental Structures

Even though most contemporary systems for managing mathematical knowledge do not formalize environmental structures as ontologies, such ontologies exist and have been used by other systems. The FOAF ontology (Friend of a Friend [BM07]) offers a very basic representation of user profiles – persons that know other persons, belong to groups, create content, hold accounts in online services, etc. The SIOC ontology (Semantically Interlinked Online Communities, cf. [Boj+08; Ber+09; BB07]) extends FOAF by a more elaborate model of user-generated content, covering web-based discussion areas, such as blogs and message boards: sioc:Users create sioc:Posts in sioc:Forums, which are hosted on sioc:Sites. sioc:Posts can reply to other sioc:Posts. As SIOC evolved, these concepts were generalized to cover other kinds of user-generated content as well, as shown in figure 2.3. Argumentation ontologies provide a more elaborate model of discussions and will be covered in section 3.2. Finally, there are ontologies for user modeling, such as the General User Model Ontology (GUMO [Hec+05]).

2.4.1.5 Upper Ontologies

Upper ontologies [MCR07]
2.4.2 Extracting Structures from Markup

Approaches to giving XML-based languages a semantics in terms of RDF and ontologies are diverse and range from extremely practical one-shot hacks to general formal models. Most practical XML→RDF extractions have been implemented as monolithic XSLT stylesheets, each of which works for one source XML language. The GRDDL W3C recommendation (Gleaning Resource Descriptions from Dialects of Languages [Con07]) specifies a uniform way of linking from an XML instance document or from an XML Schema to implementations of transformations that extract RDF from this particular document, or from all instances of a schema, respectively. GRDDL is not concerned with how a transformation should be implemented; the authors recommend using XSLT. At the formal end of the range, Patel-Schneider and Siméon have developed a unified model theory for both XML and RDF [PSS03]. However, the benefit of that approach is rather theoretical, as it makes impractically restrictive assumptions about the XML structure (see [Liu+04] for details). The authors of XSDL (XML Semantics Definition Language⁴⁵ [Liu+04]) struck a balance between those two extremes. They have done substantial theoretical work on a semantics-preserving translation of XML into RDF but also provide a concise declarative syntax for mapping XML to OWL-DL. To the best of my knowledge, XSDL has not been implemented, though.

I took a position between plain XSLT and XSDL and implemented a library of XSLT templates and functions that significantly lowers the investment for implementing a translation from a new XML language to RDF and enhances the reusability of such implementations. Here, I will conceptually describe the approach; the implementation will be covered in section 3.7.3, and its integration into SWiM in section 4.3.3.
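To make the idea of a reusable, rule-driven XML→RDF extraction more concrete, here is a minimal sketch. It is not the XSLT library described above: it uses Python's xml.etree.ElementTree instead, and the rule table RULES, the function extract_triples, the sample document, and the assumption that elements are un-namespaced and carry xml:id attributes are all hypothetical. The class and property names are taken from Table 2.2 and the ontology namespace http://omdoc.org/ontology#.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

OO = "http://omdoc.org/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical rule table: element name -> (ontology class, property for @for).
RULES = {
    "theory": ("Theory", None),
    "symbol": ("Symbol", None),
    "definition": ("Definition", "defines"),
    "proof": ("Proof", "proves"),
}


def extract_triples(xml_text, base):
    """Extract (subject, predicate, object) triples from an OMDoc-like document."""
    triples = []

    def visit(elem, home_theory):
        rule = RULES.get(elem.tag)
        ident = elem.get(XML_ID)
        if rule is not None and ident is not None:
            cls, for_property = rule
            subject = urljoin(base, "#" + ident)
            triples.append((subject, RDF_TYPE, OO + cls))
            if cls == "Theory":
                # children of this element live in this theory
                home_theory = subject
            elif home_theory is not None:
                # parent-child containment becomes an explicit relation
                triples.append((subject, OO + "homeTheory", home_theory))
            target = elem.get("for")
            if for_property and target:
                # the URI-valued @for attribute becomes a typed property
                triples.append((subject, OO + for_property, urljoin(base, target)))
        for child in elem:
            visit(child, home_theory)

    visit(ET.fromstring(xml_text), None)
    return triples


if __name__ == "__main__":
    doc = '<theory xml:id="t"><symbol xml:id="s"/><definition xml:id="d" for="#s"/></theory>'
    for triple in extract_triples(doc, "http://example.org/doc.omdoc"):
        print(triple)
```

Supporting a further XML language would, in this sketch, only require a different rule table; the traversal and URI generation stay the same, which is the kind of reuse the paragraph above aims at.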
Cruz/Xiao09: “does not solve the essential problem of query answering across heterogeneous sources”
make up some “theoretical” part of Krextor
2.4.3 OMDoc as an ontology language

Having presented several ontologies for describing structures of mathematical documents written in OMDoc so far, I will now take the opposite position and investigate ways of writing ontologies as mathematical documents in the OMDoc language. These investigations were motivated by the desire to design a more flexible metadata framework for OMDoc, for which easy extensibility by additional vocabularies implemented as ontologies was a requirement. The metadata framework itself will be described in section 2.4.4; this section deals with the integration of ontologies with OMDoc theories. These foundations will then enable authors to create, extend, integrate, or document custom metadata vocabularies for use in OMDoc documents in the OMDoc language itself. But they can also be applied to the engineering and maintenance of ontologies in general and thus have a much wider impact, beyond metadata.

Documentation is crucial in engineering – in software engineering as much as in ontology engineering⁴⁶. Ontologies are (or: should be!) unambiguous and amenable to formal validation, but many practical applications of ontologies suggest different requirements: Oftentimes, the domain of interest is too complex to be fully captured in an ontology, or the formal tools at hand do not support the full complexity of an ontology. In both cases, the developer of ontology-based software needs to understand the intentions the domain experts and ontology engineers had when conceptualizing the respective domain and formalizing it in an ontology. Authors who want to annotate documents with concepts from an ontology, or ontology engineers who want to reuse concepts from an ontology, have similar requirements on comprehensibility. As examples, consider DOLCE and FOAF. DOLCE was originally modeled in first order logic and implemented in KIF [Mas+03], but a simplified, “Lite” implementation in OWL was provided to accommodate semantic web services. For developers working with DOLCE Lite, the KIF implementation serves as a more exhaustive reference, which is less ambiguous than the accompanying informal manual. Neither of them is closely integrated as documentation into the OWL implementation, though. In this situation, we also consider a formalised representation of the original, non-simplified knowledge as documentation. An ontology is essentially the same as a mathematical theory whose metatheory usually is a decidable subset of first order logic, such as a description logic.

[MP03]: FoC is a formal specification and proof language, used something like OMDoc for documentation. Original language retained because of tool support. Here: OMDoc not as expressive as FoC, at least not without tricks; structural differences too large. Criticism: OMDoc’s XSLT biased towards OMDoc, not suitable for browsing FoC. Conclusion: build own language: formal FoC, plus structured comments (how do they look?). FoCDoc compiler (compare javadoc) generates XML that can be translated to OMDoc, HTML, or LaTeX.

Uses of OMDoc here: formalize FoC theories, or establish mappings between OpenMath (content) and FoC (implementation; compare CSC’s presentation/content/semantics layers) in OMDoc to facilitate exchange with OpenMath-aware systems, add OMDoc/OpenMath code to FoC sources (not sure if one can really embed OMDoc into FoC?). Envisaged opposite direction: OMDoc/OpenMath as a potential input syntax for FoC (provided that there are user-friendly editors)

45 Not to be confused with the XML Schema Definition Language, formerly known as “XML Schema” (cf. section 2.1.2).
46 An informal Google search shows that there is a lot to catch up with. Six of the ten top hits for “software documentation” (without quotes) deal with the process of documenting software, mostly providing guidelines. Four results lead to documentation of concrete software products, two of them to auto-generated API documentation. Nine of the ten top hits for “ontology documentation” lead to documentation of concrete ontologies, three of which have been auto-generated from the ontology sources. Coverage of the process of documenting ontologies only starts on the second result page, hit #11 being our own previous work.
[Margin notes: Old Part 9: integrate · rewrite]
2.4.3.1 Motivation

The quest for representing semantic web ontologies in OMDoc was motivated by three deficiencies of RDFS and OWL: They are limited (i) in their expressivity, (ii) in their way of handling modularity, and (iii) in their capabilities for documentation. Limited expressivity was a deliberate design goal, as decidability is a prerequisite for web scalability. A common experience in ontology design is, however, that certain axioms in the domains to be modeled exceed the expressivity of the languages chosen for implementation. Sometimes, dumbing down the model to less expressive special cases⁴⁷ is sufficient, whereas in other cases, a potentially ambiguous or imprecise prose description of the actual axiom is added to the documentation of the ontology. In such cases, it could help to make the ontology heterogeneous: another, more expressive logic would be employed for formally modeling such axioms, even though it might not be used for reasoning.

47 as has, e. g., been done for the DOLCE ontology, a simplified version of which has been formalized in OWL-DL; cf. http://www.loa-cnr.it/DOLCE.html.
[Margin notes: Old Part 10: somehow reuse · New Part 11: integrate]
duplicate above
Modularity in RDFS is only supported in the standard RDF way that concepts from external ontologies can be referenced by their URIs. This does not make dependencies explicit at all and can easily lead authors into creating inconsistencies. If possible at all, one would have to collect all URI references occurring in an RDFS ontology and then apply some heuristics to these URIs in order to get hold of the actual ontologies depended upon⁴⁸. OWL improves on this by allowing explicit imports of ontologies via owl:imports declarations. This always imports a whole ontology – there is no possibility for information hiding, as known from modular programming languages –, and the imported symbols must be reused literally, i. e. they cannot be mapped onto symbols or complex expressions from the importing ontology. Even literal imports are not yet widely used in web ontologies, and tools usually do not enforce their usage; improvements are to be expected with a more widespread adoption of OWL 2 [CG+08].

Even where they are not used to enhance reasoning, I argue that heterogeneity and modularity of an ontology improve the documentation of its structure. But documentation in the strict sense, i. e. informal documentation of formal axioms, is also severely limited in common ontology languages. Documentation is usually maintained separately from a formal ontology, only pointing to entities of the ontology. This results in synchronisation and other maintenance problems such as lack of completeness. Some ontology languages, like F-Logic [KLW95], only support completely unstructured comments, like most programming languages do, and thus will not be considered further. RDFS and OWL 1 support annotating all entities of an ontology (classes, properties, individuals), as well as the ontology as a whole, with metadata [MH04]. Some semantic wikis, such as IkeWiki [Sch06], support ontology editing in a way that every entity is described by a text document, in which the links to other entities (e. g. superclasses) are embedded and surrounded by informal documentation. OWL 2 adds the possibility to annotate axioms to the core of the language [MPSP09], following a pattern that is very similar to RDF reification (= treating triples as resources of their own and giving them a URI). However, this enhancement is fairly new and not yet supported by tools. Named graphs extend the RDF data model by the possibility to assign a URI to any RDF subgraph [Car+05] – e. g. a group of related axioms in an OWL ontology –, thus enabling them to be documented. Usage of this has mainly been explored for providing trust and provenance information. The latter ranges from simple Dublin Core metadata expressing “who said what and when” to justifications of how an inferred statement has been established (i. e. what existing statements and axioms/rules have been used [Div+09]). As values of annotations, strings are most common. The RDF data model also allows for XML literals [Bec04b], which could, in principle, link to ontology entities, but no tool support is known for this to date. RDFa would be an alternative for embedding RDF-based ontologies into HTML [Adi+08], turning it into a semantic markup language similar to the above-mentioned approach of some semantic wikis. To the best of my knowledge, RDFa has not been used for authoring ontology documentation so far.

In mathematical knowledge management, tensions between the high expressivity desired by authors and the decidability or even tractability required for web-scalable automated inference are well known. Mathematical knowledge has traditionally been recorded and communicated in documents. Semantic markup, in parallel with document-oriented presentational markup, is widely used for formulæ (cf. section 2.3.1 about MathML), and not uncommon on higher levels of knowledge either. In fact, OMDoc offers the features desired for ontology languages: (i) It is not committed to a particular logical foundation, but can integrate any desired logic, thus allowing for heterogeneity, (ii) it makes modularity explicit by supporting theory inclusions (imports and views), and (iii) its wide range of supported formality degrees and its literate programming abilities support documentation nearly everywhere. Therefore

48 In most cases, the last / or # in a URI marks the end of the namespace URI, and it is a good practice to make ontologies available for download from their namespace URI.
[Margin note: explain better]
more exact differentiable/ continuous
2.4.3.2 Correspondences between OMDoc and semantic web ontology languages
in OWL,
One can easily identify the following correspondences between semantic web ontology languages and OMDoc:

Classes, Properties, and Individuals correspond to objects or symbols.

Axioms and Rules correspond to statements, as they state properties of resources. However, a distinction between proper axioms and facts derived from them is not usually made in ontologies. OMDoc, following the “little theories” approach [FGT92], allows for modeling this distinction and thus reducing theories to their core, while still enabling authors to document selected logical consequences of this core within the same theory.

Ontologies correspond to theories. Both are often designed modularly and import other ontologies or theories. Both entities of an ontology and symbols of an OMDoc theory are identified by URIs within the namespace defined by the whole theory/ontology.

OMDoc is XML-based and thus complies with basic web standards like URIs, and any desired logical foundation can be formalized in OMDoc. We can thus make use of the similarities to semantic web ontology languages pointed out above and use OMDoc for modeling ontologies – provided that we overcome certain obstacles, which are addressed in the following subsections:

(i) Since OMDoc is uncommitted to a particular logical foundation, it does not have a native understanding of the RDF, RDFS, and OWL (-DL) syntax and semantics (cf. sections 2.1.3 and 2.1.4). Therefore, these foundations – at least their symbol vocabularies – have to be modeled as OMDoc meta-theories first.
(ii) OMDoc theories can import other theories for a modular design, but they cannot directly reference existing semantic web ontologies in order to enhance them. Therefore, we have to specify an import syntax and semantics.
(iii) OMDoc itself is not supported by any description logic reasoner. Therefore, we need to provide a way to extract semantic web ontologies from theories, as has been done earlier for theorem proving languages (cf. section 2.3.3).
As some of our existing infrastructure that could be used as a foundation for implementing the translation in step iii was based on RDF, we decided to accomplish step i via the RDF semantics of OWL [Sch09]. OMDoc would also allow for implementing OWL in terms of its direct model-theoretic semantics (cf. [MPSCG09]), and modeling views mapping between both implementations (cf. [PSM09]).
[Margin note: Matthias? maybe connect with what we say about little theories in the OMDoc intro]

2.4.3.3 Knowledge Representation

As a foundation for expressing semantic web ontologies in OMDoc, we wrote theories for RDF, RDFS, and OWL, which declare as symbols all classes, properties, and individuals of these languages. An ontology is then written as follows: Classes, properties, and individuals are declared as symbols with a type⁴⁹. Property types are modeled as compound types, e. g. owl#ObjectProperty(foaf#Person → foaf#Group), consisting of the actual property type, plus domain and range. Class definitions like “Student = Person ⊓ ≥ 1 enrolledIn” (“A student is a person, and is enrolled at least once”) are given as OMDoc definitions (cf. Listing 2.7). This is a machine-oriented representation that a user would not usually see, but which would render as three lines in Figure 5.3 and be edited using a dedicated formula editor (cf. section 3.3.3). This example also showcases some of the literate programming features of OMDoc: term annotates references to symbols (“technical terms”) in natural language, whereas phrases of natural language can be linked to corresponding subterms of formulæ.
[Margin notes: ref fig · swim/PIC/sentido-owl]

All other statements can be expressed as OMDoc axioms in such a way that a property is applied to two arguments: a subject and an object. This is the most direct way of representing RDF in OMDoc but does not take advantage of the higher expressivity of OMDoc. However, the author has the possibility to annotate redundant axioms (as introduced in section 2.4.3.2) as theorems instead, which can then be proven on the OMDoc level, using other axioms of the same ontology plus the inference rules of the respective ontology language, as represented in the RDF, RDFS, and OWL theories.

2.4.3.4 Connecting OMDoc and Semantic Web URIs

OMDoc and RDF have different ways of giving URIs to symbols. RDF-based ontologies have a namespace URI, which is usually considered to be the URI of the ontology, and all entities within the ontology have local names (cf. section 2.1.1). An absolute URI is formed by concatenating the namespace URI and a local name. OMDoc, on the other hand, addresses symbols by a triple of cdbase (theory graph), cd (theory) and [local] name (cf. section 2.3.3).
This difference is largely conventional and does not hinder the integration of OMDoc with RDF-based semantic web ontologies. The only situations where the difference needs to be overcome are when an existing semantic web ontology is rewritten in OMDoc, e. g. for the purpose of documenting it or making its modular structure more explicit, and when an OMDoc ontology imports a semantic web ontology. In order to have OMDoc ontologies generate RDF-style URIs, we allow for attaching the namespace URI of the original ontology to a theory via the special metadata field odo:semWebBase, which is recognized by our OMDoc→OWL translation presented in the following section. Listing 2.8 shows how this would be done for FOAF. This makes sure that the OMDoc→OWL translation gives the Agent class its correct URI, i. e. http://xmlns.com/foaf/0.1/Agent. We can create an OMDoc theory from a semantic web ontology by simply providing a suitable odo:semWebBase metadata field, only adding symbol declarations, definitions, axioms, etc., later. This is a low-cost way of starting OMDoc-based ontologies, which does not preclude making use of OMDoc’s possibilities for documentation and expressive knowledge representation later. Thus we have a suitable migration path from web ontologies to OMDoc.
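The two addressing schemes can be contrasted in a few lines. This is a minimal sketch under assumptions: the Symbol tuple and both functions are hypothetical names, the cdbase value is invented for illustration, the cdbase/cd#name form follows the OpenMath convention for symbol URIs, and falling back to that form when no odo:semWebBase is given is a choice made here, not one stated in the text.

```python
from typing import NamedTuple, Optional


class Symbol(NamedTuple):
    """An OMDoc symbol, addressed by the triple (cdbase, cd, name)."""
    cdbase: str  # theory graph
    cd: str      # theory
    name: str    # local name


def openmath_style_uri(sym: Symbol) -> str:
    """Build a URI of the form cdbase/cd#name."""
    return f"{sym.cdbase}/{sym.cd}#{sym.name}"


def semweb_uri(sym: Symbol, sem_web_base: Optional[str]) -> str:
    """Build an RDF-style URI: namespace URI concatenated with the local name.

    `sem_web_base` stands for the value of the theory's odo:semWebBase
    metadata field; without it, this sketch falls back to the OpenMath style.
    """
    if sem_web_base is None:
        return openmath_style_uri(sym)
    return sem_web_base + sym.name


if __name__ == "__main__":
    agent = Symbol("http://example.org/theories", "foaf", "Agent")
    print(semweb_uri(agent, "http://xmlns.com/foaf/0.1/"))
    print(semweb_uri(agent, None))
```

The first call yields the URI that FOAF itself assigns to the Agent class, which is what the metadata field is meant to guarantee.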
49 OMDoc has a foundationally unconstrained infrastructure for type systems: objects can be associated with types that are objects themselves. The particular choice of types is only governed by the available theories. Here we define types as part of the RDF, RDFS, and OWL theories.
Listing 2.7: An OWL ontology in OMDoc: class definition and documentation

    For our "university" ontology, we first import FOAF and then introduce the concept of a student.
    ...
    A student
    A student is a person who is enrolled at least once.
Listing 2.8: An OMDoc ontology with a semantic web namespace URI

    Friend of a Friend (FOAF) vocabulary
2.4.3.5 Reasoning

Our intention with promoting OMDoc as a more expressive semantic web ontology language is not to replace well-tried technologies for semantic web reasoning. While OMDoc does, in principle, allow for alternative approaches to reasoning, being an exchange format for automated theorem provers, this is not the objective of our work. So in order to allow for writing expressive ontologies in OMDoc while still being able to use optimized reasoners on their tractable/decidable fragments, we defined and implemented a translation from OMDoc to OWL as a module within our Krextor XML→RDF extraction framework (cf. section 3.7.3). This choice was influenced by the prior availability of Krextor and by compatibility considerations. Support for extracting structural outlines of OMDoc documents as RDF in terms of the OMDoc ontology (cf. section 2.4.1.1) had already existed; parts of that could be reused for the OWL translation. The other reason was that RDF/XML [Bec04b] is the only OWL syntax that all compliant tools are required to support. While the implementation of the translation is hard-coded, we aim at giving an exact specification by OMDoc axioms: There is, for example, a set of direct subject–predicate–object axioms (cf. section 2.4.3.3) in the OWL theory that state that any application of the owl#Restriction symbol to suitable arguments translates to an anonymous RDF resource of type owl:Restriction that has certain RDF properties.

Extracting RDF triples from OMDoc symbol declarations and axioms is mostly straightforward, but the generation of correct URIs for entities of semantic web ontologies is more involved. We traverse the graph of theory imports and collect the namespace URIs of all theories that carry an odo:semWebBase metadatum. Whenever we encounter a reference to a symbol onto#sym for an ontology that is implemented as an OMDoc theory onto, we generate the semantic web compliant URI as the concatenation of the namespace URI of the theory and the name of the symbol. Listing 2.9 shows the RDF generated from the example introduced in listing 2.7 above in Turtle notation. The class that a student is defined to be equivalent to is represented as an intersection class of a set of classes, represented as a linked list. Most of the statement- and theory-level structure of OMDoc, such as the distinction between defined and inferred statements and theory morphisms, is lost and uniformly translated to less expressive OWL axioms. Preserving the informal documentation of the definition as an OWL 2 axiom annotation is not yet supported. The output shown in listing 2.9 has been obtained by post-processing Krextor’s output. The actual output, shown in listing 2.10, is less legible: It neither abbreviates namespaces by prefixes, nor does it use Turtle’s syntactic sugar for blank nodes and RDF’s linked list data structures.
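The traversal of the import graph described above can be sketched as follows. This is not Krextor's implementation (which is XSLT-based); the functions collect_namespaces and resolve, the dictionary representation of imports and metadata, and the example.org namespace of the university theory are assumptions made for illustration.

```python
def collect_namespaces(theory, imports, sem_web_base):
    """Walk the import graph from `theory` and collect namespace URIs.

    `imports` maps a theory name to the theories it imports;
    `sem_web_base` maps a theory name to its odo:semWebBase value, if any.
    Returns {theory name: namespace URI} for all reachable theories.
    """
    found, seen, stack = {}, set(), [theory]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against diamond-shaped or cyclic imports
        seen.add(current)
        if current in sem_web_base:
            found[current] = sem_web_base[current]
        stack.extend(imports.get(current, ()))
    return found


def resolve(reference, namespaces):
    """Turn a reference of the form theory#symbol into a semantic web URI.

    Raises KeyError if the theory carries no namespace URI.
    """
    theory, _, name = reference.partition("#")
    return namespaces[theory] + name


if __name__ == "__main__":
    imports = {"university": ["foaf", "owl"], "foaf": ["owl"], "owl": ["rdfs"]}
    bases = {
        "university": "http://example.org/university#",  # hypothetical
        "foaf": "http://xmlns.com/foaf/0.1/",
        "owl": "http://www.w3.org/2002/07/owl#",
    }
    namespaces = collect_namespaces("university", imports, bases)
    print(resolve("foaf#Person", namespaces))
    print(resolve("owl#Restriction", namespaces))
```

Collecting the namespaces once per theory keeps the per-symbol work down to a dictionary lookup and a string concatenation.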
2 Representing Semiformal Mathematical Knowledge
Listing 2.9: RDF generated from an OMDoc theory (somewhat pretty-printed)
 a owl:Ontology ;
    owl:imports foaf: .
 a owl:Class ;
    dc:description "A Student" ;
    owl:intersectionOf (foaf:Person
        [ a owl:Restriction ;
          owl:minCardinality "1"^^xsd:nonNegativeInteger ;
          owl:onProperty ]) .
Listing 2.10: RDF generated from an OMDoc theory (Krextor’s raw Turtle output)
 a ;
    owl:imports foaf: .
 rdf:type owl:Class ;
    owl:equivalentClass _:d24e43 .
_:d24e43 owl:intersectionOf _:collection-d24e44 .
_:collection-d24e44 rdf:first foaf:Person ;
    rdf:rest _:collection-d24e44-1 .
_:collection-d24e44-1 rdf:first _:d24e47 ;
    rdf:rest rdf:nil .
_:d24e47 rdf:type owl:Restriction ;
    owl:onProperty ;
    owl:minCardinality "1"^^xsd:nonNegativeInteger .
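The difference between the two listings is largely mechanical: the parenthesized collection in listing 2.9 abbreviates the rdf:first/rdf:rest chain that listing 2.10 spells out. A minimal sketch of that expansion in Python, using hypothetical blank node labels (_:b0, _:b1, ...) instead of Krextor's generated ones and plain strings instead of a real RDF library:

```python
import itertools


def expand_collection(items, fresh):
    """Encode `items` as an RDF linked list; return (head node, triples)."""
    if not items:
        return "rdf:nil", []
    nodes = [next(fresh) for _ in items]        # one list cell per item
    triples = []
    for index, (node, item) in enumerate(zip(nodes, items)):
        rest = nodes[index + 1] if index + 1 < len(nodes) else "rdf:nil"
        triples.append((node, "rdf:first", item))
        triples.append((node, "rdf:rest", rest))
    return nodes[0], triples


fresh = (f"_:b{n}" for n in itertools.count())
head, triples = expand_collection(["foaf:Person", "_:restriction"], fresh)
for triple in triples:
    print(" ".join(triple), ".")
```

The head node returned here plays the role of _:collection-d24e44 in listing 2.10, i. e. it becomes the object of the owl:intersectionOf triple.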
However, Krextor’s OWL output is not addressed to humans, but to machines. Machines do not require pretty-printing, and all OWL tools are required to support the RDF representation (RDF/XML, which Krextor can also generate, but which would be even harder to read and thus is not shown here). Thus, our translation works like a compiler and linker that creates (OWL/RDF) object code from higher-level OMDoc source code. Output for humans will be addressed in the following section.
2.4.3.6 Documentation and Presentation
OMDoc comes with an elaborate, adaptive presentation framework for creating human-readable documents from semantic markup. Mathematical formulæ are rendered as Presentation MathML; structures on the statement and theory levels, and complete documents, are rendered as XHTML. For every mathematical symbol, one or more notations can be defined – compare, e. g., our initial OWL example in the German DL notation (Student = Person ⊓ ≥ 1 enrolledIn) vs. the
Manchester syntax [HPS09]: Class: Student EquivalentTo: Person that enrolledIn min 1
A default notation is usually provided by the author of a theory, but users can also author their own to customize the presentation to their preferences. Initially, the renderer collects all available notation definitions from all imported theories. For every symbol in a content formula as the one in Listing, the renderer selects, from those notation definitions that match the symbol, the one most appropriate for the current presentation context, which is made up of, e. g., the language of the enclosing document, the domain of application, or user preferences. The output is parallel markup [Aus+09, section 5.4], which allows for implementing additional services that facilitate browsing and reading – for example linking rendered symbols to the place where they are introduced. A reader who does not know, e. g., the symbol ⊓ in our sample formula can click on it and thus navigate to the section of the document rendered from the owl OMDoc theory that declares (and documents!) the symbol owl:intersectionOf. I have implemented this in SWiM using XLinks; the JOBAD framework for interactive documents even displays definitions as tooltips without forcing the user to leave the document (cf. section 3.6.5.1). Documentation can be given in metadata blocks (cf. section 2.4.4.3), which can be attached to any element on the statement and theory level (cf. Listing 2.7). Textbook or literate-programming style is also possible: A theory can contain not only formal statements but also informal text sections, and definitions, axioms, and theorems can have both formal and informal content (CMP and FMP; cf. Listing 2.7).
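The context-dependent choice of a notation mentioned in this section can be illustrated by a small Python sketch. It is not the renderer's actual selection algorithm: the dictionary layout, the "style" context dimension, and the scoring rule (a notation applies if all of its constraints hold, and more constraints mean a better fit) are assumptions made for illustration only.

```python
def select_notation(symbol, notations, context):
    """Pick the notation for `symbol` that fits `context` best, or None."""
    applicable = [
        notation for notation in notations
        if notation["symbol"] == symbol
        and all(context.get(key) == value
                for key, value in notation["context"].items())
    ]
    if not applicable:
        return None
    # Assumption: the most specific applicable notation wins.
    return max(applicable, key=lambda notation: len(notation["context"]))


notations = [
    # default notation, as an author of the owl theory might provide it
    {"symbol": "owl#intersectionOf", "context": {}, "operator": "⊓"},
    # hypothetical alternative for readers preferring a keyword-based syntax
    {"symbol": "owl#intersectionOf", "context": {"style": "manchester"},
     "operator": "and"},
]

reader_context = {"style": "manchester", "language": "en"}
chosen = select_notation("owl#intersectionOf", notations, reader_context)
print(chosen["operator"])
```

With an empty context, only the unconstrained default notation applies, which mirrors the idea that a default is provided by the theory author and overridden by more specific preferences.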
2.4.4 Markup for Metadata
In the previous section, I presented how semantic web ontologies can be integrated with OMDoc theories and then enhanced by a more elaborate documentation. In this section, I will describe a new metadata framework for OMDoc that allows ontology-based implementations of metadata vocabularies to be used in mathematical documents. This meets the requirement to introduce additional metadata into OMDoc, which has been triggered by the use of OMDoc for lecture notes and exercises, as well as technical specifications.⁵⁰ Starting with a review of the metadata support of state-of-the-art markup languages for semiformal mathematical knowledge, I will review the metadata support of the current OMDoc version 1.2 in more detail, and then introduce a new, extensible metadata schema for OMDoc.
2.4.4.1 State of the Art
The mathematical markup languages introduced in section 2.3 feature various degrees of metadata support, both in terms of the coverage and extensibility of the vocabulary (cf. section 2.2.6.3), and in terms of where metadata can be given. In the worst case, they do not support metadata at all – which is usually the case with languages for formalized mathematical knowledge. MathML and the formula language of OpenMath support annotation of arbitrary (sub-)expressions (called
⁵⁰ Personal communication with Michael Kohlhase, July to September 2008. The original e-mail thread, not mentioning use cases, has been archived [Koh08b].
attribution in OpenMath), but there is no clear guideline on how to use metadata in the classical sense of section 2.2.6.⁵¹ OpenMath content dictionaries and CNXML have fixed idiosyncratic metadata vocabularies. Both of them have significant intersections with Dublin Core, but no formal mapping has been specified, which restricts the interoperability of OpenMath and CNXML metadata. CNXML only allows metadata on document level. In OpenMath CDs, they can be given for CDs and, partly, for symbol definitions. MathLang does support annotation of almost every element of a document with metadata, but the “vocabulary” is restricted to four terms related to type checking [Til06]. OMDoc 1.2, and, similarly, sTeX, have hard-coded support for two metadata vocabularies, as will be detailed below: Dublin Core (the older Metadata Element Set), which has been integrated in a straightforward way but with some idiosyncratic extensions, and an XML syntax for the predecessor of ccREL. Those two vocabularies reside in their own XML namespace each, which facilitates extensibility by further vocabularies to some extent, compared to other languages mentioned above. A whole additional vocabulary can be incorporated into OMDoc at once by simply introducing a new module into the OMDoc RELAX NG XML schema, which declares one element per vocabulary term in a certain namespace; in many cases, metadata vocabularies already come with a ready-to-use XML schema implementation. Still, one would have to touch the schema and specify the semantics separately. The OMDoc-based e-learning environment ActiveMath makes use of additional vocabularies for educational metadata, including LOM (cf. section 2.2.6.3) and custom ones, but they are hard-coded into the main XML schema in a less extensible way than in OMDoc 1.2, as they are not distinguished by different namespaces [Mel+03; Gog+04].
In OMDoc, metadata can be attached to almost every element on the document, theory, and statement levels [Koh06b, chapter 12]. Finally, RDF has been presented as a format for representing semiformal mathematical knowledge. By its very nature, and thanks to the fact that most metadata vocabularies have been implemented as (RDF-based) ontologies, an RDF-based knowledge representation allows for annotating every object of interest with metadata and can flexibly be extended by arbitrary metadata vocabularies. This holds for all encodings of the RDF data model and thus also includes RDFa, the encoding of RDF annotations inside XML languages. However, as said in section 2.3.8, RDF has not seriously been used by authors of mathematical knowledge so far.
2.4.4.2 Metadata in OMDoc 1.2 (State of the Art)
The current version 1.2 of OMDoc supports general metadata about documents, such as titles, authorship, language usage, or administrative aspects like modification dates, distribution rights, and identifiers. This is achieved by syntactically reusing the Dublin Core and Creative Commons metadata vocabularies, i. e. providing XML elements for all of their properties, plus a few extensions. The OMDoc module DC comprises the basic Dublin Core Metadata Element Set [Dcm]. OMDoc additionally allows for assigning roles to dc:creators and dc:contributors (e. g. “author”, “editor”, or “translator”) using an additional @role attribute on these elements, whose value is a MARC
⁵¹ In section 2.4.4.3, this will be enabled at least for MathML and OpenMath expressions in OMDoc documents.
Listing 2.11: OMDoc 1.2 proof of Fermat’s theorem, with a revision history of historical attempts (assumed to be a digital library edition) Proof of Fermat’s Last Theorem Pierre de Fermat Andrew Wiles Michael Kohlhase 1637-06-13T00:00:00 1995-05-01T00:00:00 2006-08-28T00:00:00
relator code [Mar]. Furthermore, a simple vocabulary for recording revision histories has been added to dc:date: The additional @who attribute refers to the URI of a dc:creator or dc:contributor in the same metadata record, and the @action attribute refers to an action out of the set “updated”, “created”, “imported”, “frozen”, “review-on”, and “normed”. Both of these features and their syntax have been inspired by the Open Packaging Format⁵² (OPF [Opf]). In practice, these extensions have rarely been used. For rights management, the ccREL [Abe+08] vocabulary has been added as the OMDoc module CC. The rights management markup in OMDoc had been designed before Creative Commons started recommending RDFa, but as embedded markup was required and Creative Commons at that time only suggested the workaround of putting RDF/XML into XML comments of the document to be annotated, Michael Kohlhase, the developer of OMDoc, modeled a custom XML syntax, closely but not exactly following the Creative Commons RDF schema. An example featuring many of the OMDoc 1.2 metadata elements is shown in listing 2.11. This way of representing metadata has various drawbacks: First, the vocabulary is hard-coded and can only be extended by extending the OMDoc XML schema by additional namespaces for new vocabularies. Secondly, OMDoc is not aware of the formal semantics of its metadata vocabularies. They have been integrated into the syntax of OMDoc, but their semantics is only available informally
⁵² formerly known as Open eBook
as a part of the natural-language specification of OMDoc [Koh06b, chapter 12]. A more formal, RDF-based semantics would be available via the RDFS ontologies of Dublin Core and Creative Commons, but those have not been incorporated into the OMDoc 1.2 specification. Still, an RDF representation can be obtained from these metadata by a straightforward translation (cf. section 2.4.2). The MARC relator extensions to dc:creator, dc:contributor, and dc:publisher⁵³ can also be translated that way: There is an RDF implementation, and it does not conflict with Dublin Core, as the relators have been modeled as subproperties of DCMES properties [DCM05]. However, for OMDoc’s DC extension for revision histories there is no straightforward translation to RDF triples at all: They are syntactically written like annotations to the actual triples, e. g. (#fermat-proof, dc:date, 1637-06-13T00:00:00). This could be modeled by RDF reifications – which turn triples into resources that in turn can be annotated –, but the latter are widely considered problematic (see, e. g., [CS04]). Finally, the semantics of the @action attribute has not been specified at all. The OMDoc 1.2 specification merely states that “recommended values include the short forms updated, [rest mentioned above] with the obvious meanings”⁵⁴, and that “other actions may be specified by URIs pointing to documents that explain that action” [Koh06b, section 12.1]. This lack of formal semantics, or of semantics altogether, makes it hard to implement application support. Support for OMDoc metadata has so far only been implemented in my own semantic wiki SWiM (cf. chapter 4), and in ActiveMath [Gog+04].
2.4.4.3 The new OMDoc Metadata Framework
Given the need to incorporate additional metadata into OMDoc, and considering the deficiencies of the metadata support in OMDoc 1.2, we developed a new framework. The requirements were as follows:
1. Stay backwards-compatible with OMDoc 1.2 concerning expressivity. That is, continue supporting Dublin Core and Creative Commons, and the custom extensions.
2. Expose the formal semantics of metadata vocabularies to OMDoc-based applications; additionally, be compatible with semantic web applications.
3. Incorporate a vocabulary for versioning – particularly aiming at technical specifications.
4. Do not hard-code a fixed set of vocabularies into the language but stay flexible and extensible for many applications, including future and unknown ones.
Given the fact that many existing metadata vocabularies, including Dublin Core and Creative Commons, have an RDF semantics (cf. section 2.4.1.3), and that with RDFa (cf. section 2.3.8) a standard for flexibly embedding metadata into XML had recently stabilized, we chose to incorporate RDFa into OMDoc, and to look for metadata vocabularies with RDF-based implementations to satisfy our further requirements. So far, RDFa has only been specified for the “host language” XHTML [Adi+08]. The specification is generally biased towards XHTML but nevertheless foresees a future adoption of RDFa
⁵³ For the latter, a @role has not been specified.
⁵⁴ “Whenever anyone says ‘you know what I mean’, you can be pretty sure that he does not know what he means, for if he did, he would tell you.” – James Davenport citing his father [DK09]
as an annotation sublanguage by other XML languages. The vector graphics format SVG Tiny already includes RDFa in the same way as XHTML, referring to the XHTML+RDFa specification but making a few minor deviations from it. Other languages are starting to adopt RDFa as well [IL09].
Full RDFa in OMDoc
After initial discussions on how much of RDFa to incorporate into OMDoc, we decided to give authors who want to model complex annotations the freedom to use the full expressivity of RDFa, but to particularly recommend a metadata syntax that resembles the one of OMDoc 1.2 and allows for expressing most metadata that could also be expressed there. The other reason for fully integrating RDFa is compatibility with RDFa tools. When the sources of OMDoc documents are published on the web, linked data crawlers like Sindice [TDO07] may find them. While they would not be able to make any sense of OMDoc’s own XML vocabulary (e. g. understanding that a proof element denotes an instance of the oo:Proof class), they would at least be able to understand the annotations made in RDFa, and thus enable users to search for, e. g., OMDoc resources having the dc:creator Michael Kohlhase.
A full integration of RDFa means that the following attributes have to be added to OMDoc, with the same semantics as specified for XHTML+RDFa (quoted from [Adi+08]; technical terms explained below):
@rel a whitespace-separated list of CURIEs, used for expressing relationships between two resources (‘predicates’ in RDF terminology);
@rev a whitespace separated list of CURIEs, used for expressing reverse relationships between two resources (also ‘predicates’);
@content a string, for supplying machine-readable content for a literal (a ‘plain literal object’, in RDF terminology);
[XHTML-specific attributes omitted]
@about a URI or safe CURIE, used for stating what the data is about (a ‘subject’ in RDF terminology);
@property a whitespace separated list of CURIEs, used for expressing relationships between a subject and some literal text (also a ‘predicate’);
@resource a URI or safe CURIE for expressing the partner resource of a relationship that is not intended to be ‘clickable’ (also an ‘object’);
@datatype a CURIE representing a datatype, to express the datatype of a literal;
@typeof a whitespace separated list of CURIEs that indicate the RDF type(s) to associate with a subject.
A CURIE (Compact URI, specified as a part of RDFa, but also in a specification of its own [BM09]) is a way of abbreviating a URI as namespace:localname, but in contrast to XML local names, the local name definition of SPARQL [PS08] is used, which is more liberal, e. g. permitting leading
digits. As in SPARQL, the underscore prefix is reserved for blank nodes, such as _:bnode-id, and names in the default namespace are written with an empty prefix, i. e. as :localname. However, the latter namespace is not intended to be the default namespace declared in the surrounding XML, but a fixed namespace specified for the language. In addition to that, CURIEs also allow for completely unprefixed names, such as localname, which can be reserved words whose mapping to URIs is specified as a part of the language specification. The mappings to URIs for the default namespace and for unprefixed names have been specified for RDFa in XHTML, but as there is currently no standard way of declaring these mappings for a different host language, e. g. in its XML schema, we do not anticipate that any RDFa-aware software – except our own; see section 3.7.3 – would be able to interpret such CURIEs. Therefore, we leave the specification of how OMDoc should handle such CURIEs as future work. Some RDFa attributes allow URIs and CURIEs, which are generally hard to distinguish.⁵⁵ Therefore, a CURIE in such an attribute has to be surrounded by square brackets. This syntax is called “safe CURIE”. Also note that full RDFa compatibility leads to a syntactical redundancy in all OMDoc elements that carry metadata. In OMDoc 1.2, it was clear (by the human-readable specification, not necessarily for machines!) that metadata contained in an XML element E referred to the concept denoted by E, e. g., that the dc:title in listing 2.11 is the title of the proof with the URI #fermat-proof. RDFa requires the subject of annotations to be set explicitly, using the @about attribute: ...
Otherwise the parent subject would be reused, which is initially the base URI, i. e., unless specified otherwise, the URI of the whole document – which may, of course, contain many other metadata records. RDFa in XHTML is often used for talking about different things than the elements of the XHTML document itself, such as the book described in a paragraph of the document, except for annotations on the top level for expressing, e. g., the document’s author and license. In contrast, metadata in OMDoc are always intended to be annotations for the things modeled in the document, such as theories or statements. It is recommended for all of these things to have a URI, which is defined by the @xml:id attribute.⁵⁶ It would be tempting to specify that, for elements that have metadata and an @xml:id, the RDFa subject of the metadata annotations implicitly gets set to the URI of the respective element. One could even specify that, if an element carrying metadata does not have an @xml:id, a blank node will be generated for it. However, XHTML is – and will always be – much more widespread than OMDoc, RDFa was first designed for annotating XHTML and is currently still biased towards XHTML, and RDFa-aware software will probably not be able to handle custom reinterpretations of the RDFa syntax and semantics soon, at least not as long as there is no way of specifying them in a machine-understandable way⁵⁷. Now suppose we had an OMDoc document at a URI
⁵⁵ The incoherent use of URIs vs. CURIEs in the RDFa attributes is likely to change in future versions [Bir09].
⁵⁶ The MMT URIs of OMDoc 1.6 will enable additional ways of giving URIs to OMDoc concepts, but from an RDFa point of view the principle remains the same (cf. section 2.3.3).
57
Element    Attributes                        Children
meta       @property, @content, @datatype    literal text or XML (optional)
link       @rel, @rev, @resource             (resource|meta|link)*
resource   @about, @typeof                   (meta|link)*
Table 2.3: Elements of the recommended RDFa syntax for OMDoc metadata
U containing a proof with RDFa metadata but without an explicit @about attribute. Suppose the relation of the proof to the theorem it proves were, for some reason, not modeled in OMDoc syntax, but in RDFa, using the OMDoc ontology (cf. section 2.4.1.1), i. e. as , which is perfectly legal. An RDFa crawler not knowing OMDoc would extract the triple oo:proves from that annotation. From the domain of the oo:proves property, any RDFS reasoner would then infer that U is an instance of oo:Proof, which is clearly not the case; actually, this would even lead to a contradiction for an OWL reasoner, as oo:Proof is disjoint with oo:Document, of which U actually is an instance. Realizing that the web should not be polluted with such invalid RDF triples⁵⁸, we therefore specify that RDFa metadata in OMDoc must only be used together with correctly placed @about attributes. A relaxation of this policy is subject to future additions to the RDFa specification that might allow for defining parsing rules specific to particular host languages.
Recommended Syntax for RDFa Metadata
I will not cover full RDFa in further detail here; for an introduction, see [AB08; HHA08]. Instead, I will continue with the recommended syntax for using metadata: We introduce the elements meta and link as children of any metadata block.⁵⁹ ⁶⁰ Their semantics is roughly inspired by the namesake elements that can occur in the head of an XHTML document: meta is a literal-valued metadata field, whereas link points to another resource by referring to its URI. Resources with document-local identifiers only, i. e. blank nodes, can be created using the resource element. The elements are shown in table 2.3; an example for using them is given in listing 2.12.
Relevant Metadata Vocabularies
Due to the inherent flexibility of RDFa, any metadata vocabulary can be used.
However, we give particular recommendations for metadata in the above-mentioned domains of special interest. Using Dublin Core and Creative Commons metadata with the new RDFa syntax for OMDoc is largely trivial. Concerning Dublin Core, we recommend using the more modern DCMI terms vocabulary instead of the DCMES, which is now possible by way of a simple namespace declaration. While the MARC roles had been used as annotations of triples with the dc:contributor property in OMDoc 1.2, there is a specification of how to use them in RDF, defining them as sub-properties of dc:contributor [Joh05]. Most Creative Commons
⁵⁸ See also the “Pedantic Web” initiative [HC09].
⁵⁹ Actually, the link element has existed before, as a part of OMDoc’s rich text (RT) module [Koh06b, section 14.6]. However, this usage does not conflict with its usage as a metadata child.
⁶⁰ Note that the metadata element does not exist for RDFa processors, as it does not carry any RDFa attributes. It is merely a means of structuring the OMDoc syntax.
license declarations will become much easier than in OMDoc 1.2, as we will follow the more recently recommended practice of not always constructing licenses from scratch, but directly linking resources to existing Creative Commons licenses using the xhv:license property⁶¹; for example .
It should also be noted that the OMDoc 1.2 syntax allowed for constructing licenses that contradicted the ccREL ontology. For example, it was possible to say , although cc:DerivativeWorks is not in the range of the property cc:prohibits.⁶² The OMDoc 1.2 Dublin Core extensions for revision logs were not immediately RDF-compatible, as outlined in section 2.4.4.2. We were able to partly replace them by the revisioning vocabulary of DCMI terms (cf. section 2.2.6.3). Listing 2.12 shows the proof of Fermat’s last theorem once more, now redone using RDFa metadata, and using DCMI terms for the revision history. Comparing this to listing 2.11, particularly note the following features:
• We are able to link to resources, such as FOAF profiles (cf. section 3.2.2), that describe people (creators, contributors, etc.) in further detail.
• More than one predicate can be given per subject and object. This makes it convenient to say that a person is both an editor and a publisher of a document.⁶³
• The complete revision history can be embedded into the document.
• Versions (or persons, or licenses) can also be described (as blank nodes) if they are only known in this document, i. e. are not globally identifiable by a URI.
• The DCMI Terms vocabulary allows for modeling the history of revisions more faithfully than the Dublin Core extensions of OMDoc 1.2. We can use more specific subproperties of dct:date, such as dct:created or dct:issued. Dates can be made explicit to automated parsers by declaring a datatype for them; otherwise the parser would have to know that dct:date and its subproperties usually have an ISO 8601 date value [BM04b], or it would have to apply heuristics. Successive revisions can be modeled as a linked list via dct:replaces, in addition to referring to them by dct:hasVersion. We did not model Michael Kohlhase’s digitalization of Wiles’s proof as such a replacement, but as a resource that is based on Wiles’s proof via the dct:requires and dct:source properties.
• The license of this document is a ready-to-use Creative Commons license that can simply be referenced by its URI. Alternatively, we can construct it in place.
⁶¹ This property from the XHTML vocabulary supersedes the former cc:license property [Abe+08]. By the implementation of the ccREL ontology, this property is also a subproperty of dc:license, which in turn is a subproperty of dc:rights (cf. section 2.2.6.3).
⁶² Given that semantic web reasoning usually assumes an open world, one cannot easily conclude from the absence of the permission to create derivative works that it is prohibited [Her+08]. Therefore, it is unclear whether one can effectively prohibit derivative works using the ccREL vocabulary. This Orwellian approach to restricting thinking about illiberal licenses by restricting language (cf. [Orw49]) may be debatable, but the ccREL ontology currently specifies it like this, so we have to accept it for the sake of compatibility, or – eventually – model our own licensing ontology that extends ccREL.
⁶³ marcrel:AUT is only a subproperty of dc:contributor.
Listing 2.12: Proof of Fermat’s last theorem, with OMDoc’s new RDFa metadata Proof of Fermat’s Last Theorem 1637-06-13T00:00:00 1995-05-01T00:00:00 2006-08-28T00:00:00
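To illustrate how an application might consume such a revision history, the following Python sketch follows the dct:replaces links backwards from a given version and parses the typed dates. It only assumes what is stated above (versions attached via dct:hasVersion, successive revisions chained via dct:replaces, ISO 8601 date values); the blank node labels and the flat list of string triples are hypothetical simplifications and not the actual content of listing 2.12.

```python
from datetime import datetime

# Hypothetical triples, roughly as an RDFa processor might extract them
triples = [
    ("#fermat-proof", "dct:hasVersion", "_:fermat1637"),
    ("#fermat-proof", "dct:hasVersion", "_:wiles1995"),
    ("_:fermat1637", "dct:created", "1637-06-13T00:00:00"),
    ("_:wiles1995", "dct:created", "1995-05-01T00:00:00"),
    ("_:wiles1995", "dct:replaces", "_:fermat1637"),
]


def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def history(latest):
    """Follow dct:replaces from `latest` back to the first revision."""
    chain, seen = [], set()
    current = latest
    while current is not None and current not in seen:   # guard against cycles
        seen.add(current)
        chain.append(current)
        replaced = objects(current, "dct:replaces")
        current = replaced[0] if replaced else None
    return chain


def created(version):
    """Parse the creation date; safe only if the value is known to be ISO 8601."""
    return datetime.fromisoformat(objects(version, "dct:created")[0])


for version in history("_:wiles1995"):
    print(version, created(version).year)
```

Without a declared datatype, the call to datetime.fromisoformat would be exactly the kind of heuristic mentioned in the list above.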
Compared to OMDoc 1.2, one aspect cannot be expressed with DCMI Terms: the actions that lead to new revisions. One state-of-the-art ontology that offers the desired expressivity is the Ontology Metadata Vocabulary [Har+07; Pal+09] for describing ontologies – by section 2.4.3, also including mathematical documents. Instances of omv:Ontology can be arranged into a list linked via omv:hasPriorVersion. As an overlay list to the mere sequence of revisions, a sequence of changes can be given. An omv:ChangeSpecification connects two ontology versions by its properties omv:changeFromVersion and omv:changeToVersion and consists of a set of one or more omv:Changes chained together by omv:hasPreviousChange. A change has an author (an omv:Person), a date, and a few more properties. OMV offers a lot of change subclasses specific to RDFS and OWL ontologies; we could easily add change types for mathematical documents, theories, or statements, e. g. a change type for adding a type declaration to a symbol.
Also there is potential for interaction rules between DC and CC, e. g. if BY(D) and dc:creator(D,A) then ....) – Interesting. Yes, why not. But then I vote for the following plan
1. first do this as a part of the OMDoc spec to learn how it works
2. but then don’t keep it within the OMDoc standard, but try to convince the CC developers
3. only if they don’t like our approach, keep it in the OMDoc standard, otherwise contribute it to CC and refer to it from OMDoc.
Pragmatic Metadata
As the listing in Sect. 2.4.4.3 shows, the new RDFa-based metadata syntax is much more verbose than the old one of OMDoc 1.2. Therefore, we suggest two ways of facilitating the annotation: For manual authoring, one can keep the old, “pragmatic” OMDoc 1.2 syntax and specify a transformation of such annotations to the new, “strict” RDFa syntax – implementable, e. g., in XSLT.
also consider sTeX as an even more pragmatic metadata syntax
Respecifying Metadata Inheritance
As I modeled our metadata ontologies in OMDoc, I am now able to extend them by a formal specification of certain rules that had only informally been stated in the OMDoc 1.2 specification: for example, that most DC metadata propagate from document sections down into subsections unless subsections specify different values, or that any dc:creator of a subsection of a document becomes a dc:contributor to the whole document.
model formally in DL (give example): insection ○ creator ⊑ contributor
@inherits, compare ActiveMath
new contribution: can also add metadata from RDF ontologies to OM terms (as attributions)
require importing ontologies when used as CURIEs?
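The two inheritance rules named above can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not a formalization from the OMDoc specification: the Section class is hypothetical, and the choice of which Dublin Core properties count as inherited (INHERITED) is made up, as the text only says that "most" of them propagate. The second function corresponds to the role chain insection ○ creator ⊑ contributor from the note above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumption of this sketch: the set of inherited properties
INHERITED = ("dc:language", "dc:rights", "dc:publisher")


@dataclass
class Section:
    """Hypothetical document (sub)section with a metadata record."""
    metadata: Dict[str, List[str]] = field(default_factory=dict)
    children: List["Section"] = field(default_factory=list)


def effective_metadata(path):
    """Metadata of path[-1], given the sections from the document root down to it."""
    result = {}
    for ancestor in path[:-1]:
        for key in INHERITED:
            if key in ancestor.metadata:
                result[key] = ancestor.metadata[key]   # nearer ancestors override
    result.update(path[-1].metadata)                   # own values always win
    return result


def contributors(document):
    """Explicit contributors plus the creators of all (nested) subsections."""
    found = list(document.metadata.get("dc:contributor", []))
    stack = list(document.children)
    while stack:
        section = stack.pop()
        for creator in section.metadata.get("dc:creator", []):
            if creator not in found:
                found.append(creator)
        stack.extend(section.children)
    return found


subsection = Section({"dc:creator": ["Bob"]})
document = Section({"dc:creator": ["Alice"], "dc:language": ["en"]}, [subsection])

print(effective_metadata([document, subsection]))
print(contributors(document))
```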
2.4.5 Preserving Semantic Structures when Publishing
As will be shown in chapter 3, services operating on mathematical documents require knowledge about their structural properties. In a web-publishing scenario, it makes sense to also enable such services to operate on published documents that have been generated from a semantic markup source. The original source may not always be publicly accessible (but stored in a database backend), or it may not be linked from a published document that a web crawler indexes or that a user shares with other users. Therefore, the semantic structures that are explicit in the semantic markup also need to be preserved in the published version. Here, I will mainly consider web publishing in XHTML+MathML, where the semantic structures can be preserved in two ways: Rendered formulæ will be interlinked with their semantic representation using parallel markup, and structures on the statement, theory, and document levels, which are rendered as XHTML, will carry RDFa annotations. Some remarks will also be made about other output formats, such as PDF.
2.4.5.1 Preserving Object-Level Structures
The publishing process on the object level takes OpenMath or Content MathML objects as input and generates Presentation MathML. As introduced in section 2.3.1, a Presentation MathML formula can be annotated with a content-oriented representation of the same formula. Suppose we had a rendering algorithm ρ∶ C → P creating presentation-only markup from content markup. This algorithm can easily be extended to an algorithm ρ′ that produces parallel markup:
< semantics > ρ(c) < annotation − xml > ρ′ ∶ c ↦ c < /annotation − xml > < /semantics > The utility of parallel markup can be greatly enhanced, as we will see in section 3.1, by adding fine-grained links between both representations, covering the subterm structure down to individual symbols. MathML does not prescribe the direction of these links (cf. [Aus+09, section 5.4]), but as we are mainly interested in accessing the semantic structure of a mathematical expression from its human-readable presentation, we require that they point from presentation to content markup. This is simply most intuitive in that case but also has the particular advantage that it enables cross-linking for n-ary operators. Consider the content-oriented representation @(plus, a, b, c), which would be rendered as a + b + c. The “plus” operator occurs once in the content-oriented representation, but n − 1 times in the rendered presentation. As XML links are injective (i. e. multiple links can have the same target, but from the same origin there can be at most one link) and established by giving potential link targets unique IDs (e. g. “expr1”) and letting the link sources point there by @xref attributes (e. g. xref=“#expr1”), the “plus” operator can only be cross-linked when giving its content representation an ID and letting the rendered presentations point there.64 In our further work with parallel markup, we rely on the algorithm implemented by JOMDoc (cf. section 3.1.1). Instead of a full description of how that algorithm creates cross-linked parallel markup, we emphasize its two most challenging aspects: (i) For every content element to be crosslinked, a unique ID has to be generated. (ii) In pattern matching notations, all sources and targets of cross-links have to be identified in the presentational fragment and the content markup pattern, respectively. A prerequisite for cross-linked parallel markup of subterms is that subterms are marked up in the rendered presentation. 
From a purely visual point of view, this is not required, as the experienced human reader knows how to read brackets and how strongly operators bind when no explicit brackets are used; i. e. there is no need to group the 2x in 2x + 1. Software that lacks this experience but still wants to translate a user’s selection of 2x back into @(times, 2, x) needs this subterm to be marked up, though, and Presentation MathML supports that. The invisible mrow element explicitly groups its content into a subterm. Certain other Presentation MathML elements have this grouping property: the constructors for fractions, radicals, super-/sub-/under-/overscripts, table cells, and a few others; see [Aus+09, table 3.1.3.2] for a complete list.

64 The MathML specification does not prescribe a direction for these cross-references, for good reasons. It is anticipated that applications that do not need certain annotations of a MathML formula remove them; therefore “in absence of other criteria [which we have!], the first branch of the semantics element is a sensible choice to contain the id attributes. Applications that add or remove annotations will then not have to re-assign these attributes as the annotations change.” [Aus+09, section 5.4] Moreover, our choice of direction has solely been influenced by the many-to-one constraint on links mentioned above; the MathML specification emphasizes that “the direction of the references should not be taken to imply that sub-expression selection is intended to be permitted only on one child of the semantics element. It is equally feasible to select a subtree in any branch and to recover the corresponding subtrees of the other branches.” [Aus+09, section 5.4]
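The cross-linking of an n-ary operator can be illustrated with a minimal Python sketch. It is not the JOMDoc algorithm; the function name `render_nary_plus`, the sequential ID scheme and the omission of the MathML namespace are assumptions made only for illustration. It renders @(plus, a, b, c) so that the n − 1 rendered “+” operators all point to the single content-markup symbol.

```python
import itertools
import xml.etree.ElementTree as ET


def render_nary_plus(args):
    """Build cross-linked parallel markup for @(plus, arg1, ..., argn).

    Hypothetical helper: IDs are simply expr1, expr2, ... in creation order.
    """
    ids = (f"expr{n}" for n in itertools.count(1))

    # Content markup: one <apply> with a single occurrence of the operator.
    apply_el = ET.Element("apply", id=next(ids))
    plus = ET.SubElement(apply_el, "csymbol", id=next(ids), cd="arith1")
    plus.text = "plus"

    # Presentation markup: the grouping <mrow> points to the whole application.
    mrow = ET.Element("mrow", xref="#" + apply_el.get("id"))
    for position, name in enumerate(args):
        ci = ET.SubElement(apply_el, "ci", id=next(ids))
        ci.text = name
        if position > 0:
            # Every rendered "+" links to the same content symbol (many-to-one).
            mo = ET.SubElement(mrow, "mo", xref="#" + plus.get("id"))
            mo.text = "+"
        mi = ET.SubElement(mrow, "mi", xref="#" + ci.get("id"))
        mi.text = name

    semantics = ET.Element("semantics")
    semantics.append(mrow)
    annotation = ET.SubElement(semantics, "annotation-xml",
                               encoding="MathML-Content")
    annotation.append(apply_el)
    return semantics


if __name__ == "__main__":
    print(ET.tostring(render_nary_plus(["a", "b", "c"]), encoding="unicode"))
```

Linking in the opposite direction would require the single content symbol to carry several references at once, which one @xref attribute cannot express.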
introduce this notation somewhere
Figure 2.4: Parallel markup: Presentation markup elements point to content markup elements. The light gray range is the user’s selection, with the start and end node in bold face. We first look up their closest common ancestor that points to content markup, and then look up its corresponding content markup – here: E.2

From a single presentation-markup element, the corresponding content markup element can simply be looked up by traversing the @xref link. For a range of presentation markup selected by the user (e. g. with the mouse), this is less trivial, unless the user interface restricts the possible selections to subterms. One solution is to locate the closest common ancestor of all selected presentation elements65 that carries an @xref attribute and traverse that link. An example is given in figure 2.4. Since the JOMDoc rendering algorithm supports pattern-matching-based and thus non-compositional translations from content to presentation markup, not every content subterm corresponds to a presentation subterm. For example, in the presentation element corresponding to sin² x, there will be no contiguous subexpression pointing to the content expression sin x.

Paul’s example: select b + c inside a + b + c
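The lookup described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation used in this thesis: the sample markup for 2x + 1, the IDs e1 to e7 and the function name `content_for_selection` are hypothetical, and the MathML namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical cross-linked parallel markup for 2x + 1.
PARALLEL = """
<semantics>
  <mrow xref="#e1">
    <mrow xref="#e2"><mn xref="#e3">2</mn><mo xref="#e4">&#x2062;</mo><mi xref="#e5">x</mi></mrow>
    <mo xref="#e6">+</mo>
    <mn xref="#e7">1</mn>
  </mrow>
  <annotation-xml encoding="MathML-Content">
    <apply id="e1"><csymbol id="e6">plus</csymbol>
      <apply id="e2"><csymbol id="e4">times</csymbol><cn id="e3">2</cn><ci id="e5">x</ci></apply>
      <cn id="e7">1</cn>
    </apply>
  </annotation-xml>
</semantics>
"""


def content_for_selection(root, start, end):
    """Return the content element for the selection from start to end.

    Walks up from the start node to the closest ancestor that is shared
    with the end node and carries an xref attribute, then follows the link.
    """
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}

    def ancestors(node):
        chain = [node]
        while chain[-1] in parent:
            chain.append(parent[chain[-1]])
        return chain

    end_chain = {id(node) for node in ancestors(end)}
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    for node in ancestors(start):
        if id(node) in end_chain and node.get("xref"):
            return by_id[node.get("xref").lstrip("#")]
    return None


if __name__ == "__main__":
    root = ET.fromstring(PARALLEL)
    two, x = root[0][0][0], root[0][0][2]
    print(content_for_selection(root, two, x).get("id"))
```

Only the start and the end node of the selection are inspected, which matches the simplification noted in footnote 65.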
Alternatives to Parallel Markup

All structural information about a mathematical expression can be preserved by consistently using cross-linked parallel markup, as described so far. However, certain types of information may not be conveniently or efficiently accessible to software. I am aware of the following two cases: (i) structural information that is not trivially or not at all present in the original content-markup input and that will be required during interaction with the presentation markup; (ii) subterm annotations that an application chooses to visualize as switchable displays. Both cases are elaborated in the following paragraphs.

65 Due to the linearized tree structure of XML, this can practically be implemented as looking up the closest common ancestor of the start and the end node of the selection.
Certain structural information was not immediately present in the content-markup input to the rendering process. One example is information about the precedences of operators, which the renderer needs for correctly generating brackets, but which an interactive service that dynamically displays and hides redundant brackets can also draw on (cf. section 3.6.4.2). Such information could be represented by attributions to the occurrences of the operators in the content markup output by the renderer. But then it would only be accessible from a presentation-oriented interface (e. g. an interface that allows the user to select expressions with the mouse) via two indirections: first content-markup lookup, then content attribution lookup. Such information can be provided in a more lightweight way by attaching custom XML attributes from a non-MathML namespace directly to Presentation MathML elements (cf. [Aus+09, section 2.3.3]).

Redundant brackets are not the only example of information that might be displayed or hidden on demand. Descriptions, labels, or natural-language abbreviations for subterms are another such case. In the content-markup input, they can be provided as attributions to subterms. Among other possibilities, they can be presented in a way that lets the user interactively switch between the formal subterm and its informal abbreviation. Presentation MathML natively allows for representing such display alternatives using the maction element [Aus+09, section 3.7.1]. maction is a container for at least one Presentation MathML subexpression, plus additional display or interactivity parameters, possibly given as additional child elements; the @actiontype attribute allows for distinguishing between different purposes of using maction. We will mostly use it as a container for several Presentation MathML expressions that can be displayed alternatively. The @selection integer attribute controls which child is displayed.
Besides a few suggestions, MathML does not prescribe definitive values for @actiontype, so we will introduce ones for the interactive services that we are going to offer (cf. section 3.6). Here, we will demonstrate the use of maction for the above-mentioned subterm abbreviations. While the renderer could leave the abbreviations as attributions in the content markup and leave it to the user interface software to retrieve them from there and put them into mactions, those maction elements, being valid Presentation MathML, can as well be directly generated by the renderer. Consider a physics document, where the author provides $W_{\mathrm{pot}}(R)$ (potential energy) as an instructive abbreviation of the complex term $\frac{-e^2}{4\pi\epsilon_0 R/2}$. We introduced the OpenMath symbol folding#abbrev that serves as an attribution key and provided a notation definition that matches content-markup expressions attributed that way and renders them as mactions. For example,
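the term $\frac{-e^2}{4\pi\epsilon_0 R/2}$, attributed with the abbreviation $W_{\mathrm{pot}}(R)$,
is rendered as
an maction with the two display alternatives $\frac{-e^2}{4\pi\epsilon_0 R/2}$ and $W_{\mathrm{pot}}(R)$.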
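How such an maction could be assembled and switched is sketched below in Python. The helper names `make_abbreviation` and `toggle`, as well as the @actiontype value "abbrev", are assumptions for illustration; the values actually introduced for the interactive services are those of section 3.6. The only MathML facts relied on are that maction holds the alternatives as children and that @selection is a 1-based index.

```python
import xml.etree.ElementTree as ET


def make_abbreviation(full_term, abbreviation, actiontype="abbrev"):
    """Wrap a term and its abbreviation as two alternatives of one maction.

    The actiontype value is a placeholder, not a value defined by MathML.
    """
    maction = ET.Element("maction", actiontype=actiontype, selection="1")
    maction.append(full_term)      # alternative 1: the formal subterm
    maction.append(abbreviation)   # alternative 2: its informal abbreviation
    return maction


def toggle(maction):
    """Advance @selection cyclically and return the child now displayed."""
    current = int(maction.get("selection", "1"))
    following = current % len(maction) + 1   # @selection counts from 1
    maction.set("selection", str(following))
    return maction[following - 1]


if __name__ == "__main__":
    term = ET.fromstring(
        "<mfrac><mrow><mo>-</mo><msup><mi>e</mi><mn>2</mn></msup></mrow>"
        "<mi>D</mi></mfrac>")  # denominator shortened to a placeholder <mi>
    abbr = ET.fromstring(
        "<mrow><msub><mi>W</mi><mi>pot</mi></msub><mo>(</mo><mi>R</mi>"
        "<mo>)</mo></mrow>")
    action = make_abbreviation(term, abbr)
    toggle(action)
    print(ET.tostring(action, encoding="unicode"))
```

A user interface would call something like `toggle` on a click; generating the maction in the renderer keeps both alternatives inside valid Presentation MathML.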
Summarizing, we pose the following additional requirements on the use of Presentation MathML for preserving semantic structures. All of them are compliant with the MathML specification.

1. Alternative displays, among which the user can switch, SHOULD66 be realized by maction elements and an @actiontype attribute that indicates the intended type of interaction.
2. Subterms that are not yet enclosed in an mrow or one of the grouping operators listed in [Aus+09, table 3.1.3.2] MUST be grouped using the invisible mrow element.
3. For services that need access to the semantics of mathematical expressions, the latter MUST be provided as parallel content markup [Aus+09, section 5.4]. There MUST be cross-links from all atoms and from the subterm-grouping elements of item 2 to the corresponding content elements. Content elements MAY be annotated with additional attributions.
4. If services that directly operate on the presentation markup to customize its display require efficient access to specific annotations, they MAY be added directly to presentation markup elements as attributes from a non-MathML namespace. It is RECOMMENDED that such annotations require considerably less additional space than parallel markup; otherwise parallel markup should be preferred.

Complexity Considerations

Compared to generating presentation markup only, adding content markup roughly doubles the size of the output, assuming that the XML element and attribute names for content markup on average have the same length as those for presentation markup. This can be seen by structural recursion over the constructors for content markup: Atoms, i. e. symbols, variables, and numbers, are represented by a single element both in content markup and in presentation markup.67 Applications of operators – and binders similarly – also have the same size in content and presentation markup, as the operator and its arguments occur in either representation68, and in most cases there is a grouping element around them in either representation69. Cross-links raise the size factor to up to four, as an ID attribute is added to almost every content markup element, and a link to one such content markup element to almost every presentation markup element. mactions, when nested, can lead to an exponential blowup to the base of the number of alternatives in the worst case. This could be avoided if Presentation MathML allowed for structure sharing between expressions (which only Content MathML supports so far [Aus+09, section 4.2.7]).
66 Here, and in the remainder of this thesis, we use these capitalized keywords in accordance with RFC 2119 [Bra97].

67 Content-markup symbols tend to be more verbose than presentation-markup symbols, though: in the worst case, they carry a complete CDBase+CD+Name URI represented by one to three attributes, whereas a presentation-markup symbol in the best case only consists of an XML element that contains a single character, such as +.

68 This figure is correct in the case of prefix, postfix, and binary infix operators. In the case of n-ary infix operators and mixfix operators, an occurrence of the operator is placed between every pair of successive arguments in presentation markup, possibly even before the first and after the last argument. Thus, the content-markup representation only needs half the size of the presentation markup, plus/minus a small constant.

69 This figure overestimates the size of presentation markup when subterms are not always grouped into mrows, as, in some cases, there is no other markup – such as brackets – around a subterm.
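The worst-case growth caused by nested mactions can be made concrete with a small Python calculation. It is a sketch under a simplifying assumption that is not taken from the text: every maction offers the same number of alternatives, and each alternative again consists of exactly one nested maction, down to a single atom at depth 0. The function name is hypothetical.

```python
def nested_maction_size(alternatives, depth):
    """Count the elements of a fully nested maction tree.

    depth 0 is a single atom; every further level wraps `alternatives`
    copies of the level below into one additional maction element.
    """
    if depth == 0:
        return 1
    return 1 + alternatives * nested_maction_size(alternatives, depth - 1)


if __name__ == "__main__":
    for depth in range(5):
        print(depth, nested_maction_size(2, depth))
```

With k alternatives per level, the recurrence s(d) = 1 + k · s(d − 1) with s(0) = 1 yields k^d atoms at nesting depth d, i. e. growth that is exponential to the base k, as stated above.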
2.4.5.2 Preserving Statement-/Theory-/Document-Level Structures

The statement, theory, and document levels of mathematical knowledge can be handled in the same way w. r. t. semantic structure preservation. The presentation process on these levels has OMDoc as a common input language and XHTML+RDFa as a common output language.

elaborate
Acknowledgments

The short review of state-of-the-art ontology languages in section 2.1.4 is joint work with John Bateman. The elaboration on logical structures of mathematical knowledge in section 2.2.1 is heavily based on prior work by Michael Kohlhase [Koh06b; KK08] or in collaboration with him [LK09], but adapted to the topic of this thesis. The work on defining notations for mathematical symbols that advances the state of the art has been done in collaboration with Michael Kohlhase, Christine Müller, Normen Müller, and Florian Rabe; I was particularly involved in the refinement of the OMDoc 1.3 pattern matching syntax and the modeling of brackets and operator precedences in that syntax (cf. section 2.3.4). The description of the SIOC Core ontology in section 2.4.1.4 is partly based on a joint publication with Uldis Bojārs, Tudor Groza, John Breslin, and Siegfried Handschuh [Lan+08b]. Parts of the review on structure extraction from XML markup are reused from [Lan09]. The sections on using OMDoc as an ontology language (section 2.4.3) and on the metadata framework for OMDoc 1.3/1.6 (section 2.4.4.3) are mostly based on a joint publication with Michael Kohlhase [LK09]. The preservation of object-level semantic structures on publishing (cf. section 2.4.5.1) has been developed jointly with Florian Rabe [GLR09].
3 Services for Mathematical Knowledge Management

explain wide scope of MKM
This chapter deals with various aspects of interacting with semiformal mathematical knowledge in the sense of chapter 2, plus their technical foundations. From this point on, I assume that the mathematical knowledge that is interacted with is represented in MathML, OpenMath, or OMDoc. I also assume a semiformal semantics of this knowledge, as established by the ontologies and the XML-to-ontology translation given in section 2.4.
3.1 Browsing

3.1.1 Rendering

The first step in obtaining a browsable view on a semantic document is rendering it, i. e. transforming content markup to presentation markup:

1. Content-markup formulæ are translated to Presentation MathML formulæ, preserving the original semantics as parallel markup (cf. section 2.4.5.1). The translation is governed by the notations defined for the symbols used (cf. section 2.2.5).
2. Non-formula OMDoc markup is translated to XHTML, preserving the original semantics as RDFa (cf. section 2.4.5.2).

Such transformations have traditionally often been implemented in XSLT (cf. section 2.1.2.2), with more and more exceptions being made for rendering formulæ, due to the complexity of that task.

3.1.1.1 Rendering Formulæ

As outlined in section 2.3.4.1, XSLT is not the most suitable language for defining notations of symbols. Pattern matching as a means of defining notations (cf. section 2.3.4.2) is often implemented by translation to XSLT. This translation, which is also usually implemented in XSLT, is particularly straightforward for the presentation markup fragment. The JOMDoc library [Jom], however, which is used in most of the implementations presented in this thesis, directly implements the pattern matching and rendering algorithm in Java using the XOM XML object model [Har]. The MMT library likewise implements declarative notations directly (based on the XML object model built into the Scala language [EPF]), whereas the declarative notation definitions of OMDoc 1.2 were implemented by a translation to XSLT, which was itself implemented in XSLT. A notable feature of
change
the MMT library is that it also uses notation definitions to render statements, theories, and documents, and thus provides a uniform management of presentation on all levels of mathematical knowledge.

A system that wants to render formulæ has to make the notation definitions of all occurring symbols available to the renderer. Both the JOMDoc and the MMT renderer implementations, as well as the OMDoc 1.2 presentation implementation, can collect notation definitions automatically from imported theories, which makes it easy to render OMDoc documents: Depending on how the documents are stored, one may just have to provide an application-specific implementation of a function that resolves the URL of an imported theory.

The most important motivation for replacing the OMDoc 1.2 implementation by the ones of JOMDoc and MMT was given by the disadvantages of generating XSLT stylesheets from declarative notation definitions in OMDoc 1.2; therefore, let us briefly review this process: In OMDoc 1.2, a document was rendered in a three-step process, as explained in [Koh06b, section 25.1] and [Lan07a, section 3.5]: In the first step, an XSLT stylesheet had to be generated from each OMDoc document containing declarative notation definitions. In the second step, the set S of documents containing the theories imported by the document d to be rendered – assumed to define the notations for the symbols used in d – was determined. Both generation steps were implemented in XSLT. The final XSLT stylesheet xd used to render d was custom-generated and included a static XSLT for the statement-, theory-, and document-level elements, and all those XSLTs that had been generated for each document s ∈ S in the first step. Control of this process had been implemented in two ways: (i) For batch-processing of files from the command line, there was a makefile, which iteratively executed the aforementioned two generation steps.
(ii) The first version of the SWiM wiki – the predecessor of the one presented in chapter 4, which had been implemented before JOMDoc and MMT existed – reimplemented the process in a recursive, demand-driven way: When rendering a document d, the XSLT stylesheet xd was either taken from a cache, or generated by recursively calling generation step 2, which in turn called step 1 on demand [Lan07a, section 4.2.6].

This approach has turned out to be error-prone and hard to debug. When errors spotted in a formula in the XHTML+MathML output were found to originate from the first step of generating XSLT from declarative notations, it was not sufficient to debug the generated XSLT; one also had to debug the XSLT implementation of the generation, which consisted of highly abstract code with no obvious resemblance to the OpenMath input or to the Presentation MathML output. In November 2008, when we finally stopped using the OMDoc 1.2 XSLT stylesheets for rendering formulæ, the formulæ in the OMDoc example documents still contained a lot of rendering errors1, even though the XSLT implementation of the generation steps 1 and 2 had been in use and under maintenance since September 2000.

1 We cannot give more exact figures, but experience with JOMDoc, which is now also in use for rendering a collection of lecture notes with more than 2000 symbols, shows that maintenance is now easier.

3.1.1.2 Rendering Notation Definitions

A special case is rendering notation definitions themselves. This is currently an issue in OMDoc, as notations are defined completely formally. Unlike for the rest of OMDoc’s statements, and contrary to the TeXmacs extension of Autexier et al. (cf. [Aut+07]), there is no way of intermixing them with natural language, except (ab)using the informal part of a symbol’s definition for introducing an example of its notation, e. g. “we call $\binom{n}{k}$ the binomial coefficient of n over k”. Thus, an auto-generated example may be helpful both for the author(s) of the symbol and its notation definition (to see whether they got the notation definition right) and, of course, for authors interested in using the symbol. I have implemented such an auto-generation for the OMDoc 1.3 notation definition syntax. The source code of the content markup pattern and the preview of the rendering are displayed next to each other. The preview is generated by replacing all placeholders in the content markup pattern by strings, whose values are the names of the placeholders, and then rendering that expression using the current notation definition. For the notation definition from listing 2.6, we would obtain @(arith1#divide, arg1, arg2) rendered as $\frac{\mathit{arg1}}{\mathit{arg2}}$. Figure 3.1 shows the same for an n-ary operator, where arg1, . . . , argn are used as arguments.

Figure 3.1: Rendered notation definitions for the arith1#plus symbol of OpenMath, from the OpenMath wiki at http://wiki.openmath.org/?title=ntn:arith1 [Lan].

3.1.1.3 Rendering Non-Formula Markup

For presenting non-formula markup without preserving the semantic structure, one could draw on existing XSLT stylesheets – both for OMDoc and for OpenMath content dictionaries. One important feature of these stylesheets is their support for document inclusion. Mathematical documents can be highly modularized and thus contain knowledge of different granularities, as will be discussed in section 3.7.2.2. Still, if the author composed a document from reusable parts, the reader may prefer reading it at once in a coherent view, without having to follow any further links. Therefore, the rendering implementation has to be prepared for inclusion.
By default, the XSLT stylesheets for OMDoc resolve all inclusions (). As modularity had not been supported for OpenMath CDs before but was needed for storing documents in the semantic wiki SWiM (cf. section 4.3.2.1), I implemented it analogously to OMDoc, using XInclude [MOV06].

JOMDoc, Krextor, RDFa focus on web browsing, but maybe say a few words about other output formats (LaTeX, PDF)
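Resolving XInclude references before rendering can be sketched with the Python standard library. This is only an illustration of the general mechanism, not the XSLT-based implementation described above: the in-memory store, the file name `arith1-plus.xml` and the way the content dictionary is split into parts are hypothetical, and the OpenMath namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI = "http://www.w3.org/2001/XInclude"

# Hypothetical storage: one reusable CD fragment, addressed by file name.
STORE = {
    "arith1-plus.xml": "<CDDefinition><Name>plus</Name></CDDefinition>",
}


def loader(href, parse, encoding=None):
    """Application-specific resolver for included resources."""
    if parse != "xml":
        raise ValueError("only XML inclusion is handled in this sketch")
    return ET.fromstring(STORE[href])


def resolve_inclusions(document):
    """Replace every xi:include element in place by the referenced XML."""
    ElementInclude.include(document, loader=loader)
    return document


if __name__ == "__main__":
    cd = ET.fromstring(
        f'<CD xmlns:xi="{XI}"><CDName>arith1</CDName>'
        '<xi:include href="arith1-plus.xml"/></CD>')
    print(ET.tostring(resolve_inclusions(cd), encoding="unicode"))
```

After resolution, the reader sees one coherent document, while the stored parts remain separately reusable.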
3.1.2 Navigation

BegOP(13) However, the links from individual symbols in rendered formulæ to their declarations can be traversed with the mouse. This linking is achieved by post-processing the parallel markup that mmlproc outputs and translating the (cdbase,cd,name) triples to SWiM-internal page URLs.

Where am I? What’s here? Where can I go? (Veen 2001, The Art and Science of Web Design)
3.1.3 Interactive Exploration

offer interactive services that work on the preserved semantic markup from 2.4.5 (JOBAD integration)

In particular, services that rewrite a formula should retain the previous state of the formula as an alternative to which the user can switch back.
3.2 Arguing

Conceptualization and formalization of knowledge, including semiformal mathematical knowledge, is non-trivial. Many issues can occur during such a process; I will focus on how to report them, how to argue about them, and how to resolve them. A knowledge item might be required but not yet have been conceptualized or formalized. An existing knowledge item might be hard to understand or consist of wrong facts; redundancies with other knowledge items might have been identified, or a subpart of one knowledge item might be considered to deal with a topic of special interest, deserving to be promoted to a knowledge item of its own.

Rittel and Kunz modeled the design process for complex problems – i. e. a generalization of our setting – as “a conversation among stakeholders (e. g. designers, customers, implementors, etc.), in which they bring their respective expertise and viewpoints to the resolution of design issues” ([KR70], as cited by [CB87]). I will adopt this model and assume a collaborative software environment, whose users report issues and argue about them, propose solutions that are again subject to discussion, until finally a solution is approved and implemented. Such a discourse can be lengthy and hard to keep focused, as issues can be “wicked problems”, exposing traits like not allowing for a “definitive formulation”, having solutions that are “not true-or-false but good-or-bad”, and the nonexistence of an “immediate and [. . . ] ultimate test of a solution” [RW73]. A solution is usually materialised in an improved version of the affected knowledge item or a new knowledge item. Later, other users, who want to understand why some knowledge item is modelled in a particular way, can trace back the discourse that led to its creation or modification. Thus, the discussions about issues with knowledge items become part of the collective experience of the community.
I developed a model for structured argumentation about issues with mathematical knowledge items. The model is designed to guide discourse and to enable a system to assist users with the implementation of solutions in common cases. Inspired by the state of the art in bug tracking and argumentation ontologies, I realized the generic part of the model as an extension module of the SIOC online community ontology. I specialized the general model to the particular setting of a collaborative environment for mathematical knowledge, where one document holds knowledge about one distinct mathematical topic, or about a set of closely related topics. Typical knowledge items that I consider are definitions of symbols, theorems, and proofs (cf. section 2.2.1). Issues with them can be that a knowledge item is wrong, incomprehensible, presented in an uncommon style, or redundant (cf. section 3.2.3.1). A review of the state of the art and a theoretical outline of my approach follow below.

13 Old Part: integrate

EndOP(13)
3.2.1 State of the Art

My general argumentation model is inspired by three sources: (i) practical experience with bug and issue tracking systems, (ii) the perception that existing collaborative knowledge engineering environments insufficiently support similar features, and (iii) the formal model of IBIS and the DILIGENT argumentation ontology. A detailed account of (ii) in the particular setting of wikis is given in section 4.2.1; here, we will focus on bug and issue tracking, and argumentation ontologies.

3.2.1.1 Bug and Issue Tracking

In bug tracking systems, users or developers of a software system report issues with that system. Inexperienced users, or developers in the design phase, often report issues with the system in general (e. g. that a certain feature is missing), whereas developers currently working on the implementation usually narrow issues down to a particular component of the system. Follow-up comments that elaborate on the description of an issue or that propose a solution can be given. Some systems, such as Bugzilla [Bug], support voting on the importance of bugs. In the end, a developer takes a decision and changes the affected source code, i. e. fixes the bug. Links from bug reports to the affected software artifacts are shown in some bug trackers, which are closely integrated with source code revision management systems, such as Trac with Subversion [Trab]. Similar patterns (discussion of changes and voting or decisions on their acceptance) are present in source code review systems (see, e. g., [MR08]) [Lan+08b].

Ontology-based approaches to bug and issue tracking have not yet reached a mature state.2 Approaches known to date emphasize other aspects of software engineering than the structure of a discourse about an issue.
The Dhruv system [Ank+06] aims at supporting bug resolution in open source software communities – by interlinking code artifacts, bug reports, discussion posts and community members, and then recommending related resources. EvoOnt is a set of ontologies for modeling the whole process of software engineering, including bug tracking [Evo]. EvoOnt models of software projects have been used for code analysis in software projects, supported by iSPARQL, an extension of the SPARQL RDF query language by join operators that perform similarity matching [KBT07]. There was no particular focus on bugs; the EvoOnt representation of bugs merely provided additional information about source code, with relations like “which bugs have been filed with this source file?”, or “which revision of a source file resolved a bug?”. The BAETLE ontology (Bug And Enhancement Tracking LanguagE) [BTS+] has more general goals, while still being closely aligned with EvoOnt’s bug model. It reuses existing semantic web ontologies wherever possible and aims at a unified model for all contemporary bug tracking systems. BAETLE aims at improving bug retrieval across systems and projects. Software supporting BAETLE is not yet available, though.

2 For an application of ontologies to the particular case of searching bugs, see [Tra+09].

3.2.1.2 Argumentation Ontologies

http://terrytao.wordpress.com/, John Baez
An early approach to formalising argumentation about issues was IBIS (Issue-Based Information System), developed by Rittel and Kunz [KR70], which particularly aimed at wicked problems [RW73], as mentioned above. Not all characteristics of wicked problems apply in my case, as they were originally investigated in governmental planning [RW73], a domain that lends itself less well to formal modelling than knowledge engineering. The gIBIS hypertext system [CB87] applied the IBIS method to system design. This served as inspiration for subsequent applications in ontology engineering: the DILIGENT argumentation ontology and the Change and Annotation Ontology (ChaO) of Collaborative Protégé. A collection of semantically structured mathematical knowledge can be considered an ontology, particularly if it contains formal definitions of mathematical concepts (see section 2.4.3 for a more elaborate account of correspondences).

The DILIGENT argumentation ontology was conceived in the context of the namesake collaborative ontology engineering methodology as an extension of IBIS that makes arguments more focused, thus making design decisions more traceable and allowing for inconsistent argumentations to be detected [Tem+05; Tem+07]. A discourse in terms of the DILIGENT argumentation ontology is structured as follows: When an issue has been raised, collaborators can express their agreement or disagreement with it, i. e. whether they consider this issue important, justified, and legitimate. An issue can be resolved by implementing a proposed and – again by posting agreements – approved idea in the space of knowledge items (called “ontology entities” in DILIGENT) and concluding the discussion thread with an explanation of the decision taken. This decision will link to the issue that has been solved and to the idea i that was realised. If that idea was to create or modify a knowledge item k, a link “i resolves into k” will be created.
Besides merely agreeing or disagreeing with an issue or idea, collaborators can also argue about it, i. e. justify it by examples or evaluations, or challenge it by alternative proposals or counter-examples, and others can again agree or disagree with these arguments. DILIGENT has been evaluated in two wiki-like prototypes of collaborative systems, coefficientMakna and Cicero, which will further be described in section 4.2.2.1.

Protégé is an editor for OWL ontologies. Its extension Collaborative Protégé [Tud+08] and its web frontend WebProtégé [TVN08] are powered by ChaO. Besides changes to ontologies, which I will not focus on here, ChaO defines “annotations”, which roughly correspond to the main classes of DILIGENT but are more loosely coupled. In line with Protégé’s independence from a particular ontology engineering methodology, the ChaO ontology does not prescribe a certain flow of discourse, but allows annotations of any type to annotate (i. e. reply to) other annotations.3 In contrast to the DILIGENT-based applications, Collaborative Protégé, however, has a large community of users applying it in realistic settings, particularly within biomedical informatics.4 Thus, chances are good that Collaborative Protégé and ChaO will be improved through continuous development.

3 There have been considerations to elaborate ChaO into a generic ontology for collaborative ontology engineering workflows, of which the DILIGENT methodology would then only be a special case [Seb+08].

4 Personal communication with Alexander García Castro, 2009/07/29.

temporal reasoning on versioned metadata would be helpful for retracing discussions that led to knowledge item revisions.

check ref
3.2.2 The SIOC Argumentation Module

I found the DILIGENT vocabulary, with its IBIS roots, suitable for my setting of structured knowledge engineering. Compared to competitors like Protégé’s ChaO, I considered its more rigid structure a better foundation for domain-specific extensions (cf. section 3.2.3). The fact that DILIGENT had only been implemented in prototypical environments so far did not affect my decision, as my target environment for mathematical knowledge would have been different anyway and thus would not have permitted reuse of existing DILIGENT-based code. DILIGENT only served as a formal inspiration for my argumentation ontology, though. The actual implementation was done as a module of the SIOC ontology (cf. section 2.4.1.4). I will first explain the reasons for that, and then describe the implementation.

Collaboration with Tudor Groza on rhetorical structures in scientific documents, and with Uldis Bojārs on SIOC, showed that argumentation can be used in a wide range of settings [Lan+08b]. SIOC itself is already recognized as a standard for modeling user-generated content on social media sites and is supported by various software applications. That led us to the decision to design the core of our argumentation ontology, despite inspiration gained from IBIS and DILIGENT, from scratch as a module of SIOC. SIOC has previously been extended by several modules “to extend the available terms [for covering a particular use case] and to avoid making the SIOC Core Ontology too complex and unreadable” [Ber+09]. The SIOC Types module, for example, introduces subclasses of SIOC concepts in order to represent different kinds of social web objects more precisely. There is, for example, the sioc:Forum subclass sioc_t:ArgumentativeDiscussion representing “a discussion area where logical arguments can take place” [Sio], but so far SIOC has not offered further specific support for argumentative discussions. This is the gap that the argumentation module fills.
The minimum needed for modeling argumentation in a SIOC-compliant way is a class that can be assigned to any resource in addition to sioc:Item or sioc:Post, stating that this post has the role of an argumentative statement. A post of type sioc_arg:Statement is at the root of an argumentative discussion, much like an IBIS issue. It can be followed by a replying post of the same type, modeled by sioc:has_reply in SIOC Core (thus one statement refers to another statement). We modeled this relation by introducing sioc_arg:refers_to as a subproperty of sioc:has_reply. Starting from this, we specify additional classes and properties for arguments, all subclasses of Statement⁵ or subproperties of refers_to. The reason behind our design was to give both developers and users the flexibility of choosing their own way of identifying the argumentation (statement) types for their posts. This is different from the DILIGENT argumentation ontology, whose classes and properties do not have a common superclass or superproperty.
⁵ In the remainder of this section, I will omit the sioc_arg prefix when it is clear from the context that an entity belongs to that ontology module.
Figure 3.2: The SIOC argumentation module
[Figure labels. Classes: Statement, Issue, Idea, Elaboration, Argument (subclasses Example, Evaluation, Justification), Position, Decision. Properties: refers_to, arises_from, proposes_solution_for, elaborates_on, supports/challenges, agrees_with/disagrees_with/neutral_towards, decides, supported_by.]
From the bug tracking (cf. section 3.2.1.1) and wiki discussion (cf. section 4.2.1) use cases, as well as from forums and blogs (elaborated in [Lan+08b]), we observe that discussions usually start with an issue or an idea. An Issue is a problem to be discussed, a decision on a solution being expected as the result of the discussion. An Idea can refer to an Issue, then taking the role of a solution proposed for that issue, or it can stand on its own. In the latter case, the Idea can either be a general idea, not proposing to solve any particular issue, or a proposed solution for an implicit issue that is not addressed in a discussion post of its own. On the other hand, Issues can also follow up on Ideas – particularly when a discussion was initiated by an Idea and the idea then turns out to be problematic.
Most of our concepts (as depicted in figure 3.2) are rooted in the DILIGENT argumentation ontology (cf. section 3.2.1.2) but have slightly different semantics, which is due to the fact that we are in a setting more general than ontology engineering. In the DILIGENT methodology, an Issue states a requirement for the ontology to be designed, and an Idea would propose a concrete conceptualization or formalization, according to the definition of an ontology; ideas cannot represent roots of argumentation threads. Still, these different understandings of issues are subsumed under IBIS’s more general notion of an issue.⁶
In the SIOC argumentation module, both Issues and Ideas can be followed up by Elaborations, which continue the line given by the parent statement, and thus enrich the argumentation model of the discussion. Users can reply to Issues, Ideas, and Elaborations on the former, with Arguments, which can be justifications or challenges. An Argument tries to argue objectively; it is distinct from a Position (see below), which rather conveys the personal opinion of a user. On the other hand, depending on the particular use case, the presence of the Argument concept might not be needed (this being the reason for the different way of representing it in figure 3.2).
⁶ ... with the exception that IBIS assumes issues to be phrased as questions [KR70]. However, both DILIGENT and my variant thereof are consistent with the IBIS requirement that “the origin of issues are controversial statements” [KR70].
In the Blogosphere, every opinion can be seen as a personal interpretation of reality, while in a bug tracking system, such opinions are supported by real issues and can thus be considered objective. In addition, the role of an Argument can be summarized as: (i) an expression that states whether an Issue is considered legitimate and worth discussing, and (ii) an expression that shows whether an Idea can be considered a good solution. Subclasses of Argument comprise Example, Evaluation, and Justification, which can be attached to their parent post by one of the properties supports or challenges. In this case, our design was motivated by the DILIGENT evolution used by the Cicero system (cf. section 4.2.2.1) and allows for retrieving supporting or challenging arguments with one query step less than a model with positive and negative argument classes and just one arguesOn property, as would be the case in the original DILIGENT argumentation ontology. Also, we opted for only this small set of subclasses for the Argument concept, as earlier studies in argumentation have shown that a restricted space of argument types helps to keep a discussion more focused [PST04].
In a more subjective manner, users can express their Positions on a statement: agreeing, disagreeing, or remaining neutral. The relation to the statement is represented by one of the properties agrees_with, disagrees_with, neutral_towards. While most argumentation ontologies do not allow the representation of neutral positions in order to force the argumentation towards solutions, neutral positions are nevertheless quite common in online discussions. In fact, they are different from the absence of a position in that they express “I do care about this statement, I’m just not decided whether to support it or not.” For a minimum working model, it is sufficient to give Positions on Ideas, but in a more elaborate model Positions on Issues, Elaborations, and even Arguments could make sense.
At the end of an argumentative discussion a decision can be taken.
It can be documented by replying to the post that started the discussion (either an Issue or an Idea) with a Decision. Decisions can also be taken on subtrees of a discussion, e. g. on one of the ideas for solving an issue, while leaving the overall issue still open. In the case of making a decision on an issue, one can also link the Decision to the winning Idea. A Decision should be backed by linking to the positions that were in favor of the action decided.
3.2.2.1 Usage Recommendations
When developing the SIOC argumentation module, we phrased some recommendations on how to use it in social applications. I have implemented and evaluated basic support for argumentation in one particular way, as explained in section 4.4.3, but that implementation leaves room for extensions and improvements. This is why I will repeat the usage recommendations here.
Generally, it is up to application developers how much of the argumentation module they support. As set out above, the list of use cases is diverse. Nevertheless, it is recommended that applications restrict the statement types with which the user can reply to a post to exactly those that are allowed by the schema, plus possible subclasses thereof.
One aspect that our model currently does not capture is a voting scheme. The developer should make the choice of implementing positions as proper posts, or by introducing a vote mechanism on statements. There exist several possibilities to model voting: (i) The ChaO ontology of Collaborative Protégé (cf. section 3.2.1.2), for example, allows for either “5-star” or “yes/no” voting [TN07], whereas (ii) Cicero allows for “yes/no” voting either on individual ideas or in a multiple choice way [Del+08b]. When using voting in problem solving, the process may be made more efficient by separating it into two stages: setting a deadline until which all argumentation (such as coming up with ideas and arguing on them) has to be finished, and then allowing the community to vote, so as to prepare a final decision. This has been investigated in the Cicero system (cf. section 4.2.2.1).
Finally, it is recommended to close an argumentative thread with a decision, with no more possibility to submit posts. In some applications, such as bug tracking systems, however, the possibility to reopen a discussion is commonly offered. In a small web of trust it may be feasible to let every user make decisions, whereas in larger social networks we recommend this to be restricted to moderators.
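The recommendation to restrict reply types to those allowed by the schema can be illustrated by a minimal Python sketch. It is not part of the SIOC argumentation module: the table ALLOWED_PARENTS is an assumption read off the prose description of the module (with Positions restricted to Ideas, as in the minimum working model), and the function names are hypothetical.

```python
# Illustrative sketch only: the tables below are assumptions read off the
# prose description of the module, not part of its published schema.

# Subclasses of Argument as named in the text.
SUBCLASSES = {"Argument": {"Example", "Evaluation", "Justification"}}

# For each reply type: the statement types it may answer.
ALLOWED_PARENTS = {
    "Issue": {"Idea"},                             # arises_from
    "Idea": {"Issue"},                             # proposes_solution_for
    "Elaboration": {"Issue", "Idea"},              # elaborates_on
    "Argument": {"Issue", "Idea", "Elaboration"},  # supports / challenges
    "Position": {"Idea"},                          # minimum working model
    "Decision": {"Issue", "Idea"},                 # decides
}


def base_type(statement_type):
    """Map a subclass such as Justification to its superclass Argument."""
    for parent, children in SUBCLASSES.items():
        if statement_type in children:
            return parent
    return statement_type


def allowed_reply_types(parent_type):
    """Return every statement type that may reply to a post of parent_type."""
    parent = base_type(parent_type)
    allowed = set()
    for reply, parents in ALLOWED_PARENTS.items():
        if parent in parents:
            allowed.add(reply)
            allowed |= SUBCLASSES.get(reply, set())  # plus subclasses thereof
    return allowed


def may_reply(reply_type, parent_type):
    """Check a single reply against the (assumed) schema."""
    return reply_type in allowed_reply_types(parent_type)


if __name__ == "__main__":
    print(sorted(allowed_reply_types("Idea")))
    print(may_reply("Decision", "Elaboration"))
```

An application could call allowed_reply_types when rendering the reply form of a post, so that only admissible statement types are offered to the user.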
3.2.3 Domain-specific Extensions
I have enhanced the generic argumentation ontology with particular concepts from the mathematical domain, which allows for creating more specific issues and ideas in a machine-understandable way, and for providing semi-automatic assistance in implementing some of these specific ideas in the system, as will be set out in section 3.2.4. The extension of the argumentation ontology for a particular domain assumes that knowledge items are typed as concepts from that domain. Every knowledge item is assumed to have exactly one principal type from an ontology describing the respective domain. I have focused on the mathematical domain, particularly on the main types of mathematical statements, as defined in the OMDoc ontology (cf. section 2.4.1.1). Additional types of a knowledge item, such as its status in terms of a hypothetical project management ontology (e. g. “Draft”, “UnderReview”, “Published”), will not be taken into account for reporting and resolving issues. I assume a SIOC-structured discussion thread about every knowledge item, whose discussion posts are additionally instances of classes from the above-mentioned argumentation ontology. In my current model, I consider the classes Issue, Idea, Position, and Decision (cf. figure 3.3).
To establish a bridge between the domain of knowledge and the argumentation about it, I created an ontology of domain-specific subclasses of the Issue and Idea classes. A particular type of issue is considered applicable to certain types of knowledge items; I model this in the ontology as well. For example, an issue with a mathematical proof can be that it is wrong, whereas the notation of a symbol cannot be wrong in the logical sense but inappropriate, misleading, or hard to read or write (cf. sections 5.1.2.3 and 5.1.3.3 for a detailed description of that use case). Furthermore, I assume that certain types of ideas can be applied to a pair of a knowledge item type and an issue type.
For example, if a proof is wrong, it could be deleted and replaced by a correct proof, or it could be kept as an instructive bad example. Obviously, I do not expect to cover all possible cases with a finite set of predefined issue and idea types, but the most common ones. These are shown in table 3.1.
Issue type                applies to knowledge item type                        relevance⁷
Incomprehensible          MathKnowledgeItem                                     9/20
Wrong                     Assertion, Example, Proof                             10/20
UncertainWhetherTrue      Assertion                                             9/20
UnclearHowToUtilize       Symbol, Assertion                                     4/20
UnclearWhetherUseful      Symbol, Assertion, Definition, Axiom                  8/20
InappropriateForDomain    Example, Definition, Proof, omo:NotationDefinition    4/20
UncommonStyle             Example, Definition, Proof, omo:NotationDefinition    8/20
RelationUnclear           Proof, Example, omo:NotationDefinition                7/20
Underspecified            Definition, Axiom, Assertion                          9/20
Overspecified             Definition, Axiom, Assertion                          6/20
TooManySubparts           Theory, Assertion, Example, Definition                5/20
Reinvention               Theory, Assertion, Definition, Axiom                  9/20
Idea type                   applies to Issue type
ImproveThis                 Issue
  FixSemantics⁹             Issue
ImproveInformally           Issue
ImproveRelated⁸             Incomprehensible, UncertainWhetherTrue, UnclearHowToUtilize, UnclearWhetherUseful
CreateRelated⁸              Incomprehensible, UncertainWhetherTrue, UnclearHowToUtilize, UnclearWhetherUseful
ProvideExample              Incomprehensible, Wrong, UncommonStyle, RelationUnclear, InappropriateForDomain
CreateAlternative¹¹         Incomprehensible, Wrong, UncommonStyle, RelationUnclear, InappropriateForDomain
Split                       TooManySubparts
RemoveParts                 TooManySubparts, Reinvention
ReplacePartsByReferences    TooManySubparts, Reinvention
FactorOutParts¹⁰            TooManySubparts, Reinvention
IntegrateOthers             Issue ¹²
KeepAsBadExample            Issue
Delete                      Issue
relevance⁷ column: 8, 12/21, 11/21, 7/21, 8/21, 10, 2/21, 11/21, 9/21, 5/21, 3/21, 1/21, 10/21
⁷ according to what kind of issues and ideas the participants of the survey had experienced
Table 3.1: Domain-specific issue and solution types
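How the applicability relations of table 3.1 could drive a user interface can be sketched in a few lines of Python. The dictionaries are an excerpt of the table; the flat class hierarchy in SUPERCLASS (every statement type being a direct subclass of MathKnowledgeItem) and the function names are assumptions made for illustration, not the OMDoc ontology itself.

```python
# Excerpt of table 3.1: issue type -> knowledge item types it applies to.
ISSUE_APPLIES_TO = {
    "Incomprehensible": {"MathKnowledgeItem"},
    "Wrong": {"Assertion", "Example", "Proof"},
    "UncertainWhetherTrue": {"Assertion"},
    "Underspecified": {"Definition", "Axiom", "Assertion"},
    "TooManySubparts": {"Theory", "Assertion", "Example", "Definition"},
    "Reinvention": {"Theory", "Assertion", "Definition", "Axiom"},
}

# Excerpt of table 3.1: idea type -> issue types it applies to.
# "Issue" stands for the generic superclass, i.e. any issue type.
IDEA_APPLIES_TO = {
    "ImproveThis": {"Issue"},
    "Split": {"TooManySubparts"},
    "RemoveParts": {"TooManySubparts", "Reinvention"},
    "Delete": {"Issue"},
}

# Assumption for this sketch: a flat hierarchy below MathKnowledgeItem.
SUPERCLASS = {
    t: "MathKnowledgeItem"
    for t in ("Assertion", "Axiom", "Definition", "Example",
              "Proof", "Symbol", "Theory")
}


def with_superclasses(item_type):
    """Return the type itself together with all of its superclasses."""
    types = {item_type}
    while item_type in SUPERCLASS:
        item_type = SUPERCLASS[item_type]
        types.add(item_type)
    return types


def applicable_issues(item_type):
    """Issue types that can be reported on a knowledge item of this type."""
    types = with_superclasses(item_type)
    return sorted(i for i, items in ISSUE_APPLIES_TO.items() if items & types)


def applicable_ideas(issue_type):
    """Idea types that can be proposed for an issue of this type."""
    return sorted(
        idea for idea, issues in IDEA_APPLIES_TO.items()
        if issue_type in issues or "Issue" in issues
    )


if __name__ == "__main__":
    print(applicable_issues("Proof"))
    print(applicable_ideas("TooManySubparts"))
```

In the ontology itself these lookups would be queries over the applicability properties; the dictionaries merely stand in for that.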
⁸ applies to the following knowledge item types: Assertion, Definition, Symbol, Axiom
⁹ indentation denotes subclasses
¹⁰ not covered by the survey
¹¹ applies to the following knowledge item types: Proof, Example, omo:NotationDefinition
¹² applies to the following knowledge item types: Assertion
Figure 3.3: The SIOC/OMDoc argumentation ontology
[Figure labels. OMDoc ontology: Math. Knowledge Item, Theorem, Example, Ontology Entity. SIOC argumentation module (partly shown): Position, Decision, Issue, Idea; properties agrees_with/disagrees_with, supported_by, decides, resolves_into, proposes_solution_for. Domain-specific argumentation classes (partly shown): Issue subclasses Wrong, Inappropriate for Domain, Incomprehensible; Idea subclasses Provide Example, Keep as Bad Example, Delete.]
3.2.3.1 A Survey on Issues in Mathematics
To get an understanding of common issues and solutions in mathematical knowledge management, I conducted a survey among domain experts (see section B.1 for detailed results). I collected information about the participants’ previous experience with mathematical knowledge bases, the support for tracking and solving issues in the tools they have used, types of knowledge items they have dealt with, types of issues they have encountered, how these issues were solved, and reasons why issues remained unsolved.
A majority of 30 out of the 51 participants are experienced in contributing to libraries of software tools like automated theorem provers; contributions to websites or open knowledge bases ranked second and third (24 and 21 participants, respectively). The most commonly experienced granularity of knowledge items was either a course unit, a mathematical theory (i. e. a few related definitions and axioms), or a mathematical statement. Only in a few cases had the participants had support for automated issue tracking and solving by the knowledge bases they had used. The prevalent type of knowledge item that the participants had ever found affected by issues was the definition of a new mathematical symbol or concept.¹³ About half of the 25 participants who answered that question had experienced issues with theorems, proofs, examples, theories, notation definitions, and axioms. The most common issue was that a knowledge item was simply wrong, followed by being incomprehensible, its truth being uncertain, being underspecified, or redundant. Other common cases were knowledge items of which it was not clear whether they were useful, and knowledge items expressed in an uncommon style. Issues were mostly solved by directly improving the affected knowledge item (as opposed, e. g., to creating another one), by splitting it into more than one, or by deleting it altogether. Still, some participants have experienced issues being unresolved and mostly attributed this to insufficient tool support for restructuring knowledge items. Other common reasons were insufficient awareness of the users that there is actually an issue, insufficient social interaction among users, as well as insufficient tool support for editing knowledge items. The replies the participants gave about issues and ideas they had experienced influenced the further development of the ontology; detailed response figures are shown in table 3.1.
¹³ I did not distinguish between symbol declarations and definitions.
3.2.4 Automated Assistance
A simple automated problem solving assistance based on [part of] the argumentation ontology can be specified as follows: Whenever there is a discourse about a knowledge item, the system should check whether there is an issue that is both unresolved (meaning that no decision on it has been posted yet) and not challenged as invalid by the existence of a majority of disagreement replies to it. If ideas have been posted on how to resolve this issue, the most popular one in terms of the ratio of agreements to disagreements should be selected. Formally, any issue $s$ satisfying
\[ D(s) = \emptyset \;\wedge\; \bigl( P_-(s) \neq \emptyset \Rightarrow |P_+(s)| > |P_-(s)| \bigr) \]
is considered legitimate, and the idea $i$ in
\[ \operatorname*{arg\,max}_{i \in \mathrm{Id}(s),\; P_+(i) \neq \emptyset} \frac{|P_+(i)|}{|P_-(i)| + 1} \]
wins, where $\mathrm{Id}$, $D$, $P_+$, and $P_-$ denote the sets of ideas, decisions, agreements, and disagreements with an issue or idea, respectively.¹⁴
The system may provide assistance to any volunteering author to implement the winning solution in the space of knowledge items, e. g. by automatically creating a template for a new knowledge item that the author can then complete. If an author follows the steps proposed by the system, the system should conclude the respective discourse by posting an automatically generated decision. Still, freedom should be left to the community to implement solutions manually, when users feel that the automatic support is not adequate to the wickedness of the current problem. In this case, the author who resolves an issue has to document this decision manually. Any thread that has been concluded by a decision will no longer be considered by the system.
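The two conditions above translate directly into code. The following Python sketch is only an illustration: the Statement record with its counters is a hypothetical stand-in for posts in a SIOC discussion thread, and the tie-breaking behaviour (the first idea with the maximal ratio wins) is an assumption, since the text leaves that case open.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """Hypothetical stand-in for an issue or idea post with its reply counts."""
    name: str
    agreements: int = 0      # |P+|
    disagreements: int = 0   # |P-|
    decisions: int = 0       # |D|
    ideas: list = field(default_factory=list)  # Id(s), used for issues only


def is_legitimate(issue):
    """D(s) is empty, and if P-(s) is non-empty then |P+(s)| > |P-(s)|."""
    if issue.decisions > 0:
        return False
    if issue.disagreements == 0:
        return True
    return issue.agreements > issue.disagreements


def winning_idea(issue):
    """arg max of |P+(i)| / (|P-(i)| + 1) over ideas with at least one agreement.

    Returns None if no idea qualifies. With ties, max() returns the first
    maximal candidate; that choice is an assumption of this sketch.
    """
    candidates = [i for i in issue.ideas if i.agreements > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda i: i.agreements / (i.disagreements + 1))


if __name__ == "__main__":
    issue = Statement("proof is wrong", agreements=2, disagreements=1, ideas=[
        Statement("Delete", agreements=3, disagreements=1),
        Statement("KeepAsBadExample", agreements=2, disagreements=0),
        Statement("ProvideExample"),
    ])
    if is_legitimate(issue):
        print(winning_idea(issue).name)
```

The +1 in the denominator keeps the ratio defined for ideas without any disagreement and favours ideas that are uncontested.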
3.2.5 Manifestation of Discourse into Documents
for future work (not done so far): how to make previous argumentations persistent as e. g. rhetorically structured documents in the knowledge base
3.3 Editing
what would we like to edit? Formulæ, statement/theory level, metadata – we want a dedicated editor for each; edit locally
how do we make the editor accessible in situations where we need it (cf. MCS notation article)
3.3.1 State of the Art
There are three common approaches to editing documents in a complex markup language; I will review them from a mathematical point of view, focusing on formulæ and on documents.
short summary of the different approaches
In any of these editing approaches, there can be tool buttons, menus, or keyboard macros for inserting frequently-used content, or for changing existing content. This is, however, most common in WYSIWYG/WYSIWYM editors.
¹⁴ It is not yet clear what idea should be preferred when there is more than one such i.
related: Lurch (search e-mail)
3.3.1.1 Raw Access with Support
The editor fully exposes the original markup. Any text or XML editor can be used as a basis, while it is advisable to enhance it with a language-specific configuration for syntax highlighting, and possibly other features, such as indentation or folding. This approach has been used most commonly for mathematical markup languages, due to the availability of high-quality extensible text editors and the relative ease of implementing extensions. There is no conceptual difference between editing formulæ and other document structures. For the Emacs editor, so-called modes for OMDoc and sTeX have been developed. After an initial implementation from scratch (cf. [Jan06]), the OMDoc mode has been reimplemented as an extension of Emacs’s nxml mode for editing XML [Pes07]. The sTeX mode is an extension of the AUCTeX Emacs mode [Ste]. Similarly, jEditOQMath has been developed as a distribution of the jEdit editor, bundled with plugins for editing OMDoc in an ActiveMath setting [Lib06]. jEditOQMath, however, does not expose formulæ in their original markup, but uses the QMath syntax instead (see below).
3.3.1.2 Custom Input Syntax
A textual input syntax that is easier to read and to write than the original encoding is developed. This is usually a one-dimensional syntax, as opposed to the two-dimensional appearance of rendered mathematical formulæ; therefore, it is also called linear syntax. On loading a document, it is translated to the custom input syntax; on saving, it is translated back. There are languages that act exclusively as an input syntax in the sense that only a preprocessor exists that translates such a representation to the actual knowledge representation language. However, I will focus on two-way translations. Formulæ are usually small and thus easy to retype, but particularly for documents it is important to be able to edit those that already exist on the web or in knowledge bases (cf. section 3.7).
There are not yet that many documents whose native format is, e. g., OMDoc¹⁵, but many translations from other languages into OMDoc exist. Suppose there is a language L1 that is easy to edit, thanks to existing tool support, and suppose a lot of content is available in a language L2, which is not that easy to edit. The existence of two bidirectional translations L2 ↔ OMDoc ↔ L1 will then enable authors to edit a document whose original source is in L2 in the well-supported language L1. The exchange language (here: OMDoc) need not be easy to edit; it just has to be sufficiently expressive to warrant a lossless translation.
QMath [GP06a] is such a custom input syntax for OpenMath formulæ, which is extensible by way of “contexts” of declarative notation definitions (cf. section 2.3.4.3). For the OpenMath standard content dictionaries, QMath additionally ships with notation definitions resembling the syntaxes of various computer algebra systems (CAS). Popcorn [HR09a; HR09b] is an alternative input syntax for OpenMath that focuses even more pragmatically on CAS integration. Its supply of built-in infix and mixfix operators is not extensible, but there are some built-in notations for CAS-specific programming constructs¹⁶.
Concerning complete mathematical documents, two syntaxes are currently known for OMDoc, but not yet widely used: QMath and sTeX. QMath ships with a context file for OMDoc that facilitates entering statement- and theory-level structures and metadata, as long as they are not too deeply nested.¹⁷ As for mathematical symbols, a QMath syntax can easily be defined for any XML language, such as OpenMath content dictionaries. sTeX, which has been introduced in section 2.3.7.1, can also be considered a custom input syntax for OMDoc.
For neither QMath nor sTeX does a translation back from OMDoc currently exist. The Sentido editor implements a translation from OpenMath to QMath, using the QMath contexts for emitting the desired QMath syntax for symbols. A translation of the statement, theory, and document-level syntax of OMDoc to QMath would be fairly straightforward to implement in XSLT. Earlier versions of the OMDoc XSLT stylesheets had featured LaTeX output, which was abandoned later. sTeX output could, in principle, be implemented similarly. However, there are currently two major obstacles:
¹⁵ The largest collection of OMDoc documents that is currently known, Michael Kohlhase’s lecture notes, is originally written in sTeX, which is a custom input syntax for OMDoc (see below).
¹⁶ ... from the prog1 content dictionary
• The result of an OMDoc→sTeX translation is not necessarily the same as the original sTeX that had been edited, as TeX expands macros, whereas XML does not – that is, it would be hard to agree on a “canonical” sTeX syntax.
• sTeX and OMDoc currently have different syntaxes for declaring symbols and their notations. OMDoc treats both separately, whereas sTeX has a combined syntax for declaring a symbol (without a type, though) and its notation, which resembles the LaTeX \newcommand macro.
• Moreover, sTeX represents the argument list of an n-ary operator as a single TeX argument that is a comma-separated list, in order to circumvent the LaTeX restriction that a macro can only have up to 9 arguments. Operators with a fixed arity of n are modeled as commands with n arguments – like \frac{num}{den} in LaTeX –, but truly n-ary operators are modeled as unary LaTeX commands, where the actual arguments are comma-separated, e. g. \nunion{A,B,C} for the n-ary set union A ∪ B ∪ C. This becomes more complex for flexary operators like the function declaration symbol f : S_1 × ⋯ × S_n → R, which could be written as \function{f}{S_1,...,S_n}{R} in sTeX. The first and last arguments are fixed, but there is a flexible number of arguments in between. In OpenMath, all of these arguments would simply be children of one OMA element; therefore, it would be impossible to figure out which subrange of them forms a single n-ary TeX argument. A solution to this problem would be a markup that captures more of the semantics, such as \function{f}{\crossproduct{S_1,...,S_n}}{R}, but forcing the author to make more of the semantics explicit again makes authoring harder.
verify with Alberto
For the less complex, non-semantic HTML, a lot of input syntaxes exist, such as Markdown [GS+]; they are often used in wikis (cf. section 4.2).
LaTeX2OQMath: LaTeXML→*.tex.xml→heuristic XSLT yields OQMath (OMDoc with embedded QMath), depends on ActiveMath integration [And]
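The obstacle posed by comma-separated argument lists can be made tangible with a small Python sketch. It models neither LaTeXML nor any existing translator; flatten_to_oma and regroupings are hypothetical helpers that merely mimic how the grouping of macro arguments is lost once they all become children of one application node.

```python
from itertools import combinations


def flatten_to_oma(symbol, *tex_args):
    """Flatten macro arguments into the child list of a single application.

    Each TeX argument is treated as a comma-separated list, so the boundaries
    between the original arguments disappear in the result.
    """
    children = [symbol]
    for arg in tex_args:
        children.extend(part.strip() for part in arg.split(","))
    return children


def regroupings(args, k):
    """Enumerate all ways to split args into k non-empty contiguous groups."""
    n = len(args)
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        yield [args[bounds[j]:bounds[j + 1]] for j in range(k)]


if __name__ == "__main__":
    # Two different macro invocations with the same flat child list:
    a = flatten_to_oma("function", "f", "S_1,S_2", "R")
    b = flatten_to_oma("function", "f,S_1", "S_2", "R")
    print(a == b)
    # Candidate groupings a reverse translation would have to choose from:
    for grouping in regroupings(a[1:], 3):
        print(grouping)
```

Without additional knowledge about which argument positions are fixed, every grouping printed by the loop is an equally plausible source form, which is why a more explicit markup such as the \crossproduct variant removes the ambiguity.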
¹⁷ However, one can also use the QMath syntax for formulæ only and OMDoc XML for the rest, or mix XML and QMath fragments.
QMath also partly acts as an input syntax for whole OMDoc
3.3.1.3 WYSIWYG/WYSIWYM¹⁸
The editor shows the document as it will be rendered, or close to that. Semantic annotations are hidden in the editor’s internal representation of the document and usually only partially exposed to the user interface. For example, an annotated text fragment could be highlighted with some color to indicate that an annotation exists. The annotation would be revealed on mouse-over, and it would be editable by selecting the highlighted text and opening a dialog window.
¹⁸ What you see is what you get / what you see is what you mean
WYSIWYG editors are common in word processors. Probably the most widely used WYSIWYG editor for editing HTML in web applications is TinyMCE [Tin], which is the default editor of many content management systems and blogs, and for which a large number of plugins exists. For these reasons it is also the preferred base for several extensions using semantic web technologies, e. g. within the Semantic Reblog republishing tool [WM09], or for the One Click Annotator of the loomp semantic content management system [Hee+09]. It is also used by the semantic wikis IkeWiki (cf. section 4.3.1) and its successor KiWi; for the latter, it has also been extended by custom annotation plugins. WYMeditor is a similar but less widely used editor emphasizing the WYSIWYM paradigm, which means that it focuses less on an exact reproduction of the rendered appearance of the document, but rather on visualizing its structure and enabling structural editing, and on generating clean XHTML [Hov+].
WYSIWY[GM] editing is also common for presentation-oriented formulæ; there are numerous WYSIWYG editors for Presentation MathML [Matb]. Fewer such editors exist for content-oriented formulæ;
Formula editors: No linear input syntax is used, but the formula is composed, not only previewed, two-dimensionally, and
ref
introduce term earlier, at WYSIWYG Word 2007?
Visual editors for formulæ and for documents are quite different in implementation, mainly due to the two-dimensionality of mathematical notation, as opposed to one-dimensional text. A seamless integration of both kinds of editors is rare, especially in technically restricted web environments. There, the formula is usually edited in a separate interface, and only a preview is shown in the document editor. Both WIRIS [Eix] and ASciencePad [Jip] feature a visual formula editor integrated into HTMLArea, an editing widget similar to TinyMCE. ASciencePad is an extension of the single-file and single-user “wiki” TiddlyWiki; its math editor translates a linear syntax to Presentation MathML. WIRIS have instead integrated a Java applet for WYSIWYG editing of Presentation MathML formulæ and offer this as a plugin for the Moodle e-learning platform. Inside HTMLArea, the formulæ are previewed as images. However, neither of the two editors edits content markup. WIRIS have developed an OpenMath editor [Wir], but the latter has not been integrated into HTMLArea.
3.3.1.4 Editing Large Structured Documents
State of the art: e. g. Connexions’s edit-in-place
3.3.2 Statements, Theories, Rhetorics, and Document Sections
An editor can treat statements, theories, and rhetorical structures like ordinary sections of a document – except that they carry additional annotations.
Concrete use case: how do symbol and notation definitions evolve?
sTeX defines LaTeX macros for all of OMDoc’s structuring elements. Usually, empty XML elements are entered as LaTeX commands, whereas elements with children are entered as LaTeX environments. The LaTeXML bindings are sufficiently expressive for handling such sTeX→OMDoc translations. An elaborate example is given in the documentation for sTeX’s proof module [Koh09b].
The SWiM editor is an extension of the visual HTML editor TinyMCE [Tin]. The structures of the semantic markup are made accessible as special nested HTML tables, which can easily be inserted via tool buttons. Both directions of this conversion are implemented in XSLT. The head line of one such table includes the name of the XML element, e. g. CDDefinition, and optionally the list of attributes as “key=value” lines, e. g. the type attribute of a signature dictionary, which points to a CD defining the type system used for the signatures. While any desirable markup can be represented like this, it is not user-friendly for deeply nested structures. Therefore, SWiM gives dedicated editing support for certain aspects of the markup: First, metadata of CDs or symbol definitions are editable via a dedicated form-based metadata editor. Secondly, for some XML elements the tables are arranged more intuitively. For example, notation definitions map a prototype to a rendering (cf. sect. ??), which is reflected by the side-by-side arrangement “notation: prototype | rendering” instead of the stacked arrangement “notation / prototype / rendering”. Finally, there is a visual formula editor, which we will explain in detail in the following section.
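The table representation of the markup described above can be approximated in a few lines. The actual conversion in SWiM is implemented in XSLT; the following Python sketch only illustrates the idea in one direction, and the exact table layout it produces (one header cell with the element name and “key=value” lines, one row per child) is an assumption, as is the sample input.

```python
import xml.etree.ElementTree as ET
from html import escape


def element_to_table(elem):
    """Render an XML element as a nested HTML table.

    The header cell holds the element name followed by one "key=value" line
    per attribute; text content and child elements become further rows.
    Tail text and namespace prefixes are ignored in this sketch.
    """
    head = [escape(elem.tag)]
    head += [f"{escape(k)}={escape(v)}" for k, v in elem.attrib.items()]
    rows = ["<tr><th>" + "<br/>".join(head) + "</th></tr>"]
    if elem.text and elem.text.strip():
        rows.append("<tr><td>" + escape(elem.text.strip()) + "</td></tr>")
    for child in elem:
        rows.append("<tr><td>" + element_to_table(child) + "</td></tr>")
    return "<table>" + "".join(rows) + "</table>"


if __name__ == "__main__":
    sample = (
        '<CDSignatures type="sts" cd="arith1">'
        '<Signature name="plus"/>'
        "</CDSignatures>"
    )
    print(element_to_table(ET.fromstring(sample)))
```

Every level of XML nesting adds one level of table nesting, which shows why this generic representation becomes unwieldy for deeply nested structures and why dedicated editors for metadata and notation definitions are worthwhile.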
3.3.3 Formulæ
We reuse the visual OpenMath formula editor that originated in the Sentido Mathematical Environment [GP06b] and integrate it as a plugin into TinyMCE in a similar way as done previously into the MathWebSearch formula search engine [Koh+08]. Inside TinyMCE, the formulæ are encoded as span elements, decorated using CSS. When the cursor is inside one of them, the Sentido toolbar button is highlighted, and the document path displayed below shows the location as “formula”, instead of “span”. The editor comes as a pop-up window, consisting of an input field for the linear formula, a drop-down menu for selecting the input syntax, a preview area where the 2D formula is shown in Presentation MathML updated in real time as we type in the input field, and a set of collapsible palettes for inserting formula templates. These palettes are XHTML made by hand to include all the symbols from the MathML group of the standard OpenMath CDs, but we plan to make SWiM automatically generate additional ones for other symbols defined in the wiki.
cut a part on qmath
¹⁴ Old Part: integrate, deSWiMify, as this chapter is about editing in general
Figure 3.4: Editing a document in the extended TinyMCE, formulæ marked yellow.
Note that it is not possible to automate the translation from LATEX syntax to content formulæ, as LATEX is rather presentation-oriented, but an approximation would be possible. It is not necessary to open the formula editor to do minor edits to the formulæ: any changes to the linear formula text are reflected in both the content of the formula editor if called later, and in the submitted OpenMath XML content. Since the editor keeps track of the syntax used for each formula (displayed as tooltips), it is possible to have formulæ temporarily in different syntaxes while editing. However, the next time this page is opened for editing in SWiM, all formulæ will be translated from OpenMath to the same syntax as specified by the server. This way it is possible to quickly paste a formula in any of the supported syntaxes without having to convert the rest. The formulæ are submitted as a string serialization of OpenMath XML, so that they do not interfere with TinyMCE or the browser. Using XML directly would corrupt the content because the editor works in HTML mode. On the server, this string is parsed back into XML. When a page is opened for editing next, the server again has to provide any contained OpenMath formulæ in their string serialization. Both are done in the same processing step as the conversion of the other
Figure 3.5: The formula editor window, shown while editing three different formulæ. The Variables palette allows declaring variables as functions. All symbols have Unicode and ASCII variants (∞/inf), and outermost parentheses do not need to be complete, as seen in the bottom example.
CD markup to HTML tables.¹⁵ In this application of the formula editor, we cannot display the MathML formulæ as is done in Sentido and other formula editors like ASciencePad [Jip], because of interference from other components. Undo/redo inside the linear input field of the formula editor is provided by the browser, which is sufficient, as changes in the text field are parsed back immediately; outside the formula editor, undo/redo is handled by TinyMCE. Inside the formula editor, each change can be undone or redone, but once we leave it, the whole formula becomes a single undo step.
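The string round trip of OpenMath XML mentioned above (serialized for the HTML-mode editor, parsed back into XML on the server) can be illustrated as follows. This is only a minimal sketch, assuming that the serialization is plain XML entity escaping; the actual format used by SWiM is not specified here, and the function names `serializeForEditor` and `restoreFromEditor` are hypothetical.

```javascript
// Hypothetical helpers; the entity-escaping scheme is an assumption.
// Escape the XML markup so that an HTML-mode editor treats it as plain text.
function serializeForEditor(xml) {
  return xml
    .replace(/&/g, "&amp;") // must come first, so later entities are not escaped twice
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Reverse the escaping before the string is parsed as XML again.
function restoreFromEditor(text) {
  return text
    .replace(/&quot;/g, '"')
    .replace(/&gt;/g, ">")
    .replace(/&lt;/g, "<")
    .replace(/&amp;/g, "&"); // must come last, mirroring the order above
}

const openMath =
  '<OMOBJ><OMA><OMS cd="arith1" name="plus"/><OMI>1</OMI><OMV name="x"/></OMA></OMOBJ>';
const asString = serializeForEditor(openMath);
const roundTrip = restoreFromEditor(asString);

console.log(asString.includes("<")); // false: no markup is left for the editor to interpret
console.log(roundTrip === openMath); // true: the original XML is recovered
```

The order of the replacements matters: escaping the ampersand first and restoring it last keeps entities that already occur in the formula intact.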
3.3.4 Notation Definitions
Treat notation definitions separately? After all, they cover both formulæ and statements. Thus they are partly covered above, but they also have different properties.
Reuse parts of MCS notation article (“maintaining/editing notations”) – what information is needed?
OMPE (OpenMath Presentation Editor [Man+06]) is an editor for notation definitions for OpenMath symbols. The content pattern to be matched is entered in OQMath, a variant of our linear QMath syntax. The resulting presentational pattern can be edited in a LaTeX-like syntax; common presentation symbols can be inserted via a toolbar. This makes it much more usable than SWiM in its current state. We are planning to address this by reusing parts of our formula editor, such as the parser for the linear input syntaxes, for editing notation definitions, too. On the other hand, debugging notation definitions by previewing their effect on rendered formulæ is only possible with OMPE by feeding them to a presentation pipeline afterwards and viewing the rendered documents in the ActiveMath environment, whereas SWiM offers both in an integrated environment.
¹⁵ EdNote: @CL, probably leave out the last sentence to save space. --CL
3.3.5 Metadata
State of the art: forms, maybe mention semantic forms (SMW extension)
pragmatic vs. strict
Having a rich pragmatic syntax that is convenient to author and a strict syntax that is more suited for automated processing and validation (cf. Sect. 3.4.1) is actually a general strategy that Michael Kohlhase first introduced in MathML 3 and that we also employ for other aspects of OMDoc. In certain application settings, one can generate part of the metadata automatically. In the sTeX input syntax, each metadata field is represented by a LaTeX command. A syntax for arbitrarily extensible vocabularies does not yet exist; instead, sTeX emphasizes the flexibility of pragmatic markup. For each simple metadata term, a LaTeX command and a corresponding LaTeXML binding for the translation to OMDoc have to be defined. sTeX ships with such definitions for the Dublin Core Metadata Element Set, plus a few custom extensions. This is, for example, the LaTeXML binding for dc:title [Koh09a]:
  DefConstructor('\DCMtitle{}', "#1");
sTeX usually makes more complex OMDoc structures available as LaTeX environments. For complex metadata structures involving anonymous resources (i. e. blank nodes), this is currently in an experimental state and has been tried for requirements specifications [Koh08c]. Following the pattern introduced there, the revision history given in the OMDoc RDFa metadata example in listing 2.12 could be written as follows¹⁹, given that suitable LaTeXML bindings have been defined:
  \begin{DCTversions}
    \DCTversion[id=initial,creator={Pierre de Fermat},created=1637-06-13]
    \DCTversion[id=correct,replaces=initial,creator=awiles,date=1995-05-01]
    \DCTversion[id=digitalized,source=correct,creator=kohlhase,issued=2006-08-28]
  \end{DCTversions}
3.4 Validating
I have not actually done anything about it and won’t get it integrated into SWiM, but it’s too important to be neglected.
Formal correctness is neither verified nor enforced, but we stay interoperable with more formal tools.
• syntactic validation (XML schema, exists in TNTBase)
• syntactic validation that an XML schema cannot do (e. g. CMP/FMP groups): JOMDoc
• link structure validation: beyond XML schema (document collections!): extract RDF, use “closed-world interpretation” of RDFS range/domain
• metadata validation: generate XML schema from closed-world RDFS
• type validation of formulæ (OpenMath STS), or on higher levels
• proof checking
• manual “validation” by tagging (for informal content)
forward-ref SWiM
MMT in TNTBase
challenge: RDFa metadata validation (as the metadata vocabularies are no longer hard-coded into the schema); solution: closed-world interpretation for RDFS; schema generation (cf. [Koh08a])
Or pragmatic syntax that is validatable with XML tools: Having a rich pragmatic syntax that is convenient to author and a strict syntax that is more suited for automated processing and validation (cf. Sect. 3.4.1) is actually a general strategy that Michael Kohlhase first introduced in MathML 3 and that we also employ for other aspects of OMDoc.
¹⁹ We skip certain peculiarities, such as URI references to authors, for simplicity.
3.4.1 Metadata
cite same
One advantage of hard-coding metadata vocabularies into the XML schema of OMDoc 1.2 was that a document could easily be validated against OMDoc’s RELAX NG [Rel] schema (cf. [Koh06b, appendix D]). Unknown metadata fields, e. g. dc:inventor, or invalid combinations, such as a document’s author having a revision history, would have been rejected. Now, with the RDFa syntax, this is no longer possible. Enabling validation only for those metadata vocabularies for which we retained the OMDoc 1.2 pragmatic syntax would contradict the design goal of extensibility. We enable validation for arbitrary vocabularies by taking a closed-world view on them and generating a RELAX NG grammar from them, following an approach that was taken earlier for type-checking OpenMath XML [Koh08a]. Compared to a validation of the extracted RDF, this allows for pointing out invalid annotations right in the document (see [LK08a] for details). It is limited to simple cases, though, as RELAX NG does not support XML names inside attribute values, which are frequently used in RDFa to abbreviate URIs, and obviously does not implement RDFS or OWL-DL entailment.
The semantic web ontology languages RDFS and OWL make an open-world assumption: they assume that the metadata given about one resource in one document need not be complete, but that additional metadata can be given in external, even unknown documents. This is not suitable for validation, where we instead require certain relevant metadata fields to be present in our document – for example, that for any entry in the revision log of a document the name of the author is stated. Secondly, validation assumes a restrictive interpretation of property range and domain assertions. In RDFS, the latter are used to infer additional knowledge about subject and object. Consider the following triples:
  # in some metadata records:
  :Michael a foaf:Person .                      # (1)
  :Michael rev:has_version _:v1 .               # (2)
  # in the versioning ontology:
  rev:has_version rdfs:domain rev:DataAsset .   # (3)
An RDFS reasoner would infer that Michael is both a person and a data asset [BG04]. In OWL, if we had made those two classes disjoint by adding a disjointness axiom (4), the sample knowledge base would be inconsistent. Research on automatically adding such restrictions to ontologies for the purpose of validation has been done before (e. g. [Li+06]). Still, a general-purpose DL reasoner would only be able to point out that one of the triples (1) to (4) violates consistency, and the justification would not be “rev:has_version is not allowed for Michael, as he is a foaf:Person” – but the latter is what we would actually expect from a validator. Moreover, feeding the RDF metadata extracted from a document to a reasoner would not easily allow for pointing out the location in the document where an invalid statement was made.
source as elsewhere
what has actually been done? [GM05]
Therefore, we chose an approach similar to the one we had taken earlier for type-checking OpenMath mathematical expressions [Koh08a]: from a formal model (there, the type signatures of mathematical symbols; here, an RDFS or OWL ontology), we generate a RELAX NG grammar for XML validation. This allows for rejecting metadata fields that have not been declared in the ontology, or that are not applicable in the current context because of a domain mismatch. This solution is still simplistic, however: RELAX NG does not support XML names inside attribute values, which RDFa frequently uses to abbreviate URIs into compact URIs (CURIEs), so we have to assume unique and fixed namespace prefixes (e. g. foaf: for FOAF; cf. table A.1), and it does not support reasoning for the classes declared as domains of properties.
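The difference between RDFS inference and the restrictive, closed-world reading of rdfs:domain can be made concrete with a small sketch. It is not the RELAX NG generation described above, but a direct check on triples, written under simplifying assumptions: triples are plain arrays of prefixed names, only explicitly asserted types count, and subclass reasoning is ignored. The function name `validateDomains` and the data layout are hypothetical.

```javascript
// Closed-world check of rdfs:domain assertions (illustrative sketch only).
// triples: array of [subject, predicate, object]; "a" stands for rdf:type.
// domains: Map from property name to the class declared as its rdfs:domain.
function validateDomains(triples, domains) {
  // Collect the explicitly asserted types of each subject.
  const types = new Map();
  for (const [s, p, o] of triples) {
    if (p === "a") {
      if (!types.has(s)) types.set(s, new Set());
      types.get(s).add(o);
    }
  }
  const errors = [];
  for (const [s, p] of triples) {
    if (p === "a") continue;
    if (!domains.has(p)) {
      // Closed world: properties not declared in the ontology are rejected.
      errors.push(`${p} is not declared in the ontology`);
      continue;
    }
    const asserted = types.get(s) || new Set();
    // Closed world: the subject must be explicitly typed with the domain class;
    // nothing is inferred, in contrast to an RDFS reasoner.
    if (!asserted.has(domains.get(p))) {
      const actual = [...asserted].join(", ") || "untyped";
      errors.push(`${p} is not allowed for ${s} (${actual}); expected ${domains.get(p)}`);
    }
  }
  return errors;
}

const triples = [
  [":Michael", "a", "foaf:Person"],        // (1)
  [":Michael", "rev:has_version", "_:v1"], // (2)
];
const domains = new Map([["rev:has_version", "rev:DataAsset"]]); // (3)

console.log(validateDomains(triples, domains));
```

For the three sample triples, the check reports the domain mismatch for :Michael directly, instead of silently classifying him as a data asset.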
3.5 Searching and Querying
Except SPARQL, this is not my contribution, but it deserves being mentioned, as it is an important part of an integrated environment; forward-ref to OpenMath wiki queries, maybe another example from Flyspeck here
text search with indexing of math markup: not yet done, but maybe easy?
SPARQL [PS08] search: useful examples from OpenMath wiki; link math, discussion, etc.
formula search: not my business, but mention MWS (structural indexing), vs. (e.g.) ActiveMath (reduction to Lucene text search), MML with LSI (Cairns
3.6 Integrating Services into Interactive Documents
With its integration of – most fundamentally – browsing and editing, a wiki makes a suitable foundation for integrating further services, as we have seen. In terms of project management and maintenance, it may, however, not be desirable to simply take all appropriate mathematical services and integrate them into a single platform like SWiM – neither for the developer of the wiki platform, who would have to provide an interface to each service and test the integrated system, nor for the developers of the services, who may not be familiar with the wiki paradigm, or who may already be committed to different integrated platforms. For these reasons, we took a broader approach to integrating further services and first developed an architecture that enables the integration of mathematical services into interactive documents. We will first give further motivation for this approach and then describe the JOBAD architecture (JavaScript API for OMDoc-based Interactive Documents²⁰), including the description of services implemented for this architecture. Our vision of an interactive document is a document that the user can not just read, but adapt according to his preferences and interests while reading it – not only by customizing the display of the rendered document in the browser, but also by changing notations (which requires re-rendering) or retrieving additional information from services on the web.
²⁰ The original name referred to “active documents”. I do, however, believe that the term “active document” is too general. It refers, for example, to a Microsoft API for embedding document widgets into foreign applications, such as web browsers [Mic]. Similarly, a simple search in scientific digital libraries will reveal a multitude of interpretations of that term. In the discourse in our research group, it also refers to versioned documents in their lifecycle and documents generated adaptively to user preferences, both aspects that are only marginally covered in this thesis. Therefore, the more restricted term “interactive document” will be used here.
3.6.1 Related Work and Motivation
Interactive mathematical documents have been investigated in a number of research efforts, covering topics such as adapting documents, interactive exercises, and connecting to mathematical web services. The ActiveMath project investigates how to aggregate documents from a knowledge base such that the resulting document contains exactly the topics that the reader wants to learn and their prerequisites [Act]. Interactive exercises have been realized in ActiveMath and MathDox [GM08; Cuy+08; Coh+06]. Here, the user enters the result into a form and then gets feedback from a solution checker in the server backend. ActiveMath comes with its own web services [Mel+06], and MathDox was originally designed for talking to computer algebra systems but can also connect to other services via MONET (see below). Gerdes et al. have developed a reusable feedback service for exercises that has also been integrated with MathDox [Ger+08]. Besides supporting MathDox’s own communication protocol, Gerdes’s service also complies with the XML-RPC and JSON data exchange standards [Ger+08]. The services developed within the MathDox and ActiveMath projects, such as the ActiveMath course generator, are potentially open to any client, but have so far not been used with any frontend other than their primary one [Ull08; Mel+06].
There are also elaborate architectures for mathematical web services that have been designed for integration with many systems, such as the ones developed in the SCIEnce [Sci] and MONET projects [Mon]. SCIEnce explicitly targets symbolic computation and grid computing and does not consider documents as user interfaces. MONET is an architecture that, in principle, allows for any kind of mathematical web service. Still, mainly computational web services have been developed and evaluated within that framework. Web services can register with a central MONET broker that accepts requests, which do not directly call a web service but consist of a problem description (e. g., solve an equation, given as an OpenMath expression). The broker then forwards the request to the best-matching service. The above-mentioned MathDox allows for access to MONET web services via a document interface.
Asynchronous communication with a server backend (AJAX [Gar05]) allows for client/server interaction without submitting forms. It is a prerequisite for responsive browser-based applications: a client-side script can exchange small data packets with a server backend and insert responses from the server into the current page. This technique is employed by MathDox [Cuy+08] and Gerdes’s frontend to their feedback service [Ger+08].
Despite the efforts mentioned above, there is still a lot of static mathematical content on the web. Where documents act as frontends to web services, as in the above-mentioned systems,
[Figure 3.6, diagram labels: web services in an integrated backend or as independent web services (unit converter, content dictionaries, notation collection, renderer); proxy; client modules (menu layers, elision, folding, unit conversion, definition lookup, notation selection); GUI (mouse, keyboard); action objects; services; document (XHTML+MathML+OpenMath, JavaScript), initially generated by the renderer.]
Figure 3.6: JOBAD Architecture. Note the central role of the rendering service, which both generates JOBAD-compliant documents and is needed for many other services.
they have usually been designed to give access to a small selection of web services performing very specific tasks (mostly giving feedback on exercises and symbolic computation) – as is the case with ActiveMath and MathDox. Our goal is to facilitate the integration of diverse web services into mathematical documents – inspired by the Web 2.0 technology of mashups [O’R05; Ank+08]. Originally, mashups were handcrafted JavaScript programs pulling together web services from different sites. Since then, several mashup development kits have been developed, e. g., Yahoo! Pipes [Yah] or Ubiquity [Moz]. We aim at a similar development kit for mathematical applications.
Active Essays (Yamamiya)
3.6.2 The JOBAD Architecture
The JOBAD architecture consists of a document format that enables interactive documents, mathematical services operating on these documents, a user interface giving access to them, a lightweight communication protocol for connections from a document to web services, and a set of generic communication and document manipulation functions used by the user interface and service components. The document format is XHTML+MathML+RDFa with embedded semantic structures, as specified in section 2.4.5. We assume a broad notion of “service”, including local interactive functionality as well as any service with an HTTP interface, regardless of whether it complies with a “heavyweight” web service standard like XML-RPC or MONET. While a JOBAD-enabled document could be authored manually and the URLs of remote services could be hard-coded into the document, we rather assume that documents are automatically rendered from a content-oriented representation, e. g. using the JOMDoc renderer introduced in section 2.3.3, and that the server backend that serves the rendered documents also controls what services they have access to by putting appropriate initialization code for those service modules into the document.
We distinguish three kinds of interactive services by the amount of data they exchange with a web service backend:
1. services that merely draw on data embedded into the document and operate on the client only,
2. symbol-based services, which send a symbol identifier and an action verb to a web service in order to retrieve additional information,
3. and expression-based services, which send complex content-markup expressions to a web service.
The decision which kind of service should be used for a particular functionality depends considerably on the format used for interactive documents: some meta-information about the current appearance may be embedded into the document to permit local interaction, for example, alternative presentations or parallel markup. The decision also depends on performance considerations. For example, one can expect that definitions of symbols will only be looked up occasionally, so it would be a waste of resources to put all definitions of all symbols into hidden parts of a document when rendering it. We take a pragmatic approach: whenever it is not feasible or not desirable to embed all information required by a service into the document, the document must contain sufficient information that instructs the JOBAD client how to retrieve the missing information from a server backend. Information that was retrieved once should be cached on the client in order to avoid unnecessary later requests.
The easiest way of realizing a mathematical web service is to expose functionality via an HTTP interface. When adopting the REST pattern [Fie00], URLs directly represent mathematical resources (e. g. OpenMath symbols). This can be used, for example, to retrieve the definition of a symbol. More complex services act on the selected expression. In those cases, we use parallel markup to obtain the corresponding content expression and include it in the body of the HTTP request.
3.6.2.1 Service Advertisement
The JOBAD framework neither commits to a fixed set of services nor to a fixed set of user interface elements. Rather, we specify how a JOBAD-compliant document server advertises available JOBAD modules (cf. listing 3.1).
For each module that the document server chooses to enable in the interactive documents it serves, it shall serve the corresponding JOBAD JavaScript module to the client. Each JavaScript module must be initialized in the head of any interactive document served. Modules that access web services must be provided with the URL of a web service backend they can connect to. The URL passed to a symbol-based service on initialization may contain placeholders for the CDBase, CD, and Name of a symbol. The web services may be components of an integrated backend that also contains the document server, but they can also be remote web services that the document server is aware of.
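How a symbol-based service might fill in the placeholders of its service URL and cache responses on the client can be sketched as follows. This is an illustration under stated assumptions, not JOBAD's actual implementation: the helper names `expandServiceUrl` and `makeCachedLookup` are hypothetical, the placeholders are assumed to be written as $cdbase, $cd and $name, and the HTTP request is replaced by an injected function so that the sketch does not depend on a particular browser API.

```javascript
// Fill the $cdbase, $cd and $name placeholders of a service URL template
// with the (URL-encoded) components of an OpenMath symbol.
function expandServiceUrl(template, symbol) {
  return template.replace(/\$(cdbase|cd|name)\b/g, (match, key) =>
    encodeURIComponent(symbol[key])
  );
}

// Wrap a request function so that each URL is requested at most once.
// fetchText is injected; in a browser it would issue the HTTP request.
function makeCachedLookup(template, fetchText) {
  const cache = new Map();
  return function lookup(symbol) {
    const url = expandServiceUrl(template, symbol);
    if (!cache.has(url)) cache.set(url, fetchText(url));
    return cache.get(url);
  };
}

const template =
  "http://jobad.mathweb.org/backend?action=definition-lookup" +
  "&cdbase=$cdbase&cd=$cd&name=$name";
const sin = { cdbase: "http://www.openmath.org/cd", cd: "transc1", name: "sin" };

let requests = 0;
const lookup = makeCachedLookup(template, (url) => {
  requests += 1; // stands in for a real HTTP request
  return "response for " + url;
});

lookup(sin);
lookup(sin); // answered from the cache
console.log(expandServiceUrl(template, sin));
console.log(requests); // 1
```

Keying the cache by the expanded URL means that two modules asking for the same symbol through the same backend share one response.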
Listing 3.1: Service initialization code in a document
  // GUI elements to be enabled
  jobadInit("ui/contextmenu"); // loads the context menu
  // In-document services
  jobadInit("service/elision");
  // Web services
  jobadInit("service/definition-lookup", "Look up definition",
    "http://jobad.mathweb.org/backend?action=definition-lookup&cdbase=$cdbase&cd=$cd&name=$name");
3.6.3 User Interface
JOBAD offers various user interface elements for input and output. By right-clicking, a context menu can be requested for the object under the cursor or for the range of selected objects. A selection can be made in the usual way by dragging the mouse, but we are also planning a selection that expands on repeated clicking, from atoms to terms to larger expressions. Alternatively, there are keyboard shortcuts that can be pressed as soon as a selection has been made. While performing actions on the current selection makes sense for services like folding or definition lookup (cf. section 3.6.5.1), other services, such as elision (cf. section 3.6.4.2), should rather be made available globally. While this has not yet been realized within JOBAD, two possible approaches are global keyboard shortcuts and utilizing the user interface of the integrated environment, e. g. global toolbars.
this is actually future work
Actions provided by services are represented by generic objects, which allows for providing diverse access to them. The same action can, e. g., be triggered via a local context menu, from a global toolbar, and via a keyboard shortcut – given that these user interface elements have been enabled. Besides rewriting formulæ, which will be explained below for the examples of definition expansion and unit conversion, JOBAD offers tooltip-like popups for displaying information on demand. These could be annotations hidden in the document (e. g. the author-defined descriptions mentioned in section 2.4.5.1), but we mainly use them for displaying responses from web services, such as the definition of a symbol that the user wanted to look up (cf. section 3.6.5.1).
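The idea of representing actions by generic objects that several user interface elements can trigger may be sketched as follows. The classes `Action` and `ActionRegistry` are hypothetical and only illustrate the decoupling; they are not JOBAD's API, and a real implementation would attach to DOM events instead of being called directly.

```javascript
// One generic action object, independent of how it is triggered.
class Action {
  constructor(label, run) {
    this.label = label; // text shown in menus or toolbars
    this.run = run;     // function applied to the current selection
  }
}

// Keeps track of which user interface elements offer which action.
class ActionRegistry {
  constructor() {
    this.menuEntries = [];      // actions offered in the context menu
    this.shortcuts = new Map(); // key -> action
  }
  addToContextMenu(action) {
    this.menuEntries.push(action);
  }
  bindShortcut(key, action) {
    this.shortcuts.set(key, action);
  }
  // Returns undefined if no matching entry has been enabled.
  triggerFromMenu(label, selection) {
    const action = this.menuEntries.find((a) => a.label === label);
    return action ? action.run(selection) : undefined;
  }
  triggerFromKey(key, selection) {
    const action = this.shortcuts.get(key);
    return action ? action.run(selection) : undefined;
  }
}

const fold = new Action("Fold", (selection) => "folded " + selection);
const ui = new ActionRegistry();
ui.addToContextMenu(fold);
ui.bindShortcut("f", fold);

console.log(ui.triggerFromMenu("Fold", "2 ⋅ x"));
console.log(ui.triggerFromKey("f", "2 ⋅ x"));
```

Both calls run the same action object; enabling a further element such as a toolbar would only require another registration method.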
3.6.4 In-Document Services
The two in-document services that we have developed so far deal with elision, which is a common practice in mathematics. Experienced mathematicians frequently use shorthand notations to avoid distracting the reader with information that can be deduced from the context. In a first step, reading aids that support inexperienced readers in understanding the structure of an expression are removed, without affecting the well-formedness of the expression. Reading aids that we have investigated so far include redundant brackets and type annotations (cf. section 3.6.4.2). In a second step, entire ranges of symbols are removed from the expression and replaced by ellipses (usually “. . . ”). Ellipses are commonly used in discrete sets, vectors, and, most prominently, in matrices [SS06]; they abbreviate a finite or infinite range of discrete values that follows “obvious” construction rules, e. g. 1, . . . , n, or that is not relevant for the current mathematical consideration, e. g. when discussing diagonal matrices that may have any content off the diagonal [SS06]:
\[ \begin{pmatrix} d_1 & & 0 \\ & \ddots & \\ 0 & & d_n \end{pmatrix} \]
Ellipses leave an expression in a state that is no longer strictly well-formed but still follows strict conventions. We have investigated folding (i. e. dynamic showing and hiding) of subterms and elision of presentational structures that guide the reader (here: brackets and type annotations) so far; ellipses are considered a more complex topic subject to future work.
any study supporting this claim?
The presentation framework that we rely on (cf. section 3.1.1) operates in two steps: composing visual sub-presentations to larger ones, and then eliding formula parts that can be deduced from the context. If the desired output format is an interactive one, such as XHTML+MathML, the latter parts are not actually removed from the output, but only made invisible, so that interactive services like the ones described here can display them on request.
3.6.4.1 Folding Subterms and Undoing Interactions
In section 2.4.5.1, I described how to render author-defined abbreviations for subterms given in the content markup into presentational maction elements that allow for switching between the term and its abbreviation. JOBAD’s folding service makes this switching operation accessible on its user interface and executes it by changing the value of the @selection attribute. Besides that, we have implemented interactive folding of arbitrary subterms, so that a reader can hide them if he feels distracted. Any subterm that is properly grouped (see requirement 2 in section 2.4.5.1) is eligible for folding. When the user requests folding of a subexpression for the first time, we put both the original subterm and its folded version into a dynamically generated maction element with actiontype folding, which makes the folding action undoable. This is a general pattern that will also be needed for other services: whenever a service rewrites a term t into t′ (here: any subterm t into “. . . ”), an maction is created on the fly, using the name of the service as the value of the @actiontype attribute.
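The rewriting pattern just described can be sketched on a simplified in-memory model of the presentation markup. The object layout and the function names (`rewriteUndoably`, `toggle`, `visible`, `fold`) are hypothetical; a real implementation would manipulate MathML elements in the browser's DOM. The sketch keeps the original term as the second, initially hidden child, and switches between both by changing a 1-based selection index, as the @selection attribute of maction does.

```javascript
// Rewrite term t into t' and keep t as the hidden second child of an maction.
function rewriteUndoably(term, rewritten, serviceName) {
  return {
    tag: "maction",
    actiontype: serviceName, // the name of the service that did the rewriting
    selection: 1,            // 1-based index of the visible child
    children: [rewritten, term],
  };
}

// Switch between the rewritten and the original term (undo or redo).
function toggle(maction) {
  maction.selection = maction.selection === 1 ? 2 : 1;
  return maction;
}

// The child that is currently displayed.
function visible(maction) {
  return maction.children[maction.selection - 1];
}

// Folding rewrites any subterm into an ellipsis.
function fold(term) {
  return rewriteUndoably(term, { tag: "mtext", text: "…" }, "folding");
}

const term = { tag: "mrow", text: "2 ⋅ x" };
const folded = fold(term);
console.log(visible(folded).text); // the ellipsis
toggle(folded);
console.log(visible(folded).text); // the original subterm again
toggle(folded);
console.log(visible(folded).text); // the ellipsis, without rewriting anew
```

Because both versions stay inside the maction, switching back and forth never requires the rewriting service to run again.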
In this way, the previous state t of a term is preserved in the second, hidden child of the maction, and the user can switch back to it. Not only do interactions become undoable locally, but they also become redoable: when information from a remote web service has been used to rewrite t ↝ t′, as is the case, e. g., with unit conversion, and the user switches back to t, he can recover t′ without causing the information to be retrieved from the web once more, as it is still cached in the maction. As an example, consider the expression [1 + [2 ⋅ x]], where square brackets denote mrows. Suppose the user selects [part of] the subterm 2 ⋅ x or right-clicks somewhere in that term and requests it to be folded. Then, the formula will display as [1 + . . . ]. Clicking on the dots and selecting the “unfold” action from the user interface (e. g. the context menu) will restore the original appearance.
3.6.4.2 Flexible Elision and Display of Reading Aids
The most prominent presentational structures that aid reading mathematical expressions are brackets: they are put around subterms in order to make their grouping structure explicit. However, they are usually omitted in two cases: (i) when the operator of the current subterm binds stronger than the operator of the enclosing term (cf. section 2.3.4.5), or (ii) when the current subterm is formed by a constructor that has outer fences itself, e. g. the set constructor; consider {a, b} ∩ {b, c} vs. ({a, b}) ∩ ({b, c})²¹. Particularly in the first case, bracket elision can be confusing for inexperienced readers who do not know the binding precedences of all operators in a complex expression. This becomes apparent when operators from multiple fields of mathematics occur together in one expression. Consider the following example²²:
how to express this?
\[ 5(x + y)^{n+3} \leq (ab)! \lor \lnot p \land \lnot(q \leq \pi) \tag{3.1} \]
Here, some additional brackets clarify the structure, . . .
\[ (5(x + y)^{n+3} \leq (ab)!) \lor (\lnot p \land \lnot(q \leq \pi)) \tag{3.2} \]
. . . or maybe some more, . . .
\[ (5(x + y)^{n+3} \leq (ab)!) \lor ((\lnot p) \land (\lnot(q \leq \pi))) \tag{3.3} \]
find out which one of the two is more readable!
. . . whereas the fully bracketed structure would again be unreadably cluttered:
\[ ((5 \cdot (x + y)^{(n+3)}) \leq ((a \cdot b)!)) \lor ((\lnot p) \land (\lnot(q \leq \pi))) \tag{3.4} \]
²¹ Such fences look like brackets but are not brackets in the strict sense, as they may never be omitted.
The JOMDoc rendering algorithm outlined in section ?? enables a flexible elision of redundant brackets. Redundant brackets are now not omitted completely but only hidden, and annotated with the difference between input and output precedence as the elision level, which can be an integer or one of the two special values infinity and −infinity (cf. listing 3.2). Thus, the decision whether brackets should be displayed can be deferred to the time of reading a document. The user can then set a visibility threshold for elision levels; any bracket with an elision level below or equal to the threshold is displayed.
Listing 3.2: An elidable bracket in Presentation MathML
  (
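The relation between precedences, elision levels and the visibility threshold can be worked through in a few lines. The sketch below assumes that the elision level of a bracket is the precedence of the enclosing operator minus that of the bracketed subterm's operator, and it uses the precedence values of the example that follows in the text; the function names are hypothetical and the code is not part of JOMDoc.

```javascript
// Precedence values of the example discussed in this section
// (same value assumed for input and output precedence).
const precedence = { "¬": 600, "≤": 700, "∧": 800, "∨": 850 };

// Elision level of the brackets around a subterm with operator `inner`
// that occurs directly below operator `outer`.
function elisionLevel(outer, inner) {
  return precedence[outer] - precedence[inner];
}

// A bracket is displayed if its elision level does not exceed the threshold.
// This also covers the special levels: -Infinity is always displayed,
// Infinity only with an infinite threshold.
function isDisplayed(level, threshold) {
  return level <= threshold;
}

const brackets = [
  { around: "5(x + y)^(n+3) ≤ (ab)!", level: elisionLevel("∨", "≤") }, // 150
  { around: "¬p ∧ ¬(q ≤ π)", level: elisionLevel("∨", "∧") },          // 50
  { around: "¬p", level: elisionLevel("∧", "¬") },                     // 200
];

// The redundant brackets that become visible at a given threshold.
function shown(threshold) {
  return brackets.filter((b) => isDisplayed(b.level, threshold));
}

for (const threshold of [0, 150, 250]) {
  console.log(threshold, shown(threshold).map((b) => b.around));
}
```

With a threshold of 0 none of these redundant brackets is shown, 150 adds the two pairs around the operands of ∨, and 250 additionally shows the brackets around the negation.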
The sequence of successively displaying more brackets from formula 3.1 to 3.2 to 3.3 above can be achieved by giving the operators involved the following precedences:²³
  operator     ¬    ≤    ∧    ∨
  precedence   600  700  800  850
The pairs of brackets that are additionally displayed in formula 3.2 thus have elision levels of 150 (≤ vs. ∨) and 50 (∧ vs. ∨), which means that a visibility threshold of at least 150 is required to achieve that appearance. Increasing the threshold to 250 would lead to state 3.3. In the demo implementation shown in figure 3.7, we have additionally used the elision level information to display content with a high elision level in a lighter color.
Figure 3.7: Demo of bracket and type elision with global visibility threshold control and color depending on elision levels
I leave the question of the most reasonable precedence values for these operators unanswered. This is subject to further investigation once the elision service has been deployed to actual users.
file under “future work”
It might turn out that the elision level a user would assign to a particular bracket depends on his background knowledge [of the operators involved] and other personal preferences. It might even turn out that a totally ordered range of precedence values leads to undesirable results, which would make a partial order, as implemented by Autexier et al. (cf. section 2.3.4.5), more desirable. Combined with a notion of numeric elision levels, this would lead to a function d that compares two operators and either returns a numeric value for their precedence difference, or undefined if they are not comparable, e. g. due to a type mismatch. For example, ¬(a ≤ b) is well-typed, as ≤ is of type number × number → boolean and ¬ is of type boolean → boolean; therefore, the notion of an elision level of the brackets in this expression is well-defined. (¬a) ≤ b is not well-typed, and thus it does not make sense to compute an elision level here. On the other hand, partially ordered precedences pose new challenges for collaborative knowledge management, as it would be easy to accidentally introduce cycles.
²² This example is, admittedly, contrived, but cases with operators from two or three fields are common, e. g. expressions with set operators and logical operators.
²³ Here, I assume the same values for input and output precedence for simplicity.
Besides brackets, our rendering service supports other elision groups. So far, we have investigated type annotations: some of them are essential for determining the type of an expression, whereas others can be inferred. Consider the following expression:²⁴
\[ \mathit{andI} = \lambda F G X_{\iota} . F(X) \land G(X) \tag{3.5} \]
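Suppose it is background knowledge that the boolean conjunction operator ∧ is of type o → o → o. Then, all types in expression 3.5 can be inferred from knowing that X is of type ι: the types of the predicates, . . .
\[ \mathit{andI} = \lambda F_{\iota \to o} G_{\iota \to o} X_{\iota} . F(X) \land G(X) \tag{3.6} \]
. . . and the type of the complete expression:
\[ \mathit{andI}_{(\iota \to o) \to (\iota \to o) \to \iota \to o} = \lambda F_{\iota \to o} G_{\iota \to o} X_{\iota} . F(X) \land G(X) \tag{3.7} \]
Finally, one could show all type annotations:
\[ \mathit{andI}_{(\iota \to o) \to (\iota \to o) \to \iota \to o} = \lambda F_{\iota \to o} G_{\iota \to o} X_{\iota} . F_{\iota \to o}(X_{\iota}) \land_{o \to o \to o} G_{\iota \to o}(X_{\iota}) \tag{3.8} \]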
Note the interplay with bracket elision: as the function type constructor → is right-associative, the type of the complete expression would render as (ι → o) → ((ι → o) → (ι → o)) with all brackets displayed. Again, a redundant type annotation in an interactive document is represented as an maction, just with a different elision group, as shown in listing 3.3.

Listing 3.3: An elidable type annotation in Presentation MathML
[XML markup of the listing not preserved; its text content is the identifier F annotated with the type ι → o.]

Footnote 24: The base formula is courtesy of Michael Kohlhase and cited from his lecture notes on computational semantics of natural language. The formula defines the semantics of the “and” connective for verb phrases in natural language (e. g. “Ethel howled and screamed”) by reducing it to the boolean conjunction.
3 Services for Mathematical Knowledge Management
Our elision service allows the user to choose one visibility threshold for each elision group. If T_g is the threshold of elision group g, then all elements of group g whose elision level is above T_g are invisible. This is realized by using maction elements with action type elision that switch between an expression and an invisible mspace element. This also permits a document to provide initial visibility status for its elements.

Besides one hand-made demo document shown in figure 3.7, we have not yet investigated elision of redundant type annotations further. Our implementation does not yet generate them automatically, but it seems feasible to add an implementation of a type calculus to the presentation pipeline, which would add inferred type annotations to the content markup before having it rendered.

The best user interface for controlling visibility thresholds has yet to be determined. We believe that the thresholds for different elision groups should be controlled separately. We also believe that, given the complex rules of when reading aids are redundant, their display should be controllable in a rather large scope. So far, we have implemented control on formula and document level, but not yet on the level of a complete document collection or system. It is likely that many distinct elision levels will occur in a document, but we believe that the reader will not care about their exact discrete values. Therefore, a slider seems most suitable for controlling the visibility threshold. Alternatively, if a slider is not available, a sequence of discrete radio buttons could be used (cf. figure 3.7), or a sequence of keyboard shortcuts (e. g. ranging from 0 to 9).
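The threshold rule stated above (an element of group g is invisible iff its elision level is above T_g) can be illustrated by a minimal Python sketch. It is not the JOBAD implementation, which toggles maction elements in the browser; the class name `Elidable`, the group names and the numeric levels are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Elidable:
    group: str   # elision group, e.g. "brackets" or "types" (names are assumed)
    level: int   # elision level assigned by the renderer (values are assumed)
    text: str    # what is displayed when the element is visible


def render(parts, thresholds):
    """Concatenate a formula, hiding every elidable part whose level
    is above the threshold T_g of its elision group g."""
    shown = []
    for part in parts:
        if isinstance(part, str):
            shown.append(part)                        # ordinary content is always visible
        elif part.level <= thresholds[part.group]:
            shown.append(part.text)                   # at or below T_g: visible
    return "".join(shown)


# A product inside a sum: the brackets are redundant reading aids.
parts = [Elidable("brackets", 500, "("), "a*b", Elidable("brackets", 500, ")"), "+c"]
high = render(parts, {"brackets": 1000})   # generous threshold: brackets shown
low = render(parts, {"brackets": 100})     # strict threshold: brackets elided
```

A slider, as suggested above, would simply change the value stored in `thresholds` for one group and trigger a re-evaluation of the visibility of all elements of that group.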
3.6.5 Symbol-Based Services

Symbol-based services send a symbol identifier and an action verb to a web service in order to retrieve additional information about that symbol. So far, we have implemented support for looking up the definition of a symbol. Similar services, which can be implemented analogously, would look up type declarations or examples. Another symbol-based service, which I have developed conceptually so far, but not yet implemented, is interactive notation switching. In the backend, these services require some notion of content dictionaries, e. g. OpenMath Content Dictionaries, or OMDoc theories.

3.6.5.1 Definition Lookup

The definition lookup service sends the ID of a symbol σ to the server and expects as a response a content-markup formula containing a term that defines σ. This complements the linking of symbols to the place where they are defined (cf. section 3.1) by a mode of interaction that does not force the reader to abandon his current context. Our implementation of definition lookup uses the RESTful URI format introduced in section 3.6.2.1. The information that we want to look up is encoded by the value definition-lookup of the action parameter. In the following, I will use def(σ) for the definition of a symbol, as looked up by this service, i. e. the whole definition with formal and informal parts.

On the server side, the lookup is enabled by representing content dictionaries (CDs) in OMDoc [Koh06b]. There, a symbol with type declaration and definition is represented as shown in listing 3.4, which allows for easy retrieval, e. g., using XQuery (see listing 3.5). The situation in an OpenMath CD is similar, but as “definitional mathematical properties” (DefMPs) have not yet been specified, there is no standard way of identifying a definition of a symbol among its various mathematical properties.

Listing 3.4: A symbol and its definition in OMDoc 1.2/1.3
[XML markup of the listing not preserved; its text content is the type declaration C → C and the defining equation $\sin z = \frac{1}{2i}\left(e^{iz} - e^{-iz}\right)$.]
We have implemented the server side of definition lookup for the TNTBase backend [ZK09; Koh+09a]. There, the symbol is retrieved from the complete collection of OMDoc documents, and then the definition of that symbol is looked up in the same theory. This can be done with a single XQuery, as shown in listing 3.5, whose performance is sped up using indices (footnote 25).

Listing 3.5: XQuery for definition lookup

    for $def in collection()
        //theory[@xml:id eq $cd]
        //definition
    return $def[tokenize(@for, '\s+') = $name]
This query ignores the CDBase of the symbol, thus assuming that there are no two different theories of the same name. The query takes into account that an OMDoc theory might not contain symbol declarations and definitions as direct children, but only as children of intermediate grouping constructs (omgroup). It also takes into account that one definition can define more than one symbol: consider a mutually recursive definition of the predicates odd and even, which would be given in OMDoc as one definition element whose for attribute lists both symbol names.

While definition lookup has not yet been implemented in the SWiM environment, it would be even easier and more efficient there, as SWiM maintains an RDF graph containing structural information about theories, symbols, and definitions. A symbol node in this graph is identified by the symbol’s URI, which can be computed trivially by concatenating CDBase, CD, and name of the symbol, thus relieving us from the task of first locating the symbol declaration in the document collection. Then, the SPARQL query for the definition of that symbol reduces to

    SELECT ?d WHERE { ?d o:defines <symbol-URI> . }

where <symbol-URI> stands for the URI of the symbol computed as described above.
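To make the behaviour of the XQuery in listing 3.5 concrete, the following Python sketch performs the same selection on a small in-memory document. It is an illustration under assumptions, not the TNTBase implementation: the sample document is invented, it omits the OMDoc namespace for brevity, and the function name `lookup_definitions` is hypothetical. Only the element names theory, omgroup and definition and the attributes xml:id and for are taken from the text above.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented sample collection (no OMDoc namespace, for brevity).
SAMPLE = """
<omdoc>
  <theory xml:id="parity">
    <omgroup>
      <symbol name="odd"/>
      <symbol name="even"/>
      <definition xml:id="odd-even-def" for="odd even"/>
    </omgroup>
  </theory>
  <theory xml:id="other">
    <definition xml:id="unrelated" for="odd"/>
  </theory>
</omdoc>
"""


def lookup_definitions(root, cd, name):
    """Return all definition elements inside the theory named cd whose
    whitespace-separated 'for' attribute contains the symbol name."""
    hits = []
    for theory in root.iter("theory"):
        if theory.get(XML_ID) != cd:
            continue
        # iter() descends through intermediate grouping elements such as omgroup
        for definition in theory.iter("definition"):
            defined = re.split(r"\s+", definition.get("for", "").strip())
            if name in defined:
                hits.append(definition)
    return hits


root = ET.fromstring(SAMPLE)
found = [d.get(XML_ID) for d in lookup_definitions(root, "parity", "even")]
```

Like the XQuery, this sketch ignores the CDBase and returns every matching definition, so an invalid collection with several definitions for one symbol would yield several hits.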
Neither query given here takes into account that in an invalid document collection one might have multiple definitions claiming to define the same symbol. Here, I assume the OMDoc 1.2/1.3 syntax, where definitions are separate elements referring to their symbols by name. In OMDoc 1.6, this is expected to change to a syntax that is both easier to query and prevents having multiple definitions of the same symbol: symbols will then contain their definitions as child elements.

Footnote 25: This query assumes sound OMDoc markup, i. e. that the symbol has been declared and that the definition is in the same theory.
Figure 3.8: Looking up a definition (left: selecting the action, right: the result); example taken from our lecture notes; cf. Sect. ??.

Our current client-side implementation displays render(def(σ)), where render: content → presentation is the rendering service (cf. section 3.6.6.1), in a tooltip overlay at the cursor position, as shown in figure 3.8.

Content Negotiation

As the desired MIME type of the response can be given in the HTTP request header for so-called content negotiation, one can distinguish requests for content markup from requests for a rendered formula while still using the same URL, as shown in listing 3.6. Analogously, the MIME type application/xhtml+xml would be used to obtain a response rendered in XHTML with Presentation MathML. Retrieving content markup (e. g. OMDoc or OpenMath) makes sense for definition expansion (see below), or when the content dictionaries are not in the same backend as the rendering service (cf. section 3.6.8) and thus the rendering service would have to be called separately. Directly retrieving presentation markup is more efficient when looking up definitions using a backend that contains both the content dictionaries and the rendering service.

Listing 3.6: A request for the OpenMath source of a definition

    GET /backend?action=lookup-definition&cdbase=...&cd=transc1&name=sin HTTP/1.1
    Host: jobad.mathweb.org
    Accept: application/openmath+xml
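The following Python sketch shows how a client might construct the two kinds of request for content negotiation. It only builds the request objects and does not contact any server. The host, path and parameter names are those of listing 3.6; the URL scheme and the CDBase value `http://example.org/cds` are placeholders that I assume for illustration, since the listing elides the CDBase.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Host and path as in listing 3.6; the http scheme is an assumption.
BACKEND = "http://jobad.mathweb.org/backend"


def definition_request(cdbase, cd, name, mime_type):
    """Build (but do not send) a definition lookup request; the desired
    representation is selected only by the Accept header."""
    query = urlencode({
        "action": "lookup-definition",
        "cdbase": cdbase,
        "cd": cd,
        "name": name,
    })
    return Request(f"{BACKEND}?{query}", headers={"Accept": mime_type})


CDBASE = "http://example.org/cds"  # placeholder, not the real CDBase
source_req = definition_request(CDBASE, "transc1", "sin", "application/openmath+xml")
rendered_req = definition_request(CDBASE, "transc1", "sin", "application/xhtml+xml")
```

Both requests share one URL and differ only in their Accept header, which is the point of content negotiation: the symbol keeps a single address regardless of the representation requested.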
Definition Expansion

An alternative to definition lookup is definition expansion. There, an occurrence of a symbol in a formula is replaced by its definition. This works differently depending on the type of definition. OMDoc supports the following types, which are common in mathematics [Koh06b, section 15.2.4]:

simple: A plain symbol σ is defined by an expression σ := def_f(σ), which can be substituted for σ; consider the definition of one as the successor of zero: 1 := s(0). (Footnote 26: I use def_f to denote the right side of a definitional equation, in case it exists uniquely.)

pattern: A symbol that can be applied to arguments is defined by an expression, in which the arguments occur, but not the original symbol; consider the above-mentioned definition of the sine function by the equation $\sin z := \frac{1}{2i}\left(e^{iz} - e^{-iz}\right)$. There can also be several cases, e. g. for positive vs. negative arguments.

inductive: like “pattern”, but the symbol to be defined may also occur on the right side of an equation. Integer addition, for example, can be defined by recursion on the second argument: x + 0 := x, x + s(y) := s(x + y).

implicit: An equation is given that has exactly one solution for the symbol to be defined; consider the following implicit definition of the exponential function: exp′ = exp ∧ exp(0) = 1.

I can now describe how interactive definition expansion works (footnote 27), first on the level of content markup. For simple definitions, the symbol can simply be replaced by the right side of its definition. For pattern-based and inductive definitions, the equation for the right case has to be found by matching against the occurrence of the symbol in the formula. Then, the symbol and its arguments can be replaced by the right side of that equation, substituting the correct values for the arguments. The unification required here may be non-trivial, so we recommend leaving it to a symbolic computation service that has access to the same content dictionaries. Simple cases, however, such as matching a left side that consists of a function with atomic arguments (like sin z in the example above), can be implemented in a straightforward XML-processing way on the JOBAD client side. Finally, expansion of implicit definitions is not possible at all.

Several interaction modes for definition expansion are possible. If a symbol occurs more than once in a formula, one could either only replace the occurrence the user selected, or all occurrences. For an inductive definition of a symbol σ, the service could perform one step of expansion only, then requiring the user to request the next step for the remaining occurrence(s) of σ, or one could offer a user interface that allows requesting the desired number of expansion steps (a positive number, or “maximum”, i. e. until termination) at once.
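As an illustration of the step-wise expansion of an inductive definition, the following Python sketch expands integer addition as defined above (x + 0 := x, x + s(y) := s(x + y)). It is a sketch under assumptions and not an implementation of the (not yet implemented) service: terms are encoded as nested tuples rather than content markup, pattern variables are strings starting with "?", and all function names are hypothetical.

```python
def match(pattern, term, bindings):
    """One-sided matching: bind pattern variables ("?x") to subterms of term.
    Returns the extended bindings, or None if the pattern does not match."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == term else None
        return {**bindings, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple) and len(pattern) == len(term):
        for sub_pattern, sub_term in zip(pattern, term):
            bindings = match(sub_pattern, sub_term, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None


def substitute(term, bindings):
    """Replace pattern variables in term by their bound values."""
    if isinstance(term, str):
        return bindings.get(term, term)
    return tuple(substitute(sub, bindings) for sub in term)


def expand_once(term, rules):
    """Expand the outermost, leftmost occurrence that matches the left side
    of a defining equation. Returns (new_term, changed)."""
    for left, right in rules:
        bindings = match(left, term, {})
        if bindings is not None:
            return substitute(right, bindings), True
    if isinstance(term, tuple):
        for index, sub in enumerate(term):
            new_sub, changed = expand_once(sub, rules)
            if changed:
                return term[:index] + (new_sub,) + term[index + 1:], True
    return term, False


def expand(term, rules, max_steps=None):
    """Perform up to max_steps expansion steps; None means 'until termination'."""
    steps = 0
    while max_steps is None or steps < max_steps:
        term, changed = expand_once(term, rules)
        if not changed:
            break
        steps += 1
    return term


# x + 0 := x  and  x + s(y) := s(x + y)
PLUS = [
    (("+", "?x", "0"), "?x"),
    (("+", "?x", ("s", "?y")), ("s", ("+", "?x", "?y"))),
]

term = ("+", "a", ("s", ("s", "0")))        # a + s(s(0))
one_step = expand(term, PLUS, max_steps=1)  # s(a + s(0))
maximum = expand(term, PLUS)                # s(s(a))
```

The `max_steps` parameter corresponds to the interaction modes discussed above: a positive number of steps, or `None` for “maximum”. Expanding only a user-selected occurrence would additionally require passing a path to that occurrence, which is omitted here.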
So far, I have described how definitions are expanded in content markup, but after expansion, the resulting expression will also have to be rendered. As long as compositional notation definitions are involved, it suffices to render the expanded expression e := def_f(σ), and to substitute render(e) for the original occurrence of the symbol in the presentation markup. This is possible thanks to the cross-linked parallel markup that we assume (cf. section 2.4.5.1). However, definition expansion must not violate the integrity of the cross-linked parallel markup of the whole formula. Therefore, we also have to substitute the content-markup part of render(e) for the content-markup counterpart of σ in the content markup of the formula.

This approach will no longer work once non-compositional notation definitions based on pattern matching are involved. Suppose the symbol σ, defined as σ := τ, occurs in a formula as f(σ), and a special notation is defined for f(τ); then, the latter expression would have to be re-rendered. A sophisticated approach to this problem would probably maintain information about the compositionality of notation definitions involved in rendering a formula in the markup created by the renderer and then re-render the minimum necessary subterm. Given that (i) formulæ in documents are usually quite small, (ii) the time required for communication between client and server by far outweighs the time required for rendering- and substitution-related computations on client or server, and (iii) we can plausibly expect interactive definition expansion to be used infrequently and highly selectively, the pragmatic approach of simply substituting σ by e in the content markup and re-rendering the whole formula seems reasonable. However, in that case, definition expansion would no longer be undoable. Undoing the expansion requires substituting the rendered expansion render(e) for the rendering of the original symbol as locally as possible, putting the original rendering into an maction container (cf. section 3.6.4.1).

Footnote 27: This feature has not yet been implemented.

3.6.5.2 Interactive Notation Switching

Given that our renderer supports alternative notations per symbol, interactively switching among them should be supported. Such a service can act as a complement to adaptively generated documents, or be used in the absence of automatic adaptation, so that a reader can interactively adapt the notations in a document to his preferences. While this service remains to be implemented, I have designed it conceptually and will thus describe its operation.

The only information that we need in addition to the one we require for our formula markup anyway (cf. section 2.4.5.1) is, for each symbol, which rendering from a notation definition has been used to render it. In accordance with requirement 4 and with Christine Müller’s work on context-sensitive adaptation of documents, our renderer implementation adds the URI of the rendering in an @ec attribute (EC = extensional context; cf. [Koh+09b]) to the topmost presentation element generated by a rendering, if the “rendering info” option is set (footnote 28). When the user wants to change the rendering of a symbol σ, the server is queried for the URIs and textual descriptions of all alternative renderings, given the URI of the rendering currently used. A menu is populated with this information, and the user can select the desired rendering.
Then, the content markup of the formula is sent to the rendering service, specifying the URI of the desired alternative rendering as extensional context in an @ec attribute on the occurrence of σ in the content markup, thus requesting the formula to be re-rendered, using the desired rendering for σ. As said above (in section 3.6.5.1) for definition expansion, it depends on the compositionality of the notation definitions involved whether we can confine ourselves to re-rendering the affected subterm, or whether the whole formula should rather be re-rendered. As for definition expansion, the question of how to handle multiple occurrences of the same symbol in one formula is also open: whether to change the notation for the occurrence selected by the user, or for all occurrences.
3.6.6 Expression-Based Services

Expression-based services send complex content-markup expressions to a web service. A web service that receives a content-markup expression and returns a rendered fragment of presentation markup is at the core of our architecture. In the previous sections, it has already been described from the point of view of other services that use it. Other than that, we have implemented unit conversion by integrating an actual third-party service. The motivation for that was, on the one hand, that we considered unit conversion a relevant service for scientific and technical documents, and, on the other hand, that we wanted to find out how easily a third-party service could be integrated, given that the other components of the architecture (document manipulation, client/server communication, user interface, rendering) had already been in place at that time.

Footnote 28: The actual implementation that currently exists provides a whitespace-separated list that, following the rendering that has been used, also contains all alternative renderings known to the renderer. However, I argue that the server will have to be contacted anyway, in order to obtain descriptive labels for the alternative renderings.

3.6.6.1 Rendering

The rendering service is a prerequisite for making output from other services human-readable. In its simplest form, it accepts as input (in the body of an HTTP POST request) a fragment of OpenMath and returns the result of rendering it to Presentation MathML, following the guidelines established in section 2.4.5.1. As the rendering service is most reasonably offered by the backend that holds the content dictionaries (cf. section 3.6.8), and, as a part of them, the notation definitions for the symbols (cf. section 2.2.5), its implementation depends on the environment of the backend. Its implementation in the initial proof-of-concept backend, as well as the TNTBase backend, relies on the JOMDoc library (cf. section 3.1.1), whereas its implementation in the MMT system relies on the MMT library (cf. section 3.1.1).
3.6.6.2 Unit Conversion
In physics and engineering, many equivalent but different units are known (e. g. imperial units vs. SI units). We want to enable the reader of a document to interactively request any unit he is not familiar with to be converted to a more familiar unit. There are lots of unit converters on the web (see [Str08] for a survey), but instead of manually opening one and copying numbers into the entry form of such a converter, or of a locally installed calculator, we want to enable an in-place conversion. As for definition lookup, this allows the reader to focus his attention on the document.

The unit conversion service assumes the OpenMath encoding for physical quantities as specified in [DN03] and shown in listing 3.7: Base units are symbols in special CDs; derived units can be formed by multiplication or division of base units with numeric factors or other base units. The unit conversion service accepts one such expression o, plus a target unit u_t. If a conversion is possible, the result is returned as an OpenMath expression, which I denote by uc(o, u_t). On the client side, this result has to be integrated into the current formula. Let p with o = c(p) be the presentation markup that the user selected; then we add p′ = render(uc(o, u_t)) as an maction alternative for p to the document to achieve undoability (cf. section 3.6.4.1).

Listing 3.7: A physical quantity in OpenMath 1.5
[Content of the listing not preserved.]

We have not implemented our own unit converter but use the one developed by Stratford and Davenport [SD08; Str08], which performs conversions according to the OpenMath FMPs of the unit symbols involved (footnote 29). In its current version, their web service does not talk OpenMath, despite being internally based on OpenMath, but uses string input/output. Therefore, we have to convert physical quantities between their OpenMath and string representation (e. g. “1 metre”). We have not done this for compound units (e. g. m/s) and prefixed units (e. g. km), but instead rely on the developers of the OpenMath unit converter to expose the OpenMath interface soon.

The initial user interface offers conversion to a few hard-coded target units. Internally, the OpenMath unit converter knows whether a conversion is admissible, from analyzing the FMPs that define the unit symbols and constructing a graph of possible conversions. With an additional web interface, this information could be exposed. Then, the interaction with the unit converter would work as follows, here outlined for the context menu user interface:

(i) The user opens the context menu for the selected expression. When this expression is recognized as the product of a number and a unit (a simple XPath node test on the content markup; footnote 30), the unit conversion submenu is enabled.
(ii) When the user opens the unit conversion submenu, a query for possible target units reachable from the source unit u_s is sent to the unit conversion web service. The service returns a list of symbol identifiers and labels, which are used to populate the submenu.
(iii) When the user picks a target unit u_t, the conversion is performed, as described above.

While a context menu enables local conversion, a reader’s preference for certain units is actually global. A reader who is not familiar with imperial units probably does not want to see any such units in a document. Therefore, unit conversion should also be enabled document-wide using, e. g., a global toolbar.
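The idea of a graph of possible conversions, as used in step (ii) above to determine the target units reachable from a source unit, can be sketched in Python as follows. This is not the implementation of the OpenMath unit converter, whose internals are only summarized above; the unit names, the edge table `EDGES` and all function names are assumptions, and the sketch covers only conversions by a constant factor.

```python
from collections import deque

# Assumed conversion edges: 1 source unit = factor target units.
EDGES = {
    ("foot", "metre"): 0.3048,
    ("inch", "metre"): 0.0254,
    ("second", "minute"): 1 / 60,
}


def build_graph(edges):
    """Build an undirected conversion graph; each edge is stored in both
    directions with reciprocal factors."""
    graph = {}
    for (source, target), factor in edges.items():
        graph.setdefault(source, {})[target] = factor
        graph.setdefault(target, {})[source] = 1 / factor
    return graph


def reachable_factors(graph, source):
    """Breadth-first search from source: map every reachable unit to the
    factor by which a value in the source unit has to be multiplied."""
    factors = {source: 1.0}
    queue = deque([source])
    while queue:
        unit = queue.popleft()
        for neighbour, factor in graph.get(unit, {}).items():
            if neighbour not in factors:
                factors[neighbour] = factors[unit] * factor
                queue.append(neighbour)
    return factors


def convert(graph, value, source, target):
    """Return the converted value, or None if the conversion is not admissible."""
    factors = reachable_factors(graph, source)
    if target not in factors:
        return None
    return value * factors[target]


graph = build_graph(EDGES)
targets = sorted(unit for unit in reachable_factors(graph, "foot") if unit != "foot")
```

Here `targets` would populate the submenu of step (ii), and `convert` corresponds to step (iii). A conversion between a length and a time unit yields `None`, mirroring the admissibility check described above.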
If a document adaptation engine is available, the user should actually be able to edit his unit preferences as a part of his user profile, and then expect the adaptation engine to perform unit conversion before serving a document.
3.6.7 Services Beyond Formulæ

Rhetorics visualization: based on markup for SALT/RST-like structures integrated into OMDoc; translated to XHTML + microformats plus JavaScript by XSLT [Gic08], demo available at [Job]. Next steps: integrate into JOBAD, use RDFa
3.6.8 Integrated Backends and Environments

For a clean conceptual model, I have treated web services separately. From an efficiency point of view it does, however, make sense to arrange multiple services in an integrated backend. Consider unit conversion: Stratford’s unit conversion web service internally relies on OpenMath content dictionaries that declare one symbol per unit and define conversion rules for obtaining derived units [Str08]. Definitions of symbols are looked up from content dictionaries as well. Last but not least, the rendering service needs notation definitions for the unit symbols, and content dictionary authors often provide default notations for their symbols. Thus, offering those three services independently requires redundantly storing knowledge about symbols in three places.

Footnote 29: These are not strictly definitional FMPs, as such a feature has not yet been introduced in OpenMath, but in the unit content dictionaries there is always only one FMP per symbol, which the unit converter considers to be definitional.

Footnote 30: ... if we assume an easy check for whether a symbol is a unit. So far, all names of unit content dictionaries start with units_.
An integrated backend also saves time, as can be seen for definition lookup: With separate lookup and rendering services, the client-side active document has to connect to two web services in succession. An integrated backend could, however, offer readily rendered definitions by composing two of its internal functions and only minimally extending its external HTTP interface (cf. section 3.6.5.1).

3.6.8.1 Proxy to Third-Party Web Services

I have so far left out one problem from the practice of web development. The backend that serves the documents also has to act as a proxy for communicating with third-party web services on different hosts, such as the unit converter, because browsers restrict scripts from communicating with hosts other than the one that served the document (the same-origin policy, a safeguard against cross-site scripting).

3.6.8.2 Integrated Backend Implementations

We have implemented an integrated proof-of-concept backend that performs rendering (using JOMDoc) and definition lookup (using XPath queries over a few hard-bundled OMDoc documents).

TNTBase FlOMDoc (later: SWiM)
3.6.9 Further Possible Services

automated theorem proving: out of my scope

In Section 3.6, we present the main component of JOBAD, a collection of small JavaScript modules that add interactive services to a mathematical document. In Section ??, we present several web services that we have implemented and describe how to integrate third-party services. In Section ??, we briefly describe a first JOBAD case study, and we conclude in Section ??.

making them accessible: the JOBAD definition lookup and unit conversion examples.
3.7 Integration with Knowledge Bases

3.7.1 Storage Backends

We have to consider two aspects of storing mathematical knowledge: storing the original documents, which are written in a semantic markup language, and making [parts of] the knowledge from these documents accessible to services working on them in appropriate representations, which may differ from the original document languages. For the first aspect, any backend is suitable that at least allows for storing one document as one unit. This can be as simple as the file system, or a database in which each document is one field. However, our interactive services are not necessarily interested in whole documents, but in semantically relevant fragments thereof, such as definitions (cf. section 3.6.5.1). Therefore, any storage backend that facilitates accessing such fragments is to be preferred. This can, for example, be an XML database that makes fragments accessible via XQuery, such as eXist [Exi] or Oracle Berkeley DB XML [Ber]. The versioned XML database TNTBase, which relies on Oracle Berkeley DB XML but will also offer specific features for OMDoc documents, is a particularly suitable candidate [ZK09; Tnt].

A higher-level database is needed for semantic structures extracted from the XML documents (cf. section 2.4.2). As I have defined most of the semiformal semantics of mathematical documents by an XML→RDF translation, an RDF database (commonly known as “triple store”) is required. For the particular case of definition lookup I have demonstrated how a SPARQL query on RDF level allows for abstracting from the XML syntax of the mathematical documents, compared to an XQuery (cf. section 3.6.5.1).

The difference between XQuery and SPARQL in terms of expressivity and abstraction becomes even more obvious when taking inference into account. XML-based systems offer limited abstraction from the concrete syntax of XML documents by way of abstract XML schema datatypes (e. g. a datatype statement that comprises the OMDoc elements for definitions, axioms, assertions, etc.) and custom XQuery functions. An RDF-based ontology with the possibility to define class and property hierarchies provides an easier and more powerful approach to abstraction.

The difference between XQuery and SPARQL in terms of performance becomes apparent when links have to be queried both in forward and in backward direction, and when queries contain complex joins. XML databases, such as the above-mentioned eXist and Oracle Berkeley DB XML, can be set up to index attributes, child elements, and sometimes links (footnote 31). Assuming a suitable XML→RDF translation that represents links as properties, SPARQL transparently allows for querying them in both directions, and for joining resources across multiple link steps, which XQuery can only emulate via functions.
SPARQL can not only cope with links that are explicitly represented in documents, but also with inferred RDF properties (e. g. a “symbol depends on symbol” relation assumed when one symbol occurs in a formula in the definition of another symbol). It does, however, depend on the reasoning engine attached to the RDF store what kinds of properties can be inferred. Some RDF databases, such as OpenLink Virtuoso [Olv], implement extensions to SPARQL that allow for querying transitive closures of properties even in the absence of a reasoner that can compute them.32 In mathematical knowledge bases, transitive closures are particularly crucial for dealing with theory graphs (spanned by the import relation) and computing dependencies.
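To illustrate what querying the transitive closure of the import relation amounts to, the following Python sketch computes all direct and indirect imports of a theory, and, in the backward direction, all theories that depend on it. The theory graph is an invented example, and the functions are a plain graph traversal under that assumption; they are not meant to reflect how an RDF store or a reasoner evaluates such queries.

```python
# Invented theory graph: theory -> set of directly imported theories.
IMPORTS = {
    "ring": {"group", "monoid"},
    "group": {"monoid"},
    "monoid": {"semigroup"},
    "semigroup": set(),
}


def all_imports(imports, theory):
    """Transitive closure in forward direction: every theory that the
    given theory imports directly or indirectly."""
    seen = set()
    stack = list(imports.get(theory, ()))
    while stack:
        current = stack.pop()
        if current not in seen:      # also guards against accidental cycles
            seen.add(current)
            stack.extend(imports.get(current, ()))
    return seen


def all_dependents(imports, theory):
    """Backward direction: every theory that directly or indirectly
    imports the given theory."""
    return {other for other in imports if theory in all_imports(imports, other)}


ring_imports = all_imports(IMPORTS, "ring")
monoid_dependents = all_dependents(IMPORTS, "monoid")
```

In a triple store, the forward query would be a single pattern over the transitive import property, and the backward query the same pattern with subject and object exchanged.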
3.7.2 Import/Export Interface

While I am aiming at integrating interactive services for mathematical knowledge in a coherent work environment (cf. chapter 4), I am well aware of the fact that it will not be feasible to integrate all available services and tools. External tools are likely to rely on different interfaces than the ones native to our knowledge base; the lowest common denominator for exchanging mathematical knowledge will usually be the file system. Especially for editing documents (cf. section 3.3), users have quite individual preferences. Despite any assistance that is aware of syntactic and semantic structures of mathematical knowledge, experienced users might still feel that they can work more efficiently with powerful general-purpose editors, such as Emacs or vi. Unless the file system itself is used as a primary means of storage, this means that the storage backend should at least offer a way of importing or exporting individual mathematical documents as files, or, preferably, offer an alternative file-based view on the complete collection of mathematical knowledge.

Footnote 31: eXist and Oracle Berkeley DB XML do not offer link indices, but they are mentioned as a standard feature of XML databases by Sipani et al. in [Sip+03].

Footnote 32: Transitive closures are a special case of the “property paths” feature scheduled for the next version of SPARQL [KP09].

3.7.2.1 Translating between different Knowledge Representations

External tools may also prefer different file formats than our knowledge base. Just suppose we represent our mathematical knowledge in OMDoc but would also like to make [parts of] it accessible to computer algebra systems that only support OpenMath. This would require an export of OMDoc theories as OpenMath CDs (footnote 33). Or suppose an external proof assistant with its own language. External verification of a proof from an OMDoc knowledge base would only be possible after the proof and all of its dependencies (i. e. the declarations of all required symbols, and all required axioms and theorems) had been exported in the language preferred by the proof assistant.

Similarly, it is desirable to import knowledge in widely-used formats. There are certainly more fields of mathematics that have been covered by OpenMath CDs than by OMDoc theories. Converting existing OpenMath CDs to OMDoc will thus help to bootstrap a subsequent OMDoc formalization of these fields.
3.7.2.2 Splitting and Reassembling Documents Particularly when exchanging knowledge between a specialized database and the file system and between different languages, granularity has to be taken into account. While XML databases can give access to arbitrary fragments of XML documents, only complete documents can be exported as files. The expressive OMDoc language allows for highly modularized files that only contain a minimum amount of specialized knowledge (e. g. one particular mathematical example) but refer to other documents for additional background knowledge (e. g. the theories that introduce the symbols used in the example), which allows for flexible reuse (see [KMM07] for examples). By contrast, an OpenMath CD file can exclusively contain symbol descriptions, for which mathematical properties and examples can optionally be given. The import/export interface therefore has to take care of what units or collections of knowledge are admissible for a particular translation and possibly split or recombine units. In the remainder of this section, I will formally specify this process. Let D(V , →) be an XML document tree with node set V and parent↦child relation →⊆ V × V , and root node34 r =∶ r D ∈ V . Let s∶ V → B be a boolean predicate that is satisfied for a node v ∈ V iff it is the root of a subtree that contains a knowledge unit that should be factored into a separate unit when importing D into our database. Let t(v) ∶= {w ∈ V ∣ (v → w) ∈ →∗ } the subtree starting at root v, where →∗ is the reflexive and transitive closure of →. On import, for each node v satisfying s(v) a new document tree Dv ∶= t(v) with root v must be added
33
A basic XSLT implementation of such a translation exists in the OpenMath source repository. For simplicity I assume that the root node is an element, whereas in XML the actual root node of a document is the parent node of the topmost element node. 34
ref XML document model
3 Services for Mathematical Knowledge Management
to the database.³⁵ D_v shall have a unique name id(D_v), which may depend on id(D) and the identifier id(t) of the subtree via some function id(D_v) := f(id(D), id(t)). This splitting process shall be recursively applied to D_v. Instead of the original document D, a document D′ shall be added to the database, in which t(v) is replaced by a new node i(D_v) that references D_v for the purpose of inclusion. The syntax of i is determined by the inclusion facilities of the respective language; if inclusion is not natively supported, XInclude [MOV06] can be used instead. For any two documents D_1 and D_2, I will write D_1 →_i D_2 if D_1 includes D_2, i. e. if D_1 contains a node i(D_2).
maybe graphics
When a document D is chosen for export from the database, one must first determine whether its root node r_D is an admissible root element in an external file. For this check, I require another predicate e: V → 𝔹. If e(r_D) is satisfied, then D shall be exported as a file. The export function must replace any inclusion reference i(D_s) pointing to a subunit stored in a document D_s in the database by the tree D_s, i. e. it resolves all inclusions. If ¬e(r_D), then the export function shall be applied to the nearest parent document D_p(D) with D_p(D) →_i* D that satisfies e(r_{D_p(D)}).
metadata handling
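The splitting and reassembling process specified in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the interface of any existing database: the class Database, the function names, and the identifier scheme in make_id (a stand-in for the function f) are hypothetical, XInclude is used as the inclusion node i, and the predicates s and e are passed in as plain functions. The document root itself is always stored as a whole; only proper descendants are factored out.

```python
import copy
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


@dataclass
class Database:
    """Toy stand-in for the knowledge base (hypothetical, for illustration)."""
    documents: dict = field(default_factory=dict)    # document id -> tree
    included_by: dict = field(default_factory=dict)  # id(D_v) -> id of including document


def make_id(doc_id, node):
    """Hypothetical f(id(D), id(t)): append the subtree's xml:id to the document id."""
    return f"{doc_id}/{node.get(XML_ID)}"


def import_document(doc_id, root, s, db):
    """Store D' under doc_id, factoring out every subtree whose root satisfies s."""
    root = copy.deepcopy(root)  # leave the caller's tree untouched
    _split(doc_id, root, s, db)
    db.documents[doc_id] = root


def _split(doc_id, node, s, db):
    for index, child in enumerate(list(node)):
        if s(child):
            child_id = make_id(doc_id, child)
            _split(child_id, child, s, db)  # splitting is applied recursively to D_v
            db.documents[child_id] = child
            db.included_by[child_id] = doc_id
            ref = ET.Element(XI_INCLUDE, {"href": child_id})  # the node i(D_v)
            ref.tail, child.tail = child.tail, None
            node[index] = ref
        else:
            _split(doc_id, child, s, db)


def export_document(doc_id, db, e):
    """Export doc_id, or the nearest including document whose root satisfies e."""
    while not e(db.documents[doc_id]):
        doc_id = db.included_by[doc_id]  # KeyError if no admissible parent exists
    return _resolve(db.documents[doc_id], db)


def _resolve(node, db):
    """Copy node, replacing every inclusion reference by the referenced tree."""
    if node.tag == XI_INCLUDE:
        resolved = _resolve(db.documents[node.get("href")], db)
        resolved.tail = node.tail
        return resolved
    resolved = ET.Element(node.tag, dict(node.attrib))
    resolved.text, resolved.tail = node.text, node.tail
    resolved.extend([_resolve(child, db) for child in node])
    return resolved
```

With s satisfied by theory and example elements, importing a document whose theory contains one example yields three stored trees; exporting the example with an e that only admits omdoc and theory roots returns the enclosing theory with all inclusions resolved.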
3.7.2.3 An Advanced Import/Export Infrastructure
I consider the above-mentioned TNTBase database [ZK09; Tnt] a good basis for a knowledge base with a comprehensive import/export interface. In addition to its XML interface, it gives access to complete documents via a Subversion server interface [Svn], so that users of file-based tools can check out files or directories containing the desired documents. By way of Subversion’s hook mechanism [PCSF08, chapter 5], documents can be committed to the database in various formats used by external tools (e. g. sTeX) but translated to the format used internally (most suitably OMDoc). This would require a post-commit hook applying an sTeX→OMDoc translation (which exists) to any committed file.
TNTBase wouldn’t require splitting/reassembling, as we can access fragments by their XPath. Or otherwise get the same split/merge effect using virtual files (for fragments of a larger document, or for collections of smaller fragments) → more flexibility, easier to use, not hard-coded but accessible to the user
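A hook of the kind mentioned above could look roughly like the following Python script. This is a sketch under assumptions and not TNTBase code: Subversion invokes a post-commit hook with the repository path and the new revision number, and svnlook is Subversion's own inspection tool, but the translator command name stex2omdoc and the function store_in_knowledge_base are purely hypothetical placeholders.

```python
import subprocess
import sys

TRANSLATOR = "stex2omdoc"  # hypothetical command name, not the CLI of a real tool


def changed_stex_paths(svnlook_changed_output):
    """Pick added or updated *.tex paths from the output of `svnlook changed`."""
    paths = []
    for line in svnlook_changed_output.splitlines():
        fields = line.split(None, 1)  # status flags first, then the path
        if len(fields) == 2 and fields[0][0] in "AU" and fields[1].endswith(".tex"):
            paths.append(fields[1])
    return paths


def svnlook(subcommand, repos, rev, *args):
    """Run an svnlook subcommand on one revision and return its standard output."""
    result = subprocess.run(["svnlook", subcommand, "-r", rev, repos, *args],
                            capture_output=True, text=True, check=True)
    return result.stdout


def store_in_knowledge_base(path, omdoc):
    """Hypothetical placeholder: a real hook would hand the OMDoc to the database."""
    print(f"translated {path}: {len(omdoc)} characters of OMDoc")


def main(repos, rev):
    for path in changed_stex_paths(svnlook("changed", repos, rev)):
        source = svnlook("cat", repos, rev, path)
        # Assumption: the translator reads sTeX on stdin and writes OMDoc to stdout.
        omdoc = subprocess.run([TRANSLATOR], input=source, capture_output=True,
                               text=True, check=True).stdout
        store_in_knowledge_base(path, omdoc)


if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv[1], sys.argv[2])  # Subversion passes REPOS and REV
```

Only the path filter is independent of the environment; deleted files and property-only changes are skipped because there is nothing to translate for them.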
³⁵ It may be necessary to add some additional maintenance information to D_v to facilitate working with that knowledge unit while it is inside the database, but I will disregard this peculiarity here.
3.7.3 Extracting Structures from Semantic Markup
As introduced in section 2.4.2, the semantics of markup languages can be defined by a translation to RDF. Here, I will describe my implementation of such a translation, the Krextor framework (KWARC RDF Extractor). This section covers the general architecture and usage of Krextor; section 4.3.3 will deal with its integration into SWiM. Originally, Krextor’s functionality was hard-coded into SWiM, but then it evolved into a standalone library. Having modeled an OWL ontology for integrating OMDoc into SWiM (cf. section 2.4.1.1), I needed an OMDoc→RDF translation, which I hard-coded in XSLT from scratch, after an older, hard-coded Java implementation had proven to be too inflexible to maintain. The RDF was output in the RXR notation (Regular XML RDF [Bec04a]), which was then parsed by a Java implementation reused from the IkeWiki code base. Besides the fact that an RXR parser was already available to us, RXR has two key advantages over RDF/XML when it is processed automatically³⁶: (i) Its basic structure can be validated by a validating XML parser, as RDF resources only occur in attribute values, not in element names. (ii) RDF/XML offers an excessive number of ways to write the same RDF graph; Davis counted 16 ways of writing down three RDF triples [Dav05]. Later, an analogous translation was required for OpenMath content dictionaries. This led to the decision to create a generic XSLT-based framework (cf. figure 3.9) that allows developers to define translations (“extraction modules”) from any XML language to RDF more easily than in pure XSLT, as will be shown in the following.
3.7.3.1 Krextor – an extensible XML→RDF extraction framework
A generic module provides convenience templates and functions for defining extraction rules in a way that abstracts from the concrete output format and instead defines the semantics of XML structures on a high level, in terms of resources and properties. Krextor’s generic “representation” of XML is a transient one; the generic module is just a step in the pipeline, grouping extracted data into triples and forwarding them to the selected output module. Supported output formats, besides RXR, are: RDF/XML [Bec04b], the text notation Turtle [BBL08], and, thanks to the Saxon XSLT processor [Kay08], a direct interface to Java, for a more efficient integration into applications. In RDF/XML and Turtle output, the triples are grouped by common subjects and predicates.
This is achieved by first obtaining RXR and then transforming it to the notation desired using XSLT grouping – a compromise between efficiency and a clean separation of concerns. Syntactic sugar, offered by some RDF notations, has only partly been implemented. At the moment, there is no support for author-defined namespace prefixes, “anonymous” blank nodes (bnodes) without identifiers, and RDF containers or collections in the output. Semantically, that does not make a difference. After all, my target “audience” are not humans, who usually do not want to read raw RDF, but applications that further process the RDF and conveniently prepare it for users – as, e. g., the semantic wiki SWiM does [Lan08b]. Nevertheless, some syntactic sugar remains on my agenda, as it facilitates testing Krextor during development.
Krextor is available as a collection of XSLT style sheets, with an optional Java wrapper for direct integration into applications. For scripting and debugging, there is a shell script frontend, reading XML from the standard input and writing RDF in the desired notation to the standard output. Besides the input formats mentioned so far, Krextor also supports RDFa, embedded in XHTML or other host languages, such as Open Document³⁷, and in the following section, I will show how it can be extended to microformats. Moreover, we have implemented a translation from OMDoc to OWL as a Krextor extraction module, which allows for authoring Semantic Web ontologies with integrated documentation and in a more modular way (cf. section 2.4.3.5). Thus, Krextor can even be used as a bridge from the XML layer into the ontology layer of the Semantic Web cake. To add an input format, one has to provide XSLT templates that map XML structures of the input to calls of Krextor’s generic templates, as shown in the following section.
³⁶ Overall, Beckett points out 13 advantages [Bec04a].
³⁷ See http://rdfa.info/2008/03/13/rdfa-support-coming-in-odf-12/ and stay tuned for Open Document 1.2.
RDFa output in JOMDoc
mention LaTeXML’s XMath for LaMaPUn [Gin+09]
[Figure 3.9: Krextor’s extraction process and modules. Input formats: OMDoc + RDFa, OMDoc/OWL + RDFa, XHTML + RDFa, OpenMath, my XML + RDFa?, my Microformat. Intermediate step: generic representation. Output formats: RDF/XML, RXR, Turtle, Java callback, your format?]
I follow the paradigm of making easy things easy and hard things possible – an inexpensive claim, actually, but considerable efforts have been made to implement convenience templates and functions for common extraction tasks, which are easier to use than plain XSLT. There are predefined templates for creating a resource that is an instance of some class and for adding literal- or URI-valued properties to the current resource. Several ways of generating resource URIs are provided, including fragment URIs of the form “document’s URI ” # “fragment’s xml:id”, but an extraction module can also implement its own URI generator(s). (The latter has been done for OMDoc, which uses a document/theory/symbol URI pattern; cf. section 2.4.3.4.) The information that the developer has to provide explicitly is kept to a minimum: When no subject URI for a resource is given, Krextor auto-generates one using the desired generation function. When no object value for a property is given, it is taken from the currently processed XML attribute or element. As an alternative for very simple formats, where XML elements directly map to ontology classes and properties, a declarative mapping can be given as annotated literal XML. For complex input formats like the above-mentioned OMDoc, the full computational power of XSLT can be used, at the expense of readability.
A new output module merely has to implement one template for low-level RDF generation, accepting the parameters subject (URI or bnode ID), subject type (URI or bnode), predicate (URI), object, object type (URI, bnode ID, or literal), language, and datatype. More complex output modules can be realized by post-processing output from existing output modules.
Extracting Metadata to RDF
Similarly to the extraction of RDF representations of OWL ontologies written in OMDoc (cf. Sect.
2.4.3.5), I implemented a Krextor extraction module for RDFa. I then divided the RDFa extraction rules into XHTML-specific ones and into generic ones, the latter of which I combined with support for our OMDoc-specific metadata syntax (cf. section 2.4.4.3). The extraction of RDFa from OMDoc is performed both in the extraction of OWL from OMDoc, where it enriches the extracted ontologies with metadata, and in the extraction of RDF outlines from OMDoc in terms of the OMDoc ontology (cf. section 2.4.1.1). The latter is a foundation for semantic web applications having OMDoc (and not OWL) as their native 17
Old Part: revise this into a reference
BegOP(17)
language, such as the semantic wiki SWiM.
EndOP(17)
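To make the output module interface of section 3.7.3.1 more tangible, here is a minimal Python sketch of it. It is not Krextor code (Krextor is implemented in XSLT); the names Triple, fragment_uri, term, and to_turtle are hypothetical. The sketch assumes that one triple carries exactly the seven parameters listed for the low-level template, generates fragment URIs of the form document URI # xml:id, and groups triples by common subjects and predicates in a Turtle-like notation without namespace prefixes. Literal escaping is deliberately simplistic.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Optional


@dataclass(frozen=True)
class Triple:
    """One low-level triple, mirroring the seven parameters of an output module."""
    subject: str
    subject_type: str            # "uri" or "bnode"
    predicate: str               # always a URI
    object: str
    object_type: str             # "uri", "bnode", or "literal"
    language: Optional[str] = None
    datatype: Optional[str] = None


def fragment_uri(document_uri, xml_id):
    """Fragment URI generator: document's URI, '#', fragment's xml:id."""
    return f"{document_uri}#{xml_id}"


def term(value, kind, language=None, datatype=None):
    """Serialize one RDF term (only backslashes and quotes are escaped)."""
    if kind == "uri":
        return f"<{value}>"
    if kind == "bnode":
        return f"_:{value}"
    literal = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if language:
        return f"{literal}@{language}"
    if datatype:
        return f"{literal}^^<{datatype}>"
    return literal


def to_turtle(triples):
    """Group triples by subject, then by predicate, as in the Turtle output."""
    def by_subject(t):
        return (t.subject_type, t.subject)

    statements = []
    for (s_type, s), group in groupby(sorted(triples, key=by_subject), key=by_subject):
        parts = []
        same_subject = sorted(group, key=lambda t: t.predicate)
        for predicate, objects in groupby(same_subject, key=lambda t: t.predicate):
            rendered = " , ".join(
                term(t.object, t.object_type, t.language, t.datatype) for t in objects)
            parts.append(f"<{predicate}> {rendered}")
        statements.append(f"{term(s, s_type)} " + " ;\n    ".join(parts) + " .")
    return "\n".join(statements)
```

Sorting before grouping is a design choice of this sketch: it reorders the triples but guarantees that each subject and each predicate per subject appears exactly once in the output.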
3.7.3.2 Usage
BegOP(18)
One application area of Krextor is the semantic wiki SWiM [Lan08b]. Mathematical documents can be imported and edited in their original formats, which allows for building on existing tool support. An RDF outline is only extracted from them after storing them in the database; the RDF plus the background knowledge from the ontologies then powers semantic services – currently navigation, querying, and problem-solving assistance [Lan08b; LHC08a]. OMDoc particularly needs complex extraction rules: Its mathematical symbols have their own URI schema, and it can mix formal and informal knowledge. The RDF graph extracted from a full-featured OMDoc document consists of two parallel trees, one tree of the mathematical structure, and one of the rhetorical structure, interwoven via an annotation ontology. Despite this complexity, 21 out of the 44 templates in the extraction module for OMDoc have been implemented using Krextor’s convenience templates only. 15 make use of additional XPath constructs, 5 use additional, more complex XSLT constructs, and 3 use both.
OMDoc as a frontend for OWL ontologies, as mentioned above and detailed in [LK09], will eventually be integrated into SWiM. The extraction of OWL from special OMDoc documents has also been implemented using Krextor. In these documents, ontologies are modeled as mathematical theories, and resources are declared as symbols having definitions, axioms, and theorems. Many of these mathematical statements are modeled in a way that is more familiar to people with a background in logic: the range and domain of a property are, e. g., represented by a single relation type declared for the property symbol [LK09]. The OMDoc→OWL module makes considerably more use of XPath and XSLT than the above-mentioned module that obtains the structure of OMDoc documents as RDF, but still it paid off to implement it within Krextor, as part of the required functionality could be shared with the former module.
I will exemplify Krextor’s extensibility, a major design goal, by an extraction module for a simple language, the hCalendar microformat [Çel08], using the RDF Calendar vocabulary [CM05]. The extraction rules for an event and its start date are given in listing 3.10, which is considerably shorter than an equivalent implementation in plain XSLT. The first template matches any element of class “vevent” and creates an instance of the ical:Vevent class from it. When a child link annotated as the URI of the event is present, its target is used to identify the event; otherwise, a bnode is created for the event. The second template matches any element of class “dtstart” and adds an ical:dtstart property of datatype xsd:date to the current resource. Krextor’s convenience templates automatically take care of recursing to child elements, keeping track of the current resource, and reading the values of properties if they are given in a reasonable place, such as the text content of an element. Given the following sample input, Turtle output can be obtained e. g. by calling krextor hcalendar..turtle infile.xhtml on the command line:
ESWC starts on
2009-05-31.
18
Old Part: revise using crossrefs
adapt
EndOP(18)
Figure 3.10: A hCalendar extraction module
a ; "2009-05-31"^^ .
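The effect of the two extraction rules described above can be imitated in Python as follows. This is an illustrative re-implementation, not the Krextor module itself, which is written in XSLT. The sample markup, the function names, and the bnode naming scheme are assumptions; in particular, the class name url for the link that identifies the event is taken from hCalendar and is not stated in the text above. The namespace URIs are those of the RDF Calendar vocabulary and XML Schema datatypes.

```python
import xml.etree.ElementTree as ET
from itertools import count

ICAL = "http://www.w3.org/2002/12/cal/ical#"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed markup for the sample sentence; the element names are illustrative.
SAMPLE = ('<div class="vevent">ESWC starts on '
          '<span class="dtstart">2009-05-31</span>.</div>')


def classes(element):
    """Return the list of class names of an element."""
    return element.get("class", "").split()


def local_name(tag):
    """Strip a namespace in ElementTree's {uri}name notation, if present."""
    return tag.rsplit("}", 1)[-1]


def extract(root):
    """Return (subject, predicate, object) triples for events and start dates."""
    triples = []
    bnode_ids = count(1)

    def visit(element, current):
        if "vevent" in classes(element):
            # Rule 1: an event; identified by a link of class "url" if present
            # (searched among all descendants here), otherwise by a fresh bnode.
            link = next((e for e in element.iter()
                         if local_name(e.tag) == "a" and "url" in classes(e)
                         and e.get("href")), None)
            current = (f"<{link.get('href')}>" if link is not None
                       else f"_:event{next(bnode_ids)}")
            triples.append((current, f"<{RDF_TYPE}>", f"<{ICAL}Vevent>"))
        if "dtstart" in classes(element) and current is not None:
            # Rule 2: the start date, read from the element's text content.
            value = "".join(element.itertext()).strip()
            triples.append((current, f"<{ICAL}dtstart>", f'"{value}"^^<{XSD_DATE}>'))
        for child in element:  # recursion keeps track of the current resource
            visit(child, current)

    visit(root, None)
    return triples
```

Calling extract(ET.fromstring(SAMPLE)) yields one type triple and one ical:dtstart triple for a blank node, since the sample contains no link that could identify the event.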
3.7.3.3 Related Work
WEESA [RGJ05]; also compare concerning complexity; push vs. pull
Swignition’s [Ink] architecture is very similar to Krextor’s. For end-users and web developers, it offers much richer features, including support for many microformats, other legacy ways of embedding RDF into HTML, and GRDDL links. For knowledge engineers or developers who quickly want to define an RDF translation from a new XML language, Krextor performs better, being extensible by additional input formats with far fewer lines of code than the Swignition Perl library. So far, GRDDL is only “supported” by Krextor in the obvious sense that it facilitates the XSLT-based implementation of an XML→RDF translation that can then be linked to a schema or to documents using GRDDL; automatically choosing the right extraction module by interpreting GRDDL annotations in the input document is not yet supported. The two systems approach integration into semantic applications differently: Swignition comes with a TCP/IP interface, whereas Krextor features a Java API and benefits from the wide support for XSLT.
XSDL [Liu+04] has been cited before as a theoretically elegant alternative without an implementation (cf. section 2.4.2). As its syntax uses XML and XPath, I do, however, consider it feasible to prove the theoretical results that the authors have obtained for Krextor as well, by rewriting XSDL definitions into equivalent Krextor extraction modules. This would also make XSDL usable as a convenient input language for Krextor, making extraction modules look less like XSLT.
XSPARQL [Akh+08] mixes SPARQL into XQuery, resulting in a query language that fully breaks the boundaries between XML and RDF. It avoids the necessity of first converting from one representation into the other. A translator from the XSPARQL language to XQuery has been implemented. Like XQuery, XSPARQL is, however, more suitable for posing individual queries against XML documents.
XSLT is more natural to use for translating complete documents, as it recursively processes its input by default; in XQuery, one would have to enforce recursion. Moreover, extracting RDF
from a complete XML document requires passing a lot of maintenance information from the code that processes parent elements to the code that processes child elements. In XSLT 2, most of this can be hidden in tunnel parameters, whereas in XQuery all parameters have to be made explicit.
also reuse development pages of Krextor Trac for the choice of programming language
Acknowledgments
The XSLT stylesheets for rendering OMDoc 1.2 documents discussed in section 3.1.1.1 were originally developed by Michael Kohlhase. In 2006 I took over most of their maintenance and integrated them into the first version of the SWiM wiki [Lan07a]. The SIOC argumentation module presented in section 3.2.2 has been developed jointly with Uldis Bojārs and Tudor Groza. John Breslin and Siegfried Handschuh also contributed to the original publication on that topic [Lan+08b]; further advice was given by Tuukka Hastrup, Thomas Schandl, Christoph Tempich, Max Völkel, and Stefan Decker.
Editing (exact ref): Alberto González Palomo [LGP08]
Metadata validation 3.4.1: Michael Kohlhase
The JOBAD architecture for integrating services into interactive documents (cf. section 3.6) has been developed jointly with Florian Rabe and was first implemented by Jana Gičeva under my supervision [GLR09]; Catalin David is now continuing the implementation. The work on flexibly eliding and displaying reading aids, presented in section 3.6.4.2, was mostly done before the conception of the proper JOBAD framework, in collaboration with Michael Kohlhase and Florian Rabe. The integration of the definition lookup service into the TNTBase backend has been implemented jointly with Vyacheslav Zholudev; Michael Kohlhase and Jana Gičeva also contributed to the original publication on that topic [Koh+09a]. Jonathan Stratford kindly provided support for his unit converter [SD08; Str08], whose integration we present in section 3.6.6.2.
maybe Siarhei for Krextor OWL
4 SWiM – An Integrated Collaboration Environment
As a proof of concept of an integrated web collaboration environment for semiformal mathematical knowledge, I have developed SWiM, the Semantic Wiki for Mathematical Knowledge Management. I will first review the state of the art in collaborative web applications for mathematics, then the state of the art in semantic wikis in general, and then describe the architecture and user interface of SWiM.
Say sth. about added value considerations. Wikipedia’s motivations (fun, benefitting from the large community) are not valid here, as the knowledge is less general-purpose, and the community is small.
4.1 Tools for Math Collaboration (State of the Art)
• tools with just LaTeX or presentation MathML input (e. g. MediaWiki) don’t count
• sites with just some mathematical content (whose structures are not exploited) don’t count (e. g. Wikipedia)
TODO: more non-wiki systems
Adessoweb, Active Essays
4.1.1 PlatΩ
Managing changes to notation definitions has been investigated before for the TeXmacs editor, which has been extended towards semantic markup in the PlatΩ project [Aut+07]. The developers focus on notations that use natural language and on parsing text and formulæ that the user writes in a presentational style back to a semantic representation. Neither feature has been investigated in SWiM yet; here the focus is rather on making the semantic markup editable in a convenient way. As a change to a notation definition in PlatΩ/TeXmacs involves regenerating parser rules, special attention is paid to making this efficient by only regenerating those rules that are affected by a change.
4.1.2 PlanetMath
Free Math Encyclopedia (cf. Wikipedia)
Dedicated wiki engine, mathematical metadata and keywords used for search and navigation
http://www.planetmath.org
nnexus paper
4.1.3 Connexions
CMS content commons idea
Content MathML (but structures not exploited)
Hard-coded, non-extensible metadata schema
4.1.4 vdash
Integrates the automated theorem prover Isabelle
Not yet released but promising roadmap: web crawling, consistency
http://vdash.org
4.1.5 ProofWiki
Integrates automated theorem prover Coq (future: more); prototype
Prototype/mockup at http://prover.cs.ru.nl/wiki.php
Don’t confuse with non-semantic proofwiki.org
ProofWiki is a prototype of a wiki that contains fully formalized mathematical content [CK07]. The semantic structures of the content are, however, only used by the integrated Coq proof assistant, not to facilitate browsing or editing. Human-readable descriptive texts are written in non-semantic LaTeX.
4.1.6 Logiweb
4.1.7 ASciencePad
Personal wiki (for collaboration with yourself, or a few people, w/o online sync)
TiddlyWiki (local, single-user JavaScript wiki) with MathML and SVG
Can calculate and graph
http://math.chapman.edu/~jipsen/asciencepad/
4.1.8 (web)Mathematica
MediaWiki rewrite integrating WebMathematica CAS frontend
Can do computer algebra, computations, graphing, . . .
http://www.mathematica-users.org/webMathematica/wiki/
4.1.9 SlugMath
Semantic MediaWiki containing formal core of math lectures
http://slugmath.ucsc.edu/mediawiki/
4.1.10 Math-Net
http://www.math-net.org
4.2 Wikis (State of the Art)
The first wiki was invented in 1994 by Ward Cunningham as “the simplest online database that could possibly work” [Cun+02]. A wiki is a web server application that allows users to browse, create, and edit hyperlinked pages in a web browser, often using a simple text input syntax that corresponds to a subset of HTML (cf. section 3.3.1). Some wikis use the term “topic” for a page. Encyclopedic wikis mostly use the term “article”. Some semantic wikis (see below) refer to pages as “concepts”, as one page usually describes one real-world concept. In any case, one page usually holds knowledge about one distinct topic, or about a set of closely related topics [EGH08]. In contrast to documents in many content management systems, wiki pages are accessible via a URL containing their title. A new page can be created by entering the URL of a non-existent page, or, preferably, by linking from an existing page to the page to be created. This link will then lead to an edit form. Usually, anyone is allowed to edit pages on a wiki, but access can also be restricted to a work group – in corporate settings, for example. Other characteristics of wikis include permanent storage of old page versions (with facilities to display differences between two versions and to restore a certain version), notification about recent changes, a full-text search and, in most cases, a simple kind of user management. Due to their simplicity and flexibility, many use cases for wikis are known, from content management to e-learning to groupware to corporate and personal knowledge management. Wikis became established chiefly through their use for public and open community projects. The first public wiki, Cunningham’s WikiWikiWeb, which started as a repository for know-how on design patterns and extreme programming¹, had grown to more than 30,000 pages by 2004.
The biggest and most well-known wiki, however, is Wikipedia [Wik], a project to create a free encyclopedia, started in 2001. By November 2009, Wikipedia had grown to more than 13 million pages in over 200 languages². The main advantages of wikis in general are openness, simplicity, as well as – thanks to hyperlinking – their incremental and organic structure³. Considering open wiki-based communities in particular, further advantages are up-to-dateness, principles of grassroots democracy – for example, when discussing the bias of some page – and the motive of learning from each other⁴. Several disadvantages of (non-semantic) wikis will be analyzed in section.
reuse that section from the TR? compare
4.2.1 Reporting and Discussing Issues
Kūkākūkā [SX02]
In conventional wikis, there is no well-defined workflow of reporting issues with knowledge items. Many systems allow for tagging knowledge items (e. g. as “needs improvement”), or for commenting on them in more detail. In wikis, users can insert a warning message directly into a page affected by an issue, and probably add a detailed explanation or justification to the discussion page that is usually associated with every content page and thus serves as a local discussion forum about one subject of interest.
¹ See http://c2.com/cgi/wiki?PeopleProjectsAndPatterns
² Of these, about 3.1 million pages are in English and 1 million in German.
³ See [Cun+] for more.
⁴ A survey of many well-known wiki communities and their characteristics can be found in [Lan06b, chapter 4].
Technical support is mostly limited to the possibility of creating building blocks for such warning messages, which can then be included into pages, and to a button that allows for adding a new section to a discussion page which otherwise does not enforce any structure – see, e. g., MediaWiki [Med]. The discourse itself proceeds without technical support. The community is left alone with devising reasonable issue warning messages and establishing a workflow of reporting, discussing, and solving issues and documenting the solutions. This is mostly done by jointly agreeing on best practices in conflict resolution and authoring, and making them official policies for the community [Kit+07]. As a concrete example, consider a Wikipedia article. If one regards the rather informal Wikipedia articles as a result of conceptualization and formalization, only the “formalization” happens inside Wikipedia. The concepts have already existed before and must be widely agreed upon [Wik09f]. Therefore, Wikipedia’s policy demands that only the “formalization”, i. e. the way the concept is presented in an article, be discussed on the corresponding discussion page (officially called “talk page” in the English Wikipedia). Suppose that an article violates the fundamental principle of a neutral point of view [Wik09e]⁵. Any author who is concerned about this can tag the article by inserting the building block “POV” (neutrality [Wik09h]). It is then recommended to justify why the neutrality of the article is debated by adding a respective section to the discussion page of the article. Within that section, the general conventions for discussion pages apply [Wik09g]: The author has to make clear what section of the article his discussion post applies to⁶, he has to verbalise his report in a comprehensible way, and finally has to append his signature (a link to his user profile with a timestamp).
An author who wants to discuss an existing issue has to look up the corresponding section on the discussion page and then indent his reply by one more level than the post he is replying to. Solutions to issues would be proposed in natural language only, and if users come to vote on proposals, they would do it in an ad hoc manner, e. g. using list items prefixed with “yes” or “no”. A solution for restoring the neutrality of a controversial article could be citing reliable arguments in favour of the view that has been underrepresented so far. Eventually, one author who is trusted by the community would judge whether there is a consensus about a particular solution, or simply count the votes, and then implement the solution approved by the community, again without any assistance from the system. A justification for the resulting revision of a page can be given by a descriptive edit summary that links to the section of the discussion page where the respective issue was discussed [Wik09a]. However, authors do not always do this, which sometimes makes it hard to retrace decisions. Note that for procedures with a more serious effect, above all the deletion of an article, it is more strictly regulated who may implement a solution. Only users with administrative permissions, which are awarded by public vote, may technically do so. However, for the work I have done on wiki discussions, I have not assumed any such technical restrictions but have assumed well-behaved and cooperative users. Encouraging or enforcing orderly behaviour is an interesting research question in itself but not considered here.
⁵ Wikipedia’s different language editions have developed slightly different conventions. Here, I refer to the English Wikipedia. Pointers to related information in other Wikipedias can be found on the respective pages by following the links to other languages.
⁶ The English Wikipedia has special building blocks referring to the neutrality of a section of an article, its introduction, or its title. Still, there is no explicit relation from the discussion post to the disputed part of a page.
In large knowledge collections like Wikipedia, these procedures work sufficiently well thanks to the large user base; indeed, the quality of articles has been found to strongly correlate with the number of authors [Brä05]. In the Foucault@Wiki study, Pentzold and Seidenglanz analysed Wikipedia discourses: how edit summaries and discussion posts related to changes made to Wikipedia articles, and what types of changes occurred. From this, they derived a model for Wikipedia argumentations – in fact an informal site-specific argumentation ontology. This analysis had to be made without machine support, as Wikipedia articles are largely unstructured, discussions and edit summaries are given in natural language, and the space of possible arguments is unrestricted, as opposed to the finite set of types in a formal argumentation ontology. Note that the goal was not to design software support for discourses or for improving knowledge items, i. e. Wikipedia articles. I consider the latter hard to achieve, as Wikipedia articles frequently contain multiple sections that do not focus on one knowledge item only, which makes it hard to express what part of an article a discussion post refers to.
As set out in section 3.2, I am aiming at knowledge collections that probably have a small user base but are operated by a system capable of certain knowledge management tasks: a system that has a basic understanding of what types of issues are reported with what types of knowledge items, what solutions are proposed, and whether people agree or disagree with these proposals. Semantic wikis provide a suitable foundation for that (cf. section 4.2.2.1). My implementation of discussions about mathematical knowledge items will be presented in section 4.4.3.
4.2.2 Semantic Wikis
BegOP(19)
Wikis have been successfully used for a diverse range of purposes, ranging from personal knowledge management to enterprise-internal collaborative project management to open collection of encyclopedic knowledge. Semantic wikis combine this idea with the Semantic Web vision, in two ways: On the one hand, Semantic Web technologies are used to enhance content presentation, navigation, personalization, social networking, and data exchange in wikis. On the other hand, semantic wikis are used as lightweight collaborative knowledge formalization environments or ontology editors.
Semantic wikis have grown up. Foundational research on them is done in high-ranking projects, such as KiWi. Some semantic wikis are sold commercially, such as SMW+, an enterprise distribution of Semantic MediaWiki, or zAgile Wikidsmart. New features are prototyped and developed as plugins of established, stable systems rather than implemented from scratch; these semantic wiki systems have evolved into operating-system-like platforms for semantic social software. Basic concepts of semantic wikis, e.g. typed links, have been incorporated into classical wikis that had not originally been “semantic”, such as TikiWiki. Key aspects of the “wiki spirit”, such as easy collaboration and linking of knowledge, are more and more being adopted by applications that have not even been “wikis” originally; consider Google Wave. These recent developments demand that we continue the process that we have successfully initiated with the 2008 and 2009 workshops – shifting the focus from proofs of concept and hacks to real-world use cases. However, besides evaluations of such use cases, foundational research and technical innovation are still needed, as the large-scale application of semantic wikis has unveiled a number of research questions that the academic community has to answer now in a consolidated effort.
EndOP(19)
semantic features in mainstream wikis, e.g. phpwiki, tikiwiki
19
4 SWiM – An Integrated Collaboration Environment How do semantic wikis improve on the above-mentioned things?
In a semantic wiki, the knowledge is more structured than in a non-semantic one. Apart from sharing a knowledge representation based on semantic web technologies such as RDF [W3Cc] or ontologies (see e. g. [Fen03]), semantic wikis are quite diverse: In some systems, shallowly annotated text prevails, whereas in others, formal knowledge prevails and unstructured text only appears in comments or labels that describe formal concepts [Ore+06; Buf+08]. Still others mix annotated text and highly formalised problem-solving knowledge [BP08]. The most common approach is, however, to represent knowledge about one subject of interest – a “knowledge item”, in the terminology of this article – by one wiki page and to annotate pages and links between pages with types defined in an ontology. In this kind of semantic wiki, it is advisable to keep pages small and refactor them if they tend to describe more than one knowledge item. The graph of typed pages and links is commonly represented in RDF [W3Cc]. Existing ontologies, such as FOAF (cf. section 3.2.2), are either preloaded into the wiki or imported later, or a new, custom ontology is built collaboratively during the annotation of the wiki pages. Semantic MediaWiki, for example, prefers the latter approach, where an ontology is implicitly extended whenever a page or a link is assigned a new type [Völ+06].
4.2.2.1 Semantic Discussion Threads with SIOC
To the best of my knowledge, there are currently only two semantic wikis where not only the primary knowledge items but also the discussions about them are semantically structured. In IkeWiki [Sch06] and its successor KiWi [Sch+09], the relationship between the knowledge item represented by a page and the discussion about it is represented in the RDF graph, and the associated discussion page itself is not an opaque block of text, but a self-contained discussion forum. It consists of SIOC threads and posts (cf. section 3.2.2), with links to the user profiles of their authors. Making every post a distinct RDF resource and preserving the threaded structure of a discussion serves as a basis for adding an argumentative layer, as shown in section 3.2.2. Semantically structured argumentative discussions have been used in at least two other semantic wikis, but not on discussion pages of knowledge items. Nevertheless, I will briefly review those two related approaches here. Lekapidia was a case study in which the authors of the DILIGENT argumentation ontology (cf. section 3.2.1.2) preloaded a semantic wiki (coefficientMakna) with the argumentation ontology. Then, they replayed the collaborative engineering of a simple dessert recipe ontology, which had earlier been developed using the DILIGENT methodology, in their wiki, and found that the wiki “significantly reduces the effort to capture the arguments in a structured way” [Tem+07]. Special attention was paid to detecting inconsistent argumentations (e. g. when one user first votes in favour of one argument a but then introduces a new one that contradicts a [Tem+05]) and fostering consensus, neither of which I have investigated in detail. Cicero is a Semantic MediaWiki extension for DILIGENT-like argumentation [Del+08a; Del+08b].
It has also been integrated into the NeOn toolkit [Neo], an integrated development environment for networked (= versioned, modular, and interdependent) ontologies. In contrast to my model of argumentation about mathematical knowledge items (cf. section 3.2.3), Cicero is not made for
arguing about knowledge items, but for solving problems in projects in general. One wiki page corresponds to one project, issue, or solution proposal (= idea). Arguments are represented as subsections of a solution proposal page. The ontology is DILIGENT-like but slightly different. It is currently only available inside the wiki, not implemented externally in an ontology language. For the non-argumentative infrastructure, no ontology (such as, e. g., SIOC) is used. Cicero offers versatile options for voting and deciding, as outlined in section 3.2.2.1.
4.3 Architecture How do the levels of mathematical knowledge relate to wikis?
4.3.1 IkeWiki, the Underlying System cite SWiM TR for reasons for preferring IkeWiki
4.3.2 Storage Backend wiki-style granularity, look into OM09 and SemWiki08 papers
4.3.2.1 Document Storage
The current implementation of SWiM relies on IkeWiki for storing documents. It does not make a difference whether a wiki page contains fragments from mathematical markup languages or not. SWiM encourages and partly enforces wiki pages to consist of small logical units of knowledge. One reason for that is the same as in most other semantic or non-semantic wikis (cf. section 4.2.2): it is easier to offer usable support for managing small pages – and, if required, enabling hierarchically nested pages via an inclusion mechanism – than to offer reliable and usable support for editing, versioning, linking, browsing, and searching small subsections of large pages. The second reason is the way in which semantic structures are made accessible in SWiM by extracting RDF from markup (cf. section 4.3.3): Neither in its OpenMath support nor in its OMDoc 1.2/1.3 support does the extraction assume a hierarchical scheme of addressing structures within documents, but only URIs of a document-URI#fragment-ID pattern. Stable hierarchical addressing will be introduced into OMDoc 1.6 by way of the MMT core, which is not yet supported by SWiM. Thus, I rely on any of the major hierarchical units (see below for details) to be physically represented as wiki pages of their own. Whenever a wiki page is saved inside SWiM, or whenever an external document is imported into SWiM (from a file, or from a connected Subversion repository; see section 4.3.2.3), it is split into fragments of the required granularity. I will now describe how OMDoc documents and OpenMath content dictionaries (CDs) are split (and reassembled on export), using the terminology introduced in section 3.7.2.2. In OMDoc, theories and statements (both formal and informal, i. e. omtext, too) are considered knowledge units suitable for wiki pages; exactly for such XML
elements e, I define $s_{\mathrm{OMDoc}}(e)$ to be satisfied. A document inclusion $i_{\mathrm{OMDoc}}(D)$ can natively be represented in OMDoc by a reference of type “include”.⁷ On export, complete documents, theories, and non-constitutive statements are considered viable in file-based documents; exactly for such XML elements n, I define $e_{\mathrm{OMDoc}}(n)$ to be satisfied. In OpenMath CDs, symbol definition blocks (CDDefinition) are considered knowledge units suitable for wiki pages (in terms of $s_{\mathrm{OpenMath}}$), as well as mathematical properties (CMP, FMP)⁸ and examples (Example). A document inclusion $i_{\mathrm{OpenMath}}(D)$ is represented using XInclude. On export, only whole CDs (having a CD root element) are viable. Signature dictionaries and notation dictionaries in the format of section 2.3.4.2 are treated analogously; the only subunits are the type signature for a single symbol (Signature) and the notation definitions for symbols (notation), respectively.
For all markup languages, the identifier (here: URI) of a document $D_v$ split off from $D$ is computed as $\mathrm{id}(D_v) := \mathrm{concat}(\mathrm{id}(D), \text{'+'}, \mathrm{id}(v))$. As an identifier $\mathrm{id}(v)$ of a node v, I take its fragment ID, which is given by the @xml:id attribute, if it is available. Otherwise, the @name attribute is used, which exists for OMDoc symbols and OpenMath type signatures; similarly, I use the content of a CDDefinition/Name element for OpenMath symbols. Split-off notation definitions are named after the symbol they match, plus a running number, as there can be more than one notation definition per symbol. As an ultimate fallback, I use the XML element name of v, plus a running number, e. g. FMP2 if v is the second child element of its parent and its element name is FMP.
4.3.2.2 Other (Notations, RDF)
Whenever the notation of a symbol σ has changed, all the presentation markup generated from formulæ containing σ has to be invalidated and re-rendered upon the next request. This addresses the use case outlined initially. A notation definition for a symbol maps a pattern of content markup (a prototype) to a fragment of presentation markup (a rendering). For example, a notation definition for the root operator could look like $@(\mathtt{arith1\#root}, \underline{arg}, \underline{n}) \vdash \sqrt[\underline{n}]{\underline{arg}}$.⁹ From this, an RDF triple with the predicate omo:rendersSymbol would be extracted. Whenever a wiki page containing notation definitions is saved or imported, the notation definitions are put into a cache read by the mmlproc renderer. If a notation definition has been added, deleted, or changed, the affected documents have to be re-rendered. In order to do this properly, SWiM has to (1) identify changes to notation definitions, and (2) identify documents affected by a change. (1) is done by computing an XML diff between the cached and the newly inserted version of a notation definition. (2) is done by querying the RDF graph for all FMPs and examples using the symbol rendered by the respective notation definition, as shown in fig. 4.1 and technically explained in [Lan08a].¹⁰ Not only the wiki pages holding these FMPs and examples have to be re-rendered, but also those pages (symbol definitions and CDs) that directly or indirectly include these fragments.
Figure 4.1: Finding pages (depicted as stacks of nodes) affected by changes to a notation definition. Both sym and the symDefs are instances of the class SymbolDefinition. [Graph: notDef is linked to sym by rendersSymbol; fmp and ex pages are linked to sym by usesSymbol; symDef pages are linked to the fmp and ex pages by contains, and cd pages to the symDef pages by contains.]
7 Here, I have neglected the case when authors want such inclusions to be preserved even on export. This is, however, possible by choosing a custom reference type different from “include”.
8 In OpenMath 2 CDs, CMPs and FMPs are split off, whereas in OpenMath 3 CDs, the proposed property container is supported instead, which combines a semantically equivalent pair of one CMP and one FMP.
9 In this abstract syntax, @ means an application of a symbol to arguments, and underlined variables are placeholders for subtrees that are rendered recursively [KMR08]. Actually, all this is encoded in XML.
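Step (2) above, including the propagation to enclosing pages, can be illustrated with a minimal sketch. It is not SWiM's implementation, which queries the RDF graph with SPARQL; here the graph is assumed to be a plain list of (subject, predicate, object) tuples. The predicate names follow figure 4.1, whereas the page names and the function name are hypothetical.

```python
from collections import defaultdict


def affected_pages(triples, notation_def):
    """Return every page whose cached rendering is stale after notation_def changed."""
    by_predicate = defaultdict(list)
    for subject, predicate, obj in triples:
        by_predicate[predicate].append((subject, obj))

    # Symbols rendered by the changed notation definition.
    symbols = {o for s, o in by_predicate["rendersSymbol"] if s == notation_def}
    # FMPs and examples whose formulae use one of these symbols.
    stale = {s for s, o in by_predicate["usesSymbol"] if o in symbols}

    # Index "whole contains part" links by part, so we can walk upwards.
    parents = defaultdict(set)
    for whole, part in by_predicate["contains"]:
        parents[part].add(whole)

    # Propagate staleness to all directly or indirectly enclosing pages.
    todo = list(stale)
    while todo:
        page = todo.pop()
        for parent in parents[page]:
            if parent not in stale:
                stale.add(parent)
                todo.append(parent)
    return stale


# Hypothetical sample graph: one notation definition for arith1#root.
triples = [
    ("ntn:arith1+root1", "rendersSymbol", "cd:arith1+root"),
    ("cd:arith1+root+FMP1", "usesSymbol", "cd:arith1+root"),
    ("cd:arith1+root", "contains", "cd:arith1+root+FMP1"),
    ("cd:arith1", "contains", "cd:arith1+root"),
    ("cd:transc1+exp+Example1", "usesSymbol", "cd:arith1+root"),
    ("cd:transc1+exp", "contains", "cd:transc1+exp+Example1"),
    ("cd:transc1", "contains", "cd:transc1+exp"),
    ("cd:arith1+plus+FMP1", "usesSymbol", "cd:arith1+plus"),
    ("cd:arith1+plus", "contains", "cd:arith1+plus+FMP1"),
    ("cd:arith1", "contains", "cd:arith1+plus"),
]
affected = affected_pages(triples, "ntn:arith1+root1")
```

In this sample, the FMP and the example that use the root symbol are invalidated together with their enclosing symbol definitions and CDs, while the pages that only concern the plus symbol keep their cached rendering.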
4.3.2.3 Subversion Integration
Subversion is a client/server version control system [Svn]. It is of particular importance for users of semiformal mathematical documents, as many of them are hosted and maintained in Subversion repositories, such as the OpenMath official and contributed content dictionaries (cf. section 5.1.2), and Michael Kohlhase’s collection of sTeX lecture notes and talks. The KWARC research group is working on improved Subversion-compatible systems both on the client side (locutor [Loc]) and on the server side (TNTBase; cf. section 3.7.2.3 and [ZK09; Tnt]). SWiM integrates Subversion on the import/export level. Files originally stored in a Subversion repository are retrieved from there, and committed on every change. The original IkeWiki page database is fully retained, now acting as a working copy for those documents that are originally from a Subversion repository. The working copy is used for all SWiM-internal knowledge management tasks (cf. section 4.3.3). While this solution accommodates the different granularities of knowledge representations in the internal database vs. the file system (cf. section 4.3.2.1), it does not turn SWiM into a full Subversion client. On every access to a page that is already in the SWiM working copy, SWiM tries to retrieve its latest version from the Subversion repository, but changes made by other Subversion clients are not automatically visible in SWiM; e. g. they do not show up in the revision history of a page.¹¹ Currently, SWiM only implements the bare minimum of Subversion commands that are required for connecting to a repository: update, commit, lock, and unlock. While a document of a working copy is opened for editing, SWiM locks it in the repository.
10 Currently, (1) is implemented. Queries as needed for (2) work, but the IkeWiki system underlying SWiM [Lan08a] still flushes the whole cache of rendered pages when a page is saved.
11 This could be addressed by installing a post-commit hook [PCSF08, chapter 5] on the repository that notifies SWiM about any commit of a file to the repository.
12 In well-structured documents, conflicts actually do not frequently occur. When two users edit different sections
This is contrary to Subversion’s approach of “optimistic locking”, also called “copy-modify-merge” [PCSF08, chapter 1], where users may simultaneously edit files but the user trying to commit a file that has already been changed by somebody else has to resolve any resulting conflicts first.¹² File locking does
not require a specific user interface, as would be required for conflict resolution.¹³ All Subversion commands resulting from an access to a document D in SWiM’s working copy are applied to the nearest admissible parent document $D_p(D)$, if D does not correspond to a file-level knowledge unit (cf. section 3.7.2.2). The identifier of the subunit of the whole document that has actually changed is preserved in the log message of the commit and will display in the revision log for the Subversion resource as shown in listing 4.1.¹⁴
Listing 4.1: Log message for a revision of the description of the oms:transc1#sin symbol
    r1234 | clange | 2009-05-11 13:06:41 +0200 (Mon, 11 May 2009) | 2 lines

    [Administrator@SWiM] replaced metadata field dc:description
    Actually changed fragment cd:transc1+sin
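The relation between a changed fragment, the file that is actually committed, and the log message of listing 4.1 can be sketched as follows. This is an illustration under assumptions, not SWiM's code: the function names are hypothetical, the “+”-separated identifiers are taken from section 4.3.2.1, and the predicate deciding what counts as a file-level unit is supplied by the caller (for OpenMath, only whole CDs qualify).

```python
def nearest_file_level_unit(page_id, is_file_level):
    """Walk up a '+'-separated page identifier until a file-level unit is found."""
    parts = page_id.split("+")
    for end in range(len(parts), 0, -1):
        candidate = "+".join(parts[:end])
        if is_file_level(candidate):
            return candidate
    raise ValueError(f"no file-level ancestor found for {page_id!r}")


def commit_message(wiki_user, summary, changed_id):
    """Build a two-line log message in the style of listing 4.1."""
    return f"[{wiki_user}@SWiM] {summary}\nActually changed fragment {changed_id}"


def is_whole_cd(page_id):
    # Assumption for OpenMath: a page without '+' in its name is a whole CD.
    return "+" not in page_id


committed_unit = nearest_file_level_unit("cd:transc1+sin", is_whole_cd)
message = commit_message(
    "Administrator", "replaced metadata field dc:description", "cd:transc1+sin"
)
```

For OMDoc, where theories and non-constitutive statements are also viable on export, a different predicate would have to be passed in; the walk up the identifier stays the same.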
Subversion repository access is configured per namespace.¹⁵ Each namespace in the wiki can be defined as a Subversion working copy. In the easiest case, a qualified name nsprefix:localname of a wiki page is mapped to the Subversion resource concat(nsuri, localname), where nsprefix and nsuri are the prefix and URI of the namespace, respectively. More complex mappings are possible. In the OpenMath wiki, the OpenMath 3 content dictionaries are available in a namespace having the prefix cd and the URI http://www.openmath.org/cd/, but the wiki page cd:arith1 is mapped to the Subversion resource https://svn.openmath.org/OpenMath3/cd/MathML/arith1.ocd. SWiM and Subversion do not currently use unified user accounts. For each combination from Users × Namespaces inside SWiM, a Subversion username and password can be configured. One can set up a 1:1 mapping of SWiM to Subversion users, but it is less effort to maintain only one Subversion account for each group of SWiM users who should have the same permissions in the repository. SWiM enables identification of the wiki user who initiated a commit by including their name in every log message (cf. listing 4.1).
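The mapping from wiki page names to Subversion resources described above can be sketched in a few lines. This is a simplified illustration, not SWiM's configuration mechanism (which lives in the database): the function name, the dictionaries, and the "notes" namespace with its example.org URI are assumptions; only the cd namespace and the arith1 URL are taken from the text.

```python
def svn_resource(page_name, namespace_uris, custom_mappings=None):
    """Map a qualified wiki page name 'nsprefix:localname' to a Subversion URL."""
    prefix, separator, local_name = page_name.partition(":")
    if not separator:
        raise ValueError(f"{page_name!r} is not a qualified name")
    # A more complex, namespace-specific mapping takes precedence if configured.
    custom = (custom_mappings or {}).get(prefix)
    if custom is not None:
        return custom(local_name)
    # Easiest case: concat(nsuri, localname).
    return namespace_uris[prefix] + local_name


namespace_uris = {
    "cd": "http://www.openmath.org/cd/",
    "notes": "https://svn.example.org/notes/",  # hypothetical namespace
}
custom_mappings = {
    # Mapping used in the OpenMath wiki for the OpenMath 3 CDs.
    "cd": lambda name: f"https://svn.openmath.org/OpenMath3/cd/MathML/{name}.ocd",
}

arith1_url = svn_resource("cd:arith1", namespace_uris, custom_mappings)
notes_url = svn_resource("notes:lecture1.omdoc", namespace_uris, custom_mappings)
```

A split-off page such as cd:arith1+plus would first have to be resolved to its file-level parent cd:arith1 before this mapping is applied.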
4.3.3 Structure Extraction Krextor integration into SWiM (cf. section 3.7.3)
of a file, Subversion can merge the two changes. Conflicts only occur when two users commit conflicting changes to the same line.
13 Many wikis feature conflict resolution user interfaces, but IkeWiki does not.
14 One shortcoming of the form-based metadata editor is that the user cannot freely choose an editing summary; therefore the log is not more detailed. In the structural editor, the author can give a custom summary, which would then be used in the first line of the log message.
15 This has to be configured manually in the database; see http://mathweb.org/wiki/SWiM/Subversion_client for full details.
4.4 User Interface
4.4.1 Browser
SWiM exploits typed links on the level of whole pages, displaying them in a navigation box, grouped by type – as shown in figure 4.2 for a symbol definition.
4.4.2 Editor
Main features:
• formulae (Sentido)
• higher-level markup (TinyMCE with annotation toolbar)
• notation editing workflow [Lan08a]
• metadata: for example, the names of the author and the contributors of a document are known from the user profiles of these persons and only inserted into the metadata record of a document when it is exported from the wiki to a file. The same holds for the revision history. (not yet implemented)
4.4.3 Argumentation
Based on the semantic structure of the discussions provided by IkeWiki (cf. section 4.2.2.1), I have implemented support for argumentation according to the model introduced in sections 3.2.2 and 3.2.3, plus a very simple semi-automatic assistance with implementing a few common types of solutions, according to section 3.2.4. I have extended the user interface for discussions so that users can not only make untyped comments but also post arguments of specific types, including domain-specific ones, as shown in figure 4.3. For every possible type of reply to a discussion post, there is a dedicated reply button. Some buttons open menus, from which the type of relationship of the reply can be chosen. For example, for a position, one can choose whether it should agree or disagree with the current post, or be neutral. Similarly, for an argument, one can choose whether it should support or challenge the current post, and what specific type it should have. However, the OpenMath case study has shown that our argumentation ontology does not yet cover all types of statements that occur in practice, or that users are not always sure how to classify their statements, as explained in section 5.1.4.1. Therefore, I have more recently added a button for posting “untyped” replies.¹⁶ This obviously prevents further automated assistance. On the top level of a discussion page, the user is invited to post issues of types that are applicable to the type $t_k$ of the knowledge item to be discussed; a reply to an issue of type $t_{is}$ can have one of the idea types that are applicable to $(t_k, t_{is})$. Thanks to IkeWiki’s built-in RDFS and OWL ontology editor, privileged members of the community can even dynamically and interactively adapt the argumentation ontology to the community’s needs. The formulæ for determining unsolved legitimate issues and “winning” ideas are implemented as sequences of SPARQL [PS08] queries against the RDF graph of the discussion about the knowledge item currently viewed. The assistance with implementing a solution is hard-coded into SWiM for now. Some functions, such as deleting an article, had been available before, whereas I have implemented the creation of a related knowledge item (e. g. an example) as a first proof of concept for additional assistance.
16 Not yet shown in the screenshots in this section, but see figure 5.2.
Figure 4.2: Navigation links for an OpenMath symbol definition
Figure 4.3: A complete discourse (mind the chronological order when reading!)
Figure 4.4: Warning about an issue and the offer to solve it
To demonstrate the system, I consider a situation in which it is not clear whether a given definition is useful. The user Alice wants to report that issue. She opens the discussion page for the definition, posts a new issue, and thus starts the discussion thread depicted in figure 4.3. As a type of the issue, she can select any type that is applicable to definitions. Afterwards, she realizes that her statement might not have been entirely clear, and appends an elaboration. Bob does not agree that there is actually a problem with the definition and voices his position. Cecil has the idea that the problem could be solved by giving an example; he contributes it by clicking the “Idea” reply button in the issue post and selecting an idea type that is applicable to definitions whose utility is unclear. Dan argues from his previous experience that examples are useful. Now assume that Alice replies to the idea with another agreement and that, after that, Eric, the administrator of the knowledge base, visits the theorem: By then SWiM will have identified the idea to provide an example for the definition as the best one to resolve the issue and will display a message that proposes this, offering a link to start a semi-automatic assistant (cf. figure 4.4; here shown for a similar case).
If Eric decides to provide the example and clicks on the link, a new example page, pointing to the original theorem, would be created, and he can fill out the template (cf. figure 4.5). SWiM is not yet capable of closing a discussion thread by posting an auto-generated decision statement; therefore, that was done manually here. The OWL-DL implementations of the SIOC argumentation module and the math-specific extensions are preloaded into SWiM; discussion posts are represented in RDF as instances of these types. Thus, the structure of the argumentation forms an overlay network on top of the raw structure of the threads represented in SIOC Core. This is shown in figure 4.6; an example of how to query this graph is given in listing 5.1.
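The rule that determines which reply types the user interface offers (issue types applicable to the type of the knowledge item, and idea types applicable to the pair of knowledge item type and issue type) can be sketched as a simple lookup. In SWiM this information comes from the argumentation ontology; the dictionaries below merely stand in for it. Only ProvideExample appears in figure 4.6, where the issue type is abbreviated; the spelled-out name UnclearWhetherUseful and the types Wrong and Delete are assumptions made for illustration.

```python
# Issue types applicable to each knowledge item type (stand-in for the ontology).
ISSUE_TYPES = {
    "Definition": {"UnclearWhetherUseful", "Wrong"},
    "Example": {"Wrong"},
}

# Idea types applicable to each (knowledge item type, issue type) pair.
IDEA_TYPES = {
    ("Definition", "UnclearWhetherUseful"): {"ProvideExample", "Delete"},
    ("Definition", "Wrong"): {"Delete"},
    ("Example", "Wrong"): {"Delete"},
}


def applicable_issue_types(item_type):
    """Issue types offered on the top level of an item's discussion page."""
    return sorted(ISSUE_TYPES.get(item_type, ()))


def applicable_idea_types(item_type, issue_type):
    """Idea types offered when replying to an issue about an item."""
    return sorted(IDEA_TYPES.get((item_type, issue_type), ()))


issues_for_definition = applicable_issue_types("Definition")
ideas_for_alice = applicable_idea_types("Definition", "UnclearWhetherUseful")
```

In the walkthrough above, this corresponds to Alice choosing among the issue types for definitions and Cecil choosing among the idea types for definitions whose utility is unclear.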
Figure 4.5: Editing the newly created example
Figure 4.6: RDF graph of the sample discussion (cf. figure 4.3). [Graph in three layers: knowledge items (OMDoc ontology) on wiki pages, namely the definition and the example, linked by exemplifies; the discussion page as the physical structure (SIOC Core), with forum1 attached via hasDiscussion (IkeWiki ontology) and the posts linked by has_container and has_reply; and the argumentative structure (SIOC Arg.) over post1: Issue (UnclearWh.Useful), post2: Elaboration, post3: Position, post4: Idea (ProvideExample), post5: Evaluation, post6: Position, and post7: Decision, with the edge labels elaborates_on, agrees_with, proposes_solution_for, supports, supported_by, decides, and resolvesInto.]
Acknowledgments
The overview of the state of the art in wikis, i. e. the introductory part of section 4.2, is based on my master’s thesis [Lan06a] but has been updated to reflect the developments since 2006.
5 Case Studies
5.1 OpenMath Content Dictionary Wiki
SWiM has been used as a frontend for browsing and maintaining the OpenMath 2 and 3 content dictionaries (CDs) – figure 5.1 depicts a CD in SWiM. While every OpenMath user is free to define their own CDs for their own purposes (cf. section 2.3.2 for an introduction to OpenMath and CDs), the OpenMath Society maintains a collection of official CDs [DL08] that have undergone a review process [Bus+04, section 4.5]. Still, the content of an official CD is not fixed: It might still contain mistakes that have slipped through the review, or there might be ways to improve the informal descriptions of symbols, or relevant mathematical properties and examples to add. The main motivation for deploying SWiM at http://wiki.openmath.org in September 2008 was that there was no tool support for common CD maintenance tasks. I will first describe the setup of the wiki, then give an introduction to CD maintenance, then show how SWiM supports common maintenance tasks, and finally evaluate the case study. The evaluation is based on analyzing the data available in the wiki, and on interviewing the users who have worked with the system.
5.1.1 Setup
SWiM was installed following the usual procedure (cf. section 4.3.1) and then preloaded with the OpenMath ontology (cf. section 2.4.1.1). Mappings from namespaces to Subversion resources were set up for the four document collections listed in table 5.1 (see section 4.3.2.3 for SWiM’s Subversion integration). User roles (cf. section 4.3.1) were set up for visitors (allowed to comment on everything), CD editors (allowed to edit the CDs), and administrators (additionally allowed to edit special pages like the entry page). CD editing permissions were granted to seven people who had expressed interest in that; these were largely the same as those who have also regularly edited the OpenMath CDs in the original Subversion repository.
5.1.2 Traditional Ways of Working on CDs
One CD is essentially a file – containing several metadata fields on top, and then one CDDefinition block per symbol. The official CDs and other contributed ones are maintained in a Subversion repository at https://svn.openmath.org. Developers participating in their maintenance check out a working copy of that repository, edit the CD files locally with a text or XML editor, and then commit their changes. The RIACA group and Jónathan Heras Vicente have independently developed two standalone CD editors [Ria; HV], the only ones besides SWiM that I am aware of. The RIACA CD editor focuses on generating Java code for programs dealing with OpenMath objects from CDs rather than on CD maintenance, and its development seems to have been discontinued for at least three years. Heras Vicente’s editor is a general purpose one for CDs
Figure 5.1: The arith1 OpenMath CD in SWiM
Prefix      Subversion URL                                        Description
cd          https://svn.openmath.org/OpenMath3/cd/MathML/*.ocd    OpenMath/MathML 3 draft CDs
ntn         https://svn.openmath.org/OpenMath3/cd/MathML/*.ntn    OpenMath 3 notation definitions (unofficial)
cd2         https://svn.openmath.org/www/cdfiles2/cd/             official OpenMath 2 CDs
cd2contrib  https://svn.openmath.org/www/cdfiles2/contrib/cd/     contributed OpenMath 2 CDs
Table 5.1: Subversion resources accessible in the OpenMath wiki
and type signatures. It was first released in 2008 and is still in use – but only by a few users – and under development. Issues with the CDs are usually discussed on the OpenMath mailing list (om@openmath.org) in case of fixing bugs in existing CDs, or on the OpenMath 3 mailing list in case of the overhaul of the CDs and alignment with the Content MathML specification for the upcoming OpenMath 3 [DK09]. As an alternative for OpenMath 3, there is an installation of the Trac issue tracking system (cf. [Traa]) at https://trac.mathweb.org/OM3. So far, however, it has not been used for discussing concrete issues with CDs, but rather for more general issues with the design and the specification of the OpenMath object and CD language. For presenting a CD to human readers, it is usually transformed to the desired output format (most commonly XHTML) using XSLT, and the OpenMath objects occurring inside the FMPs and examples are rendered following one of the approaches mentioned in section 3.1.1, often relying on some variant of notation definitions (cf. section 2.2.5). This presentation process is usually controlled by makefiles. The OpenMath wiki case study had a special focus on three common use cases. I will first introduce the traditional way of handling these cases, to pave the way for showing how they are handled in the OpenMath wiki.
5.1.2.1 Minor Edits
Fixing minor mistakes does not change the semantics of a symbol. Consider correcting a spelling mistake in a description, or renaming a bound variable in a mathematical object that does not occur as a free variable in a subexpression. Supported by a text or XML editor only, which is not aware of the particular features of OpenMath CDs, such a fix would be done as follows (assuming that the mistake is in a CD from openmath.org):
1. Update the working copy of the OpenMath CDs,
2. open the CD file in question,
3. navigate to the Description child of the symbol in question,
4. fix the mistake,
5. commit the file (and, ideally: commit that file only, and give a meaningful log message that exactly refers to the symbol where the mistake was fixed).
5.1.2.2 Discussing and Implementing Revisions
Major revisions that change the semantics of a symbol have to be discussed among the developers before implementing them. Usually, the discussion starts with pointing out a problem (e. g. an FMP for a concrete symbol is wrong or misleading). Let us assume that the developer who identified the problem does not know how to solve it. Then, he would have to make others aware of the problem, e. g. by an e-mail to the OpenMath mailing list. Pasting a link to the Subversion URL of the CD in question into that e-mail helps others to inspect the problematic part.¹ Other developers would then reply to this e-mail and propose solutions, and again by replying to their mails, the solutions would be discussed, until the community agrees on one to be implemented.
5.1.2.3 Editing and Verifying Notations
Suppose a wrong notation has been defined for some symbol. From spotting a mis-rendered occurrence of that symbol in some formula, the traditional workflow of fixing the notation definition would roughly be as follows:
1. Identify what symbol (CD, Name) it is,
2. find the file where the notation of the symbol is defined,
3. try to fix the notation definition,
4. regenerate the document in which the rendered symbol was spotted originally (and, ideally: regenerate the rendered presentations of all CDs where the symbol occurs),
5. open the regenerated document and check whether the symbol is rendered correctly (if not, repeat from step 2),
6. commit the notation dictionary or XSLT file, giving a meaningful log message.
5.1.3 How SWiM Supports the CD Maintenance Use Cases Having introduced three central use cases in the previous section, I will now describe the special support SWiM offers for them. 5.1.3.1 Minor Edits The three main types of knowledge in OpenMath CDs are: the structural outline of a CD (e. g. the symbols that a CD defines, and their mathematical properties), metadata (of such structural units, e. g. their informal descriptions or the date of revision), and OpenMath objects (inside FMPs and examples). SWiM offers dedicated editors for each of them (cf. section 3.3). It was a requirement for SWiM to allow for revisions in a context as local as possible – i. e. committing a “fixed description” to the CD repository instead of committing a “new revision of a CD with ‘something’ changed”. SWiM acts as a browser and editor on top of the OpenMath Subversion repository but adopts a finer granularity. In SWiM, CDs are split into smaller logical units that are semantically subject to a revision (cf. sections 3.7.2.2 and 4.3.2.1). Of the wiki pages on CD and symbol definition levels, only the structural outline is editable, which keeps the content of the page editor small and maintainable; the split-off subparts are editable separately and only represented as XInclude links [MOV06] in the editing view. Nevertheless, a complete CD can be viewed at once, as explained in section 3.1.1. Metadata fields are either editable within the structural outline editor, or in a separate form-based view. Much attention was paid to avoiding
136
align terminology with section 2.2 why? give bad real examples
ref or refactor
5 Case Studies
Figure 5.2: Part of a discussion page from the OpenMath wiki. Notice the post types and the specialised reply buttons.
any disruption of the file granularity of CDs in the Subversion repository, which are still editable by legacy tools.² A Subversion log message for an edit made in SWiM is shown in listing 4.1. The naming of CDs and parts thereof that is used in SWiM deviates from OpenMath conventions and instead reflects the design of SWiM (by using nsprefix:localname document names and concatenating the names of subunits using “+”) but is still close enough to OpenMath to be recognizable.
¹ Trac features a more immediate and comprehensive integration of a trouble ticket system with a Subversion repository (cf. section 3.2.1.1 and [Trab]), but that is not currently possible for OpenMath, as Trac and the Subversion repository are running on different servers.
² As we will see in section SOMEWHERE, SWiM does have, and will always have, certain technical but also conceptual limitations, be they bugs or deliberate design choices, that disqualify it as a one-size-fits-all CD editor.
5.1.3.2 Discussing and Implementing Revisions
For each page (i. e. for each CD, symbol = CDDefinition, mathematical property, and example), SWiM offers a discussion page. This allows for discussions at the same granularity as the units of mathematical knowledge. The problems of conventional wiki discussion pages have been discussed in section 4.2.1. IkeWiki improves on that by giving the discussion threads a semantic structure (cf. section 4.2.2.1). SWiM additionally allows users to indicate the argumentative type of their discussion posts (cf. section 3.2 for the argumentation ontology, and section 4.4.3 for its integration into SWiM). The domain-specific extension of the argumentation ontology, as described in section 3.2.3, has not yet been deployed in the OpenMath wiki.
SWiM represents both the semantic structure of discussion threads and of CDs as an RDF graph in terms of an ontology (cf. section 2.4.1.1). For OpenMath CDs, part–whole links, as identified during the splitting of CDs described in section 4.3.2.1, links from symbol occurrences in mathematical objects to the place where they have been defined, as well as metadata are represented as RDF. This whole graph can be queried. On the entry page of the OpenMath wiki, this is done in order to draw attention to unresolved issues by the SPARQL query shown in listing 5.1:
Listing 5.1: SPARQL query for unresolved issues in the OpenMath wiki
    SELECT DISTINCT ?P WHERE {
      ?P ikewiki:hasDiscussion ?D .
      ?C a sioc_arg:Issue ;
         sioc:has_container ?D .
      OPTIONAL { ?Dec sioc_arg:decides ?C . }
      FILTER (!bound(?Dec))
    }
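To make the pattern of listing 5.1 concrete, the following minimal Python sketch evaluates the same conditions over a small in-memory set of triples. It is only an illustration: the page, forum and post names are invented, and the tuple-based store is a stand-in for the wiki's RDF database, not the way IkeWiki or SWiM actually evaluates SPARQL.

```python
# Hypothetical triples (subject, predicate, object); all resource names are invented.
TRIPLES = {
    ("page:arith1+plus", "ikewiki:hasDiscussion", "forum:plus"),
    ("page:arith1+times", "ikewiki:hasDiscussion", "forum:times"),
    ("post:1", "rdf:type", "sioc_arg:Issue"),
    ("post:1", "sioc:has_container", "forum:plus"),
    ("post:2", "rdf:type", "sioc_arg:Issue"),
    ("post:2", "sioc:has_container", "forum:times"),
    ("post:3", "sioc_arg:decides", "post:2"),  # post:2 has been decided
}


def pages_with_unresolved_issues(triples):
    """Return pages whose discussion forum holds an Issue without a decision."""
    # Posts that some decision refers to (the OPTIONAL part of the query).
    decided = {o for (_, p, o) in triples if p == "sioc_arg:decides"}
    # Posts typed as Issue.
    issues = {s for (s, p, o) in triples
              if p == "rdf:type" and o == "sioc_arg:Issue"}
    # Forums containing at least one undecided issue (the FILTER !bound part).
    open_forums = {o for (s, p, o) in triples
                   if p == "sioc:has_container"
                   and s in issues and s not in decided}
    # Pages linked to such a forum.
    return {s for (s, p, o) in triples
            if p == "ikewiki:hasDiscussion" and o in open_forums}
```

In this toy data set only the page for plus is reported, because the issue filed on times already has a deciding post.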
P is a variable for a wiki page, which could be further restricted by its type in terms of the OpenMath ontology, e. g. one could restrict the query to symbols (CDDefinition). This query returns all pages P having a discussion forum D containing a comment C of type Issue on which no decision has been made so far. Such queries can be entered anywhere by an experienced user and result in a list of links to wiki pages.
5.1.3.3 Editing and Verifying Notations
In section 3.1.1, I described how SWiM renders documents, using notation definitions for symbols. The notation definitions are browsable and editable in the wiki. The workflow of editing and verifying them, as outlined in section 5.1.2.3, is facilitated as follows:
1. A developer can directly follow the link from the occurrence of a rendered symbol to its CDDefinition (cf. section 3.1.1), and from there its notation definition is only one more click away via the “references” navigation toolbar.
2. The XHTML+MathML output of rendering a wiki page (= a CD or a fragment thereof) is cached, but after changing a notation definition of a symbol, the rendered output for all pages P containing a formula in which the symbol occurs is removed from the cache, forcing its re-generation. Note that the set P contains not only the FMP or example that immediately holds the OpenMath object using the symbol, but also the enclosing CDDefinition and CD. The set P is obtained by another SPARQL query on the database.
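The cache invalidation described in step 2 can be sketched as follows. This is a hedged illustration in Python rather than the SPARQL query SWiM uses: the dictionaries PARENT and USES_SYMBOL, the page names and the symbol identifiers are assumptions made up for the example.

```python
# Hypothetical part-whole links: page -> enclosing page (names are invented).
PARENT = {
    "arith1+plus+fmp1": "arith1+plus",   # FMP -> enclosing CDDefinition
    "arith1+plus": "arith1",             # CDDefinition -> enclosing CD
    "arith1+times+ex1": "arith1+times",  # example -> enclosing CDDefinition
    "arith1+times": "arith1",
}

# Hypothetical symbol occurrences: page -> symbols used in its OpenMath objects.
USES_SYMBOL = {
    "arith1+plus+fmp1": {"arith1#plus", "relation1#eq"},
    "arith1+times+ex1": {"arith1#times"},
}


def pages_to_invalidate(symbol):
    """Collect every page that uses the symbol, plus all enclosing pages."""
    pages = set()
    for page, symbols in USES_SYMBOL.items():
        if symbol in symbols:
            # Walk up the part-whole chain: FMP/example, CDDefinition, CD.
            while page is not None:
                pages.add(page)
                page = PARENT.get(page)
    return pages


def invalidate(cache, symbol):
    """Drop cached renderings affected by a changed notation definition."""
    for page in pages_to_invalidate(symbol):
        cache.pop(page, None)
```

Walking up the part-whole chain is what makes the enclosing CDDefinition and CD part of the set P, not just the FMP or example that holds the formula.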
ref notation editing section
give exact ref ref partly move to editing section
5.1.4 Evaluation
5.1.4.1 Quantitative Analysis of the Argumentation Support
I have verified the principal utility of the basic argumentation ontology (without the domain-specific extensions yet) for OpenMath by importing an old corpus of e-mail conversations from an early phase of aligning the OpenMath/MathML 3 CDs by Chris Rowley, David Carlisle, Michael Kohlhase, and others, into the wiki, following the discussion structure. Further discussion posts have been contributed by OpenMath developers afterwards. Overall, this resulted in 90 discussion posts by May 11, 2009. This figure can be broken down by post type and by post granularity:
by type: 69 posts fit into one of the types from the argumentation ontology, mainly Issue (48) and Idea (10). Only counting the 23 posts contributed by the users themselves (who were
maybe elaborate
obviously less familiar with the background of the argumentation ontology), the result is slightly less convincing; for 9 of them the users were not sure how to classify them. The post type that was missing in most cases was nothing argumentative at all, but the question – either a direct question about some concept from a CD, or a follow-up question on an argumentative post, such as “what do you mean by this issue description?”. It will be easy to solve that problem by adding such a post type. Some other posts could not be uniquely classified because they both raised an issue and proposed a solution (= idea) in the same sentence. Annotating different argumentative types not at the level of posts but within posts is highly non-trivial, though, both concerning conceptual modelling and user interface design, as discussed in [Lan+08b].
by granularity: 36 posts (but only posts taken from the e-mail corpus) had individual symbols as their subject; the remaining 54 posts (including all of the posts made by users) were made on CD-level discussion pages. This shows that either the users did not find it intuitive (or not necessary) to access subparts of a CD when they saw a complete CD in the browser, or that it was not possible to identify individual symbols a post referred to. The latter is the case for certain posts that argue on design issues of a CD in general, sometimes naming certain individual symbols as examples. A few other posts from the e-mail corpus referred to two closely related symbols; I filed copies of them with both affected symbols.
Overall, this shows that the OpenMath CD editors have understood how to make use of this way of discussing problems, which is more exact than writing an e-mail or opening a Trac ticket.
adapt/revise model accordingly
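The proportions behind this breakdown can be recomputed from the counts given in the text. The short Python sketch below does nothing more than that arithmetic; the variable names are mine, and the rounding to one decimal place is an arbitrary choice.

```python
# Counts taken from the breakdown in the text (status: May 11, 2009).
TOTAL_POSTS = 90
TYPED_POSTS = 69              # posts fitting a type of the argumentation ontology
ISSUES, IDEAS = 48, 10
USER_POSTS, USER_UNSURE = 23, 9
SYMBOL_LEVEL, CD_LEVEL = 36, 54


def share(part, whole):
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


summary = {
    "typed_percent": share(TYPED_POSTS, TOTAL_POSTS),
    "typed_other_than_issue_or_idea": TYPED_POSTS - ISSUES - IDEAS,
    "user_posts_classified_percent": share(USER_POSTS - USER_UNSURE, USER_POSTS),
    "symbol_level_percent": share(SYMBOL_LEVEL, TOTAL_POSTS),
    "cd_level_percent": share(CD_LEVEL, TOTAL_POSTS),
}
```

Roughly three quarters of all posts could be typed, but only about 61 % of the posts written by the users themselves, which is the gap the text calls "slightly less convincing".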
5.1.4.2 User Survey
summarize results of the qualitative evaluation of the three use cases by questionnaire
The questionnaire (some 15 participants) gave useful feedback, but only one feature (= argumentation) had really been used. I therefore ran tests with individual test persons to get more focused feedback on the three use cases (see the following section).
5.1.4.3 Personal Experiments
I had test persons replay the three use cases in personal experiments while thinking aloud. Seven persons have taken part so far (coincidentally disjoint from the survey participants, all familiar with one or more of OMDoc, OpenMath, SWiM or semantic wikis in general); more are to come. The procedure for each use case was:
1. I let them read a one-page description of the feature.
2. I gave additional explanations (e. g. about the OpenMath setting) as appropriate, when people were not familiar with it.
3. I let them do the task (edit the description of a symbol, change the rendering of an operator, discuss something); I let them explore, but gave hints if they did not know what to do.
4. They thought aloud; the feedback collected was: what am I trying to do, how am I trying to accomplish it, what do I think about the interface I’m using, how would I expect it to be
incorporate
Figure 5.3: . . .
Most of the feedback was about the user interface, taking the knowledge model and the task to be performed for granted, but (suggestion by Andrea) certain feedback also reveals insights about the knowledge model and the reasonableness of the tasks.
5.2 Semantic Web Ontology Engineering
the content of this section is complete, but the integration into the thesis still has to be done
summarize Siarhei’s RDF→OMDoc translation; use this to translate FOAF
An example for the latter can be seen in the FOAF ontology (cf. section 3.2.2 and [BM07]). FOAF is a semantic web ontology implemented in OWL, but nevertheless tries to capture concepts that exceed DL. Usually, a foaf:Group has members of type foaf:Agent, where an agent can be a group, a person, or an organization. The foaf:membershipClass property can be used to be more specific about the type of the members of a group by linking an instance of foaf:Group to an RDFS or OWL Class. One can, e. g., require that all members of the KWARC research group in Bremen be computer scientists. Then, if we state that Michael is a member of KWARC, we would like a reasoner to infer that he is a computer scientist, or, vice versa, to complain, if he is classified as a type of person that is not consistent with being one. This combination of ABox and TBox (instance- and terminology-level) reasoning is not supported by OWL reasoners, though.
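The inference step that is missing here can be sketched in a few lines of Python. This is only an illustration of the intended rule (if a group has a membership class and an agent is a member of the group, then the agent is an instance of that class); the resource names under the ex: prefix and the dictionary-based data model are assumptions, and the sketch is neither the formalisation used later in this section nor part of any FOAF tool.

```python
def infer_member_types(members, membership_class, types):
    """Apply: member(g, x) and membershipClass(g, c) imply type(x, c).

    members:          group -> set of member agents
    membership_class: group -> class required of its members
    types:            agent -> set of classes already asserted
    Returns only the newly inferred (agent -> classes) facts.
    """
    inferred = {}
    for group, agents in members.items():
        required = membership_class.get(group)
        if required is None:
            continue  # group without a membership class: nothing to infer
        for agent in agents:
            if required not in types.get(agent, set()):
                inferred.setdefault(agent, set()).add(required)
    return inferred


# Hypothetical data mirroring the KWARC example from the text.
MEMBERS = {"ex:KWARC": {"ex:Michael"}}
MEMBERSHIP_CLASS = {"ex:KWARC": "ex:ComputerScientist"}
TYPES = {"ex:Michael": {"foaf:Person"}}
```

The rule mixes instance-level data (who is a member of which group) with a class used as a property value, which is exactly the ABox/TBox combination described above.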
Therefore, foaf:membershipClass is not formally described in the OWL-DL implementation of FOAF, but an informal text in the specification explains how application developers can implement hand-crafted support for the missing inference step.³ Such informal descriptions are often ambiguous⁴ and have to be turned into algorithms manually.
We evaluated our approach on a reimplementation of FOAF in OMDoc. From studying the OWL implementation and the specification of FOAF, we noticed the following problems, which we were able to solve using OMDoc:
1. FOAF references entities from other ontologies (DC, WordNet, Geo Positioning, etc.), but it does not import them. With OMDoc tools (as described in [RK08]), we can identify imports missing in an OMDoc ontology, and our OMDoc→OWL translation (Sect. 2.4.3.5) adds them to the OWL ontology resulting from the translation.
2. The source code contains notes for developers as XML comments. In the OMDoc version of FOAF, we were instead able to create informal text sections for them. Other XML comments divide the ontology into sections like “naming properties”. In OMDoc, we were able to model document sections without disrupting the logical structure of the ontology.
3. Some of these comments were attached to individual triples, e. g. foaf:mbox_sha1sum rdf:type owl:DatatypeProperty. Thanks to literate programming in OMDoc, we could precisely add them as informal comments (CMPs) to the respective OMDoc statements.
4. The following properties are inverses of each other: foaf:maker = foaf:made⁻, foaf:depiction = foaf:depicts⁻, foaf:topic = foaf:page⁻, and foaf:primaryTopic = foaf:isPrimaryTopicOf⁻. While for each p = q⁻, an OWL reasoner can infer q = p⁻ using its built-in axioms for DL reasoning, FOAF redundantly declares each inverse relationship for both participating properties for the purpose of documentation.
OMDoc allows for making the difference explicit: For any of the above p, q property pairs, we picked one p and stated p = q⁻ as an axiom, but q = p⁻ as an assertion that can (provably) be derived from the axiom and the semantics of owl:inverseOf⁵, as shown in Fig. 5.3. Domain and range of inverse properties can be handled similarly.
5. We were able to express the non-OWL semantics of foaf:membershipClass. We chose the first-order-logic representation shown in Fig. 5.3.
6. The correspondence of foaf:maker to dc:creator is only defined in prose. The specification suggests using foaf:maker whenever the agent who created something is known by URI, and using the less semantic dc:creator, which has neither range nor domain declared, when the creator is only known by a string. Then, it also informally states a rule that the foaf:name or rdfs:label of the foaf:maker of something is the same as the dc:creator of that thing. The rule can be captured by a first-order-logic expression in OMDoc, or alternatively by an OWL 2 property chain inclusion [CG+08]. The notion that foaf:maker is similar to dc:creator but has a stronger semantics can be captured by having the FOAF theory import the DC theory and defining a view on DC, namely a morphism that maps dc:creator to foaf:maker. Views frequently occur in mathematics. We can, for example, model the theory of integers by a view {○ ↦ +, e ↦ 0} on the theory of monoids, where ○ is the binary operation and e the unit element of the monoid.
7. Finally, we were able to include the informal sections and descriptions of the FOAF specification [BM07] right into the ontology document. This allows for a unified management of the formal specification and its informal explanation, including the introductory chapters and the change log, in a single, coherent document, from which both OWL and XHTML can be generated. The original FOAF specification is generated from the OWL ontology and a set of HTML snippet files with detailed informal descriptions as input using a script, a FOAF-independent version of which is also available [FBS].
³ Note that a way has been found to replace foaf:membershipClass by a semantically equivalent OWL-DL-compatible construct using property restrictions [Alf07]. Nevertheless, we keep this as an example, as it is easy to understand, and as the proposed solution has not yet been officially implemented and is less intuitive for non-experts.
⁴ as can be seen in the mail thread following [Alf07]
⁵ A proof is only required in OMDoc if one wants to do automated theorem proving.
move this example here
This enhanced expressivity of the OMDoc implementation comes at the expense of much more verbosity. While in RDF one can easily attach another axiom to a class (stating, e. g., a subclass relationship or disjointness), most of these triples have to be represented as an individual axiom in OMDoc, unless there is an intuitive way of capturing their semantics as types. While better annotation tools could help (cf. Sect. ??), there is also a mathematical approach to improving this⁶: One could add additional axioms to the OMDoc theory for OWL, which introduce operators for shorthand notations (such as pairwise disjointness of a whole set of classes) that imply multiple atomic statements⁷ – but then all these axioms would have to be applied before generating OWL from OMDoc. This can be done by supporting λ-calculus at the meta level and β-reducing all OMDoc axioms before generating OWL.
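What "applying" such a shorthand before generating OWL amounts to can be illustrated for pairwise disjointness. The Python sketch below simply expands one shorthand statement over a set of classes into the atomic pairwise statements it implies; it is not a β-reduction in OMDoc, the function name is hypothetical, and the ex: class names are invented.

```python
from itertools import combinations


def expand_all_disjoint(classes):
    """Expand one 'all pairwise disjoint' shorthand into atomic triples.

    For n classes this yields n * (n - 1) / 2 owl:disjointWith statements.
    Sorting makes the output order deterministic.
    """
    return [(a, "owl:disjointWith", b)
            for a, b in combinations(sorted(classes), 2)]


# Hypothetical example: three classes declared mutually disjoint at once.
AXIOMS = expand_all_disjoint({"ex:Person", "ex:Group", "ex:Organization"})
```

The quadratic growth of the expanded form is one way to see the verbosity that the shorthand operators are meant to hide from the ontology author.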
Either describe how an integrated ontology engineering workflow would work in SWiM, or actually get it implemented. In the latter case, consider engineering SWiM’s argumentation ontology (system ontology!) in SWiM itself to make it self-contained.
fix
5.3 Metadata
Two aspects of the new OMDoc metadata framework presented in section 2.4.4.3 have been evaluated so far: its compatibility with RDFa
maybe drop this completely; still too early for research-oriented evaluation (SAMSDocs)
⁶ By technical coincidence, another way is possible: writing OWL statements as metadata using the new RDFa syntax. We discourage this, however, as it abuses metadata (= data about data) for information that should actually be part of the proper data, and as no other OMDoc-aware tool except our own OMDoc→OWL translator would be able to interpret the semantics of that syntax.
⁷ A shorthand syntax for pairwise disjointness has also been introduced in OWL 2 [MPSP09], but an ontology engineer usually does not enjoy the freedom of defining additional such shorthands.
5.3.1 RDFa Compatibility
BegOP(22)
22 Old Part: update or discard
Meanwhile, the main challenge was getting the RDFa extraction right. I first implemented XHTML+RDFa support and then generalized that, so I could evaluate our implementation against the W3C RDFa test suite [HY07]. Currently, it passes 90 out of 100 test cases. Some of the cases where it fails do not apply to OMDoc, however: A design goal for RDFa in XHTML had been to avoid any disruption of XHTML structures by annotations – which is not a design goal for OMDoc, where RDFa annotations can only occur in designated metadata blocks, and where I chose to introduce a dedicated element (resource) for creating bnodes. The remaining cases have been documented in the issue tracker on the Krextor homepage [Lan+] and will be resolved soon.
EndOP(22)
5.4 Miscellaneous
properly integrate
5.4.1 JOBAD very preliminary results: feedback from a few persons
BegOP(23)
Proof-of-concept demonstrations of individual JOBAD services can be tried at the JOBAD web site [Job]. Besides that, we conducted two evaluations to analyze the feasibility and scalability of our framework:
1. We loaded a large OMDoc document into our server and activated the elision, subterm folding, and definition lookup services.
2. We integrated an external unit conversion service, which was added after the main phase of the development, to get an understanding of the investment needed to integrate further services.
The first evaluation involved the complete lecture notes of a first-year undergraduate computer science course. These lecture notes are originally maintained in LaTeX with semantic annotations, which can be automatically converted to OMDoc [Koh08d]. The annotations in the source documents comprise content-markup formulae, informal definitions of symbols, and notation definitions. The OMDoc representation is then rendered into the JOBAD format, which is viewed using the JOBAD client, which offers flexible bracket elision, subterm folding, and definition lookup [Job]. So far, we have used this as a stress test, but for the Fall lecture we plan an evaluation where one group of our students will work with the static XHTML version of the lecture notes and a second group with the JOBAD-enriched active document.
The most complicated step in the second evaluation was adapting the string-oriented interface of Stratford’s unit conversion web service to our OpenMath interface. Most of the other required functionality turned out to be already available and just had to be composed. We chose the context menu interface and added a submenu containing the target units.⁸ Checking whether the selected term was a quantity with a unit reduced to looking up its corresponding content markup (cf. Sect. ??) and performing a simple XPath node test on the latter. Sending a string to the web service and waiting for the response is a standard JavaScript function.
Rendering the result of the conversion (after converting it back to OpenMath) is done by another service. Finally, replacing two XML subtrees in a formula (both in the presentation and the content markup) and hiding the previous presentation tree in an maction is a utility function provided by the JOBAD core and also used by other services.
⁸ A static list at the moment; obtaining admissible target units for a given input is supported neither by our client-side implementation nor by the unit conversion web service.
23 Old Part: integrate, maybe split: architectural evaluation, teaching “case study”
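The node test for "is the selected term a quantity with a unit?" mentioned above can be illustrated as follows. The sketch is in Python rather than the JavaScript used by the JOBAD client, and it rests on two assumptions that are not taken from the text: that a quantity is encoded as an application of arith1#times to a number and a unit symbol, and that unit symbols come from the CDs named in UNIT_CDS.

```python
import xml.etree.ElementTree as ET

OM = "{http://www.openmath.org/OpenMath}"        # OpenMath XML namespace
UNIT_CDS = {"units_metric1", "units_imperial1"}  # assumed set of unit CDs


def is_quantity_with_unit(om_xml):
    """Check whether an OpenMath object has the shape times(value, unit)."""
    root = ET.fromstring(om_xml)
    app = root if root.tag == OM + "OMA" else root.find(OM + "OMA")
    if app is None:
        return False
    children = list(app)
    if len(children) != 3:
        return False
    head, _value, unit = children
    return (head.tag == OM + "OMS"
            and head.get("cd") == "arith1" and head.get("name") == "times"
            and unit.tag == OM + "OMS" and unit.get("cd") in UNIT_CDS)


# Hypothetical content markup for "5 metres" and for "1 + 2".
FIVE_METRES = (
    '<OMOBJ xmlns="http://www.openmath.org/OpenMath"><OMA>'
    '<OMS cd="arith1" name="times"/><OMI>5</OMI>'
    '<OMS cd="units_metric1" name="metre"/></OMA></OMOBJ>'
)
ONE_PLUS_TWO = (
    '<OMOBJ xmlns="http://www.openmath.org/OpenMath"><OMA>'
    '<OMS cd="arith1" name="plus"/><OMI>1</OMI><OMI>2</OMI></OMA></OMOBJ>'
)
```

Because the test only inspects the head symbol and the CD of the last argument, it stays cheap enough to run whenever a context menu is opened on a term.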
EndOP(23)
5.4.2 Argumentation preliminary coverage evaluation; no user evaluation done
As this work is still in progress, I focus my evaluation on discussing the extent of argumentation and assistance that my ontology and the implementation support. In section ??, I will provide an outlook on domain-specific case studies that I am planning to conduct myself. Further feedback is anticipated from users of IkeWiki, where I have recently integrated the basic argumentative discussion functionality without domain-specific extensions. For the sake of simplicity, arguments about issues and ideas have been left out in my first prototype in favour of simple agreement or disagreement, but I am planning to complete the coverage of the DILIGENT argumentation ontology. Following the findings of the DILIGENT authors, I do not, however, consider the currently restricted set of available domain-specific issue and idea types an obstacle; I think it just has to be refined to cover the most common situations in mathematical knowledge management. Concerning their restricted set of argument types (e. g. “challenge” and “justification”), the DILIGENT authors have found out in their case studies that this made discussions more effective and focused [Tem+05]; I assume that the same will turn out for my issue and idea types. Note that the current formula for selecting the best idea is quite simplistic. Once arguments have been introduced, it will no longer suffice and should be replaced by a more sophisticated weighting function like the one proposed in [GK97]. As issues in my model refer to knowledge items, and the assistance the system can offer depends on the type of knowledge item, the knowledge must already be structured to some extent. In two situations, this is likely not to be the case: When knowledge about some new topic has not even been conceptualised, or when it has been conceptualised and put on a wiki page, but not yet formalised (here: annotated with a type).
To improve on this, I consider providing a global discussion space, where conceptualisation issues can be raised, as it was originally intended with DILIGENT (cf. section ??), and introducing a generic issue type “needs formalisation”, which can be filed with any knowledge item that does not yet have a type from the domain-specific ontology. Assistance in the latter case would likely require techniques like natural language processing (NLP), which I have not yet considered: Once a knowledge item or discussion post has been given a type, I base all further decisions about assistance exclusively on this type. Obviously, this is only meaningful if the informal text of a knowledge item does not contradict its formal type. Currently, I hold the users responsible for that, but NLP would help here, too.
Finally, I must concede that, while the model that I adopted allows for arguing on wicked problems, those issues for which my system is actually able to offer semi-automatic assistance are not really wicked ones by definition. For a solution to be supported by the system, users are, first of all, required to clearly pinpoint the issue using a type from the domain-specific extension of the argumentation ontology – contradicting the traits of a wicked problem to have “no definitive formulation” and to be “essentially unique” in the sense that no pattern of solution could be applied to a whole class of problems [RW73]. For every type of issue, I support a set of predefined solution patterns, but a wicked problem does “not have an enumerable (or an exhaustively describable) set of potential solutions” [RW73]. In supporting solutions I currently focus on one issue with one knowledge item and neglect the previous history or related issues and knowledge items; however, “every wicked problem can be considered to be a symptom of another problem” [RW73]. Then, my system currently assumes that an issue has definitely been solved once a “decision” reply exists – contradicting the definition that “wicked problems have no stopping rule” [RW73]. However, a user who considers an existing solution inadequate is always free to file a new issue on the affected knowledge item.
now they are in, but not considered for assistance
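To illustrate what a "quite simplistic" selection of the best idea could look like when only agreement and disagreement are recorded, here is a minimal Python sketch. The scoring rule (agreements minus disagreements) is my assumption for the sake of the example and is not quoted from the formula referred to above; the idea identifiers are invented.

```python
def best_idea(replies_by_idea):
    """Pick the idea with the highest (agreements - disagreements) score.

    replies_by_idea maps an idea identifier to a list of reply types,
    each either "agree" or "disagree". Ties are broken alphabetically
    by idea identifier; an empty mapping yields None.
    """
    if not replies_by_idea:
        return None

    def score(idea):
        replies = replies_by_idea[idea]
        return replies.count("agree") - replies.count("disagree")

    # max() returns the first maximal element, so sorting fixes the tie-break.
    return max(sorted(replies_by_idea), key=score)


# Hypothetical discussion state for one issue.
REPLIES = {
    "idea:rename-symbol": ["agree", "disagree"],
    "idea:split-cd": ["agree", "agree"],
    "idea:do-nothing": [],
}
```

A rule of this kind ignores who argued what and why, which is why it would have to give way to a weighting function once arguments are taken into account.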
BegOP(24)
5.4.3 Related work
panta rhei is an interactive and collaborative reader for mathematical documents, currently being evaluated in the educational context of a computer science lecture [Pr ; Mül08a]. With SWiM, it shares the domain of application – in fact, either system can import OMDoc – and the possibility to create typed discussion posts about knowledge items; differences lie in the use of knowledge items – arranged for reading in panta rhei, whereas the main concern of SWiM is to edit them – and in the knowledge representation: SWiM relies on an underlying RDF model and an ontology for browsing, searching, and editing workflows, whereas panta rhei uses hand-made SQL database queries.
A panta rhei page usually contains one exercise or a sequence of one to a few lecture slides, mostly containing some related mathematical statements, such as the definitions of a few symbols with an example where they can occur, or a theorem followed by its proof. However, content authors can assign identifiers to any subitem of a page to make it annotatable. SWiM is less flexible here: Only knowledge items that have their own wiki page are annotatable. While imported documents are automatically split into pages of statement size, authors would have to do this manually to achieve a finer granularity.
In panta rhei, threaded discussion items can be posted on any annotatable knowledge item; in addition, there is a global forum. Each post has to be typed as, e. g., “advice”, “answer”, “comment”, “example”, or “question”⁹. The set of possible types of a post is restricted neither by the type of the knowledge item nor by the post it replies to. Compared to SWiM’s use of an argumentation ontology, which encourages a targeted discussion towards solutions, this potentially makes discussion threads less focused. panta rhei currently uses the types of posts for statistical purposes and for search.
Statistics are currently not computed in SWiM; semantic search is possible and powerful thanks to the support for inline SPARQL queries, but not yet friendly to users who do not know SPARQL. Another annotation-related feature of panta rhei without a SWiM counterpart is the ability to rate knowledge items on a scale from 0 to 10 w. r. t. several measures like their difficulty or helpfulness. While the argumentations in SWiM implicitly rate knowledge items, there is no agile, one-click user interface for this.
⁹ This is a custom vocabulary made for panta rhei, not an argumentation ontology.
24 Old Part: integrate somewhere
EndOP(24)
Acknowledgments
OM wiki eval: thanks to participants
section 5.2: joint work with Michael Kohlhase [LK09], thanks to Siarhei Kuryla
6 Conclusion
• Standards are crucial for interoperability. Theoretically, only the concepts matter, but without standards, a proper implementation becomes impossible (e. g. Dojo/MathML).
• Cruz’ ontology integration approach [CX05]: As I aim at providing structural services for mathematical knowledge at a low entry barrier for developers, I will bridge the semantic gap by extracting structural outlines from XML documents to RDF in terms of appropriate ontologies.
• Ontology engineering “methodology” for domains where more-or-less semantic markup languages already exist
• SWiM is quite fragile. The “minor edits” use case depends on a chain with weak links: correct transformation to the editor, correct RDF extraction, otherwise no correct svn export. (Solution: more unit testing?)
• Permissions not yet handled properly (but will improve with TNTBase anyway): xi:include/refinclude bypasses the permission check (but it would be possible to implement a check there)
6.1 XML to RDF
The Krextor framework supports many XML→RDF conversion tasks and can easily be extended by additional input and output formats and integrated into semantic applications. Thereby, I have opened new paths from the XML layer of the Semantic Web architecture to the RDF and higher layers. When designing new ontologies, knowledge engineers can now take the creative challenge of developing a convenient XML syntax for domain-specific knowledge and provide a Krextor extraction module that translates this XML to RDF in terms of these ontologies. I will continue using Krextor for mathematical markup but am also interested in proving its extensibility on other semantic markup languages.
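The kind of extraction meant here, from an XML document to RDF triples that describe its structural outline, can be illustrated with a toy example. The following Python sketch is not Krextor and does not use its extraction modules; the simplified, namespace-free CD markup and the ontology terms under the ex: prefix are assumptions chosen for brevity.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free stand-in for an OpenMath CD (an assumption).
SAMPLE_CD = """<CD>
  <CDName>arith1</CDName>
  <CDDefinition><Name>plus</Name></CDDefinition>
  <CDDefinition><Name>times</Name></CDDefinition>
</CD>"""


def extract_outline(xml_text):
    """Extract type and part-whole triples from the structural outline."""
    root = ET.fromstring(xml_text)
    cd = root.findtext("CDName").strip()
    triples = [(cd, "rdf:type", "ex:CD")]
    for definition in root.findall("CDDefinition"):
        symbol = f"{cd}#{definition.findtext('Name').strip()}"
        triples.append((symbol, "rdf:type", "ex:CDDefinition"))
        triples.append((cd, "ex:hasPart", symbol))  # hypothetical property
    return triples
```

Only the outline is turned into triples; the contents of the definitions stay in XML, which reflects the division of labour between the XML and RDF layers described above.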
6.2 Metadata
New possibilities opened up:
1. There is a novel RDFa browser: one would only have to teach it a single peculiarity of the RDFa integration in OMDoc, and one could already browse OMDoc with it.
2. There is a new ontology, e. g. for certification or learning. Then we (as authors) first annotate OMDoc in strict RDFa syntax with this ontology, because at first that is the only way. Later, once certain practices of using this ontology in OMDoc crystallize, we extend OMDoc by a pragmatic syntax for this ontology.
repeat this statement earlier in the thesis
compare SIOC ME
6.3 JOBAD
We presented JOBAD, our architecture for active mathematical documents. Our documents are generated dynamically from content markup and viewed in a web browser, via which the reader can interactively change both content and form of the document. JOBAD constitutes the reader interaction component of our research group’s framework for mathematical documents. As such, it is fully integrated into the authoring [Koh08d; Lan08b], notation management [KMR08], and storage [Tnt] workflows developed by our group. We gently extended the Presentation MathML format to create an interface language, in which the document server can embed into the served document information about interactivity or instructions on how to retrieve that information. This extension is backwards compatible in the sense that the markup is still valid MathML, and switching off JavaScript yields the same static documents as before.
We have implemented and evaluated an initial set of services that constitute a representative selection of the possibilities we envision. Folding and flexible elision work locally, type and definition lookup retrieve additional information based on a symbol URI, and unit conversion sends a content markup object as an argument to a web service. Folding and elision are based on presentation markup generated by the server a priori. For lookup and unit conversion, the JOBAD modules are passed initialization parameters that instruct them about the server and its URL format. For unit conversion, parallel markup is utilized to obtain the content representation of a presentation expression.
A specific design feature of JOBAD is its extensibility. Offering new services for JOBAD-enabled interactive documents requires adding very little new JavaScript code. Adding new user interaction components and binding them to JOBAD services is possible with minimal effort. Finally, the JOBAD client code requires only very few properties of the specific server backends, so that the same client can easily be used with different web services even in the same document.
6.4 Argumentation
I have justified the need for structured and annotated discussions in a collaborative structured knowledge base. With a domain-specific argumentation ontology, I have covered common types of issues that can occur with mathematical knowledge items, common types of ideas on how to solve these issues, and a semi-automatic assistance in implementing a solution approved by knowledge engineers. The discourse that led to the implementation of a solution can be traced back transparently from the affected knowledge items, which supports the experience management of the community. I have implemented and demonstrated a first proof of concept in the OMDoc-based semantic wiki SWiM but consider the methodology easily transferable to other collaborative knowledge management systems, as it is largely based on an abstract ontology. As our system is built on RDF, semantic web technologies can be used for integration with other systems. For example, the Exhibit framework [HKM07] can consume RDF data and provide us with a timeline visualisation of the argumentation process (fig. 6.1). If we follow the linked data guidelines for semantic web publishing, the knowledge items and discussions in our system will be visible to linked data crawlers such as Sindice [TDO07] and browsers such as Tabulator [BL+06]. If we can crosslink with other sites, discussions can refer to knowledge items across system boundaries.
Figure 6.1: The Exhibit timeline view of fig. 4.6, starting at a hard theorem and ending at a helpful example.
left out MoC, as we do not yet have it for OMDoc docs
6.5 Future Work
order properly
• OpenMath CD review
• flyspeck idea
6.5.1 Editing
Creating custom CDs and symbol definitions is a common use case OpenMath has been designed for [DL08]: Authors need to do it when the mathematical concepts they are concerned with have not been covered by the existing CDs sufficiently deeply or rigorously [DL08; AEB07]. Therefore, SWiM needs better support for creating new CDs.
So far, SWiM assumes one notation definition per symbol. mmlproc supports callbacks to an algorithm that selects the most appropriate out of a set of multiple possible renderings for a symbol [KMR08]. In future, it is planned to provide a user interface inside SWiM that lets users select their preferred rendering for every symbol.
While SWiM supports browsing CDs well, with typed navigational links and symbols in formulæ linked to their definitions, searching formulæ is not yet supported. Search could be provided by the MathWebSearch engine [Koh+08], which would be instructed to crawl SWiM’s database of documents. Editing CDs and their subparts currently works best if an existing CD is imported and then split automatically. We are aware of a demand for better refactoring support within SWiM [LHC08b], which remains to be implemented.
discuss whether it was reasonable to go the RDF path for embedding OWL into OMDoc. Alternative: functional representation
6.5.2 Knowledge Representation
A future research direction that I want to explore is adding extraction rules as annotations to XML schema languages like RELAX NG [Rel], thereby unifying two tasks that belong together but have been separated so far: specifying the syntax and the semantics of an XML language.
6.5.3 Ontology Engineering
• Ontology engineering, particularly for modular/heterogeneous/collaborative ontologies, needs better documentation
• Particular challenges:
  – making modularity explicit (cf. OWL imports, SUMO module comments)
  – formally representing knowledge that is “too expressive”
  – TODO: more?
• OMDoc offers a close-to-literate-programming approach to engineering formal knowledge and interlinking it with documentation
• employ this for documenting semantic web ontologies
• goals:
  – make ontology [specs] more comprehensible
  – provide an ontology documentation workflow as an integrated part of ontology engineering methodologies
TODO: keep aligned with MOLE WPs
• roadmap:
  1. extend OMDoc to an ontology language:
     a) shorthand syntaxes for common ontology [documentation] primitives
     b) document structured formulæ (improve statement-level parallel markup, harmonize attributions and metadata on object level)
     c) literate programming (capture common ontology documentation practices, formulate workflow)
     d) represent foundational logics in OMDoc (as done for OWL): meta-theory with vocabulary, optionally semantics
     e) ensure compatibility with existing ontology-based tools by translation from OMDoc to the original ontology languages; max out the documentation/annotation capabilities of the target language and also somehow preserve any documentation that exceeds them (for round-tripping)
  2. tools
     a) translate existing ontologies to OMDoc (bootstrapping when ontologies existed, or when non-OMDoc tools are easier to use)
     b) add documentation to existing ontologies (extend Protégé)
     c) formalize informal ontologies: invasive approach with CPoint, sTeX
     d) collaboration: extend SWiM into an OMDoc ontology editor
     e) materialize design discourse: dynamic argumentation → persistent documentation
     f) automatic identification of ontological patterns in [rhetorically] structured text
  3. making documented ontologies comprehensible
     a) improving rendering output quality: symbol notations, statement-level rendering (e. g. for Manchester syntax or groups of axioms), theory morphisms, literate programming
     b) adaptive specifications: different notations, user profiles, aggregation
     c) interactive services: JOBAD with ontology-based reasoning services, interactive editing with consistency checks
  4. case studies with practical ontologies
     – requirements:
       – “complex” ontology with documentation available
       – still in development, so we can interview the developers
     – evaluation metrics for coverage of formal translation and informal documentation; questionnaires for usability of documentation tools and rendered/interactive documents
6.5.4 Scientific Publishing
1. Annotate the MKM proceedings
2. “sTeX light” as the format; every author writes their own paper
3. Import into SWiM
4. Categorize
5. Rate (cf. the JEM criteria, Deliverables 2.1/2.2); overall, this works as in JEM, only more semantically
6. Export user-defined proceedings (PDF)
7. Integration into EasyChair: export a paper directly into the wiki
8. Copyright question → wiki licence?
6.5.5 Argumentation
• idea for systematic evaluation: a faulty knowledge base, once with assistance, once without. Question: which user group gets more problems solved (or faster, or with less discussion)?
• implementing more problem-solving assistance: towards better refactoring
• corpus analysis (beyond what we’ve done with the OpenMath CD mails)
• involving agents (that do case-based reasoning and remind users of, e. g., common objections)
• use the same structure for proof explanation
TODO: integrate the following old parts
Fraser et al. have developed an argumentation ontology for e-mails [FHT06]. They annotate shallowly, on the top level of every e-mail, to keep the annotation easy for users. That means, however, that if an e-mail agrees with some statements of another e-mail but disagrees with others, the value of the argumentative annotation is limited. This issue can also occur in our use cases, which is why we intend to solve it in the near future by allowing the representation of fine-grained structures within posts.
In this paper we presented the first steps that we have made towards creating an Argumentation Module for SIOC. We started with a series of use cases that have two facts in common: (i) their structure can be represented semantically with SIOC, and (ii) part of the content created by the users has an implicit argumentative structure. Our goals were to externalize these argumentative discussions and make them explicit via models that are machine-understandable. The model that we have proposed is in its initial stage, and thus we are looking forward to improving it based on the community’s feedback. Most of the use cases presented here deal with problem solving, but we believe that another important benefit of making argumentative structures on social media sites explicit will be a precise documentation of the discourses that led to earlier decisions. This strengthens the collective memory of a community and will allow new members to retrace and understand the steps of their “ancestors”.
For future work, we consider unleashing the potential of SIOC in representing distributed conversations and interlinking argumentations across multiple social media sites. An analysis of the RDF graphs of the argumentations on a single site enables the identification of the merited members of one community, e. g. by counting how many of their ideas have received positive feedback (by Arguments or Positions) and finally got accepted (by Decisions). Then, by making the data of several SIOC-enabled social media sites available to a linked data crawler such as Sindice [TDO07], we can identify traces of the same users in other communities. Such merited users could then automatically be promoted to moderators that are allowed to take decisions. Argumentation in distributed blog conversations can also be an interesting topic to explore in this way.
A second direction we want to follow is to model and enable the representation of fine-grained structures for argumentation in social media sites. Some of the main challenges here are: the creation of the appropriate underlying structures and their links to the SIOC concepts, proper identification of such structures for building the argumentation model, and how to make users willing to split their discourse and to describe its rhetorical structure, all without disrupting their normal flow of work. In terms of deployment, an interesting direction would be enhancing the existing wiki talk pages (e. g. as used on Wikipedia for discussions and issue solving [LHC08a]) with a structured argumentation module as described here. A benefit of doing so can be a more efficient workflow for improving wiki content.
Two practical settings in which I will evaluate my system are the Flyspeck and OpenMath projects. Flyspeck is a large-scale proof formalisation effort concerned with developing a machine-verifiable representation of a proof of the Kepler sphere packing conjecture [LMR08]. I have gathered first requirements for supporting this project with a wiki and consider support for discussing formalisation issues and for refactoring formal structures highly important. In OpenMath, SWiM will be used for editing content dictionaries (collections of definitions of mathematical symbols). Conceptualisation and formalisation of symbols and their notations need to be supported [Lan08a]. A key question that I anticipate these case studies will answer is to what extent the automated identification of “winning” solutions and the support offered for implementing them will be found satisfactory by knowledge engineers: whether the 80/20 rule will apply (i. e. 80% of the everyday issues will be solvable with semi-automatic assistance implemented for 20% of all possible solutions), or whether a large percentage of issues turns out to be too wicked. In the latter case I still hope that the argumentation ontology will support a more focused and productive discussion about wicked problems than a system with unstructured discussions and thus facilitate finding and implementing solutions even without further automatic assistance.
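The identification of merited community members mentioned in this section (counting how many of a user's ideas received positive feedback and were finally accepted by a decision) could be sketched as follows. This is only a minimal illustration in Python over plain triples: the predicate names `ex:creator`, `ex:agrees_with` and `ex:accepts` and all identifiers in the sample data are hypothetical placeholders, not terms of the SIOC argumentation module.

```python
from collections import Counter

# Hypothetical (subject, predicate, object) triples. The "ex:" predicates
# are illustrative placeholders, not actual SIOC argumentation terms.
TRIPLES = [
    ("idea1", "ex:creator", "alice"),
    ("idea2", "ex:creator", "bob"),
    ("idea3", "ex:creator", "alice"),
    ("pos1", "ex:agrees_with", "idea1"),
    ("pos2", "ex:agrees_with", "idea2"),
    ("pos3", "ex:disagrees_with", "idea3"),
    ("dec1", "ex:accepts", "idea1"),
    ("dec2", "ex:accepts", "idea3"),
]


def merit_scores(triples):
    """Count, per user, the ideas that got positive feedback AND were accepted."""
    creator = {s: o for s, p, o in triples if p == "ex:creator"}
    supported = {o for s, p, o in triples if p == "ex:agrees_with"}
    accepted = {o for s, p, o in triples if p == "ex:accepts"}
    scores = Counter()
    for idea in supported & accepted:
        if idea in creator:
            scores[creator[idea]] += 1
    return scores


SCORES = merit_scores(TRIPLES)
```

In this sample only `idea1` is both supported and accepted, so only its creator gains a point. A real analysis would run a comparable query against the RDF graph of a site; the threshold above which a user counts as merited is left open here.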
6.5.6 Change Management
• simple workflow sketch [Lan07b]
• diffing [LK08b]
6.5.7 Social Software
CoP: reviewers liked the use case in [Lan07b], but the environment for realizing this is still not there
6.5.8 Integration with other Mathematical Systems
big picture of KWARC system integration: RDF-centric backend services go to TNTBase, SWiM remains the wiki frontend for browsing/editing the database
6.5.8.1 JOBAD Future
Future work will be based on this, and we intend to rapidly develop more services, but also to invite contributions from external developers. Due to the modularity of our framework, we expect that this workload can be divided into small, manageable units that can be handled efficiently by students. In particular, we intend to approach the following services:
Notation selection: Our rendering service can already annotate every rendered symbol with a reference to the notation definition in the backend that was used for rendering it [Koh+09b]. This information can be used to ask the backend for alternative notations, to allow the user to select from them, and to have the current formula re-rendered accordingly.
Guided tour (extension of lookup): This service generates a linear tutorial containing an explanation of every symbol in the current selection, and of every symbol occurring in these explanations, and so on, until some foundational theory is reached.
Flattening: Many documents consist of components that are combined by a module system (see [RK08]). A flattening service replaces import links with the (possibly translated) copy of the imported document.
Search: Our group has developed a semantic search engine for mathematical formulae [Koh+08]. Therefore, a service that searches the web (or the server database) for the selected expression will be easy to realize.
Links to web resources: The OpenMath wiki [Lan] not only provides symbol definitions, but also hosts discussions about them. Its architecture allows for linking symbols to further web resources, e. g. Wikipedia articles about mathematical concepts, which can then be made available in a document.
Adaptive display of statement-level structures: On the level of definitions, theorems, and proofs, we generate a different kind of parallel markup from OMDoc sources, namely XHTML+RDFa. We have already used this for visualizing rhetorical structures in mathematical documents and plan to extend it to structured proofs.
Editing: Our group has developed the Sentido formula editor [LGP08]. An edit service will pass the selected term to a Sentido popup window and eventually replace it in the current document.
Saving: After a user has adapted a document, it is desirable to upload its configuration to the database. This does not apply to JOBAD integration into an environment that does not have a notion of user accounts (such as MMT), or that knows user accounts but not user profiles or sessions (such as TNTBase). However, JOBAD will also be integrated into environments that have profiles and sessions; think of SWiM and panta rhei.
Furthermore, we will integrate the JOBAD architecture into our various integrated document management systems, such as the semantic wiki SWiM [LGP08], and the panta rhei document browser and community tool [Pr].
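The guided tour service described above amounts to computing a closure over "symbol occurs in the explanation of symbol" links that stops at foundational symbols. A minimal sketch in Python follows, assuming the dependencies are already available as a dictionary; the symbol names, the `EXPLANATION_USES` table and the `FOUNDATIONAL` set are hypothetical sample data and not part of the JOBAD API.

```python
from collections import deque

# Hypothetical sample data: each symbol maps to the symbols that occur
# in its explanation. In a real service this would come from the backend.
EXPLANATION_USES = {
    "group": ["monoid", "inverse"],
    "monoid": ["semigroup", "unit"],
    "inverse": ["unit"],
    "semigroup": ["set", "operation"],
    "unit": ["set"],
}
# Symbols of the (assumed) foundational theory, where the tour stops.
FOUNDATIONAL = {"set", "operation"}


def guided_tour(selection, uses, foundational):
    """Return a linear list of symbols to explain, breadth-first from the selection."""
    tour, seen = [], set()
    queue = deque(selection)
    while queue:
        sym = queue.popleft()
        if sym in seen or sym in foundational:
            continue  # already scheduled, or no explanation needed
        seen.add(sym)
        tour.append(sym)
        queue.extend(uses.get(sym, []))
    return tour


TOUR = guided_tour(["group"], EXPLANATION_USES, FOUNDATIONAL)
```

The breadth-first order lists the selected symbols first and their prerequisites later; a tutorial that introduces prerequisites first could simply present the list in reverse, which is a presentation choice the text above leaves open.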
A Namespace Prefixes
Table A.1 summarizes the namespace prefixes that are used throughout this thesis for XML elements and URIs in RDF statements.
Table A.1: Common namespace prefix↦URI bindings used in this thesis
Prefix    Language/Vocabulary     URI
cc        ccREL                   http://creativecommons.org/ns# ¹
cd        OpenMath CDs            http://www.openmath.org/OpenMathCD
dc        DC Elements             http://purl.org/dc/elements/1.1/
dct       DCMI Terms              http://purl.org/dc/terms/
foaf      FOAF                    http://xmlns.com/foaf/0.1/
gams      GAMS                    http://gams.nist.gov#
h         HELM objects            http://www.cs.unibo.it/~schena/schema-h.rdf#
m         MathML                  http://www.w3.org/1998/Math/MathML
marcrel   MARC relators           http://www.loc.gov/loc.terms/relators/
mnp       Math-Net Properties     http://www.iwi-iuk.org/material/RDF/1.1/Schema/Property/mnp#
mom       MONET: OpenMath         http://www.openmath.org/cd#
o         OMDoc                   http://omdoc.org/ns
om        OpenMath                http://www.openmath.org/OpenMath
oms       OpenMath symbols        http://www.openmath.org/cd/
omv       OMV                     http://omv.ontoware.org/2005/05/ontology#
oo        OMDoc ontology          http://omdoc.org/ontology#
owl       OWL                     http://www.w3.org/2002/07/owl#
rdf       RDF                     http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs      RDFS                    http://www.w3.org/2000/01/rdf-schema#
sioc      SIOC                    http://rdfs.org/sioc/ns#
sioc_arg  SIOC Argumentation      http://rdfs.org/sioc/argument#
sioc_t    SIOC Types              http://rdfs.org/sioc/types#
xhv       XHTML vocabulary        http://www.w3.org/1999/xhtml/vocab#
xsd       XML Schema datatypes ²  http://www.w3.org/2001/XMLSchema#
Section references given in the table (assignment to rows not recoverable): 2.1.3; 2.2.6.3; 2.3.1; 2.3.2; 2.3.3; 2.4.1.1; 2.4.1.3; 2.4.4.2; 2.4.4.3; 3.2.2
¹ This is the namespace of the ccREL RDF vocabulary. OMDoc 1.2 used a version without a trailing hash for the XML elements in the CC module, but we will now deprecate this version.
² The URI with a trailing hash is suitable for use in RDF. For XML namespaces, there is an alternative URI without that hash.
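To illustrate how the bindings of Table A.1 are used, the following minimal Python sketch expands a prefixed name such as `foaf:name` into a full URI by concatenating the namespace URI and the local name. Only a subset of the bindings is copied here; the function name `expand` and its error handling are assumptions made for this illustration and do not describe any tool used in this thesis.

```python
# A subset of the prefix bindings from Table A.1.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "sioc": "http://rdfs.org/sioc/ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full URI (hypothetical helper)."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown or missing prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


FOAF_NAME = expand("foaf:name")
XSD_DATE = expand("xsd:date")
```

Because expansion is plain concatenation, the trailing `#` or `/` of each namespace URI matters, which is why the footnotes to Table A.1 distinguish URIs with and without a trailing hash.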
B Surveys
B.1 Reporting and Solving Issues with Mathematical Knowledge Items
Bibliography
[AB08]
Ben Adida and Mark Birbeck. RDFa Primer. Bridging the Human and Data Webs. W3C Working Group Note. World Wide Web Consortium (W3C), 2008. url: http://www.w3.org/TR/2008/NOTE-xhtml-rdfa-primer-20081014/. See p. 61.
[Abe+08]
Hal Abelson, Ben Adida, Mike Linksvayer, and Nathan Yergler. ccREL: The Creative Commons Rights Expression Language. Tech. rep. Creative Commons, 2008. url: http://wiki.creativecommons.org/images/d/d6/Ccrel-1.0.pdf (visited on 2009/10/22). See pp. 21, 45, 57, 62.
[ABT04]
Andrea Asperti, Grzegorz Bancerek, and Andrej Trybulec, eds. Mathematical Knowledge Management, MKM’04. LNAI 3119. Springer Verlag, 2004.
[Acm]
The 1998 ACM Computing Classification System. 1998. url: http://www.acm.org/about/class/ccs98 (visited on 2009/11/18). See p. 20.
[Act]
ActiveMath. url: http://www.activemath.org/ (visited on 2009/10/22). See pp. 21, 93.
[Adi+08]
Ben Adida, Mark Birbeck, Shane McCarron, and Steven Pemberton. RDFa in XHTML: Syntax and Processing. W3C Recommendation. World Wide Web Consortium (W3C), 2008. url: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/. See pp. 35, 49, 58, 59.
[AEB07]
Miguel A. Abánades, Jesús Escribano, and Francisco Botana. “First Steps on Using OpenMath to Add Proving Capabilities to Standard Dynamic Geometry Systems”. In: Towards Mechanized Mathematical Assistants. MKM/Calculemus 2007. Ed. by Manuel Kauers, Manfred Kerber, Robert Miner, and Wolfgang Windsteiger. LNAI 4573. Springer Verlag, 2007, pp. 131–145. See p. 149.
[Akh+08]
Waseem Akhtar, Jacek Kopecký, Thomas Krennwallner, and Axel Polleres. “XSPARQL: Traveling between the XML and RDF worlds – and avoiding the XSLT pilgrimage”. In: ESWC. Ed. by Sean Bechhofer, Manfred Hauswirth, Jörg Hoffmann, and Manolis Koubarakis. Vol. 5021. Lecture Notes in Computer Science. Springer, 2008. See p. 115.
[Alf07]
Ron Alford. PROPOSAL: Deprecate membershipClass, add memberOf. May 25, 2007. url: http://lists.foaf-project.org/pipermail/foaf-dev/2007-May/008551.html. See p. 141.
[And]
Eric Andrès. LaTeX2OQMath. url: http://www.activemath.org/~eandres/l2o.php (visited on 2010/01/06). See pp. xiii, 85.
[Ank+06]
Anupriya Ankolekar, Katia Sycara, James Herbsleb, Robert Kraut, and Chris Welty. “Supporting Online Problem-Solving Communities with the Semantic Web”. In: Proc. 15th International Conference on World Wide Web (WWW’06). 2006, pp. 575–584. See p. 75.
[Ank+08]
Anupriya Ankolekar, Markus Krötzsch, Thanh Tran, and Denny Vrandečić. “The two cultures: Mashing up Web 2.0 and the Semantic Web”. In: Web Semantics 6.1 (2008), pp. 70–75. See pp. 6, 94.
[Aus+09]
Ron Ausbrooks, Stephen Buswell, David Carlisle, Giorgi Chavchanidze, Stéphane Dalmas, Stan Devitt, Angel Diaz, Sam Dooley, Roger Hunter, Patrick Ion, Michael Kohlhase, Azzeddine Lazrek, Paul Libbrecht, Bruce Miller, Robert Miner, Murray Sargent, Bruce Smith, Neil Soiffer, Robert Sutor, and Stephen Watt. Mathematical Markup Language (MathML) Version 3.0. W3C Candidate Recommendation. World Wide Web Consortium (W3C), 2009. url: http://www.w3.org/TR/2009/CR-MathML3-20091215. See pp. 22, 55, 66, 68, 69.
[Aut+07]
Serge Autexier, Armin Fiedler, Thomas Neumann, and Marc Wagner. “Supporting User-Defined Notations When Integrating Scientific Text-Editors with Proof Assistance Systems”. In: Towards Mechanized Mathematical Assistants. MKM/Calculemus 2007. Ed. by Manuel Kauers, Manfred Kerber, Robert Miner, and Wolfgang Windsteiger. LNAI 4573. Springer Verlag, 2007, pp. 176–190. See pp. 31, 73, 117.
[Aut+08]
Serge Autexier, John Campbell, Julio Rubio, Volker Sorge, Masakazu Suzuki, and Freek Wiedijk, eds. Intelligent Computer Mathematics, 9th International Conference, AISC 2008, 15th Symposium, Calculemus 2008, 7th International Conference, MKM 2008, Birmingham, UK, July 28 – August 1, 2008, Proceedings. LNAI 5144. Springer Verlag, 2008.
[BA08]
Joachim Baumeister and Martin Atzmüller, eds. Wissens- und Erfahrungsmanagement LWA (Lernen, Wissensentdeckung und Adaptivität) Conference Proceedings. Vol. 448. 2008.
[BB07]
Uldis Bojārs and John G. Breslin. SIOC Core Ontology Specification. W3C Member Submission. World Wide Web Consortium (W3C), 2007. url: http://www.w3.org/Submission/2007/SUBM-sioc-spec-20070612/. See p. 46.
[BBL08]
David Beckett and Tim Berners-Lee. Turtle – Terse RDF Triple Language. W3C Team Submission. World Wide Web Consortium (W3C), 2008. url: http://www.w3.org/TeamSubmission/2008/SUBM-turtle-20080114/. See pp. 12, 112.
[Bec04a]
Dave Beckett. “Modernising Semantic Web Markup”. In: XML Europe 2004, 18–21 April 2004. Amsterdam, The Netherlands 2004. url: http://www.dajobe.org/papers/xmleurope2004/. See pp. 12, 112.
[Bec04b]
Dave Beckett. RDF/XML Syntax Specification (Revised). W3C Recommendation. World Wide Web Consortium (W3C), 2004. url: http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/. See pp. 12, 49, 53, 112.
[Bec+08]
Sean Bechhofer, Manfred Hauswirth, Jörg Hoffmann, and Manolis Koubarakis, eds. The Semantic Web: Research and Applications, 5th European Semantic Web Conference, ESWC 2008, Tenerife, Spain, June 1-5, 2008, Proceedings. Vol. 5021. Lecture Notes in Computer Science. Springer, 2008.
[Ber]
Berkeley DB XML. url: http://www.oracle.com/database/berkeley-db/xml/ (visited on 2009/10/22). See p. 109.
[Ber06]
Anders Berglund. Extensible Stylesheet Language (XSL) Version 1.1. W3C Recommendation. World Wide Web Consortium (W3C), 2006. url: http://www.w3.org/TR/2007/REC-xsl11-20061205/. See p. 11.
[Ber+07]
Anders Berglund, Scott Boag, Don Chamberlin, Mary F. Fernández, Michael Kay, Jonathan Robie, and Jérôme Siméon. XML Path Language (XPath) 2.0. W3C Recommendation. World Wide Web Consortium (W3C), 2007. url: http://www.w3.org/TR/2007/REC-xpath20-20070123/. See p. 11.
[Ber+09]
Diego Berrueta, Dan Brickley, Stefan Decker, Sergio Fernández, Christoph Görn, Andreas Harth, Tom Heath, Kingsley Idehen, Kjetil Kjernsmo, Alistair Miles, Alexandre Passant, Axel Polleres, and Luis Polo. SIOC Core Ontology Specification. Ed. by Uldis Bojārs and John G. Breslin. Version 1.31. 2009. url: http://rdfs.org/sioc/spec/ (visited on 2009/10/27). See pp. viii, 46, 77.
[BG04]
Dan Brickley and Ramanathan V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. World Wide Web Consortium (W3C), 2004. url: http://www.w3.org/TR/2004/REC-rdf-schema-20040210/. See pp. 15, 91.
[Bir09]
Mark Birbeck. Proposal for ‘URIs everywhere’. Nov. 25, 2009. url: http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009Nov/0081.html. See p. 60.
[Biz+08]
Chris Bizer, Mark Butler, Stephen Garland, David Huynh, David Karger, Ryan Lee, Stefano Mazzocchi, Emmanuel Pietriga, Dennis Quan, and Karun Bakshi. Fresnel – Display Vocabulary for RDF. Tech. rep. World Wide Web Consortium (W3C), 2008. url: http://www.w3.org/2005/04/fresnel-info/. See p. 13.
[BK07]
Grzegorz Bancerek and Michael Kohlhase. “Towards a Mizar Mathematical Library in OMDoc Format”. In: From Insight to Proof: Festschrift in Honour of Andrzej Trybulec. Ed. by R. Matuszewski and A. Zalewska. Vol. 10(23). Studies in Logic, Grammar and Rhetoric. University of Białystok, 2007, pp. 265–275. url: http://kwarc.info/kohlhase/papers/trybook.pdf. See p. 25.
[BL+06]
Tim Berners-Lee, Yuhsin Chen, Lydia Chilton, Dan Connolly, Ruth Dhanaraj, James Hollenbach, Adam Lerer, and David Sheets. “Tabulator: Exploring and Analyzing linked data on the Semantic Web”. English. In: Proceedings of the 3rd International Semantic Web User Interaction Workshop (SWUI06). 2006. See p. 149.
[BLFM05]
Tim Berners-Lee, Roy T. Fielding, and Larry Masinter. Uniform Resource Identifier (URI): Generic Syntax. RFC 3986. Internet Engineering Task Force, 2005. url: http://www.ietf.org/rfc/rfc3986.txt. See p. 9.
[BLHL01]
Tim Berners-Lee, James Hendler, and Ora Lassila. “The Semantic Web - Computers navigating tomorrow’s Web will understand more of what’s going on making it more likely that you’ll get what you really want”. In: Scientific American 284 (2001). See p. 6.
[BM04a]
M. Bidoit and Peter D. Mosses. CASL — the Common Algebraic Specification Language: User Manual. Vol. 2900. LNCS. Springer Verlag, 2004. See p. 25.
[BM04b]
Paul V. Biron and Ashok Malhotra. XML Schema Part 2: Datatypes Second Edition. W3C Recommendation. World Wide Web Consortium (W3C), 2004. url: http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/. See p. 62.
[BM07]
Dan Brickley and Libby Miller. FOAF Vocabulary Specification 0.91. Tech. rep. ILRT Bristol, 2007. url: http://xmlns.com/foaf/spec/20071002.html. See pp. 46, 140, 142.
[BM09]
Mark Birbeck and Shane McCarron. CURIE Syntax 1.0. A syntax for expressing Compact URIs. W3C Candidate Recommendation. World Wide Web Consortium (W3C), 2009. url: http://www.w3.org/TR/2009/CR-curie-20090116. See p. 59.
[Boj+08]
Uldis Bojārs, John G. Breslin, Vassilios Peristeras, Giovanni Tummarello, and Stefan Decker. “Interlinking the Social Web with Semantics”. In: IEEE Intelligent Systems 23.3 (May/June 2008). See p. 46.
[BP08]
Joachim Baumeister and Frank Puppe. “Web-based Knowledge Engineering using Knowledge Wikis”. In: Proc. of the AAAI 2008 Spring Symposium on “Symbiotic Relationships between Semantic Web and Knowledge Engineering”. 2008, pp. 1–13. See p. 122.
[Brä05]
Andreas Brändle. “Zu wenige Köche verderben den Brei. Eine Inhaltsanalyse der Wikipedia aus Perspektive der journalistischen Qualität, des Netzeffekts und der Ökonomie der Aufmerksamkeit.” MA thesis. Universität Zürich, 2005. See p. 121.
[Bra+08]
Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, and François Yergeau. Extensible Markup Language (XML) 1.0 (Fifth Edition). W3C Recommendation. World Wide Web Consortium (W3C), 2008. url: http://www.w3.org/TR/2004/REC-xml-20081126. See pp. 10, 11.
[Bra97]
Scott Bradner. Key words for use in RFCs to indicate requirement levels. RFC 2119. Internet Engineering Task Force, 1997. url: http://www.ietf.org/rfc/rfc2119.txt. See p. 69.
[Bru87]
N. G. de Bruijn. “The Mathematical Vernacular, a language for mathematics with typed sets”. In: Proceedings of the Workshop on Programming Languages. Ed. by P. Dybjer et al. 1987. See p. 2.
[BS05]
Jonathan Borwein and Terry Stanway. “Knowledge and Community in Mathematics”. In: The Mathematical Intelligencer 27.2 (2005), pp. 7–16. See p. 5.
[BTS+]
Abraham Bernstein, Jonas Tappolet, Henry Story, et al. baetle – Bug And Enhancement Tracking Language. url: http://code.google.com/p/baetle (visited on 2009/10/27). See p. 75.
[Buf+08]
Michel Buffa, Fabien Gandon, Guillaume Ereteo, Peter Sander, and Catherine Faron. “SweetWiki: A semantic wiki”. In: Web Semantics: Science, Services and Agents on the World Wide Web (2008). doi: http://doi.acm.org/10.1016/j.websem.2007.11.003. See p. 122.
[Bug]
Bugzilla. url: http://www.bugzilla.org (visited on 2009/10/27). See p. 75.
[Bus+04]
Stephen Buswell, Olga Caprotti, David P. Carlisle, Michael C. Dewar, Marc Gaetano, and Michael Kohlhase. The Open Math Standard, Version 2.0. Tech. rep. The Open Math Society, 2004. url: http://www.openmath.org/standard/om20. See pp. 23, 35, 133.
[Car+05]
Jeremy J. Carroll, Pat Hayes, Christian Bizer, and Patrick Stickler. “Named Graphs, Provenance and Trust”. In: Proceedings of the 14th WWW conference. Ed. by Allan Ellis and Tatsuya Hagino. ACM Press, 2005, pp. 613–622. isbn: 1-59593-046-9. See p. 49.
[Car+09]
Jacques Carette, Lucas Dixon, Claudio Sacerdoti Coen, and Stephen M. Watt, eds. MKM/Calculemus 2009 Proceedings. LNAI 5625. Springer Verlag, 2009. See p. 174.
[CB87]
Jeff Conklin and Michael L. Begeman. “gIBIS: A Hypertext Tool for Team Design Deliberation”. In: ACM Hypertext. ACM Press, 1987, pp. 247–251. See pp. 74, 76.
[CC09]
Larry Cable and Thorick Chow. Streaming API for XML. Java Specification Request (JSR) 173. Version Maintenance Draft Review 3. 2009. url: http://jcp.org/en/jsr/detail?id=173 (visited on 2009/12/07). See p. 12.
[CDT04]
Olga Caprotti, Mike Dewar, and Daniele Turi. “Mathematical Service Matching Using Description Logic and OWL”. In: Mathematical Knowledge Management, MKM’04. Ed. by Andrea Asperti, Grzegorz Bancerek, and Andrej Trybulec. LNAI 3119. Springer Verlag, 2004, pp. 73–87. See pp. 34, 44, 45.
[Çel08]
Tantek Çelik. hCalendar. Microformat specification. Technorati, 2008. url: http://microformats.org/wiki/hcalendar (visited on 2009/10/22). See p. 114.
[CG+08]
Bernardo Cuenca Grau, Ian Horrocks, Boris Motik, Bijan Parsia, Peter Patel-Schneider, and Ulrike Sattler. “OWL 2: The next step for OWL”. In: Web Semantics: Science, Services and Agents on the World Wide Web 6.4 (2008), pp. 309–322. See pp. 49, 142.
[CK07]
Pierre Corbineau and Cezary Kaliszyk. “Cooperative Repositories for Formal Proofs”. In: Towards Mechanized Mathematical Assistants. MKM/Calculemus 2007. Ed. by Manuel Kauers, Manfred Kerber, Robert Miner, and Wolfgang Windsteiger. LNAI 4573. Springer Verlag, 2007, pp. 221–234. See p. 118.
[CM01]
James Clark and Makoto Murata. RELAX NG Specification. Tech. rep. OASIS, 2001. url: http://www.relaxng.org/spec-20011203.html. See p. 11.
[CM05]
Dan Connolly and Libby Miller. RDF Calendar. W3C Interest Group Note. World Wide Web Consortium (W3C), 2005. url: http://www.w3.org/TR/2005/NOTE-rdfcal-20050929/. See p. 114.
[Cnx]
ConneXions. url: http://cnx.org (visited on 2009/10/22). See p. 17.
[CO01]
Olga Caprotti and Martijn Oostdijk. “On Communicating Proofs in Interactive Mathematical Documents”. In: Proceedings of Artificial Intelligence and Symbolic Computation, AISC’2000. Ed. by Eugenio Roanes Lozano. LNAI 1930. Springer Verlag, 2001, pp. 53–64. See p. 23.
[Coh+06]
Arjeh M. Cohen, Hans Cuypers, Dorina Jibetean, and Mark Spanbroek. “Interactive Learning and Mathematical Calculus”. In: Mathematical Knowledge Management, MKM’05. Ed. by Michael Kohlhase. LNAI 3863. Springer Verlag, 2006, pp. 330–345. See p. 93.
[Con07]
Dan Connolly. Gleaning Resource Descriptions from Dialects of Languages (GRDDL). W3C Recommendation. World Wide Web Consortium (W3C), 2007. url: http://www.w3.org/TR/2007/REC-grddl-20070911/. See p. 47.
[CS04]
Jeremy J. Carroll and Patrick Stickler. TriX: RDF Triples in XML. Tech. rep. HPL–2004–56. HP Laboratories Bristol, 2004. url: http://www.hpl.hp.com/techreports/2004/HPL-2004-56.pdf. See p. 58.
[CT04]
XML Information Set (Second Edition). W3C Recommendation. World Wide Web Consortium (W3C), 2004. url: http://www.w3.org/TR/2004/REC-xml-infoset-20040204. See p. 10.
[Cun+]
Ward Cunningham et al. Wiki Design Principles. url: http://c2.com/cgi/wiki?WikiDesignPrinciples (visited on 2009/10/29). See p. 119.
[Cun+02]
Ward Cunningham et al. What is Wiki. June 27, 2002. url: http://wiki.org/wiki.cgi?WhatIsWiki (visited on 2009/10/28). See p. 119.
[Cuy+08]
Hans Cuypers, Arjeh M. Cohen, Jan Willem Knopper, Rikko Verrijzer, and Mark Spanbroek. “MathDox, a system for interactive Mathematics”. In: Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008. Vienna, Austria: AACE, 2008, pp. 5177–5182. url: http://go.editlib.org/p/29092. See p. 93.
[CX05]
Isabel F. Cruz and Huiyong Xiao. “The Role of Ontologies in Data Integration”. In: Engineering Intelligent Systems for Electrical Engineering and Communication 13.4 (2005), pp. 245–252. issn: 1363–2078. See pp. xi, 15, 36, 44, 147.
[Dav05]
Ian Davis. The Sixteen Faces of Eve. Sept. 27, 2005. url: http://iandavis.com/blog/2005/09/the-sixteen-faces-of-eve (visited on 2009/10/22). See p. 112.
[Dav99]
James H. Davenport. A Small OpenMath Type System. Tech. rep. The OpenMath Esprit Project, 1999. url: http://monet.nag.co.uk/cocoon/openmath/standard/sts.pdf. See p. 25.
[Dcm]
Dublin Core Metadata Element Set. DCMI Recommendation. Version 1.1. Dublin Core Metadata Initiative, 2008. url: http://dublincore.org/documents/2008/01/04/dces/. See pp. 20, 56.
[DCM05]
DCMI Usage Board. MARC Relator terms and Dublin Core. Tech. rep. Dublin Core Metadata Initiative, 2005. url: http://dublincore.org/usage/documents/relators/. See p. 58.
[DCM08]
DCMI Usage Board. DCMI Metadata Terms. DCMI Recommendation. Dublin Core Metadata Initiative, 2008. url: http://dublincore.org/documents/2008/01/14/dcmi-terms/. See p. 20.
[Del+08a]
Klaas Dellschaft, Hendrik Engelbrecht, José Monte Barreto, Sascha Rutenbeck, and Steffen Staab. “Cicero: Tracking Design Rationale in Collaborative Ontology Engineering”. In: ESWC. Ed. by Sean Bechhofer, Manfred Hauswirth, Jörg Hoffmann, and Manolis Koubarakis. Vol. 5021. Lecture Notes in Computer Science. Springer, 2008, pp. 782–786. See p. 122.
[Del+08b]
Klaas Dellschaft, Aldo Gangemi, Jose Manuel Gomez, Holger Lewen, Valentina Presutti, and Margherita Sini. Practical Methods to Support Collaborative Ontology Design. Ed. by Klaas Dellschaft. NEON EU-IST-2005-027595 Deliverable D2.3.1. Feb. 2008. url: http://www.neon-project.org/web-content/images/Publications/neon_2008_d2.3.1.pdf. See pp. 79, 122.
[Div+09]
Renata Dividino, Simon Schenk, Sergej Sizov, and Steffen Staab. “Provenance, Trust, Explanations – and all that other Meta Knowledge”. In: Künstliche Intelligenz 2 (Feb. 2009), pp. 24–30. See p. 49.
[DK09]
James H. Davenport and Michael Kohlhase. “Unifying Math Ontologies: A tale of two standards”. In: MKM/Calculemus 2009 Proceedings. Ed. by Jacques Carette, Lucas Dixon, Claudio Sacerdoti Coen, and Stephen M. Watt. LNAI 5625. Springer Verlag, 2009, pp. 263–278. url: http://kwarc.info/kohlhase/papers/mkm09MMLOM3.pdf. See pp. 23, 58, 135.
[DL08]
James H. Davenport and Paul Libbrecht. “The Freedom to Extend OpenMath and its Utility”. Ed. by Manfred Kerber. In: Journal of Mathematics and Computer Science, special issue on Mathematical Knowledge Management (2008). See pp. 133, 149.
[DN03]
James H. Davenport and William A. Naylor. Units and Dimensions in OpenMath. 2003. url: http://www.openmath.org/documents/Units.pdf (visited on 2009/10/22). See p. 106.
[Dol+03]
Peter Dolog, Rita Gavrioloaie, Wolfgang Nejdl, and Jan Brase. “Integrating Adaptive Hypermedia Techniques and Open RDF-based Environments”. In: Proceedings of the 12th WWW conference. ACM Press, 2003. See p. 45.
[Dra+07]
Denise Draper, Peter Fankhauser, Mary Fernández, Ashok Malhotra, Kristoffer Rose, Michael Rys, Jérôme Siméon, and Philip Wadler. XQuery 1.0 and XPath 2.0 Formal Semantics. W3C Recommendation. World Wide Web Consortium (W3C), 2007. url: http://www.w3.org/TR/2007/REC-xquery-semantics-20070123/. See p. 28.
[EGH08]
Anja Ebersbach, Markus Glaser, and Richard Heigl. Wiki: Web Collaboration. Springer-Verlag New York, 2008. See p. 119.
[EH05]
Allan Ellis and Tatsuya Hagino, eds. Proceedings of the 14th international World Wide Web conference, WWW 2005, Chiba, Japan, May 10–14, 2005. ACM Press, 2005. isbn: 1-59593-046-9.
[Eix]
Ramon Eixarch. WIRIS Plugin for Moodle. url: http://www.wiris.com/content/view/96/ (visited on 2009/11/10). See p. 86.
[EPF]
École polytechnique fédérale de Lausanne. The Scala Programming Language. url: http://www.scala-lang.org (visited on 2009/10/22). See p. 71.
[Evo]
EvoOnt – A Software Evolution Ontology. url: http://www.ifi.uzh.ch/ddis/evo/ (visited on 2009/10/27). See p. 75.
[Exi]
eXist database. url: http://exist.sourceforge.net/ (visited on 2009/10/22). See p. 109.
[Far04]
William M. Farmer. “MKM: A new Interdisciplinary Field of Research”. In: Bulletin of the ACM Special Interest Group on Symbolic and Automated Mathematics (SIGSAM) 38.2 (2004), pp. 47–52. See p. 5.
[FBS]
Sergio Fernández, Uldis Bojārs, and Christopher Schmidt. SpecGen v5 – ontology specification generator tool. url: http://forge.morfeo-project.org/wiki_en/index.php/SpecGen (visited on 2009/10/22). See p. 142.
[Fen03]
Dieter Fensel. Ontologies: A Silver Bullet for Knowledge Management and Electronic Commerce. Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2003. See p. 122.
[FFJ03]
Jon Ferraiolo, Jun Fujisawa, and Dean Jackson. Scalable Vector Graphics (SVG) 1.1 Specification. W3C Recommendation. World Wide Web Consortium (W3C), 2003. url: http://www.w3.org/TR/2003/REC-SVG11-20030114/. See p. 11.
[Fgd]
Content Standard for Digital Geospatial Metadata Workbook. Workbook. Version 2.0. Federal Geographic Data Committee, 2000. url: http://www.fgdc.gov/metadata/documents/workbook_0501_bmk.pdf. See p. 18.
[FGT92]
William Farmer, Joshua Guttman, and Javier Thayer. “Little Theories”. In: Proceedings of the 11th Conference on Automated Deduction. Ed. by D. Kapur. Vol. 607. LNCS. Saratoga Springs, NY, USA: Springer Verlag, 1992, pp. 567–581. See pp. 16, 50.
[FHT06]
Colin Fraser, Harry Halpin, and Kavita E. Thomas. “Developing an Argumentation Ontology for Mailing Lists”. In: AIMSA 2006. Ed. by J. Euzenat and J. Domingue. Vol. 4183. LNAI. Springer, 2006, pp. 150–161. See p. 152.
[Fie00]
Roy T. Fielding. “Architectural Styles and the Design of Network-based Software Architectures”. PhD thesis. University of California, Irvine, 2000. url: http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm. See p. 95.
[Fie01]
Armin Fiedler. “P.rex: An Interactive Proof Explainer”. In: Automated Reasoning — 1st International Joint Conference, IJCAR 2001. Ed. by Rajeev Goré, Alexander Leitsch, and Tobias Nipkow. LNAI 2083. Siena, Italy: Springer Verlag, 2001, pp. 416–420. See p. 16.
[For]
Formalized Mathematics. A computer assisted approach. url: http://fm.mizar.org (visited on 2009/12/02). See p. 3.
[Gam]
GAMS: Guide to Available Mathematical Software. National Institute of Standards and Technology (NIST). url: http://gams.nist.gov (visited on 2009/12/16). See p. 21.
[Gar05]
Jesse James Garrett. Ajax: A new approach to web applications. Tech. rep. Seen February 2006. Adaptive Path, 2005. url: http://www.adaptivepath.com/publications/essays/archives/000385.php. See p. 93.
[GC07]
Jeremy Gow and Paul Cairns. “Closing the Gap Between Formal and Digital Libraries of Mathematics”. In: Studies in Logic, Grammar and Rhetoric 10.23 (2007). See pp. 2, 5.
[Ger+08]
Alex Gerdes, Bastiaan Heeren, Johan Jeuring, and Sylvia Stuurman. Feedback Services for Exercise Assistants. Tech. rep. UU-CS-2008-018. Utrecht University, 2008. See p. 93.
[GF+92]
M. Genesereth, R. Fikes, et al. Knowledge Interchange Format: Version 3.0 Reference Manual. Tech. rep. Computer Science Department, Stanford University, 1992. See p. 15.
[Gic08]
Jana Giceva. Capturing Rhetorical Aspects in Mathematical Documents using OMDoc and SALT. Technical Report. Jacobs University, DERI Galway, 2008. url: https://svn.kwarc.info/repos/supervision/intern/2008/giceva_jana/project/internship%20report.pdf. See pp. xiv, 107.
[Gin+09]
Deyan Ginev, Constantin Jucovschi, Stefan Anca, Mihai Grigore, Catalin David, and Michael Kohlhase. “An Architecture for Linguistic and Semantic Analysis on the arXMLiv Corpus”. In: Applications of Semantic Technologies (AST) Workshop at Informatik 2009. 2009. url: http://www.kwarc.info/projects/lamapun/pubs/AST09_LaMaPUn+appendix.pdf. See pp. xiv, 16, 112.
[GK97]
Thomas F. Gordon and Nikos Karacapilidis. “The Zeno Argumentation Framework”. In: Sixth International Conference on Artificial Intelligence and Law. ACM Press, 1997, pp. 10–18. See p. 144.
[GLR09]
Jana Giceva, Christoph Lange, and Florian Rabe. “Integrating Web Services into Active Mathematical Documents”. In: MKM/Calculemus 2009 Proceedings. Ed. by Jacques Carette, Lucas Dixon, Claudio Sacerdoti Coen, and Stephen M. Watt. LNAI 5625. Springer Verlag, 2009, pp. 279–293. url: https://svn.omdoc.org/repos/jomdoc/doc/pubs/mkm09/jobad/jobad-server.pdf. See pp. 70, 116.
[GM05]
Stephan Grimm and Boris Motik. “Closed World Reasoning in the Semantic Web through Epistemic Operators”. In: OWL: Experiences and Directions (OWLED). Ed. by Bernardo Cuenca Grau, Ian Horrocks, Bijan Parsia, and Peter Patel-Schneider. 2005. See pp. xiii, 91.
[GM08]
George Goguadze and Erica Melis. “Feedback in ActiveMath Exercises”. In: International Conference on Mathematics Education (ICME). 2008. See p. 93.
[GMA03]
George Goguadze, Erica Melis, and Andrea Asperti. A proposal for a unified Metadata Model and Search Architecture. Deliverable D5.5. MKM-NET, 2003. url: http://monet.nag.co.uk/mkm/Final-docs/MKMNetTN-D5-5.pdf. See pp. xi, 19, 38.
[GMB08]
Aurona Gerber, Alta van der Merwe, and Andries Barnard. “A Functional Semantic Web Architecture”. In: ESWC. Ed. by Sean Bechhofer, Manfred Hauswirth, Jörg Hoffmann, and Manolis Koubarakis. Vol. 5021. Lecture Notes in Computer Science. Springer, 2008. See p. 9.
[Gog03]
George Goguadze. Metadata for Mathematical Libraries. Deliverable D3.a. MoWGLI, 2003. url: http://mowgli.cs.unibo.it/misc/deliverables/metadata/D3a_metadata_for_math/math_metadata.pdf. See p. 18.
[Gog+04]
George Goguadze, Carsten Ullrich, Erica Melis, Jörg Siekmann, Christian Gross, and Rafael Morales. LeActiveMath Structure and Metadata Model. Deliverable D6. LeActiveMath Consortium, 2004. url: http://www.activemath.org/pubs/LeAM-D6.pdf. See pp. 56, 58.
[Gon08]
Georges Gonthier. “Formal proof – The Four-Color Theorem”. In: Notices of the AMS 11 (2008), pp. 1382–1393. url: http://www.ams.org/notices/200811/tx081101382p.pdf. See p. 3.
[GP06a]
Alberto González Palomo. “QMath: A Human-Oriented Language and Batch Formatter for OMDoc”. In: OMDoc – An open markup format for mathematical documents [Version 1.2]. LNAI 4180. Springer Verlag, 2006. Chap. 26.2. url: http://omdoc.org/omdoc1.2.pdf. See pp. 29, 84.
[GP06b]
Alberto González Palomo. “Sentido: an Authoring Environment for OMDoc”. In: OMDoc – An open markup format for mathematical documents [Version 1.2]. LNAI 4180. Springer Verlag, 2006. Chap. 26.3. url: http://omdoc.org/omdoc1.2.pdf. See p. 87.
[Gro+03]
Paul Grosso, Eve Maler, Jonathan Marsh, and Norman Walsh. W3C XPointer Framework. W3C Recommendation. World Wide Web Consortium (W3C), 2003. url: http://www.w3.org/TR/2003/REC-xptr-framework-20030325/. See p. 10.
[Gro+07]
Tudor Groza, Siegfried Handschuh, Knud Möller, and Stefan Decker. “SALT – Semantically Annotated LaTeX for Scientific Publications”. In: ESWC. Ed. by Enrico Franconi, Michael Kifer, and Wolfgang May. Vol. 4519. Lecture Notes in Computer Science. Springer, 2007, pp. 518–532. isbn: 978-3-540-72666-1. See pp. xi, 33, 45.
[Gru93]
Thomas R. Gruber. “A translation approach to portable ontology specifications”. In: Knowledge Acquisition 5.2 (June 1993), pp. 199–220. issn: 1042-8143. doi: 10.1006/knac.1993.1008. url: http://portal.acm.org/citation.cfm?id=173747. See p. 15.
[GS+]
John Gruber, Aaron Swartz, et al. Markdown. url: http://daringfireball.net/projects/markdown/ (visited on 2009/11/11). See p. 85.
[GSC03]
Ferruccio Guidi and Claudio Sacerdoti Coen. “Querying Distributed Digital Libraries of Mathematics”. In: Proceedings of the 11th Symposium on the Integration of Symbolic Computation and Mechanized Reasoning (Calculemus 2003). Ed. by Thérèse Hardin and Renaud Rioboo. Rome, Italy 2003. url: http://www.calculemus.net/meetings/rome03/Proceedings/final.pdf. See p. 34.
[GSMT09]
Shudi (Sandy) Gao, C. M. Sperberg-McQueen, and Henry S. Thompson. W3C XML Schema Definition Language (XSD) 1.1 Part 1: Structures. W3C Candidate Recommendation. World Wide Web Consortium (W3C), 2009. url: http://www.w3.org/TR/2009/CR-xmlschema11-1-20090430/. See p. 11.
[Har]
Elliotte Rusty Harold. XOM. url: http://xom.nu (visited on 2009/10/22). See pp. 12, 71.
[Har+07]
Jens Hartmann, Raúl Palma, Peter Haase, and Asunción Gómez-Pérez. Ontology Metadata Vocabulary – OMV. Sept. 17, 2007. url: http://omv.ontoware.org (visited on 2009/10/22). See p. 64.
[HC09]
Aidan Hogan and Richard Cyganiak. Frequently Observed Problems on the Web of Data. Tech. rep. Version v0.3. Pedantic Web Group, 2009. url: http://pedanticweb.org/fops.html. See p. 61.
[Hec+05]
Dominik Heckmann, Tim Schwartz, Boris Brandherm, Michael Schmitz, and Margeritta von Wilamowitz-Moellendorff. “Gumo – The General User Model Ontology”. In: User Modeling 2005. Ed. by L. Ardissono, P. Brna, and A. Mitrovic. 2005, pp. 428–432. doi: http://dx.doi.org/10.1007/11527886_58. url: http://dx.doi.org/10.1007/11527886_58. See p. 46.
[Hee+09]
Ralf Heese, Markus Luczak-Rösch, Radoslaw Oldakowski, Olga Streibel, and Adrian Paschke. “One Click Annotation”. In: Proceedings of the Workshop on Collaborative Construction, Management and Linking of Structured Knowledge (CK2009). Ed. by Tania Tudorache, Gianluca Correndo, Natasha Noy, Harith Alani, and Mark Greaves. Vol. 514. CEUR Workshop Proceedings. 2009. url: http://CEUR-WS.org/Vol-514/. See p. 86.
[Her+08]
Ivan Herman, Eric Prud’hommeaux, Thomas Roessler, and Rigo Wenning. Team Comment on ccREL: The Creative Commons Rights Expression Language Member Submission. W3C Team Comment. World Wide Web Consortium (W3C), 2008. url: http://www.w3.org/Submission/2008/02/Comment. See p. 62.
[HHA08]
Michael Hausenblas, Ivan Herman, and Ben Adida. RDFa – Bridging the Web of Documents and the Web of Data. 2008. url: http://www.w3.org/2008/Talks/1026-ISCW-RDFa/ (visited on 2009/11/26). See p. 61.
[HKM07]
David Huynh, David Karger, and Rob Miller. “Exhibit: Lightweight Structured Data Publishing”. In: 16th International World Wide Web Conference. Banff, Alberta, Canada: ACM, 2007. url: http://www2007.org/paper161.php. See p. 148.
[HKS06]
Eberhard Hilf, Michael Kohlhase, and Heinrich Stamerjohanns. “Capturing the Content of Physics: Systems, Observables, and Experiments”. In: Mathematical Knowledge Management, MKM’06. Ed. by Jon Borwein and William M. Farmer. LNAI 4108. Springer Verlag, 2006, pp. 165–178. url: http://kwarc.info/kohlhase/papers/mkm06physml.pdf. See p. 3.
[Hov+]
Jean-François Hovinne et al. WYMEditor – web-based XHTML editor. url: http://www.wymeditor.org (visited on 2009/11/11). See p. 86.
[HPS09]
Matthew Horridge and Peter F. Patel-Schneider. OWL 2 Web Ontology Language: Manchester Syntax. W3C Working Group Note. World Wide Web Consortium (W3C), 2009. url: http://www.w3.org/TR/2009/NOTE-owl2-manchester-syntax-20091027/. See p. 55.
[HPSH03]
Ian Horrocks, Peter F. Patel-Schneider, and Frank van Harmelen. “From SHIQ and RDF to OWL: The Making of a Web Ontology Language”. In: Web Semantics 1.1 (2003), pp. 7–26. See p. 15.
[HR03]
Thérèse Hardin and Renaud Rioboo, eds. 11th Symposium on the Integration of Symbolic Computation and Mechanized Reasoning (Calculemus 2003). Rome, Italy 2003. url: http://www.calculemus.net/meetings/rome03/Proceedings/final.pdf.
[HR09a]
Peter Horn and Dan Roozemond. “OpenMath in SCIEnce: SCSCP and POPCORN”. In: MKM/Calculemus 2009 Proceedings. Ed. by Jacques Carette, Lucas Dixon, Claudio Sacerdoti Coen, and Stephen M. Watt. LNAI 5625. Springer Verlag, 2009, pp. 474–479. See p. 84.
[HR09b]
Peter Horn and Dan Roozemond, eds. The Popcorn OpenMath Representation. Version 1.0. 2009. url: http://java.symcomp.org/FormalPopcorn.html (visited on 2009/11/12). See p. 84.
[HV]
Jónathan Heras Vicente. An OpenMath Content Dictionary Editor. url: http://www.unirioja.es/cu/joheras/ (visited on 2009/10/18). See p. 133.
[HY07]
Michael Hausenblas and Wing C Yung. RDFa Test Suite. W3C Editor’s Draft. World Wide Web Consortium (W3C), 2007. url: http://www.w3.org/2006/07/SWD/RDFa/testsuite/. See p. 143.
[IEE02a]
IEEE Learning Technology Standards Committee. Standard for Learning Object Metadata. Tech. rep. 1484.12.1. IEEE, 2002. See p. 21.
[IEE02b]
IEEE Learning Technology Standards Committee. Standard for Resource Description Framework (RDF) binding for Learning Object Metadata data model. Tech. rep. 1484.12.4. IEEE, 2002. See p. 45.
[IL09]
Toby A. Inkster and Christoph Lange. RDFa Host Languages. Nov. 30, 2009. url: http://rdfa.info/wiki/?title=RDFa_Host_Languages&oldid=836. See p. 59.
[Ink]
Toby A. Inkster. Swignition. url: http://buzzword.org.uk/swignition/ (visited on 2009/10/22). See p. 115.
[Jan06]
Peter Jansen. “An Emacs mode for editing OMDoc Documents”. In: OMDoc – An open markup format for mathematical documents [Version 1.2]. LNAI 4180. Springer Verlag, 2006. Chap. 26.16. url: http://omdoc.org/omdoc1.2.pdf. See p. 84.
[Jip]
Peter Jipsen. ASciencePad – a TiddlyWiki suitable for scientific notes. url: http://math.chapman.edu/~jipsen/asciencepad/asciencepad.html (visited on 2009/11/10). See pp. 86, 89.
[Job]
JOBAD Framework – JavaScript API for OMDoc-based active documents. url: http://jomdoc.omdoc.org/wiki/JOBAD (visited on 2009/10/22). See pp. xiv, 107, 143.
[Joh05]
Pete Johnston. MARC Relator Properties in Dublin Core Metadata. Tech. rep. UKOLN, 2005. url: http://www.ukoln.ac.uk/metadata/dcmi/marcrel-ex/. See p. 61.
[Jom]
JOMDoc Project – Java Library for OMDoc documents. url: http://jomdoc.omdoc.org (visited on 2009/10/22). See pp. 29, 71.
[Kal+06]
Aditya Kalyanpur, Bijan Parsia, Evren Sirin, Bernardo Cuenca Grau, and James A. Hendler. “Swoop: A Web Ontology Editing Browser”. In: Web Semantics 4.2 (2006), pp. 144–153. See p. 43.
[Kap92]
D. Kapur, ed. Proceedings of the 11th Conference on Automated Deduction. Vol. 607. LNCS. Saratoga Springs, NY, USA: Springer Verlag, 1992.
[Kau+07]
Manuel Kauers, Manfred Kerber, Robert Miner, and Wolfgang Windsteiger, eds. MKM/Calculemus 2007. LNAI 4573. Springer Verlag, 2007.
[Kay07]
Michael Kay. XSL Transformations (XSLT) Version 2.0. W3C Recommendation. World Wide Web Consortium (W3C), 2007. url: http://www.w3.org/TR/2007/REC-xslt20-20070123/. See pp. 11, 28.
[Kay08]
Michael Kay. Saxonica: XSLT and XQuery Processing. 2008 (visited on 2009/10/22). See p. 112.
[KBT07]
C. Kiefer, A. Bernstein, and J. Tappolet. “Analyzing Software with iSPARQL”. In: Proc. 3rd International Workshop on Semantic Web Enabled Software Engineering (SWESE ’07). 2007. See p. 75.
[Kit+07]
Aniket Kittur, Bongwon Suh, Bryan A. Pendleton, and Ed H. Chi. “He says, she says: conflict and coordination in Wikipedia”. In: CHI. Ed. by Mary Beth Rosson and David J. Gilmore. ACM, 2007, pp. 453–462. isbn: 978-1-59593-593-9. See p. 120.
[KK08]
Andrea Kohlhase and Michael Kohlhase. “Semantic Knowledge Management for Education”. In: Proceedings of the IEEE; Special Issue on Educational Technology 96.6 (June 2008), pp. 970–989. url: http://kwarc.info/kohlhase/papers/semkm4ed.pdf. See pp. 15, 70.
[KLR07]
Michael Kohlhase, Christoph Lange, and Florian Rabe. “Presenting Mathematical Content With Flexible Elisions”. In: OpenMath/JEM Workshop 2007. Ed. by Olga Caprotti, Michael Kohlhase, and Paul Libbrecht. 2007. url: http://www.openmath.org/meetings/linz2007/. See p. 31.
[KLW95]
Michael Kifer, Georg Lausen, and James Wu. “Logical Foundations of Object-Oriented and Frame-Based Languages”. In: Journal of the ACM 42.4 (1995), pp. 741–843. See pp. 15, 49.
[KMM07]
Michael Kohlhase, Christine Müller, and Normen Müller. “Documents with flexible Notation Contexts as Interfaces to Mathematical Knowledge”. In: Mathematical User Interfaces Workshop 2007. Ed. by Paul Libbrecht. 2007. url: http://www.activemath.org/~paul/MathUI07. See p. 110.
[KMR08]
Michael Kohlhase, Christine Müller, and Florian Rabe. “Notations for Living Mathematical Documents”. In: Intelligent Computer Mathematics, 9th International Conference, AISC 2008 15th Symposium, Calculemus 2008 7th International Conference, MKM 2008 Birmingham, UK, July 28 - August 1, 2008, Proceedings. Ed. by Serge Autexier, John Campbell, Julio Rubio, Volker Sorge, Masakazu Suzuki, and Freek Wiedijk. LNAI 5144. Springer Verlag, 2008, pp. 504–519. url: http://omdoc.org/pubs/mkm08-notations.pdf. See pp. 27, 29, 124, 148, 149.
[Knu92]
Donald E. Knuth. Literate Programming. The University of Chicago Press, 1992. See p. 4.
[Koh]
Michael Kohlhase. OAF: Semiformalizations. url: http://trac.kwarc.info/oaf/wiki/semiformalizations?version=1 (visited on 2009/12/14). See p. 2.
[Koh06a]
Michael Kohlhase, ed. Mathematical Knowledge Management, MKM’05. LNAI 3863. Springer Verlag, 2006.
[Koh06b]
Michael Kohlhase. OMDoc – An open markup format for mathematical documents [Version 1.2]. LNAI 4180. Springer Verlag, 2006. url: http://omdoc.org/omdoc1.2.pdf. See pp. 15–17, 25, 27, 29, 35, 56, 58, 61, 70, 72, 91, 101, 103.
[Koh06c]
Michael Kohlhase. “Standardizing Context in System Interoperability”. In: OMDoc – An open markup format for mathematical documents [Version 1.2]. LNAI 4180. Springer Verlag, 2006. Chap. 26.18. url: http://omdoc.org/omdoc1.2.pdf. See p. 25.
[Koh+08]
Michael Kohlhase, Ştefan Anca, Constantin Jucovschi, Alberto González Palomo, and Ioan A. Şucan. “MathWebSearch 0.4, A Semantic Search Engine for Mathematics”. manuscript, see http://mathweb.org/projects/mws/pubs/mkm08.pdf. 2008. url: http://mathweb.org/projects/mws/pubs/mkm08.pdf. See pp. 87, 149, 154.
[Koh08a]
Michael Kohlhase. “Compiling OpenMath Type systems to Relax NG Grammars”. In: 3rd JEM Workshop – Joining Educational Mathematics. Ed. by Olga Caprotti, Sebastian Xambó, Maria-Antonia Huertas, Michael Kohlhase, and Mika Seppälä. 2008. url: http://jem-thematic.net/workshop3. See pp. 91, 92.
[Koh08b]
Michael Kohlhase. Generic Metadata Element. July 1, 2008. url: http://lists.jacobs-university.de/mailman/private/project-omdoc-dev/2008-July/thread.html#73. See p. 55.
[Koh08c]
Michael Kohlhase. reqdoc.sty: Semantic Markup for Requirements Specification Documents. Version 0.3. June 26, 2008. url: https://svn.kwarc.info/repos/stex/trunk/sty/reqdoc/reqdoc.pdf (visited on 2009/12/05). See p. 90.
[Koh08d]
Michael Kohlhase. “Using LaTeX as a Semantic Markup Format”. In: Mathematics in Computer Science (2008), pp. 279–304. url: https://svn.kwarc.info/repos/stex/doc/mcs08/stex.pdf. See pp. 33, 143, 148.
[Koh09a]
Michael Kohlhase. dcm.sty: An Infrastructure for marking up Dublin Core Metadata in LaTeX documents. Self-documenting LaTeX package. Version 0.3. 2009. url: https://svn.kwarc.info/repos/stex/trunk/sty/stex/dcm/dcm.pdf. See p. 90.
[Koh+09a]
Michael Kohlhase, Jana Giceva, Christoph Lange, and Vyacheslav Zholudev. “JOBAD – Interactive Mathematical Documents”. In: AI Mashup Challenge 2009, KI Conference. Ed. by Brigitte Endres-Niggemeyer, Valentin Zacharias, and Pascal Hitzler. 2009. url: https://svn.omdoc.org/repos/jomdoc/doc/pubs/aimashup09/jobad.pdf. See pp. 102, 116.
[Koh09b]
Michael Kohlhase. sproof.sty: Structural Markup for Proofs. Self-documenting LaTeX package. Version 0.3. 2009. url: https://svn.kwarc.info/repos/stex/trunk/sty/sproof/sproof.pdf. See p. 87.
[Koh+09b]
Michael Kohlhase, Christoph Lange, Christine Müller, Normen Müller, and Florian Rabe. Notations for Active Mathematical Documents. KWARC Report 2009-1. Jacobs University Bremen, 2009. url: http://kwarc.info/publications/papers/KLMMR_NfAD.pdf. See pp. 105, 154.
[KP09]
Kjetil Kjernsmo and Alexandre Passant. SPARQL New Features and Rationale. W3C Interest Group Note. World Wide Web Consortium (W3C), 2009. url: http://www.w3.org/TR/2009/WD-sparql-features-20090702/. See p. 109.
[KR09]
Michael Kohlhase and Florian Rabe. “Semantics of OpenMath and MathML3”. In: 22nd OpenMath Workshop. Ed. by James H. Davenport. 2009. url: http://kwarc.info/kohlhase/submit/om09-semantics.pdf. See pp. 23, 35.
[KR70]
Werner Kunz and Horst W. J. Rittel. Issues as elements of information systems. Working paper 131. Institute of Urban and Regional Development, University of California, Berkeley, 1970. See pp. 74, 76, 78.
[Kut+08]
Oliver Kutz, Dominik Lücke, Till Mossakowski, and Immanuel Normann. “The OWL in the CASL – Designing Ontologies Across Logics”. In: OWL: Experiences and Directions (OWLED). Ed. by Uli Sattler, Cathy Dolbear, and Alan Ruttenberg. 2008. See p. 25.
[KWZ08]
Fairouz Kamareddine, J. B. Wells, and Christoph Zengler. “Computerising Mathematical Text with MathLang”. In: Electron. Notes Theor. Comput. Sci. 205 (2008), pp. 5–30. issn: 1571-0661. doi: http://dx.doi.org/10.1016/j.entcs.2008.03.063. url: http://www.cedar-forest.org/forest/papers/drafts/mathlangcoq-short.pdf. See pp. xi, 31, 38.
[Lan]
Christoph Lange. OpenMath Wiki. url: http://wiki.openmath.org (visited on 2009/10/22). See pp. viii, 73, 154.
[Lan+]
Christoph Lange et al. Krextor – The KWARC RDF Extractor. url: http://kwarc.info/projects/krextor/ (visited on 2009/10/22). See p. 143.
[Lan06a]
Christoph Lange. “A Semantic Wiki for Mathematical Knowledge Management”. Diploma thesis. Universität Trier, 2006. url: http://kwarc.info/projects/swim/pubs/swim-thesis-final.pdf. See pp. 132, 173.
[Lan06b]
Christoph Lange, ed. Wikis und Blogs – Planen, Einrichten, Verwalten. C&L Computer- und Literaturverlag, 2006. isbn: 3-936546-44-4. See p. 119.
[Lan07a]
Christoph Lange. SWiM – A Semantic Wiki for Mathematical Knowledge Management. Tech. rep. 5. Revised, updated and reviewed version of [Lan06a]. Jacobs University Bremen, 2007. url: http://kwarc.info/projects/swim/pubs/trswim.pdf. See pp. 3, 72, 116.
[Lan07b]
Christoph Lange. “Towards Scientific Collaboration in a Semantic Wiki”. In: Bridging the Gap between Semantic Web and Web 2.0 (SemNet 2007). Ed. by Andreas Hotho and Bettina Hoser. 2007. See p. 153.
[Lan08a]
Christoph Lange. “Mathematical Semantic Markup in a Wiki: The Roles of Symbols and Notations”. In: Proceedings of the 3rd Workshop on Semantic Wikis, European Semantic Web Conference 2008. Ed. by Christoph Lange, Sebastian Schaffert, Hala Skaf-Molli, and Max Völkel. Vol. 360. CEUR Workshop Proceedings. Costa Adeje, Tenerife, Spain 2008. See pp. 125, 127, 153.
[Lan+08a]
Christoph Lange, Sebastian Schaffert, Hala Skaf-Molli, and Max Völkel, eds. 3rd Workshop on Semantic Wikis. Vol. 360. CEUR Workshop Proceedings. Costa Adeje, Tenerife, Spain 2008.
[Lan08b]
Christoph Lange. “SWiM – A semantic wiki for mathematical knowledge management”. In: ESWC. Ed. by Sean Bechhofer, Manfred Hauswirth, Jörg Hoffmann, and Manolis Koubarakis. Vol. 5021. Lecture Notes in Computer Science. Springer, 2008, pp. 832–837. See pp. 112, 114, 148.
[Lan+08b]
Christoph Lange, Uldis Bojārs, Tudor Groza, John Breslin, and Siegfried Handschuh. “Expressing Argumentative Discussions in Social Media Sites”. In: Social Data on the Web (SDoW2008), Workshop at the 7th International Semantic Web Conference. Ed. by John Breslin, Uldis Bojārs, Alexandre Passant, and Sergio Fernández. CEUR Workshop Proceedings 405. 2008. url: http://ceur-ws.org/Vol-405/paper4.pdf. See pp. 70, 75, 77, 116, 139.
[Lan09]
Christoph Lange. “Krextor – An Extensible XML→RDF Extraction Framework”. In: Scripting and Development for the Semantic Web (SFSW2009). Ed. by Chris Bizer, Sören Auer, and Gunnar Aastrand Grimnes. 2009. url: http://kwarc.info/projects/krextor/pubs/sfsw09-krextor.pdf. See p. 70.
[LGP08]
Christoph Lange and Alberto González Palomo. “Easily Editing and Browsing Complex OpenMath Markup with SWiM”. In: Mathematical User Interfaces Workshop 2008. Ed. by Paul Libbrecht. 2008. url: http://www.activemath.org/workshops/MathUI/08/proceedings/LangeGonzales-OMEdit.html. See pp. xiv, 116, 154.
[LHC08a]
Christoph Lange, Tuukka Hastrup, and Stéphane Corlosquet. “Arguing on Issues with Mathematical Knowledge Items in a Semantic Wiki”. In: Wissens- und Erfahrungsmanagement LWA (Lernen, Wissensentdeckung und Adaptivität) Conference Proceedings. Ed. by Joachim Baumeister and Martin Atzmüller. Vol. 448. 2008. See pp. 114, 153.
[LHC08b]
Christoph Lange, Tuukka Hastrup, and Stéphane Corlosquet. “Improving mathematical knowledge items by acting on issue-based community feedback”. In: Proceedings of the 2nd SCooP Workshop. Ed. by Christine Müller. 2008. url: http://kwarc.info/events/scoop/scoop2.html. See p. 150.
[Li+06]
Yuan Fang Li, Jing Sun, Gillian Dobbie, Jun Sun, and Hai Wang. “Validating Semistructured Data using OWL”. In: 7th International Conference on Web-Age Information Management (WAIM’06). Vol. 4016. LNCS. Springer, 2006, pp. 520–531. See p. 92.
[Lib]
Paul Libbrecht. Collection Management in ActiveMath. Presented without publication at [Car+09]. url: http://www.activemath.org/~paul/copy_left/Content-Storage-and-Patterns.pdf (visited on 2009/11/18). See pp. x, 27.
[Lib06]
Paul Libbrecht. “Authoring Tools for ActiveMath”. In: OMDoc – An open markup format for mathematical documents [Version 1.2]. LNAI 4180. Springer Verlag, 2006. Chap. 26.9. url: http://omdoc.org/omdoc1.2.pdf. See p. 84.
[Liu+04]
Shengping Liu, Jing Mei, Anbu Yue, and Zuoquan Lin. “XSDL: Making XML Semantics Explicit”. In: SWDB. Ed. by Christoph Bussler, Val Tannen, and Irini Fundulaki. Vol. 3372. 2004, pp. 64–83. isbn: 3-540-24576-6. See pp. 47, 115.
[LK08a]
Christoph Lange and Michael Kohlhase. A Mathematical Approach to Ontology Authoring and Documentation. KWARC Report 2008-3. Jacobs University Bremen, 2008. url: https://svn.omdoc.org/repos/omdoc/trunk/doc/blue/foaf/note.pdf. See p. 91.
[LK08b]
Christoph Lange and Michael Kohlhase. “A Semantic Wiki for Mathematical Knowledge Management”. In: Emerging Technologies for Semantic Work Environments: Techniques, Methods, and Applications. Ed. by Jörg Rech, Björn Decker, and Eric Ras. IGI Global, 2008, pp. 47–68. url: http://www.igi-global.com/reference/details.asp?ID=7543. See p. 153.
[LK09]
Christoph Lange and Michael Kohlhase. “A Mathematical Approach to Ontology Authoring and Documentation”. In: MKM/Calculemus 2009 Proceedings. Ed. by Jacques Carette, Lucas Dixon, Claudio Sacerdoti Coen, and Stephen M. Watt. LNAI 5625. Springer Verlag, 2009, pp. 389–404. url: https://svn.omdoc.org/repos/omdoc/trunk/doc/blue/foaf/mkm09.pdf. See pp. 70, 114, 146.
[LMR08]
Christoph Lange, Sean McLaughlin, and Florian Rabe. “Flyspeck in a Semantic Wiki – Collaborating on a Large Scale Formalization of the Kepler Conjecture”. In: Proceedings of the 3rd Workshop on Semantic Wikis, European Semantic Web Conference 2008. Ed. by Christoph Lange, Sebastian Schaffert, Hala Skaf-Molli, and Max Völkel. Vol. 360. CEUR Workshop Proceedings. Costa Adeje, Tenerife, Spain 2008. See p. 153.
[Loc]
locutor: An Ontology-Driven Management of Change. url: http://locutor.kwarc.info (visited on 2009/10/22). See p. 125.
[Man+06]
Shahid Manzoor, Paul Libbrecht, Carsten Ullrich, and Erica Melis. “Authoring Presentation for OpenMath”. In: Mathematical Knowledge Management, MKM’05. Ed. by Michael Kohlhase. LNAI 3863. Springer Verlag, 2006, pp. 33–48. See pp. x, 27–29, 89.
[Mar]
MARC code list for Relators, Sources, Description Conventions. 2003. url: http://www.loc.gov/marc/relators (visited on 2009/10/22). See p. 57.
[Mar03]
Massimo Marchiori. “The Mathematical Semantic Web”. In: Mathematical Knowledge Management, MKM’03. Ed. by Andrea Asperti, Bruno Buchberger, and James Harold Davenport. LNCS 2594. Springer Verlag, 2003. See p. 34.
[Mas+03]
Claudio Masolo, Stefano Borgo, Aldo Gangemi, Nicola Guarino, and Alessandro Oltramari. Ontology Library. WonderWeb Deliverable 18. Laboratory for Applied Ontology – ISTC-CNR, 2003. url: http://www.loa-cnr.it/Papers/D18.pdf. See pp. 15, 48.
[Mata]
Math-Net RDF Collection. url: http://www.iwi-iuk.org/material/RDF/1.1/ (visited on 2009/12/12). See p. 45.
[Matb]
MathML Software – Editors. url: http://www.w3.org/Math/Software/mathml_software_cat_editors.html (visited on 2009/11/11). See p. 86.
[MCR07]
Viviana Mascardi, Valentina Cordì, and Paolo Rosso. “A Comparison of Upper Ontologies”. In: WOA. Ed. by Matteo Baldoni, Antonio Boccalatte, Flavio De Paoli, Maurizio Martelli, and Viviana Mascardi. Seneca Edizioni Torino, 2007, pp. 55–64. isbn: 978-88-6122-061-4. See pp. xi, 46.
[Med]
MediaWiki. url: http://www.mediawiki.org (visited on 2009/10/22). See p. 120.
[Mel+01]
E. Melis, E. Andrés, J. Büdenbender, Adrian Frischauf, G. Goguadze, P. Libbrecht, M. Pollet, and C. Ullrich. “ActiveMath: A generic and adaptive web-based learning environment”. In: International Journal of Artificial Intelligence in Education 12.4 (2001), pp. 385–407. See p. 22.
[Mel+03]
E. Melis, J. Buedenbender, E. Andres, A. Frischauf, G. Goguadze, P. Libbrecht, M. Pollet, and C. Ullrich. “Knowledge Representation and Management in ActiveMath”. In: International Journal on Artificial Intelligence and Mathematics, Special Issue on Management of Mathematical Knowledge 38.1-3 (2003), pp. 47–64. See p. 56.
[Mel+06]
Erica Melis, Giorgi Goguadze, Martin Homik, Paul Libbrecht, Carsten Ullrich, and Stefan Winterstein. “Semantic-aware components and services of ActiveMath”. In: British Journal of Educational Technology 37.3 (May 2006), pp. 405–423. See p. 93.
[MH04]
Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language Overview. W3C Recommendation. World Wide Web Consortium (W3C), 2004. url: http://www.w3.org/TR/2004/REC-owl-features-20040210/. See p. 49.
[Mic]
Microsoft Corporation. MSDN Library – Development Tools and Languages – Visual Studio 2008 – Visual Studio – Visual C++ – Visual C++ Reference – Visual C++ Libraries Reference – MFC – MFC Concepts – MFC COM – Active Document Containment – Active Documents. url: http://msdn.microsoft.com/en-us/library/bx9c54kf.aspx (visited on 2009/10/22). See p. 93.
[Miz]
Mizar Mathematical Library. url: http://www.mizar.org/library (visited on 2009/12/02). See pp. 1, 3, 5.
[MKM09]
Jacques Carette and William Farmer. “A Review of Mathematical Knowledge Management”. In: MKM/Calculemus 2009 Proceedings. Ed. by Jacques Carette, Lucas Dixon, Claudio Sacerdoti Coen, and Stephen M. Watt. LNAI 5625. Springer Verlag, 2009. See p. 5.
[MML07]
Till Mossakowski, Christian Maeder, and Klaus Lüttich. “The Heterogeneous Tool Set”. In: Proceedings of the 13th International Conference on Tools and Algorithms for the Construction and Analysis of Systems TACAS-2007. Ed. by Orna Grumberg and Michael Huth. LNCS 4424. Berlin, Germany: Springer Verlag, 2007, pp. 519–522. See p. 25.
[Mon]
MONET. url: http://monet.nag.co.uk/mkm (visited on 2009/10/22). See pp. 45, 93.
[Mot+09]
Boris Motik, Bernardo Cuenca Grau, Ian Horrocks, Zhe Wu, Achille Fokoue, and Carsten Lutz. OWL 2 Web Ontology Language: Profiles. W3C Recommendation. World Wide Web Consortium (W3C), 2009. url: http://www.w3.org/TR/2009/REC-owl2-profiles-20091027/. See pp. xi, 15, 43.
[MOV06]
Jonathan Marsh, David Orchard, and Daniel Veillard. XML Inclusions (XInclude) Version 1.0 (Second Edition). W3C Recommendation. World Wide Web Consortium (W3C), 2006. url: http://www.w3.org/TR/2006/REC-xinclude-20061115/. See pp. 73, 111, 136.
[Moz]
Mozilla Labs. Ubiquity. url: http://ubiquity.mozilla.com (visited on 2009/10/22). See p. 94.
[MP03]
Manuel Maarek and Virgile Prevosto. “FoCDoc: The Documentation System of FoC”. In: Proceedings of the 11th Symposium on the Integration of Symbolic Computation and Mechanized Reasoning (Calculemus 2003). Ed. by Thérèse Hardin and Renaud Rioboo. Rome, Italy, 2003, pp. 31–43. url: http://www.calculemus.net/meetings/rome03/Proceedings/final.pdf. See p. 48.
[MPPS09]
Boris Motik, Bijan Parsia, and Peter F. Patel-Schneider. OWL 2 Web Ontology Language: XML Serialization. W3C Recommendation. World Wide Web Consortium (W3C), 2009. url: http://www.w3.org/TR/2009/REC-owl2-xml-serialization-20091027/. See p. 37.
[MPSCG09]
Boris Motik, Peter F. Patel-Schneider, and Bernardo Cuenca Grau. OWL 2 Web Ontology Language: Direct Semantics. W3C Recommendation. World Wide Web Consortium (W3C), 2009. url: http://www.w3.org/TR/2009/REC-owl2-direct-semantics-20091027/. See p. 50.
[MPSP09]
Boris Motik, Peter F. Patel-Schneider, and Bijan Parsia. OWL 2 Web Ontology Language: Structural Specification and Functional-Style Syntax. W3C Recommendation. World Wide Web Consortium (W3C), 2009. url: http://www.w3.org/TR/2009/REC-owl2-syntax-20091027/. See pp. 44, 49, 142.
[MR08]
Steve McKay and Jason Robbins. Looks Good To Me – Source Code Review Tools. July 30, 2008. url: http://googlecode.blogspot.com/2008/07/looks-good-to-me-source-code-review.html (visited on 2009/10/27). See p. 75.
[Msc]
Mathematics Subject Classification MSC2010. 2010. url: http://msc2010.org (visited on 2009/11/16). See p. 20.
[MT]
William C. Mann and Maite Taboada. Rhetorical Structure Theory – Relation Definitions. url: http://www.sfu.ca/rst/01intro/definitions.html (visited on 2009/10/22). See pp. xi, 45.
[Mül05]
Normen Müller. “OMDoc-Repräsentation von Programmen und Beweisen in VeriFun”. MA thesis. Programmiermethodik, Technische Universität Darmstadt, 2005. url: http://kwarc.info/nmueller/papers/dt.pdf. See p. 25.
[Mül06]
Normen Müller. “OMDoc as a Data Format for VeriFun”. In: OMDoc – An open markup format for mathematical documents [Version 1.2]. LNAI 4180. Springer Verlag, 2006. Chap. 26.20, pp. 329–332. url: http://omdoc.org/omdoc1.2.pdf. See p. 25.
[Mül08a]
Christine Müller. The CS precourse project. 2008. url: http://cs-precourse.kwarc.info/ (visited on 2009/11/25). See p. 145.
[Mül08b]
Christine Müller. “Towards the Adaptation of Scientific Course Material powered by Community of Practice”. In: Wissens- und Erfahrungsmanagement LWA (Lernen, Wissensentdeckung und Adaptivität) Conference Proceedings. Ed. by Joachim Baumeister and Martin Atzmüller. Vol. 448. 2008. See p. 22.
[MVW05]
Jonathan Marsh, Daniel Veillard, and Norman Walsh. xml:id Version 1.0. W3C Recommendation. World Wide Web Consortium (W3C), 2005. url: http://www.w3.org/TR/2005/REC-xml-id-20050909/. See p. 10.
[Neo]
NeOn Toolkit. url: http://neon-toolkit.org (visited on 2009/10/26). See p. 122.
[Nil+08]
Mikael Nilsson, Andy Powell, Pete Johnston, and Ambjörn Naeve. Expressing Dublin Core metadata using the Resource Description Framework (RDF). DCMI Recommendation. Dublin Core Metadata Initiative, 2008. url: http://dublincore.org/documents/2008/01/14/dc-rdf/. See p. 45.
[NPB03]
Mikael Nilsson, Matthias Palmér, and Jan Brase. “The LOM RDF binding – principles and implementation”. In: 3rd Annual Ariadne Conference, 20–21 November 2003, Katholieke Universiteit Leuven, Belgium. 2003. See p. 45.
[NR06]
Natasha Noy and Alan Rector. Defining N-ary Relations on the Semantic Web. W3C Working Group Note. World Wide Web Consortium (W3C), 2006. url: http://www.w3.org/TR/2006/NOTE-swbp-n-aryRelations-20060412/. See p. 13.
[OE09]
Christian-Emil Ore and Øyvind Eide. “TEI and cultural heritage ontologies: Exchange of information?” In: Literary and Linguistic Computing 24.2 (2009), pp. 161–172. See pp. x, 37.
[Olv]
OpenLink Software. OpenLink Universal Integration Middleware – Virtuoso Product Family. url: http://virtuoso.openlinksw.com (visited on 2009/10/22). See p. 109.
[Ope]
OpenMath Home. url: http://www.openmath.org/ (visited on 2009/10/22). See p. 23.
[Opf]
Open Packaging Format (OPF). Recommended Specification. Version 2.0 v1.0. International Digital Publishing Forum, 2007. url: http://www.idpf.org/2007/opf/OPF_2.0_final_spec.html. See p. 57.
[O’R05]
Tim O’Reilly. What is Web 2.0. Sept. 2005. url: http://oreilly.com/web2/archive/what-is-web-20.html (visited on 2009/10/22). See pp. 6, 94.
[Ore+06]
Eyal Oren, Renaud Delbru, Knud Möller, Max Völkel, and Siegfried Handschuh. “Annotation and Navigation in Semantic Wikis”. In: Proceedings of the 1st Workshop on Semantic Wikis, European Semantic Web Conference 2006. Ed. by Max Völkel, Sebastian Schaffert, and Stefan Decker. Vol. 206. CEUR Workshop Proceedings. Budva, Montenegro 2006. See p. 122.
[ORS92]
S. Owre, J. M. Rushby, and N. Shankar. “PVS: A Prototype Verification System”. In: Proceedings of the 11th Conference on Automated Deduction. Ed. by D. Kapur. Vol. 607. LNCS. Saratoga Springs, NY, USA: Springer Verlag, 1992, pp. 748–752. See p. 25.
[Orw49]
George Orwell. Nineteen Eighty-Four. London: Secker & Warburg, 1949. See p. 62.
[Owl]
The OWL API. url: http://owlapi.sourceforge.net (visited on 2010/01/05). See p. 43.
[Pal+09]
Raúl Palma, Peter Haase, Oscar Corcho, and Asunción Gómez-Pérez. “Change Representation For OWL 2 Ontologies”. In: OWL: Experiences and Directions (OWLED). Ed. by Rinke Hoekstra and Peter F. Patel-Schneider. 2009. See p. 64.
[PCSF08]
C. Michael Pilato, Ben Collins-Sussman, and Brian W. Fitzpatrick. Version Control With Subversion. 2nd ed. Sebastopol, CA, USA: O’Reilly & Associates, Inc., 2008. isbn: 978-0-596-51033-6. url: http://svnbook.red-bean.com. See pp. 111, 125.
[Pes07]
Darko Pesikan. “Coping with Content Representations of Mathematics in Editor Environments: nOMDoc mode”. Bachelor’s Thesis. Computer Science, Jacobs University, Bremen, 2007. See p. 84.
[Pfe01]
Frank Pfenning. “Logical Frameworks”. In: Handbook of Automated Reasoning. Ed. by Alan Robinson and Andrei Voronkov. Vol. I and II. Elsevier Science and MIT Press, 2001. See p. 25.
[Pie+06]
Emmanuel Pietriga, Chris Bizer, David Karger, and Ryan Lee. “Fresnel – A Browser-Independent Presentation Vocabulary for RDF”. In: 5th International Semantic Web Conference. Ed. by Isabel F. Cruz, Stefan Decker, Dean Allemang, Chris Preist, Daniel Schwabe, Peter Mika, Michael Uschold, and Lora Aroyo. Vol. 4273. Lecture Notes in Computer Science. Springer, 2006, pp. 158–171. isbn: 3-540-49029-9. See p. 13.
[Pr]
panta rhei. url: http://trac.kwarc.info/panta-rhei (visited on 2009/10/22). See pp. 145, 154.
[Pro]
Protégé. url: http://protege.stanford.edu (visited on 2010/01/06). See p. 43.
[PS08]
Eric Prud’hommeaux and Andy Seaborne. SPARQL Query Language for RDF. W3C Recommendation. World Wide Web Consortium (W3C), 2008. url: http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/. See pp. xiv, 59, 92, 130.
[PSM09]
Peter F. Patel-Schneider and Boris Motik. OWL 2 Web Ontology Language: Mapping to RDF Graphs. W3C Recommendation. World Wide Web Consortium (W3C), 2009. url: http://www.w3.org/TR/2009/REC-owl2-mapping-to-rdf-20091027/. See p. 50.
[PSS03]
Peter F. Patel-Schneider and Jérôme Siméon. “The Yin/Yang Web: A Unified Model for XML Syntax and RDF Semantics”. In: IEEE Transactions on Knowledge and Data Engineering 15.4 (2003), pp. 797–812. See p. 47.
[PST04]
Helena Sofia Pinto, Steffen Staab, and Christoph Tempich. “DILIGENT: Towards a fine-grained methodology for Distributed, Loosely-controlled and evolving Engineering of oNTologies”. In: ECAI. 2004, pp. 393–397. See p. 79.
[Rab08]
Florian Rabe. “Representing Logics and Logic Translations”. PhD thesis. Jacobs University Bremen, 2008. url: http://kwarc.info/frabe/Research/phdthesis.pdf. See pp. 5, 16, 25, 36.
[Rab09]
Florian Rabe. “The MMT Language and System”. 2009. url: https://svn.kwarc.info/repos/kwarc/rabe/Scala/doc/mmt.pdf (visited on 2009/12/12). See p. 30.
[Rel]
A Schema Language for XML. url: http://www.relaxng.org/ (visited on 2009/10/22). See pp. 91, 150.
[Ren+02]
Allen Renear, David Dubin, C. M. Sperberg-McQueen, and Claus Huitfeldt. “Towards a Semantics for XML Markup”. In: DocEng’02. ACM, 2002. See p. 45.
[RGJ05]
Gerald Reif, Harald Gall, and Mehdi Jazayeri. “WEESA: Web engineering for semantic Web applications.” In: Proceedings of the 14th WWW conference. Ed. by Allan Ellis and Tatsuya Hagino. ACM Press, 2005, pp. 722–729. isbn: 1-59593-046-9. See pp. xiv, 115.
[Ria]
RIACA OpenMath products. url: http://www.riaca.win.tue.nl/projects/openmath/ (visited on 2009/10/22). See p. 133.
[RK08]
Florian Rabe and Michael Kohlhase. “An Exchange Format for Modular Knowledge”. In: Proceedings of the LPAR Workshops: Knowledge Exchange: Automated Provers and Proof Assistants, and The 7th International Workshop on the Implementation of Logics. Ed. by G. Sutcliffe, P. Rudnicki, R. Schmidt, B. Konev, and S. Schulz. CEUR Workshop Proceedings 418. 2008, pp. 50–68. See pp. 25, 141, 154.
[RK09]
Florian Rabe and Michael Kohlhase. “A Web-Scalable Module System for Mathematical Theories”. Manuscript, to be submitted to the Journal of Symbolic Computation. 2009. url: https://svn.kwarc.info/repos/kwarc/rabe/Papers/omdoc-spec/paper.pdf. See p. 25.
[RW73]
Horst W. J. Rittel and Melvin M. Webber. “Dilemmas in a General Theory of Planning”. In: Policy Sciences 4.2 (June 1973), pp. 155–169. See pp. 74, 76, 144, 145.
[Sax]
SAX. url: http://www.saxproject.org (visited on 2009/12/07). See p. 12.
[SBA05]
Jörg Siekmann, Christoph Benzmüller, and Serge Autexier. “Computer Supported Mathematics with OMEGA”. Ed. by Christoph Benzmüller. In: Journal of Applied Logic, special issue on Mathematics Assistance Systems (Dec. 2005). See p. 25.
[Sch02]
Irene Schena. “Towards a Semantic Web for Formal Mathematics”. Technical Report UBLCS–2002–6. PhD thesis. University of Bologna, 2002. See p. 34.
[Sch06]
Sebastian Schaffert. “IkeWiki: A Semantic Wiki for Collaborative Knowledge Management”. In: 1st International Workshop on Semantic Technologies in Collaborative Applications STICA 06, Manchester, UK. 2006. See pp. 49, 122.
[Sch+09]
Sebastian Schaffert, Julia Eder, Szaby Grünwald, Thomas Kurz, and Mihai Radulescu. “KiWi – A Platform for Semantic Social Software (Demonstration)”. In: ESWC. Ed. by Lora Aroyo, Paolo Traverso, Fabio Ciravegna, Philipp Cimiano, Tom Heath, Eero Hyvönen, Riichiro Mizoguchi, Eyal Oren, Marta Sabou, and Elena Paslaru Bontas Simperl. Vol. 5554. Lecture Notes in Computer Science. Springer, 2009, pp. 888–892. See p. 122.
[Sch09]
Michael Schneider. OWL 2 Web Ontology Language: RDF-Based Semantics. W3C Recommendation. World Wide Web Consortium (W3C), 2009. url: http://www.w3.org/TR/2009/REC-owl2-rdf-based-semantics-20091027/. See p. 50.
[Sci]
The SCIEnce Project – Symbolic Computation Infrastructure for Europe. url: http://www.symbolic-computation.org/ (visited on 2009/10/22). See p. 93.
[SD08]
Jonathan Stratford and James H. Davenport. “Unit Knowledge Management”. In: Intelligent Computer Mathematics, 9th International Conference, AISC 2008, 15th Symposium, Calculemus 2008, 7th International Conference, MKM 2008, Birmingham, UK, July 28 – August 1, 2008, Proceedings. Ed. by Serge Autexier, John Campbell, Julio Rubio, Volker Sorge, Masakazu Suzuki, and Freek Wiedijk. LNAI 5144. Springer Verlag, 2008, pp. 382–397. See pp. 106, 116.
[SDR08]
Uli Sattler, Cathy Dolbear, and Alan Ruttenberg, eds. OWL: Experiences and Directions (OWLED). 2008.
[Seb+08]
Abraham Sebastian, Natalya Fridman Noy, Tania Tudorache, and Mark A. Musen. “A Generic Ontology for Collaborative Ontology-Development Workflows”. In: EKAW. Ed. by Aldo Gangemi and Jérôme Euzenat. Lecture Notes in Computer Science 5268. Springer, 2008, pp. 318–328. isbn: 978-3-540-87695-3. See p. 76.
[SGR09]
Josef Schneeberger, Günther Görz, and Jürgen Renn. “Interview mit Jürgen Renn”. In: Künstliche Intelligenz 4 (Dec. 2009), pp. 48–53. See p. 6.
[Sio]
SIOC Types Ontology Module Namespace. url: http://rdfs.org/sioc/types (visited on 2009/10/27). See p. 77.
[Sip+03]
Sudhanshu Sipani, Kunal Verma, John A. Miller, and Boanerges Aleman-Meza. “Designing a high-performance database engine for the ‘DB4XML’ native XML database system”. In: Journal of Systems and Software 69 (2003), pp. 87–104. See p. 109.
[SS06]
Alan Sexton and Volker Sorge. “Processing Textbook-Style Matrices”. In: Mathematical Knowledge Management, MKM’05. Ed. by Michael Kohlhase. LNAI 3863. Springer Verlag, 2006, pp. 111–125. See p. 96.
[SSY94]
Geoff Sutcliffe, Christian Suttner, and Theodor Yemenis. “The TPTP Problem Library”. In: Proceedings of the 12th Conference on Automated Deduction. Ed. by Alan Bundy. LNAI 814. Nancy, France: Springer Verlag, 1994. See p. 25.
[Ste]
sTeX Emacs Mode. url: https://svn.kwarc.info/repos/stex/emacs (visited on 2009/11/10). See p. 84.
[Sto06]
Margaret-Anne Storey. “Theories, tools and research methods in program comprehension: past, present and future”. In: Software Quality 14 (2006), pp. 187–208. See p. 4.
[Str03]
Andreas Strotmann. “Content Markup Language Design Principles”. PhD thesis. Florida State University, 2003. url: http://www.cs.fsu.edu/research/reports/TR-030702.pdf. See p. 22.
[Str04]
Andreas Strotmann. “The Categorial Type of OpenMath Objects”. In: Mathematical Knowledge Management, MKM’04. Ed. by Andrea Asperti, Grzegorz Bancerek, and Andrej Trybulec. LNAI 3119. Springer Verlag, 2004, pp. 378–392. See pp. 23, 35.
[Str08]
Jonathan Stratford. Creating an extensible Unit Converter using OpenMath as the Representation of the Semantics of the Units. Tech. rep. 2008-02. University of Bath, 2008. url: http://www.cs.bath.ac.uk/pubdb/download.php?resID=290. See pp. 106, 107, 116.
[Svn]
Subversion. url: http://subversion.tigris.org/ (visited on 2009/10/22). See pp. 111, 125.
[SX02]
Daniel Suthers and Jun Xu. “Kūkākūkā: An Online Environment for Artifact-Centered Discourse”. In: Education Track of the Eleventh World Wide Web Conference (WWW 2002). 2002, pp. 472–480. See pp. xv, 119.
[TDO07]
Giovanni Tummarello, Renaud Delbru, and Eyal Oren. “Sindice.com: Weaving the Open Linked Data”. In: ISWC/ASWC. Ed. by Karl Aberer, Key-Sun Choi, Natasha Fridman Noy, Dean Allemang, Kyung-Il Lee, Lyndon J. B. Nixon, Jennifer Golbeck, Peter Mika, Diana Maynard, Riichiro Mizoguchi, Guus Schreiber, and Philippe Cudré-Mauroux. Vol. 4825. Lecture Notes in Computer Science. Springer, 2007, pp. 552–565. isbn: 978-3-540-76297-3. See pp. 59, 148, 152.
[Tem+05]
Christoph Tempich, H. Sofia Pinto, York Sure, and Steffen Staab. “An Argumentation Ontology for DIstributed, Loosely-controlled and evolvInG Engineering processes of oNTologies (DILIGENT)”. In: ESWC. Ed. by Asunción Gómez-Pérez and Jérôme Euzenat. Vol. 3532. Lecture Notes in Computer Science. Springer, 2005, pp. 241–256. isbn: 3-540-26124-9. See pp. 76, 122, 144.
[Tem+07]
Christoph Tempich, Elena Simperl, Markus Luczak, Rudi Studer, and H. Sofia Pinto. “Argumentation-Based Ontology Engineering”. In: IEEE Intelligent Systems 22.6 (2007), pp. 52–59. issn: 1541-1672. See pp. 76, 122.
[The02]
The W3C HTML Working Group. XHTML 1.0 The Extensible HyperText Markup Language (Second Edition) – A Reformulation of HTML 4 in XML 1.0. W3C Recommendation. World Wide Web Consortium (W3C), 2002. url: http://www.w3.org/TR/2002/REC-xhtml1-20020801. See p. 11.
[Til06]
Paul van Tilburg. Exploring the Core of MathLang. Internship Report. ULTRA group, School of Mathematics and Computer Science, Heriot-Watt University, Edinburgh, 2006. url: http://paul.luon.net/writings/reports/HWU-MACSMathLang.pdf. See p. 56.
[Tin]
TinyMCE – JavaScript WYSIWYG editor. url: http://tinymce.moxiecode.com/ (visited on 2009/11/10). See pp. 86, 87.
[TL09]
Vladimir Tomberg and Mart Laanpere. “RDFa versus Microformats: Exploring the Potential for Semantic Interoperability of Mash-up Personal Learning Environments”. In: 2nd International Workshop on Mashup Personal Learning Environments (MUPPLE09). Ed. by Fridolin Wild, Marco Kalz, Matthias Palmér, and Daniel Müller. Vol. 506. CEUR Workshop Proceedings. 2009. url: http://CEUR-WS.org/Vol-506/tomberg.pdf. See pp. x, 35.
[TN07]
Tania Tudorache and Natasha Noy. “Collaborative Protégé”. In: Proceedings of the 16th WWW conference. Ed. by Carey L. Williamson, Mary Ellen Zurko, Peter F. Patel-Schneider, and Prashant J. Shenoy. ACM Press, 2007. isbn: 978-1-59593-654-7. See p. 79.
[Tnt]
TNTBase. url: https://trac.mathweb.org/tntbase/ (visited on 2009/10/22). See pp. 109, 111, 125, 148.
[Traa]
The Trac Project. url: http://trac.edgewall.org/ (visited on 2009/10/22). See p. 135.
[Trab]
Trac and Subversion. url: http://trac.edgewall.org/wiki/TracSubversion (visited on 2009/10/27). See pp. 75, 137.
[Tra+09]
Ha Manh Tran, Christoph Lange, Georgi Chulkov, Jürgen Schönwälder, and Michael Kohlhase. “Applying Semantic Techniques to Search and Analyze Bug Tracking Data”. In: Journal of Network and Systems Management (Special Issue on Ontologies for Network and Service Management) 17.3 (2009), pp. 285–308. See p. 75.
[Trz95]
Jerzy Trzeciak. Writing Mathematical Papers in English. a practical guide. Gdańskie Wydawnictwo Oświatowe, 1995. See p. 2.
[Tud+08]
Tania Tudorache, Natalya F. Noy, Samson Tu, and Mark A. Musen. “Supporting Collaborative Ontology Development in Protégé”. In: The Semantic Web – ISWC 2008, 7th International Semantic Web Conference, Proceedings. Ed. by Amit P. Sheth, Steffen Staab, Mike Dean, Massimo Paolucci, Diana Maynard, Timothy W. Finin, and Krishnaprasad Thirunarayan. Vol. 5318. LNCS. Springer, 2008. See p. 76.
[TVN08]
Tania Tudorache, Jennifer Vendetti, and Natalya Noy. “Web-Protégé: A Lightweight OWL Ontology Editor for the Web”. In: OWL: Experiences and Directions (OWLED). Ed. by Uli Sattler, Cathy Dolbear, and Alan Ruttenberg. 2008. See p. 76.
[Ull08]
Carsten Ullrich. Pedagogically Founded Courseware Generation for Web-Based Learning. LNCS. Springer, 2008. isbn: 978-3-540-88213-8. url: http://www.springerlink.com/content/k604618p5351/. See pp. 22, 93.
[Uni]
Unicode. Version 5.2.0. Unicode, Inc. 2009. url: http://www.unicode.org/versions/Unicode5.2.0/. See p. 9.
[VMS99]
A. Marie Vans, Anneliese von Mayrhauser, and Gabriel Somlo. “Program understanding behavior during corrective maintenance of large-scale software”. In: International Journal of Human-Computer Studies 51 (1999), pp. 31–70. See p. 4.
[Völ+06]
Max Völkel, Markus Krötzsch, Denny Vrandečić, Heiko Haller, and Rudi Studer. “Semantic Wikipedia”. In: Proceedings of the 15th WWW conference. ACM Press, 2006. url: http://www.aifb.uni-karlsruhe.de/WBS/hha/papers/SemanticWikipedia.pdf. See p. 122.
[W3Ca]
Document Object Model DOM. url: http://www.w3.org/DOM/ (visited on 2009/12/07). See p. 12.
[W3Cb]
World Wide Web Consortium, ed. Cascading Style Sheets. url: http://www.w3.org/Style/CSS/ (visited on 2009/10/22). See p. 11.
[W3Cc]
World Wide Web Consortium, ed. Resource Description Framework (RDF). url: http://www.w3.org/RDF/ (visited on 2009/10/22). See pp. 12, 122.
[Wei]
Eric W. Weisstein, ed. Wolfram MathWorld. the web’s most extensive mathematics resource. Wolfram Research. url: http://mathworld.wolfram.com (visited on 2009/12/02). See p. 3.
[Wen+09]
Makarius Wenzel, Clemens Ballarin, Stefan Berghofer, Timothy Bourke, Lucas Dixon, Florian Haftmann, Gerwin Klein, Alexander Krauss, Tobias Nipkow, David von Oheimb, Larry Paulson, and Sebastian Skalberg. The Isabelle/Isar Reference Manual. Ed. by Makarius Wenzel. 2009. url: http://isabelle.in.tum.de/doc/isar-ref.pdf (visited on 2009/11/11). See p. 29.
[Wie]
Jan Wielemaker. SWI-Prolog 5.6.60 Reference Manual. url: http://gollem.science.uva.nl/SWI-Prolog/Manual/index.html (visited on 2009/11/11). See p. 29.
[Wik]
Wikimedia Foundation, ed. Wikipedia, the free encyclopedia. url: http://www.wikipedia.org. See pp. 1, 119.
[Wik09a]
Wikimedia Foundation, ed. Help: Edit summary (from Wikipedia, the free encyclopedia). Nov. 2, 2009. url: http://en.wikipedia.org/w/index.php?title=Help:Edit_summary&oldid=323403778. See p. 120.
[Wik09b]
Wikimedia Foundation, ed. Knowledge management (from Wikipedia, the free encyclopedia). Dec. 2, 2009. url: http://en.wikipedia.org/w/index.php?title=Knowledge_management&oldid=329227520. See p. 4.
[Wik09c]
Wikimedia Foundation, ed. Portal:Mathematics (from Wikipedia, the free encyclopedia). Dec. 2, 2009. url: http://en.wikipedia.org/w/index.php?title=Portal:Mathematics&oldid=329137789. See p. 3.
[Wik09d]
Wikimedia Foundation, ed. Pythagorean theorem (from Wikipedia, the free encyclopedia). Nov. 29, 2009. url: http://en.wikipedia.org/w/index.php?title=Pythagorean_theorem&oldid=328597679. See p. 4.
[Wik09e]
Wikimedia Foundation, ed. Wikipedia: Neutral point of view (from Wikipedia, the free encyclopedia). Oct. 28, 2009. url: http://en.wikipedia.org/w/index.php?title=Wikipedia:Neutral_point_of_view&oldid=322591480. See p. 120.
[Wik09f]
Wikimedia Foundation, ed. Wikipedia: No original research (from Wikipedia, the free encyclopedia). Nov. 1, 2009. url: http://en.wikipedia.org/w/index.php?title=Wikipedia:No_original_research&oldid=323200797. See p. 120.
[Wik09g]
Wikimedia Foundation, ed. Wikipedia: Talk page (from Wikipedia, the free encyclopedia). Nov. 2, 2009. url: http://en.wikipedia.org/w/index.php?title=Wikipedia:Talk_page&oldid=323514011. See p. 120.
[Wik09h]
Wikimedia Foundation, ed. Wikipedia: Template messages/Cleanup (from Wikipedia, the free encyclopedia). Nov. 1, 2009. url: http://en.wikipedia.org/w/index.php?title=Wikipedia:Template_messages/Cleanup&oldid=323282474. See p. 120.
[Wir]
WIRIS Editor – a tool for graphical edition of mathematical formulas. url: http://www.wiris.com/content/view/20/ (visited on 2009/11/10). See p. 86.
[WM09]
Claudia Wagner and Enrico Motta. “Data Republishing on the Social Semantic Web”. In: Proceedings of the 1st Workshop on Trust and Privacy on the Social and Semantic Web (SPOT2009). Vol. 447. CEUR Workshop Proceedings. 2009. url: http://CEUR-WS.org/Vol-447/. See p. 86.
[WS03]
Christoph Walther and Stephan Schweitzer. “About Verifun”. In: Proceedings of the 19th International Conference on Automated Deduction (CADE-19). Ed. by Franz Baader. Springer-Verlag, LNCS 2741, 2003. See p. 25.
[Wsm]
Web Service Modeling Ontology. url: http://www.wsmo.org (visited on 2009/11/25). See p. 15.
[Yah]
Yahoo! Pipes. url: http://pipes.yahoo.com (visited on 2009/10/22). See p. 94.
[ZK09]
Vyacheslav Zholudev and Michael Kohlhase. “TNTBase: a Versioned Storage for XML”. In: Proceedings of Balisage: The Markup Conference 2009. Vol. 3. Balisage Series on Markup Technologies. Mulberry Technologies, Inc., 2009. doi: 10.4242/BalisageVol3.Zholudev01. url: http://www.balisage.net/Proceedings/vol3/html/Zholudev01/BalisageVol3-Zholudev01.html. See pp. 102, 109, 111, 125.