Knowledge Modeling & Management in Science and Technology
Andreas Strotmann University of Alberta School of Business
Outline Afternoon: What is it good for? Knowledge Management Universal access to knowledge Formal knowledge processing Formal knowledge creation
Knowledge Modeling in Science and Technology Mathematics: MathML and OpenMath
Mathematical Knowledge Management Open Problems and Research Directions
Morning: How and why does it work?
Knowledge Management Map Knowledge Modeling in relation to other areas of Knowledge Management
[Figure: Knowledge Management map along the Information & Knowledge dimension: Information Modeling, Knowledge Modeling, Information Retrieval, Knowledge Discovery, Information Management, Knowledge Management]
Information & Knowledge: What's the difference?
Data: a stream of bits
Information: structured data
Knowledge: meaningful information
Wisdom: thorough understanding
From low to high: levels of abstraction, of cumulation, of reasoning, of understanding; distance from "real world", "direct observation"
Information, Knowledge & Language
Data / Phonetics: stream of ... phonemes
Information / Syntax: structured ... words & phrases
Knowledge / Semantics: meaningful ... communication
Wisdom / Cognition: thorough ... comprehension; thoughtful ... expression
Digital Libraries for Science & Technology. Science & Technology content today: Literature (meta data); Data (curation); Information (indexing & retrieval); text, images, graphs, formulas; data: measurements, audio, video; meta data
Categorization: subject headings (e.g. MeSH), authority files (e.g. drug names)
Digital Sci/Tech Libraries and Knowledge Management Knowledge = Meaningful communication
Meta data --> Ontologies: well-defined semantics of meta data structure; enables formal reasoning ... and thus cross-walks etc.
"The Semantic Web": web of computer-understandable meta data ==> computer-processable/communicable information
Digital Sci/Tech Libraries and Knowledge Management (ctd) knowledge = Meaningful communication
Content --> Knowledge Representation: web of computer-understandable documents ==> computer-processable and communicable knowledge; Content Markup, Knowledge Modeling
Ontologies + Content Markup ==> universally accessible knowledge This is Today's Theme
An Example: "We estimate that the World Bank signed new contracts since the Rio Summit to fund projects that, together and over their lifetime, will contribute to the emissions of at least 9.8 billion tons of carbon." SEEN, ITIS study '97
"The World Bank signed new contracts since the Rio Summit to fund projects that make it responsible for emissions from these projects of roughly 1.4 billion tons of carbon over a period of 25 years." WB Carbon Backcasting Study '97
Verification of Claims: claimed numbers are WB : critics ~ 1 : 7 (1.4 vs. 9.8 billion tons of carbon); the difference is too large to be random error
Verification needed full justification of both results available in respective documents inference trails can be analyzed and verified
Computer support would help!
The Role of the "Semantic Web": "The World Bank signed new contracts since the Rio Summit to fund projects that make it responsible for emissions from these projects of roughly 1.4 billion tons of carbon over a period of 25 years." WB Carbon Backcasting Study '97
Standard names: entities: The World Bank, The Rio Summit, carbon; classes: projects, contracts, emissions, pollutants; properties, relations: signed, responsible
Standard inference and combination rules to capture meanings of names, e.g. the date of Rio
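A minimal sketch of how standard names plus one inference rule could resolve "since the Rio Summit" to a concrete date. The triple layout, the identifiers (RioSummit, signedOn, contract1, contract2) and the contract dates are hypothetical; only the summit's opening date (3 June 1992) is a historical fact. The strings such as "rdf:type" are plain labels here, not calls into an RDF library.

```python
from datetime import date

# Hypothetical standard names as (subject, predicate, object) triples.
# Identifiers and contract dates are illustrative, not from any real ontology.
triples = {
    ("RioSummit", "rdf:type", "Event"),
    ("RioSummit", "date", date(1992, 6, 3)),
    ("WorldBank", "rdf:type", "Organization"),
    ("contract1", "rdf:type", "Contract"),
    ("contract1", "signedBy", "WorldBank"),
    ("contract1", "signedOn", date(1991, 5, 1)),
    ("contract2", "rdf:type", "Contract"),
    ("contract2", "signedBy", "WorldBank"),
    ("contract2", "signedOn", date(1994, 9, 15)),
}

def value(subject, predicate):
    """Return the object stored for (subject, predicate), or None."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None

def signed_since(event):
    """Inference rule: a contract is 'signed since E' if its signing date is
    on or after the date that the standard name E resolves to."""
    cutoff = value(event, "date")
    return sorted(
        s for s, p, o in triples
        if p == "rdf:type" and o == "Contract"
        and value(s, "signedOn") >= cutoff
    )

print(signed_since("RioSummit"))  # ['contract2']
```

The rule only works because "Rio Summit" is a shared name with a defined date; two documents using different, unlinked names would give the rule nothing to compare.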
The Role of Content Markup: "We estimate that the World Bank signed new contracts since the Rio Summit to fund projects that, together and over their lifetime, will contribute to the emissions of at least 9.8 billion tons of carbon." SEEN, ITIS study '97
Legend: p: project; t: time, ts: signed, t0: started; e: emissions, econtrib: contributed to, eresp: responsible for
[GtC]: gigatons of carbon; [y]: years
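The formulas on this slide did not survive conversion to text. Below is one plausible formalization, written with the legend's symbols (ts as t_s, t0 as t_0, econtrib and eresp as e_contrib and e_resp). It is consistent with the integration example later in the talk, which has an interval starting at ts(p) and the constant infinity. Several parts are assumptions and not the original slide content: the project set P, the restriction to projects signed after the Rio date t_Rio, and the World Bank's 25-year window starting at t_0(p).

```latex
% Critics' claim (SEEN, ITIS): lifetime contributions of the projects
\sum_{p \in P,\; t_s(p) > t_{\mathrm{Rio}}} \int_{t_s(p)}^{\infty} e_{\mathrm{contrib}}(p,t)\,dt \;\ge\; 9.8\ [\mathrm{GtC}]

% World Bank's claim: emissions it is responsible for, over 25 years
\sum_{p \in P,\; t_s(p) > t_{\mathrm{Rio}}} \int_{t_0(p)}^{t_0(p) + 25\,[\mathrm{y}]} e_{\mathrm{resp}}(p,t)\,dt \;\approx\; 1.4\ [\mathrm{GtC}]
```

Written this way, the two sentences differ in the integrand (contributed to vs. responsible for) and in both integration limits, not only in the final number.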
Why Content Markup? Software exists that can check formal chains of arguments and perform mathematical simplifications, interactively or automatically
Formal content can feed into such software as a "formal abstract": discovery of discrepancies; as part of the document: detailed tracking of discrepancies to their source(s)
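A minimal sketch of a "formal abstract" comparison. It assumes each claim is stored as the components of its formula rather than as a single number. The dictionaries restate the two claims using the legend's symbols. The field names and the World Bank lower limit t0(p) are hypothetical. The comparison lists the components in which the claims differ, which are candidate sources for the roughly 1:7 gap between the two numbers.

```python
# Hypothetical formal abstracts: each claim broken into formula components.
critics = {
    "integrand": "econtrib(p,t)",
    "lower_limit": "ts(p)",
    "upper_limit": "infinity",
    "projects": "signed since Rio",
    "value_GtC": 9.8,
}
world_bank = {
    "integrand": "eresp(p,t)",
    "lower_limit": "t0(p)",          # assumption: 25-year window from start
    "upper_limit": "t0(p) + 25 y",
    "projects": "signed since Rio",
    "value_GtC": 1.4,
}

def trace_discrepancy(a, b):
    """Return {component: (a_value, b_value)} for every formula component
    (other than the final number) in which the two claims differ."""
    return {k: (a[k], b[k]) for k in a if k != "value_GtC" and a[k] != b[k]}

ratio = critics["value_GtC"] / world_bank["value_GtC"]
print(f"ratio ~ {ratio:.1f}")
for part, (x, y) in trace_discrepancy(critics, world_bank).items():
    print(f"{part}: {x}  vs  {y}")
```

The shared "projects" component drops out of the report, so attention goes to the integrand and the integration bounds instead of the arithmetic.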
Why Content Markup? (ctd.) More than just ontologies: content markup encodes the complex logical structure of the document text; ontologies add the underlying definitions and general relations between the basic concepts in the text
Universal accessibility: note the way that the mathematical formulas capture the essence of the English sentences in our example above. The formulas can in turn be translated into other languages, cultures, modes for
Complementary Roles: "Semantic Web": formalizes vocabulary ("ontologies"); captures relationships between entities
Content Markup: captures complex formal expressions; basis of formal verification processes; current focus on mathematics and proofs, but extensible to much of science and technology
structurally related to natural language
What is Content Markup? “The intent of the content markup in the Mathematical Markup Language is to provide an explicit encoding of the underlying mathematical structure of an expression, rather than any particular rendering for the expression.” MathML 2.0 Recommendation more generally: underlying semantic structure
Content vs. Presentation Markup. Content Markup: conceptual structure; HTML: header, title; MathML-Content: application, abstraction, operators, arguments
captures notions useful for processing; universal
Presentation Markup: layout structure; HTML: font, image; MathML-Presentation: nested 2D boxes, rows, fences, glyphs, notational patterns
captures notations useful for rendering; locale-specific
A MathML Example: "e^x". Content Markup: "the base of the natural logarithm to the power of identifier lowercase x"
Presentation Markup: "lowercase e rendered as an identifier, superscripted with lowercase x rendered as an identifier"
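The markup itself was lost when the slides were converted to text. The sketch below rebuilds one plausible encoding of each description with Python's standard xml.etree.ElementTree. It uses MathML 2.0 element names: apply, power, exponentiale and ci for content; msup and mi for presentation. MathML-Content also has an exp element, so this is one valid encoding among several, not necessarily the one on the original slide.

```python
import xml.etree.ElementTree as ET

def content_exp_x():
    """Content MathML: apply 'power' to the constant e and identifier x."""
    app = ET.Element("apply")
    ET.SubElement(app, "power")
    ET.SubElement(app, "exponentiale")   # base of the natural logarithm
    ET.SubElement(app, "ci").text = "x"  # content identifier x
    return app

def presentation_exp_x():
    """Presentation MathML: identifier e superscripted with identifier x."""
    sup = ET.Element("msup")
    ET.SubElement(sup, "mi").text = "e"
    ET.SubElement(sup, "mi").text = "x"
    return sup

print(ET.tostring(content_exp_x(), encoding="unicode"))
print(ET.tostring(presentation_exp_x(), encoding="unicode"))
```

The content tree says nothing about superscripts, and the presentation tree says nothing about exponentiation. A renderer can derive the second from the first, but not reliably the reverse.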
MathML Coverage: MathML-Content covers "most" of K-12 and first-year college mathematics; advanced mathematics requires external extensions ...
MathML-Presentation covers "all" of mathematics notations; based on Unicode for extensive character set requirements; no extension mechanism
MathML-Content Extensions: ability to refer to externally declared "content symbols"; collections of symbol declarations for additional mathematical concepts: OpenMath "content dictionaries" (CDs); a form of ontologies? Not quite as formally specified, perhaps
"CDs" for extra-mathematical concepts: physical constants ... science/engineering ... linguistic concepts for NLG
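A minimal sketch of resolving symbols against content dictionaries. arith1, nums1 and transc1 are real OpenMath CD names, but only a few of their symbols are listed here. physconst_demo and speed_of_light are hypothetical stand-ins for an extra-mathematical CD. Expressions are nested tuples tagged after OpenMath's XML elements: OMA for application, OMS for symbol, OMV for variable, OMI for integer.

```python
# Loaded content dictionaries: CD name -> declared symbol names (abridged).
CDS = {
    "arith1": {"plus", "times", "power", "gcd"},
    "nums1": {"e", "pi", "infinity"},
    "transc1": {"sin", "cos", "ln"},
}

def undeclared(expr, cds):
    """Return (cd, name) pairs used in expr that no loaded CD declares.

    expr is ("OMS", cd, name), ("OMV", name), ("OMI", n), or
    ("OMA", head, arg1, ...) for an application."""
    kind = expr[0]
    if kind == "OMS":
        _, cd, name = expr
        return [] if name in cds.get(cd, set()) else [(cd, name)]
    if kind == "OMA":
        return [pair for sub in expr[1:] for pair in undeclared(sub, cds)]
    return []  # variables and integers need no declaration

# E = m * c^2, with c taken from the hypothetical physconst_demo CD
energy = ("OMA", ("OMS", "arith1", "times"), ("OMV", "m"),
          ("OMA", ("OMS", "arith1", "power"),
           ("OMS", "physconst_demo", "speed_of_light"), ("OMI", 2)))

print(undeclared(energy, CDS))  # [('physconst_demo', 'speed_of_light')]
extended = {**CDS, "physconst_demo": {"speed_of_light"}}
print(undeclared(energy, extended))  # []
```

Loading one more CD makes the same expression fully resolvable without changing the markup language itself. That is what an external extension mechanism buys.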
Mathematical Knowledge Management: mathematical models are core to science / tech. The abstract nature of mathematics makes the concept of "knowledge" easier in this field: no "grounding" in outside reality via experiments; clear definitions of concepts have matured
Powerful knowledge processing software exists: Computer Algebra (general-purpose maths for engineering and science); Automated Theorem Proving
Science/Tech Knowledge Management Formal models a fundamental part of sci/tech evaluation of fit to measured data may not always map easily to mathematics
==> MKM a fundamental ingredient in Science and Technology Knowledge Management but there's a lot more that needs to be added, of course
Extension to the sciences and technology is possible, but not easy First successes are in maths
Applications of Content Markup in Knowledge Management NIST Digital Library of Mathematical Functions http://dlmf.nist.gov/
Multi-lingual digital library of teaching materials for distance education in college mathematics http://webalt.math.helsinki.fi/
Digital Library of Math Functions to serve as new edition of Abramowitz and Stegun (1964). Handbook of Mathematical Functions. Gaithersburg, MD: National Bureau of Standards. most highly cited among math. handbooks contains valuable formulas for use in science and engineering (esp. special functions)
... with much value added software computer-understandable formulas and proofs
DLMF.NIST.gov: project of the National Institute of Standards and Technology (NIST) since ~1994; still under intense development; software basis (Bruce Miller); editorial board; writers' collective
Mock-up of one chapter available (Gamma Function)
DLMF Technology Goal: capture formal knowledge on special functions in mathematics Knowledge to be created by domain experts Knowledge to be formalized by a mix of automatic tools and hand-tuning by knowledge editors
Knowledge capture technology Domain experts write LaTeX code ... enhanced with LaTeX macros ... designed to ease semi-automatic translation to disambiguated Content Markup (e.g., OpenMath)
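In plain LaTeX, \Gamma(z) could be a function application or the product of a variable Γ with z. A semantic macro removes that ambiguity before translation. The sketch assumes hypothetical macros \GammaFn and \SinFn and a hypothetical CD specfun_demo; transc1:sin is a real OpenMath symbol. This is not the DLMF macro set or its translator, and it only handles a single identifier or integer argument.

```python
import re

# Hypothetical semantic macros -> (content dictionary, symbol name).
MACROS = {
    "GammaFn": ("specfun_demo", "gamma"),
    "SinFn": ("transc1", "sin"),
}

# Matches \Name{arg}, where arg is one letter or an integer.
PATTERN = re.compile(r"\\([A-Za-z]+)\{([A-Za-z]|\d+)\}")

def arg_to_openmath(arg):
    """Integers become OMI, single letters become OMV variables."""
    return f"<OMI>{arg}</OMI>" if arg.isdigit() else f'<OMV name="{arg}"/>'

def macro_to_openmath(latex):
    """Translate one semantic macro call into an OpenMath XML application."""
    m = PATTERN.fullmatch(latex.strip())
    if m is None or m.group(1) not in MACROS:
        raise ValueError(f"not a known semantic macro: {latex!r}")
    cd, name = MACROS[m.group(1)]
    return (f'<OMA><OMS cd="{cd}" name="{name}"/>'
            f"{arg_to_openmath(m.group(2))}</OMA>")

print(macro_to_openmath(r"\GammaFn{z}"))
# <OMA><OMS cd="specfun_demo" name="gamma"/><OMV name="z"/></OMA>
```

Ambiguous input such as \Gamma(z) is rejected rather than guessed. That rejected input is where a human knowledge editor would step in to resolve the meaning.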
DLMF Conceptual Structure Authors write as naturally as possible Editing tools ease semantic interpretation
Content of Digital Library stored largely in Content Markup form allows creation of a wide range of output formats from single source print web CD
made available for download into math software significant added value
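A minimal single-source sketch. One stored content tree for the Gamma-function identity Γ(z+1) = zΓ(z) is rendered by two table-driven printers: one produces LaTeX for print and web, the other produces Python source for math software. The tuple encoding and the tables are assumptions for illustration, not the DLMF toolchain. The Python table parenthesizes every sum. The LaTeX table does not track precedence, so it assumes no sum appears directly inside a product.

```python
# Hypothetical encoding: (operator, arg1, arg2, ...); leaves are strings.
# Gamma(z + 1) = z * Gamma(z), a standard Gamma-function identity.
TREE = ("eq", ("gamma", ("plus", "z", "1")), ("times", "z", ("gamma", "z")))

# One template table per output format (illustrative only).
LATEX = {"eq": "{0} = {1}", "plus": "{0} + {1}", "times": "{0} {1}",
         "gamma": r"\Gamma\left({0}\right)"}
PYTHON = {"eq": "{0} == {1}", "plus": "({0} + {1})", "times": "{0} * {1}",
          "gamma": "math.gamma({0})"}

def render(tree, table):
    """Render a content tree using a per-format table of templates."""
    if isinstance(tree, str):
        return tree
    op, *args = tree
    return table[op].format(*(render(arg, table) for arg in args))

print(render(TREE, LATEX))   # \Gamma\left(z + 1\right) = z \Gamma\left(z\right)
print(render(TREE, PYTHON))  # math.gamma((z + 1)) == z * math.gamma(z)
```

Adding an output format means adding a table, not re-editing the stored formulas.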
Multilingual Delivery of Online Math Problem description Mathematics learning is difficult ... and even more so in a foreign language In many countries, secondary and/or tertiary level teaching available only in a foreign language In few countries is teaching available in all mother tongues
Mathematics is the foundation of science and technology (and business...) crucial to teach it widely ... and to teach it well
Why:
Crossing Language Barriers Multilingual Societies European Union, Canada, most(!) countries of the world e.g. Malta, a tiny island state in the Mediterranean: three languages – Maltese, Italian, English
Facilitate integration of learning across sub-cultures Facilitate learning for minorities counter loss of languages and cultures
Mathematics as core of science and engineering teaching
Facilitating Minority Mathematics Learning Learning math (and science) is hard Learning it in a language you haven’t mastered yet is harder still
Practice makes perfect Automated tutoring, e.g. using Maple T.A.
Practicing in your own language helps: problems assigned in teacher's language; problems worked in student's language; problems graded automatically in student's language, or by teacher in teacher's language
Translating Mathematics. Translation (e.g. from English): automatically (Google/Babelfish): insufficient quality; translation errors will affect correctness!
manually: expensive; limits automatic variability (necessary for online tutoring)
Are mathematical formulas universal? Not quite... (gcd/ggT/mcd; tan/tg...); non-Latin scripts complicate matters even more
Mathematics is not just formula, but also
Culture and Language in Math: choice of math notation depends on
Culture, history, scripts: "ctg" vs. "cotan" vs. "cot"; ]a,b[ vs. (a,b); ... "12" vs. "+="
Language: "gcd" vs. "ggT" vs. "mcd" vs. "M.C.D." ...
Mathematical sophistication: "A x B" vs. "AB"; "∃a:P" vs. "there is ..."
Field of science: "i" vs. "j"
Typography: "10 x 20" vs. "ab"; "sin x" vs. "f(x)"
Individual style: ∃ vs. V
Formal vs. informal: "∃a:P" vs. "there is an a such that P"; "12" vs. "twelve"
Visual vs. aural rendering
[Figure: sample notations "syt", "gcd", "ggT", "mcd", "]a,b[", "(a,b)"]
State of the Art: presentation encoding of math (LaTeX...); explicit choices by author; impossible to adjust to language/writing system/preferred notational variation, e.g. variable names (x, a, f) in Arabic?
Text fragments in several languages Hand-translated, small number of languages Formulas usually do not change with text
The WebALT Approach: Content Markup for "mathematical vernacular". Simple "natural" language text: represent the "natural language" part of an exercise as in the "existential" example, using Content Markup; "render" to different natural languages using natural language generation technology. Mathematical formulae: content-to-presentation stylesheets; language- and context-specific rendering
e.g. Exercise problems for undergrad math Make approach feasible quickly
WebALT Demonstration (1): multilingual mathematics tutoring example; the student's perspective (several languages)
the author's perspective: digital library interface (Maple TA); language-independent Text/Math Editor
Example: "Solve x² = 2": text and embedded formula
Store as content markup, e.g. MathML-Content
Example (ctd.): natural language generation to different languages; matrix sentence & embedded formula: "Please solve the equation x² = 2 for x." "Welchen Wert hat x, wenn x² = 2?" ...
Embedded formula rendered depending on the same language context, e.g. the greatest common divisor as gcd (English), ggT (German), syt (Finnish), M.C.D. (Italian)
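A minimal content-to-presentation sketch for the embedded formula. The meaning gcd(a, b) is stored once, and only its rendering depends on the language context. The symbol names gcd, ggT, syt and M.C.D. come from the slide. The table and function names are hypothetical, and a real stylesheet would emit presentation MathML rather than plain strings.

```python
import math

# Per-language rendering of the single content symbol "gcd" (from the slide).
GCD_NAME = {"en": "gcd", "de": "ggT", "fi": "syt", "it": "M.C.D."}

def render_gcd(a, b, lang):
    """Render the content expression gcd(a, b) for a language context."""
    return f"{GCD_NAME[lang]}({a}, {b})"

for lang in ("en", "de", "fi", "it"):
    # The notation changes with the language; the meaning (value) does not.
    print(lang, render_gcd(12, 18, lang), "=", math.gcd(12, 18))
```

Because the stored form is the meaning, adding a language means adding a table entry rather than retranslating every exercise.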
Multilingual Math
Universal Delivery of Mathematical Content How it works:
Create language-independent content Well-defined meaning (semantics)
Deliver localized to any language Natural language generation engine Extensible to any language Utilizes similarities between languages
Technical/math vocabulary per language
Mathematics: a universal language Due to its abstract and exact nature, one can expect to be able to obtain verbalizations of math in natural language, without loss of information, provided one generates them from rich mathematical
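[Figure: language-independent math at the centre, delivered to EN, FI, FR, DE, SV, CN]
linalg1:determinant(linalg2:matrix(linalg2:matrixrow(a,b), linalg2:matrixrow(c,d)))
Find the determinant of the matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$.
Etsi matriisin $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ determinantti.
Encontra el determinante de la matriz $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$.
Finn determinanten av matrisen $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$.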
Natural Language Generation for Content Markup (Real Example) attrib([nlg:mood nlg:imperative, nlg:tense nlg:present, nlg:directive nlg:determine], plangeo1:are_on_line(A,B,C))
Determine if A, B and C are collinear. Määritä ovatko A, B ja C suoralla. Determina si A, B y C son colineales. Déterminer si A, B et C sont sur une droite. Determina se A, B e C sono su una linea. Bestäm om A, B och C är på en linje.
Note the linguistic differences: Imperative vs. Infinitive Adjectives vs. Adverbial phrases
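A toy realizer for the attrib(...) example above. It assumes a per-language table holding three things: the realized directive verb, the coordinating conjunction, and a clause template for plangeo1:are_on_line. The table entries are copied from the slide's output sentences; the table layout and function names are hypothetical. Per the project description, WebALT generated text with GF resource grammars rather than fixed templates like these.

```python
# lang -> (realized directive verb, conjunction, clause template)
# Entries taken from the slide's output; the layout is an assumption.
LEXICON = {
    "en": ("Determine", "and", "if {s} are collinear"),
    "fi": ("Määritä", "ja", "ovatko {s} suoralla"),
    "es": ("Determina", "y", "si {s} son colineales"),
    "fr": ("Déterminer", "et", "si {s} sont sur une droite"),
}

def coordinate(items, conj):
    """Build 'A, B and C'-style coordination in the target language."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + f" {conj} " + items[-1]

def generate(attrs, predicate, args, lang):
    """Realize attrib([mood, tense, directive], predicate(args))."""
    if predicate != "plangeo1:are_on_line":
        raise NotImplementedError("only plangeo1:are_on_line is sketched")
    if (attrs.get("nlg:mood"), attrs.get("nlg:directive")) != (
            "nlg:imperative", "nlg:determine"):
        raise NotImplementedError("only the 'determine' directive is sketched")
    verb, conj, clause = LEXICON[lang]
    return f"{verb} {clause.format(s=coordinate(args, conj))}."

attrs = {"nlg:mood": "nlg:imperative", "nlg:tense": "nlg:present",
         "nlg:directive": "nlg:determine"}
for lang in LEXICON:
    print(generate(attrs, "plangeo1:are_on_line", ["A", "B", "C"], lang))
```

The table shows where the languages diverge. French realizes the imperative directive as an infinitive. English uses an adjective (collinear) where French uses an adverbial phrase (sur une droite). Finnish places the question verb (ovatko) before the coordinated subjects.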
WebALT Demo (2)
The WebALT Project: eContent project 2005–2006; University of Helsinki; Technical U of Catalonia, Barcelona; Technical U Eindhoven; U of Cologne; Maths for More; EPF Lausanne
WebALT Project: European eContent project "WebALT", 2 years (2005/2006) + run-up + wind-down; "Web Advanced Learning Technology"; total funding ~2.4 million euros; product development: WebALT.com founded as spin-off
mathematicians, computer scientists, linguists Helsinki, Köln, Barcelona, Eindhoven universities very successful!
WebALT Digital Library Goals Digital Library mathematics teaching materials mathematics tutoring service automatic problem generator automatic grading for immediate feedback
multilingual delivery pan-European world-wide automatic guaranteed quality
language independent storage
On the Shoulders of Giants MathML-Content / OpenMath Semantically rich mathematics on the Web
Grammatical Framework (GF) Multilingual natural language generation (A. Ranta) “Resource grammars” for several langs/lang groups
LOM Packaging content as learning objects
Maple T.A.: interactive tutoring engine; plug-in to Blackboard, WebCT, etc.
The WebALT Ingredients: the project developed methods to deal with multilingual math; an editor to create language-independent mathematical content; metadata for mathematical content; a metadata editor; the WebALT E-Repository WALTER; the WebALT MapleTA System; sample content for on-line courses
Digital Libraries Issues Addressed Knowledge re-use mathematics exercises naturally re-usable – just change a few parameters requires intelligent automatic feedback
Knowledge accessibility multilingual multicultural for the blind at different levels of expertise ...
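A minimal sketch of parameter re-use with automatic grading. It assumes an exercise template "solve x² = c for x" with an integer c drawn from 2..50, and a tolerance-based check of both real roots. Function names, the parameter range and the tolerance are hypothetical. A real tutoring engine such as Maple T.A. has its own authoring and grading mechanisms.

```python
import math
import random

def make_exercise(rng):
    """Instantiate the template 'solve x^2 = c for x' with a random c.
    The content tree stays language-independent; only c varies."""
    c = rng.randint(2, 50)
    return {"c": c, "content": ("eq", ("power", "x", 2), c)}

def grade(exercise, answers, tol=1e-6):
    """Accept the answer list iff it contains exactly the two real roots
    +sqrt(c) and -sqrt(c), each within tol."""
    root = math.sqrt(exercise["c"])
    expected = sorted([-root, root])
    got = sorted(answers)
    return len(got) == 2 and all(
        abs(g - e) <= tol for g, e in zip(got, expected))

rng = random.Random(0)  # fixed seed: the same exercise can be regenerated
ex = make_exercise(rng)
print(ex["c"], grade(ex, [math.sqrt(ex["c"]), -math.sqrt(ex["c"])]))
```

Because grading works on the stored meaning, the same check serves every language the problem text is rendered into.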
Where to get it:
Online Demos Online demos available WebALT portal at webalt.math.helsinki.fi:8085/portal/portal/default/Home
Webalt.math.helsinki.fi -> Results -> NL Generator Interactive demo and web service
Oy WebALT Inc.: privately held company; continues the development of the WebALT System beyond the termination of the project; publishes premium on-line content; offers WebALT MapleTA hosting with premium multilingual content
Why it works:
Universal Math & Language Problem: Automatic translation is hard Preserving meaning requires recognition of meaning Non-negotiable requirement for math & science teaching Extremely hard computationally
Solution: Universal Grammar. All human languages are equivalent; generated natural language is equally good for all languages (automatic distant-language translations deteriorate)
Semantic math follows roughly equivalent rules Store meaning of content as math + ling. markers Recognition problem circumvented
Natural language generation from semantic math to local language always possible (and relatively easy) Compositional: preserve meaning & proper grammar
Content Markup Language Design The “Linguistics Parallel” Approach to Content Markup Language Design The Compositionality Principle …applied to Content Markup Languages
The Compositionality Principle “The meaning of a compound expression is a function of the meaning of its parts and the syntactic rule by which they are combined.” Barbara Partee quoted by Theo Janssen in “Handbook of Logic and Language” (1997)
Long history in philosophy of language Frege - Tarski - Montague - Partee ...
Content markup language design principle OpenMath since 1995, MathML since 1997
Rule-by-rule Semantics. Given syntactic rule: $a, b$ well-formed formulas (wffs) of categories $A, B$ $\Rightarrow$ $c = F_i(a,b)$ is a wff of category $C$
Corresponding semantic rule: $a, b$ interpreted as $a', b'$ $\Rightarrow$ $c$ interpreted as $c' = G_k(a', b')$
$F_i$: syntactic operations; $G_k$: semantic operations (B. Partee: "Montague Grammar", Handbook of Logic and Language)
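A minimal rule-by-rule interpreter. It assumes three syntactic constructors (Num, Var, App) playing the role of the F_i, and one branch of meaning() per constructor as the matching G_k. Operator meanings live in an extensible symbol table. The class, function and table names are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Num:            # syntactic rule: numeral
    value: int

@dataclass(frozen=True)
class Var:            # syntactic rule: variable
    name: str

@dataclass(frozen=True)
class App:            # syntactic rule: operator symbol applied to arguments
    head: str
    args: tuple

Expr = Union[Num, Var, App]

# Meanings of operator symbols: an open-ended, extensible table.
SYMBOLS = {"plus": operator.add, "times": operator.mul}

def meaning(e: Expr, env: dict):
    """One semantic rule per syntactic rule: the meaning of a compound is
    computed only from its parts' meanings and the rule that combined them."""
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, App):
        return SYMBOLS[e.head](*(meaning(a, env) for a in e.args))
    raise TypeError(f"no syntactic rule for {e!r}")

# 2*x + 1 with x = 3
expr = App("plus", (App("times", (Num(2), Var("x"))), Num(1)))
print(meaning(expr, {"x": 3}))  # 7
```

Adding a symbol such as minus only extends SYMBOLS. The three syntactic rules and their semantic rules stay fixed, which is the scalability argument that follows.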
Scalability Small number of fixed syntactic rules => small fixed number of semantic rules => “Categorial grammar” = skeleton semantics for syntactic rules
Special semantic rule => special syntactic rule
Example: Integration. Semantic "parts"? Integration operator; e(p,t): function "emissions" applied to variables p, t
integration variable "t"; interval of integration (ts(p), …): constructor "interval", expression "ts(p)", constant "infinity"
Structure of Example. Basic structuring ingredients: variables, constants (numbers, operators); application: e(p,t), interval constructor, integral operator; variable binding
These are basic because each category requires special semantics => each requires special syntax!
Example Syntax: MathML-Content (bound variable t, limits …, integrand econt(p, t))
Example as an OpenMath Object: application(symbol(integral), binding(symbol(lambda), variable(t), application(symbol(emissions), variable(p), variable(t))), application(symbol(lifetime), variable(p)))
integral([lambda t. e(p,t)], lifetime(p))
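A minimal sketch of the OpenMath object above, with binding as its own constructor next to application. OpenMath's XML encoding uses OMA for application, OMBIND for binding, and OMS and OMV for symbols and variables. The Python class names here are hypothetical, and the printer follows the compact notation integral([lambda t. e(p,t)], lifetime(p)). Computing free variables shows why binding needs its own semantics: t is local to the integrand, while p stays free.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:
    name: str

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:            # application: head applied to argument tuple
    head: object
    args: tuple

@dataclass(frozen=True)
class Bind:           # binding: binder symbol, bound variable, body
    binder: Sym
    var: Var
    body: object

def show(e):
    """Print an object in the compact slide notation."""
    if isinstance(e, (Sym, Var)):
        return e.name
    if isinstance(e, App):
        return f"{show(e.head)}({', '.join(show(a) for a in e.args)})"
    if isinstance(e, Bind):
        return f"[{show(e.binder)} {show(e.var)}. {show(e.body)}]"
    raise TypeError(e)

def free_vars(e):
    """Variables not captured by any enclosing binding."""
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, Sym):
        return set()
    if isinstance(e, App):
        return free_vars(e.head).union(*(free_vars(a) for a in e.args))
    if isinstance(e, Bind):
        return free_vars(e.body) - {e.var.name}
    raise TypeError(e)

p, t = Var("p"), Var("t")
integral = App(Sym("integral"),
               (Bind(Sym("lambda"), t, App(Sym("e"), (p, t))),
                App(Sym("lifetime"), (p,))))
print(show(integral))       # integral([lambda t. e(p, t)], lifetime(p))
print(free_vars(integral))  # {'p'}
```

The integral symbol itself needs no special handling. Any new binding operator (sum, limit, ∃) reuses the same Bind constructor, which is the scalability point of the next slide.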
Special Variable Binding Syntax: unusual in symbolic computing, where systems usually block evaluation of, and reassign the interpretation of, operators like integration instead; that approach does not scale (open-ended class of such operators) and cannot work if only static semantics is available (as in communication between systems)
Major innovation in Content Markup compositionality principle demands it
Designing Content Markup: the Compositionality Principle as a Content Markup Language Design Principle; "categorial" (skeleton) semantics as a Content Markup Language Design Tool; … both have been used successfully to find bugs in several content languages and to design corrected alternatives
The Linguistics Parallel Ansatz: … both derived from the Formal Semantics field of Linguistics, e.g. Handbook of Logic and Language, van Benthem and ter Meulen, eds.
Human Language and Content Markup solve a similar problem: communicating "meaning" among independent agents; they therefore need to be based on similar design principles
Linguistics Parallel Informs surprisingly concrete design decisions for content markup, e.g. language layers (morpho-) syntax, categorial/ lexical semantics
syntactic structure ((head arguments) modifiers) binding, common substructure elimination
semantics skeleton semantics of syntactic constructors
Linguistic Parallel: concrete lessons have impacted current Content Markup language designs, and yet the linguistics parallel as an approach to studying their underlying principles has had little support. Conjecture: understanding content markup requires the linguistics parallel.
The Cognitivist Approach The linguistics parallel is a special case of the cognitivist approach: Study human cognitive structures to design computing and networking systems
Conjecture: Content Markup helps capture users’ conceptual models ... hence helps design and build more usable information systems
Conclusions Content Markup Languages enable wide range of interesting applications bleeding-edge science through low-end teaching
improvements in usability of (and access to) information systems
Cognitive perspective helps improve design of Content Markup Languages