Knowledge Modeling & Management in Science and Technology

Andreas Strotmann University of Alberta School of Business

Outline
Afternoon: What is it good for?
  Knowledge Management
  Universal access to knowledge
  Formal knowledge processing
  Formal knowledge creation
  Knowledge Modeling in Science and Technology
  Mathematics: MathML and OpenMath
  Mathematical Knowledge Management
  Open Problems and Research Directions
Morning: How and why does it work?

Knowledge Management Map
Knowledge Modeling in relation to other areas of Knowledge Management

Information            | Knowledge
Information Modeling   | Knowledge Modeling
Information Retrieval  | Knowledge Discovery
Information Management | Knowledge Management

Information & Knowledge: What's the difference?
Data: a stream of bits
Information: structured data
Knowledge: meaningful information
Wisdom: thorough understanding
From low to high levels of abstraction, of cumulation of reasoning, of understanding, and of distance from the “real world” and “direct observation”

Information, Knowledge & Language

Data        | Phonetics | stream of phonemes
Information | Syntax    | structured words & phrases
Knowledge   | Semantics | meaningful communication
Wisdom      | Cognition | thorough comprehension, thoughtful expression

Digital Libraries for Science & Technology Science & Technology Content today Literature (meta data) Data (curation) Information (indexing & retrieval) Text, images, graphs, formulas Data – measurements, audio, video Meta data

Categorization subject headings (e.g. MeSH) authority files (e.g. drug names)

Digital Sci/Tech Libraries and Knowledge Management Knowledge = Meaningful communication

Meta data --> Ontologies well-defined semantics of meta data structure enables formal reasoning ... and thus cross-walks etc.

”The Semantic Web” Web of computer understandable meta data ==> computer-processable/communicable information

Digital Sci/Tech Libraries and Knowledge Management (ctd) knowledge = Meaningful communication

Content --> Knowledge Representation Web of computer understandable documents ==> computer processable and communicable knowledge Content Markup, Knowledge Modeling

Ontologies + Content Markup ==> universally accessible knowledge This is Today's Theme

An Example “We estimate that the World Bank signed new contracts since the Rio Summit to fund projects that, together and over their lifetime, will contribute to the emissions of at least 9.8 billion tons of carbon.” SEEN,ITIS study ‘97

The World Bank signed new contracts since the Rio Summit to fund projects that make it responsible for emissions from these projects of roughly 1.4 billion tons of carbon over a period of 25 years. WB Carbon Backcasting Study ‘97

Verification of Claims Claimed numbers are WB:critics ~ 1:7 (1.4 vs. 9.8 billion tons of carbon); difference too large to be random error

Verification needed full justification of both results available in respective documents inference trails can be analyzed and verified

Computer support would help!
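As a small illustration of such computer support, the two quoted figures can be compared mechanically. This is only a sketch: the variable and function names are invented for the example, and exact fractions are used so the ratio is not blurred by floating-point rounding.

```python
from fractions import Fraction

# Figures as quoted in the two 1997 studies, in billion tons of carbon.
critics_estimate = Fraction("9.8")     # SEEN/ITIS: "at least 9.8 billion tons"
world_bank_estimate = Fraction("1.4")  # WB study: "roughly 1.4 billion tons"


def claim_ratio(larger, smaller):
    """Return how many times larger one claimed figure is than the other."""
    return larger / smaller


ratio = claim_ratio(critics_estimate, world_bank_estimate)
print(f"critics : World Bank = {float(ratio):.1f} : 1")
```

A check like this only flags the discrepancy; tracing it to its source still requires the inference trails in both documents.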

The Role of the “Semantic Web” The World Bank signed new contracts since the Rio Summit to fund projects that make it responsible for emissions from these projects of roughly 1.4 billion tons of carbon over a period of 25 years. WB Carbon Backcasting Study ‘97

Standard names entities: The World Bank, The Rio Summit, carbon classes: projects, contracts, emissions, pollutants properties, relations: signed, responsible

Standard inference and combination rules to capture meanings of names e.g. date of Rio

The Role of Content Markup “We estimate that the World Bank signed new contracts since the Rio Summit to fund projects that, together and over their lifetime, will contribute to the emissions of at least 9.8 billion tons of carbon.” SEEN,ITIS study ‘97

p: project
t: time; t_s: signed; t_0: started
e: emissions; e_contrib: contributed to; e_resp: responsible for
[GtC]: giga-tons of carbon; [y]: years
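One way the two quoted claims could be written with these symbols is sketched below. It is an illustration only: the project set P, the choice of summation, the relation signs and the way the 25-year period is attached are assumptions read off the wording of the quotes, not formulas taken from the slide.

```latex
% P: projects whose contracts were signed after the Rio Summit (assumed name)
% Critics' claim: lifetime emissions the projects contribute to
\sum_{p \in P} e_{\mathrm{contrib}}(p)
  = \sum_{p \in P} \int_{t_s(p)}^{\infty} e(p,t)\,\mathrm{d}t
  \;\ge\; 9.8\ [\mathrm{GtC}]

% World Bank's claim: emissions it is responsible for, over 25 [y]
\sum_{p \in P} e_{\mathrm{resp}}(p) \;\approx\; 1.4\ [\mathrm{GtC}]
```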

Why Content Markup? Software exists that can check formal chains of arguments perform mathematical simplifications interactively or automatically

Formal content can feed into such SW as a “formal abstract”: discovery of discrepancies as part of the document: detailed tracking of discrepancies to their source(s)

Why Content Markup? (ctd.) More than just Ontologies encode complex logical structure of document text ontologies add underlying definitions and general relations between basic concepts in text

Universal accessibility Note the way that the mathematical formulas capture the essence of the English sentences in our example above The formulas can in turn be translated into other languages, cultures, modes for

Complementary Roles “Semantic Web” formalizes vocabulary (“ontologies”) captures relationships between entities

Content Markup captures complex formal expressions basis of formal verification processes current focus on mathematics and proofs but extensible to much of science and technology

structurally related to natural language

What is Content Markup? “The intent of the content markup in the Mathematical Markup Language is to provide an explicit encoding of the underlying mathematical structure of an expression, rather than any particular rendering for the expression.” MathML 2.0 Recommendation more generally: underlying semantic structure

Content vs. Presentation Markup Content Markup conceptual structure HTML: header, title MathML-Content: application, abstraction, operators, arguments

captures notions useful for processing universal

Presentation Markup layout structure HTML: font, image MathML-Presentation: nested 2D boxes, rows, fences, glyphs, notational patterns

captures notations useful for rendering locale-specific

A MathML Example: “e^x”
Content Markup: <apply><power/><exponentiale/><ci>x</ci></apply>
“the base of the natural logarithm to the power of identifier ‘lowercase x’”

Presentation Markup: <msup><mi>e</mi><mi>x</mi></msup>
“lowercase e rendered as identifier superscripted with lowercase x rendered as an identifier”
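The difference between the two encodings of e^x can be made concrete by building both trees in code. The sketch below uses Python's standard xml.etree.ElementTree; the element names follow MathML 2.0, while the two function names are invented for the example.

```python
import xml.etree.ElementTree as ET


def content_markup_exp_x():
    """Content markup: the operator 'power' applied to the constant e and x."""
    apply_el = ET.Element("apply")
    ET.SubElement(apply_el, "power")          # the operator
    ET.SubElement(apply_el, "exponentiale")   # base of the natural logarithm
    ET.SubElement(apply_el, "ci").text = "x"  # content identifier x
    return apply_el


def presentation_markup_exp_x():
    """Presentation markup: the glyph e with the glyph x as a superscript."""
    msup = ET.Element("msup")
    ET.SubElement(msup, "mi").text = "e"
    ET.SubElement(msup, "mi").text = "x"
    return msup


print(ET.tostring(content_markup_exp_x(), encoding="unicode"))
print(ET.tostring(presentation_markup_exp_x(), encoding="unicode"))
```

Only the content tree says that "e" is the exponential constant; in the presentation tree it is just an identifier that happens to be drawn as the letter e.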

MathML Coverage MathML-Content “most” of K-12 and first-year college mathematics advanced mathematics requires external extensions ...

MathML-Presentation “all” of mathematics notations covered based on Unicode for extensive character set requirements no extension mechanism

MathML-Content Extensions Ability to refer to externally declared ”content symbols” Collections of symbol declarations for additional mathematical concepts OpenMath ”content dictionaries” (CDs) a form of ontologies? not quite as formally specified perhaps

”CDs” for extra-mathematical concepts physical constants ... science/engineering... linguistic concepts for NLG

Mathematical Knowledge Management Mathematical models core to science / tech. Abstract nature of mathematics makes concept of ”knowledge” easier in this field no ”grounding” in outside reality via experiments clear definitions of concepts have matured

Powerful knowledge processing software exists Computer Algebra: general-purpose maths for engineering and science Automated Theorem Proving

Science/Tech Knowledge Management Formal models a fundamental part of sci/tech evaluation of fit to measured data may not always map easily to mathematics

==> MKM a fundamental ingredient in Science and Technology Knowledge Management but there's a lot more that needs to be added, of course

Extension to the sciences and technology is possible, but not easy First successes are in maths

Applications of Content Markup in Knowledge Management NIST Digital Library of Mathematical Functions http://dlmf.nist.gov/

Multi-lingual digital library of teaching materials for distance education in college mathematics http://webalt.math.helsinki.fi/

Digital Library of Math Functions to serve as new edition of Abramowitz and Stegun (1964). Handbook of Mathematical Functions. Gaithersburg, MD: National Bureau of Standards. most highly cited among math. handbooks contains valuable formulas for use in science and engineering (esp. special functions)

... with much value added software computer-understandable formulas and proofs

DLMF.NIST.gov Project of the National Institute of Standards and Technology (NIST) since ~ 1994 still under intense development software basis (Bruce Miller) editorial board writers' collective

Mock-up of one chapter available (Gamma Function)

DLMF Technology Goal: capture formal knowledge on special functions in mathematics Knowledge to be created by domain experts Knowledge to be formalized by a mix of automatic tools and hand-tuning by knowledge editors

Knowledge capture technology Domain experts write LaTeX code ... enhanced with LaTeX macros ... designed to ease semi-automatic translation to disambiguated Content Markup (e.g., OpenMath)

DLMF Conceptual Structure Authors write as naturally as possible Editing tools ease semantic interpretation

Content of Digital Library stored largely in Content Markup form allows creation of a wide range of output formats from single source print web CD

made available for download into math software significant added value

Multilingual Delivery of Online Math Problem description Mathematics learning is difficult ... and even more so in a foreign language In many countries, secondary and/or tertiary level teaching available only in a foreign language In few countries is teaching available in all mother tongues

Mathematics is the foundation of science and technology (and business...) crucial to teach it widely ... and to teach it well

Why:

Crossing Language Barriers Multilingual Societies European Union, Canada, most(!) countries of the world e.g. Malta, a tiny island state in the Mediterranean: three languages – Maltese, Italian, English

Facilitate integration of learning across sub-cultures Facilitate learning for minorities counter loss of languages and cultures

Mathematics as core of science and engineering teaching

Facilitating Minority Mathematics Learning Learning math (and science) is hard Learning it in a language you haven’t mastered yet is harder still

Practice makes perfect Automated tutoring, e.g. using Maple T.A.

Practicing in your own language helps
Problems assigned in teacher’s language
Problems worked in student’s language
Problems graded
  automatically in student’s language
  by teacher in teacher’s language

Translating Mathematics Translation (e.g. from English) automatically (Google/Babelfish): insufficient quality translation errors will affect correctness!

manually expensive limits automatic variability (necessary for online tutoring)

Are mathematical formulas universal? not quite... (gcd/ggT/mcd; tan/tg...) non-Latin scripts complicate matters even more

Mathematics is not just formula, but also

Culture and Language in Math
Choice of math notation depends on:
Culture, History, Scripts: „ctg“ vs „cotan“ vs „cot“; ]a,b[ vs (a,b); ...; „12“ vs „十二“
Language: „gcd“ vs „ggT“ vs „mcd“ vs „M.C.D.“ ...
Mathematical sophistication: „A × B“ vs „AB“; „Va:P“ vs „there is an a such that P“
Field of science: „i“ vs „j“
Typography: „10 × 20“ vs „ab“; „sin x“ vs „f(x)“
Individual style: ∃ vs V
Formal vs. Informal: „Va:P“ vs „there is an a such that P“; „12“ vs „twelve“
Visual vs. Aural rendering

[Figure: notation variants syt, gcd, ggT, mcd and ]a,b[, (a,b)]

State of the Art Presentation encoding of math (LaTeX...) Explicit choices by author Impossible to adjust to language/writing system/preferred notational variation E.g. variable names (e.g. x,a,f) in Arabic?

Text fragments in several languages Hand-translated, small number of languages Formulas usually do not change with text

The WebALT Approach Content Markup for „mathematical vernacular“ Simple „natural“ language text Represent „natural language“ part of exercise as in „existential“ example using Content Markup „Render“ to different natural languages using natural language generation technology Mathematical formulae Content-to-presentation stylesheets Language and context specific rendering

e.g. Exercise problems for undergrad math Make approach feasible quickly

WebALT Demonstration (1) Multilingual mathematics tutoring example
the student's perspective: ../diglib%20workshop%202007/mapleta_en&fr.swf, webaltplayer.swf (several languages)

the author's perspective: digital library interface (Maple TA): ../diglib%2520workshop%25202007/Automatic%2520Multilingual%2520Exercise.swf; Language independent Text/Math Editor: TextMathEditor.swf

Example: „Solve 2 = x^2“ Text and embedded formula

Store as content markup, e.g. MathML-Content [markup not preserved; leaf tokens: x, x, 2, 2]

Example... Natural Language Generation to different languages Matrix sentence & embedded formula „Please solve the equation x^2 = 2 for x.“ „Welchen Wert hat x, wenn x^2 = 2?“ ...

Embedded formula rendered depending on the same language context, e.g. the greatest common divisor as gcd (English), ggT (German), syt (Finnish), M.C.D. (Italian)
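Language-dependent rendering of one stored formula can be sketched as a lookup from the operator to its local name. The operator names below are the ones listed above; the language codes, the tuple-based expression format and the function name are assumptions made for this sketch, not part of WebALT.

```python
# Local names of the greatest-common-divisor operator, one per language.
GCD_NAMES = {"en": "gcd", "de": "ggT", "fi": "syt", "it": "M.C.D."}


def render(expr, lang):
    """Render a content expression such as ("gcd", 12, 18) for one language."""
    if not isinstance(expr, tuple):
        return str(expr)  # leaf: a number or an identifier
    operator, *args = expr
    if operator != "gcd":
        raise ValueError(f"no rendering rule for operator {operator!r}")
    rendered_args = ", ".join(render(arg, lang) for arg in args)
    return f"{GCD_NAMES[lang]}({rendered_args})"


# The same stored expression, rendered once per language.
for lang in GCD_NAMES:
    print(lang, render(("gcd", 12, ("gcd", 18, 30)), lang))
```

The stored expression never changes; only the table consulted at rendering time does, which is what lets the formula follow the language of the surrounding sentence.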

Multilingual Math

Universal Delivery of Mathematical Content How it works:

Create language-independent content Well-defined meaning (semantics)

Deliver localized to any language Natural language generation engine Extensible to any language Utilizes similarities between languages

Technical/math vocabulary per language

Mathematics: a universal language Due to its abstract and exact nature, one can expect to be able to obtain verbalizations of math in natural language, without loss of information, provided one generates them from rich mathematical

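[Figure: “math” surrounded by the languages EN, FI, CN, FR, DE, SV]

linalg1:determinant(linalg2:matrix(linalg2:matrixrow(a,b), linalg2:matrixrow(c,d)))
Find the determinant of the matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$.
Etsi matriisin $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ determinantti.
Encontra el determinante de la matriz $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$.
Finn determinanten av matrisen $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$.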

Natural Language Generation for Content Markup (Real Example) attrib([nlg:mood nlg:imperative, nlg:tense nlg:present, nlg:directive nlg:determine], plangeo1:are_on_line(A,B,C))

Determine if A, B and C are collinear. Määritä ovatko A, B ja C suoralla. Determina si A, B y C son colineales. Déterminer si A, B et C sont sur une droite. Determina se A, B e C sono su una linea. Bestäm om A, B och C är på en linje.

Note the linguistic differences: Imperative vs. Infinitive Adjectives vs. Adverbial phrases
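The step from one content object to six sentences can be imitated with a fixed table of sentence patterns. This is a deliberately simplified stand-in for a grammar-based generator: the patterns are copied from the verbalizations above, while the dictionary layout, language codes and function names are assumptions of the sketch.

```python
# One sentence pattern and one conjunction per language.
TEMPLATES = {
    "en": "Determine if {points} are collinear.",
    "fi": "Määritä ovatko {points} suoralla.",
    "es": "Determina si {points} son colineales.",
    "fr": "Déterminer si {points} sont sur une droite.",
    "it": "Determina se {points} sono su una linea.",
    "sv": "Bestäm om {points} är på en linje.",
}
CONJUNCTIONS = {"en": "and", "fi": "ja", "es": "y", "fr": "et", "it": "e", "sv": "och"}


def list_points(points, lang):
    """Join point names as 'A, B and C' using the language's conjunction."""
    *head, last = points
    if not head:
        return last
    return f"{', '.join(head)} {CONJUNCTIONS[lang]} {last}"


def verbalize(content, lang):
    """Verbalize a {'predicate': ..., 'args': [...]} content object."""
    if content["predicate"] != "plangeo1:are_on_line":
        raise ValueError("only the collinearity example is covered")
    return TEMPLATES[lang].format(points=list_points(content["args"], lang))


content = {"predicate": "plangeo1:are_on_line", "args": ["A", "B", "C"]}
for lang in TEMPLATES:
    print(lang, verbalize(content, lang))
```

A template table needs one entry per sentence type and language, so it does not scale; a compositional grammar instead builds each sentence from shared rules, which is why the imperative/infinitive and adjective/adverbial differences can be handled once per language.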

WebALT Demo (2)

TME%20theorem%20and%20XML%20source.swf

The WebALT Project eContent Project 2005 – 2006 University of Helsinki Technical U of Catalonia, Barcelona Technical U Eindhoven U of Cologne Maths for More EPF Lausanne

WebALT Project European eContent project ”WebALT” 2 years (2005/2006) + run-up + wind-down ”Web Advanced Learning Technology” total funding ~ 2.4 million euros product development WebALT.com founded as spin-off

mathematicians, computer scientists, linguists Helsinki, Köln, Barcelona, Eindhoven universities very successful!

WebALT Digital Library Goals Digital Library mathematics teaching materials mathematics tutoring service automatic problem generator automatic grading for immediate feedback

multilingual delivery pan-European world-wide automatic guaranteed quality

language independent storage

On the Shoulders of Giants MathML-Content / OpenMath Semantically rich mathematics on the Web

Grammatical Framework (GF) Multilingual natural language generation (A. Ranta) “Resource grammars” for several langs/lang groups

LOM Packaging content as learning objects

Maple T.A. Interactive tutoring engine Plugin to WebCT, BlackBoard, etc.

The WebALT Ingredients The project developed: Methods to deal with multilingual math Editor to create language independent mathematical content Metadata for mathematical content Metadata editor WebALT E-Repository WALTER WebALT MapleTA System Sample content for on-line courses

Digital Libraries Issues Addressed Knowledge re-use mathematics exercises naturally re-usable – just change a few parameters requires intelligent automatic feedback

Knowledge accessibility multilingual multicultural for the blind at different levels of expertise ...
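Re-use by changing a few parameters, combined with automatic feedback, can be sketched as follows. The exercise family (solve x^2 = n with n a perfect square), the function names and the feedback wording are all invented for the illustration.

```python
import random


def make_exercise(rng):
    """Create one instance of the exercise family 'solve x^2 = n'."""
    root = rng.randint(2, 12)  # the only parameter that varies
    return {
        "statement": f"Solve x^2 = {root * root} for x.",
        "solutions": {root, -root},
    }


def grade(exercise, answer):
    """Return (is_correct, feedback) for a single integer answer."""
    if answer in exercise["solutions"]:
        return True, f"Correct. The other solution is {-answer}."
    return False, "Not a solution: squaring it does not give the right-hand side."


rng = random.Random(2007)  # seeded so the generated instance is repeatable
exercise = make_exercise(rng)
print(exercise["statement"])
print(grade(exercise, max(exercise["solutions"])))
```

Because the solution set is generated together with the statement, every new instance can be graded immediately without manual work.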

References
http://webalt.math.helsinki.fi/content/results/docs/index_eng.html
Final Report; How-to Guide for Creating Multilingual Mathematical Content.
WebALT! Deliver Mathematics Everywhere. O. Caprotti. Proceedings of SITE 2006.
Multilingual content development for eLearning in Africa. W. Ng'ang'a. eLearning Africa: 1st PanAfrican Conference on ICT for Development, Education and Training. 24-26 May 2006.
Using web-based assignments in Secondary School Probability. M.-L. Viljanen. 3rd International Conference on the Teaching of Mathematics at the Undergraduate Level. July 2006.
Multilingual technology for teaching mathematics. O. Caprotti, W. Ng'ang'a, M. Seppälä. Proceedings of the International Conference on Engineering Education, Instructional Technology, Assessment, and E-learning (EIAE 05).
Web Advanced Learning Technologies for Multilingual Mathematics Teaching Support. A. Strotmann, M. Seppälä. ELPUB2005.
Multilingual Access to Mathematical Exercise Problems. A. Strotmann, W. Ng'ang'a, O. Caprotti. IAMC Workshop. ISSAC 2005.
Web Advanced Learning Technologies for Assessment in Mathematics. O. Caprotti, L. Carlson, M. Seppälä, A. Strotmann. ICMCT 2005.
Course Content Dictionary for sharing online educational material. J. Karhima, J. Nurmonen, M. Pauna. Submitted for publication to the CAA Series.
State of the art in mathematical e-learning. WebALT Consortium. WebALT Deliverable D1.1.
Study of the state of the art in multilingual and multicultural creation of digital mathematical content. L. Carlson, J. Saludes, A. Strotmann. WebALT Deliverable D1.2.

Where to get it:

Online Demos Online demos available WebALT portal at webalt.math.helsinki.fi:8085/portal/portal/default/Home

Webalt.math.helsinki.fi -> Results -> NL Generator Interactive demo and web service

Oy WebALT Inc. Privately held company Continues the development of the WebALT System beyond the termination of the project Publishes Premium On-Line Content Offers WebALT MapleTA hosting with premium multilingual content

Why it works:

Universal Math & Language Problem: Automatic translation is hard Preserving meaning requires recognition of meaning Non-negotiable requirement for math & science teaching Extremely hard computationally

Solution: Universal Grammar All human languages are equivalent Generated natural language equally good for all langs (Automatic distant-language translations deteriorate)

Semantic math follows roughly equivalent rules Store meaning of content as math + ling. markers Recognition problem circumvented

Natural language generation from semantic math to local language always possible (and relatively easy) Compositional: preserve meaning & proper grammar

Content Markup Language Design The “Linguistics Parallel” Approach to Content Markup Language Design The Compositionality Principle …applied to Content Markup Languages

The Compositionality Principle “The meaning of a compound expression is a function of the meaning of its parts and the syntactic rule by which they are combined.” Barbara Partee quoted by Theo Janssen in “Handbook of Logic and Language” (1997)

Long history in philosophy of language Frege - Tarski - Montague - Partee ...

Content markup language design principle OpenMath since 1995, MathML since 1997

Rule-by-rule Semantics
Given syntactic rule: a, b well-formed formulas of categories A, B => c = F_i(a,b) is a wff of category C

Corresponding semantic rule: a, b interpreted as a’, b’ => c interpreted as c’ = G_k(a’,b’)

F_i syntactic, G_k semantic operations. B. Partee: Montague grammar. Handbook of Logic and Language

Scalability Small number of fixed syntactic rules => small fixed number of semantic rules => “Categorial grammar” = skeleton semantics for syntactic rules

Special semantic rule => special syntactic rule
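The pairing of each syntactic rule F_i with one semantic rule G_k can be sketched on a toy expression language. Everything here (the two rules "plus" and "times", integers as the only atoms, the function names) is an assumption chosen to keep the illustration small.

```python
import operator
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Node:
    rule: str                   # name of the syntactic rule that built this node
    parts: Tuple["Expr", ...]   # the immediate constituents


Expr = Union[int, Node]

# One syntactic operation F per rule: how the parts are written down.
SYNTAX = {
    "plus": lambda a, b: f"({a} + {b})",
    "times": lambda a, b: f"({a} * {b})",
}
# One semantic operation G per rule: how the meanings of the parts combine.
SEMANTICS = {"plus": operator.add, "times": operator.mul}


def spell(expr):
    """Apply the syntactic rule at each node, bottom-up."""
    if isinstance(expr, int):
        return str(expr)
    return SYNTAX[expr.rule](*(spell(part) for part in expr.parts))


def meaning(expr):
    """Apply the corresponding semantic rule at each node, bottom-up."""
    if isinstance(expr, int):
        return expr
    return SEMANTICS[expr.rule](*(meaning(part) for part in expr.parts))


expr = Node("plus", (2, Node("times", (3, 4))))
print(spell(expr), "means", meaning(expr))
```

The two tables have exactly the same keys, so adding a syntactic rule without a semantic one (or the reverse) fails immediately, which is the discipline the compositionality principle asks for.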

Example: Integration Semantic “parts”? Integration operator e(p,t) function “emissions” applied to variables p,t

integration variable “t” interval of integration (ts(p),…) constructor “interval”, expression “ts(p)”, constant “infinity”

Structure of Example Basic structuring ingredients variables, constants (numbers, operators) application: e(p,t), interval constructor, integral operator variable binding

These are basic because each category requires special semantics => each requires special syntax!

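Example Syntax: MathML-Content <apply><int/><bvar><ci>t</ci></bvar><interval> … … </interval><apply><ci>econt</ci><ci>p</ci><ci>t</ci></apply></apply>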

Example as an OpenMath Object
application(
  symbol(integral),
  binding(symbol(lambda), variable(t),
    application(symbol(emissions), variable(p), variable(t))),
  application(symbol(lifetime), variable(p)))

integral([lambda t. e(p,t)], lifetime(p))
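Why binding needs its own constructor shows up as soon as a program asks which variables of the object are free. The sketch below models the object with plain Python dataclasses; the class and function names are assumptions for illustration and are not an OpenMath library API.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Symbol:
    name: str


@dataclass(frozen=True)
class Variable:
    name: str


@dataclass(frozen=True)
class Application:
    head: "Obj"
    args: Tuple["Obj", ...]


@dataclass(frozen=True)
class Binding:
    binder: "Obj"
    bound: Tuple[Variable, ...]
    body: "Obj"


Obj = Union[Symbol, Variable, Application, Binding]


def free_variables(obj):
    """Collect variable names that are not captured by an enclosing binding."""
    if isinstance(obj, Variable):
        return frozenset({obj.name})
    if isinstance(obj, Symbol):
        return frozenset()
    if isinstance(obj, Application):
        names = free_variables(obj.head)
        for arg in obj.args:
            names |= free_variables(arg)
        return names
    if isinstance(obj, Binding):
        inner = free_variables(obj.binder) | free_variables(obj.body)
        return inner - {variable.name for variable in obj.bound}
    raise TypeError(f"not an object of this sketch: {obj!r}")


# integral([lambda t. e(p,t)], lifetime(p))
example = Application(Symbol("integral"), (
    Binding(Symbol("lambda"), (Variable("t"),),
            Application(Symbol("emissions"), (Variable("p"), Variable("t")))),
    Application(Symbol("lifetime"), (Variable("p"),)),
))
print(sorted(free_variables(example)))
```

The traversal never needs to know what "integral" means: the binding constructor alone tells it that t is local, which is the static information a receiving system has to rely on.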

Special Variable Binding Syntax
Unusual in symbolic computing: usually, systems block evaluation and reassign the interpretation of operators like integration instead
this does not scale (open-ended class of such operators)
it cannot work if only static semantics is available (as in communication between systems)

Major innovation in Content Markup compositionality principle demands it

Designing Content Markup
Compositionality Principle as a Content Markup Language Design Principle
“Categorial” (skeleton) semantics as a Content Markup Language Design Tool
… both have been used successfully to find bugs in several content languages and to design corrected alternatives

The Linguistics Parallel Ansatz … both derived from Formal Semantics field of Linguistics e.g. Handbook of Logic and Language, vanBenthem and terMeulen, eds.

Human Language and Content Markup solve similar problem: communicating “meaning” among independent agents therefore need to be based on similar design principles

Linguistics Parallel Informs surprisingly concrete design decisions for content markup, e.g. language layers (morpho-) syntax, categorial/ lexical semantics

syntactic structure ((head arguments) modifiers) binding, common substructure elimination

semantics skeleton semantics of syntactic constructors

Linguistic Parallel Concrete lessons have impacted current Content Markup language designs and yet, linguistic parallel as an approach to studying their underlying principles has had little support. Conjecture: understanding content markup requires the linguistics parallel.

The Cognitivist Approach The linguistics parallel is a special case of the cognitivist approach: Study human cognitive structures to design computing and networking systems

Conjecture: Content Markup helps capture users’ conceptual models ... hence helps design and build more usable information systems

Conclusions Content Markup Languages enable wide range of interesting applications bleeding-edge science through low-end teaching

improvements in usability of (and access to) information systems

Cognitive perspective helps improve design of Content Markup Languages
