Presenting and Capturing. Mathematics for the Web http://www.cs.cmu.edu/~
kohlhase/talks/mathml-tutorial. Mathematics ̂= Anything Formal. Michael
Kohlhase.
MathMl Presenting and Capturing Mathematics for the Web http://www.cs.cmu.edu/~kohlhase/talks/mathml-tutorial
Mathematics = b Anything Formal
Michael Kohlhase School of Computer Science Carnegie Mellon University
http://www.cs.cmu.edu/~kohlhase 1
c Michael Kohlhase
:
Carnegie Mellon
Document Markup for Mathematics • Problem: Mathematical Vernacular and mathematical formulae have more structure than can be expressed in a linear sequence of standard characters • Definition (Document Markup) Document markup is the process of adding codes to a document to identify the structure of a document or the format in which it is to appear.
2
Carnegie Mellon
c Michael Kohlhase
:
Document Markup Systems for Mathematics • M$ Word/Equation Editor: WYSYWIG, proprietary formatter/reader + easy to use, well-integrated
– limited mathematics, expensive, vendor lock-in • TEX/LATEX: powerful, open formatter (TEX), various readers (DVI/PS/PDF) + flexible, portable persistent source, high quality math – inflexible representation after formatting step • HtML+GIF: server-side formatting, pervasive browsers
+ flexible, powerful authoring systems LATEX/Mathematica/. . . – limited accessibility, reusability
3
Carnegie Mellon
c Michael Kohlhase
:
Styles of Markup • Definition (Presentation Markup) A markup scheme that specifies document structure to aid document processing by humans – e.g. *roff, Postscript, DVI, early MS Word, low-level TEX + simple, context-free, portable (verbatim), easy to implement/transform – inflexible, possibly verbose, • Definition (Content Markup) A markup scheme that specifies document structure to aid document processing by machines or with machine support. – e.g. LATEX (if used correctly), Programming Languages, ATP input + flexible, portable (in spirit), unambiguous, language-independent – possibly verbose, context dependent, hard to read and write 4
Carnegie Mellon
c Michael Kohlhase
:
Content vs. Presentation by Example Language
Representation
Content?
{\bf proof}:. . . \hfill\Box
\begin{proof}. . . \end{proof}
. . .
LISP
. . . √ 3 8+ x
TEX
$\{f|f(0)> 0\;{\rm and}\;f(1) 0 and f (1) < 0}
TEX
$\{f|f(0)> 0$ and $f(1) 0 and f (1) < 0}
LATEX HtML
(power (plus 8 (sqrt x))
• We consider these to be representations of the same content (object)
5
Carnegie Mellon
c Michael Kohlhase
:
3)
Web Standards for Formula Markup lanugage
MathMl
OpenMath
by
W3C Math WG
OpenMath society
origin
math for HtML
integration of CAS
coverage
cont+pres; K-14
content; extensible
status
Version 2 (II 2001)
standard (IV 2001)
activity
maintenance
maintenance
Info
http://w3c.org/Math
http://www.openmath.org
6
Carnegie Mellon
c Michael Kohlhase
:
MathMl: Mathematical Markup Language MathMl is an Xml application for describing mathematical notation and capturing both its structure and content. The goal of MathMl is to enable mathematics to be served, received, and processed on the World Wide Web, just as HtML has enabled this functionality for text. from the MathML2 Recommendation • Plan of the talk – Philosophy
(What is Math? Presentation vs. content, Why now?) (What you need to generate)
– Concrete Language Elements – Tools, Deployment
7
(Authoring, Browsers, Computation. . . )
Carnegie Mellon
c Michael Kohlhase
:
Excursion: Xml (Extensible Markup Language) • Xml is language family for the Web
(begin/end brackets)
– tree representation language
– restrict instances by Doc. Type Def. (DTD) or XML Schema (Grammar) – Presentation markup by style files
(Xsl: XML Style Language)
• Xml= extensible HtML∩ simplified SGML • logic annotation (markup) instead of presentation! • many tools available: parsers, compression, data bases, . . . • conceptually: transfer of directed graphs instead of strings. • details at http://www.w3c.org
8
Carnegie Mellon
c Michael Kohlhase
:
MathMl Language Elements
Presentation first
9
Carnegie Mellon
c Michael Kohlhase
:
Representation of Formulae as Expression Trees • Mathematical Expressions are build up as expression trees – of layout schemata in Presentation-MathMl
– of logical subexpressions in Content-MathMl • Example: (a + b)2 a + b 2
10
a b 2 Carnegie Mellon
c Michael Kohlhase
:
Layout Schemata and the MathMl Box model
11
Carnegie Mellon
c Michael Kohlhase
:
P-MathMl Token Elements • Tokens Elements directly contain character data (the only way to include it) Attributes: fontweight, fontfamily and fontstyle, color. . .
• Identifiers: ...
(∼ variables, italicized)
• Numbers: ...
(∼ variables, italicized)
• Operators: ... • Operator display is often ideosyncratic
(constants, functions, upright) (Operator Dictionaries for defaults)
– Examples: spacing, *-scripts in sums and limits, stretchy integrals,. . .
– Attributes: lspace, rspace, stretchy, and movablelimits. – Operators include delimiter characters like ∗ parentheses (which stretch), ∗ punctuation (which has uneven spacing around it) and ∗ accents (which also stretch). 12
Carnegie Mellon
c Michael Kohlhase
:
General Layout Schemata • horizontal row: child1 ...
(alignment and grouping)
• fraction: numerator denominator Attribute: linethickness (set to 0 for binomial coefficients) • Radicals: child1 ... and base index • grouping with parenthesis: child ... Attributes: open="(" and close="]" to specify parentheses • grouping and style: child ...
13
(pre-set attributes)
Carnegie Mellon
c Michael Kohlhase
:
Example: x2 + 4x + 4 = 0
14
just presentation
some structure
x 2 + 4 x + 4 = 0
x 2 + 4 x + 4 = 0
Carnegie Mellon
c Michael Kohlhase
:
Example: Grouping Arguments by mfenced
f (x + y) f x + y
15
f (x + y)
f x + y
Carnegie Mellon
c Michael Kohlhase
:
Example: mfrac and mroot 1 - x 2 3
16
r 3
x 1− 2
Carnegie Mellon
c Michael Kohlhase
:
√ −b ± b2 − 4ac Example: The quadratic formula x = 2a x = -b ± b2 - 4ac 2⁢a
17
Carnegie Mellon
c Michael Kohlhase
:
Script Schemata • Indices: G1 , H5 , Rji . . .
– Super: base script – Subs: base script – Both: base superscript subscript (vertical alignment!)
• Bars and Arrows: X, |{z} Y , |{z} Z ,. . . – Under: base script – Over: base script
– Both: base underscript overscript • Tensor-like: base sub1 sup1 ... psub1 psup1 ...] 18
[ Carnegie Mellon
c Michael Kohlhase
:
msub + msup vs. msubsup msub + msup x 1 α
x1
19
α
msubsup x 1 α
α x1
Carnegie Mellon
c Michael Kohlhase
:
Example: Movable Limits on Sums ∑ i=1 &infty; xi + ∑ i=1 &infty; xi
20
∞ X
i=1
i
x +
P∞
i x i=1
Carnegie Mellon
c Michael Kohlhase
:
MathMl Language Elements
Content MathMl
21
Carnegie Mellon
c Michael Kohlhase
:
Expression Trees in Prefix Notation • Prefix Notation saves parentheses (x − y)/2
x y 2
(so does postfix, BTW) x − (y/2)
x y 2
• Function Application: function arg1 ...
argn
• Operators and Functions: ∼ 100 empty elements , , , ,. . . • Token elements: ci, cn
(identifiers and numbers)
• Extra Operators: ... 22
Carnegie Mellon
c Michael Kohlhase
:
Containers aka Constructors • sets: ... or ... ... • intervals: Attribute: closure (one of open, closed, open-closed, closed-open) • vectors: ...
• matrix rows: ... • matrices: ...
23
Carnegie Mellon
c Michael Kohlhase
:
Examples of Content Math Expression
Markup
x 9
sin(x) + 9 x=1
x1 n 0 &infty; xn x3 fx x y 0x1 3y10
24
c Michael Kohlhase
:
P∞
n x n=0
d3 f (x) dx3
0 < x < 1, x, y 3 ≤ y ≤ 10 Carnegie Mellon
Expression
Markup
x x0 0 &infty; 12 01 10 21
25
{x|x ≥ 0} = [0, ∞)
(1, 2) ×
0 1 1 0
= (2, 1)t
Carnegie Mellon
c Michael Kohlhase
:
Mixing Presentation and Content MathMl (a + b) ⁢ (c + d) a b c d Carnegie Mellon
26
c Michael Kohlhase
:
Parallel Markup in MathMl
(a+b) ⁢ (c+d) ab cd
Carnegie Mellon
27
c Michael Kohlhase
:
Practical Considerations
I can write all this MathMl now, how can I include it in my web-page or course materials.
28
Carnegie Mellon
c Michael Kohlhase
:
Including MathMl in your web page • which browsers?
(After all, we want to see it)
– Internet Explorer ≥ 5.5 with Mathplayer/Techexplorer
(behaviors)
(native support of editing)
– Amaya (W3C Exploratory Browser)
(native presentation MathMl)
– Mozilla/Netscape 7
– all others with applets like WebEQ, CSS rendering • Problem: how to identify the math markup to our browser, plug-in, or applet. • Solution: insert MathMl markup between and tags to distinguish MathML from HtML. • Question: Does this really work cross-platform?
29
(no!)
Carnegie Mellon
c Michael Kohlhase
:
Including MathMl in Web pages (Details) • Use XHtML! MathMl is an Xml application which does not mix well with HtML (tag soup parser) • Use the MathMl namespace! http://www.w3.org/1999/xhtml ... Example
.... x+3
30
This is the theory (according to the MathMl Spec) does it work? not in practice!
Carnegie Mellon
c Michael Kohlhase
:
Problems with 95% of the Browser Market • M$ Intenet Explorer does not render Xml • I.E. also does not implement MathMl natively
(but XHtML is Xml!) (market too small)
• MathPlayer (Design Science), Techexplorer (IBM) plugins must be registered by magic incantations in the document head
31
Carnegie Mellon
c Michael Kohlhase
:
Solution: David Carlisle’s Universal XslT style sheet. • Idea: Make use of XslT transformer in the browser – Amaya, Mozilla, Netscape 7: do nothing
(Identity trafo)
– M$ Internet Explorer: insert element, adjust namespaces prefixes, transform to HtML. • In practice? Add a stylesheet processing instruction
... ...
32
Carnegie Mellon
c Michael Kohlhase
:
Practical Considerations • This will work in most cases, except if • you are off-line ⇒ use a local copy of the style sheet
• the style sheet is not on the same server as XHtML+MathMl document – set IE security preferences to “low”
(not recommended)
– copy the style-sheets there – use off-line approach or embed style sheets into document
33
Carnegie Mellon
c Michael Kohlhase
:
Coping with multiple renderers • If a machine has more than one renderer, use Xml namespaces for preferences.
• appliccable values
– css: render the equation through the use of CSS (no plug-in required) – mathplayer-dl: prompt to install MathPlayer if necessary. – mathplayer: use the MathPlayer behavior. – techexplorer-plugin: use the Techexplorer plug-in. – techexplorer: the Techexplorer rendering is preferred.
34
Carnegie Mellon
c Michael Kohlhase
:
Back to the Content
Can’t I just generate the presentation from other formats?
35
Carnegie Mellon
c Michael Kohlhase
:
Content vs. Semantics in Math • Content: logic-independent infrastructure Identification of abstract syntax, “semantics” by reference for symbols. 2
• Semantics: establishing meaning by fixing consequences adds formal inference rules and axioms. – Mechanization in a specific system, or (Theorem Prover or Proof Checker) – logical framework
36
(specify the logic in the system itself)
Carnegie Mellon
c Michael Kohlhase
:
Semantics vs. Representation e.g. Disjunction in Strong Kleene Logic ∨
F
U
T
T
F
U
T
U
U
U
T
T
T
T
T
The binary minimum function on {T, U, F }, where F ≤ U ≤ T .
• We will consider these to be representations of different content objects • That they are the same mathematically, is another matter. Boolean Algebra
field of clopen subsets of a topological space
Stone Representation Theorem
Multi-modal K
Description Logic ALC
[IJCAI93]
f (x) =
Complex Analysis II
0
Solution of f = f
37
x x3 x3 1+ 1! + 2! + 3! +· · ·
Carnegie Mellon
c Michael Kohlhase
:
From Content to Presentation • Idea: manage the source in content markup scheme and generate presentation markup from that! (style sheets) – e.g. LATEX with style/class files or HtML with Cascading Style Sheets, . . . – device/language independence, personalization,. . . . . . • Questions:
(every content language has to deal with this somehow)
– fixed or extensible set of language features? (in what language?)
– specify presentation information
(client-side or server-side)
– need a style sheet execution engine
(where is my style sheet? what will it look like?)
– late binding problems, • Problems: How to get back?
38
(transformation loses information)
Carnegie Mellon
c Michael Kohlhase
:
From Presentation to Content? • Problem: Presentation Markup ↔ Content Markup
– many presentation for one concept (e.g. binomial coeff.
n k
vs. Ckn vs. Cnk )
– many concepts for one presentation (e.g. m3 is m cubed, cubic meter, upper index, footnote,. . . ) – grouping is left implicit, invisible operators – disambiguation by context
(e.g. 3a2 + 6ab + b2 ) (e.g. λXα X =α λYα Y )
– notation is introduced and used on the fly. • Content Recovery is a heuristic context/author-dependent process
– There is little hope we can do it fully automatically in principle (AI-hard!) – for limited domains we can do a good job
39
(e.g. in Mathematica 4)
Carnegie Mellon
c Michael Kohlhase
:
OpenMath • Extensible framework for specifying the content of mathematical objects • Objects as
– constants (OMS)
(Deliberately uncommitted wrt. object semantics) (symbols: semantics given in theory)
– variables (OMV)
(local objects)
– applications (OMA)
(for functions)
– bindings (OMBIND)
(λ, quantification)
– attributions (OMATTR)
(e.g. for types)
40
Carnegie Mellon
c Michael Kohlhase
:
C-MathMl and OpenMath are equivalent (almost) OpenMath 0
41
MathMl
a a 0
Carnegie Mellon
c Michael Kohlhase
:
Added-value services facilitated with Math Content D1 cut and paste
(cut output from web search engine and paste into CAS )
D2 automatically proof-checking formal argumentations • math explanation
(e.g. specialize a proof to a simpler special case)
D3 semantical search for mathematical concepts • data mining for representation theorems • classification
(rather than keywords)
(find unnoticed groups out there)
(given a concrete math structure, is there a general theory?)
D4 personalized notation
(implication as → vs. ⊃, or Ricci as 12 Rij vs. 2Rij ) (ActiveMath, Course Capsules)
D5 user-adapted documents
42
(bridge verification?)
Carnegie Mellon
c Michael Kohlhase
:
Conclusion: The way we do math will change dramatically Publication
Application
Compute/
Prove
Experiment
Mathematical Creativity Spiral [Buchberber ’95]
The Creativity Spiral
Specify/
Visualize
Com− munication
Formalize
Conjecture
Teaching
• Every step will be supported by mathematical software systems • Towards an infrastructure for web-based mathematics! 43
c Michael Kohlhase
:
Carnegie Mellon