The LaTeXML Daemon: Editable Math on the Collaborative Web Deyan Ginev1 , Heinrich Stamerjohanns1 , Bruce R. Miller2 , and Michael Kohlhase1 1
Computer Science, Jacobs University Bremen
[email protected] 2 National Institute of Standards and Technology
[email protected]
Introduction The language of the TEX/LATEX typesetting system has become all-pervasive in scientific publications and has proven its stability, convenience and expressivity in its three-decade history. With the advent of the Web 2.0 paradigm, it has also become the primary choice of various technical and scientific social platforms, most prominently online encyclopedias (e.g. PlanetMath [Pla]) and question-answer forums (e.g. MathOverflow [Mat]). On the other hand, the standardization of MathML and OpenMath and the adoption of the former in HTML5, have opened the floodgates for scientific content native to the browser. LATEX on the Web The efforts of bringing LATEX to the web are numerous and have varied in number and approach. Classical scenarios provide hooks to either LATEX itself or a LATEX daemon(e.g. [Uni]), incorporating formulas as PDF or PNG, while newer applications pursue a fully native output of XHTML+MathML (see [Sta+09] for an overview). The LaTeXML [Mil] system, and particularly its enhanced branch maintained for the arXMLiv [Sta+10] project, belongs to the second category. It is currently the only system with experimental Content MathML support and is also prepared for further linguistic and semantic analysis of the document contents, as it not only expands TEX macros, but also allows for the definition of custom bindings that preserve and enhance the semantic information encoded by the authors. Moreover, it takes the effort one step further, being able to already create XHTML+MathML+RDFa and remains easily extensible to other output formats, such as HTML5+RDFa. A Daemon In this abstract we present an extension, the LaTeXML daemon, which drastically widens the applications of the LaTeXML system, addressing the problems of efficient, scalable and on-the-fly processing. The daemon enhancements avoid the overhead of startup time and LATEX package initialization, achieving close to instant conversion times, and easily scale to multi-threaded setups. This provides a great boost to conversion times not only in the processing of large-scale corpora, but also when employed as a backend for web services using LATEX as a frontend language. Maturity has been exhibited by converting one and a half million abstracts from the ZentralblattMATH [Zbl] database, coupled
with a stable performance in various installations of the Planetary [Koh+11] system. Next, we equip our system with the capabilities of operating as a web service independent of a file system and of recognizing resources via web-accessible URIs3 . Furthermore, we add native support for user-embedded metadata, with an outlet for Semantic Web target representations, such as RDFa. When these features are employed in unity, our system acts as a capable conversion backend for web-based authoring systems, also scalable to exporting user-defined metadata for add-on semantic services. Prominently, the high conversion speeds allow for on-the-fly preview of the authored content and enable real-time collaborative setups, where multiple authors can write simultaneously – a setting explored in depth by services such as Etherpad [Inc] and Google Docs [Goo], but lacking native mathematical content. Implementation An emphasis has been placed on flexibility, versatility and ease of use, in both the setup and the deployment phases. A large range of customization options and a pair of intuitive server-client binaries with detailed documentation, enable out-of-the-box use for a wide range of applications. The system implementation is based entirely on open web standards and has the full expressivity of the original TEX engine. Additionally, correctness and robustness are ensured respectively, via the powerful scoping system of LaTeXML and Perl’s mastery in localizing both variables and processing flows. The daemon communication is based on sockets, allowing an easy coupling with both local and Internet services. LaTeXML is Public Domain software, and the daemon remains consistently under that license. Currently the software is hosted on the arXMLiv branch of the LaTeXML repository [Gin], to be merged with the trunk of LaTeXML upon reaching a release candidate. Also, a demo page [Arx] has been set up, in order to showcase the features claimed. Future Work Our next steps would be to address reducing the memory and processing footprints in the short term, plus ensuring ironclad security when deployed on the web, in the mid-term. We are simultaneously developing a number of custom libraries that further build on the capabilities discussed, trying to fully realize the potential of the framework in introducing LATEX as a frontend language for the Collaborative Web. They range from supporting special input methods, e.g. Wikipedia style markup, to providing a stronger integration with the World Wide Web, e.g. by increasing the expressivity in writing metadata and supporting direct reuse of LATEX content available online.
3
Note that in this respect LaTeXML exceeds the capacity of TEX/LATEX which can only read files from the local file system.
References [Arx]
arXMLiv: Showcase Demo Page. url: http://trac.kwarc.info/ arXMLiv/wiki/Demo (visited on 03/03/2011). [Gin] Deyan Ginev. LaTeXML: A LATEX to XML Converter, arXMLiv branch. url: https : / / svn . mathweb . org / repos / LaTeXML / branches/arXMLiv (visited on 03/03/2011). [Goo] Google. Google Docs. url: http://docs.google.com (visited on 03/03/2011). [Inc] AppJet Inc. Etherpad. url: http://www.etherpad.com (visited on 03/03/2011). [Koh+11] Michael Kohlhase et al. “The Planetary System: Web 3.0 & Active Documents for STEM”. In: accepted for publication at ICCS 2011 (Finalist at the Executable Papers Challenge). 2011. url: https: //svn.mathweb.org/repos/planetary/doc/epc11/paper.pdf. [Mat] MathOverflow. url: http : / / mathoverflow . net (visited on 09/13/2010). [Mil] Bruce Miller. LaTeXML: A LATEX to XML Converter. url: http : //dlmf.nist.gov/LaTeXML/ (visited on 03/03/2011). [Pla] PlanetMath.org – Math for the people, by the people. url: http: //planetmath.org (visited on 01/06/2011). [Sta+09] Heinrich Stamerjohanns et al. “MathML-aware article conversion from LATEX, A comparison study”. In: Towards Digital Mathematics Library, DML 2009 workshop. Ed. by Petr Sojka. Masaryk University, Brno, 2009, pp. 109–120. url: http://kwarc.info/kohlhase/ submit/dml09.pdf. [Sta+10] Heinrich Stamerjohanns et al. “Transforming large collections of scientific publications to XML”. In: Mathematics in Computer Science 3.3 (2010): Special Issue on Authoring, Digitalization and Management of Mathematical Knowledge. Ed. by Serge Autexier, Petr Sojka, and Masakazu Suzuki, pp. 299–307. url: http://kwarc.info/ kohlhase/papers/mcs10.pdf. [Uni] Open University. MathTran: A TEX to Image Converter Web Service. seen May 2010. url: http://www.mathtran.org (visited on 03/03/2011). [Zbl] Zentralblatt MATH. url: http://www.zentralblatt- math.org (visited on 03/03/2011).