The Stratego/XT BibTEX Tools — tool documentation —
Eelco Visser
Department of Information and Computing Sciences, Universiteit Utrecht
DRAFT, November 5, 2005
Copyright © 2005 Eelco Visser
Address: Department of Information and Computing Sciences, Universiteit Utrecht, P.O. Box 80089, 3508 TB Utrecht
Eelco Visser, [email protected], http://www.cs.uu.nl/~visser
Abstract

This paper provides documentation for the Stratego/XT BibTeX Tools, a collection of tools for processing BibTeX bibliography files. The main tools in the collection are bib-to-html, for generating a sectioned publication list from a BibTeX file according to selection criteria and a layout template, and aux-to-bib, for extracting a BibTeX file based on the citations in a LaTeX document.
1 Introduction

As a researcher you produce multiple publications each year. For various reasons you make overviews of these publications. You want people to quickly find your publications, so you maintain a publication list on your homepage with the most recent publications first, organized by year, and with links to PDF files. You have to maintain bibliographies for the projects you are involved in. As you broaden your interests or get involved in new research areas you may have a list with your publications organized by research topic. Your curriculum vitae has a list of publications organized per publication medium (first the journal publications, then the conference proceedings, etc.). Some of these lists you may need in PDF and others in HTML for publication on the web, and some in both formats. Of course, all these lists have to be updated frequently as new publications arrive and as their status (and thus their position in the lists) changes from draft, to submission, to publication.

To conclude, manipulating bibliography information is part of your job, and may cost more time than you like. The standard approach is to maintain each of these lists separately and to minimize their number to reduce maintenance costs. This paper describes a collection of tools for deriving all these types of publication lists from a single source, i.e., a BibTeX bibliography file. The BibTeX Tools are built with the Stratego/XT transformation language and toolset [10].

This document is not about the many other toolsets for BibTeX that are around [5, 3, 8], notably that of Nelson Beebe [1]. Indeed, the motivation for creating these tools was frustration with the ‘standard’ bibtex tools collection that I had been using, which does not imply that I have conducted a thorough investigation into the availability of other suitable tools. Creating these tools has been a fruitful case study in the use of Stratego/XT.
2005

[1] M. Bravenboer, R. Vermaas, J. Vinju, and E. Visser. Generalized Type-Based Disambiguation of Meta Programs with Concrete Object Syntax. In R. Glück and M. Lowry, editors, Proceedings of the Fourth International Conference on Generative Programming and Component Engineering (GPCE 2005), volume 3676 of Lecture Notes in Computer Science, Tallinn, Estonia, September 2005. Springer. (pdf, gpce, bib).

2004

[2] M. Bravenboer and E. Visser. Concrete Syntax for Objects. Domain-Specific Language Embedding and Assimilation without Restrictions. In D. C. Schmidt, editor, Proceedings of the 19th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’04), pages 365–383, Vancouver, Canada, October 2004. ACM Press. (acm, info, pdf, bib).

[3] B. Fischer and E. Visser. Retrofitting the AutoBayes Program Synthesis System with Concrete Object Syntax. In C. Lengauer et al., editors, Domain-Specific Program Generation, volume 3016 of Lecture Notes in Computer Science, pages 239–253. Springer-Verlag, 2004. (pdf, info, springer, bib).

2002

[4] E. Visser. Meta-Programming with Concrete Object Syntax. In D. Batory, C. Consel, and W. Taha, editors, Generative Programming and Component Engineering (GPCE’02), volume 2487 of Lecture Notes in Computer Science, pages 299–315, Pittsburgh, PA, USA, October 2002. Springer-Verlag. (pdf, bib).
Figure 1: Example publication list
2 Quick Start

This section explains, without further background, how to use the BibTeX Tools to quickly produce a publication list or extract the BibTeX entries for a paper.
2.1 bib-to-html

The bib-to-html tool produces publication lists from BibTeX files. The name is a bit misleading, since it produces such lists in both HTML and PDF.

    bib-to-html -i file.bib

Given a bibliography file file.bib, produces a directory file/ that contains a publication list with all the entries in file.bib, organized by year in reverse chronological order with a separate section for each year, as illustrated in Figure 1.

    bib-to-html -i file.bib --template by-type

As above, but now specifying the template to use. The by-type template organizes the list by publication type: books and book chapters first, then journal articles, then papers in proceedings. The bibtex-tools package comes with a number of standard templates in addition to the by-type template:

• by-year is the template that is used by default, and organizes the publications in sections per year
• alpha organizes entries alphabetically
• by-type-by-year organizes the list in sections by type, and within those sections in subsections per year
• by-year-by-type organizes publications by year, and within each year by type
• key-publications starts with a section of key publications (determined by the presence of a field category = {key}) organized by year, followed by the other publications organized by year

    bib-to-html -i file.bib --all-templates

Uses all standard templates to produce publication lists. The file/ directory will contain an index.html file with links to each of the lists.

Another feature you might want to use is the inclusion of url fields. For each field of the form urlname = {http://...}, a link named name that points to the URL is added to the note field of the BibTeX entry. The first url field is also used as the link for the title of the entry. If you read this document with a PDF viewer you can see that the publication titles in Figure 1 and the items between parentheses, such as ‘pdf’ and ‘info’, are hyperlinks, which have been generated from url fields.
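The default by-year organization can be pictured with a small Python sketch. This is not how bib-to-html is implemented (the tools are written in Stratego); the dictionary representation of entries and the function name group_by_year are assumptions made purely for illustration.

```python
from collections import defaultdict


def group_by_year(entries):
    """Group entries (dicts with a 'year' field) into sections, newest first."""
    sections = defaultdict(list)
    for entry in entries:
        sections[int(entry["year"])].append(entry)
    # Reverse chronological order: one (year, entries) section per year.
    return [(year, sections[year]) for year in sorted(sections, reverse=True)]


# Hypothetical in-memory entries; only the key and year matter here.
entries = [
    {"key": "Vis02.gpce", "year": "2002"},
    {"key": "BV04", "year": "2004"},
    {"key": "BVVV05", "year": "2005"},
    {"key": "FV04.retrofit", "year": "2004"},
]
for year, group in group_by_year(entries):
    print(year, [e["key"] for e in group])
```

Within a year the sketch keeps the entries in input order; the order that the actual templates use within a section is not specified here.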
2.2 aux-to-bib

When you have several large BibTeX files, it may be convenient to extract from these a single BibTeX file with exactly the entries cited in some article. For instance, you may want to edit the entries for use in the article (e.g., squeezing them for space), or to share the file with your co-authors.

    aux-to-bib -i file -I . -I bibdir

Given an aux file file.aux, aux-to-bib extracts from the bibliographies declared in the aux file those entries for which there are citations in the aux file. The -I option is needed to indicate which directories should be searched for BibTeX files.
3 Producing Bibliographies with LaTeX/BibTeX

In this section I review the standard workflow for producing bibliographies and publication lists with LaTeX and BibTeX. This might provide you with enough information to create these things without using bibtex-tools.
3.1 BibTeX

BibTeX is a convenient format for the specification of bibliographic information. The basic language is quite simple. A BibTeX file consists of entries of the form @type{key, field*}, where the fields associate values with field names. Figure 2 provides a typical example. In a sense a BibTeX file is a database consisting of a list of records. However, BibTeX associates no schema with these records; types and field names can be chosen arbitrarily. Thus, one can invent types and field names appropriate for the application domain.

    @incollection{Vis04.strategoxt,
      author    = {Eelco Visser},
      title     = {Program Transformation with {Stratego/XT}: Rules,
                   Strategies, Tools, and Systems in {StrategoXT-0.9}},
      booktitle = {Domain-Specific Program Generation},
      pages     = {216--238},
      year      = 2004,
      editor    = {C. Lengauer and others},
      volume    = {3016},
      series    = {Lecture Notes in Computer Science},
      month     = {June},
      publisher = {Springer-Verlag},
    }

Figure 2: BibTeX entry
The real strength of BibTeX, however, is not its database format, but the formatting of bibliographic entries for use as references in documents. The bibtex tool orders and formats a selection of entries from a BibTeX file based on the citations in a document. Rather than providing a fixed formatting, bibtex is parameterized with a bibliography style that defines how an entry is formatted. In fact these styles are programs in an idiosyncratic, unnamed postfix language [7] in which one defines how to sort entries, how to format bibliography labels and citations, and for each type of entry how to format it. Here, the types of entries that one can use are actually restricted; if the style does not support an entry type, it will not be formatted. Thus, inventing new types requires extending the styles, which is usually non-trivial. Formatting an entry entails selecting the fields to display, the order to display them in, the font to use for each field, the transformations to apply to the content of the field, and knowing what to do with missing fields. For example, the plain style orders entries alphabetically by author, numbering the entries consecutively. The abbrv style is the same as plain, but transforms the author fields by abbreviating first names to initials.

Over the years many such styles have been developed. There are generic styles such as plain and abbrv, but also styles that encode the conventions of a particular scientific community, or even of a specific journal. By maintaining bibliographic information in BibTeX files, authors don’t have to be concerned with the details of these styles and can easily switch from one style to another, simply by declaring a different style in their document.
3.2 BibTeX Workflow

The standard use of BibTeX is as a tool to produce the bibliography for a book or article. It works primarily with the typesetting language TeX/LaTeX. For example, consider the following file concrete.ltx:

    \documentclass{article}
    \begin{document}
    ... \cite{Vis02.gpce} ...
    \bibliographystyle{abbrv}
    \bibliography{concrete}
    \end{document}

The document cites references by means of citation commands such as \cite{Vis02.gpce}. It declares a bibliography style to format the bibliography (abbrv.bst in this case), and one or more bibliography files from which to obtain the references (concrete.bib in this case). When running this file through the latex command for typesetting by invoking

    latex concrete.ltx

the citation commands and bibliography declarations are collected and written to a .aux file:

    \relax
    \citation{Vis02.gpce}
    \bibstyle{abbrv}
    \bibdata{concrete}

The bibtex tool interprets these citations and translates the cited entries in the .bib files to a .bbl file with a formatted bibliography in LaTeX. For example, invoking

    bibtex concrete

creates the file concrete.bbl in Figure 3. This LaTeX file is then included in the document to produce the bibliography at the place of the bibliography command in the document.

    \begin{thebibliography}{1}

    \bibitem{Vis02.gpce}
    E.~Visser.
    \newblock Meta-programming with concrete object syntax.
    \newblock In D.~Batory, C.~Consel, and W.~Taha, editors, {\em Generative
      Programming and Component Engineering (GPCE’02)}, volume 2487 of {\em
      Lecture Notes in Computer Science}, pages 299--315, Pittsburgh, PA, USA,
      October 2002. Springer-Verlag.

    \end{thebibliography}

Figure 3: Bibliography formatted in LaTeX.

LaTeX’s referencing mechanism is used to provide the citation labels in the text; there is no need for the author to maintain the numbers of entries in the bibliography. Running the latex command once more includes the generated .bbl file in the document. The \bibitem command in the bibliography creates a \bibcite entry in the .aux file:

    \relax
    \citation{Vis02.gpce}
    \bibstyle{abbrv}
    \bibdata{concrete}
    \bibcite{Vis02.gpce}{1}

Yet one more run of latex is now needed to typeset the \cite{Vis02.gpce} command with the reference [1] in the bibliography. In general, the procedure to typeset a LaTeX document with BibTeX citations is the following:

    latex concrete.ltx
    bibtex concrete
    latex concrete.ltx
    latex concrete.ltx
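To make the role of the .aux file concrete, the following Python sketch collects the information that bibtex (and, later in this paper, aux-to-bib) needs from it: the cited keys, the bibliography files, and the style. The function name read_aux and the regular expression are illustrative assumptions, not part of any of the tools, and the sketch only handles the simple one-command-per-line form shown above.

```python
import re

# Matches the three .aux commands of interest; \bibcite and \relax are skipped.
AUX_COMMAND = re.compile(r"\\(citation|bibdata|bibstyle)\{([^}]*)\}")


def read_aux(text):
    """Collect cited keys, bibliography files, and the style from .aux text."""
    keys, bibfiles, style = [], [], None
    for line in text.splitlines():
        match = AUX_COMMAND.match(line.strip())
        if not match:
            continue
        command, argument = match.groups()
        if command == "citation":
            for key in argument.split(","):
                key = key.strip()
                if key and key not in keys:
                    keys.append(key)
        elif command == "bibdata":
            bibfiles.extend(name.strip() + ".bib" for name in argument.split(","))
        else:
            style = argument
    return keys, bibfiles, style


AUX = r"""\relax
\citation{Vis02.gpce}
\bibstyle{abbrv}
\bibdata{concrete}
\bibcite{Vis02.gpce}{1}
"""
print(read_aux(AUX))
```

For the example file this yields the single key Vis02.gpce, the bibliography file concrete.bib, and the style abbrv.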
3.3 Bibliographies in Multiple Sections

Producing documents that contain multiple bibliographies is less well known, but can be done as well. The bibunits.sty package for LaTeX allows the declaration of multiple regions that each have their own citation namespace [4]. The citations within such a region give rise to a separate bibliography. For example, the publication list in Figure 1 can be produced by the LaTeX document in Figure 4.

    \documentclass{article}
    \usepackage{bibunits}
    \usepackage{pubmacros-smooth}
    \newenvironment{BibSection}[1]{
      \begin{bibunit}[abbrv]
    }{
      \putbib[concrete]
      \end{bibunit}
    }
    \begin{document}
    \subsection*{2005}
    \begin{BibSection}{1}
      \nocite{BVVV05}
    \end{BibSection}
    \subsection*{2004}
    \begin{BibSection}{2}
      \nocite{FV04.retrofit,BV04}
    \end{BibSection}
    \subsection*{2002}
    \begin{BibSection}{3}
      \nocite{Vis02.gpce}
    \end{BibSection}
    \end{document}

Figure 4: Using bibunits to create multiple bibliographies.

The procedure for typesetting this document is similar to the one above. However, for each bibliography a separate .aux file is generated, numbered in order of appearance of the bibunits, and bibtex should be invoked for each of these files, which explains the for loop in the following script:

    latex --interaction scrollmode concrete-list.ltx
    for file in concrete-list.*.aux
    do
      bibtex $(basename $file .aux)
    done
    latex --interaction scrollmode concrete-list.ltx
    latex --interaction scrollmode concrete-list.ltx

The document in Figure 4 uses the pubmacros-smooth style file, which comes with the bibtex-tools distribution. This style file redefines the thebibliography environment such that it does not print the usual References header, and such that entries are numbered consecutively. Because of a bug in this style file that is not yet understood, the typesetting of this document stops; the --interaction scrollmode directive tells latex to ignore the problem.
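The shape of this script, one latex run, one bibtex run per bibunit, and two more latex runs, can also be written down as data. The Python sketch below only constructs the command lines and does not execute them; the function name build_commands is an assumption for illustration, and the .aux file names follow the naming used in the script above.

```python
from pathlib import Path


def build_commands(document, aux_files):
    """Return the command lines of the latex/bibtex/latex/latex cycle."""
    latex = ["latex", "--interaction", "scrollmode", document]
    # One bibtex run per bibunit .aux file, named without the .aux suffix,
    # mirroring `bibtex $(basename $file .aux)` in the shell script.
    bibtex_runs = [["bibtex", Path(aux).stem] for aux in sorted(aux_files)]
    return [latex, *bibtex_runs, latex, latex]


for command in build_commands(
    "concrete-list.ltx",
    ["concrete-list.1.aux", "concrete-list.2.aux", "concrete-list.3.aux"],
):
    print(" ".join(command))
```

Passing each command to subprocess.run would be the obvious next step; in practice the list of .aux files is only known after the first latex run has produced them.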
3.4 Bibliographies for the Web

Producing web pages with bibliographies or publication lists from BibTeX files requires a translation from BibTeX to HTML. There are tools that do this directly. However, they are necessarily restricted in the formatting of bibliography entries, since reproducing all the variability in BibTeX and all existing styles is too much effort.

An option might be an implementation of BibTeX that directly produces HTML. However, the .bst files that do the actual formatting are written for LaTeX and do not use an abstract API for formatting, but directly produce text. There does seem to be a library of common functions used in many .bst files, and replacing those with instructions for producing HTML might do the trick. Even so, the approach would require adapting existing .bst files, and is not necessarily extensible to new or nonstandard .bst files.

The next option might be a dedicated translation of .bbl files to HTML. Since the LaTeX vocabulary used in those files is generally restricted, such a direct translation could yield good quality HTML. However, the approach is pretty fragile, since it precludes LaTeX commands that might be used in BibTeX entries.

A relatively easy way is to produce a LaTeX document with the .bbl file produced by BibTeX and translate it to HTML using some LaTeX to HTML translator such as Hevea [6]. This delegates the problem of translating LaTeX to HTML to a third party tool. Even if the tool is not complete, it comes for free, and hopefully will become more complete in the future.

Hevea is a pretty good LaTeX to HTML translator written in Objective Caml. It emulates a good part of the TeX engine, including macro definitions, so it should be able to deal with .bbl files produced by bibtex. However, Hevea requires that style files are rewritten to Hevea-compatible .hva files, and will not read existing style files. Unfortunately, the bibunits style is not supported by Hevea. This is not a big issue, as we can easily reuse the .bbl files generated by a latex/bibtex run. Figure 5 shows the file concrete-list-for-html.ltx, which has the same body as concrete-list.ltx in Figure 4, but loads the hevea style and redefines the BibSection environment to read the .bbl files produced for the concrete-list document.

    \documentclass{article}
    \usepackage{hevea}
    \usepackage{pubmacros-smooth}
    \newenvironment{BibSection}[1]{\input{concrete-list.#1.bbl}}{}
    \begin{document}
    \subsection*{2005}
    \begin{BibSection}{1}
      \nocite{BVVV05}
    \end{BibSection}
    \subsection*{2004}
    \begin{BibSection}{2}
      \nocite{FV04.retrofit,BV04}
    \end{BibSection}
    \subsection*{2002}
    \begin{BibSection}{3}
      \nocite{Vis02.gpce}
    \end{BibSection}
    \end{document}

Figure 5: Publication list for web with Hevea.

An HTML rendering of the publication list can now be produced by invoking hevea:

    hevea concrete-list-for-html.ltx -o concrete-list.html
3.5 Summary

To summarize, if you want to produce a publication list from a BibTeX file to put on the web you need

• (pdf)latex to generate a printable version and to produce the .aux files
• bibtex to format bibliography entries
• bibunits to divide the bibliography into sections
• hevea to translate LaTeX to HTML

And you need a LaTeX document divided into sections, with a bibunit in each section containing a number of \nocite commands that indicate which entries to list in which section.
4 BibTeX Tools

The techniques described in the previous section might be sufficient for your purposes, and you may have no need for the BibTeX Tools described in the rest of this paper. I have used these techniques for a long time without further tools, but encountered a number of scenarios that required further automation.

The first problem is that of adding hyperlinks to the bibliography entries, for instance to the PDF file of the publication, or to the journal that publishes it, as illustrated in Figure 1. Typically you only want such links in the version of the bibliography that is published in HTML, but not in a paper version of the bibliography, or in a citation in a normal paper; a hyperlink in printed documents doesn’t make much sense. As a consequence you have to maintain two sets of BibTeX entries.

The second problem is the maintenance of multiple types of bibliographies or publication lists. You may have to produce publication lists for each member of a research group, or for each project that you manage (or each project in the department). You may have to produce publication lists organized by year and by type. You may have to maintain similar bibliographies for multiple documents, e.g., your CV, web publication list, or project pages. For each of these documents you can create LaTeX files as described in the previous section, which is tedious. As new publications are added to the BibTeX files, each of the documents has to be updated, which is even more tedious. Moreover, you have to reconsider in which section of which document a publication should be included. When publications change status (e.g., from draft to published), references that were already included may have to be repositioned. It is likely that the bibliographies will become incomplete.
4.1 Automatic Generation of Publication Lists

The BibTeX Tools package was developed to overcome these problems by automatically generating a publication list from a BibTeX file. The main tool in the collection, bib-to-html, produces publication lists in PDF and HTML from a BibTeX file, with the following properties:

• bib-to-html derives hyperlinks to include in entries from url fields, thus allowing an entry to be used with hyperlinks in an HTML page, and without in a normal document.
• bib-to-html selects the entries to include in the publication list based on some criterion. For example, exclude technical reports and drafts, or include only the publications associated with a certain project.
• bib-to-html organizes the entries into sections. For example, organization by year, by type, by year and type, key publications first, or by project.
• bib-to-html is programmable, i.e., parameterized with selection queries and formatting templates, rather than offering a fixed set of possibilities.
• bib-to-html uses the same workflow as outlined in the previous section. The added value comes from the automatic analysis of bib files, the generation of the necessary LaTeX files, and the encapsulation of the various operations; running bib-to-html does it all.
4.2 Components for BibTeX Processing

The development of bib-to-html required a number of basic components such as a parser and a pretty-printer, since regular expression matching is not sufficient for analyzing BibTeX entries. Making the selection and formatting of publication lists programmable required a domain-specific query language for BibTeX. By providing these ingredients as separately executable components it is easy to create other end-user tools for BibTeX processing. The components are implemented in the transformation language Stratego, which allows very concise implementations. All transformations in BibTeX Tools are implemented in about 1200 lines of Stratego code (including whitespace and comments), 200 of which should really be part of the Stratego library. Thus, BibTeX Tools provides basic components for new BibTeX processing tools, which can be created by combining existing components with new transformation components.
4.3 End User Tools

The following end-user tools are currently provided:

    bib-to-html -i file.bib [options]

Generates a publication list in HTML and PDF from a BibTeX file, with the following options:

    --select file               : BibTeX query for selection of entries (default: select all)
    --template file             : LaTeX file with embedded BibTeX queries for presentation
    --all-templates             : use all default templates
    --obib                      : name of output bibtex file
    --enable-bib-refs on|off    : include reference to bib entry (default: on)
    --enable-hevea on|off       : build html using hevea (default: on)
    --enable-pdf-ref on|off     : include link to pdf and bibtex files (default: on)
    -a                          : input is abstract syntax

    aux-to-bib -i file.aux -o file.bib -I dir1 -I dir2 ...

Derives a bibliography file file.bib specific for a document, based on the citation data in file.aux.

    bib-transform -i file1.bib ... -o comb.bib [--pp] [transforms]

Combines multiple BibTeX files (file1.bib, file2.bib, ...) into a single file (comb.bib) and optionally applies a number of transformations to the entries in the files. The --pp option is needed to produce a BibTeX file in text format; otherwise the result is an abstract syntax tree in ATerm format. The transformations are:

    --desugar   : reduce syntactic variations
    --inline    : inline string definitions (implies desugaring)
    --uniq      : remove duplicate entries with the same key (implies desugaring)
    --normalize : all of the above
    --add-refs  : add hyperlinks to entries and split off individual files for entries

    bib-format -i file1.bib -o file2.bib

Pretty-prints a BibTeX file file1.bib and saves the result as file2.bib.
4.4 Components

The following components are used in the tools above, but are also available as stand-alone executable tools and can easily be used in other combinations. They typically operate on a BibTeX file in abstract syntax format.

    parse-bibtex -i file.bib -o file.abib

Parses a BibTeX file file.bib, producing an abstract syntax tree in the ATerm format written to file.abib. The following tools operate on such abstract syntax tree representations of BibTeX files, rather than text representations.

    pp-bibtex -i file.abib -o file.bib

Pretty-prints an ATerm abstract syntax tree representation of a BibTeX file in file.abib to text in file.bib.

    bib-desugar -i file1.abib -o file2.abib

Normalizes BibTeX entries in file1.abib to use lowercase types and fields and curly braces instead of double quotes, and removes comments between entries.

    bib-inline -i file1.abib -o file2.abib

Inlines string definitions and normalizes the fields they are used in.

    bib-uniq -i file1.abib -o file2.abib

Removes duplicate entries that have the same key.

    bib-split --dir dir -i file1.abib -o file2.abib

Creates a separate BibTeX file for each entry (using the key as name) and adds a urlbib field with the name of the file as URL. Hyperlinks to bib entries can be turned off using the --enable-bib-refs off option.

    bib-add-refs -i file1.abib -o file2.abib

Adds hyperlinks to the note field based on url fields. For each field of the form urlx = val, a note with name x linked to val is added. The first url field in the entry is used as the URL for the title.

    bib-query -q file.btq -i file1.abib -o file2.abib

Selects the entries from file1.abib that match the BibTeX query in file.btq and writes the result to file2.abib.

    template-to-latex opts

Creates a LaTeX file from a template, i.e., a LaTeX file with embedded BibTeX queries.

    @string{uutechreps = {http://archive.cs.uu.nl/pub/RUU/CS/techreps/}}

    a comment

    @inproceedings{DVJ04,
      author    = {Eelco Dolstra and Eelco Visser and Merijn de Jonge},
      title     = {Imposing a Memory Management Discipline on
                   Software Deployment},
      booktitle = {International Conference on Software Engineering (ICSE’04)},
      pages     = {583--592},
      month     = {May},
      year      = 2004,
      address   = {Edinburgh, Scotland},
      urlpdf    = uutechreps # {CS-2004/2004-044.pdf}
    }

    Entries(
      ""
    , [ ( String("string", StringField("uutechreps",
            Words(["http://archive.cs.uu.nl/pub/RUU/CS/techreps/"])))
        , "a comment\n\n"
        )
      , ( Entry(
            "inproceedings"
          , "DVJ04"
          , [ Field("author", Words(["Eelco", "Dolstra", "and", "Eelco", "Visser",
                                      "and", "Merijn", "de", "Jonge"]))
            , Field("title", Words(["Imposing", "a", "Memory", "Management",
                                     "Discipline", "on", "Software", "Deployment"]))
            , Field("booktitle", Words(["International", "Conference", "on",
                                         "Software", "Engineering", "(ICSE’04)"]))
            , Field("pages", Words(["583--592"]))
            , Field("month", Words(["May"]))
            , Field("year", Id("2004"))
            , Field("address", Words(["Edinburgh,", "Scotland"]))
            , Field("urlpdf", ConcValue(Id("uutechreps"),
                                        Words(["CS-2004/2004-044.pdf"])))
            ]
          , NoComma
          )
        , ""
        )
      ]
    , ""
    )

Figure 6: BibTeX file and its abstract syntax tree representation in ATerm format.
5 Concrete and Abstract Syntax

In order to transform BibTeX files, they are first parsed. The parse-bibtex tool parses a BibTeX file and outputs an abstract syntax tree representation of the file in the ATerm format [2]. The reverse operation, turning an abstract syntax tree into text, is called pretty-printing and is implemented by pp-bibtex. Figure 6 illustrates the result of parsing a BibTeX file. From this example, we can learn a couple of things about the structure of BibTeX files.

A file consists of a list of entries preceded and followed by comments. Each element of this list is a pair of an entry and the comment that follows it. This does not necessarily reflect the intention of the author, but we don’t care much about comments anyway and will soon throw them away (next section). An entry consists of a type (e.g., inproceedings), a key (e.g., DVJ04), and a list of fields. A field is a pair of a field name and a value. Field values are not represented by a single string, but are decomposed into their separate words and word groups. This makes it easy to search fields for occurrences of particular words. The value of the urlpdf field is composed of a reference to a string (uutechreps) and a list of words.

A complete syntax definition of BibTeX in the syntax definition formalism SDF2 [9] is presented in Appendix A.
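The benefit of decomposing field values into words can be illustrated with a simplified Python model of the tree in Figure 6. The classes Field and Entry and the function mentions are assumptions made for this sketch; the sketch ignores comments as well as the Id and ConcValue forms, and it is not the representation used by the Stratego implementation.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    words: list  # the field value, already split into words


@dataclass
class Entry:
    type: str
    key: str
    fields: list


def mentions(entry, field_name, word):
    """True if the named field of the entry contains the given word."""
    return any(
        field.name == field_name and word in field.words
        for field in entry.fields
    )


dvj04 = Entry("inproceedings", "DVJ04", [
    Field("author", ["Eelco", "Dolstra", "and", "Eelco", "Visser",
                     "and", "Merijn", "de", "Jonge"]),
    Field("pages", ["583--592"]),
])
print(mentions(dvj04, "author", "Visser"))  # True
```

Because the value is a list of words, the membership test matches whole words only; a search for "Viss" does not match, which would not be the case with a naive substring search on a single string.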
6 Transforming Entries

A real contribution of the BibTeX tools described in this paper is the collection of transformations it provides, and the ease of extending this set of transformations using Stratego. The bib-transform tool is an end-user tool that provides a convenient interface for calling the various basic transformations. But it is also possible to combine these basic transformations in other ways.
6.1 Desugaring (Normalizing)

BibTeX has a bit of redundancy in its syntax. Entries may use curly braces or parentheses. Field values can be enclosed in curly braces or in double quotes. Types and field names can be written in any combination of upper and lower case letters; their interpretation is case insensitive. The last field of an entry may be followed by a comma. This variability requires any transformations or analyses on BibTeX entries to consider many different (combinations of) cases. To reduce this complexity, the bib-desugar tool transforms each variation point to one of its alternatives. Here are some examples of desugarings of entries in the test file btxdoc.bib.

Parentheses to curly braces:

    @string (SCRIBE-NOTE = {Chapter twelve and appendices E8 through E10 deal
       with bibliographies})

is desugared to

    @string{SCRIBE-NOTE = {Chapter twelve and appendices E8 through E10 deal
       with bibliographies}}

Types and field names in lowercase, parentheses to curly braces, double quotes to curly braces, and removal of comments and spurious whitespace (the latter is done by parsing):

    The next entry shows some of the syntactically legal things that those
    with the inclination may use.@ MaNuAl
     (scribe, TITLE="Scribe Document Production System
     User Manual", ORGANIZATION =
     {Unilogic,}#" Ltd"# {.
     }, ADDRESS = "Pittsburgh", MONTH =aPR ,YEAR=1984, note = scribe-note, )
    May the inclination not be with you.

is desugared to

    @manual{scribe,
      title        = {Scribe Document Production System User Manual},
      organization = {Unilogic,} # {Ltd} # {.},
      address      = {Pittsburgh},
      month        = aPR,
      year         = 1984,
      note         = scribe-note,
    }

(Note that there is some work left to do: month names are not normalized.)
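Two of these variation points, case and value delimiters, can be sketched in Python as follows. The sketch assumes that an entry has already been split into a type, a key, and a list of (name, value) pairs, with each value still in its textual form; the function names are hypothetical, values concatenated with # are not handled, and the actual bib-desugar tool works on the parsed abstract syntax tree instead.

```python
def desugar_value(value):
    """Rewrite a double-quoted value to the curly-brace form."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return "{" + value[1:-1] + "}"
    return value  # braces, numbers, and string references stay as they are


def desugar_entry(entry_type, key, fields):
    """Lowercase the type and field names; normalize value delimiters."""
    return (
        entry_type.lower(),
        key,
        [(name.lower(), desugar_value(value)) for name, value in fields],
    )


print(desugar_entry(
    "MaNuAl", "scribe",
    [("ADDRESS", '"Pittsburgh"'), ("YEAR", "1984"), ("note", "scribe-note")],
))
```

As in the example above, the bare values 1984 and scribe-note are left alone, since they are a number and a reference to a string definition rather than quoted text.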
6.2 Inlining

String entries in BibTeX can be used to factor out commonly occurring field values, or parts of field values, such as journal names or URLs. When querying and publishing entries, we’d like them to be self-contained. The bib-inline tool replaces references to strings within entries with their body, and concatenates values composed with the # operator. Example:

    @string{uutechreps = {http://archive.cs.uu.nl/pub/RUU/CS/techreps/}}
    @string{ICSE04 = {International Conference on Software Engineering (ICSE’04)}}
    @string{EelcoDolstra = {Eelco Dolstra}}
    @inproceedings{DVJ04,
      author    = EelcoDolstra # { and Eelco Visser and Merijn de Jonge},
      title     = {Imposing a Memory Management Discipline on
                   Software Deployment},
      booktitle = ICSE04,
      pages     = {583--592},
      month     = {May},
      year      = 2004,
      address   = {Edinburgh, Scotland},
      urlpdf    = uutechreps # {CS-2004/2004-044.pdf}
    }

is transformed to

    @inproceedings{DVJ04,
      author    = {Eelco Dolstra and Eelco Visser and Merijn de Jonge},
      title     = {Imposing a Memory Management Discipline on
                   Software Deployment},
      booktitle = {International Conference on Software Engineering (ICSE’04)},
      pages     = {583--592},
      month     = {May},
      year      = 2004,
      address   = {Edinburgh, Scotland},
      urlpdf    = {http://archive.cs.uu.nl/pub/RUU/CS/techreps/CS-2004/2004-044.pdf},
    }
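The core of this transformation, resolving references and concatenating the operands of #, can be sketched in Python. The representation of a value as a list of ("ref", name) and ("text", string) parts and the function name inline_value are assumptions for illustration; the sketch also assumes that string names are stored in lowercase in the lookup table.

```python
def inline_value(parts, strings):
    """Resolve string references and concatenate the parts of a value.

    Each part is ("ref", name) for a string reference or ("text", s)
    for a literal; the parts correspond to the operands of the # operator.
    """
    resolved = []
    for kind, content in parts:
        if kind == "ref":
            resolved.append(strings[content.lower()])
        else:
            resolved.append(content)
    return "".join(resolved)


strings = {
    "uutechreps": "http://archive.cs.uu.nl/pub/RUU/CS/techreps/",
    "eelcodolstra": "Eelco Dolstra",
}
urlpdf = [("ref", "uutechreps"), ("text", "CS-2004/2004-044.pdf")]
author = [("ref", "EelcoDolstra"),
          ("text", " and Eelco Visser and Merijn de Jonge")]
print(inline_value(urlpdf, strings))
print(inline_value(author, strings))
```

A reference to an undefined string raises a KeyError in this sketch; how bib-inline treats such references is not described here.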
6.3 Removing Duplicates

The bib-uniq tool removes duplicate entries that have the same key. There is nothing intelligent about this operation. No check is done that the two entries are actually identical otherwise, and the choice for removing one or the other is arbitrary. This tool should be extended in two ways. First, at least a rudimentary check should be done that the two entries are very similar (same title, same authors). Next, it would be convenient to detect entries that are the same, but do not have the same key. There appears to be a tool that does these things in Beebe’s collection [1]. However, for comfortably working with bibtex a minimum requirement is to remove entries with the same key, as duplicate keys confuse that tool to no end.
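A key-based duplicate filter of this kind is only a few lines in any language. The Python sketch below keeps the first entry for each key; that choice is an assumption of the sketch, since the text above only states that the choice made by bib-uniq is arbitrary. Entries are modeled as dictionaries with a key field, which is likewise hypothetical.

```python
def unique_by_key(entries):
    """Keep the first entry for each key and drop later ones."""
    seen = set()
    result = []
    for entry in entries:
        if entry["key"] not in seen:
            seen.add(entry["key"])
            result.append(entry)
    return result


entries = [
    {"key": "DVJ04", "title": "Imposing a Memory Management Discipline"},
    {"key": "Vis02.gpce", "title": "Meta-Programming with Concrete Object Syntax"},
    {"key": "DVJ04", "title": "A different title under the same key"},
]
print([entry["key"] for entry in unique_by_key(entries)])
```

The third entry is dropped even though its title differs from the first, which is exactly the lack of intelligence noted above.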
6.4 Splitting a BibTeX File

The bib-split tool splits a BibTeX file into separate files, one for each individual entry. The name of each file is the key of the entry with a .bib extension. Note that this transformation assumes that entries are self-contained, i.e., that strings have been inlined and that there are no cross-references. An interesting extension would be to compute the closure of all entries required by an individual entry.
6.5
Adding Hyperlinks
The bib-add-refs tool interprets the url fields in entries by adding links. The first url field in the entry is used to create a hyperlink for the title. For each field of the form urlx = val, a note with name x linked to val is added. For example, the entry

    @article{Foo03,
      title = {Foos in the Bar},
      author = {Foo Bar},
      journal = {The International Baz Journal},
      volume = 1,
      year = 2003,
      urlpdf = {http://institute/~fbar/Foo03.pdf},
      urlibj = {http://www.publisher.com/ibj},
      urlbib = {./Foo03.bib},
    }

is transformed into

    @article{Foo03,
      title = {\href {http://institute/~fbar/Foo03.pdf} {Foos in the Bar}},
      author = {Foo Bar},
      journal = {The International Baz Journal},
      volume = 1,
      year = 2003,
      urlpdf = {http://institute/~fbar/Foo03.pdf},
      urlibj = {http://www.publisher.com/ibj},
      urlbib = {./Foo03.bib},
      note = {(\href{http://institute/~fbar/Foo03.pdf}{pdf}, \href{http://www.publisher.com/ibj}{ibj}, \href{./Foo03.bib}{bib})},
    }
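The rewriting of the Foo03 entry can be approximated by the Python sketch below. It is not the tool's implementation: the dictionary representation of fields and the name add_refs are hypothetical, and the sketch assumes that an existing note field is simply replaced and that fields keep their order of appearance.

```python
def add_refs(fields):
    """Link the title to the first urlx field and add a note with one link per urlx field."""
    # Collect (x, val) for every field named urlx, in order of appearance.
    urls = [(name[3:], val) for name, val in fields.items()
            if name.startswith("url") and len(name) > 3]
    out = dict(fields)
    if not urls or "title" not in fields:
        return out
    first_url = urls[0][1]
    out["title"] = "\\href {%s} {%s}" % (first_url, fields["title"])
    links = ", ".join("\\href{%s}{%s}" % (val, label) for label, val in urls)
    # Assumption: any existing note is overwritten.
    out["note"] = "(" + links + ")"
    return out


fields = {
    "title": "Foos in the Bar",
    "author": "Foo Bar",
    "urlpdf": "http://institute/~fbar/Foo03.pdf",
    "urlibj": "http://www.publisher.com/ibj",
    "urlbib": "./Foo03.bib",
}
linked = add_refs(fields)
print(linked["title"])
print(linked["note"])
```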
6.6
Summary
With the transformations discussed in this section we can now solve the first part of the problem raised in Section 4, adding hyperlinks to bibliography entries. Transforming a BibTEX file with

    bib-transform --normalize --split --add-refs \
        -i file.bib -o file-refs.bib

will desugar entries, inline strings, throw away duplicate entries, split off files for individual entries, and add hyperlinks. With the resulting BibTEX file you can create documents with ‘linked’ bibliographies, which may already be useful for PDF documents.
7
Selecting Entries with Queries
The selection of entries to be included in a bibliography can be done using BibTEX Query, a small BibTEX-specific query language loosely inspired by SQL. A query written in a .btq file is interpreted by the bib-query tool, which extracts all entries from a BibTEX file that match the query. The language provides two main types of query commands, select and order by. Appendix B presents a complete syntax definition of the BibTEX Query language.
7.1
Select
The select query command of the form

    [ref1 :=] select [from ref2] query

selects the entries from ref2 that match query, and assigns the result to ref1. The references are optional; when left out they default to the current set of entries. For the first query this is the initial set of entries. The query part of the select command is a predicate on entries testing one or more properties. Queries are composed from basic queries on types, keys, and values, which can be combined using the Boolean operators negation, conjunction, and disjunction.

A type query ‘type is name’ succeeds if the type of an entry is name. For example, the following query selects all journal articles:

    select type is article

Similarly, a key query ‘key is name’ succeeds if the key of the entry is name. For example, the following query selects the entry with key Vis04.strategoxt:

    select key is Vis04.strategoxt

A value query ‘name field pred fq’ succeeds if the predicate pred succeeds for the value of the name field with field query fq. There are a number of predicates that can be used here. The predicate is checks for literal equality. For example, the query

    select year field is 2004

selects all entries with year field 2004. The predicate < checks that the value in the field is less than the value in the query. For example, the query

    select year field < 2004

selects all entries from before 2004. The predicate contains checks that the field contains the value in the query. For example, the query

    select author field contains {Dolstra}

checks that the author field contains the word Dolstra.

Queries can be combined with Boolean operators for conjunction (&), disjunction (|), and negation (!). For example, the query

    select type is article & year field > 2000 & year field < 2005

selects all journal articles published after 2000 and before 2005 (both bounds excluded). Similarly, the query

    select author field contains {Dolstra} & year field < 2005

selects all entries with Dolstra as coauthor from before 2005. The combinators can also be used in the field values. For example, the query

    select pubcat field is !( {draft} | {published techreport} | {obsolete}
                            | {grant} | {website} | {webpage} | {documentation} )

selects all entries for which the pubcat field is not one of draft, etc. (I use this one to include only published documents in a publication list.)
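To make the reading of select queries as predicates on entries concrete, here is a small Python sketch in which each basic query is a function from an entry to a Boolean and the combinators build new predicates. This is an illustration only, not how bib-query is implemented. The entry representation, all function names, and the sample entries A03 and B05 are made up; the sketch also assumes that < and > compare numerically and that an entry lacking the field does not match.

```python
def type_is(name):
    return lambda entry: entry["type"] == name

def key_is(name):
    return lambda entry: entry["key"] == name

def field_is(name, value):
    return lambda entry: entry["fields"].get(name) == value

def field_lt(name, bound):
    # Assumption: numeric comparison; entries without the field do not match.
    return lambda entry: name in entry["fields"] and int(entry["fields"][name]) < bound

def field_gt(name, bound):
    return lambda entry: name in entry["fields"] and int(entry["fields"][name]) > bound

def field_contains(name, text):
    return lambda entry: text in entry["fields"].get(name, "")

def q_and(*queries):
    return lambda entry: all(q(entry) for q in queries)

def q_or(*queries):
    return lambda entry: any(q(entry) for q in queries)

def q_not(query):
    return lambda entry: not query(entry)

def select(entries, query):
    """Return the entries for which the query predicate succeeds."""
    return [entry for entry in entries if query(entry)]


entries = [
    {"type": "article", "key": "A03",
     "fields": {"year": "2003", "author": "Eelco Dolstra"}},
    {"type": "article", "key": "B05",
     "fields": {"year": "2005", "author": "Eelco Visser"}},
    {"type": "inproceedings", "key": "DVJ04",
     "fields": {"year": "2004",
                "author": "Eelco Dolstra and Eelco Visser and Merijn de Jonge"}},
]

# select type is article & year field > 2000 & year field < 2005
recent_articles = select(entries, q_and(type_is("article"),
                                        field_gt("year", 2000),
                                        field_lt("year", 2005)))
# select author field contains {Dolstra} & year field < 2005
dolstra = select(entries, q_and(field_contains("author", "Dolstra"),
                                field_lt("year", 2005)))
print([e["key"] for e in recent_articles])
print([e["key"] for e in dolstra])
```

A query with combinators in the field value, such as the pubcat example, corresponds in this sketch to q_not(q_or(field_is("pubcat", "draft"), ...)).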
7.2
Order By
The order by query command of the form

    [ref1 :=] order [from ref2] by name field (ascending | descending)

sorts the entries from ref2 according to the value of field name in ascending or descending order. This command is most useful within templates to organize a bibliography into sections. However, it can also be used to reorganize a BibTEX file. For example, the query

    order by year field descending

reorders the entries in a BibTEX file by year, with the most recent year first.

Query commands can be combined in a sequence. The first query is applied to the initial list of entries. Each subsequent query works on the result of the previous queries. The queries after an order by command are applied to the sections of the ordering. For example, the sequence

    select year field > 1998
    order by year field descending
    order by author field ascending

first selects the publications after 1998, then orders them by year in reverse chronological order, and finally orders the entries of each year by author.
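The way a later command applies to the sections produced by an earlier order by can be sketched in Python. The sketch assumes that order by groups entries into one section per distinct field value and sorts the sections by that value; this reading is inferred from the description above and is not confirmed by the tool's source. The name order_by, the flat dictionary entries, and the sample data are hypothetical, and values are compared as strings.

```python
def order_by(entries, name, descending=False):
    """Group entries by the value of a field and return the sections in sorted order.

    The result is a list of (value, entries) pairs, one per distinct value.
    """
    sections = {}
    for entry in entries:
        sections.setdefault(entry.get(name, ""), []).append(entry)
    return sorted(sections.items(), key=lambda item: item[0], reverse=descending)


entries = [
    {"key": "V04", "year": "2004", "author": "Visser"},
    {"key": "D03", "year": "2003", "author": "Dolstra"},
    {"key": "D04", "year": "2004", "author": "Dolstra"},
    {"key": "J98", "year": "1998", "author": "de Jonge"},
]

# select year field > 1998
selected = [e for e in entries if int(e["year"]) > 1998]
# order by year field descending
by_year = order_by(selected, "year", descending=True)
# order by author field ascending, applied to each year section
nested = [(year, order_by(group, "author")) for year, group in by_year]

for year, by_author in nested:
    print(year, [(author, [e["key"] for e in es]) for author, es in by_author])
```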
8
Formatting Bibliographies with Templates
The final task of creating a publication list is to organize the document into sections with the appropriate selection of entries in each section. To achieve this, bibtex-tools provides an embedding of BibTEX Query in LATEX, which can be used to create bibliography templates. Given a template and a BibTEX file, the template-to-latex tool replaces queries embedded in the template by \BibSections with citations to the selected entries, resulting in a document as in Figure 4.

The basic template for creating a publication list by year in Figure 7 contains all the ingredients for BibTEX Query templates. The \bibtexquery{query}{scope} command declares a query in its first argument, which applies within its second argument. The query applies to the current set of entries and imposes a selection or ordering on these entries. The \bibtexqueryref{var} command produces the name of the current bibliography. Finally, the \bibtexshow command inserts the selected entries as a bibliography.

Figure 8 presents a refinement of the previous template, in which the entries of each year are further divided into entries per type. Figure 9 presents a template that first shows a section with all key publications (that have key in their category field), then a section with the refereed non-key publications, and finally a section with non-refereed publications.
    \bibtexquery{
      @year := order by year field descending
    }{
      \subsection*{\bibtexqueryref{@year}}
      \bibtexshow
    }
Figure 7: Template by-year.ltx.
    \bibtexquery{
      @year := order by year field descending
    }{
      \subsection*{\bibtexqueryref{@year}}
      \bibtexquery{
        select from @year type is book | incollection | phdthesis
                                | proceedings | chapter
      }{
        \subsubsection*{Books\ and\ Book\ Chapters}
        \bibtexshow
      }
      ...
    }
Figure 8: Template by-year-by-type.ltx.
    \bibtexquery{
      select category field {and} contains {key}
    }{
      \section*{Key\ Publications}
      \bibtexquery{
        @year := order by year field descending
      }{
        \subsubsection*{\bibtexqueryref{@year}}
        \bibtexshow
      }
    }
    \bibtexquery{
      refereed := select !category field {and} contains {key}
                       & category field {and} contains {refereed}
    }{
      \section*{Other\ Refereed\ Publications}
      \bibtexquery{
        @year := order from refereed by year field descending
      }{
        \subsubsection*{\bibtexqueryref{@year}}
        \bibtexshow
      }
    }
    \bibtexquery{
      select !category field {and} contains {key} | {refereed}
    }{
      \section*{Non-Refereed\ Publications}
      \bibtexquery{
        @year := order by year field descending
      }{
        \subsubsection*{\bibtexqueryref{@year}}
        \bibtexshow
      }
    }
Figure 9: Template key-publications.ltx.
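The three selections in the key-publications template divide the entries into disjoint sections. The Python sketch below mimics that division. It rests on an assumption that the source does not spell out: that `category field {and} contains {key}` treats the category field as a list of names separated by the word and, and tests membership in that list. The function names and the sample entries are hypothetical.

```python
def categories(entry, separator="and"):
    """Split the category field into a set of names (assumed separator: 'and')."""
    raw = entry.get("category", "")
    return {part.strip() for part in raw.split(f" {separator} ") if part.strip()}


def section_of(entry):
    """Mirror the three queries of the template, tested in order."""
    cats = categories(entry)
    if "key" in cats:                       # category contains {key}
        return "Key Publications"
    if "refereed" in cats:                  # not key, but refereed
        return "Other Refereed Publications"
    return "Non-Refereed Publications"      # neither key nor refereed


entries = [
    {"key": "P1", "category": "key and refereed"},
    {"key": "P2", "category": "refereed"},
    {"key": "P3", "category": "techreport"},
    {"key": "P4"},
]
sections = {e["key"]: section_of(e) for e in entries}
for key, section in sections.items():
    print(key, section)
```

Under this reading every entry lands in exactly one section, which is what a publication list needs.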
9
Problems
The current version of bibtex-tools does not take cross-references into account.
10
Future Work
There are numerous ways to improve and extend the bibtex-tools package. This will mostly be done on demand; there is no point in extending a package that is not being used. If you have an idea for an extension or improvement, there are a number of ways to get it implemented:

1. Send an email to the author at [email protected]
2. Submit an issue report in the issue tracking system for the project: https://bugs.cs.uu.nl/secure/BrowseProject.jspa?id=10040 or go to the BibTeX Tools (BTT) project at https://bugs.cs.uu.nl/.
3. Implement it yourself; I would be happy to include your contributions in the package.

The last method is the most effective; I cannot guarantee a quick response to feature requests or bug reports.
11
License
Copyright (C) 2004, 2005 Eelco Visser
This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
A
Syntax of BibTEX
A.1
BibTeX-Layout
    module BibTeX-Layout
    exports
      sorts C ParComment BrComment
      lexical syntax
        [\ \t\n\13] -> LAYOUT
        ~[\@]*      -> C
        ~[\)]+      -> ParComment
        ~[\}]+      -> BrComment
      context-free restrictions
        LAYOUT? -/- [\ \t\n\13]
        C       -/- ~[\@\0]
A.2
BibTeX-Lexical
    module BibTeX-Lexical
    exports
      sorts String Preamble Comment
      lexical syntax
        [Ss][Tt][Rr][Ii][Nn][Gg]             -> String
        [Pp][Rr][Ee][Aa][Mm][Bb][Ll][Ee]     -> Preamble
        [Cc][Oo][Mm][Mm][Ee][Nn][Tt]         -> Comment
      sorts Name EName Key
      lexical syntax
        [A-Za-z0-9\-\_\/]+       -> Name
        Name                     -> EName
        Preamble                 -> EName {reject}
        String                   -> EName {reject}
        Comment                  -> EName {reject}
        ~[\ \t\n\,\=\{\}\@]+     -> Key
A.3
BibTeX-Values
    module BibTeX-Values
    imports BibTeX-Lexical
    exports
      sorts ValWord ValLetter ValLetterDQ ValWordDQ ValWS
      lexical syntax
        ~[\{\}\ \t\n]   -> ValLetter
        ValLetter+      -> ValWord
        ~[\{\}\ \t\n\"] -> ValLetterDQ
        ValLetterDQ+    -> ValWordDQ
        [\ \t\n\r]+     -> ValWS
      syntax
        "{" "}" ->
        "{" "}" ->
      lexical restrictions
        ValWord   -/- ~[\}\ \t\n]
        ValWordDQ -/- ~[\}\ \t\n\"]
        ValWS     -/- [\ \t\n\r]
      context-free syntax
        "{" ValWord* "}" -> ValWord   {cons("Group")}
        "{" ValWord* "}" -> ValWordDQ {cons("GroupDQ")}
      context-free restrictions
        ValWord   -/- ~[\}\ \t\n]
        ValWordDQ -/- ~[\}\ \t\n\"]
      sorts Value
      context-free syntax
        Name                 -> Value {cons("Id")}
        "{" ValWord* "}"     -> Value {cons("Words")}
        "\"" ValWordDQ* "\"" -> Value {cons("QWords")}
        Value "#" Value      -> Value {left,cons("ConcValue")}
A.4
BibTeX
    module BibTeX
    imports BibTeX-Layout BibTeX-Lexical BibTeX-Values
    hiddens
      context-free start-symbols Entries
    exports
      sorts Field StringField Entry Comma Entries
      context-free syntax
        Name "=" Value -> Field       {cons("Field")}
        Name "=" Value -> StringField {cons("StringField")}

        "@" Comment "{" BrComment "}"                -> Entry {cons("Comment")}
        "@" Comment "(" ParComment ")"               -> Entry {cons("CommentParen")}
        "@" Preamble "{" Value "}"                   -> Entry {cons("Preamble")}
        "@" Preamble "(" Value ")"                   -> Entry {cons("PreambleParen")}
        "@" String "{" StringField "}"               -> Entry {cons("String")}
        "@" String "(" StringField ")"               -> Entry {cons("StringParen")}
        "@" EName "{" Key "," {Field ","}* Comma "}" -> Entry {cons("Entry")}
        "@" EName "(" Key "," {Field ","}* Comma ")" -> Entry {cons("EntryParen")}

        "," -> Comma {cons("Comma")}
            -> Comma {cons("NoComma")}

        C (Entry C)* C -> Entries {cons("Entries")}
B
Syntax of BibTEX Query
B.1
BibTeX-Query
    module BibTeX-Query
    imports BibTeX BibTeX-QueryCombinators BibTeX-ValueQueries
            BibTeX-TypeQueries BibTeX-KeyQueries
    hiddens
      context-free start-symbols Select QueryCommands
    exports
      sorts QueryCommand Select OrderBy From Order Ref
      lexical syntax
        "@" Id -> Var
      context-free syntax
        QueryCommand+ -> QueryCommands {cons("Commands")}
        Select        -> QueryCommand
        OrderBy       -> QueryCommand

        Assign? "select" From? Query -> Select {cons("Select")}
        Select                       -> QueryCommand
        "from" Ref                   -> From {cons("From")}

        Assign? "order" From? "by" Name "field" Order -> OrderBy {cons("OrderBy")}
        "ascending"                                   -> Order {cons("Ascending")}
        "descending"                                  -> Order {cons("Descending")}

        Ref ":=" -> Assign {cons("Assign")}
        Var      -> Ref {cons("Var")}
        Name     -> Ref {cons("Name")}
B.2
BibTeX-QueryCombinators
    module BibTeX-QueryCombinators
    imports BibTeX
    exports
      sorts Query
      context-free syntax
        "all"           -> Query {cons("All")}
        "!" Query       -> Query {cons("Not")}
        Query "&" Query -> Query {cons("And"),left}
        Query "|" Query -> Query {cons("Or"),left}
        "(" Query ")"   -> Query {bracket}
      context-free priorities
        "!" Query       -> Query {cons("Not")}
      > Query "&" Query -> Query {cons("And"),left}
      > Query "|" Query -> Query {cons("Or"),left}
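The priorities in this module say that negation binds more tightly than conjunction, which binds more tightly than disjunction, and that both binary operators associate to the left. The hand-written Python parser below is only an illustration of that reading; it is not how the tools parse queries, since their parser is derived from the syntax definition itself. The tuple representation, the function names, and the treatment of basic queries as opaque atoms are assumptions of this sketch.

```python
import re

# A token is one of the operators ! & | ( ) or an opaque atom name.
TOKEN = re.compile(r"\s*([!&|()]|[A-Za-z0-9_]+)")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text):
    tree, rest = parse_or(tokenize(text))
    if rest:
        raise SyntaxError(f"unexpected token {rest[0]!r}")
    return tree


def parse_or(tokens):
    # Lowest priority: a left-associative chain of '|'.
    left, tokens = parse_and(tokens)
    while tokens and tokens[0] == "|":
        right, tokens = parse_and(tokens[1:])
        left = ("Or", left, right)
    return left, tokens


def parse_and(tokens):
    # Middle priority: a left-associative chain of '&'.
    left, tokens = parse_not(tokens)
    while tokens and tokens[0] == "&":
        right, tokens = parse_not(tokens[1:])
        left = ("And", left, right)
    return left, tokens


def parse_not(tokens):
    # Highest priority: prefix '!'.
    if tokens and tokens[0] == "!":
        operand, tokens = parse_not(tokens[1:])
        return ("Not", operand), tokens
    return parse_atom(tokens)


def parse_atom(tokens):
    if not tokens:
        raise SyntaxError("unexpected end of query")
    head, rest = tokens[0], tokens[1:]
    if head == "(":
        tree, rest = parse_or(rest)
        if not rest or rest[0] != ")":
            raise SyntaxError("missing ')'")
        return tree, rest[1:]
    if head in ("&", "|", ")"):
        raise SyntaxError(f"unexpected token {head!r}")
    return ("Atom", head), rest


print(parse("!a & b | c"))
```

With these rules, `!a & b | c` is read as `((!a) & b) | c`, and parentheses are needed to obtain any other grouping.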
B.3
BibTeX-KeyQueries
    module BibTeX-KeyQueries
    imports BibTeX BibTeX-QueryCombinators
    exports
      sorts KeyQuery
      context-free syntax
        "key" "is" KeyQuery -> Query {cons("KeyQuery")}

        Key                   -> KeyQuery {cons("CheckKey")}
        "!" KeyQuery          -> KeyQuery {cons("Not")}
        KeyQuery "&" KeyQuery -> KeyQuery {cons("And"),left}
        KeyQuery "|" KeyQuery -> KeyQuery {cons("Or"),left}
        "(" KeyQuery ")"      -> KeyQuery {bracket}
      context-free priorities
        "!" KeyQuery          -> KeyQuery {cons("Not")}
      > KeyQuery "&" KeyQuery -> KeyQuery {cons("And"),left}
      > KeyQuery "|" KeyQuery -> KeyQuery {cons("Or"),left}
B.4
BibTeX-TypeQueries
    module BibTeX-TypeQueries
    imports BibTeX BibTeX-QueryCombinators
    exports
      sorts TypeQuery
      context-free syntax
        "type" "is" TypeQuery -> Query {cons("TypeQuery")}

        EName                   -> TypeQuery {cons("CheckType")}
        "!" TypeQuery           -> TypeQuery {cons("Not")}
        TypeQuery "&" TypeQuery -> TypeQuery {cons("And"),left}
        TypeQuery "|" TypeQuery -> TypeQuery {cons("Or"),left}
        "(" TypeQuery ")"       -> TypeQuery {bracket}
      context-free priorities
        "!" TypeQuery           -> TypeQuery {cons("Not")}
      > TypeQuery "&" TypeQuery -> TypeQuery {cons("And"),left}
      > TypeQuery "|" TypeQuery -> TypeQuery {cons("Or"),left}
B.5
BibTeX-ValueQueries
    module BibTeX-ValueQueries
    imports BibTeX BibTeX-QueryCombinators
    exports
      sorts Predicate ValueQuery
      context-free syntax
        Name "field" Value? Predicate ValueQuery -> Query {cons("FieldQuery")}
        "is" "contains" ">" "=" "