What is SGML, briefly? SGML is an abbreviation for the "Standard Generalized Markup Language". SGML is defined in an International Standard published by the International Organization for Standardization (ISO), with reference number ISO 8879:1986, bearing the full name "Information processing -- Text and office systems -- Standard Generalized Markup Language (SGML)". To most people, _markup_ means an increase in the price of an article. Although we talk about increases in value, it's not the same thing. "Markup" is a term coming from the publishing and printing business, where it means the instructions for the typesetter that were written on a typescript or manuscript copy by an editor. Today, with your favorite editor, you can enter the markup yourself, or even have it entered for you, in terms of codes or other instructions for an electronic typesetting program, which in simple cases is also the editor. An example is troff's ".ce" for "center the following line". A _markup_language_ is a set of means (constructs) to express how text (i.e., that which is not markup) should be processed, or handled in other ways. Unlike most other artificial languages, markup languages have to deal with embedded data, and contain rules for what is markup and what is data. For instance, in TeX the backslash means that subsequent input is TeX instructions. Most markup languages offer additional, administrative, language constructs, with which to define other language constructs (such as macros).
S.A.Cerri; Dynamic typing and lazy evaluation as necessary requirements for Web languages; European Lisp User Group Meeting; Amsterdam (NL), June 6-8 1999; p. 8 of 17
_Generalized_markup_ is markup that has the curious property that it does _not_ specify how things should look. We still call it markup, though, because of the similarity with markup as described above. For instance, "" and "" are used in this FAQ to denote Question and Answer, respectively. This doesn't say anything about how questions should look in a typeset edition of this FAQ. You could have all the questions rendered in bold-face, for instance.

With generalized markup, you tell the system _what_ you have, rather than how it should look, and you do so by putting a label (tag) around the text. There is a clear correlation between tags and what things look like. Tags are placed at the start and at the end of text of a certain kind, and these are precisely the places where typographic features are used, such as spacing, change of typeface, etc. An example is LaTeX, which, through macros, lets you talk about itemized lists instead of indents and item numbering, among other things.

The _Standard_Generalized_Markup_Language_ started out as GML, the Generalized Markup Language, created by Charles Goldfarb, Edward Mosher and Raymond Lorie (G, M, and L, respectively) in 1969 at IBM. GML became the basis for the Standard through work in ANSI and with aid from a project predating GML, GenCode, which attempted to standardize names of commonly used elements. Rather than take this (impossible) approach, SGML is a language which makes it possible to roll your own generalized markup, but with a standard form and in standard ways. (Historic note: The origin of SGML was confused with that of GenCode in the 1991-12-15 edition of this FAQ.)

In practice, you won't exactly roll your own, any more than you design LaTeX packages on your own. Although some people actually do that!
Central to the design of SGML is the idea that a set of generic identifiers (the names of the tags), together with their interrelationships, forms a type (or class) of documents, and that every document is an instance of a class, which means it can be validated with respect to this class.

Document markup: XML

(from: http://www.w3.org/TR/1998/REC-XML-19980210) The Extensible Markup Language is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

(from: http://www.w3.org/Talks/1998/1126-mjd-newmedia/all) XML: A Simple Example

Acme Pharmaceuticals Co. 7301 Smokey Boulevard Smallville Indiana 94571
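The notion of a document class against which every instance can be validated may be illustrated with a minimal Python sketch. The element names (`company`, `name`, `street`, `city`, `state`, `zip`) and the simplified content model are assumptions introduced here for illustration only; they are not taken from the cited example, and a real DTD content model is considerably richer (optional, repeated and alternative children, attributes, and so on).

```python
import xml.etree.ElementTree as ET

# Hypothetical document class: each element name maps to the exact,
# ordered list of child element names it must contain.
DOCUMENT_CLASS = {
    "company": ["name", "street", "city", "state", "zip"],
    "name": [],
    "street": [],
    "city": [],
    "state": [],
    "zip": [],
}


def is_valid(element, document_class=DOCUMENT_CLASS):
    """Return True if the element and all its descendants fit the class."""
    if element.tag not in document_class:
        return False
    child_tags = [child.tag for child in element]
    if child_tags != document_class[element.tag]:
        return False
    return all(is_valid(child, document_class) for child in element)


# An instance that conforms to the hypothetical class.
VALID = """<company>
  <name>Acme Pharmaceuticals Co.</name>
  <street>7301 Smokey Boulevard</street>
  <city>Smallville</city>
  <state>Indiana</state>
  <zip>94571</zip>
</company>"""

# An instance with missing and misordered children.
INVALID = "<company><zip>94571</zip><name>Acme Pharmaceuticals Co.</name></company>"

print(is_valid(ET.fromstring(VALID)))
print(is_valid(ET.fromstring(INVALID)))
```

The point of the sketch is only that the tag names and their interrelationships are data separate from any particular document, so the same check applies to every instance of the class.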
SGML Document Transformation: DSSSL

(from: http://itrc.uwaterloo.ca/~papresco/dsssl/tutorial.html ) DSSSL is an international standard for associating processing with SGML documents. As you know, SGML itself is intended to allow the complete separation of the content of a document (text, structure, links), from the processing to be associated with it (usually formatting). So where a Word for Windows, TeX, or even LaTeX document would describe what a document looks like (in other words how a printer should "process" it), SGML documents would only describe the structure. Using DSSSL you can describe the processing of documents in a standard way.

Since the two most common forms of document processing are formatting and transformation, DSSSL standardised these two processes first. Others may follow as they are needed. The first two are very powerful and many believe that DSSSL will "transform" (sorry) the world of SGML document processing.

Check http://www.jclark.com/dsssl/ for another tutorial, and ftp://ftp.ornl.gov/pub/sgml/WG8/DSSSL/dsssl96b.pdf for the ISO DSSSL standard (293 pages).
XML Document Transformation: XSL

(from: http://www.w3.org/TR/WD-xsl ) XSL is a language for expressing stylesheets. It consists of two parts:
1. a language for transforming XML documents, and
2. an XML vocabulary for specifying formatting semantics.
An XSL stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses the formatting vocabulary.

From XSL to DSSSL: XSLJ

(from: http://www.ltg.ed.ac.uk/~ht/xslj.html )
1. What is XSL? XSL is the eXtensible Style Language, proposed as the language for style sheets for XML in A Proposal for XSL.
2. What is xslj? xslj is a virtually complete implementation of XSL by way of translation into extended DSSSL, as supported by the latest release of James Clark's DSSSL engine Jade. xslj translates valid XSL style sheets into valid extended DSSSL style sheets, which can then be used to render XML documents using Jade.

DSSSL and Scheme

(from: http://itrc.uwaterloo.ca/~papresco/dsssl/tutorial.html ) Document Style Semantics and Specification Language
Language used to associate formatting rules with the elements of a structured document encoded using SGML. Consists of two parts, a tree transformation language that can be used to reorder structured documents prior to presentation, and a formatting process that associates formatting instructions with specific "tree nodes" in the document to be presented. Both parts of DSSSL are specified using a variant of the LISP list processing programming language called Scheme. DSSSL extends the basic IEEE-defined Scheme semantics by adding functions that can transform tree structures and provide the types of information about page dimensions, formatting rules, and language typically required by a text formatter. Scheme (from ftp://ftp.cs.utexas.edu/pub/garbage/cs345/schintro-v14/schintro_5.html ; an on-line introductory book about Scheme) [ Warn people that this is partisan propaganda... ] Scheme is a very nice language for implementing languages, or for transformational programming in general--that is, writing programs that write programs--or for writing programs that can easily be extended or customized. The features that make Scheme attractive for implementing Scheme also make it good for all kinds of things, including scripting, the construction of new languages and application-specific programming environments, and so on. [ As you learn Scheme, you'll probably realize that all interesting programs end up being, in effect, application-specific programming environments...] Most Scheme systems are interactive, allowing you to incrementally develop and test parts of your program. In this respect, it is much like BASIC or Tcl--but a far cleaner and more expressive language. Scheme can also be compiled, to make programs run fast. This makes it easy to develop in, like BASIC or Tcl, but still fast, like C. (Scheme isn't usually quite as fast as C, but it's usually not too much slower, if you get a good Scheme compiler.) 
So if you're a Tcl or BASIC programmer looking for a less crufty and/or fossilized language, Scheme may be for you.

Unlike most interactive languages, Scheme is well-designed: it's not a kludge cobbled up by some people with very limited applications in mind, and later extended past its reasonable scope of application. It was designed from the outset as a general-purpose language, combining the best features of two earlier languages. It is a fairly radical revision of Lisp, incorporating the best features of both Lisp and Algol (the ancestor of C, Pascal, et al.).

(This is why Scheme has been adopted by several groups as an alternative to kludgey languages like Tcl and Perl. The Free Software Foundation's Guile extension language is based on Scheme. So is the Scheme Shell (scsh), which is a scripting language for UNIX. The CAD Framework Initiative has adopted Scheme as the glue for controlling Computer-Aided Design tools. The Dylan language is also based on Scheme, though with a different syntax and many extensions.)

If you want to learn Lisp, Scheme is a good place to start. Common Lisp is a big, somewhat messy language, which is probably easiest to learn by starting with Scheme. Then you can understand Common Lisp as a series of extensions (and
significant obfuscation) of Scheme. Some of the best features of Common Lisp were copied from Scheme.

If you want to get something of the flavor of functional programming, you can do that in Scheme--most well-written Scheme programs are largely functional, because that's simply the easiest way to do many interesting things. And if you just want to learn to program better, Scheme may open your eyes to new ways of thinking about programs. Many people prototype programs in Scheme, because it's so easy, even if they eventually have to recode them in other languages to satisfy their employers.

Why Scheme Now?

Scheme is not a new language--it's been around and evolving slowly for 20 years. The evolution of Scheme has been slow, because the people who standardize Scheme have been very conservative--features are only standardized when there is a near-universal consensus on how they should work. The focus has been on quality, not industrial usability. This policy has had two consequences. The first is that Scheme is a beautiful, extremely well-designed language. The second is that Scheme has been "behind the curve," lacking several features that are useful in general-purpose languages.

Gradually, though, Scheme has grown from a very small language, suitable only for teaching concepts, to a very useful language. The most important new feature of Scheme (in my view) is lexically-scoped ("hygienic") macros, which allow the implementation of many language features in a portable and fairly efficient way. This allows Scheme to remain small, but also allows useful extensions to the base language to be written as libraries, without a significant performance penalty.

Groves

(from: http://www.cogsci.ed.ac.uk/~ht/grove.html ; (c) Henry S. Thompson 1997) SGML Groves: A Partial Illustrated Example

Here's a trivial SGML document:

]> 1 2

And here's a picture of part of the grove that a conformant SGML parser (see the DSSSL spec. for details) should produce:
The many domain-specific DTDs

(From: Jon Bosak, Overview: XML, HTML, and all that; Sun Microsystems, April 11, 1997. Comment: presented here in order to show the “explosive” growth of markup conventions (called “babelic” in the introduction).)

Major industry DTDs (markup languages)

DTD          Domain
ATA 2100     aircraft industry
CMC          pharmaceuticals
DocBook      computer software
SAE J2008    automobile manufacturing
TIM          telecommunications
ISO 12083    journal, book, and magazine publishing
TEI          academic and scholarly publishing
HTML         World Wide Web
CALS         military, aerospace
PCIS         semiconductors
IBMIDDoc     IBM software
TMC T2008    truck manufacturing
EDGAR        Securities and Exchange Commission
ICADD        publishing for the print-disabled
UTF          news media
Agent communication: KQML

(from: http://www.cs.umbc.edu/kqml/whats-kqml.html ) What is KQML? KQML or the Knowledge Query and Manipulation Language is a language and protocol for exchanging information and knowledge. It is part of a larger effort, the ARPA Knowledge Sharing Effort which is aimed at developing techniques and methodology for building large-scale knowledge bases which are sharable and reusable. KQML is both a message format and a message-handling protocol to support run-time knowledge sharing among agents. KQML can be used as a language for an application program to interact with an intelligent system or for two or more intelligent systems to share knowledge in support of cooperative problem solving.

KQML focuses on an extensible set of performatives, which define the permissible operations that agents may attempt on each other's knowledge and goal stores. The performatives comprise a substrate on which to develop higher-level models of inter-agent interaction such as contract nets and negotiation. In addition, KQML provides a basic architecture for knowledge sharing through a special class of agent called communication facilitators which coordinate the interactions of other agents.

The ideas which underlie the evolving design of KQML are currently being explored through experimental prototype systems which are being used to support several testbeds in such areas as concurrent engineering, intelligent design and intelligent planning and scheduling.

Agent communication: AL

The description of AL (Alice Language), at the moment, is work in progress. AL is the outcome of G. Dionisi’s thesis 4. M. Gattoni and M. Giavazzi are preparing an Italian and English version of the User Manual, to be found in: http://weblab.crema.unimi.it/alice.

4 G. Dionisi, “AL: un linguaggio per descrivere la comunicazione tra agenti,” Dipartimento di Scienze dell'Informazione, Università di Milano, Milano, Italy, Master thesis 1998.
An excerpt of the User Manual (adapted) explains the reasons for the choice of the core representation of agents as Finite State Automata:

Protocols and the formalisms for describing state changes in Agents

Various formalisms have been proposed to represent a communication process. Each of them tries to describe a possible model of conversation, that is, to define the flow of state changes occurring as a consequence of conversations. The description of the conversation's model may be called “a protocol”, i.e. a protocol defines the admissible sequences of messages, or of types of messages, inside a conversation. Let us review three formalisms used for describing conversation protocols between two agents:
- a single Finite State Automaton for both Agents,
- a Petri net,
- two communicating Finite State Automata.

Conversations described by a single Automaton

A conversation can be described by a single Finite State Automaton 5, where each state represents a state of the two communicating agents together with the set of possible speech acts achievable in that state, and each transition corresponds to an exchange of messages. As an example, consider a conversation between two agents: the first one (user) asks for a service that the second one (provider) can offer. A picture describing the conversation is included hereafter.
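The single-automaton view of a protocol as a set of admissible message sequences can be sketched in a few lines of Python. The state names and message types below (`request`, `accept`, `refuse`, `deliver`, `acknowledge`) are hypothetical, chosen only to fit the user/provider example; they are not taken from the AL User Manual.

```python
# Hypothetical transition table of one shared automaton:
# (current state, message type) -> next state.
TRANSITIONS = {
    ("start", "request"): "requested",
    ("requested", "accept"): "accepted",
    ("requested", "refuse"): "end",
    ("accepted", "deliver"): "delivered",
    ("delivered", "acknowledge"): "end",
}


def is_admissible(messages, start="start", final="end"):
    """Return True if the message sequence is a complete, legal conversation."""
    state = start
    for message in messages:
        key = (state, message)
        if key not in TRANSITIONS:
            return False  # this message is not allowed in this state
        state = TRANSITIONS[key]
    return state == final  # the conversation must reach the final state


print(is_admissible(["request", "accept", "deliver", "acknowledge"]))
print(is_admissible(["request", "refuse"]))
print(is_admissible(["accept"]))
```

Note that the table records only the shared state of the conversation: nothing in it says which of the two agents sends a given message.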
This method is not perspicuous, as it does not distinguish between the different functionalities and states of each of the two agents: it almost seems possible for an agent to ask something and, at the same time, to answer itself.

Conversation described by Petri Nets
5 T. Winograd and F. Flores, Understanding Computers and Cognition: a new foundation for design: Ablex Publishing Corporation, 1986.
An agent-to-agent conversation may be described by means of Petri Nets 6. Let us consider the same example of a conversation as the one presented above by means of a single Finite State Automaton. A graphical representation as a Petri Net is given hereafter.
The Petri Net approach makes it possible to identify the internal states and the functionalities of the Agents, as well as the evolution of the conversation. Using tokens of different colors, we may also visualize different concurrently occurring conversations. However, the “synchronization” events in Petri Nets (represented by the central column in the picture) are a form of shared memory that limits the suitability of the formalism for describing “autonomous” Agents with no shared memory or centralized control (synchronization).

Conversation described by two communicating Finite State Automata

Here we have the graphical representation of the same example of conversation.
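The two-automata view can also be sketched in code. In the minimal Python sketch below, each agent is a separate automaton with private state and an inbox, and the only interaction between them is the delivery of messages. The class `Agent`, the helper `converse`, and the state and message names are all hypothetical illustrations and do not reproduce AL itself (which is written in Scheme).

```python
from collections import deque


class Agent:
    """A finite state automaton with private state and a private inbox."""

    def __init__(self, name, start, transitions):
        self.name = name
        self.state = start
        # (current state, received message) -> (next state, reply or None)
        self.transitions = transitions
        self.inbox = deque()

    def step(self):
        """Consume one received message, change state, return the reply."""
        received = self.inbox.popleft()
        self.state, reply = self.transitions[(self.state, received)]
        return reply


def converse(initiator, partner, opening):
    """Exchange messages until an agent has no reply; return the trace."""
    trace = []
    sender, receiver, message = initiator, partner, opening
    while message is not None:
        trace.append((sender.name, message))
        receiver.inbox.append(message)  # message passing is the only link
        reply = receiver.step()
        sender, receiver, message = receiver, sender, reply
    return trace


# The user opens the conversation, so its automaton starts out waiting
# for the provider's answer to the opening "request".
user = Agent("user", "waiting", {
    ("waiting", "result"): ("done", "thanks"),
})
provider = Agent("provider", "idle", {
    ("idle", "request"): ("serving", "result"),
    ("serving", "thanks"): ("done", None),
})

trace = converse(user, provider, "request")
print(trace)
print(user.state, provider.state)
```

Each agent only ever reads its own `state` and `transitions`; there is no shared table and no central synchronization point, in contrast with the single-automaton and Petri Net descriptions.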
6 J. Ferber, Les systèmes multi-agents. Vers une intelligence collective. Paris: InterEditions, 1995.
As one may notice, the formalism does not assume any synchronization time for the agents; thus each Agent has private memory and autonomy. The only “shared” computational mechanism consists of exchanging messages, as is the case in Operating Systems. The two independent Automata start simultaneously and change state at the moment they receive a message. The Automaton of the agent that begins the conversation starts by sending a message to the partner agent; this triggers a state change in the partner, which responds with a new message, and so on to the end of the conversation.

This last approach (Communicating Finite State Automata) is the one chosen to define AL's protocol. In order to complete the description of AL's protocol we have to show how single messages exchanged between agents are constructed. This is the aim of the next section.

The two specific features of AL are the representation of Agents as Finite State Automata (briefly described above) and the compositionality of the Agent’s performatives as combinations of instances of three performative types. These performative types correspond to tell, ask and reply, in the sense that any message can be classified as belonging to one of three categories:
- messages that don't want a reply (tell-like),
- messages that do want a reply (ask-like),
- messages that are a reply (reply-like).

In AL the programmer may define his or her own performatives, specifying to which group they belong. Further, new performatives may be defined by the user in terms of the Automata that process the conversation triggered by the new performative. AL is entirely developed in Scheme. Further information, including the code, may be requested from the author by e-mail.

Agent communication: Jaskemal

The description of Jaskemal (A Knowledge Manipulation Environment in Java and Scheme Languages within an Agent-based Architecture), at the moment, is work in progress.
Jaskemal has been defined and prototyped by Daniele Maraschi (see [7] and [9]) and later extended with the dynamic scheduler by Sergio Maffioletti in his thesis [8]. Starting from the use of a first KAWA prototype, the Scheme compiler into Java byte code has been extended to cover most of Scheme and has also been embedded into a full Agent language (see also http://www.cygnus.com/~bothner/kawa.html ; notice the DSSSL interpreter; the current status of KAWA is quite advanced). The distinctive features of STROBE and AL have been implemented: Cognitive (First Class) Environments, KQML-like performatives from the Java Agent Template, the Finite State Automaton view of Agents during conversations, and the compositionality of performatives. At the moment, we are documenting the language and using it within two application projects 7. Further information about Jaskemal, and the code, may be requested from the author by e-mail.

7 One of them is documented in: S. A. Cerri, V. Loia, P. Fontanesi, and A. Bettinelli, “Serendipitous acquisition of Web knowledge by Agents in the context of Human Learning,” presented at THAI-ETIS: European Symposium on Telematics, Hypermedia and Artificial Intelligence, Varese, Italy, 1999.