Enhancing Enterprise Knowledge Processes via Cross-Media Extraction

José Iria
The University of Sheffield, 211 Portobello Street, Sheffield, S1 4DP, UK
[email protected]

Victoria Uren
Knowledge Media Institute, The Open University, Milton Keynes, MK7 6AA, UK
[email protected]
ABSTRACT
In large organizations the resources needed to solve challenging problems are typically dispersed over systems within and beyond the organization, and across different media. However, knowledge environments still lack extraction methods able to combine evidence for a fact across different media. In many cases the whole is more than the sum of its parts: only by considering the different media simultaneously can enough evidence be obtained to derive facts that remain inaccessible to the knowledge worker via traditional methods that work on each medium separately. In this paper, we present a cross-media knowledge extraction framework specifically designed to handle large volumes of documents composed of three types of media – text, images and raw data – and to exploit the evidence across the media. Our goal is to improve the quality and depth of automatically extracted knowledge.
Categories and Subject Descriptors
H.2.4 [Systems]: Multimedia databases; H.3.3 [Information Search and Retrieval]; I.2.1 [Applications and Expert Systems]: Office automation.

General Terms
Algorithms, Design, Human Factors

Keywords
Cross-media knowledge extraction, Large-scale datasets, Industrial applications
1. INTRODUCTION
In large organizations the resources needed to solve challenging problems are typically dispersed over systems
within and beyond the organization, and also in different media. For example, to diagnose the cause of failure of a component, engineers may need to gather together images of similar components, the reports that summarize past solutions, raw data obtained from experiments on the materials, and so on. The effort required to gather, analyze and share this information is considerable. In the X-Media project (http://www.x-media-project.org) we are investigating the potential of rich semantic metadata as a lingua franca to connect up dispersed resources across media and support knowledge reuse and sharing. Automatic capture of semantic metadata is an economic imperative for the widespread deployment of such systems and is already available for single-medium scenarios: named entity recognition and information extraction for text, scene analysis and object recognition for images, and pattern detection and time series methods for raw data. However, knowledge environments still lack extraction methods able to combine evidence for a fact across different media. In many cases, as we will exemplify in this paper, the whole is more than the sum of its parts: only when considering the different media simultaneously can enough evidence be obtained to derive facts otherwise inaccessible to the knowledge worker via traditional methods that work on each single medium separately. Our goal is to improve the quality and depth of the extracted knowledge while providing users with joined-up views over dispersed resources.

In this paper, we first motivate the cross-media extraction problem by presenting two real-world use cases, one for fault diagnosis in aero-engines and the other for competitor analysis in car manufacture. We then summarize the requirements posed by the use cases on the design of knowledge extraction systems. The core of the paper follows, describing a framework specifically designed to perform cross-media extraction on a large scale. We finish with a brief review of related work, followed by the conclusions and future work.
2. MOTIVATION
Knowledge workers face four key challenges. The first is to gather the knowledge relevant to a task or problem, which may be dispersed across different storage systems and different media. The second is to analyze the knowledge they have gathered and make sense of it. The third is to share the knowledge with their colleagues. These three challenges are contextualized by knowledge workers’ tasks and the processes they follow to accomplish them. Keeping track of the process, by being aware of what one is doing, what one needs to do next, and what others are doing, is the fourth challenge. What to search for, what analysis is needed and who to share with, all depend on the task in hand and the current stage of the process. The X-Media project is designing and implementing innovative knowledge extraction systems to tackle the first challenge, and knowledge sharing and reuse tools to tackle the remaining challenges (but the latter fall outside the scope of this paper). We have gathered user requirements at our industrial partners’ sites using a user-centred design process [15]. The following subsections ground our vision and motivate the requirements for the knowledge extraction systems being developed by presenting selected aspects of two use cases: problem resolution for aero-engines and competitors scenario forecast for car manufacture. Such systems are capable of extracting information from compound documents – we use “compound document” to refer to any document that contains mixed media types – comprising text, images and raw data (usually in the form of tabular data with numeric fields).
2.1 Rolls-Royce: Problem Resolution
This use case, defined in cooperation with Rolls-Royce plc (RR), deals with collaborative information retrieval and analysis to determine the root cause of problems discovered during routine maintenance of aircraft engines. This is a very important process for the company, as it helps in understanding the real cause(s) behind in-service or maintenance events of an aircraft, contributing to the more general goal of improving engine design and minimizing disruptions to the fleet. A much simplified description of the Problem Resolution process is as follows. Currently, the process involves the work of a team of specialized engineers, who are recruited according to their experience in dealing with similar problems in the past. They first manually search and collect as much information relevant to the problem as possible. Subsequently, they formulate hypotheses about potential root causes, some of which are selected for verification. The process cycles until a satisfactory explanation is found. Its duration naturally depends on several circumstances, but it can be extremely costly for harder problems.
X-Media aims to provide end-user systems that monitor and support the process, so as to maximize efficiency and cooperation between the team members. One of the enhancements to the current process consists of automating the extraction of knowledge from various distributed sources and media, so as to make that knowledge available for the team to search and browse in a more efficient way. A vast repository of “dormant” knowledge is to be found in the form of large numbers of documents on the intranet, such as technical reports and event reports about a given engine. These documents consist of a mix of inter-related text, images of components, and raw data from lab experiments, which present complementary information about the engine in question. We aim to automate the extraction of knowledge from this repository to enable on-demand retrieval of knowledge about similar problems from the past. The documents dealt with by the team during the process also mix text, images and tabular data to convey the message intended by the author of the document. For example, emails exchanged between team members very often contain a textual description of a problem or new finding together with images and/or tabular data attached to better illustrate what is said in the email body. Another common example consists of presentation slides showing text and images which, again, contain complementary information.
2.2 FIAT: Competitors Scenario Forecast
This use case, defined in cooperation with Fiat S.p.A (FIAT), concerns forecasting the launch of competitors’ models. It comprises collecting information about the features of competitors’ vehicles from various data sources and producing a calendar that illustrates the prospective launches. The information needed to achieve that is scattered throughout the Internet, including in blogs and forums, and is covered by international automotive magazines as well as by a long tail of national automotive magazines. The collected information is used in the Set up stage of new FIAT vehicles (the development stage where a first assessment of the future vehicle’s features is carried out). This process is of great value to the company because it contributes to keeping vehicle design up to date with the ever-evolving competitors scenario. End-user systems are being developed within X-Media that are able to track knowledge changes and to be proactive in supporting knowledge workers during the Set up stage. To enable that, the underlying knowledge extraction systems are required to be able to handle such rapidly evolving multimedia data sources on a large scale. As in the previous use case, documents contain complementary information across the media. Here we give a concrete example.
Figure 1: Example of a compound document in Fiat’s Competitors Scenario Forecast use case.

The compound document illustrated in Figure 1 is an example of a prototypical document collected by knowledge workers in this use case. The document contains photographs of the front part of the interior of a Toyota Yaris car along with text describing the depicted car components. End-user systems are being built that support issuing queries over the extracted knowledge, e.g. “find competitor car models with ergonomic air ducts”. The desired output of such systems for this query would be to present the Yaris as a potentially interesting model and provide the worker with a set of images and text snippets, including the ones in the document shown. In order to achieve that, knowledge extraction systems must gather evidence from across the media: on the one hand, identification of the car model depicted in the images can only be done using the text, which explicitly mentions “Yaris”; on the other hand, identification of some of the car model components such as air ducts, steering wheel and gear lever can only be done using the images, since the text only mentions glove box, tray, pockets, bins and cup-holders. In Section 4 we present a cross-media extraction framework that enables capturing knowledge from documents such as the one in Figure 1.
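To make the query example above concrete, the following sketch shows how such a query could be phrased against the extracted knowledge using SPARQL through the Python rdflib library; the file name and the ontology terms (ex:CarModel, ex:hasComponent, ex:AirDuct, ex:supports) are illustrative placeholders, not the actual X-Media vocabulary.

# Sketch: the kind of query an end-user system could issue over the extracted
# knowledge for "find competitor car models with ergonomic air ducts".
# File name and ontology terms are hypothetical placeholders.
from rdflib import Graph

g = Graph()
g.parse("extracted_knowledge.ttl", format="turtle")  # hypothetical knowledge base dump

query = """
PREFIX ex: <http://example.org/xmedia/>
SELECT ?model ?evidence WHERE {
    ?model    a ex:CarModel ;
              ex:hasComponent ?duct .
    ?duct     a ex:AirDuct .
    ?evidence ex:supports ?duct .    # image or text snippet backing the fact
}
"""
for row in g.query(query):
    print(row.model, row.evidence)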
3. REQUIREMENTS
In this section we list the requirements for knowledge extraction systems, identified through the analysis of the use cases presented in Section 2. The major requirements identified were the ability to exploit evidence for a fact across several media, and the ability to perform the extraction on a large scale. It is also worth mentioning a few of the other requirements identified, which complement the aforementioned ones and also have strong implications for design decisions: the ability to exploit background knowledge, portability and the ability to report uncertainty.
3.1 Ability to Exploit Evidence Across Media
As illustrated by the examples in the previous section, with the wide availability of devices and software capable of acquiring, generating and presenting multimedia data, the shape of the information landscape in enterprises has radically changed – multimedia documents now abound, and the evidence for a fact is no longer confined to a single medium but is very often spread across two or more media. The new requirement for knowledge extraction methodologies and systems is therefore to be able to exploit such evidence, with a real potential to improve the quality and depth of the automatically extracted knowledge and, consequently, enhance enterprise knowledge processes.
3.2 Ability to Extract on a Large Scale
Large companies’ intranets, such as the ones maintained by FIAT and RR, nowadays contain tens of millions of documents and are soon expected to reach hundreds of millions, a dimension comparable to the Internet at the end of the 90s. Moreover, the increased use of the World Wide Web (WWW) as a source of information has made the boundary between intranet and Internet very thin, which dramatically increases the size of the search space. For the purposes of this work, the following aspects are considered with respect to large scale:

• Amount of content
• Domain complexity
• Amount of background knowledge available

The core basic requirement is for knowledge extraction methods to be able to cope with the large number of documents provided by the use cases. Domain complexity here simply refers to the number of concepts, and relations between those concepts, that define the problem domain – in practical terms, because we employ domain ontologies to represent a conceptualization of the domain, the complexity of the domain corresponds, for our purposes, to the size of the ontologies. We will define what we mean by “background knowledge” in Section 3.3.
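As a rough, hedged illustration of this notion of domain complexity, ontology size can be measured simply as the number of classes and relations the ontology declares; the sketch below does this with rdflib, and the ontology file name is hypothetical.

# Sketch: measuring "domain complexity" as ontology size (classes and relations).
# "engine_domain.owl" is a hypothetical file name.
from rdflib import Graph, RDF, OWL

g = Graph()
g.parse("engine_domain.owl", format="xml")  # RDF/XML serialization assumed

num_classes = len(set(g.subjects(RDF.type, OWL.Class)))
num_relations = len(set(g.subjects(RDF.type, OWL.ObjectProperty)))
print(f"classes: {num_classes}, relations: {num_relations}")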
3.3 Other Requirements
Ability to Exploit Background Knowledge. Enterprise environments are rich in domain expertise and untapped resources, often overlooked by knowledge extraction systems even when accessible in digital form. Examples of background knowledge include, in general, media-independent resources such as domain ontologies and previously existing knowledge bases. Most importantly, background knowledge also comprises media-specific information. For example, in the Problem Resolution use case, text extraction methods can benefit from the use of external resources such as gazetteers – jet engines typically have 300,000 parts, whose names can be compiled by simple methods and used during extraction; image analysis methods, for instance, can make use of topological descriptors expressing relations between regions of photographs of jet engine parts; and raw data extractors can use information about the different frequency ranges of the several engine materials/components tested in the lab. Our framework should be able to easily incorporate these and other external domain-specific resources.

Portability. To maximize reuse, knowledge management systems need to be portable across subject domains, languages and tasks. For example, Fiat’s competitors scenario forecast is a process likely to be revised frequently due to the nature of the task. This has strong implications on the choice of knowledge representation formalisms and on the selection of and research on knowledge extraction methods, e.g. the adoption of machine learning approaches. Portability is a pervasive requirement in the work presented here.

Ability to Report Uncertainty. Uncertainty is inherent to the knowledge extraction process. It can arise from
limitations of the models used or from the very nature of the data. For example, in the Fiat use case, since the scenario concerns forecasts and (sometimes) rumours, there is an inherent uncertainty about the knowledge extracted. As another example, many machine learning classification methods (such as those discussed in Section 4.2) output predictive models able to report a confidence value for each prediction, which constitutes another (distinct) source of uncertainty. Thus, our framework should be able to report uncertainty in the extracted knowledge.
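As a minimal sketch of what reporting uncertainty could look like at the data-model level, the snippet below attaches a confidence value and a provenance note to each extracted statement; all names are illustrative rather than part of the actual X-Media implementation.

# Sketch: extracted facts carrying a confidence value and provenance.
# All class and field names are illustrative.
from dataclasses import dataclass

@dataclass
class ExtractedFact:
    subject: str       # e.g. an ontology instance URI
    predicate: str     # e.g. an ontology relation URI
    obj: str
    confidence: float  # e.g. a classifier's posterior probability
    source: str        # media element the evidence came from

facts = [
    ExtractedFact("car:ToyotaYaris", "ont:hasComponent", "ont:AirDuct", 0.82, "doc-042/image-1"),
    ExtractedFact("car:ToyotaYaris", "ont:launchDate", "2009-Q1", 0.35, "forum post (rumour)"),
]

# Downstream components can filter or rank by confidence.
print([f for f in facts if f.confidence >= 0.5])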
4. KNOWLEDGE EXTRACTION FRAMEWORK
In X-Media the analysis and extraction of knowledge from documents plays a crucial role in the quality of the knowledge made available to the user. We have designed a knowledge extraction framework adequate for domains characterized by media heterogeneity and high volumes of data. We drew requirements from the use cases presented in Section 2 and, taking those requirements into account, sought to identify a way to put together extraction methods and technologies, both existing in the literature and our own novel approaches, so as to arrive at a framework capable of satisfying all of them. In designing the framework, we had to consider the trade-off between several opposing forces, the most important being extraction accuracy vs. the capability of processing high volumes of data, but also other trade-offs such as extraction accuracy vs. the need for user supervision, or portability vs. the need for external resources.

Figure 2: The architectural view of the proposed knowledge extraction framework.
This section describes the architectural and functional elements of the framework, explaining how it provides for cross-media and large-scale extraction.
4.1 Architectural View
The proposed framework consists of three main components: (i) a multimedia daemon that handles content-related tasks such as dismantling compound documents, enabling fast access to indexed data and making the variety of data formats transparent to the rest of the framework; (ii) a knowledge extraction processor that operates on the output of the media manager with the aim of providing an interpretation of the content semantics; and (iii) a knowledge base that facilitates the storage, retrieval and inference of knowledge. A graphical representation of the framework that illustrates the inner and inter-dependencies between the components is depicted in Figure 2. The multimedia daemon is the functional component that retains direct access to the source content. Its primary role is to fetch and deliver content in the appropriate format and structure upon request of the other functional components.
Figure 3: The functional view of cross-media knowledge extraction.

The implications of large-scale cross-media extraction for the media manager component are the need to incorporate a layout-aware feature extraction and storage mechanism for documents, as well as an indexing scheme able to handle all the different modalities (e.g. text, image and raw data). The layout mechanism is vital for cross-media extraction since it is the point where spatial relations between different modalities are captured and retained in a machine-understandable format. Indexing, on the other hand, is a well-known technique for dealing efficiently with large volumes of data.

The knowledge extraction processing component resides at the core of the cross-media framework, since it is where the actual content processing and knowledge extraction take place. The functionality of this component tends to require considerable resources in terms of memory consumption and computational power, as well as bandwidth for data exchange. The employment of low-complexity, fast algorithms for single-media extraction and concept modeling is mandated by the large-scale requirement and is the leading force driving the specification of this component. Section 4.2 gives details on the cross-media extractor subcomponent, and justifies some of the decisions in light of the large-scale requirement.

The knowledge base component accommodates the knowledge repository of the framework and is responsible for storing and providing access to the extracted and pre-existing background knowledge, respectively. As mentioned in Section 3.2, the ability of this component to scale is affected both by the domain complexity and by the amount of content. Thus, the large-scale requirement here too mandates strict limitations on the level of expressiveness allowed for the knowledge representation language and on the reasoning mechanisms supported by the knowledge base, and, thus, on the output of knowledge extraction systems. We developed a structural model that allows representing the output of the extraction methods in RDF and OWL, based on the Core Ontology of Multimedia [3], an ontology that serves as a basis for representing media objects, in particular to describe
decompositions of media objects and to describe media annotations. To deal with background knowledge on a large scale, we batch-preprocess external resources and load the resulting RDF triples into the knowledge base. Current RDF store technologies can store up to billions of RDF triples [8], which is suitable for the use cases in X-Media. As an overall development methodology, we have adopted a strategy similar to KnowItAll [5], where the core system, no more than a sophisticated web harvester, is gradually extended with more and more complex knowledge extraction modules. This allows us to better study the performance issues that arise from working on a large scale as development progresses.
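As a hedged illustration of how an extracted media annotation can be turned into RDF triples for the knowledge base, the sketch below uses rdflib; the namespaces and property names are simplified placeholders and do not reproduce the Core Ontology of Multimedia [3] terms.

# Sketch: storing one extracted image annotation as RDF triples.
# Namespaces and property names are simplified placeholders.
from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/xmedia/")       # hypothetical annotation vocabulary
DOC = Namespace("http://example.org/documents/")   # hypothetical document namespace

g = Graph()
image = DOC["report-042/figure-1"]
g.add((image, RDF.type, EX.ImageRegion))
g.add((image, EX.partOfDocument, DOC["report-042"]))
g.add((image, EX.depicts, EX.AirDuct))
g.add((image, EX.confidence, Literal(0.82)))

print(g.serialize(format="turtle"))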
4.2 Cross-Media Extraction
To handle extraction across the media, we have designed the machine learning-based framework presented in Figure 3. The framework receives as input a multimedia document (e.g. a failure report) and produces semantic annotations with a set of inferred concepts. It is divided into the following steps, supported by background knowledge: multimedia document processing, feature processing, cross-media data modelling and cross-media dependency modelling.
4.2.1 Multimedia Document Processing
It is the task of the Multimedia Document Processing step to extract single-media elements and their relations from the compound document. The document processing literature discusses several approaches to extracting layout information from PDF, HTML and other structured documents; see [9] for an overview. Single-media KA algorithms process the content of the corresponding modality, and cross-media KA algorithms process both the content from the different modalities and the layout information.

Single-Media Features. After extracting single-media content from compound documents, features are extracted from each single-media element. For image content, MPEG-7 low-level visual features [11] provide a rich description of the content in terms of colour, shape, texture, and histograms. From text content, we extract not
only the traditional bag-of-words, but we also perform fast entity [7] and relation extraction [6]. Raw data features are simple statistics, such as statistical moments of the data, or the explicit detection of certain data patterns known to have some relevant meaning (e.g. sensor data indicating a component malfunction).

Cross-Media Features. As mentioned, a multimedia document may contain evidence for a fact to be extracted across different media. However, it is not straightforward to know which media elements refer to the same fact. The document layout and extracted cross-references (e.g. captions) can thus suggest how each text paragraph/segment relates to each image/raw data element [2, 4, 14]. The approaches of Arasu and Garcia-Molina [2], Crescenzi et al. [4] and Rosenfeld et al. [14] are based on templates that characterize each part of the document; these templates are extracted either manually or semi-automatically. Rosenfeld et al. implemented a learning algorithm to extract information (author, title, date, etc.); they ignored text content and only used features such as fonts, physical positioning and other graphical characteristics to provide additional context to the information. X-Media follows an approach similar to the one proposed by Rosenfeld et al.: we extract a set of cross-media features for the types of documents we need to process. These cross-media features include: layout structure, distance between segments, cross-references, same type of font, font colour, and background colour/pattern. All such features can be extracted from PDF or HTML documents, providing the following steps with essential information about how media elements relate to one another.
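To illustrate a few of the cross-media features listed above, the sketch below computes simple layout cues (distance between segments and a caption-like position) from element bounding boxes; the layout analysis that produces the boxes is assumed, and the feature set is illustrative rather than the exact X-Media one.

# Sketch: simple cross-media layout features computed from bounding boxes.
# Boxes are (x0, y0, x1, y1) page coordinates with y growing downwards.
from dataclasses import dataclass
from math import hypot

@dataclass
class Element:
    kind: str    # "text", "image" or "rawdata"
    box: tuple

def centre(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def cross_media_features(text: Element, other: Element):
    (tx, ty), (ox, oy) = centre(text.box), centre(other.box)
    return {
        "distance": hypot(tx - ox, ty - oy),        # distance between segments
        "text_below": text.box[1] >= other.box[3],  # caption-like position
        "x_overlap": max(0.0, min(text.box[2], other.box[2]) - max(text.box[0], other.box[0])),
    }

caption = Element("text", (100, 420, 300, 440))
photo = Element("image", (100, 200, 300, 400))
print(cross_media_features(caption, photo))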
4.2.2 Feature Processing
Sparse feature data such as text and dense feature data such as images have very different characteristics. In cross-media knowledge extraction, the high diversity of data types raises the need to pre-process the data into a single common representation. The Feature Processing step aims at estimating a representation that will ease the task of the learning algorithm. We follow Magalhães and Rüger [10] and process text and images independently with probabilistic latent semantic indexing to produce a canonical representation of both the text feature space and the image/raw data feature space. This allows statistical learning algorithms to handle different types of data simultaneously more easily.
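A minimal sketch of this step is given below. The paper follows probabilistic latent semantic indexing [10]; here TruncatedSVD (latent semantic analysis) from scikit-learn is used as a simpler stand-in, and the feature matrices are random toy data.

# Sketch: projecting text and image features into low-dimensional latent spaces,
# then concatenating them into one canonical representation per document element.
# TruncatedSVD stands in for the probabilistic latent semantic indexing of [10].
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
text_bow = rng.poisson(0.3, size=(200, 5000)).astype(float)  # toy bag-of-words counts
image_mpeg7 = rng.random((200, 300))                         # toy MPEG-7 style descriptors

text_latent = TruncatedSVD(n_components=50, random_state=0).fit_transform(text_bow)
image_latent = TruncatedSVD(n_components=50, random_state=0).fit_transform(image_mpeg7)

canonical = np.hstack([text_latent, image_latent])
print(canonical.shape)  # (200, 100)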
4.2.3 Cross-Media Data Models

Once the feature data has been processed, a modelling algorithm can be used to create the knowledge models of all concepts. Special care must be taken when designing the algorithm to model each concept: it must support high-dimensional data and hundreds of thousands of examples, and have low computational complexity. Several approaches have addressed similar problems, e.g. [10, 16]. The maximum entropy framework described in [10] fully addresses these issues by using a Gaussian prior (or a Laplacian prior) and a well-known quasi-Newton optimization procedure. Wu et al. [16] have also deployed a two-step framework that uses several standard feature processing algorithms (e.g. ICA, PCA) and then fuses modalities with a support vector machine.

4.2.4 Cross-Media Dependency Models

The previous step estimates the model of a single concept exclusively from the concept’s own examples. However, semantic metadata provide information about concept co-occurrence and about how concepts co-occur across different modalities. This type of background knowledge describes the semantic structure of the problem, which the cross-media extraction algorithm can exploit to enhance the model of each individual concept. Approaches like the ones proposed by Naphade and Huang [12], Preisach and Schmidt-Thieme [13] and Magalhães and Rüger [10] can capture the semantic structure of the problem and improve the accuracy of the systems.

Figure 4: Example of dependencies in a probabilistic network.

Figure 4 illustrates the flexibility offered by probabilistic networks to address the X-Media-specific requirements. Concept A is initially modelled in terms of its own concept data: only text (A_T), only visual (A_V), or text, visual and cross-media features (A_M). The best representation of concept A is then a combination of all these representations and of dependency relations with concepts B and C. These approaches produce a richer semantic representation than a single model for each concept, thus allowing the capture of knowledge otherwise inaccessible.
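The sketch below illustrates the two modelling stages on toy data: a fast linear model per concept over the canonical features (data models), followed by a second stage that re-scores every concept from all first-stage scores, so that concept co-occurrence can reinforce or suppress individual predictions (dependency models). This stacked set-up is a simplified stand-in for the probabilistic-network approaches cited above, not the exact X-Media method.

# Sketch: per-concept data models plus a simple dependency stage over concept scores.
# Data and labels are random toys; the stacking is a stand-in for a probabilistic network.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.random((500, 100))             # canonical cross-media features
Y = rng.integers(0, 2, size=(500, 3))  # toy labels for concepts A, B, C

# 4.2.3: one independent, low-complexity model per concept
concept_models = [LinearSVC().fit(X, Y[:, c]) for c in range(Y.shape[1])]
scores = np.column_stack([m.decision_function(X) for m in concept_models])

# 4.2.4: each concept re-estimated from all concepts' scores, capturing co-occurrence
dependency_models = [LogisticRegression().fit(scores, Y[:, c]) for c in range(Y.shape[1])]
refined = np.column_stack([m.predict_proba(scores)[:, 1] for m in dependency_models])
print(refined[:3])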
5. RELATED WORK
The technology focus in Knowledge Management has moved from simple keyword-based search towards more advanced solutions for extraction and sharing of knowledge [1]. The focus is still very much on providing more advanced text-based solutions, though image and video are considered by some industry players. There
are prospects for cross-media extraction and knowledge fusion entering this market. Recently, many projects and Networks of Excellence dealing with knowledge extraction and sharing have been sponsored by European funds. Most address the problem of knowledge extraction over a single medium, but a few do address extraction over multimedia data, e.g. MUMIS (http://www.dcs.shef.ac.uk/nlp/mumis/), Reveal-This (http://www.reveal-this.org/) and MUSCLE (http://www.muscle-noe.org/). However, most of this research is themed around video retrieval applications, which typically consider video, caption and speech analysis, differing quite substantially from X-Media’s need to analyze and mine documents comprising text, static images and raw data. In fact, X-Media’s knowledge-rich environments such as those presented in Sections 2.1 and 2.2 set it apart from other projects in the area.
6. CONCLUSIONS
We have identified the need for cross-media knowledge extraction in technical domains. In the Problem Resolution use case, we saw how cross-media extraction could support on-demand retrieval of knowledge about similar problems from the past - a task which at present is hampered by the dispersal of evidence in different media. The Competitors Scenario Forecast use case motivates the need for cross-media extraction from public resources on the Web, to support the task of producing the new models’ launch calendar - a task with similar requirements to the previous one. The major requirements for knowledge extraction systems to support such use cases were identified to be the ability to exploit evidence for a fact across several media, and the ability to perform the extraction on a large scale.

The major contribution of this paper is in presenting a machine learning-based cross-media knowledge extraction framework specifically designed to handle large volumes of documents composed of text, images and raw data, with a high level of automation. The framework provides a structured approach in which multimedia documents are processed for single- and cross-media features such as layout structure; cross-media data models are learned from both (after a feature processing step), and dependencies between domain ontology concepts are captured via a probabilistic network approach.

Future work concerns two different aspects. First, the instantiation and evaluation of the proposed cross-media knowledge extraction framework in the two use cases described in this paper, together with the real users. Second, continuing research on improving the accuracy vs. scalability characteristics of both our single-medium and cross-media methods.
7. ACKNOWLEDGMENTS
This work was funded by the X-Media project (www.x-media-project.org) sponsored by the European Commission as part of the Information Society Technologies (IST) programme under EC grant number IST-FP6-026978.
8. ADDITIONAL AUTHORS
Alberto Lavelli (Fondazione Bruno Kessler, [email protected]), Sebastian Blohm (Universität Karlsruhe, [email protected]), Aba-sah Dadzie (University of Sheffield, [email protected]), Thomas Franz (Universität Koblenz-Landau, [email protected]), João Magalhães (University of Sheffield, [email protected]), Spiros Nikolopoulos (Centre for Research & Technology Hellas, [email protected]), Christine Preisach (Uni Hildesheim, [email protected]), Piercarlo Slavazza (Quinary, [email protected]).
9. REFERENCES
[1] W. Andrews and R. E. Knox. Magic quadrant for information access technology. Technical report, Gartner Research (G00131678), October 2005.
[2] A. Arasu and H. Garcia-Molina. Extracting structured data from web pages. In ACM SIGMOD International Conference on Management of Data, San Diego, California, USA, 2003.
[3] R. Arndt, R. Troncy, S. Staab, and L. Hardman. Adding formal semantics to MPEG-7: Designing a well-founded multimedia ontology for the web. Technical report, Department of Computer Science, Univ. Koblenz-Landau, 2007.
[4] V. Crescenzi, G. Mecca, and P. Merialdo. RoadRunner: Towards automatic data extraction from large web sites. In 27th International Conference on Very Large Data Bases (VLDB), 2001.
[5] O. Etzioni, M. Cafarella, D. Downey, S. Kok, A. Popescu, T. Shaked, S. Soderland, D. Weld, and A. Yates. Web-scale information extraction in KnowItAll. In Proceedings of the Thirteenth International World Wide Web Conference. ACM Press, 2004.
[6] C. Giuliano, A. Lavelli, and L. Romano. Exploiting shallow linguistic information for relation extraction from biomedical literature. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, Trento, Italy, 2006.
[7] J. Iria, N. Ireson, and F. Ciravegna. An experimental study on boundary classification algorithms for information extraction using SVM. In Proceedings of the EACL 2006 Workshop on Adaptive Text Extraction and Mining, Trento, Italy, 2006.
[8] A. Kiryakov, B. Popov, I. Terziev, D. Manov, and D. Ognyanoff. Semantic annotation, indexing, and retrieval. Web Semantics: Science, Services and Agents on the World Wide Web, 2(1):49–79, December 2004.
[9] A. Laender, B. Ribeiro-Neto, A. Silva, and J. Teixeira. A brief survey of web data extraction tools. SIGMOD Record, 31(2), June 2002.
[10] J. Magalhães and S. Rüger. Information-theoretic semantic multimedia indexing. In ACM Conference on Image and Video Retrieval (CIVR), Amsterdam, The Netherlands, 2007.
[11] B. S. Manjunath, P. Salembier, and T. Sikora. Introduction to MPEG-7: Multimedia Content Description Interface. John Wiley & Sons, 2002.
[12] M. R. Naphade and T. S. Huang. A probabilistic framework for semantic video indexing, filtering and retrieval. IEEE Transactions on Multimedia, 3(1), 2001.
[13] C. Preisach and L. Schmidt-Thieme. Relational ensemble classification, pages 499–509, 2006.
[14] B. Rosenfeld, R. Feldman, and J. Aumann. Structural extraction from visual layout of documents. In ACM Conference on Information and Knowledge Management (CIKM), 2002.
[15] M. Rosson and J. Carroll. Usability Engineering: Scenario-Based Development of Human-Computer Interaction. Morgan Kaufmann, 2002.
[16] Y. Wu, E. Y. Chang, K. C.-C. Chang, and J. R. Smith. Optimal multimodal fusion for multimedia data analysis. In MULTIMEDIA ’04: Proceedings of the 12th Annual ACM International Conference on Multimedia, pages 572–579, New York, NY, USA, 2004. ACM Press.