Abstractive Summarization of Multi-modal Documents Charlie Greenbacker∗ Human Language Technologies Lab Dept. of Computer and Information Sciences University of Delaware
Figure 1: High-level overview diagram of proposed system. [Diagram components: Article Text; Info Graphic; Text Extraction Module; Graphic Extraction Module; Knowledge Base; Unified model of text & visual info; Generator; Summary of Graphic and Text.]
Abstract

In popular media, information graphics such as bar charts and line graphs are often used to complement the text of an article. Together, the graphic and text form a single multi-modal document serving some communicative goal (or goals) of the author. While screen readers make the text accessible for people who are blind, the information provided by the graphic is normally inaccessible. The SIGHT system analyzes the graphic and identifies its intended message, generating a text description of the visual information. However, this operation is performed on the graphic in isolation, outside the context of the original document in which it appears. We propose to exploit the information provided by the text of the article, and produce a unified summary of the concepts expressed by both the text and graphics in the complete document.

CR Categories: I.2.7 [Artificial Intelligence]: Natural Language Processing—Language generation; I.2.4 [Artificial Intelligence]: Knowledge Representation Formalisms and Methods—Semantic networks

Keywords: summarization, accessibility, information graphics
1 Introduction
Most automatic summarization techniques rank the relative importance of sentences in the original document and extract the most important ones to build the summary. However, to produce a unified summary of both the text and graphics in an article, we must integrate the knowledge obtained from both at the conceptual level. As we are unlikely to find sentences in the original document that capture the integrated information from both text and graphics, we must create an abstractive summary by generating novel sentences to express the key concepts found in the source document. Our line of research is intended to accomplish this task.

∗ e-mail: [email protected]
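To illustrate the contrast with the extractive approach described above, a minimal frequency-based sentence ranker (a generic sketch, not part of the proposed system) might look like this:

```python
from collections import Counter
import re

def extractive_summary(text, n=2):
    """Rank sentences by average word frequency and extract the top n,
    preserving their original order. A toy stand-in for extractive methods."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'\w+', text.lower()))
    def score(s):
        toks = re.findall(r'\w+', s.lower())
        return sum(freq[t] for t in toks) / max(len(toks), 1)
    top = set(sorted(sentences, key=score, reverse=True)[:n])
    return [s for s in sentences if s in top]
```

No such ranker can emit a sentence that fuses information from a graphic with information from the text, which is precisely why generation of novel sentences is required here.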
2 Proposed Method
Document text will be analyzed by Sparser, a linguistically-sound, phrase structure-based chart parser with an extensive and extendible semantic grammar. It outputs knowledge base entries based on the concepts it recognizes in the text, and records the various realization forms used to express these concepts for later use by a generator.

Graph analysis will be performed by the existing SIGHT system, which first converts the image into an XML representation, then identifies the intended message and extracts the key propositions [Demir et al. 2008].

Concepts extracted from both the text and graphics will be stored in a unified model built in KRISP, a knowledge representation system designed for semantic modeling of concepts. By identifying where information is most concentrated in the model, and using cues from document structure, we can determine which concepts are important enough for inclusion in the summary.

Once the most important concepts have been selected and organized, surface realization will be handled by Mumble. By using realization forms observed by Sparser, we ensure the chosen concepts are expressed in a valid and natural manner [McDonald and Greenbacker 2010].
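KRISP's internals are not detailed here, but the idea of selecting concepts by "where information is most concentrated" can be sketched with a simple degree-based salience measure over a concept graph. Everything below (class name, concept labels, the use of degree centrality) is a hypothetical illustration, not the actual KRISP model:

```python
from collections import defaultdict

class ConceptModel:
    """Toy unified concept graph. Salience is approximated by node degree:
    concepts linked by both the text and the graphic accumulate more edges."""
    def __init__(self):
        self.edges = defaultdict(set)

    def add_relation(self, concept_a, concept_b):
        # Undirected relation between two concepts
        self.edges[concept_a].add(concept_b)
        self.edges[concept_b].add(concept_a)

    def top_concepts(self, n=3):
        # Rank concepts by how many distinct relations touch them
        return sorted(self.edges, key=lambda c: len(self.edges[c]), reverse=True)[:n]

# Relations contributed by the graphic and by the article text converge
# on shared nodes, raising their salience in the unified model.
m = ConceptModel()
m.add_relation("unemployment", "rising_trend")   # from the graphic
m.add_relation("unemployment", "2008")           # from the graphic
m.add_relation("unemployment", "recession")      # from the text
m.add_relation("recession", "housing_market")    # from the text
print(m.top_concepts(2))  # → ['unemployment', 'recession']
```

The point of the sketch is that a concept mentioned in both modalities ("unemployment") naturally outranks concepts appearing in only one, which mirrors the selection criterion described above.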
References

Demir, S., Carberry, S., and McCoy, K. F. 2008. Generating textual summaries of bar charts. In Proceedings of the Fifth International Natural Language Generation Conference (INLG 2008), ACL, 7–15.

McDonald, D. D., and Greenbacker, C. F. 2010. ‘If you’ve heard it, you can say it’ - towards an account of expressibility. In Proceedings of the Sixth International Natural Language Generation Conference (INLG 2010), ACL, 185–190.