Reprinted from: Proceedings of CHI ‘97, (Atlanta, GA), March 1997, pp. 19-26.

Accessing Multimedia through Concept Clustering

John Kominek, Rick Kazman
Department of Computer Science, University of Waterloo
Waterloo, ON, Canada N2L 3G1
+1 519 888-4567 x4870
{jmkomine,rnkazman}@cgl.uwaterloo.ca (1)

(1) Kazman's current address: Software Engineering Institute, Carnegie Mellon University ([email protected]). Kominek's current address: Electrical & Computer Engineering Department, UC Davis ([email protected]).

ABSTRACT

Multimedia information retrieval is a challenging problem because multimedia information is not inherently structured. Jabber is an experimental system that attempts to bring some structure to this task. Jabber allows users to retrieve records of videoconferences based upon the concepts discussed. In this paper we introduce ConceptFinder, a sub-system within Jabber, and show how it is able to process the spoken text of a meeting into meeting topics. ConceptFinder can make subtle distinctions among different senses of the same words, and is able to summarize a set of related words, giving a name to each topic. Users can then use this name to query or browse the stored multimedia, through Jabber’s user interface. By presenting information that closely matches a user’s expectations, the challenge of multimedia retrieval is rendered more tractable.

Keywords

Multimedia indexing, information retrieval and browsing, concept clustering

INTRODUCTION

Multimedia holds tremendous appeal because it brings interaction with a computer a step closer to real life. Multimedia "speaks" more directly and viscerally than mere text. Thus the explosion in prepared multimedia (encyclopedias and games on CDs) and in live multimedia (videoconferencing). Because of this appeal, live multimedia is increasingly being stored as an information resource. Video tapes have traditionally been the medium of choice but are being abandoned in favor of fully digital storage, under the promise of more natural and rapid access. The promise is easier to envision than to achieve.

The difficulty stems from the inherently unstructured and non-uniform nature of multimedia. Traditional databases operate by imposing a rigid structure on their contents. This succeeds in many business domains, but at a high cost: your information must suit the database schema, not the other way around. The elements considered most deserving of the term multimedia—sound and video—are often treated as an undifferentiated stream of bytes. When editing and retrieval tools are provided, they are typically just digital metaphors of their analog counterparts: a CD player interface for music, a VCR interface for video, and so on. As with the real equipment, access is primarily linear. Offering fast-forward and fast-reverse helps, but barely touches the potential of random digital control. Attempts to structure and index multimedia, and thereby provide random access, have depended on simplification: either by restriction to a narrow and inherently well-structured domain (e.g. news broadcasts [13]), or by restriction to a single information channel (e.g. image recognition [1], [10]).

Multimedia retrieval is a hard problem. The information is diverse and challenges—even confounds—any single indexing technique. In our opinion, only a collection of complementary techniques will truly suffice, with the strength of one compensating for the weakness of another. Our goal is to unearth some promising candidates.

BACKGROUND AND PERSPECTIVE

A recent trend in database research, as exemplified by IBM's Query by Image Content project, is retrieval based on visual information. While this is an appealing research area, it does not address the core problem in multimedia information retrieval. For the purpose of information retrieval, we believe that the most important aspect of multimedia is not the images. It is the words—what people say. More precisely, it is what people mean—the ideas, concepts, topics, and themes that arise from human discourse. A computer program that can extract abstract meaning is, of course, virtually impossible. Even moderate success is a struggle, as the Informedia project [4] has discovered. However, even moderate success can still have an enormous impact. Being able to index meetings, for example, can turn videoconferences into storehouses of organizational memory [12].

Our previous work on multimedia indexing was embodied in a system called Jabber [7]. We chose videoconferencing and recorded meetings as our target domain due to their popularity and free-form nature.

Figure 1: Jabber's user interface, showing a composite query

The Jabber effort has concentrated in two main areas: indexing based on audio content, and the presentation of this information to the user. In particular, Jabber operates on the following tenets:

• Combining multiple indexing techniques is more powerful than relying on just one.
• Speech provides the most useful information, especially the spoken words. Esoteric technologies (e.g. facial and hand gesture recognition) are at most supplementary.
• The contents should be presented visually, providing a summary of the material and locating relevant points.
• Searching and browsing should be presented to the user as a unified task, without abrupt changes in user interaction.

Jabber analyzes the audio streams of videoconferences from two perspectives: as simply a pattern of sounds, and as a sequence of words (thus requiring speech recognition). The fragility of speech recognition in the presence of noise is partially compensated by concept clustering, our primary indexing technique. We use concept clustering to summarize the textual contents, which helps integrate the user tasks of browsing and querying multimedia meetings. Other researchers have made similar attempts to use concepts as an indexing technique, but their success has been

limited, owing to difficulties in word sense disambiguation [11], or to their reliance on manually constructed semantic representations [3]. Jabber suffers from neither of these limitations, as will be shown.

JABBER DESCRIPTION

The indexing of videoconferences has been addressed in Jabber in four ways: 1) indexing by what people say, 2) indexing by people's interaction patterns, 3) indexing by human-prepared meeting agendas, and 4) indexing by the participants' use of a shared application [7]. The last two forms of indexing do not pose any major problem and are not emphasized in our work. The more difficult first two have been studied in detail.

User Interaction

Jabber's unified browsing/searching user interface is shown in Figure 1. The interface consists of a time line, a set of topics recognized by ConceptFinder, and a querying facility.

The upper pane of the interface contains the time line view, which displays all the recognized words uttered by the various speakers in the videoconference in parallel horizontal lines. Each speaker's time line is adjacent to their iconic image. Each of the recognized words is a selectable button on the time line; when selected, the stored multimedia information (audio and/or video) is replayed from just before the point in time that word was uttered. The time line view also shows the current temporal idiom

(discussed below) as a horizontal bar at the bottom of the pane. Finally, the time range being viewed is displayed, given in seconds from the start of the record.

The lower pane contains the concept—or theme—view. The name of each concept is given in the left-most column, and the words contained within this concept are to its right. For example, the speech concept in Figure 1 contains the words phrase, talk, and saying. These themes are useful for browsing the time line, as follows. When a user clicks on a recognized word from the time line, the stored portion of the meeting is replayed from that point. When a user clicks on a theme button, all words that are thematically related to the selected one are highlighted on the time line.

The user can also choose from a variety of zoom settings. At a high magnification, individual word buttons can be read. At a low magnification, one merely sees the recognized words as tiny unreadable buttons. At first blush, using such a low magnification seems self-defeating: what is the use of a browsing interface if the user can't read the words? The answer lies in the interaction between the theme view and the time line view. When a word is selected from the theme view, its button changes color, and all places where that word appears on the time line are changed to the same color. When a theme word such as speech is selected, all of its constituent words—in this case phrase, talk, and saying—are assigned the same color. In this way, a user can see where themes cluster on the time line.

To see more detail about a particular area of interest (typically a heavily colored area) the user need only zoom in on that area. Furthermore, if a user wants to look at the confluence of several themes, say information, speech, and report from Figure 1, each theme would be selected and assigned a unique color. The user's attention is drawn to areas that are heavily colored, indicating a concentration of words related to the topics of interest.

The user can also, at any time, execute a query of Jabber's indexes, either temporal idioms or concepts. The result of a query is identical to the result of browsing: part of the time line is colored, indicating the "hit set". In this way, Jabber attempts to blur the distinction between searching/querying and browsing. Both searching and browsing are organized around, and return the user to, a time line view of the meeting. Each indexing technique has its own view for browsing, annotates the time line, and has its own querying mechanism. In addition, each index can be viewed and browsed in combination with other indexes, as the search window in Figure 1 shows.

Temporal Idioms

Meetings are more than discussion topics and agenda items. They contain salient patterns of interpersonal discourse; we call these common patterns "temporal idioms". These idioms can be identified by examining the interaction among the meeting participants over time. The identification of these idioms presents an intriguing new view on a meeting, allowing indexing in terms of how people interacted, regardless of what they said. We determine temporal idioms by detecting periods of

speech and pauses in an audio sample, completely ignoring the spoken content. This information is captured as another kind of index of meeting activity, and is reflected in a temporal idiom window (not shown here), and as another line in the time line view. In this line, each idiom is assigned a unique color. The user can thus see, at a glance, the topic of conversation along with the conversational idiom.

We use the following kinds of information to make this determination: how long the current speaker has been speaking, how frequently the floor changes, and whether there is overlapped speech. Once we have this information we can identify patterns of individual speech and of combinations of speakers. For example, we have identified the following conversation types, purely by their patterns of interaction (a toy classifier is sketched at the end of this subsection):

• Presentation
• Question and Answer
• Discussion
• Argument
• Agreement

Not only are temporal idioms useful as an indexing technique for later meeting retrieval, but, as we have noted elsewhere [7], changes in idioms occur at decision points in a meeting. For example, if a long presentation by one speaker is followed by a discussion, which is followed by a short question and answer period (e.g. "Do you agree with the proposal?"), then this indicates that a consensus is being reached. The ability to locate decision points is a great asset, as it helps locate "important conversation."
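To make the use of these features concrete, the following is a minimal rule-based sketch, not Jabber's actual algorithm: the feature names and thresholds are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class WindowFeatures:
    # Features measured over a sliding window of the audio streams
    # (all names and units are illustrative assumptions).
    mean_turn_length: float   # average seconds a speaker holds the floor
    floor_changes: int        # number of speaker changes in the window
    overlap_ratio: float      # fraction of time with overlapped speech

def classify_idiom(f: WindowFeatures) -> str:
    """Toy rule-based idiom classifier; thresholds are placeholders."""
    if f.mean_turn_length > 60 and f.floor_changes <= 1:
        return "Presentation"
    if f.overlap_ratio > 0.3:
        return "Argument"
    if f.mean_turn_length < 5 and f.floor_changes > 10:
        return "Agreement"             # rapid short affirmations
    if f.floor_changes > 5 and f.mean_turn_length < 20:
        return "Question and Answer"
    return "Discussion"

print(classify_idiom(WindowFeatures(90.0, 1, 0.02)))  # -> Presentation
```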

Lexical Conglomerates: Chains, Trees, Clusters

Even if perfect speech recognition were available, simply recording the words spoken during a conversation does not make for a meaningful index. What is really needed is an index of topics (or concepts, or themes), not spoken words.

To generate topics, we have used a technique known as "lexical chaining" [8]. Lexical chaining attempts to group words together into coherent sets, using lexical relations between the words as a guide to their relationships. Word relations are derived from WordNet, a sophisticated 90,000-word on-line thesaurus. WordNet maps words onto a large network of synonym sets (synsets for short). Each synset corresponds to a concept. These synsets are in turn linked as synonyms and antonyms, and through a richer set of relations including part-whole, sub-type, super-type, entailment, and causality [2].

The simplest lexical conglomerate is the chain: a one-dimensional group of linked words organized according to order of insertion. Using part-whole relations, an example of a chain is: cupboard-kitchen-house. In an earlier version of Jabber we introduced the construct of lexical trees. For example, refrigerator and oven would be linked beneath kitchen. We found that trees of relationships better capture the complexity of human discourse: any given word in a sentence might be related to numerous other words.
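As an illustration of the kind of lookup that lexical chaining relies on, the sketch below uses NLTK's WordNet interface (our assumption; the original system accessed WordNet directly) to test whether two nouns are linked by a synonymy, is-a, or part-whole relation.

```python
from nltk.corpus import wordnet as wn

def related(word_a: str, word_b: str) -> bool:
    """True if any noun senses of the two words are linked by synonymy,
    a hypernym/hyponym (is-a) relation, or a part-whole relation."""
    for sa in wn.synsets(word_a, pos=wn.NOUN):
        for sb in wn.synsets(word_b, pos=wn.NOUN):
            if sa == sb:                      # synonyms share a synset
                return True
            neighbors = (sa.hypernyms() + sa.hyponyms() +
                         sa.part_holonyms() + sa.part_meronyms())
            if sb in neighbors:
                return True
    return False

# Test links along the chain cupboard-kitchen-house from the text;
# whether a direct link exists depends on the WordNet version in use.
print(related("cupboard", "kitchen"), related("kitchen", "house"))
```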

Having produced these lexical trees, we use them to build indexes back into the recorded information streams. These are, however, not just indexes of keywords; they are indexes of topics (or themes, if the topic reoccurs). To create these indexes, we simply record the time at which each word was uttered, and store this time with the word in its lexical tree. Once such indexes are created, we can support topic-based retrieval.

However, in generating the themes of a meeting we encountered three problems:

• The topics represented by the trees tended to "wander". For example, consider the chain tomorrow - today - leave - stretch - chew. It is clear that tomorrow and today belong together. Leave and stretch both belong if one considers their temporal meanings (e.g. a stretch of time), but chew is included because it relates to stretch only in the sense that both cause a change of state or appearance.
• Due to the enormous number of possible connections, building lexical trees was too slow for real-time analysis.
• We had no way of naming the theme that a tree represented; they were just collections of related words. Thus, there was no easy way to summarize and present the information they represented.

Concept clustering is our solution to these problems.

THEME-BASED RETRIEVAL

In contrast to the current state of practice—keyword-based retrieval—we are interested in concept-based and theme-based retrieval. We want to achieve this for several reasons.

The most important reason is that people seldom use identical descriptions; in particular, the key words supplied by information providers may differ greatly from the query terms tried by information seekers. This is known as the "vocabulary problem". In Furnas' landmark study [6], he noted that people were highly variable in the terms that they chose to describe the same phenomena. In one study of names for editorial operations, for example, the amount of term overlap was only 7%. This problem is solved if we can identify concepts, and use these instead for indexing and retrieval. For example, if a user is searching for information relating to the lynx, that user might be happy to have information about the bobcat returned. A keyword search cannot make such an association, and statistical approaches to information retrieval can do so only if the words lynx and bobcat consistently appear in close proximity to each other. If authors consistently use either one word or the other, but not both, no such statistical relation will be made.

Similarly, we want to be able to distinguish between different senses of the same word. A user who makes a query about the apple as a food presumably does not want to see references to applets, New York City, Apple Computer, Apple Records, or Apple Valley, Minnesota.

Another related reason is the ability to satisfy "near" queries, or to aid in query reformulation. The user interested in the lynx might also be interested in seeing information about other large cats, or other carnivorous animals of mountain forests. The user interested in apples might also be

interested in knowing about pears, melons, or ugli fruits. Thus, we need not only to cluster words into concepts, but to understand the relationship, and the distance, between different concepts.

If we can define themes appropriately, and can define meaningful measures of the distance between themes, then we can support the notion of concept-based browsing. We can also provide meaningful feedback to a user who retrieves either too much or too little (for example, the apple query run on a World-Wide Web search engine retrieved 400,000 Web pages). We can provide the user with information about what other nearby concepts exist, enabling them to either expand or contract the search space.

An Example of Theme-Based Retrieval

To provide a concrete example, consider the following five sentences. These indicate the kinds of problems that plague keyword-based retrieval, and the kinds of discriminations that we need to support sensible theme-based retrieval.

1. With a diet consisting of meat in addition to berries, the family ursidae belongs to the order carnivora.
2. The Brown Bear ranges across the coniferous forests of North America, even into the territory of the closely related Alaskan Brown Bear.
3. The market value of common metals, including copper and zinc, is continuing to drop due to a prolonged bear market.
4. If you can bear to hold on longer, don't sell your shares yet, as the nickel market is expected to turn from bear to bull.
5. The penny coin is our smallest unit of currency, equal to 1/100th of a dollar. The next most valuable coin is the nickel.

Suppose you issue the query bear, intending the animal sense. A keyword search will correctly hit the second sentence, incorrectly hit the third and fourth, and incorrectly miss the first. A query for nickel (in the metal sense) would similarly produce false hits and false misses. It is reasonable that the third sentence should be returned for this query, since the subject is common metals, even though it doesn't contain the word nickel.

Keyword-based searching is improved when multiple terms are provided and combined using Boolean operators. However, this works only when the specified terms are found in close proximity. We envision a system that can suggest reasonable alternatives in the face of an empty hit list (e.g. "I can't find any mention of iron, but perhaps the metals copper, zinc, and nickel may be of interest."). To support this we need to identify the sense of a word as it appears in the voice stream, and relate it to similar words.

From the five example sentences, we would ideally like to see the following senses grouped together:

Brown Bear, Alaskan Brown Bear, ursidae
metal, nickel, zinc, copper
nickel, penny, coin, currency

bear, bull, market

We want to distinguish the stock market senses of bear and bull from the animal senses of these words. Likewise, we want to distinguish between nickel as currency and nickel as a metal. Here are the concept clusters we get after running the example sentences through ConceptFinder:

1. currency: penny, dollar, nickel, coin, currency
2. metal: nickel, copper, zinc, metal
3. brown_bear: alaskan_brown_bear, brown_bear
4. investors: bull, bear
5. activity: market, share, turn, hold

Notice that we are not only able to cluster related concepts, but also to attach a meaningful label to each. These labels summarize the constituent clusters and indicate themes present in the text. Second, we are able to disambiguate different senses of the same word. We distinguish between the metal and currency senses of nickel, and do not consider bear to be an animal. Both bear and bull are properly categorized as investment-related terms, rather than as species of mammals. The ability to distinguish word senses means that the topic label attached to alaskan_brown_bear and brown_bear can be extremely specific: both are Brown Bears. If the word bear were incorrectly included, then the label would be more general, along the lines of bear or placental mammal.

Finally, all of the synsets in WordNet exist within a single is-a hierarchy, and all the concepts in a cluster have an is-a relation with the topic label. This allows us to compute the relationship between any two concepts, and thus provide an approximate measure of similarity. Such a measure of similarity provides the foundation for concept-based queries.
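As a rough sketch of such a similarity measure (again over NLTK's WordNet rather than our NounNet), the code below takes the best path similarity over all noun sense pairs, so that lynx and bobcat score as close relatives while unrelated concepts do not.

```python
from nltk.corpus import wordnet as wn

def concept_similarity(word_a: str, word_b: str) -> float:
    """Approximate similarity between two nouns, based on the shortest
    is-a path connecting any pair of their senses (1.0 = same concept)."""
    best = 0.0
    for sa in wn.synsets(word_a, pos=wn.NOUN):
        for sb in wn.synsets(word_b, pos=wn.NOUN):
            sim = sa.path_similarity(sb)
            if sim is not None and sim > best:
                best = sim
    return best

print(concept_similarity("lynx", "bobcat"))    # high: near neighbors in the is-a tree
print(concept_similarity("lynx", "currency"))  # low: distant concepts
```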

CONCEPTFINDER DESCRIPTION

The integral role of WordNet in concept-based indexing should be apparent from the above discussion. In fact we use only NounNet, our customized version containing only nouns connected via is-a relations (or, more properly, hypernym/hyponym relations). Adverbs and adjectives have little independent semantic content; they serve only as modifiers. While verbs frequently do carry semantic content, the noun form of a verb almost always has the same semantics as the verb from which it is derived. Thus, by restricting ourselves to nouns, we simplify our problem while sacrificing little semantic content.

Using NounNet as a reference, ConceptFinder proceeds through eight stages in identifying and clustering words, briefly explained below.

Noun and Noun Phrase Identification

The first stage is to identify nouns and noun phrases in the transcript. English is not an easy language to parse: the same word may belong to different parts of speech. For example, range can be a noun (as in mountain range) or a verb (to cover broadly), depending on where in a sentence it is located.

In lieu of a full lexical parser, we err on the conservative side, indexing any word that might be a noun. Priority is given to word collocations that form noun phrases. In our example, "alaskan brown bear" is interpreted as a single compound noun phrase, rather than as adjective-adjective-noun.
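A minimal version of this stage can be approximated with an off-the-shelf part-of-speech tagger. The sketch below uses NLTK (an assumption; the original system predates it), keeping any word tagged as a noun and grouping adjacent nouns into candidate noun phrases.

```python
import nltk  # requires the NLTK 'punkt' and POS-tagger data packages

def candidate_nouns(sentence: str) -> list[str]:
    """Extract likely nouns and adjacent-noun collocations.
    Conservative: anything tagged NN, NNS, NNP, or NNPS is kept."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    phrases, current = [], []
    for word, tag in tagged:
        if tag.startswith("NN"):
            current.append(word.lower())
        else:
            if current:
                phrases.append("_".join(current))  # e.g. brown_bear
                current = []
    if current:
        phrases.append("_".join(current))
    return phrases

print(candidate_nouns("The Brown Bear ranges across the coniferous forests."))
```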

Sense Narrowing

The mapping between a word (as represented by a string of characters) and a concept (as represented by a synonym set) is many-to-many. A word may have different senses; a concept may be denoted by different words. Indeed, the etymology of English is fascinating, and its many variations enable subtle, poetic expression. But to a computer scientist it is a headache.

For each detected noun, therefore, the first step is to make an educated guess as to what it means. We call this operation sense narrowing. If a word maps onto only one NounNet synset, the task is trivial. Otherwise, ConceptFinder makes a judgement based on the previous two and following two nouns. For each possible meaning of the candidate word, ConceptFinder tabulates the is-a distance between it and the surrounding words. The sense yielding the minimum distance is assumed to be the current one. In sentence 3 above, for example, the candidate word copper is considered in the context of {value, metals, copper, zinc, drop}. Here, the surrounding usage effectively rules out the sense of copper as a synonym for penny; the distance between the two is much longer than the copper is-a metal relationship.
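A sketch of this disambiguation step, again using NLTK's WordNet as a stand-in for NounNet: for each sense of the candidate, sum the shortest is-a distances to the senses of the surrounding nouns, and keep the sense with the smallest total. The helper name and the no-path penalty are our inventions.

```python
from nltk.corpus import wordnet as wn

def narrow_sense(word: str, context: list[str]):
    """Pick the noun sense of `word` closest (in is-a distance) to its
    neighboring nouns: the two before and two after in the transcript."""
    best_sense, best_score = None, float("inf")
    for sense in wn.synsets(word, pos=wn.NOUN):
        score = 0.0
        for neighbor in context:
            # Shortest is-a path to any sense of the neighboring noun.
            dists = [sense.shortest_path_distance(ns)
                     for ns in wn.synsets(neighbor, pos=wn.NOUN)]
            dists = [d for d in dists if d is not None]
            score += min(dists) if dists else 20  # penalty for no path (assumed)
        if score < best_score:
            best_sense, best_score = sense, score
    return best_sense

# Sentence 3: copper in a metals context resolves to the metal sense.
print(narrow_sense("copper", ["value", "metal", "zinc", "drop"]))
```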

Word Domain to Concept Domain Mapping

Once the meaning of a word is narrowed down to one sense, we can relate it to a single NounNet synset. Each synset is assigned a unique identifying number. These numbers are our search keys. Each key maps onto the appropriate words in the text stream, including the speaker and time at which the words were spoken. In this way ConceptFinder translates the original text from the word domain to the concept domain. All browsing and query operations are performed in the concept domain before converting back, when the results are expressed in terms of the original speech streams (the time lines).
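Concretely, the concept-domain index can be pictured as a map from synset identifiers to utterance records. The form below is a hypothetical minimal version, using WordNet's database offset as the identifying number.

```python
from collections import defaultdict
from nltk.corpus import wordnet as wn

# synset id -> list of (speaker, time in seconds) occurrences
concept_index: dict[int, list[tuple[str, float]]] = defaultdict(list)

def index_utterance(synset, speaker: str, time_s: float) -> None:
    """Record that a word mapping to `synset` was spoken at `time_s`."""
    concept_index[synset.offset()].append((speaker, time_s))

metal = wn.synset("copper.n.01")
index_utterance(metal, "speaker_2", 1312.5)

# A concept query looks up the key and returns time-line positions.
print(concept_index[metal.offset()])  # -> [('speaker_2', 1312.5)]
```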

Construction of Concept Domain Subtree

Though a few cycles do exist in NounNet, for the most part the synsets are organized into a large tree structure based on is-a relations. (Example: brown bear is-a bear is-a placental mammal . . . is-a life-form is-a entity.) Each node higher up the tree specifies a more general concept. Associated with each synset node is a reference counter, initially set by ConceptFinder to zero. As words are processed and mapped into the concept domain, the corresponding counter is incremented. The value indicates the number of references to that concept found in the text. Additionally, the relations between all such synsets are explicitly stored. The result is a subtree of the entire NounNet structure, reflecting the semantic content of the current source text.
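Under the same NLTK stand-in assumption, the subtree construction might be sketched as follows: each observed synset increments its own counter, and the is-a links toward the root are recorded so the induced subtree can be reconstructed.

```python
from collections import Counter
from nltk.corpus import wordnet as wn

ref_count: Counter = Counter()   # synset -> number of textual references
edges: set = set()               # stored is-a relations among seen synsets

def add_reference(synset) -> None:
    """Count a reference and record the is-a path toward the root."""
    ref_count[synset] += 1
    node = synset
    while node.hypernyms():
        parent = node.hypernyms()[0]   # follow one is-a path (a simplification)
        edges.add((node, parent))
        node = parent

for word in ["penny", "nickel", "dollar"]:
    add_reference(wn.synsets(word, pos=wn.NOUN)[0])

print(ref_count.most_common(3))
```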

Subtree Density Calculation

We are not just interested in the popularity of a synset, but in its relatedness to nearby synsets. Groups of related concepts indicate recurring themes, especially when discussed throughout a meeting. To locate such "hot spots" ConceptFinder assigns each node a density value. The density is determined by the presence and popularity of nearby nodes (as determined in the previous step), with the closest nodes contributing the greatest weight. To make comparisons fair, the density is normalized against the number of possible connecting relations. Popular concepts have a relatedness density near one; unrelated concepts have a density of zero.
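The density computation might look like the sketch below: each node's score sums distance-weighted presence of referenced concepts within a small is-a radius, normalized by the total possible weight. The radius and weighting scheme are our inventions for illustration.

```python
from collections import Counter
from nltk.corpus import wordnet as wn

def neighbors_within(synset, radius: int = 3):
    """Return (synset, distance) pairs reachable via is-a links."""
    seen = {synset: 0}
    frontier = [synset]
    for dist in range(1, radius + 1):
        nxt = []
        for node in frontier:
            for adj in node.hypernyms() + node.hyponyms():
                if adj not in seen:
                    seen[adj] = dist
                    nxt.append(adj)
        frontier = nxt
    return [(s, d) for s, d in seen.items() if d > 0]

def node_density(synset, ref_count: Counter) -> float:
    """Distance-weighted presence of referenced concepts nearby,
    normalized by the total possible weight (value falls in [0, 1])."""
    weighted = total = 0.0
    for other, dist in neighbors_within(synset):
        w = 1.0 / dist                    # closest nodes contribute most
        total += w
        if ref_count[other] > 0:
            weighted += w
    return weighted / total if total else 0.0

penny = wn.synsets("penny", pos=wn.NOUN)[0]
refs = Counter({penny: 2, wn.synset("coin.n.01"): 1})
print(round(node_density(penny, refs), 3))  # value depends on sense choice
```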

Cluster Extraction Around Density Peaks

Now that each synset in the concept domain subtree has a density value, they can be ranked from "hot" to "cold". In our running example, penny is ranked first and hold is ranked last. ConceptFinder examines each synset in order, grouping similar concepts together into clusters. Five clusters are formed in our example.

The behavior of ConceptFinder is controlled by a pair of threshold values. These determine whether to add a synset to an existing cluster, to begin a new cluster, or to remove it from consideration. Loose thresholds result in one giant cluster representing the entire text; tight thresholds result in a collection of singletons. Neither is particularly useful; somewhere in between is better. To scale from small samples to large, ConceptFinder must adapt the threshold values to the source material.
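A greedy sketch of this stage under our assumptions: synsets are visited in decreasing density order; each joins the nearest existing cluster if within the join threshold, seeds a new cluster if dense enough, and is otherwise discarded. Both threshold values are placeholders, not the tuned values used by ConceptFinder.

```python
from nltk.corpus import wordnet as wn

def extract_clusters(ranked, distance, join_threshold=3, keep_density=0.2):
    """Group synsets into clusters around density peaks.

    ranked:   list of (synset, density) pairs, sorted hot to cold
    distance: function giving the is-a distance between two synsets
    """
    clusters = []
    for synset, density in ranked:
        if density < keep_density:
            continue                      # too cold: drop from consideration
        best, best_d = None, join_threshold + 1
        for cluster in clusters:
            d = min(distance(synset, member) for member in cluster)
            if d < best_d:
                best, best_d = cluster, d
        if best is not None and best_d <= join_threshold:
            best.append(synset)           # close enough: join existing cluster
        else:
            clusters.append([synset])     # otherwise seed a new cluster
    return clusters

def dist(a, b):
    d = a.shortest_path_distance(b)
    return d if d is not None else 99

# Sense choices (.n.01) are illustrative only.
ranked = [(wn.synset(n), 1.0) for n in
          ("penny.n.01", "nickel.n.01", "coin.n.01", "copper.n.01")]
for c in extract_clusters(ranked, dist):
    print([s.name() for s in c])
```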

Topic Summarization by Lowest Common Hypernym

A cluster of concepts (or words) is not easily digestible by a user, especially if it grows to over ten items. So, we want to automatically summarize the topics discussed during a meeting or videoconference. For example, from our sample sentences, a trained editor might suggest "small coinage", "future market value of copper", and "preferences of the brown bear" as topics. It is too much to expect this level of comprehension from a computer program, of course. As a "best effort" algorithm, each concept cluster is summarized by its lowest common hypernym. Both penny and nickel are coins, and the coin and dollar are both forms of currency. Therefore, the summary topic of the first cluster is currency: the least general concept encompassing all the referents of the component synsets. Experience shows that specific topics tend to be the most informative.
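WordNet interfaces expose this operation directly; in NLTK it is Synset.lowest_common_hypernyms(). Because the noun hierarchy is essentially a tree, the label of a whole cluster can be obtained by folding the pairwise operation over its members, as sketched here.

```python
from functools import reduce
from nltk.corpus import wordnet as wn

def cluster_label(cluster):
    """Summarize a cluster by the lowest common hypernym of its synsets."""
    def lch(a, b):
        common = a.lowest_common_hypernyms(b)
        return common[0] if common else a  # fall back if hierarchy is disjoint
    return reduce(lch, cluster)

def noun(word, i=0):
    return wn.synsets(word, pos=wn.NOUN)[i]

# Sense indexes are illustrative; WordNet's sense ordering varies.
cluster = [noun("penny"), noun("nickel", 1), noun("dollar"), noun("coin")]
print(cluster_label(cluster).lemma_names())  # expect a label near coin/currency
```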

Ranking of Concept Clusters

When processing a large text source, the number and size of concept clusters can grow uncomfortably large. Worse, clusters whose members are only loosely related may have excessively vague topics (e.g. "thing" or "entity"). ConceptFinder tries to rank concept clusters according to meaningfulness, promoting those likely to capture discussion themes.

In our example, the first three clusters (and their summarizing topics) are all specific and central to the text. The final two are still relevant, but less specific: they belong lower on the list. Determining a good ranking function is still a matter of investigation. Taking hints from an information theoretic view of WordNet [8], a linear combination of concept relatedness and specificity appears to be fruitful.
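Such a ranking might be sketched as below, scoring each cluster by a weighted sum of its mean member density (relatedness) and the depth of its topic label in the is-a hierarchy (specificity). The weights and normalization are placeholders, not values from our experiments.

```python
from nltk.corpus import wordnet as wn

def cluster_score(cluster, densities, label, w_rel=0.5, w_spec=0.5) -> float:
    """Rank clusters by relatedness (mean member density) and specificity
    (depth of the topic label; deeper labels are more specific)."""
    relatedness = sum(densities[s] for s in cluster) / len(cluster)
    specificity = label.min_depth() / 20.0    # rough normalization (assumed)
    return w_rel * relatedness + w_spec * specificity

coin = wn.synset("coin.n.01")
cluster = [coin, wn.synsets("penny", pos=wn.NOUN)[0]]
dens = {s: 0.8 for s in cluster}              # stand-in density values
print(cluster_score(cluster, dens, label=wn.synset("currency.n.01")))
```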

RESULTS

ConceptFinder's Efficiency

The Jabber system is efficient enough to perform in real time on PC hardware, annotating a videoconference as the conferees interact. This means that all parts of the system are able to operate in real time. This is a challenge for a system that is dealing with a potentially exponential explosion in the number of checks that it must make: as a conversation grows, a topic that a participant mentioned at the start may be reprised 40 minutes later. It turns out that a second critical use for our threshold values is to control the "generosity" of ConceptFinder—the willingness of the system to consider more distant, tenuous relations. By trial and error, we have found what we consider to be acceptable threshold values. For example, the content of Figure 2 was derived from a videoconference of five participants that lasted 51 minutes. Running on a fast 486 PC, ConceptFinder indexed the meeting transcript in 20 minutes, less than 50% of its duration. (The cost of speech recognition is not included in this figure.)

Using Generated Concepts

Figure 2: The "gathering" concept mapped by ConceptFinder onto a meeting transcription

Meetings, being social and real-time occasions, often wander. So, the topics that one reconstructs from the transcript of a meeting are less "tight" than those found in written texts. Still, we are able to derive meaningful topics from a meeting. Figure 2 shows an example from a recorded videoconference. The participants are computer network administrators. Roughly, the agenda was "priority handling of work requests", though discussions meandered and lacked an easy-to-index vocabulary, being full of jargon and project-internal references.

The Jabber screen shot of Figure 2 relates ConceptFinder results to the original spoken text. As previously described, in Jabber's user interface, the concept names are shown below the time line, on the left. The concept cluster—the related words—is shown immediately to the right of the concept name. For example, in Figure 2, the topic position is shown, consisting of the words third, second, priority, level, and tier. Because the speakers were trying to decide how to handle requests from users, the issues of positioning and prioritization kept recurring.

The highlighted concept gathering has been selected by the user in this example. Because of this selection, gathering has been assigned a currently unassigned color—red in this case—and all places in the time line where gathering occurs are also colored red. In addition, the entire concept cluster has been assigned the color red, and so all occurrences of the other words in the cluster—staff, faculty, and meeting—are also colored red in the time line. Querying on the gathering concept would have exactly the same effect on the time line.

DISCUSSION

Concept-based querying and browsing has enormous potential for both multimedia-based and traditional information retrieval domains. ConceptFinder shows evidence of providing robust and subtle distinctions among words that are superficially similar, but which represent distinct concepts.

We have a number of research areas that we would like to pursue to make ConceptFinder's promise more real. First of all, we need to perform rigorous benchmarking, comparing ConceptFinder's performance with keyword-based and statistical methods of information retrieval. Our initial, informal results, however, are very encouraging. In addition, we are preparing to test ConceptFinder on newsgroup postings, another informal social form of communication.

ConceptFinder's success is limited by some external factors, principally the accuracy and coverage of WordNet. WordNet's coverage is not uniform: a casual user is often surprised by both the depth and the shallowness of its coverage (in different areas, of course). Similarly, by relying on speech recognition to provide the raw text upon which ConceptFinder works, we inherit additional mistakes in terms of which words are recognized. However, the subtree density calculation helps lessen the impact of improperly recognized words. One fact works in our favor to overcome these difficulties: we are interested in indexing multimedia resources by concepts, not individual words. So, even if a large percentage of the original words are not recognized, an important concept will still be recognized, because words relevant to that concept will be uttered frequently in the meeting.

Additionally, we intend to incorporate more sophisticated lexical parsing as a front end to the system, so that we can make more secure guesses as to when a word is a noun, and to help identify conversation-specific noun phrases that are not included in any dictionary. Another worthwhile task is to compare the distances separating words in WordNet against our intuitive sense of their semantic separation, but this represents an enormous undertaking in the field of psycholinguistics.

CONCLUSIONS

We have presented a system, Jabber, for interacting with, analyzing, and retrieving multimedia records of human-human conversations. Jabber is intended to aid a user both in real-time understanding, as the meeting is in progress, and in post-facto retrieval from a database of meetings. Our objective is not merely to index the contents of a meeting, but to automatically identify the "interesting bits" and make them readily accessible. Jabber is organized around the idea that the user should not have to deal with any of the computational aspects of multimedia, but rather should be able to browse and query the meeting using human-relevant concepts: principally the style of the conversation and the topic being discussed.

This paper has concentrated on showing how we compute and present the topics found in a conversation. Our early results with ConceptFinder are encouraging. The system is able to make refined distinctions among different senses of the same words, and is able to summarize a set of related words. We know of no other system that can do this without substantial human intervention; ConceptFinder requires none.

ACKNOWLEDGEMENTS

We would like to thank Iain Little for his work on Jabber's user interface, Reem Al-Halimi for her work on LexTree, Stewart Chao for his work on temporal idiom recognition, and Marilyn Mantei and William Hunt for their work on earlier versions of Jabber.

REFERENCES

[1] S. Adali, S. Candan, S. Chen, K. Erol, V. Subrahmanian, "The Advanced Video Information System: data structures and query processing", Multimedia Systems, 4(4): 173-186, 1996.

[2] R. Beckwith, C. Fellbaum, D. Gross, G. Miller, "WordNet: A lexical database organized on psycholinguistic principles", in U. Zernik (ed.), Lexical Acquisition: Exploiting On-line Resources to Build a Lexicon, Lawrence Erlbaum, Hillsdale, NJ, 211-232, 1991.

[3] A. Chakravarthy, K. Haase, "NetSerf: Using Semantic Knowledge to Find Internet Information Archives", Proceedings of SIGIR '95, 4-11, 1995.

[4] M. Christel et al., "Informedia Digital Video Library", Communications of the ACM, 38(4): 57-58, 1995.

[5] W. Cody et al., "Querying Multimedia Data from Multiple Repositories by Content: The Garlic Project", Proceedings of VDB-3, Lausanne, Switzerland, March 1995.

[6] G. Furnas, T. Landauer, L. Gomez, S. Dumais, "The Vocabulary Problem in Human-Systems Communication", Communications of the ACM, 30(11): 964-971, 1987.

[7] R. Kazman, R. Al-Halimi, W. Hunt, M. Mantei, "Four Paradigms for Indexing Video Conferences", IEEE Multimedia, Spring 1996: 63-73.

[8] R. Kazman, R. Al-Halimi, "Temporal Indexing of Video through Lexical Chaining", in C. Fellbaum (ed.), WordNet: An Electronic Lexical Database and Some of its Applications, MIT Press, 1996, to appear.

[9] G. Salton, Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer, Addison-Wesley, 1989.

[10] M. Turk, A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, 3(1): 71-96, 1991.

[11] E. Voorhees, "Query Expansion Using Lexical-Semantic Relations", Proceedings of SIGIR '94, 61-69, 1994.

[12] J. Walsh, G. Ungson, "Organizational Memory", Academy of Management Review, 16(1): 57-91, January 1991.

[13] H. Zhang, S. Tan, S. Smoliar, G. Yihong, "Automatic Parsing and Indexing of News Video", Multimedia Systems, 2(6): 256-265, 1995.