HPQS: Providing Natural Language Access to Multimedia Documents

Ingo Glöckner and Alois Knoll
University of Bielefeld, Faculty of Technology
P.O. Box 10 01 31, D-33501 Bielefeld, Germany
The paper presents the design and prototypical implementation of an integrated retrieval system (HPQS) which provides natural language access to multimedia documents in a given application domain. It supports more flexible ways of querying by combining a semantically rich retrieval model based on fuzzy set theory with domain-specific methods for document analysis which can be applied online (i.e. the search criteria are not restricted to combinations of anticipated descriptors). Emphasis is put on the interplay of the system components, because it is their mutual contribution which makes acceptable response times possible despite the use of online search.
1 Introduction

The number of documents available in electronic form is increasing rapidly. These documents are no longer restricted to the textual communication medium but utilize a number of other formats, e.g. figures, tables, images, image sequences and elements of other media, which are typically connected through hyperlinks. Although these elements can be viewed as documents in their own right (after all, today's search engines treat each URL as a document reference), it is more adequate to capture the structure imposed by the hyperlinks, which lets these elements appear as a single, integral multimedia document that can be navigated. The document's elements (subdocuments) are thus related to each other, both within documents and across several associated documents. Today's search services do not do justice to these characteristics of the document bases in the world-wide web (WWW). Users of these services must often accept the following restrictions:
general-purpose search engines for fulltext search typically yield low precision;
the mutual contribution of a document’s parts to its content are not considered, nor are links across associated documents (e.g. hypertext links);
user-interfaces based on formal query languages pose technical problems to “average” or sporadic users. On the other hand, the search interfaces of popular WWW search engines are so simplistic that it becomes difficult to express search criteria efficiently;
retrieval of non-textual documents or document parts is either not possible or requires a manual annotation with metainformation about the document's content.
The traditional retrieval techniques, which often originate from the sixties, are meeting their limits in these new multimedia-based applications. The reviving interest in information retrieval is therefore mainly focused on retrieval problems which result from the use of non-textual or multimedia documents. In the evolving field of content-based image retrieval [2, 5, 6, 14], images are typically analysed for simple features like structure, distribution of colour (histograms or correlograms), texture etc., which all correspond to the signal level of analysis. These methods – comparable to string matching in text retrieval – are not restricted to specific domains and can be considered fully generic. However, the results obtained are not yet comparable to those of text retrieval.
The low filter quality of today's generic techniques for multimedia retrieval prevents the extension of WWW search engines to general-purpose search services for all media of relevance to the WWW. It suggests another strategy for building high-quality search services for multimedia documents, namely that of combining a substrate of generic methods for document description and information aggregation with domain-specific methods which are tailored to specific requirements on the search service in a chosen field of application. The current concept of broad-coverage search engines, which typically try to index the whole document base of the WWW, is thus contrasted with that of a search service specialized to a topic area of general interest – such as weather, geography, sports, or vacation – which provides search facilities on a new level of quality in this area. These considerations lead to the following profile of a "high performance query server" (HPQS):
natural language interface, in order to help sporadic users to formulate their search interest;
on-line search of the document base under the user query: the complex modes of querying by natural language cannot be anticipated through pre-computed descriptors and call for the application of direct-search methods;
scalability: acceptable response times must be ensured even for large data sets;
evaluation, interpretation and aggregation of information items extracted from different sources by applying methods of information aggregation and data fusion;
improved result presentation: generation of result documents depending on the user's information need.
In order to ensure the integrity of the document base, to guarantee the desired high availability, and to provide the required efficient access (e.g. for high-resolution colour images), a service of this type should make use of local data storage (or data mirroring). Another simplifying assumption is the restriction to a given (but in principle arbitrary) application domain.
2 Application domain

For the prototypical application, we have chosen meteorological (weather information) documents. These are available in all media of interest and can be acquired from the WWW. The range of meteorological documents comprises:
textual weather reports (ASCII and HTML);
time series of ground temperature and other meteorological indicators (graphics);
data from meteorological measurement stations (tables);
satellite images and weather maps (colour images);
animated satellite images and TV weather reports (image sequences);
spoken weather reports (audio).
Query types in this application scenario include the following:
What is the weather like in Bielefeld?
Is it more often rainy on Crete than in southern Italy?
Show me pictures of cloud formation over Bavaria!
In which federal states of Germany has it been humid but warm last week?
Where have temperatures been highest yesterday?
How many days was it sunny in Berlin this month?
Show me the average temperature readings in Germany for July!
Show me weather maps associated with the last weather report!
3 System architecture
Fig. 1 displays the architecture of the HPQS system. The user interacts with the system via a graphical user interface (Java applet); queries are typed into a query mask using the keyboard (spoken natural language input is not yet supported). The morphological and syntactical analysis is carried out by the natural language interface, which generates a semantic representation of the query content. This representation is purely declarative, i.e. not directly executable. The subsequent retrieval module therefore applies domain-specific transformation rules which translate the declarative representation into a sequence of executable database queries. These trigger the generic evaluation and aggregation functionality as well as the additional application methods. Execution of the generated database queries is controlled by the multimedia mediator, which optimizes response times by maintaining a cache for storage and reuse of intermediate search results. The use of a parallel media server coupled with dedicated high-speed VLSI processors for image and text search ensures acceptable response times even when a computationally expensive online analysis of the mass data has to be carried out.

Fig. 1: Architecture of the HPQS system (components: graphical user interface; natural language interface with morphological analysis, parsing and construction of semantics, supported by a computer lexicon; retrieval module with transformation rules and thesaurus, performing the ISR -> FRR -> OQL transformation and iterative request processing; query mediator with query optimisation, metadata management, result caching and system security; server gateway; parallel media server with request handlers, generic and domain-specific methods, computer vision algorithms, document postprocessing, and VLSI search processors for text (regular expressions, edit distance computation) and images)

4 The Natural Language Interface NatLink

The task of the natural language interface (NatLink) is to construct the internal semantic representation (ISR) from the natural language query. ISR is a linearized variant of multilayered extended semantic networks (MESNET graphs, [13]). Grammatical analysis in NatLink is based on word-class controlled functional analysis (WCFA, [12]), a word-based, expectation-oriented approach to syntactic description suited for robust incremental parsing. The hierarchical organisation of the lexicon profits from multiple inheritance with defaults, which yields condensed, virtually redundancy-free lexicon entries—and a corresponding drop in lexicon development and maintenance effort. NatLink offers broad syntactic and lexical coverage of the chosen application domain.

The robustness aspect is crucial to the acceptance of a natural language interface. NatLink gains a certain tolerance against ungrammatical input by its use of an analysis grammar (as opposed to more restrictive generation grammars); for example, it supports a rather free constituent order. Enforcement of grammatical constraints like agreement can be temporarily relaxed whenever appropriate. NatLink also handles the important class of elliptical queries, i.e. incomplete sentences such as bare nominal phrases. The problem of unknown words is alleviated by semantic decomposition of compound nouns, which results in a broader coverage of the base lexicon. Remaining unknown words are ignored in parsing; in this case, the incremental parsing process generates the ISR of the partial analysis result. Finally, standard techniques of spelling correction can be plugged into NatLink, increasing its robustness with respect to spelling errors or typographical variants.
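Purely for illustration, a query such as "Show me pictures of cloud formation over Bavaria!" can be pictured as a small graph of concept nodes connected by typed relations. The Python structure below is a schematic stand-in, not actual ISR/MESNET notation; the node and relation names are invented.

# Schematic, invented representation of the query content
# "Show me pictures of cloud formation over Bavaria!"
# (not ISR/MESNET syntax; names are illustrative only).
query_graph = {
    "nodes": {
        "n1": {"concept": "picture", "focus": True},     # the requested objects
        "n2": {"concept": "cloud_formation"},
        "n3": {"concept": "Bavaria", "sort": "region"},
    },
    "edges": [
        ("n1", "shows", "n2"),         # pictures showing cloud formation
        ("n2", "located_over", "n3"),  # ... over Bavaria
    ],
}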
5 The Retrieval Component
The retrieval component of the HPQS system utilizes a formal retrieval representation (FRR) which combines generic FRR methods (search techniques for documents of all relevant media and methods for information aggregation and fusion) and domain-specific methods (which implement domain concepts). The FRR is syntactically identical to ODMG-OQL (Object Query Language); the FRR functionality is provided by generic and application-specific classes in an ODMG database schema. The task of the retrieval component is thus to translate the declarative ISR as generated by the NLI into a sequence of executable database queries which conform to the FRR schema. This translation process comprises:
normalisation: mapping of natural-language terms to their database correlates (e.g. names of cities to geographic identifiers);

anchoring in the discourse situation (e.g. resolution of temporal deictic expressions like "today");

default assumptions (e.g. in order to limit the scope of a region search to Germany); a minimal sketch of these steps is given below.
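To make these steps concrete, the following Python fragment sketches how such a preparatory stage might look. It is an illustration only: the table contents, function names and date conventions are invented for this example and are not taken from the actual HPQS transformation rules.

from datetime import date, timedelta

# Hypothetical lookup table: natural-language place names -> geographic identifiers.
GEO_IDS = {"bielefeld": "DE-NW-BIELEFELD", "bayern": "DE-BY"}

def normalise_place(term: str) -> str:
    """Map a natural-language place name to its database correlate."""
    return GEO_IDS.get(term.lower(), term)

def resolve_deixis(expr: str, today: date) -> tuple[date, date]:
    """Anchor temporal deictic expressions in the discourse situation."""
    if expr == "today":
        return (today, today)
    if expr == "last week":
        start = today - timedelta(days=today.weekday() + 7)   # Monday of the previous week
        return (start, start + timedelta(days=6))
    raise ValueError(f"unknown temporal expression: {expr}")

def default_region(region):
    """Default assumption: restrict region searches to Germany if none is given."""
    return region if region is not None else "DE"

# Example: preparing the query "In which federal states of Germany has it been humid but warm last week?"
print(default_region(None), resolve_deixis("last week", date(1998, 8, 7)))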
The translation is accomplished by domain-specific transformation rules which construct the query FRR from the ISR graph. The premise part of each transformation rule specifies the structure of subgraphs to which the rule applies (e.g. temporal or local specifications, domain concepts). The consequence parts of the rules provide the corresponding FRR expressions from which the FRR of the query is constructed (Fig. 2 displays the FRR sequence generated for an example query).

Generated FRR:
q_311: element(select x.shape from x in Bundeslaender where x.name = "Bayern")
q_312: select i from i in MeteoFranceImages where i.date.ge(1998,8,1,0,0,0) and i.date.lower(1998,8,8,0,0,0)
q_313: select i.pred from i in q_312 where i.pred i
q_314: select ImageAndRelevance(image:i, relevance:q_311.rateGreaterEqual(0.7, i.cloudiness().sunny().negation().germanyProjection())) from i in q_312
q_315: select ImageAndRelevance(image:i, relevance:q_311.rateGreaterEqual(0.7, i.cloudiness().sunny().germanyProjection())) from i in q_313
q_316: select ImageAndRelevance(image:i.image, relevance:i.relevance.min(j.relevance)) from i in q_314, j in q_315 where j.image = ((HpqsMeteoFranceImage)i.image).pred
q_317: select f.relevance from f in q_316
q_318: sort f in q_317 by 1
q_319: HpqsGreyValSeq(greyval_sequence: o2_list_GreyVal(q_318)).determineThreshold()
q_320: select ImagesAndRelevance(image:f.image, pred:((HpqsMeteoFranceImage)f.image).pred, succ:((HpqsMeteoFranceImage)f.image).succ, relevance:f.relevance) from f in q_316 where f.relevance.ge(q_319) = 1
Fig. 2: FRR sequence generated for query: “Show me pictures of cloud formation over Bavaria in the first week of August!”
5.1 Generic FRR methods

The generic part of FRR offers:

an elaborate text-search component (based on the dedicated VLSI processors for approximate full-text search);

image analysis primitives (partly implemented in VLSI hardware);
discrete and parametrized fuzzy sets and corresponding connectives from fuzzy set theory;
interpolation methods;
fuzzy quantifiers which implement quantifying expressions in natural language queries (“most”, “almost everywhere”, “sometimes” etc.);
further methods for information fusion, i.e. for combining associated pieces of information (e.g. the fuzzy integral).
Special emphasis has been put on the development of a mathematically sound aggregation methodology. In [8], we have pointed out that providing natural language access to a multimedia retrieval system cannot be accomplished by merely adding an NL frontend to an existing retrieval "core". This is because the modes of aggregation expressible in natural language are not restricted to the Boolean connectives supported by traditional retrieval systems. In particular, vague quantifying expressions ("fuzzy quantifiers") are often used in natural language queries, and an adequate processing of such queries is only possible if the meaning of these aggregation operators is formally captured (examples of explicitly or implicitly quantifying expressions of relevance to the meteorological domain are shown in Table 1).

Quantifying expressions in HPQS

Quantification over local regions:
few clouds over Italy
many clouds over southern Germany
more clouds over Spain than over Greece
cloudy in North Rhine-Westphalia (implicit)

Quantification over regions in time:
almost always cold in the last weeks
more often sunny in Portugal than in Greece
hot in Berlin in the previous week (implicit)

Table 1: Examples of fuzzy quantifiers in the meteorology domain

Existing approaches to fuzzy quantification [19, 15, 17, 18] had to be rejected because of their inconsistency with linguistic facts. For example, it is well-known from the Theory of Generalized Quantifiers (TGQ, [1, 11]) that many natural language quantifiers (e.g. "most") cannot be reduced to one-place quantification (but all of the above approaches try this!). Building on TGQ, we have formulated a set of axioms which characterizes mathematically sound models of fuzzy quantification; in addition, we have presented a model of the axioms [7]. In [10], we have shown that this approach is computational by presenting a histogram-based algorithm for the efficient evaluation of the resulting operators. In our system, we currently use these operators to aggregate over fuzzy sets of pixels (local quantification) or fuzzy sets of time points (temporal quantification). Apart from these direct uses in query interpretation, the operators can be applied to more general problems of weighted aggregation and information fusion. For example, results obtained for the parts of a document can be combined into its overall evaluation, as shown by Bordogna & Pasi [4] (however, based on a different model of fuzzy quantifiers). The range of applications even includes traditional bibliographic retrieval [9]. For example, if W is the fuzzy set of term-user relevances (i.e. of user-specified term weights) and if the gradual relevance of the search terms with respect to a given document is expressed by a fuzzy set A, then the quantifying expression all(W; A) models weighted conjunction ("all user-relevant terms are document-relevant"), and the quantifying expression some(W; A) models weighted disjunction ("at least one user-relevant term is document-relevant"). Genuine fuzzy quantifiers like most, many etc. offer more subtle ways of aggregation.
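To illustrate how such quantifying expressions can be evaluated over fuzzy sets, the following Python sketch implements the classical Zadeh-style interpretations of all(W; A) and some(W; A) as weighted conjunction and disjunction. This is a deliberately simplified stand-in for the axiomatically grounded operators actually used in HPQS [7, 10]; the function names and example data are illustrative only.

def s_implies(w: float, a: float) -> float:
    """Kleene-Dienes implication max(1-w, a): terms with low user weight impose no constraint."""
    return max(1.0 - w, a)

def fuzzy_all(W: dict, A: dict) -> float:
    """Weighted conjunction: 'all user-relevant terms are document-relevant'."""
    return min(s_implies(W[t], A.get(t, 0.0)) for t in W)

def fuzzy_some(W: dict, A: dict) -> float:
    """Weighted disjunction: 'at least one user-relevant term is document-relevant'."""
    return max(min(W[t], A.get(t, 0.0)) for t in W)

# Hypothetical example: user-specified term weights W and per-document term relevances A.
W = {"sunny": 1.0, "bavaria": 0.8}
A = {"sunny": 0.6, "bavaria": 0.9}
print(fuzzy_all(W, A), fuzzy_some(W, A))   # 0.6 0.8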
5.2 Domain-specific FRR methods

The generic FRR can be extended by domain-specific methods. These have to provide an interpretation for the natural language domain concepts based on the raw document data. For example, the HPQS prototype has been tailored to the meteorology domain by implementing:

cartographic projections of the considered image classes (satellite images, weather maps etc.);

objective ("more than 20°") and subjective ("warm") classification of temperature readings;

estimation of cloud-top height and cloud density in satellite images;

determination of degrees of cloudiness ("sunny"), and other domain concepts.

This implementation of domain concepts is considerably facilitated by the restriction to a specific application domain. It is the context provided by choosing an application domain which permits additional (and simplifying) background assumptions. For example, our choice of image classes (satellite images) permits the detection of clouds with relatively simple, intensity-based methods; a simplification which would not be possible with unrestricted image material (e.g. clouds in landscape photographs). In the same way that text matching provides only a very coarse, but often still useful, approximation of text understanding, we attempt to model only that portion of the domain concepts which must be captured to restrict the search to useful query results.
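As an illustration of what such domain-specific methods might look like, the following Python sketch gives a subjective "warm" classifier as a simple fuzzy membership function and a naive intensity-based cloudiness estimate. Thresholds, function names and the pixel representation are invented for this example and do not reflect the actual HPQS implementation.

def warm(temp_celsius: float) -> float:
    """Subjective 'warm' as a fuzzy membership degree, rising from 15 to 25 degrees C.
    The breakpoints are illustrative assumptions, not values from the HPQS prototype."""
    if temp_celsius <= 15.0:
        return 0.0
    if temp_celsius >= 25.0:
        return 1.0
    return (temp_celsius - 15.0) / 10.0

def cloudiness(pixels, threshold: int = 180):
    """Naive intensity-based cloud detection on a grey-value satellite image:
    bright pixels are taken to be cloud, with a gradual transition around the threshold."""
    def degree(v: int) -> float:
        return min(1.0, max(0.0, (v - threshold) / 50.0 + 0.5))
    return [[degree(v) for v in row] for row in pixels]

# Example usage on a tiny invented 2x3 grey-value image.
print(warm(22.0))                                   # 0.7
print(cloudiness([[90, 200, 255], [170, 180, 130]]))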
6 Multimedia Mediator

Mediators [16] are tools for information integration, i.e. pieces of information provided by different sources and presented in different formats are combined in a global logical view of the underlying information systems. In the HPQS system, the tasks of the multimedia mediator [3] include:
abstraction from details of the external source (parallel media server), e.g. communication protocol and query syntax;

making optimal use of the parallelism available in the external source;

establishing a well-structured view of the multimedia system, which to the retrieval module (the mediator's client) should appear like an object-oriented ODMG database;

maintenance of a proxy state of the external document base: method invocations can only be delegated to the parallel media server if the documents to which these methods should be applied are known to the mediator;

materialization of results of method invocations, in order to avoid redundant computations by reusing query results stored in a result cache (see the sketch below).
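The materialization aspect can be pictured as a memo table keyed by the method invocation. The following Python sketch shows the idea under simplifying assumptions (an in-memory cache, a synchronous delegate to the media server); the class and method names are invented and do not mirror the mediator's actual interface.

class MaterializingProxy:
    """Proxy state of the external document base with result materialization:
    a method invocation is delegated to the media server at most once."""

    def __init__(self, delegate):
        self._delegate = delegate   # e.g. sends (document id, method name) to the media server
        self._cache = {}

    def invoke(self, doc_id: str, method: str):
        key = (doc_id, method)
        if key not in self._cache:                      # only compute on a cache miss
            self._cache[key] = self._delegate(doc_id, method)
        return self._cache[key]                         # otherwise reuse the materialized result

# Example usage with a dummy (and expensive) server-side method.
def fake_server(doc_id: str, method: str) -> str:
    print(f"media server executes {method} on {doc_id}")
    return f"{method}({doc_id})"

proxy = MaterializingProxy(fake_server)
proxy.invoke("img_17", "cloudiness")   # delegated to the server
proxy.invoke("img_17", "cloudiness")   # answered from the result cache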
The transformation of ODMG-OQL mediator queries into simpler queries which can be executed in parallel will be illustrated by an example. The optimizer of the mediator might e.g. receive the query

select ImageAndRelevance(
    image : I,
    relevance : BAY.rateGreaterEqual(0.7,
        I.cloudiness().sunny()
         .negation()
         .germanyProjection()))
from I in q_18
By means of query transformations, the optimizer decomposes this query into a sequence of elementary queries:

R1: select I.P_cloudiness() from I in q_18
R2: select I.P_sunny() from I in R1
...
These simple queries are block-wise transmitted to the media server, which executes them in parallel and returns the set of results to the mediator.
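The following Python sketch illustrates this decomposition-and-parallel-execution scheme in a generic way: a chain of per-image method calls is applied block-wise to a set of documents using a process pool. It is a schematic illustration only; the method names follow the example above, but the actual media server uses its own protocol and scheduling.

from concurrent.futures import ProcessPoolExecutor

# Elementary per-document steps, corresponding to queries like R1, R2, ... above.
def p_cloudiness(doc: str) -> str:
    return f"cloudiness({doc})"

def p_sunny(intermediate: str) -> str:
    return f"sunny({intermediate})"

def process_document(doc: str) -> str:
    """Apply the chain of elementary methods to a single document."""
    return p_sunny(p_cloudiness(doc))

if __name__ == "__main__":
    documents = [f"img_{i}" for i in range(56)]           # e.g. one week of satellite images
    with ProcessPoolExecutor(max_workers=8) as pool:      # eight worker nodes as in the example below
        results = list(pool.map(process_document, documents, chunksize=8))
    print(results[:2])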
7 Parallel Media Server

The retrieval quality aimed at by HPQS can only be guaranteed if the integrity of the document base is ensured: it must be under exclusive control of the HPQS system in order to avoid WWW-typical problems such as "dangling links". This central storage of data also seems best suited to ensure the desired high availability of the service. In addition, the presence of high-resolution colour images (in our case, high-resolution multi-channel satellite images) requires a fast server both for the delivery of mass data and for the execution of the direct-search methods; it was therefore decided to use a parallel platform. The parallel media server is basically a scalable HTTP server which manages the mass data of the HPQS system (insertion, delivery, deletion). The required meteorological documents are acquired from the WWW, transferred to the media server, and stored on its system of distributed disks. In addition, the media server offers a parallel execution platform for the system's online search methods (image analysis, approximate full-text search, information aggregation). Those method invocations which require extreme computational power are delegated to the dedicated VLSI search hardware available on the nodes of the media server. The combination of parallel execution and special-purpose search processors will yield a dramatic decrease of processing times as compared to a sequential software solution. The processing times of an example query are displayed in Table 2 (without VLSI hardware—the search processor cards are not yet available).
Method                          Avg. Time [s]
mefExtFilteredFindLowClouds     14.2
ociExtSunny                      0.3
fiExtNegation                    0.2
germanyProjection                4.0

Table 2: Processing Times of an Example Query

Sequential processing of the example query on e.g. 56 documents in the specified time interval amounts to a total processing time of 1047.2 s (about 18 minutes). By using a parallel server with e.g. eight worker nodes, the parallel processing time drops to about 2.2 minutes (communication effort is low in this case because the tasks can be processed independently). A further and more drastic decrease of processing times will be observed once the VLSI hardware cards become available.
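The figures above follow directly from the per-document times in Table 2, as the short check below shows (numbers taken from the table; the eight-node speedup assumes perfectly independent tasks).

# Per-document processing times from Table 2, in seconds.
times = {"mefExtFilteredFindLowClouds": 14.2, "ociExtSunny": 0.3,
         "fiExtNegation": 0.2, "germanyProjection": 4.0}

per_document = sum(times.values())        # 18.7 s per image
sequential = 56 * per_document            # 1047.2 s, about 18 minutes
parallel_8 = sequential / 8               # about 131 s, i.e. roughly 2.2 minutes

print(per_document, sequential, parallel_8 / 60)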
8 Dedicated VLSI search processors

The processing nodes of the parallel media server will be equipped with very fast special-purpose hardware cards. The VLSI image and text search processors specifically designed for the HPQS system are currently in the circuit test phase. The dedicated processor for image analysis implements two-dimensional convolution, two-dimensional weighted median and histogram computations. In addition, a high-speed DSP processor and existing dedicated VLSI processors, e.g. for block matching, will be integrated into the hardware cards. The dedicated processor for approximate full-text search yields a throughput rate of more than 1.3 · 10^8 characters per second and supports robust (error-tolerant) search for up to 16 simultaneous search terms. Compared to a software solution, the use of dedicated VLSI search hardware will result in an acceleration factor of about 1000.
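A software analogue of such approximate (error-tolerant) search is edit-distance-based matching of a search term against a text. The Python sketch below uses the standard dynamic-programming formulation of approximate string matching (minimum edit distance of the pattern to any substring ending at each text position); it only illustrates the functional behaviour, not the VLSI implementation.

def approx_find(pattern: str, text: str, max_errors: int):
    """Return end positions in text where pattern matches with at most max_errors edits
    (insertions, deletions, substitutions), using the classic column-wise DP recurrence."""
    m = len(pattern)
    # col[i] = edit distance of pattern[:i] to the best substring ending at the current position.
    col = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text):
        prev_diag = col[0]
        col[0] = 0                       # a match may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            prev_diag, col[i] = col[i], min(col[i] + 1,        # skip the text character
                                            col[i - 1] + 1,    # skip a pattern character
                                            prev_diag + cost)  # match or substitution
        if col[m] <= max_errors:
            hits.append(j)
    return hits

print(approx_find("wether", "the weather report", max_errors=1))   # tolerates the missing 'a'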
9 Discussion

We have presented a system architecture suitable for building high-quality multimedia search services for restricted (but in principle arbitrary) topic areas. By providing a natural language interface, technical barriers in accessing the system are removed, thus making it useful for a broad public. Of course, further research on natural language processing is required in order to make these techniques useful for broader applications. The imprecision and vagueness of natural language queries (and of the underlying natural language concepts) is of crucial importance because an adequate system behaviour can only be achieved if these factors do not result in system failure. In order to meet these requirements, we have developed a semantically rich retrieval model based on fuzzy set theory. Special emphasis is put on methods of information aggregation and data fusion. In the long run, such methods are a prerequisite for combining the contents spread over the parts of a multimedia document, and for utilizing links across documents, in a broad range of other (and more general) applications. HPQS supports online search and thus offers rather flexible ways of querying: there is no restriction to pre-computed descriptors and their Boolean combinations—by contrast, each query that can be expressed in the semantic model of HPQS can be transformed into an executable sequence of evaluation steps which are performed directly on the mass data. Obviously, a trade-off between computational effort and scalability is necessary because the application of image analysis methods and elaborate text-search techniques partially conflicts with the use of a large document base. In HPQS, we have combined several techniques in order to ensure acceptable response times:

Parallelisation of method invocations (always possible in our system; due to the limited number of available processing nodes, this results in a drop of processing times of about one order of magnitude);

Use of dedicated VLSI hardware (by using dedicated VLSI hardware, the search process is accelerated by a factor of about 1000; however, this improvement is possible only for the limited number of algorithms implemented in dedicated hardware);

Materialization/caching of query results (this technique is always applicable and yields a speed-up for frequent queries (or common generated subqueries) comparable to that of traditional indexing. In the above table of processing times, for example, the method for estimating cloudiness (mefExtFilteredFindLowClouds) is particularly time-consuming. If the results of this method are materialized (there is a high likelihood of that because the method is frequently used), response times in the above example reduce to approx. 252 s (sequential) and 31.5 s if executed in parallel on eight processing nodes).
In addition, the fine-grained direct-search methods supported by HPQS can be complemented with pre-computed descriptors in order to further reduce processing times. This does not contradict our original intention of using direct search if these descriptors are only used to restrict the search space to some regular envelope of the final search result. This envelope then provides the range of the subsequent online search, which detects the actual (and usually less regular) result of the two-stage search process (see Fig. 3). In fact, HPQS already utilizes this mechanism, but it only uses descriptors which restrict the search space according to purely formal criteria such as document type or date.
Fig. 3: Combining Online and Index Search (nested sets: relevant docs(q) contained in envelope(q) = range of online search, contained in the set of all documents)
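The two-stage scheme can be summarized as follows: a cheap descriptor-based filter computes an envelope that is guaranteed to contain all relevant documents, and the expensive online analysis is then run only on that envelope. The following Python sketch shows this idea with invented descriptor fields and predicates; it is not the actual HPQS index structure.

from datetime import date

# Invented document descriptors: purely formal criteria such as document type and date.
documents = [
    {"id": "img_1", "type": "satellite_image", "date": date(1998, 8, 2)},
    {"id": "rep_1", "type": "text_report",     "date": date(1998, 8, 2)},
    {"id": "img_2", "type": "satellite_image", "date": date(1998, 7, 15)},
]

def envelope(docs, doc_type, start, end):
    """Stage 1: cheap index-style filtering on pre-computed descriptors."""
    return [d for d in docs if d["type"] == doc_type and start <= d["date"] <= end]

def online_search(docs, predicate):
    """Stage 2: expensive direct analysis, applied only to the envelope."""
    return [d for d in docs if predicate(d)]

def cloudy_over_bavaria(doc) -> bool:
    # Stand-in for the actual image analysis chain (cloudiness, projection, quantification).
    return doc["id"] == "img_1"

candidates = envelope(documents, "satellite_image", date(1998, 8, 1), date(1998, 8, 8))
print(online_search(candidates, cloudy_over_bavaria))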
An experimental prototype of the HPQS system has been implemented. Current research activities include the integration of the dedicated VLSI hardware, the optimisation of mediator and database, and an improved presentation of result documents.
Acknowledgement — The HPQS system presented in this paper is jointly developed by the research groups of J. Biskup (Dortmund), H. Helbig (Hagen), B. Monien (Paderborn), T. Noll (Aachen), H. Ney (Aachen) and A. Knoll (Bielefeld; coordinator). The authors wish to acknowledge the valuable contribution Volker Jentsch made by establishing this collaborative research effort.

References

[1] J. Barwise and R. Cooper. Generalized quantifiers and natural language. Linguistics and Philosophy, 4:159–219, 1981.

[2] A. D. Bimbo and P. Pala. Visual image retrieval by elastic matching of user sketches. IEEE Trans. on Patt. Anal. and Mach. Intell., 19(2):121–132, Feb. 1997.

[3] J. Biskup, J. Freitag, Y. Karabulut, and B. Sprick. A mediator for multimedia systems. In Proceedings 3rd International Workshop on Multimedia Information Systems, Como, Italy, Sept. 1997.

[4] G. Bordogna and G. Pasi. A fuzzy information retrieval system handling users' preferences on document sections. In D. Dubois, H. Prade, and R. Yager, editors, Fuzzy Information Engineering. Wiley, New York, 1997.

[5] V. Castelli, L. Bergman, C.-S. Li, and J. Smith. Search and progressive information retrieval from distributed image/video databases: the SPIRE project. In C. Nikolaou and C. Stephanidis, editors, Research and Advanced Technology for Digital Libraries: Proceedings of ECDL '98, LNCS 1513. Springer, 1998.

[6] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkhani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker. Query by image and video content: The QBIC system. IEEE Computer, 28(9), Sept. 1995.

[7] I. Glöckner. DFS: An axiomatic approach to fuzzy quantification. TR97-06, Technische Fakultät, Universität Bielefeld, 1997.

[8] I. Glöckner and A. Knoll. Fuzzy quantifiers for processing natural-language queries in content-based multimedia retrieval systems. TR97-05, Technische Fakultät, Universität Bielefeld, 1997.

[9] I. Glöckner and A. Knoll. Konzeption und Implementierung eines kooperativen Assistenten zur Recherche in den Datenbanken der wissenschaftlichen Bibliotheken. In 23. Jahrestagung der Gesellschaft für Klassifikation e.V.: Klassifikation und Informationsverarbeitung zur Jahrtausendwende, March 1999.

[10] I. Glöckner, A. Knoll, and A. Wolfram. Data fusion based on fuzzy quantifiers. In Proceedings of EuroFusion98, International Data Fusion Conference, pages 39–46, DERA Malvern, Worcestershire, UK, Oct. 1998.

[11] F. Hamm. Natürlich-sprachliche Quantoren. Modelltheoretische Untersuchungen zu universellen semantischen Beschränkungen. Number 236 in Linguistische Arbeiten. Niemeyer, Tübingen, 1989.

[12] H. Helbig. Syntactic-semantic analysis of natural language by a new word-class controlled functional analysis (WCFA). Computers and Artificial Intelligence, 1:53–59, 1986.

[13] H. Helbig and M. Schulz. Knowledge representation with MESNET – A multilayered extended semantic network. Technical Report 197 - 5/1996, Fernuniversität Hagen, FB Informatik, 1996.

[14] Virage, Inc. http://www.virage.com.

[15] A. Ralescu. A note on rule representation in expert systems. Information Sciences, 38:193–203, 1986.

[16] G. Wiederhold. Mediators in the architecture of future information systems. IEEE Computer, 25(3):38–49, 1992.

[17] R. Yager. Connectives and quantifiers in fuzzy sets. Fuzzy Sets and Systems, 40:39–75, 1991.

[18] R. Yager. Families of OWA operators. Fuzzy Sets and Systems, 59:125–148, 1993.

[19] L. Zadeh. A computational approach to fuzzy quantifiers in natural languages. Computers and Mathematics with Applications, 9:149–184, 1983.