Image Semantics induced by Web Structure

Simone Santini
University of California, San Diego

ABSTRACT

This paper presents a data model for images immersed in the world wide web, which derive their meaning from visual similarity, from their connection with the text of the pages that contain them, and from the link structure of the web. I will model images on the web as a graph whose nodes are either text documents or images, and whose edges are links, labeled with measures of relevance of one document towards the other. The paper briefly presents the features used to characterize the text and the visual aspect of the images, and then goes on to present a data algebra suitable for navigating and querying the database.

1. INTRODUCTION

Image database research aims at retrieving images based on their content. There is some indication that this generic programme might prove too ambitious, not only because of formidable technical difficulties in the syntactic description of image contents, but also because of fundamental issues connected to the semantics of images.1,2 The fundamental difference between image databases and other databases is that, in the case of images, the meaning is not induced by the images themselves, but derives from data outside the image set. Three modalities by which this "semantic endowment" can happen have been identified:2 images can derive meaning from a restriction of the semantic field (domain modality), from being immersed in a self-contained text (linguistic modality), or from a meaning-forming interaction with a user (user modality).

In this paper, I am interested in the second of these modalities, in particular in the case in which the self-contained text that gives meaning to images is the world wide web. I have already considered the problem of integrating visual features with the textual features that can be attached to images due to their placement in the context of a document.3 In this paper, I intend to analyze in more depth the issues arising from the inclusion of images in a structured collection of documents, that is, in a nutshell, to include web links in the database model.

The paper is organized as follows. Section 2 introduces a general model of the web which emphasizes the rôle that images play in it, and their connections to the text of the pages. Section 3 briefly describes the extraction of the text index from the web pages and from the text surrounding the images. Section 4 illustrates the visual features used in the database. Section 5 introduces the data model for the database, and the operations of the data algebra. Some conclusions and future directions are presented in Section 6.

2. A MODEL OF IMAGES ON THE WEB

In this section, it is my intention to define several models of the world wide web, progressively refining them in order to highlight the rôle of images in it and the relations between images and text documents.

At the most general level, the web is a graph W = (D, E, φ), where D is a set of documents, E is a set of edges, and φ is a function that associates to every edge e ∈ E an ordered pair of nodes: φ(e) = (d1, d2), with d1, d2 ∈ D, that is, φ : E → D × D.∗ In this case the link is said to go from document d1 to document d2. Given a document d, its downstream neighborhood is the set

ν(d) = {d′ ∈ D : ∃e ∈ E : φ(e) = (d, d′)}    (1)

Simone Santini, Center for Research in Biological Structure; University of California, San Diego; 9500 Gilman Drive; La Jolla, CA 92093-0608; USA; email: [email protected]
∗ In many cases a graph is defined simply as W = (D, E), where D is the set of nodes (documents, in this case) and E ⊂ D × D is the set of edges, each edge being an ordered pair of documents. This definition is equivalent to that in this paper for simple graphs, but can't be generalized to multigraphs, since two edges between the same pair of documents would be indistinguishable. Since on the web there can be multiple links between the same two documents, it is more appropriate to start from the multigraph model as a basis.

Edge type              Syntactic HTML construct
document-to-document   <a href=...> link between two pages
document-to-image      <img src=...> tag embedding the image
image-to-document      <img ...> tag wrapped in an <a href=...> link

Table 1. Syntactic definition of the different types of edges

while its upstream neighborhood is the set

ν∗(d) = {d′ ∈ D : ∃e ∈ E : φ(e) = (d′, d)}    (2)

The two functions ν, ν∗ : D → 2^D are adjoint in the sense that d1 ∈ ν(d2) ⇔ d2 ∈ ν∗(d1). The functions ν and ν∗ can be extended to a set of nodes N as

ν(N) = ⋃_{d∈N} ν(d)   and   ν∗(N) = ⋃_{d∈N} ν∗(d)    (3)
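To make the multigraph model concrete, here is a minimal Python sketch (the names and structure are mine, not the paper's implementation) of W = (D, E, φ), with explicit edge objects so that parallel edges between the same pair of documents remain distinguishable, together with ν, ν∗, their extension to sets, and the degree functions defined below:

# Minimal sketch of the multigraph W = (D, E, phi); all names are illustrative.
class Edge:
    def __init__(self, src, dst, label=None):
        self.src, self.dst, self.label = src, dst, label  # phi(e) = (src, dst)

class WebGraph:
    def __init__(self):
        self.docs = set()   # D: node identifiers
        self.edges = []     # E: a list, so parallel edges are kept distinct

    def add_edge(self, d1, d2, label=None):
        self.docs.update((d1, d2))
        self.edges.append(Edge(d1, d2, label))

    def nu(self, d):        # downstream neighborhood, eq. (1)
        return {e.dst for e in self.edges if e.src == d}

    def nu_star(self, d):   # upstream neighborhood, eq. (2)
        return {e.src for e in self.edges if e.dst == d}

    def nu_set(self, N):    # extension to a set of nodes, eq. (3)
        return set().union(*(self.nu(d) for d in N))

    def outdegree(self, d):
        return sum(1 for e in self.edges if e.src == d)

    def indegree(self, d):
        return sum(1 for e in self.edges if e.dst == d)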

The outdegree of a node is the number of edges leaving that node, that is, o(d) = |{e ∈ E : π1(φ(e)) = d}|.† Similarly, the indegree of a node d is the number of edges entering the node, that is, i(d) = |{e ∈ E : π2(φ(e)) = d}|. The concepts of indegree and outdegree can easily be generalized to sets of nodes.

I will admit from the outset the possibility that links be labeled, with labels drawn from a set Λ, which can be finite or infinite. The labeling function l : D × D → Λ ∪ {⊥} is defined as

l(e) = { λ ∈ Λ   if e ∈ E
       { ⊥       otherwise    (4)

The web contains a rather appalling variety of types of documents, or fragments thereof, and a more than cursory glance at any catalog of existing MIME types will convince anybody of this fact. In this context, however, I will purposely disregard most of these document types, and consider only a "simplified web" consisting exclusively of text documents containing images. This assumption might seem a little drastic but, all in all, it is very hard to find a web site in which non-text and non-image documents add significantly to the information content of the site, rather than being simple additions designed to attract customers. An obvious exception is that of music exchange sites, with their vast collections of audio ("mp3") files; but, for the majority of other sites, I feel justified in the simplifying assumption at the basis of this model.

This modifies the model in the sense that the graph is now a colored graph, with nodes of two types: documents and images. Consequently, edges are of three types: document-to-document, document-to-image, and image-to-document (I will assume that the graph contains no image-to-image edges, which seems to be largely the case for the web). These types of edges derive from well identified syntactic constructs in the HTML code that constitutes the web, as shown in Table 1.

Finally, we must consider the way in which images are modeled and in which they are associated to the text of the pages. Typically,3 a fragment of the page is considered and suitably encoded in order to provide a textual description of the image. Typically, these parts include the so-called anchor text that surrounds the image, the title of the document containing the image, and its keywords (see below). A document, on the other hand, is indexed using the complete text of the page, regardless of the position of images. There are therefore two different circumstances in which a text-based index is extracted from a document: for the creation of a document index, associated to the whole document, and for the creation of an image text index for each image contained in the document. Each element in the index is a word, defined by a string, and has an associated weight. Defining it as a data type for inclusion in the database, each element in the index is a datum of type word, defined as:

word : [text : string, w : R]    (5)

† Following the standard notation, for every pair (a, b), the two projections π1 and π2 are defined as π1(a, b) = a and π2(a, b) = b.

Documents and images are also data types in the database. Documents are characterized by a unique identifier and a set of words describing their contents:

doc : [id : N, wds : {word}]    (6)

Images are also represented by a data type, including a unique identifier, a text index, and a visual feature. At this time I am interested only in the text index portion of the description, so I will leave the visual features unspecified by making images a parametrized data type. Moreover, images are a sub-type of document, since they also have an identifier and a set of words associated with them:

img(µ) = doc + [vis : µ]    (7)

Labels can also contain different types of information, but in the database they are required to specify at least a relevance measure between the documents that they join, which is a real value between 0 and 1. So, as a data type, the set of labels Λ is also parametrized by whatever additional information is added, and is defined as

Λ(µ) = [r : R, x : µ]    (8)

Relevance can be assigned either automatically at insertion time, as I will consider later, or updated explicitly using specific database operations. The following definition therefore applies:

Definition 2.1. A web image database is a graph W(α, µ) = (D(α), E(W), φ, l), where:

• D is a set of documents of type D : set{d : doc|img};
• E(W) is the set of edges of the graph;
• φ : E(W) → D × D is the edge connection function;
• l : E(W) → Λ(µ) is the labeling function, where Λ(µ) is the type of the labels.

The data type of such a graph will be indicated as Γ(doc, Λ(µ)). In the following, the operations for the manipulation of the database will be defined but, first, I will briefly consider the construction of the text index and the visual features for the representation of the image contents.
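As an illustration, the data types of eqs. (5) through (8) can be rendered as Python dataclasses; the subtyping img(µ) = doc + [vis : µ] becomes inheritance, and the visual-feature parameter µ becomes a generic type parameter. This is a hypothetical rendering, not the paper's schema:

from dataclasses import dataclass, field
from typing import Generic, Set, TypeVar

Mu = TypeVar("Mu")  # the unspecified visual-feature type µ

@dataclass(frozen=True)
class Word:                      # word : [text : string, w : R], eq. (5)
    text: str
    w: float

@dataclass
class Doc:                       # doc : [id : N, wds : {word}], eq. (6)
    id: int
    wds: Set[Word] = field(default_factory=set)

@dataclass
class Img(Doc, Generic[Mu]):     # img(µ) = doc + [vis : µ], eq. (7)
    vis: Mu = None

@dataclass
class Label(Generic[Mu]):        # Λ(µ) = [r : R, x : µ], eq. (8)
    r: float                     # relevance, a real value in [0, 1]
    x: Mu = None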

3. CONSTRUCTION OF THE TEXT INDEX

Following acquisition, the text collected in the pages and the image tags is analyzed to make it suitable for indexing. First, all uninformative words are removed. There are two classes of such words: the first is composed of very common words in the English language (such as "the," "which," and so on), while the second is composed of words widely used in web pages, such as "url," "http," "click," "buy," and so on. These words give no information about the content of a page, and are ignored.

Common word removal is followed by stemming, a procedure that reduces the size of the index by eliminating variations of the same words. A typical example of stemming in English is the removal of the final "s" from plurals, so that, for instance, "books" and "book" will be considered as the same term. The system presented here uses Porter's stemming algorithm,4 which is a rule-based algorithm. The algorithm performs quite well, but it has the disadvantage (especially acute in web applications) that its set of rules is specific to the English language. It is therefore impossible to create a good multilingual index of the web with it. Other algorithms, based on the statistical occurrence of terminations,5 can be made multilingual simply by using a multilingual dictionary, and seem more promising for web applications. I am currently testing these algorithms.

The terms produced by stemming are used directly as an index. I am gathering data on the effectiveness of this approach, to evaluate the possibility of using some more sophisticated indexing technique such as latent semantics.6

Indexing terms come from three separate areas of the page that links to the image under consideration: the anchor text of the <img> tag (this includes the alternate text for the image, if present), the collection of keywords contained in the <meta> tag of the page (if present), and the title of the page. These areas have different expected relevance for the characterization of the contents of the images. In particular, the considerations of the previous section point at a higher relevance of the anchor text. When the page is analyzed, the three different areas are summarized in three separate sets of terms, which then contribute differently to the final index.
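A minimal sketch of this pipeline, assuming NLTK's implementation of Porter's algorithm4 and illustrative excerpts of the two stopword lists (the paper does not give its actual word lists):

import re
from nltk.stem import PorterStemmer  # rule-based stemmer of ref. 4

# Two classes of uninformative words: common English words and web-specific terms.
ENGLISH_STOPWORDS = {"the", "a", "an", "which", "and", "of", "to", "is"}  # excerpt only
WEB_STOPWORDS = {"url", "http", "click", "buy", "home", "page"}           # excerpt only

stemmer = PorterStemmer()

def index_terms(text):
    """Tokenize, drop uninformative words, and stem the remainder."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in ENGLISH_STOPWORDS | WEB_STOPWORDS]
    return [stemmer.stem(t) for t in kept]

# index_terms("Books and the book") -> ['book', 'book']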

Weighting. The three different areas of the page from which the index text is derived are assigned a priori "thematic" weights. In general, the anchor text is the most relevant for indexing, followed by the keywords and the title. I used the following three thematic weights: νa = 1.2/2 for the anchor text, and νk = νt = 0.4/2 for the keywords and the title.

Each term is also assigned a weight measuring its potential indexing quality, determined using probabilistic weighting.5 Let Tµ be a term to which a weight needs to be assigned, dfµ the document frequency of Tµ (that is, the number of documents in which Tµ appears as a term), and tfµk the term frequency of Tµ for image Ik (that is, the number of times Tµ appears as a term in the index of image Ik). The basic term weight for Tµ in image Ik, βk(Tµ), is then given by

βk(Tµ) = tfµk log((NI − dfµ)/dfµ)    (9)

where NI is the number of images in the database. This weight formula captures the two characteristics that make a term highly discriminative:

1. it appears in relatively few documents in the collection (low value of dfµ), and
2. it appears many times in the image in question (high value of tfµk).

The weight of a term is then multiplied by the thematic weight corresponding to the section in which the term was found. If, in the index of image Ik, the term Tµ appears t^a_µk times in the anchor text, t^k_µk times among the keywords, and t^t_µk times in the title, then the associated weight is

αµ,k = α(Tµ, Ik) = (t^a_µk νa + t^k_µk νk + t^t_µk νt) log((NI − dfµ)/dfµ)    (10)
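As a sketch, eq. (10) translates directly into code; the counts would come from the index built in the previous step, and all names here are mine:

import math

NU_ANCHOR, NU_KEYWORDS, NU_TITLE = 1.2 / 2, 0.4 / 2, 0.4 / 2  # thematic weights

def term_weight(tf_anchor, tf_keywords, tf_title, df, n_images):
    """Weight of a term for one image, eq. (10).

    tf_*     : occurrences of the term in each area of the linking page
    df       : number of documents in which the term appears
    n_images : N_I, total number of images in the database
    """
    thematic = (tf_anchor * NU_ANCHOR
                + tf_keywords * NU_KEYWORDS
                + tf_title * NU_TITLE)
    return thematic * math.log((n_images - df) / df)

# A term appearing twice in the anchor text, with df = 10 over 1000 images:
# term_weight(2, 0, 0, 10, 1000) -> 2 * 0.6 * log(990/10) ~ 5.51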

4. VISUAL REPRESENTATION

Important information about the content of images can be derived by dividing an image into regions homogeneous with respect to a certain criterion (e.g., homogeneous with respect to texture, color, or other point features of the image). In particular, I use the edge flow segmentation algorithm, developed by Ma and Manjunath, and available on B.S. Manjunath's web page. Each region is described by a suitable feature structure, typically including a histogram of its color (or a suitable statistical characterization thereof), a texture description, a Fourier series description of its shape, and so on. I will not consider these descriptions in detail, but I will assume that the space X of region features is a vector space. Given a region r of an image, the region descriptor xr is used to describe it.

The basic representation of an image is that of a labeled directed graph, in which nodes represent regions and edges are labeled with a representation of the relative positions of the regions they join. That is:

Definition 4.1. A visual feature of an image I is a directed graph (R, E) such that:

1. for every element r ∈ R there is a region descriptor xr;
2. for every edge e = (e1, e2) there is an adjoint edge e∗ = (e2, e1);
3. if e = (ra, rb) then the edge e has label l(e) = sab e^{iθab}, where
   • sab ∈ [0, 1];
   • sab = f(dab), where dab is the distance between the centroids of ra and rb, and f : R+ → [0, 1] is a monotonically decreasing function with f(0) = 1;
   • θab is the angle between the horizontal and the line drawn from the centroid of ra to the centroid of rb;
4. if l(e) = se^{iθ}, then l(e∗) = se^{−iθ}.









Figure 1. Graph representation of adjacent regions

Giving an arbitrary order to the regions, the image can be represented by the region vector

R = [xr1, xr2, . . . , xrn]^T = [x1, x2, . . . , xn]^T    (11)

(where the second notation will be used when no ambiguity arises), and by the connection matrix

C = ( s11 e^{iθ11}  s12 e^{iθ12}  . . .  s1n e^{iθ1n} )
    ( s21 e^{iθ21}  s22 e^{iθ22}  . . .  s2n e^{iθ2n} )
    (      .             .                    .       )
    ( sn1 e^{iθn1}  sn2 e^{iθn2}  . . .  snn e^{iθnn} )    (12)

Note that, by point 4 of Definition 4.1, the entry Cba is the complex conjugate of Cab, that is, the matrix is Hermitian: C∗ = C. For instance, the image of Figure 1.a is represented by the graph of Figure 1.b which, choosing f(x) = 1/(x² + 1) and assuming the distance between node 1 and node 2 as the measure unit, is represented by a matrix whose entries are 1 on the diagonal (each region is at distance 0 from itself, and f(0) = 1), (1/2)e^{±iπ/2}, (1/2)e^{±iπ/6}, and (1/2)e^{±i5π/6} for the pairs of adjacent regions at unit distance, (4/13)e^{±iπ/2} for the adjacent pair at distance 3/2, and 0 for the pairs of regions not joined by an edge.    (13)
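The matrix C of eq. (12) can be computed directly from the region centroids and the region adjacency; the sketch below uses f(x) = 1/(x² + 1) as in the example above and enforces l(e∗) = se^{−iθ} from Definition 4.1 (the function and variable names are illustrative):

import numpy as np

def connection_matrix(centroids, adjacency=None):
    """Build the complex connection matrix of eq. (12) from region centroids.

    centroids : (n, 2) array of region centroid coordinates
    adjacency : optional boolean (n, n) matrix; non-adjacent pairs get entry 0
    """
    centroids = np.asarray(centroids, dtype=float)
    n = len(centroids)
    C = np.ones((n, n), dtype=complex)       # diagonal: d = 0, f(0) = 1
    for a in range(n):
        for b in range(a + 1, n):
            if adjacency is not None and not adjacency[a][b]:
                C[a, b] = C[b, a] = 0.0
                continue
            dx, dy = centroids[b] - centroids[a]
            d = np.hypot(dx, dy)
            s = 1.0 / (d * d + 1.0)          # f(d) = 1/(d^2 + 1), decreasing, f(0) = 1
            theta = np.arctan2(dy, dx)       # angle from the centroid of r_a to r_b
            C[a, b] = s * np.exp(1j * theta)
            C[b, a] = np.conj(C[a, b])       # l(e*) = s e^{-i theta}, Def. 4.1
    return C

# Two regions one unit apart, one directly above the other:
# connection_matrix([(0, 0), (0, 1)]) has C[0, 1] = 0.5*exp(i*pi/2) = 0.5j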

In addition to this, each region representation xr is collected into the matrix R. As usual in content based retrieval, the basic operation performed on these image representations is the determination of the similarity between two images, given a certain similarity function (which will, in many cases, depend on the query). The graph representation above introduces the complication that the result of the comparison of two images depends on the ordering of the nodes of the graphs. Checking all the possible orderings requires a time O(2^n), where n is the number of nodes of the graph, that is, the number of regions in the images. Since this number can be fairly high, it is necessary to look for alternative representations. One such representation, invariant to the ordering of the nodes of the graph, is the spectral representation. The spectral representation is usually applied to the adjacency matrix of a graph, but will be, in this case, applied to the relative position matrix. In particular, given the matrix C ∈ C^{n×n}, its singular value decomposition

C = U S V∗    (14)

is computed, where S = diag{λ1, . . . , λn} is the matrix containing the singular values of C (since C is Hermitian, these are the absolute values of its eigenvalues). The singular values of C constitute an ordering-independent representation of the relative position graph.

This representation still depends on the number of nodes in the graph, that is, on the number of regions in the image. In order to eliminate this dependency, I resort to the very common technique of considering only the first k singular values of C. Let

S|k = [λ1, . . . , λk]^T                                if k ≤ n
S|k = [λ1, . . . , λn, 0, . . . , 0]^T  (k − n zeros)   if k > n    (15)

and

V|k = ( v11  v12  . . .  v1k )
      ( v21  v22  . . .  v2k )
      (  .    .           .  )
      ( vn1  vn2  . . .  vnk )   if k ≤ n,   V|k = [V | 0]   otherwise    (16)

and U|k defined similarly. The region descriptions are collected in a matrix R ∈ R^{n×q}, in which each row contains a vector of dimension q with the feature representation of a region. This representation also depends on the ordering of the regions and, therefore, it is unsuitable for image comparison, for the same reason for which the adjacency matrix was. Now, however, we have available a transformation from the original, order-dependent, region space to the spectral, order-independent, space, represented by the matrix V|k∗. This transformation can be applied to the region matrix R, obtaining the ordering-independent region description

R|k = V|k∗ R    (17)

In the case of the image of Figure 1, for instance, the 3-representation of the graph is

S|3 = [2.04, 1.22, 1.13]    (18)
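A sketch of eqs. (15) through (17) using numpy's singular value decomposition; padding handles the case k > n:

import numpy as np

def spectral_representation(C, R, k):
    """Order-independent image representation (S|k, R|k), eqs. (15)-(17).

    C : (n, n) complex connection matrix
    R : (n, q) real matrix of region feature vectors
    k : number of singular values retained
    """
    U, s, Vh = np.linalg.svd(C)          # C = U S V*, s sorted in decreasing order
    n = len(s)
    if k <= n:
        S_k = s[:k]
        Vh_k = Vh[:k, :]                 # V|k* : the first k rows of V*
    else:
        S_k = np.concatenate([s, np.zeros(k - n)])       # zero-padded, eq. (15)
        Vh_k = np.vstack([Vh, np.zeros((k - n, n))])     # zero-padded, eq. (16)
    R_k = Vh_k @ R                       # R|k = V|k* R, eq. (17)
    return S_k, R_k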

An image is therefore represented by the vector S|k and by the matrix R|k. A distance function between two images can be derived from any distance function between matrices. For instance, the weighted Euclidean distance between two images is given by

d(I, J; α) = α ‖S|k^(I) − S|k^(J)‖ + (1 − α) ‖R|k^(I) − R|k^(J)‖    (19)
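The distance of eq. (19) then combines the two components; in this sketch numpy's default matrix norm, the Frobenius norm, is taken as one reasonable reading of ‖·‖:

import numpy as np

def image_distance(S_i, R_i, S_j, R_j, alpha=0.5):
    """Weighted distance between two images, eq. (19).

    (S_i, R_i), (S_j, R_j) : spectral representations computed as above
    alpha                  : weight of the singular-value component
    """
    return (alpha * np.linalg.norm(S_i - S_j)
            + (1 - alpha) * np.linalg.norm(R_i - R_j))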

5. DATABASE QUERY AND NAVIGATION

A web image and text database is a graph of type Γ(doc, Λ), and the goal of an image and text database is essentially to assist a user in navigating this graph. In order to do this, the database must provide adequate data management operations. I will first introduce the abstract model of these graph operations, and then introduce the actual operators. For the graph structure management, the operators are largely based on those of the Aqua data model.7

5.1. General overview

In addition to graphs, the model includes simple data types such as integers, floating point numbers, strings, and so on, as well as sets and functions. Functions are first class types, that is, they can be assigned values, be passed as parameters, and be returned by other functions. All expressions in the data algebra are composed of terms. A term is either

• a variable, constant, or function symbol. The type of a function taking arguments of type T1 and producing results of type T2 will be represented as T1 → T2;

• a lambda abstraction

λ(x1 : T1, . . . , xn : Tn).t : T    (20)

where x1, . . . , xn are variables, t is a term, and T, T1, . . . , Tn are data types;

• an application

t0(t1 : T1, . . . , tn : Tn) : T    (21)

where t1, . . . , tn are terms and t0 is of function type.

I will use the standard convention of representing operators using infix notation, so the sum function +(a, b) will be represented equivalently as a + b. Also, as a shortcut, given a set of nodes N from a graph, the expression GC(N) (GC for "graph completion") will indicate the graph obtained by taking the nodes in N and all the edges joining nodes in N, with their associated labels.

A visual similarity function is a function f : µ × µ → [0, 1], where µ is the data type of the visual features of the images. A textual similarity function is a function f : doc × doc → [0, 1], which measures the text similarity between two documents. A similarity combination operator is a function

⋄ : (X × X → [0, 1]) × (X × X → [0, 1]) → (X × X → [0, 1])    (22)

which transforms two similarity functions into another similarity function. For example, the pointwise multiplication "·" defined as (f1 · f2)(x, y) = f1(x, y)f2(x, y) is a combination. Informally, I will also use the same operator ⋄ to combine scores rather than scoring functions. In this case, the signature of the operator is ⋄ : [0, 1] × [0, 1] → [0, 1].

Given two documents d1 and d2, a path between them is a list of edges p = [e1, . . . , eq] such that π1(φ(e1)) = d1, π2(φ(eq)) = d2, and, for i = 2, . . . , q, π1(φ(ei)) = π2(φ(ei−1)). The path similarity along a path p from document d1 to document d2, given the score composition operator ⋄, is given by

Σ⋄(p) = (· · · ((l(e1).r ⋄ l(e2).r) ⋄ l(e3).r) · · · ⋄ l(eq).r)    (23)

Let P(d1, d2) be the set of paths from d1 to d2. The relevance similarity between d1 and d2 is the minimum over this set of the path similarities between d1 and d2:

Σ(⋄)(d1, d2) = min_{p∈P(d1,d2)} Σ⋄(p)    (24)

The relevance transitive closure of a document d, with relevance r, is the graph

T(⋄)(d, r) = GC({d′ : Σ(⋄)(d, d′) ≥ r})    (25)

The downstream and upstream neighborhood functions (1) and (2) can be generalized to the k-downstream and k-upstream neighborhoods as

ν(d, k) = ν(d) if k = 0,   ν(d, k) = ν(ν(d, k − 1)) otherwise    (26)

and similarly for ν∗(d, k).
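For illustration, the transitive closure of eq. (25) can be computed by a best-first search. Eq. (24) takes the minimum over paths; the sketch below computes the more tractable best-path (maximum) variant under pointwise multiplication as the combination operator ⋄, so it should be read as an assumption about one workable semantics rather than as the paper's algorithm:

import heapq

def relevance_closure(graph, d0, r):
    """Nodes whose best path-relevance from d0 is at least r (cf. eqs. (23)-(25)).

    graph : dict mapping node -> list of (neighbor, relevance) pairs, with
            relevance in [0, 1]; scores along a path are multiplied.
    Returns the set of reachable nodes with path score >= r; GC would then
    re-attach all edges joining nodes of this set.
    """
    best = {d0: 1.0}
    heap = [(-1.0, d0)]                      # max-heap via negated scores
    while heap:
        score, d = heapq.heappop(heap)
        score = -score
        if score < best.get(d, 0.0):         # stale heap entry, skip
            continue
        for d2, rel in graph.get(d, []):
            s2 = score * rel                 # multiplicative path combination
            if s2 >= r and s2 > best.get(d2, 0.0):
                best[d2] = s2
                heapq.heappush(heap, (-s2, d2))
    return set(best)

# relevance_closure({"a": [("b", 0.9)], "b": [("c", 0.5)]}, "a", 0.4)
#   -> {'a', 'b', 'c'}   (path score to c: 0.9 * 0.5 = 0.45 >= 0.4)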

5.2. Algebra Operators

This section introduces the operators that the database defines in order to work on the graphs.

Graph creation. The following operators create and modify the web graph.

graph : Γ(doc, Λ). The function graph[doc, Λ] creates an empty graph with nodes of type "doc" and edge labels of type Λ.

append : Γ(doc, Λ) × Γ(doc, Λ) × doc × doc × Λ → Γ(doc, Λ). append(G, H, n1, n2, λ), with n1 ∈ nodes(G) and n2 ∈ nodes(H), builds a graph whose node set is the union of the node sets of G and H, and whose edge set is the union of the edge sets of G and H with an additional edge from the node n1 of G to the node n2 of H. This edge has label λ.

insert : Γ(doc, Λ) × doc → Γ(doc, Λ). insert(G, n) inserts the node n into the graph without connecting it to other nodes.

insert : Γ(doc, Λ) × doc × doc × Λ → Γ(doc, Λ). insert(G, d1, d2, λ) inserts an edge with label λ between the documents d1 and d2.

delete : Γ(doc, Λ) × doc → Γ(doc, Λ). The function delete(G, n), with n ∈ nodes(G), removes from the graph G the node n and all the edges connected to it.

delete : Γ(doc, Λ) × E(G) → Γ(doc, Λ). The function delete(G, e) removes the edge e from the graph G.

nodes : Γ(doc, Λ) → set{doc}. The function nodes(G) returns the set of nodes of the graph G.

edges : Γ(doc, Λ) → set{E(G)}. The function edges(G) returns the set of edges of the graph G.

type : doc → {img, txt}. Given a document, this function determines whether it is a text document or an image.

σ : Γ(doc, Λ) × (doc → {true, false}) → Γ(doc, Λ). σ(G, λx.P x) returns the graph formed by all the nodes that satisfy the predicate P and all the edges connecting them.

union : Γ(doc, Λ) × Γ(doc, Λ) → Γ(doc, Λ). The function union(G1, G2) (also expressed in infix form as G1 ∪ G2) returns the union of the two graphs G1 and G2.

intersection : Γ(doc, Λ) × Γ(doc, Λ) → Γ(doc, Λ). The function intersection(G1, G2) (also expressed in infix form as G1 ∩ G2) returns the intersection of the two graphs G1 and G2.

ν : doc × N → Γ(doc, Λ). This is the k-downstream neighborhood introduced in the previous section.

ν∗ : doc × N → Γ(doc, Λ). This is the k-upstream neighborhood introduced in the previous section.

TC : (R × R → R) → doc × R → Γ(doc, Λ). This is the score transitive closure operator introduced in the previous section. Note that, in the definition, the operator is curried,8 that is, applying it to a score combination operator ⋄ one obtains a function T(⋄) : doc × R → Γ(doc, Λ) that computes the score-transitive closure with respect to that operator.

In addition to this, the database contains operations for querying sets of images described by features, either textual or visual. These operations were described elsewhere,9 and will not be described in detail here. The main operators are the score assigning function Σ(s), which assigns a score to all images in a set based on the scoring function s, the k-nearest neighbors operator σk#, the radius query operator σρ<, and the ⋄-join operator.

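To fix ideas, here is a hedged sketch of a few of these operators, with a graph represented as a pair (node set, edge list); the representation and names are mine, and edge labels are left opaque:

def graph():
    """graph[doc, Lambda]: create an empty graph (node set, edge list)."""
    return (set(), [])

def insert_node(G, n):
    """insert(G, n): add an isolated node."""
    nodes, edges = G
    return (nodes | {n}, list(edges))

def insert_edge(G, d1, d2, label):
    """insert(G, d1, d2, lambda): add a labeled edge (and its endpoints)."""
    nodes, edges = G
    return (nodes | {d1, d2}, edges + [(d1, d2, label)])

def delete_node(G, n):
    """delete(G, n): remove a node and every edge touching it."""
    nodes, edges = G
    return (nodes - {n}, [e for e in edges if n not in e[:2]])

def select(G, pred):
    """sigma(G, P): the nodes satisfying P, plus all edges joining them."""
    nodes, edges = G
    kept = {n for n in nodes if pred(n)}
    return (kept, [e for e in edges if e[0] in kept and e[1] in kept])

def union(G1, G2):
    """union(G1, G2): componentwise union of two graphs."""
    return (G1[0] | G2[0], G1[1] + G2[1])

# select(insert_edge(insert_edge(graph(), "d1", "i1", 0.8), "d1", "d2", 0.5),
#        lambda n: n.startswith("d"))  ->  ({'d1', 'd2'}, [('d1', 'd2', 0.5)])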

5.3. Examples

The following examples illustrate the query process over a web image and text database. I will use a notation derived from ML8 with the addition of a comprehension syntax10,11 to express the results of the query. In these examples, I will make a rather informal use of similarity functions for visual features and for textual features. I will use the symbol sv for visual similarity functions and the symbol st for textual similarity functions. I will also assume that combination operators are available as needed. How to manage these operators, and how to get them out of the database, is a topic beyond the scope of this paper.

Example 1. Given a document d and an image i, find the image visually closest to i among those in the documents whose relevance for d is at least r.

query q1(d, i, r) ⇒
  let
    Q = TC(⋄)(d, r)
    I = σ(nodes(Q), λx.(type(x) = img))
    s = sv(i)
  in
    σk#(Σ(s, I))
  end

The first line builds, in Q, the graph of the documents whose relevance for the given one is at least r. This graph will contain, in general, both text documents and images. The following step breaks the structure of the graph, by taking only the nodes, and selects only those nodes that are of type "image." The third statement builds a scoring function given by a suitable similarity with the given image. Finally, the query statement takes the k nearest neighbors to the given image among the images of the set I.

Example 2. Retrieve all the images from the pages pointing to document d that contain the word "mousetrap" with a weight of at least r.

query mtrap(d, r) ⇒
  let
    U = σ(nodes(ν∗(d, 1)), λx.(type(x) = text))
    P(d) = and{true | q ← d.wds, q.w ≥ r, q.text = "mousetrap"}
    D = {d | d ← U, P(d)}
  in
    {i | type(i) = img, i ← ν(d), d ← D}
  end

The first step builds the set of documents that link to d, and extracts the set of nodes of type "text." Note that the images themselves might contain the word "mousetrap," but the query explicitly requests all the images from the pages pointing to document d, so one must start by considering the full text of the pages, that is, the documents of type "text." The second step defines a function that returns "true" if a document contains the word "mousetrap" with a weight of at least r. This is done using the comprehension syntax on the "and" monoid, whose details are beyond the scope of this paper.12 The third step uses again the comprehension syntax to build the set of documents that satisfy the predicate P. Finally, the query statement extracts all the images from these documents. Note that, in the web model used in this paper, images are linked to documents, that is, to go from a document to an image contained therein, one must traverse a link. This is the reason why the statement contains the function ν(d).

Example 3. Find all images that either are more similar to image i than a threshold q, or that belong to documents whose relevance for the page containing i is at least r.

query rel(i, q, r) ⇒
  let
    I = σq<(Σ(sv(i)))
    P = σ(nodes(ν∗(i, 1)), λx.(type(x) = text))
    J = {d | d ← TC(⋄)(d′, r), type(d) = text, d′ ← P}
  in
    {i | i ← σ(ν(J, 1), λx.(type(x) = img))} ∪ I
  end
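For concreteness, Example 2 translates almost literally into Python set comprehensions over such a graph; the neighborhood functions, the type function, and the word index are passed in as parameters, and all names here are hypothetical:

def mousetrap_query(d, r, nu, nu_star, doc_type, words):
    """Images in pages linking to d that contain "mousetrap" with weight >= r.

    nu, nu_star : downstream and upstream neighborhood functions
    doc_type(x) : either "text" or "img"
    words(x)    : set of (term, weight) pairs indexing document x
    """
    # Step 1: text documents that link to d (upstream neighborhood).
    U = {x for x in nu_star(d) if doc_type(x) == "text"}
    # Step 2/3: those containing "mousetrap" with weight >= r ("and" monoid
    # comprehension, here folded into the filter).
    D = {x for x in U
         if any(term == "mousetrap" and w >= r for term, w in words(x))}
    # Step 4: traverse one more link to reach the images in those documents.
    return {i for x in D for i in nu(x) if doc_type(i) == "img"}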

6. CONCLUSIONS This paper has discussed some issues in the definition of a database in which images derive their meaning both from their visual features and from the structure of the web in which they are immersed. I have introduced a model of the web in which documents are of type image or of type text. Texts are characterized by word indices, while images are characterized both by word indices and by a visual feature. Text features are extracted using standard information retrieval techniques, while visual features are derived from a spectral decomposition of the region adjacency graph of the image. The resulting web graph is stored in the database. The paper proposed a data algebra for querying and navigating this graph using visual similarity, textual similarity, and relations induced by links.

The model presented here is rather schematic, and can be extended in several ways. One of the most obvious is to include additional information about pages that can be inferred from the structure of the world wide web, such as the "authoritativeness" of a page.13 Such additions can easily be incorporated and don't alter significantly the structure of the model. For these reasons, they were not considered here.

REFERENCES

1. S. Santini, A. Gupta, and R. Jain, "Emergent semantics through interaction in image databases," IEEE Transactions on Knowledge and Data Engineering, 2001.
2. S. Santini, Exploratory Image Databases; Content Based Retrieval, Academic Press, 2001.
3. S. Santini, "A query paradigm to discover the relation between text and images," in Proceedings of SPIE Vol. 4315, Storage and Retrieval for Media Databases, 2001.
4. M. F. Porter, "An algorithm for suffix stripping," Program 14(3), pp. 130-137, 1980.
5. W. B. Frakes and R. Baeza-Yates, eds., Information Retrieval, Data Structures and Algorithms, Prentice Hall, Englewood Cliffs, 1992.
6. S. Deerwester, S. Dumais, and R. Harshman, "Indexing by latent semantic analysis," Journal of the American Society for Information Science 41, pp. 391-407, 1990.
7. B. Subramanian, S. Zdonik, T. Leung, and S. Vandemberg, "Ordered types in the Aqua data model," Tech. Rep. CS-93-10, Dept. of Computer Science, Brown University, 1993.
8. J. Ullman, Elements of ML Programming, Prentice Hall, 1994.
9. S. Santini and A. Gupta, "An extensible feature management engine for image retrieval," in Proceedings of SPIE Vol. 4676, Storage and Retrieval for Media Databases, San Jose, 2002.
10. P. L. Wadler, "Comprehending monads," in Proceedings of the 1990 ACM Conference on Lisp and Functional Programming, Nice, France, 1990.
11. P. Buneman, L. Libkin, D. Suciu, V. Tannen, and L. Wong, "Comprehension syntax," SIGMOD Record (ACM Special Interest Group on Management of Data) 23(1), pp. 87-96, 1994.
12. A. M. Alcantara and B. Buckles, "Supporting array types in monoid comprehensions."
13. S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, "Mining the link structure of the world wide web," IEEE Computer 32(8), 1999.
