Sep 23, 2015 ... Relevance: ▷ semantic content based image retrieval via a query language; ...
an ontology provides the formal semantics and constraints.
Ontology-Based Semantic Image Interpretation Ivan Donadello1,2 1 Fondazione 2 DISI
Luciano Serafini (Advisor)1
Bruno Kessler, Via Sommarive, 18 I-38123, Trento, Italy
University of Trento, Via Sommarive, 9 I-38123, Trento, Italy
September 23, 2015
1 / 31
Context
I
Huge diffusion of digital images in recent years;
I
lack of semantic based retrieval systems for images, that is no complex queries: “a person riding a horse on a meadow”;
I
semantic gap between numerical image features and human semantics;
I
need a method that automatically understands the semantic content of images.
Relevance: I
semantic content based image retrieval via a query language;
I
semantic content enrichment with Semantic Web resource.
2 / 31
Problem Statement Semantic Image Interpretation (SII) is the task of extracting a graph representing the image content;
3 / 31
Problem Statement Semantic Image Interpretation (SII) is the task of extracting a graph representing the image content; I nodes represent visible and occluded objects in the image and their properties;
3 / 31
Problem Statement Semantic Image Interpretation (SII) is the task of extracting a graph representing the image content; I nodes represent visible and occluded objects in the image and their properties; I arcs represent relations between objects;
3 / 31
Problem Statement Semantic Image Interpretation (SII) is the task of extracting a graph representing the image content; I nodes represent visible and occluded objects in the image and their properties; I arcs represent relations between objects; I alignment between visible object regions and nodes;
3 / 31
Problem Statement Semantic Image Interpretation (SII) is the task of extracting a graph representing the image content; I nodes represent visible and occluded objects in the image and their properties; I arcs represent relations between objects; I alignment between visible object regions and nodes; I an ontology provides the formal semantics and constraints that guide the graph construction;
3 / 31
Aim of the Doctoral Thesis
I
Define a theoretical reference framework for SII;
I
implementation of a system for SII; graph construction guided by mixing:
I
I I
I
numeric information (low-level features of the image); symbolic information (high-level constraints available in the ontology);
perform system evaluation on a ground truth of semantically interpreted images.
4 / 31
State-of-the-art on SII Logic-Based Works (2014) I
I
a first description of the image (basic object recognition and their relations) is given;
Neural Networks-based (NN) works (2015) I
Caption generation;
model generation (deduction or abduction) by exploiting the ontology.
5 / 31
State-of-the-art on SII Logic-Based Works (2014) I
I
a first description of the image (basic object recognition and their relations) is given;
Neural Networks-based (NN) works (2015) I
Caption generation;
model generation (deduction or abduction) by exploiting the ontology.
Limitations I
Logic-based works: no consideration for low-level features;
I
NN works: no formal semantics and a priori knowledge. 5 / 31
SII Pipeline
6 / 31
SII Pipeline
7 / 31
SII Pipeline
8 / 31
Our Vision of SII Finding the maximum of a joint search space composed of semantic features and image features.
9 / 31
Theoretical Framework Background Knowledge encoded in a Description Logic ontology O.
Labelled picture is a pair P = hS, Li where S are segments of the image, L are (weighted) labels from Σ.
10 / 31
The Partial Model I
A picture is a partial view of the real world;
I
A partial model Ip is a structure that can be extended to a model of O;
11 / 31
The Partial Model I
A picture is a partial view of the real world;
I
A partial model Ip is a structure that can be extended to a model of O; . A partial model of an ontology O is an interpretation Ip = (∆Ip , ·Ip ) of O: there exists a model I = (∆I , ·I ) with ∆Ip ⊆ ∆I and ·Ip is a restriction of ·I on ∆Ip .
I
11 / 31
The Partial Model I
A picture is a partial view of the real world;
I
A partial model Ip is a structure that can be extended to a model of O; . A partial model of an ontology O is an interpretation Ip = (∆Ip , ·Ip ) of O: there exists a model I = (∆I , ·I ) with ∆Ip ⊆ ∆I and ·Ip is a restriction of ·I on ∆Ip . A semantically interpreted picture is a triple (P, Ip , G)O ;
I
I
11 / 31
The Most Plausible Partial Model Many partial models for a picture
Searching for the partial model that best fits the picture content, i.e. the most plausible partial model. 12 / 31
The Semantic Image Interpretation Problem Formalization I
A cost function S assigns a cost to semantically interpreted pictures (P, Ip , G)O ;
I
S(P, Ip , G)O expresses the gap between low-level features of P and objects and relations encoded in Ip ;
I
the most plausible partial model Ip∗ minimizes S: Ip∗ = argmin S(P, Ip , G)O Ip |=p O
G⊆∆Ip ×S
I
the semantic image interpretation problem is the construction of (P, Ip∗ , G)O that minimizes S.
13 / 31
Case Study: Clustering-Based Cost Function
I
I
Task: part-whole recognition, i.e., discovery complex objects from their parts; part-whole recognition can be seen as a clustering problem; I
parts of the same object tend to be grouped together;
14 / 31
Case Study: Clustering-Based Cost Function
I
I
Task: part-whole recognition, i.e., discovery complex objects from their parts; part-whole recognition can be seen as a clustering problem; I
I
parts of the same object tend to be grouped together;
cost function as a clustering optimisation function.
14 / 31
Case Study: Clustering-Based Cost Function I
Clustering: grouping a set of input elements into groups (clusters) such that:
15 / 31
Case Study: Clustering-Based Cost Function I
Clustering: grouping a set of input elements into groups (clusters) such that:
I
clustering solution of (P, Ip , G)O is C = {Cd | d ∈ ∆Ip } where Cd = {G(d 0 ) | d 0 ∈ ∆Ip , hd, d 0 i ∈ hasPartIp };
I
d represents the composite object, the centroid of the cluster; 15 / 31
Case Study: Clustering-Based Cost Function Mixing numeric and semantic features: I
grounding distance δG (d, d 0 ): the Euclidean distance between the centroids of G(d) and G(d 0 );
I
semantic distance δO (d, d 0 ) is the shortest path in O:
I I
if Muzzle(d 0 ), Tail(d 00 ) then δO (d 0 , d 00 ) = 2; if Muzzle(d 0 ), Horse(d) then δO (d 0 , d) = 1;
16 / 31
Case Study: Clustering-Based Cost Function I
Inter-cluster distance Γ:
I
Intra-cluster distance Λ:
I
Cost function: S(P, Ip , G)O = α · Γ + (1 − α) · Λ 17 / 31
Minimising the Cost Function
The Clustering Part-Whole Algorithm (CPWA) approximates the minimum of the cost function.
18 / 31
Minimising the Cost Function
The Clustering Part-Whole Algorithm (CPWA) approximates the minimum of the cost function.
19 / 31
Minimising the Cost Function
The Clustering Part-Whole Algorithm (CPWA) approximates the minimum of the cost function.
20 / 31
Minimising the Cost Function
The Clustering Part-Whole Algorithm (CPWA) approximates the minimum of the cost function.
21 / 31
Minimising the Cost Function
The Clustering Part-Whole Algorithm (CPWA) approximates the minimum of the cost function.
22 / 31
Minimising the Cost Function
The Clustering Part-Whole Algorithm (CPWA) approximates the minimum of the cost function.
23 / 31
Minimising the Cost Function
The Clustering Part-Whole Algorithm (CPWA) approximates the minimum of the cost function.
24 / 31
Minimising the Cost Function
The Clustering Part-Whole Algorithm (CPWA) approximates the minimum of the cost function.
25 / 31
Evaluation Comparing the predicted partial model with the ground truth, two measures: I grouping (GRP):
26 / 31
Evaluation Comparing the predicted partial model with the ground truth, two measures: I grouping (GRP):
I
complex-object type prediction (COP):
26 / 31
Evaluation Comparing the predicted partial model with the ground truth, two measures: I grouping (GRP):
I
complex-object type prediction (COP):
I
precision, the fraction of predicted pairs that are correct; recall, the fraction of correct pairs that are predicted.
I
26 / 31
Experiments and Results Experiments Setting I
Ground truth of 203 manually obtained labelled pictures on the urban scene domain;
I
manually built ontology with basic formalism of meronymy of the domain;
I
task: discovering complex objects from their parts in pictures.
Results
CPWA
precGRP 0.61
recGRP 0.89
F 1GRP 0.67
precCOP 0.73
recCOP 0.75
F 1COP 0.74
27 / 31
Experiments and Results Experiments Setting I
Ground truth of 203 manually obtained labelled pictures on the urban scene domain;
I
manually built ontology with basic formalism of meronymy of the domain;
I
task: discovering complex objects from their parts in pictures.
Results
CPWA Baseline I
precGRP 0.61 0.45
recGRP 0.89 0.71
F 1GRP 0.67 0.48
precCOP 0.73 0.66
recCOP 0.75 0.69
F 1COP 0.74 0.66
Baseline: clustering without semantics; 28 / 31
Experiments and Results Experiments Setting I Ground truth of 203 manually obtained labelled pictures on the urban scene domain; I manually built ontology with basic formalism of meronymy of the domain; I task: discovering complex objects from their parts in pictures. Results
CPWA++ CPWA Baseline I I
precGRP 0.67 0.61 0.45
recGRP 0.81 0.89 0.71
F 1GRP 0.71 0.67 0.48
precCOP 0.71 0.73 0.66
recCOP 0.82 0.75 0.69
F 1COP 0.86 0.74 0.66
Baseline: clustering without semantics; CPWA + +: improved version of CPWA; 29 / 31
Conclusions and Future Work
I
Theoretical framework for SII: partial model that minimizes a cost function;
I
cost function as a clustering optimization function;
I
clustering algorithm that approximates the cost function;
I
explicitly using semantics improves the results; future work:
I
30 / 31
Conclusions and Future Work
I
Theoretical framework for SII: partial model that minimizes a cost function;
I
cost function as a clustering optimization function;
I
clustering algorithm that approximates the cost function;
I
explicitly using semantics improves the results; future work:
I
I I I I
integrating of semantic segmentation algorithms; generalizing to other relations; extending the evaluation to a standard dataset; using general purposes ontologies;
30 / 31
Thanks for listening Questions?
31 / 31