Jun 17, 2005 - Possible restriction to some domain (e.g. publications). .... Booking services. Mailing list .... With nonsense words. With domain words ...
Understanding the Hidden Web Pierre Senellart Introduction
Understanding the Hidden Web
Process description Discovery Wrappers
Pierre Senellart
Semantic Analysis Indexing and Querying
Summary
17 June 2005
The Hidden Web Understanding the Hidden Web Pierre Senellart Introduction Process description Discovery Wrappers
Definition (Hidden Web) The set of webpages (which may or may not be dynamically generated) not accessible from the hyperlinked structure of the World Wide Web.
Semantic Analysis Indexing and Querying
Summary
Size estimate (2001) : 500 times larger than the surface Web. How to understand it and benefit from its content?
The Hidden Web Understanding the Hidden Web Pierre Senellart Introduction Process description Discovery Wrappers
Definition (Hidden Web) The set of webpages (which may or may not be dynamically generated) not accessible from the hyperlinked structure of the World Wide Web.
Semantic Analysis Indexing and Querying
Summary
Size estimate (2001) : 500 times larger than the surface Web. How to understand it and benefit from its content?
The Hidden Web Understanding the Hidden Web Pierre Senellart Introduction Process description Discovery Wrappers
Definition (Hidden Web) The set of webpages (which may or may not be dynamically generated) not accessible from the hyperlinked structure of the World Wide Web.
Semantic Analysis Indexing and Querying
Summary
Size estimate (2001) : 500 times larger than the surface Web. How to understand it and benefit from its content?
Understanding the Hidden Web Understanding the Hidden Web Pierre Senellart Introduction Process description
Purpose Intensional indexing of the Hidden Web High-level queries ⇒ a semantic search engine over the Hidden Web
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
In a fully, unsupervized, way!
Difficult and broad problem. Possible restriction to some domain (e.g. publications).
Understanding the Hidden Web Understanding the Hidden Web Pierre Senellart Introduction Process description
Purpose Intensional indexing of the Hidden Web High-level queries ⇒ a semantic search engine over the Hidden Web
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
In a fully, unsupervized, way!
Difficult and broad problem. Possible restriction to some domain (e.g. publications).
Understanding the Hidden Web Understanding the Hidden Web Pierre Senellart Introduction Process description
Purpose Intensional indexing of the Hidden Web High-level queries ⇒ a semantic search engine over the Hidden Web
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
In a fully, unsupervized, way!
Difficult and broad problem. Possible restriction to some domain (e.g. publications).
Web Service Semantic Interpretation Process Understanding the Hidden Web Pierre Senellart Introduction Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
World Wide Web
Web Service Semantic Interpretation Process Understanding the Hidden Web Pierre Senellart Introduction Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
World Wide Web
discovery
HTML form UDDI WSDL
Web Service Semantic Interpretation Process Understanding the Hidden Web Pierre Senellart
World Wide Web
discovery
HTML form UDDI WSDL
Introduction Process description
wrappers
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
WSDL
Web Service Semantic Interpretation Process Understanding the Hidden Web Pierre Senellart
World Wide World Wide Web Web
discovery
HTML form HTML form UDDI UDDI WSDL WSDL
Introduction
wrappers
Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Analyzed Web Services analysis
WSDL WSDL
Web Service Semantic Interpretation Process Understanding the Hidden Web Pierre Senellart
World Wide World Wide Web Web
discovery
HTML form HTML form UDDI UDDI WSDL WSDL
Introduction
wrappers
Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Analyzed Web Services analysis
Summary
indexing
Web Services Index
WSDL WSDL
Web Service Semantic Interpretation Process Understanding the Hidden Web Pierre Senellart
World Wide World Wide Web Web
discovery
HTML form HTML form UDDI UDDI WSDL WSDL
Introduction
wrappers
Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Analyzed Web Services analysis
WSDL WSDL
Summary
indexing Queries Web Services Index
querying Results
Understanding the Hidden Web Pierre Senellart
1
Introduction
2
Process description Web Service Discovery Wrapping Web Service Descriptions Web Service Semantic Analysis Web Service Indexing and Querying
3
Summary
Introduction Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Web Service Discovery Understanding the Hidden Web Pierre Senellart
Crawling the World Wide Web for: HTML forms implementing a Web Service
Introduction
UDDI registries
Process description
WSDL descriptions
Discovery Wrappers Semantic Analysis Indexing and Querying
Other resources (XML, HTML, Web as a full-text index. . . )
Summary
Only interested in Web Services with no side effects: Ok
Not Ok
Yellow Pages
Booking services
Publication databases
Mailing list management
...
...
Web Service Discovery Understanding the Hidden Web Pierre Senellart
Crawling the World Wide Web for: HTML forms implementing a Web Service
Introduction
UDDI registries
Process description
WSDL descriptions
Discovery Wrappers Semantic Analysis Indexing and Querying
Other resources (XML, HTML, Web as a full-text index. . . )
Summary
Only interested in Web Services with no side effects: Ok
Not Ok
Yellow Pages
Booking services
Publication databases
Mailing list management
...
...
Web Service Discovery Understanding the Hidden Web Pierre Senellart
Crawling the World Wide Web for: HTML forms implementing a Web Service
Introduction
UDDI registries
Process description
WSDL descriptions
Discovery Wrappers Semantic Analysis Indexing and Querying
Other resources (XML, HTML, Web as a full-text index. . . )
Summary
Only interested in Web Services with no side effects: Ok
Not Ok
Yellow Pages
Booking services
Publication databases
Mailing list management
...
...
Web Service Discovery Understanding the Hidden Web Pierre Senellart
Crawling the World Wide Web for: HTML forms implementing a Web Service
Introduction
UDDI registries
Process description
WSDL descriptions
Discovery Wrappers Semantic Analysis Indexing and Querying
Other resources (XML, HTML, Web as a full-text index. . . )
Summary
Only interested in Web Services with no side effects: Ok
Not Ok
Yellow Pages
Booking services
Publication databases
Mailing list management
...
...
Understanding the Hidden Web Pierre Senellart
1
Introduction
2
Process description Web Service Discovery Wrapping Web Service Descriptions Web Service Semantic Analysis Web Service Indexing and Querying
3
Summary
Introduction Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Analyzing HTML forms Understanding the Hidden Web
Analyzing the structure of HTML forms.
Pierre Senellart Introduction Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Issues What are the relevant form fields? What is the concrete type of each field? What is the label of each field?
Analyzing HTML forms Understanding the Hidden Web
Analyzing the structure of HTML forms.
Pierre Senellart Introduction Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Issues What are the relevant form fields? What is the concrete type of each field? What is the label of each field?
Analyzing HTML forms Understanding the Hidden Web
Analyzing the structure of HTML forms.
Pierre Senellart Introduction Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Issues What are the relevant form fields? What is the concrete type of each field? What is the label of each field?
Probing HTML forms Understanding the Hidden Web Pierre Senellart Introduction
Probing HTML forms to retrieve sample HTML answer pages: With dictionary words
Process description Discovery Wrappers Semantic Analysis Indexing and Querying
With nonsense words
Summary
With domain words
Probing HTML forms Understanding the Hidden Web Pierre Senellart Introduction
Probing HTML forms to retrieve sample HTML answer pages: With dictionary words
Process description Discovery Wrappers Semantic Analysis Indexing and Querying
With nonsense words
Summary
With domain words
Probing HTML forms Understanding the Hidden Web Pierre Senellart Introduction
Probing HTML forms to retrieve sample HTML answer pages: With dictionary words
Process description Discovery Wrappers Semantic Analysis Indexing and Querying
With nonsense words
Summary
With domain words
Query-answer webpages Understanding the Hidden Web
Extract data from query-answer webpages.
Pierre Senellart Introduction Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Issues What part of the webpage contains the answer? How to extract structured content? How to label this structured content?
Query-answer webpages Understanding the Hidden Web
Extract data from query-answer webpages.
Pierre Senellart Introduction Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Issues What part of the webpage contains the answer? How to extract structured content? How to label this structured content?
Query-answer webpages Understanding the Hidden Web
Extract data from query-answer webpages.
Pierre Senellart Introduction Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Issues What part of the webpage contains the answer? How to extract structured content? How to label this structured content?
R OAD RUNNER Understanding the Hidden Web Pierre Senellart Introduction Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Example: R OAD RUNNER information extraction engine
Understanding the Hidden Web Pierre Senellart
1
Introduction
2
Process description Web Service Discovery Wrapping Web Service Descriptions Web Service Semantic Analysis Web Service Indexing and Querying
3
Summary
Introduction Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Conceptual Model Understanding the Hidden Web Pierre Senellart
IsA ontology of concepts (simple DAG) Thing
Introduction Process description Discovery
Person
Publication
Wrappers Semantic Analysis Indexing and Querying
Man
Woman
Proceedings
Article
Summary
n-ary typed roles AuthorOf(Person,Publication) HasName(Person,Name)
Book
Conceptual Model Understanding the Hidden Web Pierre Senellart
IsA ontology of concepts (simple DAG) Thing
Introduction Process description Discovery
Person
Publication
Wrappers Semantic Analysis Indexing and Querying
Man
Woman
Proceedings
Article
Summary
n-ary typed roles AuthorOf(Person,Publication) HasName(Person,Name)
Book
Semantic representation of a service Understanding the Hidden Web Pierre Senellart
What is a service described by?
Introduction
A n-uple of typed input parameters
Process description
A complex (= nested) type of its output
Discovery Wrappers
Semantic relations between inputs and outputs
Semantic Analysis Indexing and Querying
Summary
Definition (Complex types) S: set of concepts T ←− S||T ∗
Semantic representation of a service Understanding the Hidden Web Pierre Senellart
What is a service described by?
Introduction
A n-uple of typed input parameters
Process description
A complex (= nested) type of its output
Discovery Wrappers
Semantic relations between inputs and outputs
Semantic Analysis Indexing and Querying
Summary
Definition (Complex types) S: set of concepts T ←− S||T ∗
Semantic representation of a service Understanding the Hidden Web Pierre Senellart
What is a service described by?
Introduction
A n-uple of typed input parameters
Process description
A complex (= nested) type of its output
Discovery Wrappers
Semantic relations between inputs and outputs
Semantic Analysis Indexing and Querying
Summary
Definition (Complex types) S: set of concepts T ←− S||T ∗
Semantic representation of a service Understanding the Hidden Web Pierre Senellart
What is a service described by?
Introduction
A n-uple of typed input parameters
Process description
A complex (= nested) type of its output
Discovery Wrappers
Semantic relations between inputs and outputs
Semantic Analysis Indexing and Querying
Summary
Definition (Complex types) S: set of concepts T ←− S||T ∗
Services and queries Understanding the Hidden Web Pierre Senellart Introduction Process description
Example Service giving authors from publication titles A*← WrittenBy(P,A),HasTitle(P,T),Input(T)
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Example Query: *← WrittenBy(P,A), Article(P), HasTitle(P,T), KeywordOf(“xml”,P)
Services and queries Understanding the Hidden Web Pierre Senellart Introduction Process description
Example Service giving authors from publication titles A*← WrittenBy(P,A),HasTitle(P,T),Input(T)
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Example Query: *← WrittenBy(P,A), Article(P), HasTitle(P,T), KeywordOf(“xml”,P)
Managing extensional information Understanding the Hidden Web Pierre Senellart Introduction
How to represent extensional information (i.e. documents) in this formalism?
Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Definition A document is a service with no input.
Summary
Complex types: natural representation of a DTD. (Disjunctions a|b simulated by (a?,b?)).
Semantic Interpretation of a Service Understanding the Hidden Web Pierre Senellart
How to analyze a Web Service?
Introduction Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Field labels, variable names, tag names Concrete type descriptions (e.g. \d{4}-\d{2}-\d{2} is a date) Linguistic analysis of plain text descriptions and pages linking to the service Note: Extracting concepts is easier than extracting relations between concepts.
Semantic Interpretation of a Service Understanding the Hidden Web Pierre Senellart
How to analyze a Web Service?
Introduction Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Field labels, variable names, tag names Concrete type descriptions (e.g. \d{4}-\d{2}-\d{2} is a date) Linguistic analysis of plain text descriptions and pages linking to the service Note: Extracting concepts is easier than extracting relations between concepts.
Semantic Interpretation of a Service Understanding the Hidden Web Pierre Senellart
How to analyze a Web Service?
Introduction Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Field labels, variable names, tag names Concrete type descriptions (e.g. \d{4}-\d{2}-\d{2} is a date) Linguistic analysis of plain text descriptions and pages linking to the service Note: Extracting concepts is easier than extracting relations between concepts.
Semantic Interpretation of a Service Understanding the Hidden Web Pierre Senellart
How to analyze a Web Service?
Introduction Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Field labels, variable names, tag names Concrete type descriptions (e.g. \d{4}-\d{2}-\d{2} is a date) Linguistic analysis of plain text descriptions and pages linking to the service Note: Extracting concepts is easier than extracting relations between concepts.
Semantic Interpretation of a Service Understanding the Hidden Web Pierre Senellart
How to analyze a Web Service?
Introduction Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Field labels, variable names, tag names Concrete type descriptions (e.g. \d{4}-\d{2}-\d{2} is a date) Linguistic analysis of plain text descriptions and pages linking to the service Note: Extracting concepts is easier than extracting relations between concepts.
Understanding the Hidden Web Pierre Senellart
1
Introduction
2
Process description Web Service Discovery Wrapping Web Service Descriptions Web Service Semantic Analysis Web Service Indexing and Querying
3
Summary
Introduction Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Web Service Indexing and Querying Understanding the Hidden Web Pierre Senellart Introduction Process description
Given a query, represented as an Analyzed Web Service, how to know which known web services to query?
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Issues Subsumption of input/output parameters Missing input parameters Composition of webservices
Web Service Indexing and Querying Understanding the Hidden Web Pierre Senellart Introduction Process description
Given a query, represented as an Analyzed Web Service, how to know which known web services to query?
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Issues Subsumption of input/output parameters Missing input parameters Composition of webservices
Web Service Indexing and Querying Understanding the Hidden Web Pierre Senellart Introduction Process description
Given a query, represented as an Analyzed Web Service, how to know which known web services to query?
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Issues Subsumption of input/output parameters Missing input parameters Composition of webservices
Web Service Indexing and Querying Understanding the Hidden Web Pierre Senellart Introduction Process description
Given a query, represented as an Analyzed Web Service, how to know which known web services to query?
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Issues Subsumption of input/output parameters Missing input parameters Composition of webservices
Differences with classical database querying Understanding the Hidden Web Pierre Senellart Introduction Process description Discovery Wrappers
Three main differences: Information can be queried only through views (Local As View)
Semantic Analysis Indexing and Querying
Summary
Nested types Incomplete information Three sources of complexity!
Differences with classical database querying Understanding the Hidden Web Pierre Senellart Introduction Process description Discovery Wrappers
Three main differences: Information can be queried only through views (Local As View)
Semantic Analysis Indexing and Querying
Summary
Nested types Incomplete information Three sources of complexity!
Differences with classical database querying Understanding the Hidden Web Pierre Senellart Introduction Process description Discovery Wrappers
Three main differences: Information can be queried only through views (Local As View)
Semantic Analysis Indexing and Querying
Summary
Nested types Incomplete information Three sources of complexity!
Differences with classical database querying Understanding the Hidden Web Pierre Senellart Introduction Process description Discovery Wrappers
Three main differences: Information can be queried only through views (Local As View)
Semantic Analysis Indexing and Querying
Summary
Nested types Incomplete information Three sources of complexity!
Differences with classical database querying Understanding the Hidden Web Pierre Senellart Introduction Process description Discovery Wrappers
Three main differences: Information can be queried only through views (Local As View)
Semantic Analysis Indexing and Querying
Summary
Nested types Incomplete information Three sources of complexity!
Web Service Semantic Interpretation Process Understanding the Hidden Web Pierre Senellart
World Wide World Wide Web Web
discovery
HTML form HTML form UDDI UDDI WSDL WSDL
Introduction
wrappers
Process description Discovery Wrappers Semantic Analysis Indexing and Querying
Analyzed Web Services analysis
WSDL WSDL
Summary
indexing Queries Web Services Index
querying Results