Bio-ontologies tutorial
Olivier Dameron, Julie Chabalier
UPRES-EA 3888 – Université Rennes 1 (France) http://www.ea3888.univ-rennes1.fr DILS conference (June 25-27 2008)
Slide 1
License
Copyright (c) 2008 Olivier Dameron and Julie Chabalier, Université Européenne de Bretagne, Rennes. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License". To save paper, the text of the license is available at http://www.gnu.org/copyleft/fdl.html
Slide 2
Credits: Matthew Horridge, Holger Knublauch, Nick Drummond and the Protégé and CO-ODE teams, "A practical guide to building OWL ontologies using the Protégé-OWL plugin and CO-ODE tools"
Natasha Noy, Alan Rector W3C “Semantic Web best practice” working group
Robert Stevens, Carole Goble Slide 3
Goal
Why do we (sometimes) need ontologies in life sciences?
What are ontologies, by the way?
How can I create my own ontology or extend an existing one?
How can I use an ontology?
Slide 4
Outline
Ontologies in life sciences
Basic principles: classes, individuals, properties (aka relations); taxonomy
Advanced features and reasoning: combining classes (conjunction, disjunction, negation); describing classes with restrictions (quantification, cardinality, hasValue); the open world assumption
Slide 5
Disclaimer: what this tutorial is not
How to use a particular ontology editor
Ontology best practices
An advanced course on Description Logics, decidability, etc.
Slide 6
Part 1 Ontologies in life sciences
Slide 7
Context
Sequencing projects and high-throughput technologies (microarray): huge quantity of data
DNA sequence Protein sequence Micro-array experiment result
Slide 8
Data interpretation
Associate a meaning with these data (annotation), e.g. BRCA1 protein: involved in breast cancer
[Figure: DNA sequence, protein sequence and micro-array experiment result]
Slide 9
Knowledge sources Numerous & Scattered on the Web Ex: More than 200 databases in the field of metabolic and signaling pathways
Slide 10
Knowledge sources
• Different points of view
• Independent
• Semantically heterogeneous, e.g. domain (structural / functional?)
Slide 11
Problems • Difficult to obtain a global vision of the available knowledge • Difficult to develop automatic methods using this knowledge in order to interpret data
Slide 12
Needs
• Shared vocabulary to annotate the data
• This vocabulary must be understood by a computer in order to automate the processing
Slide 13
Ontologies • Define the concepts of a domain and the relations between these concepts Conceptualization • Use a formal language to represent an ontology so that a computer can exploit the modeled knowledge Formalization
Slide 14
Bio-ontologies
• Gene Ontology: annotation of gene products • BioPAX: integration of biological pathways data
Slide 15
Gene Ontology
[Figure: excerpt of the GO graph under its three roots (biological process, cellular component, molecular function), with is_a and part_of links. Terms shown: developmental process, multicellular organismal development, anatomical structure development, system development, nervous system development; cell, intracellular, organelle, intracellular organelle, membrane-bound organelle, intracellular membrane-bound organelle, nucleus; binding, nucleic acid binding, DNA binding, transcription regulator activity, transcription factor activity]
Slide 16
Annotation
[Figure: the same GO excerpt, with the gene product Transcription factor AP-2 beta (AP2B) linked to GO terms: gene product annotation] Slide 17
Gene Ontology Annotation
Relations between gene products (UniProt) and GO terms. Multi-species: human – mouse – rat – zebrafish – arabidopsis – chicken – cow
[Figure: GOA records with the columns UniProt identifier, protein symbol, protein name, GO identifier, evidence code and GO hierarchy]
http://www.ebi.ac.uk/GOA/
Slide 18
GOA evidence codes How the relation between a gene product and a particular term is supported • Physical characterization of a gene product that has supported the association of a GO term • in silico analysis • Based on a statement made by the authors in the reference cited
• Based on a curatorial judgment that does not fit into one of the other evidence code classifications
• Without curatorial judgment Slide 19
Gene Ontology browser
http://amigo.geneontology.org/
Slide 20
Gene Ontology browser
http://amigo.geneontology.org/
Slide 21
Gene Ontology browser
Slide 22
Gene Ontology and sources Cross references
Slide 23
Gene Ontology and sources Cross references
Slide 24
Gene Ontology and sources
Slide 25
Gene Ontology and sources
Slide 26
GO and data interpretation Response to stimulus
Defense response Immune response
Slide 27
GO tools 103 tools on the GO web site alone!
48 tools
2002 publications refer to GO!
• Use of GO in gene expression studies (955)
• Use of GO in clinical applications (554)
• …
http://www.geneontology.org/GO.tools.shtml Slide 28
BioPAX BIOlogical PAthway eXchange A data exchange ontology for biological pathway integration
Metabolic Pathways
Molecular Interaction Networks
Signaling Pathways
http://www.biopax.org/
Slide 29
BioPAX 220 pathway databases • Different data models (semantic) • Different formats (syntax) • Different data access methods
Extremely difficult to combine and use pathway data
Slide 30
BioPAX Motivation
[Figure: applications, databases and users exchanging pathway data, before BioPAX and with BioPAX]
Improve data accessibility and support data sharing Slide 31
BioPAX Structure
[Figure: Entity and its three subclasses Pathway, Interaction and Physical Entity, connected by "is a" and "contains" links]
• Pathway: a set of interactions Glycolysis, MAPK, Apoptosis… • Interaction: a basic relationship between a set of entities Reaction, Molecular Association, Catalysis… • Physical Entity: a building block of simple interactions Small molecule, Protein, DNA, RNA Slide 32
BioPAX Structure
Entity
  Pathway
  Interaction
    Physical interaction
      Control
        Catalysis
        Modulation
      Conversion
        BiochemicalReaction
        Transport
          TransportWithBiochemicalReaction
        ComplexAssembly
  Physical Entity
    SmallMolecule
    Protein
    Complex
    RNA
    DNA
Slide 33
BioPAX Structure
[Figure: the BioPAX class hierarchy instantiated for glycolysis: glycolysis is a Pathway; glucose-6-p and fructose-6-p are SmallMolecules; "glucose-6-p to fructose-6-p" is a BiochemicalReaction; "catalysis of glucose-6-p to fructose-6-p" is a Catalysis]
Slide 34
Pathway data
[Figure: pathway data from KEGG (1), Reactome (2) and BioCyc (3), integrated through BioPAX]
(1) Kyoto Encyclopedia of Genes and Genomes: http://www.genome.jp/kegg/
(2) Reactome: http://www.reactome.org/
(3) BioCyc: http://biocyc.org/
Slide 35
Conclusion
Current state of bio-ontologies
• Success of the first bio-ontologies, e.g. more than 2000 publications referring to GO
• Biological domain conceptualization is a work in progress
Ontologies are also (mostly?) for computers
• How to make ontologies computer-processable?
• How to use ontologies efficiently?
Slide 36
Part 2 Basic principles for creating an ontology
Slide 37
Objective
Defining the basic elements of bio-ontologies and the way they should be organized. We will rely on molecules and chemical reactions to see: how to build a simple ontology; how to use this ontology for simple reasoning tasks Slide 38
2.1 Getting started
Slide 39
Getting started
1. Download Protégé version 3.4 beta501 (http://protege.stanford.edu)
2. Get some documentation: http://protege.stanford.edu/doc/users.html
OWL tutorial: http://www.coode.org
Wiki: http://protege.cim3.net/cgi-bin/wiki.pl and http://protegewiki.stanford.edu
Mailing lists
Slide 40
Getting started
3. Use the chemistryv1.owl ontology from: http://www.ea3888.univ-rennes1.fr/dameron/dils2008/
4. Launch Protégé
5. Select “Open OWL ontology”
6. Retrieve your local copy of the ontology
Slide 41
2.2 The OWL language
Slide 42
OWL: Web Ontology Language
W3C recommendation since 2004 (OWL 1.0)
3 flavors:
• OWL Full: rich and flexible but undecidable
• OWL DL: decidable fragment of OWL Full (e.g. no metaclasses)
• OWL Lite: fragment of OWL DL (e.g. only named superclasses)
Logically defined semantics allowing reasoning: OWL Lite and OWL DL belong to the Description Logics family. Several reasoners are available (Pellet, FaCT++, RacerPro) Slide 43
OWL: Web Ontology Language
Advantages over other formalisms (OBO, ...):
• Precise semantics
• Part of the Semantic Web framework
• Builds upon existing techniques (URIs, ...)
• Can be extended by additional techniques (rules, ...)
• Reuse of third-party tools (editors, reasoners, ...)
Conversion from OBO to OWL is possible (but it does not take advantage of OWL's potential)
Slide 44
Individuals
Individuals are the smallest elements that constitute a world: there are no sub-individuals. Individuals have an identity and can be counted. They are fundamental for understanding the semantics of DL... ... but you hardly use them when building ontologies, because you focus on general knowledge Slide 45
Classes
A class is a set of individuals (its instances). We do not have to know explicitly all the individuals that instantiate a class. Usually, ontologies do not have instances because they focus on general knowledge. Classes are used in intension (e.g. “Student is a Person who attends a school or a university” asserts a proposition about all the past, current and future students in this world or in a parallel one)
An individual can instantiate several classes
Slide 46
Classes
A class is a set of individuals (its instances). Special classes: top (⊤) = owl:Thing, i.e. the set of all the individuals; bottom (⊥) = owl:Nothing, i.e. the empty set
Can be combined using set operators subset (subsumption) disjoint sets union intersection complement Slide 47
Classes: subsumption
[Figure: A drawn inside B, both inside ⊤]
A ⊑ B: all the instances of A are instances of B (A is a subclass of B) Slide 48
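A minimal Python sketch of this set reading of subsumption, assuming a hand-built toy world whose individual names (nacl, hcl, ...) are hypothetical. A reasoner checks A ⊑ B in every possible world consistent with the ontology; here we only test one fixed world.

```python
# Toy interpretation (hypothetical individuals): each class is a set of individuals.
molecule = {"nacl", "hcl", "naoh", "glucose"}
named_molecule = {"nacl", "hcl", "naoh"}
thing = molecule | {"na_atom", "cl_atom"}  # ⊤: every individual of this toy world
nothing = set()                            # ⊥: the empty set


def is_subclass(a: set, b: set) -> bool:
    """A ⊑ B holds in this interpretation iff every instance of A is an instance of B."""
    return a <= b


print(is_subclass(named_molecule, molecule))  # True
print(is_subclass(molecule, named_molecule))  # False: glucose is not a NamedMolecule here
print(is_subclass(nothing, molecule))         # True: ⊥ is a subclass of every class
print(is_subclass(molecule, thing))           # True: every class is a subclass of ⊤
```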
Classes: subsumption (menu)
Slide 49
Classes: subsumption (menu)
Slide 50
Classes: subsumption (wizard)
Slide 51
Classes: subsumption (wizard)
Slide 52
Classes: subsumption
Slide 53
Classes: disjointness
[Figure: A and B partially overlapping inside ⊤]
By default, any individual MAY be an instance of any class => partial overlap of classes is assumed
Slide 54
Classes: disjointness
Sibling classes should not be disjoint by default. Ion, Molecule and Atom: some ions are also molecules, others are also atoms (conversely, some atoms and some molecules are ions). MoleculeContainingChlorine and MoleculeContainingSodium should not be disjoint either: some molecules can contain both chlorine and sodium Slide 55
Classes: disjointness Sibling classes should not be disjoint by default But in certain situations, we know that some classes are disjoint. Until we make it explicit, a reasoner will legitimately assume that the two classes can share instances Examples of disjoint classes: Molecule – Atom All the subclasses of NamedMolecule All the subclasses of ChemicalElement Slide 56
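A small Python sketch of what disjointness means on sets, assuming a toy world with hypothetical individuals (na_plus, hydroxide, ...). Two classes are disjoint when their intersection is empty (A ⊓ B ≡ ⊥). Note that the third check only says that Atom and Molecule happen not to overlap in this one world; in OWL this is not assumed until a disjointness axiom says so.

```python
# Toy interpretation (hypothetical individuals)
ion = {"na_plus", "cl_minus", "hydroxide"}  # Na+ and Cl- are atomic ions, OH- is a molecular ion
atom = {"na", "cl", "na_plus", "cl_minus"}
molecule = {"nacl", "hcl", "hydroxide"}


def disjoint(a: set, b: set) -> bool:
    """A and B are disjoint iff no individual is an instance of both."""
    return not (a & b)


print(disjoint(ion, atom))       # False: some ions are atoms
print(disjoint(ion, molecule))   # False: some ions are molecules
print(disjoint(atom, molecule))  # True in this world only; OWL needs an explicit axiom
```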
Slide 57
Slide 58
Classes
Cumulative approach: combine classes using set operators (union, intersection, complement) to express constraints and to define complex concepts
Intensional approach: describe the characteristics of a class and the system will automatically recognize that an individual is an instance of it, and recognize that it is a subclass or a superclass of another class Slide 59
Properties A property is a relationship for describing individuals An object property is a relationship from instances of some class(es) to instances of some (other) class(es) A datatype property is a relation from instances of some class(es) to a datatype such as string, integer, float, date, ... An annotation property can be any relation applied to individuals, classes or properties. They are useful for humans but are ignored by reasoners
Slide 60
Object properties
Relationships from instances of some class(es) to instances of (other) class(es) It is possible to describe: the domain: the class(es) of the individuals the property applies to the range: the class(es) of the individuals the property leads to features: transitivity, symmetry, functional the inverse (reciprocity) the super and subproperties Slide 61
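In OWL, a domain or range declaration is not a filter that rejects data: the reasoner uses it to infer class membership. The Python sketch below shows that inference, assuming a property is stored as a set of (subject, object) pairs; the individuals are hypothetical, and the domain BiologicalProcess and range Molecule for actsOn are taken from the figure on the next slides.

```python
# Hypothetical facts for the actsOn property, as (subject, object) pairs
acts_on = {("glycolysis_step1", "glucose"), ("glycolysis_step2", "glucose_6_p")}
ACTS_ON_DOMAIN = "BiologicalProcess"
ACTS_ON_RANGE = "Molecule"


def infer_types(pairs, domain, range_):
    """Return the class memberships entailed by the domain and range axioms of one property."""
    types = {}
    for subject, obj in pairs:
        types.setdefault(subject, set()).add(domain)  # x R y  =>  x is an instance of the domain
        types.setdefault(obj, set()).add(range_)      # x R y  =>  y is an instance of the range
    return types


inferred = infer_types(acts_on, ACTS_ON_DOMAIN, ACTS_ON_RANGE)
print(sorted(inferred["glucose"]))           # ['Molecule']
print(sorted(inferred["glycolysis_step1"]))  # ['BiologicalProcess']
```

This is why a badly chosen domain or range can silently classify individuals in unexpected classes instead of raising an error.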
Object properties (cont.)
[Figure: R links individuals of the domain of R to individuals of the range of R] Slide 62
Object properties (cont.)
[Figure: actsOn links individuals of its domain (BiologicalProcess) to individuals of its range (Molecule)] Slide 63
Slide 64
Inverse of properties
If x.R1.y ⇔ y.R2.x, then R1 is the inverse of R2 (and vice versa), e.g. isChildOf / isParentOf
The computer cannot guess it! If R1 and R2 exist: “add inverse” If R2 does not exist, “create inverse” Slide 65
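A minimal Python sketch of what declaring an inverse buys you, assuming a property is a set of (subject, object) pairs and using hypothetical individuals: every x.R1.y yields y.R2.x, and inverting twice gives back the original relation.

```python
def inverse(pairs):
    """Return the inverse relation: x R1 y  <=>  y R2 x."""
    return {(y, x) for (x, y) in pairs}


# Hypothetical facts
is_child_of = {("alice", "bob"), ("carol", "bob")}
is_parent_of = inverse(is_child_of)
print(sorted(is_parent_of))  # [('bob', 'alice'), ('bob', 'carol')]
```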
Slide 66
Slide 67
Properties and subproperties Object properties can be specialized The domain of the subproperty is either The domain of the superproperty... ... or a subclass of the domain of the superproperty
The range of the subproperty is either The range of the superproperty... ... or a subclass of the range of the superproperty Slide 68
Properties and subproperties
Create the following properties and their inverses: hasPart (Phys.Mat.Entity → Phys.Mat.Entity), hasDirectPart (Phys.Mat.Entity → Phys.Mat.Entity), hasAtom (Molecule → Atom)
Slide 69
Slide 70
Slide 71
Slide 72
Object properties (cont.)
transitive (e.g. hasAncestor, isAncestorOf...): x.R.y ∧ y.R.z ⇒ x.R.z; the range has to be a subclass of the domain; hasAncestor can be the transitive superproperty of non-transitive subproperties such as hasFather, hasMother...
Slide 73
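A short Python sketch of transitivity, assuming that hasDirectPart is a (non-transitive) subproperty of the transitive hasPart, as in the chemistry ontology. Given only the hasDirectPart facts below (hypothetical individuals), the hasPart links a reasoner can infer are their transitive closure.

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing pairs: x R y ∧ y R z ⇒ x R z."""
    closure = set(pairs)
    while True:
        new = {(x, z) for (x, y1) in closure for (y2, z) in closure if y1 == y2}
        if new <= closure:
            return closure
        closure |= new


# Hypothetical hasDirectPart facts (not transitive on their own)
has_direct_part = {("cell", "nucleus"), ("nucleus", "chromosome"), ("chromosome", "dna")}
has_part = transitive_closure(has_direct_part)
print(("cell", "dna") in has_part)         # True
print(("cell", "dna") in has_direct_part)  # False
```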
Object properties (cont.) symmetric (e.g. isNeighborOf, isSiblingOf...) x.R.y ⇒ y.R.x range = domain isSiblingOf is symmetric, but its isBrotherOf and isSisterOf subproperties are not
Slide 74
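A Python sketch that checks whether a finite set of links already satisfies symmetry, with hypothetical individuals. It also shows why isSiblingOf can be symmetric while its subproperty isBrotherOf is not. A reasoner works the other way round: it uses the symmetric characteristic to infer the missing y.R.x links.

```python
def is_symmetric(pairs):
    """R is symmetric iff x R y ⇒ y R x for every pair."""
    return all((y, x) in pairs for (x, y) in pairs)


# Hypothetical family
is_sibling_of = {("ann", "tom"), ("tom", "ann")}
is_brother_of = {("tom", "ann")}  # Tom is Ann's brother

print(is_symmetric(is_sibling_of))     # True
print(is_symmetric(is_brother_of))     # False: Ann is not Tom's brother
print(is_brother_of <= is_sibling_of)  # True: every isBrotherOf link is an isSiblingOf link
```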
Object properties (cont.)
functional (e.g. hasMother...): each instance of the domain is linked to at most one instance of the range through the relation
inverse functional (e.g. isMotherOf, hasAtom...): just means that the inverse of the relation is functional; two different instances from the domain cannot be linked to the same instance of the range through the relation
some properties can be both functional and inverse functional: hasSSN... (and therefore isSSNOf) Slide 75
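A Python sketch of the functional and inverse functional characteristics, checked on hypothetical hasAtom facts. One caveat: OWL makes no unique name assumption, so when a functional property links one individual to two names, a reasoner infers that the two names denote the same individual (unless they are declared different) rather than reporting an error.

```python
from collections import defaultdict


def is_functional(pairs):
    """Each subject is linked to at most one object."""
    targets = defaultdict(set)
    for x, y in pairs:
        targets[x].add(y)
    return all(len(ys) <= 1 for ys in targets.values())


def is_inverse_functional(pairs):
    """Each object is reached from at most one subject, i.e. the inverse is functional."""
    return is_functional({(y, x) for (x, y) in pairs})


# Hypothetical facts
has_atom = {("hcl_1", "h_1"), ("hcl_1", "cl_1"), ("nacl_1", "na_1"), ("nacl_1", "cl_2")}
print(is_functional(has_atom))          # False: a molecule has several atoms
print(is_inverse_functional(has_atom))  # True: each atom belongs to one molecule
```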
Property         Transitive   Symmetric   Functional   Inv. functional
hasPart          X
hasDirectPart
hasAtom                                                X
isPartOf         X
isDirectPartOf
isAtomOf                                  X
Slide 76
Object properties (cont.) More fun with object properties features hasSibling: transitive? symmetric? functional? hasBrother: idem? hasSister: idem?
Slide 77
Datatype properties Relationships from instances of classes to datatype values (string, numbers, dates...) It is possible to describe: the domain: the class(es) of the individuals the property applies to the range: the type of the values the property leads to the super and subproperties the functional feature Slide 78
Slide 79
Annotation properties
Relationships from instances of classes to whatever, beyond the scope of OWL DL and ignored by reasoners. They cannot be used for representing constraints. Use them when you want to describe things that are beyond the scope of OWL DL
Slide 80
2.3 Summary
Slide 81
Summary Ontologies are composed of classes, properties and individuals Classes are sets of individuals and are organized in taxonomies (subsumption hierarchies) Properties are relations and can also be specialized
Slide 82
Summary So far, limited reasoning capabilities: Is A a subclass of B? Is B a superclass of A? Is x an instance of class A? What are the instances of class A (and of its subclasses)?
Maintaining large ontologies can be difficult Multiple inheritance Dependencies between classes Slide 83
Part 3 Advanced features and reasoning
Slide 84
Objective
Acquiring an in-depth understanding of the OWL DL semantics in order to perform advanced reasoning tasks. We will rely on pathways and chemical elements for: a better formalization of the domain knowledge; leveraging OWL DL reasoning capabilities for an easier curation of the ontology; a better use of the knowledge Slide 85
Objective
Acquiring an in-depth understanding of the OWL DL semantics in order to perform advanced reasoning tasks. We will rely on pathways and chemical elements. We will get into trouble. Don't panic!
Slide 86
Glycolysis from Gene Ontology
Slide 87
3.1 Combining classes
Slide 88
Objective
Combine classes using the OR, AND and NOT operators. Refer to the semantics of these operators (and avoid some basic mistakes) => find out which molecules contain: sodium and chlorine; sodium or chlorine; both metals and non-metals Slide 89
Getting in sync! If you need to catch up, the ontology at this point is chemistryv2.owl from: http://www.ea3888.univ-rennes1.fr/dameron/dils2008/
Slide 90
Prerequisite
[Figure: subclasses of Molecule: MoleculeContainingChlorine and MoleculeContainingSodium (NOT disjoint); NamedMolecule, with subclasses HydrogenChloride, SodiumHydroxide and SodiumChloride (DISJOINT)]
Don't worry about the atoms, this is the next step! Slide 91
AND (Intersection)
Create MoleculeContainingChlorineAndSodium as a subclass of Molecule. So far, except for the name, we have not provided any meaning: we have not exploited the cumulative approach.
Add the necessary condition: MoleculeContainingCl ⊓ MoleculeContainingNa
Classify Slide 92
Slide 93
Slide 94
is equivalent to:
... but the reasoning would have been trivial :) Slide 95
AND (Intersection)
[Figure: Venn diagram of A and B; their overlap is A ⊓ B]
A ⊓ B = set of individuals that are instances of A and of B Slide 96
AND (Intersection)
[Figure: Venn diagram of A and B; their overlap is A ⊓ B]
Ex: MoleculeContainingCl ⊓ MoleculeContainingNa Slide 97
OR (Union)
Create MoleculeContainingChlorineOrSodium as a subclass of Molecule
Add the necessary condition: MoleculeContainingCl ⊔ MoleculeContainingNa
Classify :(
Slide 98
Slide 99
MoleculeContainingCl and MoleculeContainingNa are not recognized as subclasses of MoleculeContainingClOrNa
Slide 100
OR (Union)
[Figure: Venn diagram of A and B; their union is A ⊔ B]
A ⊔ B = set of individuals that are instances of A or of B Slide 101
OR (Union)
[Figure: Venn diagram of A and B; their union is A ⊔ B]
Ex: MoleculeContainingCl ⊔ MoleculeContainingNa Slide 102
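A Python sketch of the AND and OR operators on the chemistry example, assuming a toy world where the molecules containing chlorine and the molecules containing sodium are known (hypothetical individuals hcl, naoh, nacl). Intersection keeps the instances of both classes; union keeps the instances of either.

```python
# Hypothetical toy interpretation
contains_cl = {"hcl", "nacl"}
contains_na = {"naoh", "nacl"}

contains_cl_and_na = contains_cl & contains_na  # intersection: instances of A and of B
contains_cl_or_na = contains_cl | contains_na   # union: instances of A or of B

print(sorted(contains_cl_and_na))  # ['nacl']
print(sorted(contains_cl_or_na))   # ['hcl', 'nacl', 'naoh']
```

Reading "molecules containing chlorine and sodium" as a union would wrongly include HCl and NaOH, which is the classic AND/OR confusion.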
Slide 103
OR (Union)
Molecule containing Cl or Na
[Figure: MolecContClOrNa drawn inside the union of MolecContCl and MolecContNa]
Slide 104
OR (Union)
Molecule containing Cl or Na
There could be instances of MoleculeContainingCl (red dot) that are not instances of MoleculeContainingClOrNa...
... therefore, MoleculeContainingCl is not a subclass of MoleculeContainingClOrNa Slide 105
OR (Union) We said that MoleculeContainingClOrNa is a subclass of the union If we want to say that it is equivalent to the union, we have to use a necessary and sufficient definition
Slide 106
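A reasoner only concludes Cl ⊑ ClOrNa if it holds in every model of the ontology, so one counter-model is enough to block the inference. The Python sketch below builds such a model by hand (red_dot is a hypothetical individual standing for the red dot): it respects the primitive definition ClOrNa ⊑ Cl ⊔ Na, yet red_dot is in Cl but not in ClOrNa. The defined class (≡) rules this model out.

```python
# One hand-built model (hypothetical individuals)
contains_cl = {"hcl", "red_dot"}
contains_na = {"naoh"}
cl_or_na = {"hcl", "naoh"}


def satisfies_subclass_of_union(model_cl_or_na, cl, na):
    """Necessary condition only: ClOrNa ⊑ Cl ⊔ Na."""
    return model_cl_or_na <= (cl | na)


def satisfies_equivalent_to_union(model_cl_or_na, cl, na):
    """Necessary and sufficient condition: ClOrNa ≡ Cl ⊔ Na."""
    return model_cl_or_na == (cl | na)


print(satisfies_subclass_of_union(cl_or_na, contains_cl, contains_na))    # True: a valid model
print(contains_cl <= cl_or_na)                                            # False: so Cl ⊑ ClOrNa is not entailed
print(satisfies_equivalent_to_union(cl_or_na, contains_cl, contains_na))  # False: ≡ excludes this model
```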
Slide 107
Slide 108
Slide 109
Examples
Declare HydrogenChloride to be a MoleculeContainingChlorine. Declare SodiumHydroxide to be a MoleculeContainingSodium. Declare SodiumChloride to be both MoleculeContainingChlorine and MoleculeContainingSodium. Classify :) Why isn't SodiumChloride classified as expected? Slide 110
Slide 111
Slide 112
Slide 113
HCl and NaCl are recognised as containing chlorine NaOH and NaCl are recognised as containing sodium ... but NaCl is not recognised as containing Na and Cl
Slide 114
AND (Intersection) MoleculeContainingChlorineAndSodium
[Figure: MolecContCl and MolecContNa overlapping within MolecContClOrNa; their overlap is MoleculeContainingChlorineAndSodium]
Slide 115
Slide 116
Slide 117
Inconsistent classes
An inconsistent class is a class that cannot have any instance without violating a logical constraint. You can create an inconsistent class...
... but the reasoner will tell you
Slide 118
Slide 119
Slide 120
Getting in sync! If you need to catch up, the ontology at this point is chemistryv3.owl from: http://www.ea3888.univ-rennes1.fr/dameron/dils2008/
Slide 121
NEGATION (Complement)
[Figure: A inside ⊤; everything in ⊤ outside A is ¬A] Slide 122
NEGATION (Complement)
Create NotHalogenNorNobleGasAtom as a subclass of Atom. A NotHalogenNorNobleGasAtom is an atom that is neither a halogen nor a noble gas
Classify
Slide 123
Slide 124
Alkali metals, Lanthanids, ... are not recognised as NotHalogenNorNobleGas :(
Slide 125
NEGATION (Complement)
Create NotHalogenNorNobleGasAtom as a subclass of Atom. A NotHalogenNorNobleGasAtom is neither a halogen atom nor a noble gas atom. Classify. Why do we have to provide a necessary and sufficient definition?
Slide 126
Slide 127
Slide 128
NEGATION (Complement)
An instance of NotHalogenNorNobleGas is neither a halogen nor a noble gas. Why do we have to provide a necessary and sufficient definition? It ensures that all the instances of Atom that are neither instances of Halogen nor of NobleGas are recognised as instances of NotHalogenNorNobleGas
Slide 129
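A Python sketch of the complement and of the defined class, assuming a toy world in which the group of every atom is known (hypothetical individuals h, na, la, ...). The complement ¬C is everything in ⊤ outside C, so ¬Halogen alone also contains non-atoms; intersecting with Atom restricts it. This evaluates the definition in one fixed world; the reasoner's class-level inference (e.g. AlkaliMetal ⊑ NotHalogenNorNobleGasAtom) additionally relies on the disjointness axioms.

```python
# Hypothetical toy world
thing = {"h", "na", "cl", "f", "he", "ne", "la", "glucose"}  # ⊤
atom = thing - {"glucose"}
halogen = {"cl", "f"}
noble_gas = {"he", "ne"}


def complement(cls):
    """¬C is everything in ⊤ that is not in C."""
    return thing - cls


# NotHalogenNorNobleGasAtom ≡ Atom ⊓ ¬Halogen ⊓ ¬NobleGas
defined = atom & complement(halogen) & complement(noble_gas)
print(sorted(defined))                   # ['h', 'la', 'na']
print("glucose" in complement(halogen))  # True: ¬Halogen alone also contains non-atoms
```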
NEGATION (Complement)
Note that the reasoner found out that AlkaliMetal, Lanthanid, ... are subclasses of NotHalogenNorNobleGas, whereas the definition of the latter does not mention any of the former (intensionality)
Slide 130
NEGATION: De Morgan's law We defined NotHalogenNorNobleGasAtom as an Atom that is: not Halogen (and) not NobleGas
We could have defined NotHalogenNorNobleGasAtom2 that is: not (Halogen or NobleGas) Slide 131
NEGATION: De Morgan's law
Not (A ⊔ B) ≡ (Not A) ⊓ (Not B): not(Meat or Fish) ≡ neither Meat nor Fish
Not (A ⊓ B) ≡ (Not A) ⊔ (Not B): not(Cheesy and Veggie) ≡ (not Cheesy) or (not Veggie)
Slide 132
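Both De Morgan laws can be checked exhaustively on a small finite universe. The Python sketch below, assuming a hypothetical 4-element ⊤, tests every pair of classes A, B (all 256 pairs of subsets); this illustrates the laws rather than proving them for arbitrary domains.

```python
from itertools import combinations

universe = frozenset(range(4))  # hypothetical small ⊤


def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def neg(a):
    """Complement relative to the universe."""
    return universe - a


classes = subsets(universe)
law_or = all(neg(a | b) == neg(a) & neg(b) for a in classes for b in classes)
law_and = all(neg(a & b) == neg(a) | neg(b) for a in classes for b in classes)
print(len(classes), law_or, law_and)  # 16 True True
```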
Slide 133
Slide 134
3.2 Expressing restrictions
Slide 135
Objective
Application of the intensional approach: leverage the expressivity of the OWL DL language for a precise representation of the classes' features
We will describe the constituents of molecules and use the reasoner to find out which ones contain chlorine and/or sodium
Slide 136
Getting in sync! If you need to catch up, the ontology at this point is chemistryv4.owl from: http://www.ea3888.univ-rennes1.fr/dameron/dils2008/
Slide 137
Restrictions
1. Quantifier restrictions (at least one, all of): How to represent the fact that every organic molecule must have at least one carbon atom? How to represent the fact that all the atoms of a molecule with no metal must be non-metallic?
2. Cardinality restrictions: How to represent that a sodium hydroxide molecule must have 3 atoms as parts?
3. hasValue restrictions: How to define the value of a relation for a class? Slide 138
Principles A restriction describes an anonymous class composed of all the individuals that satisfy the restriction e.g. all the individuals that have (amongst other things) a carbon atom as part
This anonymous class is used as a superclass of the (named) class we want to express a constraint on e.g. OrganicMolecule Slide 139
Existential restriction
(∃ hasAtom Carbon): set of the individuals linked to at least one instance of Carbon through the hasAtom property. They can be linked to multiple instances of Carbon. They can also be linked to instances of other classes (provided domain and range integrity)
OrganicMolecule ⊑ (∃ hasAtom Carbon)
Slide 140
Existential restriction
(∃ hasAtom Carbon): set of the individuals linked to at least one instance of Carbon through the hasAtom property. They can be linked to multiple instances of Carbon. They can also be linked to instances of other classes (provided domain and range integrity)
OrganicMolecule ⊑ (∃ hasAtom Carbon). Other molecules can also be made of carbon!
Slide 141
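A Python sketch of the existential restriction evaluated in one toy world, assuming hasAtom is a set of (molecule, atom) pairs with hypothetical individuals. glucose_1 qualifies even though it is also linked to several carbons and to an oxygen.

```python
# Hypothetical toy interpretation
carbon = {"c1", "c2"}
has_atom = {("glucose_1", "c1"), ("glucose_1", "c2"), ("glucose_1", "o1"),
            ("hcl_1", "h1"), ("hcl_1", "cl1")}
individuals = {"glucose_1", "hcl_1", "c1", "c2", "o1", "h1", "cl1"}


def some_values_from(prop, filler, domain):
    """(∃ prop filler): individuals linked to at least one instance of filler."""
    return {x for x in domain if any((x, y) in prop for y in filler)}


print(sorted(some_values_from(has_atom, carbon, individuals)))  # ['glucose_1']
```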
Complete the ontology
Define OrganicMolecule as a molecule having at least one Carbon part. Define MoleculeContainingChlorine as a molecule having at least one Chlorine part. Remove the fact that HCl and NaCl are subclasses of MoleculeContainingChlorine!
Slide 142
Slide 143
Slide 144
Slide 145
Slide 146
Slide 147
Slide 148
Getting in sync! If you need to catch up, the ontology at this point is chemistryv5.owl from: http://www.ea3888.univ-rennes1.fr/dameron/dils2008/
Slide 149
Universal restriction
(∀ hasAtom NotCarbonAtom): set of all the individuals only linked to instances of NotCarbonAtom through the hasAtom property. Warning: it also includes all the individuals linked to nothing through the hasAtom property
Slide 150
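A Python sketch of the universal restriction evaluated in one fixed toy world (hypothetical individuals), including the warning above: an individual with no hasAtom link at all satisfies (∀ hasAtom NotCarbonAtom) vacuously. In this closed toy world hcl_1 qualifies; an OWL reasoner must also consider worlds where HCl has more atoms, which is what the next slides explore.

```python
# Hypothetical toy interpretation
not_carbon_atom = {"h1", "cl1", "na1", "o1"}
has_atom = {("hcl_1", "h1"), ("hcl_1", "cl1"), ("glucose_1", "c1"), ("glucose_1", "o1")}
individuals = {"hcl_1", "glucose_1", "empty_1", "h1", "cl1", "na1", "o1", "c1"}


def all_values_from(prop, filler, domain):
    """(∀ prop filler): individuals whose prop-successors are all in filler (vacuously true if none)."""
    return {x for x in domain if all(y in filler for (s, y) in prop if s == x)}


result = all_values_from(has_atom, not_carbon_atom, individuals)
print("hcl_1" in result)      # True
print("glucose_1" in result)  # False: c1 is not a NotCarbonAtom
print("empty_1" in result)    # True: no hasAtom link at all
print("h1" in result)         # True: atoms have no hasAtom link either
```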
Universal restriction
(∀ hasAtom NotCarbonAtom)
Remove the fact that hydrogen chloride, sodium chloride and sodium hydroxide are subclasses of InorganicMolecule. Define InorganicMolecule as any molecule for which all the atoms are different from Carbon (i.e. a molecule containing no Carbon). Classify :( Slide 151
Slide 152
Slide 153
Slide 154
Slide 155
:(
Slide 156
:)
Slide 157
Universal restriction
Why were hydrogen chloride, sodium chloride and sodium hydroxide not recognised as inorganic molecules (even though their atoms were correctly recognised as NotCarbonAtom)? ... Find out in a few slides
Slide 158
Getting in sync! If you need to catch up, the ontology at this point is chemistryv6.owl from: http://www.ea3888.univ-rennes1.fr/dameron/dils2008/
Slide 159
Cardinality restriction
MoleculeWithTwoAtoms ≡ Molecule ⊓ (hasAtom = 2)
MoleculeWithFourOrMoreAtoms ≡ Molecule ⊓ (hasAtom ≥ 4)
MoleculeWithThreeOrLessAtoms ≡ Molecule ⊓ (hasAtom ≤ 3)
Warning: these are NOT qualified cardinality restrictions. Slide 160
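A Python sketch of these unqualified cardinality restrictions evaluated in one toy world, assuming hypothetical individuals where every name denotes a distinct atom and the listed hasAtom links are complete. Unqualified means every atom is counted, whatever its class. A reasoner can assume neither distinct names nor complete links.

```python
from collections import Counter

# Hypothetical toy interpretation
has_atom = {("hcl_1", "h1"), ("hcl_1", "cl1"),
            ("naoh_1", "na1"), ("naoh_1", "o1"), ("naoh_1", "h2")}
molecules = {"hcl_1", "naoh_1"}
n_atoms = Counter(s for (s, _) in has_atom)  # number of hasAtom links per molecule

with_two_atoms = {m for m in molecules if n_atoms[m] == 2}      # (hasAtom = 2)
with_four_or_more = {m for m in molecules if n_atoms[m] >= 4}   # (hasAtom ≥ 4)
with_three_or_less = {m for m in molecules if n_atoms[m] <= 3}  # (hasAtom ≤ 3)
print(sorted(with_two_atoms), sorted(with_four_or_more), sorted(with_three_or_less))
# prints: ['hcl_1'] [] ['hcl_1', 'naoh_1']
```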
Slide 161
Slide 162
Slide 163
:) for PyridylGlycol :( for Glucose
Slide 164
:/ MoleculeWithTwoAtoms is correctly recognized as a subclass of MoleculeWithThreeOrLessAtoms... ... but none of HCl, NaCl and NaOH is recognized as a MoleculeWithThreeOrLessAtoms (hint...) Slide 165
3.3 Open world assumption
Slide 166
Getting in sync! If you need to catch up, the ontology at this point is owlReasoningv7.owl from: http://www.ea3888.univ-rennes1.fr/dameron/dils2008/
Slide 167
Open vs. Closed World Reasoning
Remember, a few slides ago: HCl ⊑ (∃ hasPart H) ⊓ (∃ hasPart Cl); InorganicMolecule ≡ Molecule ⊓ (∀ hasPart NonCarbon); Hydrogen and Chlorine ARE NonCarbon. So, why is HCl not classified under InorganicMolecule?
Slide 168
Open vs. Closed World Reasoning
Remember, a few slides ago: HCl ⊑ (∃ hasPart H) ⊓ (∃ hasPart Cl); InorganicMolecule ≡ Molecule ⊓ (∀ hasPart NonCarbon); Hydrogen and Chlorine ARE NonCarbon. So, why is HCl not classified under InorganicMolecule? Because HCl can have other atoms (incl. Carbon) Slide 169
Open vs. Closed World Reasoning
Remember, a few slides ago: HCl ⊑ (∃ hasPart H) ⊓ (∃ hasPart Cl); InorganicMolecule ≡ Molecule ⊓ (∀ hasPart NonCarbon); Hydrogen and Chlorine ARE NonCarbon. So, why is HCl not classified under InorganicMolecule? Because HCl can have other atoms (incl. Carbon). Remember that it was not classified under MoleculeWithTwoAtoms either Slide 170
Open vs. Closed World Reasoning
Closed-world reasoning: negation as failure; anything that cannot be found is false; reasoning about this world
Open-world reasoning: negation as contradiction; anything might be true unless it can be proven false; reasoning about any world consistent with the model Slide 171
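A Python sketch contrasting the two modes on the HCl question, with hypothetical atom names. Under the closed world, the stated atoms are all the atoms. Under the open world, "HCl contains no carbon" is entailed only if it holds in every world consistent with the facts; here we try just three candidate extensions by hand, whereas a reasoner covers all models symbolically.

```python
# Known facts about one HCl individual (hypothetical names)
known_atoms = {"h1": "Hydrogen", "cl1": "Chlorine"}


def only_non_carbon(atoms):
    return all(kind != "Carbon" for kind in atoms.values())


# Closed world: what is not stated is false, so HCl has exactly these atoms.
closed_world_answer = only_non_carbon(known_atoms)

# Open world: without a closure axiom nothing forbids extra atoms, so we test a few extensions.
candidate_extensions = [{}, {"x1": "Hydrogen"}, {"x1": "Carbon"}]
open_world_answer = all(only_non_carbon({**known_atoms, **extra}) for extra in candidate_extensions)

print(closed_world_answer, open_world_answer)  # True False
```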
Need for closure
HydrogenChloride molecules only have H and Cl as atoms: HCl ⊑ (∃ hasAtom H) ⊓ (∃ hasAtom Cl) ⊓ ???
Slide 172
Need for closure
HydrogenChloride molecules only have H and Cl as atoms: HCl ⊑ (∃ hasAtom H) ⊓ (∃ hasAtom Cl) ⊓ (∀ hasAtom ???)
Slide 173
Need for closure
HydrogenChloride molecules only have H and Cl as atoms: HCl ⊑ (∃ hasAtom H) ⊓ (∃ hasAtom Cl) ⊓ (∀ hasAtom (H ⊔ Cl))
Slide 174
Need for closure
HydrogenChloride molecules only have H and Cl as atoms: HCl ⊑ (∃ hasAtom H) ⊓ (∃ hasAtom Cl) ⊓ (∀ hasAtom (H ⊔ Cl)). The universal constraint (∀) alone is not enough! We need both ∃ and ∀ constraints
Slide 175
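A Python sketch of what the closure axiom changes, over three hand-picked candidate worlds for one HCl individual (each world is the list of its atoms' classes; all hypothetical). Without closure, a world where HCl also has a carbon atom satisfies the description, so "contains no carbon" is not entailed; with closure that world is excluded. The last check shows why ∀ alone is not enough: it also accepts a "molecule" with no atom at all.

```python
# Hypothetical check of the closure axiom for HydrogenChloride
NON_CARBON = {"Hydrogen", "Chlorine", "Sodium", "Oxygen"}


def satisfies_hcl_definition(atom_kinds, closed):
    """HCl ⊑ (∃ hasAtom H) ⊓ (∃ hasAtom Cl), plus ⊓ (∀ hasAtom (H ⊔ Cl)) if closed."""
    has_h = "Hydrogen" in atom_kinds
    has_cl = "Chlorine" in atom_kinds
    closure_ok = all(k in {"Hydrogen", "Chlorine"} for k in atom_kinds)
    return has_h and has_cl and (closure_ok or not closed)


worlds = [["Hydrogen", "Chlorine"], ["Hydrogen", "Chlorine", "Carbon"], []]

results = {}
for closed in (False, True):
    models = [w for w in worlds if satisfies_hcl_definition(w, closed)]
    results[closed] = all(all(k in NON_CARBON for k in w) for w in models)
print(results)  # {False: False, True: True}

# The universal restriction alone would also accept a "molecule" with no atoms at all.
universal_only = [w for w in worlds if all(k in {"Hydrogen", "Chlorine"} for k in w)]
print([] in universal_only)  # True
```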
Need for closure
HydrogenChloride molecules only have H and Cl as atoms: HCl ⊑ (∃ hasAtom H) ⊓ (∃ hasAtom Cl) ⊓ (∀ hasAtom (H ⊔ Cl)). Same principle for all the other molecules! Exercises: molecules containing no halogen; molecules not containing any metal Slide 176
Slide 177
Slide 178
Slide 179
Slide 180
Getting in sync! If you need to catch up, the ontology at this point is owlReasoningv8.owl from: http://www.ea3888.univ-rennes1.fr/dameron/protegeShortCourse2008/
Slide 181
More fun with cardinality Why are hydrogen chloride and sodium chloride still not classified under MoleculeWithTwoAtoms after adding a closure?
Slide 182
Slide 183
More fun with cardinality Why are hydrogen chloride and sodium chloride still not classified under MoleculeWithTwoAtoms after adding a closure?
Slide 184
More fun with cardinality Why are hydrogen chloride and sodium chloride still not classified under MoleculeWithTwoAtoms after adding a closure? Hints: Why isn't it classified under MoleculeWithThreeOrLessAtoms ? Why isn't it even classified under MoleculeWithFiveOrMoreAtoms ?... Slide 185
More fun with cardinality
Why are hydrogen chloride and sodium chloride still not classified under MoleculeWithTwoAtoms after adding a closure? Still... the open-world assumption: imagine one instance of HydrogenChloride having as atoms: one instance of Hydrogen, one other instance of Hydrogen, one instance of Chlorine
Slide 186
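A Python sketch of the world described above, with hypothetical atom names h1, h2, cl1. Because OWL makes no unique name assumption, h1 and h2 may be two distinct Hydrogen individuals. This HCl satisfies the existential and the closure restrictions, yet it has three atoms, so MoleculeWithTwoAtoms cannot be inferred.

```python
# Hypothetical world consistent with: HCl ⊑ (∃ hasAtom H) ⊓ (∃ hasAtom Cl) ⊓ (∀ hasAtom (H ⊔ Cl))
atom_class = {"h1": "Hydrogen", "h2": "Hydrogen", "cl1": "Chlorine"}
hcl_atoms = {"h1", "h2", "cl1"}  # two distinct Hydrogen individuals

has_h = any(atom_class[a] == "Hydrogen" for a in hcl_atoms)
has_cl = any(atom_class[a] == "Chlorine" for a in hcl_atoms)
closed = all(atom_class[a] in {"Hydrogen", "Chlorine"} for a in hcl_atoms)

print(has_h and has_cl and closed)  # True: the description is satisfied...
print(len(hcl_atoms) == 2)          # False: ...but this HCl has three atoms
```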
Slide 187
The closure of HCl is not necessary
Slide 188
Slide 189
OWL 1.0 and cardinality
The trick we used for HCl cannot be extended to glucose or methane. In OWL 1.0, we cannot create classes such as “MoleculeWithOneCarbonAndFourHydrogen” because cardinality constraints are not qualified. OWL 1.1 makes it possible (see in a few slides) Slide 190
More fun with subproperties and closure Modify the definitions of InorganicMolecule and of MoleculeNotContainingAnyMetal using hasPart instead of hasDirectPart in the closures. Why is nothing classified below these classes anymore?
Slide 191
More fun with subproperties and cardinality Modify the definitions of InorganicMolecule and of MoleculeNotContainingAnyMetal using hasPart instead of hasDirectPart in the closures. Why does it break all the cardinality constraints?
Slide 192
Reasoning makes life easier :)
It allows us to define classes by intension, rather than extension. If you add a new named molecule in your ontology, you don't have to worry about “is it a molecule containing chlorine?”, “is it a molecule containing chlorine and sodium?” or “is it an organic molecule?” (and all the possible combinations such as containing chlorine and sodium but not oxygen)... Just describe your new molecule and let the classifier do the work for you
Slide 193
Reasoning makes life easier :)
It supports queries such as: What are the molecules containing chlorine? What are the molecules containing a halogen? What are the organic molecules? What are the inorganic molecules? What are the molecules containing a metal and a halogen but not carbon?
... it allows you to take advantage of the knowledge you put into your ontology Slide 194
Reasoning tips
Proceed by small incremental steps! Be cautious: classify often
Use probe classes
Slide 195
Reasoning tips
Use probe classes
Positive probes: make sure that they are classified at the right place, e.g. the class NotCarbonAtom when we were investigating why HCl (without closure) was not inferred to be inorganic
Negative probes: make sure that impossible or abnormal situations are prevented e.g. ImpossibleClass for checking that nothing can be both chlorine and sodium at the same time Slide 196
Suspicious case: trivially satisfiable classes
Trivially satisfiable class: a class with a universal restriction whose filler is unsatisfiable. There is probably something wrong going on. These classes are not inconsistent
Ex: molecule AND (forall hasAtom (Cl AND Na)) This is the set of molecules with NO atom! Why ???
Slide 197
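A Python sketch answering the "Why???" above in one toy world, assuming Chlorine and Sodium are declared disjoint so that Cl ⊓ Na is always empty (hypothetical individuals). Any molecule with at least one atom has an atom outside the empty filler, so only molecules with no hasAtom link satisfy the universal restriction.

```python
# Hypothetical toy world: Chlorine and Sodium are disjoint, so Cl ⊓ Na is empty.
chlorine = {"cl1"}
sodium = {"na1"}
cl_and_na = chlorine & sodium  # ⊥ in this world
has_atom = {("nacl_1", "na1"), ("nacl_1", "cl1")}
molecules = {"nacl_1", "weird_1"}  # weird_1 has no hasAtom link


def all_values_from(prop, filler, domain):
    """(∀ prop filler): individuals whose prop-successors are all in filler (vacuously true if none)."""
    return {x for x in domain if all(y in filler for (s, y) in prop if s == x)}


print(sorted(all_values_from(has_atom, cl_and_na, molecules)))  # ['weird_1']
```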
Slide 198
Common mistakes
Confusing AND and OR (e.g. molecule with chlorine and sodium); forgetting disjointness; forgetting closure; interference from relationships' domain and range; confusion between (∃ r A) and (∀ r A), e.g. (∃ hasChild Male) vs (∀ hasChild Male) Slide 199
OWL and beyond... OWL 1.1
Slide 200
Qualified Cardinality Restriction
OWL 1.0 cardinality restrictions:
• MoleculeWithTwoAtoms
• MoleculeWithFourOrMoreAtoms
• MoleculeWithThreeOrLessAtoms
OWL 1.1 qualified cardinality restrictions:
• MoleculeWithFourHydrogen
• MoleculeWithAtLeastTwoCarbon
• ... Slide 201
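A Python sketch of the difference between unqualified and qualified cardinality on a single methane molecule whose atom classes are known (hypothetical atom names). The unqualified restriction counts every atom; the qualified one counts only the atoms of the given class, which is what classes such as MoleculeWithFourHydrogen need.

```python
from collections import Counter

# Hypothetical atom classes for one methane molecule
methane_atoms = {"c1": "Carbon", "h1": "Hydrogen", "h2": "Hydrogen", "h3": "Hydrogen", "h4": "Hydrogen"}


def count_all(atoms):
    """Unqualified: (hasAtom = n) counts every atom, whatever its class."""
    return len(atoms)


def count_qualified(atoms, cls):
    """Qualified: (hasAtom = n Hydrogen) counts only atoms of the given class."""
    return Counter(atoms.values())[cls]


print(count_all(methane_atoms))                    # 5
print(count_qualified(methane_atoms, "Hydrogen"))  # 4
print(count_qualified(methane_atoms, "Carbon"))    # 1
```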
Slide 202
Slide 203
Slide 204
Slide 205
Additional features for properties
• Reflexivity, e.g. knows, isGreaterOrEqualTo
• Irreflexivity, e.g. isMotherOf, isGreaterThan
• Asymmetry, e.g. isAncestorOf, isGreaterThan
Slide 206
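A Python sketch of the three characteristics, checked on ≥ and > over a hypothetical set {1, 2, 3}. isGreaterOrEqualTo is reflexive and therefore cannot be asymmetric (x ≥ x holds in both directions); isGreaterThan is both irreflexive and asymmetric.

```python
def is_reflexive(pairs, domain):
    """x R x for every x of the domain."""
    return all((x, x) in pairs for x in domain)


def is_irreflexive(pairs):
    """No x R x."""
    return all(x != y for (x, y) in pairs)


def is_asymmetric(pairs):
    """x R y ⇒ not y R x (this implies irreflexivity)."""
    return all((y, x) not in pairs for (x, y) in pairs)


numbers = {1, 2, 3}
greater_or_equal = {(x, y) for x in numbers for y in numbers if x >= y}
greater_than = {(x, y) for x in numbers for y in numbers if x > y}

print(is_reflexive(greater_or_equal, numbers), is_asymmetric(greater_or_equal))  # True False
print(is_irreflexive(greater_than), is_asymmetric(greater_than))                 # True True
```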
Slide 207
3.4 Exercise
Slide 208
Exercise
Use OWL and the chemistry ontology so that Glycolysis and the subclasses of MetabolicProcess are classified correctly
Slide 209
Glycolysis from Gene Ontology
Slide 210
Asserted hierarchy
Slide 211
Inferred hierarchy
Slide 212
Exercise
From Gene Ontology: GO:0019641 (Embden-Meyerhof-Parnass pathway) “The main pathway for anaerobic degradation of carbohydrates. Starch or glycogen is hydrolyzed to glucose 1 phosphate and then through a series of intermediates, yielding two ATP molecules per glucose and producing either pyruvate (which feeds into the tricarboxylic acid cycle) or lactate. [source: http://cancerweb.ncl.ac.uk/] “
If you described your pathways correctly, it should be inferred to be a subclass of Glycolysis! Slide 213
Embden-Meyerhof-Parnass pathway from Gene Ontology
Slide 214
3.5 Summary
Slide 215
Summary
• Compositional approach
• Intensional description
• Reasoning
Slide 216
Summary
• Reasoning with taxonomy
  • And, or, not
  • Primitive vs. defined class
• Reasoning with properties
  • Quantifiers (existential and universal)
  • Cardinality constraints
  • hasValue
• Open world assumption
Slide 217
Conclusion
More fun with OWL (incl. OWL 1.1): http://www.coode.org/resources
Some great, real-life examples: Chemical Knowledge for the Semantic Web, Mykola Konyk, Alexander De Leon and Michel Dumontier, Friday, 03:00pm
Slide 218
Conclusion
Slide 219
Conclusion
Bio-ontologies are already there! They are widely used for annotation. They need to be used. They need to be integrated. They need to be improved
OWL can help ... well, hopefully :) It will also need other technologies (RDF stores, rules,...) Slide 220