a molecule ranking algorithm for mining biological semantic networks

MOLECRANK: A MOLECULE RANKING ALGORITHM FOR MINING BIOLOGICAL SEMANTIC NETWORKS
AHMED ABDEEN HAMED, PH.D., KNOWLEDGE ENGINEER @ DATA PLATFORM, SCIENTIFIC INFORMATION MANAGEMENT, MERCK & CO.
AGATA LESZCZYŃSKA, PH.D., PRODUCT OWNER / BUSINESS ANALYST, SCIENTIFIC INFORMATION MANAGEMENT, MSD

MAIN RESEARCH QUESTION
Problem:
•  Precious biological knowledge is held captive in the literature
•  Answering biological questions is not possible without further processing
•  Can we design algorithms that provide highly relevant content and provide it fast?
   § Ex: Given a molecule search query against a literature dataset, can we find the most specific instances?

Sec. 6.2.1

MOLECULE NOTION OF SPECIFICITY
1.  The more strongly a molecule is attached to a given biological feature, the more specific and the more useful it is
2.  The more knowledge we gather about a molecule, the better we know where it should be ranked
3.  The opposite also holds
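
One possible reading of point 1 can be sketched in a few lines of Python. This is not the MolecRank algorithm, which these slides do not spell out; it is only an illustration that assumes specificity can be approximated by the share of a molecule's mentions that fall on the queried feature. The function names and the toy `mentions` data are hypothetical.

```python
from collections import Counter

def specificity(mentions, feature):
    """Share of each molecule's mentions that involve `feature`.

    `mentions` is a list of (molecule, feature) pairs. The scoring rule is an
    assumption made for illustration, not the published MolecRank method.
    """
    total = Counter(mol for mol, _ in mentions)
    on_feature = Counter(mol for mol, feat in mentions if feat == feature)
    return {mol: on_feature[mol] / total[mol] for mol in total}

def rank_by_specificity(mentions, feature):
    """Molecules ordered from most to least specific to `feature`."""
    scores = specificity(mentions, feature)
    return sorted(scores, key=lambda mol: (-scores[mol], mol))

# Toy data with placeholder names (hypothetical, not taken from the talk).
mentions = [
    ("mol_A", "feature_X"), ("mol_A", "feature_Y"),
    ("mol_B", "feature_X"), ("mol_B", "feature_X"),
    ("mol_C", "feature_Y"),
]
ranking = rank_by_specificity(mentions, "feature_X")
```

With this toy input, `mol_B` is mentioned only with `feature_X` and ranks first, while `mol_C` is never mentioned with it and ranks last.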

METHODS AND APPROACH
•  Given a literature dataset, we need the following:
   • A feature selection process to extract biological entities
      • Using machine learning
      • Ontology
   • An expressive Linked Data model
   • Graph database
   • Query mechanism
•  Ranking as a post-processing step

Sec. 6.2

OVERALL ARCHITECTURE
[Architecture diagram. Labeled components: Merck Literature; PubMed; Text Mining Process; Pre-processing Step; JSON-LD Transformation + Ingestion; RDF4J TripleStore; RDF4J WorkBench Query Portal; Network Construction Process; Post-processing Step; MolecRank Algorithm; Ranking and Outputting Results; Ranked Molecules]

SEARCHING PUBMED FOR A DATASET

COLLECTING THE ABSTRACTS AS MEDLINE
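
A minimal sketch of reading abstracts saved in the MEDLINE text format, where each field starts with a tag padded to four characters followed by `- `, continuation lines are indented by six spaces, and records are separated by blank lines. The function name `parse_medline` and the sample text are illustrative assumptions; the title and abstract in the sample are placeholders, not the content of the real record.

```python
def parse_medline(text):
    """Parse MEDLINE-format text into a list of {tag: [values]} records."""
    records, current, last_tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record.
            if current:
                records.append(current)
            current, last_tag = {}, None
        elif line.startswith("      ") and last_tag:
            # Continuation of the previous field's value.
            current[last_tag][-1] += " " + line.strip()
        elif len(line) > 5 and line[4] == "-":
            # New field: 4-character tag, "-", a space, then the value.
            last_tag = line[:4].strip()
            current.setdefault(last_tag, []).append(line[6:].strip())
    if current:
        records.append(current)
    return records

# Placeholder record; only the PMID comes from the slides.
sample = (
    "PMID- 27690219\n"
    "TI  - Placeholder title\n"
    "AB  - First line of a placeholder abstract\n"
    "      continued on a second line.\n"
)
records = parse_medline(sample)
```

Each record maps a tag such as `PMID`, `TI`, or `AB` to a list of values, since some tags can repeat within one record.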

BIOLOGICAL FEATURES EXTRACTED

EXPRESSIVE JSON LINKED DATA
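
The slide's actual data model is not reproduced in this text, so the following is only a sketch of what a JSON-LD document for one abstract might look like. The `ex:` vocabulary (`ex:mentions`, `ex:label`, the type names) and the function `abstract_to_jsonld` are hypothetical; the feature labels are placeholders.

```python
import json

def abstract_to_jsonld(pmid, features):
    """Build a JSON-LD document linking one PubMed abstract to its features.

    `features` is a list of (kind, label) pairs, e.g. ("Molecule", "mol_A").
    The ex: vocabulary below is an assumption made for illustration.
    """
    return {
        "@context": {"ex": "http://example.org/vocab#"},
        "@id": f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/",
        "@type": "ex:Abstract",
        "ex:mentions": [
            {"@type": f"ex:{kind}", "ex:label": label}
            for kind, label in features
        ],
    }

# Placeholder features; only the PMID comes from the slides.
doc = abstract_to_jsonld("27690219", [("Molecule", "mol_A"), ("Gene", "gene_B")])
serialized = json.dumps(doc, indent=2)
```

The serialized string is what would be handed to the triple store's loader; each entry under `ex:mentions` becomes a blank node typed by its feature kind.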

VISUAL REPRESENTATION OF THE GRAPH

INGESTING JSON-LD INTO A TRIPLE STORE

FINDING FEATURES FOR PMID:27690219
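
The query shown on the slide is not reproduced in this text. As a rough sketch, a SPARQL query for the features of one document could be assembled as below. The `ex:` predicates and the document IRI pattern are assumptions and would need to match whatever vocabulary the JSON-LD model actually uses; `features_query` is a hypothetical helper.

```python
def features_query(pmid):
    """Return a SPARQL SELECT query for the features attached to one PMID.

    The ex: vocabulary is hypothetical and must match the ingested data model.
    """
    if not pmid.isdigit():
        # Reject anything that is not a plain numeric identifier.
        raise ValueError(f"not a valid PMID: {pmid!r}")
    return f"""
PREFIX ex: <http://example.org/vocab#>
SELECT ?type ?label WHERE {{
  <https://pubmed.ncbi.nlm.nih.gov/{pmid}/> ex:mentions ?feature .
  ?feature a ?type ;
           ex:label ?label .
}}
"""

query = features_query("27690219")
```

The resulting string can be pasted into a query portal such as RDF4J Workbench or sent to the store's SPARQL endpoint.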

DISPLAYING/EXPORTING RESULT

QUERYING MORE THAN ONE PUBMED DOC

QUERY RESULT CAPTURING CONTEXT

POST-PROCESSING SPARQL RESULTS
•  Exporting the query results into a CSV
•  Constructing a network as follows
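
The two steps above can be sketched as follows, assuming the exported CSV has one row per (document, feature) pair. The column names `pmid` and `label`, the function `build_network`, and the sample rows are hypothetical; the network on the next slide may be structured differently.

```python
import csv
import io
from collections import defaultdict

def build_network(csv_text):
    """Build an undirected document-feature graph from exported query results.

    Nodes are ("doc", pmid) or ("feature", label) tuples; the graph is a dict
    mapping each node to the set of its neighbours.
    """
    graph = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc = ("doc", row["pmid"])
        feature = ("feature", row["label"])
        graph[doc].add(feature)
        graph[feature].add(doc)
    return graph

# Placeholder export; column names and values are assumptions.
exported = (
    "pmid,label\n"
    "27690219,feature_A\n"
    "27690219,feature_B\n"
    "PLACEHOLDER,feature_A\n"
)
graph = build_network(exported)
degree = {node: len(neighbours) for node, neighbours in graph.items()}
```

Node degree is computed here only as a simple example of a quantity a ranking step could start from.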

EMERGENT NETWORK

MOLECRANK: MOLECULE RANKING ALGORITHM

CURRENT WORK
•  Algorithm is in the implementation phase
•  Rigorous experiments
•  Fine-tuning the JSON-LD data model

PRIOR SUCCESSFUL WORK
•  CrRank: A Multilayered Network Algorithm for Ranking Social Media Users
•  Outperforms Google’s PageRank
•  Outperforms common centrality measures
•  Guarantees a unique ranking for each node in the network

ACKNOWLEDGEMENT
•  Adam Sotona: for the Halyard triple store (https://merck.github.io/Halyard/tools.html)
•  Mark Schreiber: Director of SIM Data Platform

QUESTIONS
