Natural Language Processing and Text Mining for BIOfid
Giuseppe Abrami, Sajawel Ahmed, Rüdiger Gleim, Wahed Hemati, Alexander Mehler, Tolga Uslu
Goethe-Universität Frankfurt, Text Technology Lab (TTLab)

08.03.2018

1st meeting of the Scientific Advisory Board on FID Biodiversity Research

Motivation Text sample

Motivation Semantic tagging

Motivation "Three-World" approach

Motivation Natural Language Processing: Example Frame Analysis

Motivation Semantic search: Scenarios

Agenda
1. Motivation
2. Natural Language Processing
3. Ontologies
4. Machine Learning
5. Annotation
6. Conclusion

Natural Language Processing Example: Parsing

Natural Language Processing Result annotation: UIMA XMI and TEI P5

Natural Language Processing The BIOfid-NLP-Model

Natural Language Processing TextImager

Ontologies Architecture

Ontologies User Interface

Ontologies Resources I

Ontologies Resources II

Ontologies Ontology Matching

Machine Learning ML Architecture: Entire model

Machine Learning ML Architecture: Model components

Machine Learning ML Architecture: ML Model Class

Machine Learning ML Architecture: Structural Semantics

Machine Learning ML Architecture: Named Entity Recognition

Machine Learning ML Architecture: Named Entity & Kind Recognition

Machine Learning ML Architecture: Natural Language Inference (NLI)

Machine Learning ML Architecture: Ontology Enrichment

Annotation Requirements analysis

Annotation TextAnnotator (Helfrich et al. 2018; Abrami & Mehler 2018)

Conclusion

Bibliography

Abrami, Giuseppe and Alexander Mehler (2018). "A UIMA Database Interface for Managing NLP-related Text Annotations". In: Proceedings of the 11th edition of the Language Resources and Evaluation Conference, May 7 - 12. LREC 2018. accepted. Miyazaki, Japan.
Ahmed, Sajawel (2017). "Steps towards Question Answering with Deep Neural Memory Networks". http://publica.fraunhofer.de/documents/N-467083.html. Master Thesis. Fraunhofer IAIS, PricewaterhouseCoopers, Goethe University Frankfurt.
Batista-Navarro, R., M.A. LaPorte, M. Regan, W. Ulate and Claus Weiland (Sep. 2018). "Workflow of text-mining, term recommendation and data annotation pipeline". In: Proceedings of ICEI 10th International Conference on Ecological Informatics. submitted. Jena, Germany.
Bohnet, Bernd and Joakim Nivre (2012). "A Transition-based System for Joint Part-of-speech Tagging and Labeled Non-projective Dependency Parsing". In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. EMNLP-CoNLL ’12. Jeju Island, Korea: Association for Computational Linguistics, pp. 1455–1465.
Bowman, Samuel R., Gabor Angeli, Christopher Potts and Christopher D. Manning (2015). "A large annotated corpus for learning natural language inference". In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics.
Finkel, Jenny Rose, Trond Grenager and Christopher Manning (2005). "Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling". In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. ACL ’05. Ann Arbor, Michigan: Association for Computational Linguistics, pp. 363–370.
Gleim, Rüdiger et al. (2018). "Practitioner’s view: A comparison and a survey of lemmatization and morphological tagging in German and Latin". In: Journal of Language Modelling. submitted.
Hemati, Wahed, Tolga Uslu and Alexander Mehler (2016). "TextImager: a Distributed UIMA-based System for NLP". In: Proceedings of the COLING 2016 System Demonstrations. Federated Conference on Computer Science and Information Systems. Osaka, Japan.
Komninos, Alexandros and Suresh Manandhar (2016). "Dependency based embeddings for sentence classification tasks". In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1490–1500.
Levy, Omer and Yoav Goldberg (2014). "Dependency-based word embeddings". In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Vol. 2, pp. 302–308.
Levy, Omer, Yoav Goldberg and Ido Dagan (2015). "Improving Distributional Similarity with Lessons Learned from Word Embeddings". In: Transactions of the Association for Computational Linguistics 3, pp. 211–225.
Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S Corrado and Jeff Dean (2013). "Distributed representations of words and phrases and their compositionality". In: Advances in neural information processing systems, pp. 3111–3119.
Moro, Andrea, Alessandro Raganato and Roberto Navigli (2014). "Entity Linking meets Word Sense Disambiguation: a Unified Approach". In: Transactions of the Association for Computational Linguistics (TACL) 2, pp. 231–244.
Naderi, Nona and Graeme Hirst (2017). "Classifying Frames at the Sentence Level in News Articles". In: Policy 9, pp. 4–233.
Parsons, Terence (1994). Events in the Semantics of English: A Study in Subatomic Semantics. Vol. 19. Current Studies in Linguistics Series. Cambridge: MIT Press.
Pilehvar, Mohammad Taher and Roberto Navigli (2015). "From senses to texts: An all-in-one graph-based approach for measuring semantic similarity". In: Artificial Intelligence 228, pp. 95–128.
Remus, Robert, Uwe Quasthoff and Gerhard Heyer (2010). "SentiWS - A Publicly Available German-language Resource for Sentiment Analysis". In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10). Malta.
Strötgen, Jannik and Michael Gertz (June 2013). "Multilingual and cross-domain temporal tagging". In: Language Resources and Evaluation 47.2, pp. 269–298.
Uslu, Tolga, Alexander Mehler, Daniel Baumartz, Alexander Henlein and Wahed Hemati (2018). "fastSense: An Efficient Word Sense Disambiguation Classifier". In: Proceedings of the 11th edition of the Language Resources and Evaluation Conference, May 7 - 12. LREC 2018. accepted. Miyazaki, Japan.
Uslu, Tolga, Alexander Mehler, Andreas Niekler and Wahed Hemati (2018). "Towards a DDC-based Topic Network Model of Wikipedia". In: Proceedings of 2nd International Workshop on Modeling, Analysis, and Management of Social Networks and their Applications (SOCNET 2018), February 28, 2018. accepted.
Wang, Zhen, Jianwen Zhang, Jianlin Feng and Zheng Chen (2014). "Knowledge Graph Embedding by Translating on Hyperplanes". In: AAAI. Vol. 14, pp. 1112–1119.

Bibliography How to cite these slides

Abrami, G., Ahmed, S., Gleim, R., Hemati, W., Mehler, A. and Uslu, T. (March 2018). Natural Language Processing and Text Mining for BIOfid. Presentation at the 1st Meeting of the Scientific Advisory Board of the BIOfid Project.

BibTeX

@misc{Abrami:Ahmed:Gleim:Hemati:Mehler:Uslu:2018,
  author = {Abrami, Giuseppe and Ahmed, Sajawel and Gleim, R{\"u}diger and Hemati, Wahed and Mehler, Alexander and Uslu, Tolga},
  title = {{Natural Language Processing and Text Mining for BIOfid}},
  howpublished = {Presentation at the 1st Meeting of the Scientific Advisory Board of the BIOfid Project},
  address = {Goethe-University, Frankfurt am Main, Germany},
  year = {2018},
  month = {March},
  day = {08}
}
