Natural Language Processing (NLP) Information retrieval
Introduction to NLP
CS 4740 / CS 5740 / LING 4474 / COGST 4740
• Instructor: Claire Cardie
– Professor in CS and IS (and CogSci)
• Two TAs
– Moontae Lee
– Ashesh Jain
• One dog
– Marseille (mahr-say)

Computationally oriented introduction to natural language processing, the goal of which is to enable computers to use human languages as input, output, or both. Possible topics include parsing, grammar induction, information retrieval, and machine translation.

Natural Language Processing (NLP)
• Natural language
– Languages that people use to communicate with one another
• Ultimate goal
– To create computational models that perform as well at using natural language as humans do
• Immediate goal
– To build computer systems that can process text and speech more intelligently
[Figure: NL input → understanding → computer → generation → NL output]

Information retrieval
• Web search
[Figure: an information need, expressed as the query "leveraged buyouts", goes to an IR system, which scores the documents (doc 1, doc 2, doc 3, ..., doc n) in a text collection and returns the relevant documents, ranked.]
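
The retrieval loop on this slide (a query goes in, every document gets a score, and the documents come back ranked) can be sketched in a few lines of Python. This is only an illustrative sketch: the slide does not say how scores are computed, so TF-IDF term weighting is an assumption here, and the three document texts and the names `DOCS`, `tokenize`, and `rank` are invented for the example.

```python
import math
from collections import Counter

# Hypothetical toy collection; these texts are invented for illustration.
DOCS = {
    "doc 1": "leveraged buyouts reshaped corporate finance in the 1980s",
    "doc 2": "the buyouts market cooled after several failed deals",
    "doc 3": "a recipe for sourdough bread",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def rank(query, docs):
    """Return (doc_id, score) pairs sorted by descending TF-IDF score.

    TF-IDF is an assumed scoring function, not one named on the slide.
    """
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, counts in term_counts.items():
        score = 0.0
        for term in tokenize(query):
            # Document frequency: how many documents contain the term.
            df = sum(1 for c in term_counts.values() if term in c)
            if df == 0:
                continue  # Term appears nowhere, so it cannot contribute.
            idf = math.log(n_docs / df)
            score += counts[term] * idf
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


ranking = rank("leveraged buyouts", DOCS)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

With this toy data the document containing both query terms ranks first, the one containing only "buyouts" second, and the unrelated document last with a score of zero.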

Information retrieval
• Query: (articles on) leveraged buyouts
• Query: (articles on) leveraged buyouts involving more than 100 million dollars that were attempted but failed during 1986 and 1990

Question answering (QA)
• Task
– Retrieve not just relevant documents, but return the answer
» How many calories are there in a Big Mac?
» Who is the voice of Miss Piggy?
» Who was the first American in space?
• I see what I eat = I eat what I see [Mad Hatter, Alice in Wonderland]

Machine translation
• One of the first applications envisioned for NLP techniques
– The spirit is willing, but the flesh is weak.
– open

Dialogue-based systems
• Assistant: Can I help you?
• Customer: I was wondering whether you have any switched brass lampholders.
• Assistant: The brass lampholders are out of stock, but they should be in on Wednesday. The plastic ones are over here...

Why is dealing with NL hard?
Ambiguity!!!!
• At all levels of analysis
• Phonetics and phonology
– Concerns how words are related to the sounds that realize them. Important for speech-based systems.
» I scream vs. ice cream
» nominal egg
» It's very hard to wreck a nice beach. (i.e., It's very hard to recognize speech.)
• Syntax
– Concerns sentence structure
– Different syntactic structure implies different interpretation
» Squad helps dog bite victim.
  [np squad] [vp helps [np dog bite victim]]
  [np squad] [vp helps [np dog] [inf-clause bite victim]]
» Helicopter powered by human flies.
• Semantics
– Concerns what words mean and how these meanings combine to form sentence meanings.
» Red-hot star to wed astronomer.
» The once-sagging cloth diaper industry was saved by full dumps.
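
The syntactic ambiguity in "Squad helps dog bite victim" can be made concrete by writing the two bracketings as data. The sketch below is a minimal illustration, not a parser: it encodes each reading as a nested tuple and checks that both cover the same five words while differing in structure. The root label `s` and the helper names `leaves` and `bracket` are assumptions added for the example; the other labels come from the slide.

```python
# A tree is a tuple (label, child, ...); a leaf is a plain string.
# The root label "s" is an assumption; the slide shows only np, vp, inf-clause.
READING_1 = ("s", ("np", "squad"),
             ("vp", "helps", ("np", "dog", "bite", "victim")))
READING_2 = ("s", ("np", "squad"),
             ("vp", "helps", ("np", "dog"), ("inf-clause", "bite", "victim")))


def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def bracket(tree):
    """Render a tree in the slide's [label ...] bracket notation."""
    if isinstance(tree, str):
        return tree
    inner = " ".join(bracket(child) for child in tree[1:])
    return f"[{tree[0]} {inner}]"


# Same word string, two different structures: that is the ambiguity.
same_words = leaves(READING_1) == leaves(READING_2)
different_structure = READING_1 != READING_2

print(bracket(READING_1))
print(bracket(READING_2))
```

In the first reading "dog bite victim" is a single noun phrase (the squad helps someone bitten by a dog); in the second, "dog" is the one being helped to bite.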
• Discourse
– Concerns how the immediately preceding sentences affect the interpretation of the next sentence
» Jack drank the wine on the table. It was brown and round.
» Jack saw Sam at the party. He went back to the bar to get another drink.
» Jack saw Sam at the party. He clearly had drunk too much.
[Adapted from Wilks (1975)]
• Pragmatics
– Concerns how sentences are used in different situations and how use affects the interpretation of the sentence.
I just came from Collegetown Bagels.
» Do you want to go to Collegetown Bagels?
» Do you want to go to Gimme Coffee?
» Boy, you look tired.

What topics can we cover?
• Language modeling
• Phonetic analysis
• Morphological analysis
• Word-sense disambiguation
• Part-of-speech tagging
• Parsing
• Grammar induction
• Semantic analysis
• Pronoun resolution
• Coreference analysis
• NL Generation
• Machine translation
• Dialogue systems
• Information extraction
• Information retrieval models
• QA systems
• Topic models

Reference Material
• Required textbook:
– Jurafsky and Martin, Speech and Language Processing, Prentice-Hall, 2nd edition.
• Other useful references:
– Manning and Schütze, Foundations of Statistical NLP, MIT Press, 1999.
– Others listed on course web page!

Prereqs, Coursework and Grading
• Prerequisites
– CS 2110.
• Grading
– 72%: four programming/research projects with short (5-6 page) reports
– 20%: critiques of selected readings and research papers
– 7%: participation. You'll be expected to participate in class discussion and class exercises or otherwise demonstrate an interest in the material studied in the course.
– 1%: course evaluation completion

http://www.cs.cornell.edu/courses/cs4740/