Corpus Workshop: USING SKETCH ENGINE
Ramesh Krishnamurthy
http://acorn.aston.ac.uk/140605CV-RAMESH%20KRISHNAMURTHY.html
Aston University, Birmingham, UK
http://www1.aston.ac.uk/lss/staff-directory/krishnamurthyr
Friday 27th September 2013, 11.00 – 11.30
Towards Operationalizing Corpus Development Plan
Howard College Campus, University of KwaZulu-Natal
PRESENTATION: Thanks to: Sylvia Jaworska
Sketch Engine • www.sketchengine.co.uk (not freeware)
SPECIAL FREE OFFER FOR THIS WORKSHOP (Thanks to: Adam Kilgarriff)
• FREE Workshop Licence: up to 50 attendees, 3-month trial licences (until 27 December), rather than the standard 30 days, but under the same terms as standard 30-day trials
• At https://the.sketchengine.co.uk/register/, choose the second option: 'site licence member'
• 'site licence key' (give to all the people on the course): brotherhood
• Encourage the whole university to get a site licence; see the price list linked from the home page
• PLEASE DISTRIBUTE THE SKETCHENGINE BROCHURES TO COLLEAGUES AND STUDENTS
Sketch Engine • Access to large number of corpora for 42 languages • Frequency Lists • Collocations • Keywords • Word Sketches • Create your own corpus • Build your own corpus from Internet sources (WebBootCat)
Sketch Engine • Access to large corpora for 42 languages
• British National Corpus (BNC)
• British Academic Spoken English Corpus (BASE)
• British Academic Written English Corpus (BAWE)
• SiBol/Port: Corpus of English broadsheet newspapers
Sketch Engine
To open a corpus, just click on the link. To do word searches, go to the option Concordance:
• SIMPLE: the standard query, which will match the lemma and all its forms.
• LEMMA: will again match any lemma, but here you can specify the part of speech (PoS, i.e. the grammatical class, e.g. noun, verb, adjective).
• PHRASE: will match a phrase exactly as typed, e.g. runs away.
• WORD FORM: will match any word form exactly.
• CHARACTER: matches a character string. For example, ate will match words containing this character sequence, e.g. indicate, material.
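The difference between the query types can be illustrated with a small Python sketch. This is only a toy model of the descriptions above, not how Sketch Engine is implemented: the token list, the function names and the treatment of PHRASE as an exact sequence of word forms are all assumptions made for illustration.

```python
# Toy corpus of (word form, lemma, part of speech); hypothetical data.
TOKENS = [
    ("She", "she", "pronoun"),
    ("runs", "run", "verb"),
    ("away", "away", "adverb"),
    ("He", "he", "pronoun"),
    ("ran", "run", "verb"),
    ("two", "two", "numeral"),
    ("runs", "run", "noun"),
    ("indicate", "indicate", "verb"),
    ("material", "material", "noun"),
]


def simple(query):
    """SIMPLE: match the query against lemmas and word forms."""
    q = query.lower()
    return [w for w, lem, _ in TOKENS if q in (w.lower(), lem)]


def lemma_query(query, pos=None):
    """LEMMA: match a lemma, optionally restricted to one part of speech."""
    return [w for w, lem, p in TOKENS if lem == query and pos in (None, p)]


def phrase(query):
    """PHRASE: match an exact sequence of word forms (assumed behaviour)."""
    words = query.split()
    forms = [w for w, _, _ in TOKENS]
    n = len(words)
    return [" ".join(forms[i:i + n])
            for i in range(len(forms) - n + 1)
            if forms[i:i + n] == words]


def word_form(query):
    """WORD FORM: match one word form exactly."""
    return [w for w, _, _ in TOKENS if w == query]


def character(query):
    """CHARACTER: match any word containing the character string."""
    return [w for w, _, _ in TOKENS if query in w]


print(simple("run"))                    # verb and noun forms of 'run'
print(lemma_query("run", pos="verb"))   # verb forms only
print(phrase("runs away"))
print(word_form("ran"))
print(character("ate"))
```

In this toy data, the SIMPLE query for run returns the noun runs as well as the verb forms, while the LEMMA query restricted to verbs does not.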
Sketch Engine: British National Corpus
Text types: you can select either the spoken or the written part of the BNC, or any other register/text type
Sketch Engine: Collocations
Step 1: Go to Concordance
Step 2: Insert the search term
Step 3: Click on Make Concordance
Sketch Engine: Collocations Select the parameters: • Left and right co-text • Minimum frequency • Test of statistical significance
Sketch Engine: Collocations You can sort your results by clicking on the different statistical measures provided or by frequency Default: sorted by logDice
Here: sorted by T-score
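Why the two sort orders can differ is easier to see with numbers. The sketch below uses the commonly documented formulas, logDice = 14 + log2(2·f_xy / (f_x + f_y)) and T-score = (f_xy − f_x·f_y / N) / √f_xy. All frequencies are invented for illustration, and the function and variable names are hypothetical.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice from co-occurrence count f_xy and the two word frequencies."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def t_score(f_xy, f_x, f_y, corpus_size):
    """T-score: observed minus expected co-occurrences, scaled by sqrt(observed)."""
    expected = f_x * f_y / corpus_size
    return (f_xy - expected) / math.sqrt(f_xy)


# Invented counts: a node word with two candidate collocates.
CORPUS_SIZE = 1_000_000
F_NODE = 1_000
CANDIDATES = {
    # collocate: (co-occurrences with the node, frequency of the collocate)
    "the": (400, 60_000),
    "rare": (30, 50),
}

by_log_dice = sorted(
    CANDIDATES,
    key=lambda c: log_dice(CANDIDATES[c][0], F_NODE, CANDIDATES[c][1]),
    reverse=True,
)
by_t_score = sorted(
    CANDIDATES,
    key=lambda c: t_score(CANDIDATES[c][0], F_NODE, CANDIDATES[c][1], CORPUS_SIZE),
    reverse=True,
)

print("sorted by logDice:", by_log_dice)
print("sorted by T-score:", by_t_score)
```

With these invented counts, T-score puts the very frequent collocate first, while logDice prefers the collocate that occurs almost only with the node word.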
Sketch Engine: Word Sketches
"one-page automatic, corpus-based summaries of a word's grammatical and collocational behaviour" (Kilgarriff et al., 2004: 105).
Step 1: Select the option Word Sketch
Step 2: Insert your search term
Step 3: Specify the part of speech
Step 4: Click on Show Word Sketch
Sketch Engine: Word Sketches You can access concordances by clicking on the frequency (in blue)
Sketch Engine: Frequency Lists
For the BNC: you can create frequency lists for many different categories, such as word, lemma and part of speech, but also text type.
David Lee's Genre Classification Scheme: http://rdues.bcu.ac.uk/bncweb/manual/genres.html
Code                      Description
W_ac_humanities_arts      academic prose: humanities
W_ac_medicine             academic prose: medicine
W_ac_nat_science          academic prose: natural sciences
Sketch Engine: Frequency Lists, POS To see the most frequent parts of speech, select the option TAG.
BNC annotation: CLAWS5
The annotation scheme available in Sketch Engine is CLAWS7
Sketch Engine: Key Words
Available under the option Word List
Academic Key Words: British Academic Written English Corpus (BAWE)
Benchmark: British National Corpus
Step 1: Go to Word List
Step 2: Select the Reference Corpus
Step 3: Set the SimpleMaths parameter to 1000
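The effect of the SimpleMaths parameter can be sketched as follows. The score is assumed here to be the published "simple maths" ratio, (frequency per million in the focus corpus + n) / (frequency per million in the reference corpus + n). The counts are invented and the function names are hypothetical.

```python
def per_million(count, corpus_size):
    """Normalise a raw count to a frequency per million tokens."""
    return count * 1_000_000 / corpus_size


def simple_maths(count_focus, size_focus, count_ref, size_ref, n=1000):
    """Keyness score: smoothed ratio of normalised frequencies."""
    focus = per_million(count_focus, size_focus)
    ref = per_million(count_ref, size_ref)
    return (focus + n) / (ref + n)


# Invented counts in two corpora of one million tokens each.
SIZE = 1_000_000
WORDS = {
    # word: (count in focus corpus, count in reference corpus)
    "rare_term": (10, 0),
    "common_term": (2000, 1000),
}

for n in (1, 1000):
    ranked = sorted(
        WORDS,
        key=lambda w: simple_maths(WORDS[w][0], SIZE, WORDS[w][1], SIZE, n),
        reverse=True,
    )
    print(f"n={n}: {ranked}")
```

A small n pushes rare words to the top of the keyword list; a larger value such as 1000 favours more common words.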
Sketch Engine: how to upload your own corpus Go to the option Create Corpus
Step 1: Give your corpus an ID and name it. Select the language from the drop-down menu
Step 2: If you want your corpus to be automatically annotated with parts of speech, select the option TreeTagger for English
Step 3: Sketch Grammar: no need to select it
Step 4: Now, you will need to add your file or files. To do so, select Add new file
Step 5: Select your files
Step 6: You can add subsequent files by selecting Add new file
Step 7: Now, you will need to compile your corpus, select Compile Corpus
Step 8: Click on OK and your corpus should appear in the list under My Corpora on the Sketch Engine Homepage
Step 9: To use your corpus, click on the link and then open it in SkE (in Sketch Engine mode)
Sketch Engine: CQL
• CQL: Corpus Query Language
• Use it to search for specific parts of speech, or for combinations of different parts of speech.
• For example, if you are interested in the most frequent adjectives in the BASE Corpus (British Academic Spoken English), you will need to formulate a query in CQL.
• TAG SET: http://ucrel.lancs.ac.uk/claws7tags.html
Sketch Engine: CQL
• Go to Concordance
• Select CQL from the menu
• Formulate your CQL query
• Select TAG from the drop-down menu on the right side (Default attribute)
• Click Make Concordance
Sketch Engine: CQL
How to formulate a CQL query:
1. If you are searching for a part of speech, you will need to know with which tag set your corpus has been annotated, e.g. CLAWS5 or CLAWS7.
Looking for adjectives:
CLAWS5: [tag="AJ0"] finds all general adjectives
CLAWS7: [tag="JJ"]
Don't forget the square brackets and the quotation marks!
.* means any sequence of characters
CLAWS5: [tag="AJ.*"] finds all adjectives, including superlative and comparative forms
You can search for words: [word="cat"]
Sketch Engine: CQL British National Corpus
• Tags for parts of speech (CLAWS5) available at: http://trac.sketchengine.co.uk/wiki/tagsets/bnc

Explanation, followed by the CQL query (CLAWS5):
A. All general adjectives: [tag="AJ0"]
B. All adjectives preceding the word woman: [tag="AJ.*"] [word="woman"]
C. All phrases starting with a general adjective followed by any common noun: [tag="AJ0"] [tag="NN.*"]
D. Instances of phrases starting with an adjective in the superlative form and followed by any word: [tag="AJS"] [word=".*"]
E. All phrases starting with a general adverb followed by a general adjective: [tag="AV0"] [tag="AJ0"]
F. Instances of phrases starting with an adjective in the superlative form and followed by a common noun: [tag="AJS"] [tag="NN.*"]
G. All verbs ending with -ed: [tag="V.*" & word=".*ed"]
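How such queries are read, one bracketed block per token with each quoted value treated as a regular expression over the whole attribute, can be sketched with a small matcher in Python. This is a simplified illustration, not Sketch Engine's query engine: it handles only the `[attr="regex" & attr="regex"]` form used here, and the tagged sentence and function names are invented.

```python
import re

# One bracketed block per token; conditions inside are attr="regex" pairs.
# Limitation of this sketch: the regex itself must not contain ']' or '"'.
BLOCK_RE = re.compile(r"\[([^\]]*)\]")
CONDITION_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')

# Invented sentence with CLAWS5-style tags.
CORPUS = [
    {"word": "The", "tag": "AT0"},
    {"word": "oldest", "tag": "AJS"},
    {"word": "woman", "tag": "NN1"},
    {"word": "planted", "tag": "VVD"},
    {"word": "a", "tag": "AT0"},
    {"word": "very", "tag": "AV0"},
    {"word": "tall", "tag": "AJ0"},
    {"word": "tree", "tag": "NN1"},
]


def parse(query):
    """Turn '[tag="AJ0"] [word="woman"]' into a list of {attribute: regex}."""
    return [dict(CONDITION_RE.findall(block)) for block in BLOCK_RE.findall(query)]


def token_matches(token, conditions):
    """All conditions must match the token's whole attribute value."""
    return all(re.fullmatch(rx, token[attr]) for attr, rx in conditions.items())


def search(query, corpus=CORPUS):
    """Return every token sequence in the corpus that matches the query."""
    pattern = parse(query)
    n = len(pattern)
    if n == 0:
        return []
    hits = []
    for i in range(len(corpus) - n + 1):
        window = corpus[i:i + n]
        if all(token_matches(t, c) for t, c in zip(window, pattern)):
            hits.append(" ".join(t["word"] for t in window))
    return hits


print(search('[tag="AJ0"]'))
print(search('[tag="AJ.*"] [word="woman"]'))
print(search('[tag="AJS"] [word=".*"]'))
print(search('[tag="V.*" & word=".*ed"]'))
```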
Sketch Engine: British National Corpus
Exercise: identify all instances of phrases starting with an adverb followed by an adjective in the spoken and written parts of the BNC.
(Results shown for SPOKEN and WRITTEN.)
Sketch Engine: CQL for BASE and BAWE (CLAWS7)

Explanation
A. All general adjectives
B. All adjectives (J.*) preceding the word woman
C. All phrases starting with a general adjective followed by any noun
D. All words starting with confus
E. Instances of phrases starting with an adjective in the superlative form and followed by any word
F. All phrases starting with any adverb (R.*) followed by an adjective
G. All instances of the word cat
H. Instances of phrases starting with an adjective in the superlative form and followed by any noun
I. All verbs (V.*) ending with -ed

CQL (CLAWS7)
1. [word="cat"]
2. [word="confus.*"]
3. [tag="JJ"]
4. [tag="JJ"] [tag="N.*"]
5. [tag="J.*"] [word="woman"]
6. [tag="R.*"] [tag="JJ"]
7. [tag="V.*" & word=".*ed"]
8. [tag="JJT"] [word=".*"]
9. [tag="JJT"] [tag="N.*"]
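A note on wildcards in tag patterns: the quoted values in CQL are regular expressions, and they are assumed here to be matched against the whole tag. A bare `*` therefore means "zero or more of the preceding character", so a whole tag family needs `.*` (for example `J.*` for all adjective tags). The Python sketch below uses `re.fullmatch` as a stand-in for that behaviour; the list contains a small selection of real CLAWS7 tags, and the function name is hypothetical.

```python
import re

# A small selection of CLAWS7 tags: adjectives, nouns, adverbs, a verb.
CLAWS7_TAGS = ["JJ", "JJR", "JJT", "NN", "NN1", "NN2", "RR", "RG", "VVD"]


def tags_matching(pattern, tags=CLAWS7_TAGS):
    """Return the tags that the pattern matches in full."""
    return [t for t in tags if re.fullmatch(pattern, t)]


for pattern in ("J*", "J.*", "N*", "N.*", "R*", "R.*"):
    print(f"{pattern:4} -> {tags_matching(pattern)}")
```

Under this assumption, `J*` only matches tags made up entirely of the letter J, so comparative (JJR) and superlative (JJT) adjectives are missed, whereas `J.*` covers all three.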
Sketch Engine: CQL
The most frequent phrases starting with an adverb followed by an adjective: CLICK on NODE FORM
Results shown for BAWE (written academic) and BASE (spoken academic)
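What a frequency list of node forms amounts to can be sketched in a few lines of Python: collect every adverb + general adjective pair and count how often each distinct form occurs. The tagged tokens below are invented (CLAWS7-style tags), and this is only an illustration of the idea, not of how Sketch Engine computes the list.

```python
import re
from collections import Counter

# Invented CLAWS7-style tagged tokens.
TAGGED = [
    ("very", "RG"), ("important", "JJ"), ("results", "NN2"), ("are", "VBR"),
    ("quite", "RG"), ("different", "JJ"), ("and", "CC"),
    ("very", "RG"), ("important", "JJ"),
]


def adverb_adjective_pairs(tokens):
    """Word pairs where an adverb tag (R.*) is followed by a general adjective (JJ)."""
    return [
        f"{w1} {w2}".lower()
        for (w1, t1), (w2, t2) in zip(tokens, tokens[1:])
        if re.fullmatch("R.*", t1) and t2 == "JJ"
    ]


node_form_frequencies = Counter(adverb_adjective_pairs(TAGGED))
for form, freq in node_form_frequencies.most_common():
    print(freq, form)
```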