Corpus Workshop: USING SKETCH ENGINE

Ramesh Krishnamurthy http://acorn.aston.ac.uk/140605CV-RAMESH%20KRISHNAMURTHY.html

Aston University, Birmingham, UK http://www1.aston.ac.uk/lss/staff-directory/krishnamurthyr
Friday 27th September 2013, 11.00 – 11.30
Towards Operationalizing Corpus Development Plan
Howard College Campus, University of KwaZulu-Natal

PRESENTATION: Thanks to: Sylvia Jaworska

Sketch Engine • www.sketchengine.co.uk (not freeware)

SPECIAL FREE OFFER FOR THIS WORKSHOP: Thanks to: Adam Kilgarriff
• FREE Workshop Licence: up to 50 attendees, 3-month trial licences (until 27 December) [rather than the standard 30 days, but under the same terms as standard 30-day trials]
• At https://the.sketchengine.co.uk/register/, choose the second option: 'site licence member'
• 'Site licence key' (give to all the people on the course): brotherhood
• Encourage the whole university to get a site licence; see the price list linked from the home page.
• PLEASE DISTRIBUTE THE SKETCH ENGINE BROCHURES TO COLLEAGUES AND STUDENTS

Sketch Engine
• Access to a large number of corpora for 42 languages
• Frequency Lists
• Collocations
• Keywords
• Word Sketches
• Create your own corpus
• Build your own corpus from Internet sources (WebBootCat)

Sketch Engine • Access to large corpora for 42 languages

British National Corpus (BNC)
British Academic Spoken English Corpus (BASE)
British Academic Written English Corpus (BAWE)
SiBol/Port: Corpus of English broadsheet newspapers

Sketch Engine
To open a corpus, just click on its link. To do word searches, go to the option Concordance:
SIMPLE: the standard query, which matches the lemma and all its forms.
LEMMA: also matches all forms of a lemma, but here you can specify the part of speech (PoS, i.e. the grammatical class, e.g. noun, verb, adjective).
PHRASE: matches the phrase exactly as typed, e.g. runs away, but not ran away.
WORD FORM: matches one word form exactly.

CHARACTER: matches a character string. For example, ate will match any word containing this character sequence, e.g. indicate, material.
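The five search options can be pictured as different filters over a tagged, lemmatised token stream. The sketch below is only an illustration of that idea, not how Sketch Engine is implemented: the `Token` record, the sample tokens, their CLAWS5-style tags and all function names are hypothetical.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str   # the word form as it appears in the text
    lemma: str  # the dictionary headword
    tag: str    # part-of-speech tag (CLAWS5-style, assigned by hand here)


# Hypothetical sample; in a real corpus tagging and lemmatisation are automatic.
TOKENS = [
    Token("She", "she", "PNP"),
    Token("runs", "run", "VVZ"),
    Token("away", "away", "AVP"),
    Token("He", "he", "PNP"),
    Token("ran", "run", "VVD"),
    Token("a", "a", "AT0"),
    Token("run", "run", "NN1"),
    Token("indicate", "indicate", "VVI"),
    Token("material", "material", "NN1"),
]


def simple_query(lemma: str) -> List[str]:
    """SIMPLE: all forms of a lemma, whatever their part of speech."""
    return [t.word for t in TOKENS if t.lemma == lemma]


def lemma_query(lemma: str, tag_prefix: str) -> List[str]:
    """LEMMA: all forms of a lemma, restricted to one part of speech."""
    return [t.word for t in TOKENS if t.lemma == lemma and t.tag.startswith(tag_prefix)]


def phrase_query(phrase: str) -> List[str]:
    """PHRASE: consecutive word forms exactly as typed."""
    words = phrase.split()
    n = len(words)
    return [
        phrase
        for i in range(len(TOKENS) - n + 1)
        if [t.word for t in TOKENS[i:i + n]] == words
    ]


def word_form_query(form: str) -> List[str]:
    """WORD FORM: one exact word form."""
    return [t.word for t in TOKENS if t.word == form]


def character_query(chars: str) -> List[str]:
    """CHARACTER: any word containing the character string."""
    return [t.word for t in TOKENS if chars in t.word]


print(simple_query("run"))        # ['runs', 'ran', 'run']
print(lemma_query("run", "VV"))   # ['runs', 'ran']
print(phrase_query("runs away"))  # ['runs away']
print(phrase_query("ran away"))   # []
print(word_form_query("runs"))    # ['runs']
print(character_query("ate"))     # ['indicate', 'material']
```

The contrast between `simple_query("run")` and `lemma_query("run", "VV")` mirrors the difference between SIMPLE and LEMMA: the noun run drops out once the part of speech is fixed.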

Sketch Engine: British National Corpus
Text types: you can select either the spoken or the written part of the BNC, or any other register/text type

Sketch Engine: Collocations
Step 1: Go to Concordance
Step 2: Insert the search term
Step 3: Click on Make Concordance

Collocations

Sketch Engine: Collocations Select the parameters: • Left and right co-text • Minimum frequency • Test of statistical significance

Sketch Engine: Collocations You can sort your results by clicking on the different statistical measures provided or by frequency Default: sorted by logDice

Here: sorted by T-score
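To make the collocation parameters and the two sort orders concrete, here is a minimal sketch, assuming the commonly published formulas logDice = 14 + log2(2·f_xy / (f_x + f_y)) and T-score = (f_xy - f_x·f_y / N) / √f_xy, where f_xy is the co-occurrence count inside the window, f_x and f_y are the node and collocate frequencies, and N is the corpus size. Sketch Engine's own counting may differ in detail; the function name `collocations` and its parameters are hypothetical stand-ins for the left/right co-text and minimum-frequency settings.

```python
import math
from collections import Counter
from typing import List, Tuple

# (collocate, co-occurrence frequency, logDice, T-score)
Row = Tuple[str, int, float, float]


def collocations(tokens: List[str], node: str, left: int = 3, right: int = 3,
                 min_freq: int = 2, sort_by: str = "logdice") -> List[Row]:
    """Score words that co-occur with `node` inside a left/right window."""
    corpus_size = len(tokens)
    freq = Counter(tokens)
    f_node = freq[node]

    # Count co-occurrences inside the window around every hit of the node.
    co_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        co_freq.update(w for w in window if w != node)

    rows: List[Row] = []
    for coll, f_xy in co_freq.items():
        if f_xy < min_freq:  # minimum frequency filter
            continue
        f_coll = freq[coll]
        log_dice = 14 + math.log2(2 * f_xy / (f_node + f_coll))
        expected = f_node * f_coll / corpus_size  # co-occurrence expected by chance
        t_score = (f_xy - expected) / math.sqrt(f_xy)
        rows.append((coll, f_xy, log_dice, t_score))

    column = {"freq": 1, "logdice": 2, "tscore": 3}[sort_by]
    return sorted(rows, key=lambda row: row[column], reverse=True)
```

Calling `collocations(tokens, "tea", sort_by="tscore")` reorders the same rows by T-score, which corresponds to switching from the default logDice view to the T-score view. logDice reaches its maximum of 14 when a pair always occurs together, and it does not depend on corpus size.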

Sketch Engine: Word Sketches
Word Sketches are "one-page automatic, corpus-based summaries of a word's grammatical and collocational behaviour" (Kilgarriff et al., 2004:105).
Step 1: Select the option Word Sketch
Step 2: Insert your search term
Step 3: Specify the part of speech
Step 4: Click on Show Word Sketch

Sketch Engine: Word Sketches You can access concordances by clicking on the frequency (in blue)

Sketch Engine: Frequency Lists

Sketch Engine: Frequency Lists
For the BNC, you can create frequency lists for many different attributes, such as word, lemma and part of speech, but also text type.

David Lee's Genre Classification Scheme: http://rdues.bcu.ac.uk/bncweb/manual/genres.html

Code                    Description
W_ac_humanities_arts    academic prose: humanities
W_ac_medicine           academic prose: medicine
W_ac_nat_science        academic prose: natural sciences

Sketch Engine: Frequency Lists, POS To see the most frequent parts of speech, select the option TAG.
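Whichever attribute you choose (word, lemma, TAG or a text type), a frequency list is simply a count of that attribute's values. A minimal sketch, assuming tokens stored as dictionaries; the sample tokens and the `genre` key (standing in for a text-type attribute such as David Lee's genre codes) are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical tokens with BNC-style attributes; "genre" stands in for a text type.
TOKENS: List[Dict[str, str]] = [
    {"word": "The", "lemma": "the", "tag": "AT0", "genre": "W_ac_medicine"},
    {"word": "cats", "lemma": "cat", "tag": "NN2", "genre": "W_ac_medicine"},
    {"word": "sat", "lemma": "sit", "tag": "VVD", "genre": "W_ac_medicine"},
    {"word": "the", "lemma": "the", "tag": "AT0", "genre": "W_ac_nat_science"},
    {"word": "cat", "lemma": "cat", "tag": "NN1", "genre": "W_ac_nat_science"},
    {"word": "sits", "lemma": "sit", "tag": "VVZ", "genre": "W_ac_nat_science"},
]


def frequency_list(tokens, attribute, top=None):
    """Count the values of one attribute, most frequent first (ties keep first-seen order)."""
    return Counter(tok[attribute] for tok in tokens).most_common(top)


print(frequency_list(TOKENS, "lemma"))       # [('the', 2), ('cat', 2), ('sit', 2)]
print(frequency_list(TOKENS, "tag", top=1))  # [('AT0', 2)]
print(frequency_list(TOKENS, "word"))        # six forms, each once: The and the are counted separately
```

The word list shows why the lemma list is shorter: different forms of one lemma (cats/cat, sat/sits) are counted together only at lemma level.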

Sketch Engine: Frequency Lists, POS BNC – Annotation CLAWS5

Sketch Engine: Frequency Lists, POS
The annotation scheme available in Sketch Engine for BASE and BAWE is CLAWS7

Sketch Engine: Key Words
Available under the option Word List
Academic key words: British Academic Written English Corpus (BAWE)
Benchmark: British National Corpus

Step 1: Go to Word List
Step 2: Select the reference corpus
Step 3: Set the SimpleMaths parameter to 1000
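The SimpleMaths parameter is the constant n in Kilgarriff's keyness score: (frequency per million in the focus corpus + n) / (frequency per million in the reference corpus + n). With n = 1, rare words missing from the reference corpus dominate the list; a larger n such as 1000 pushes the list towards more common words. A minimal sketch, assuming plain word counts for both corpora; Sketch Engine may normalise sizes slightly differently, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def per_million(freq: int, corpus_size: int) -> float:
    """Normalise a raw count to a frequency per million tokens."""
    return freq * 1_000_000 / corpus_size


def simple_maths(focus_freq: int, focus_size: int,
                 ref_freq: int, ref_size: int, n: float = 1.0) -> float:
    """Keyness score: (fpm in focus corpus + n) / (fpm in reference corpus + n)."""
    return (per_million(focus_freq, focus_size) + n) / (per_million(ref_freq, ref_size) + n)


def keywords(focus: Counter, reference: Counter, n: float = 1.0,
             top: int = 10) -> List[Tuple[str, float]]:
    """Rank the words of the focus corpus (e.g. BAWE) against a benchmark (e.g. BNC)."""
    focus_size = sum(focus.values())
    ref_size = sum(reference.values())
    scores = {w: simple_maths(f, focus_size, reference[w], ref_size, n)  # Counter gives 0 if absent
              for w, f in focus.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


print(simple_maths(10, 1_000_000, 0, 1_000_000, n=1))     # 11.0
print(simple_maths(10, 1_000_000, 0, 1_000_000, n=1000))  # 1.01
```

The two calls show the smoothing effect: a word at 10 per million that never occurs in the benchmark scores 11 with n = 1 but only 1.01 with n = 1000.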

Sketch Engine: how to upload your own corpus Go to the option Create Corpus

Step 1: Give your corpus an ID and a name. Select the language from the drop-down menu.

Step 2: If you want your corpus to be automatically annotated with parts of speech, select the option TreeTagger for English

Step 3: Sketch Grammar: no need to select it

Step 4: Now, you will need to add your file or files. To do so, select Add new file

Step 5: Select your files

Step 6: You can add subsequent files by selecting Add new file

Step 7: Now you will need to compile your corpus: select Compile Corpus

Step 8: Click on OK and your corpus should appear in the list under My Corpora on the Sketch Engine Homepage

Step 9: To use your corpus, click on its link and then open it in SkE (Sketch Engine mode)

Sketch Engine: CQL
• CQL: Corpus Query Language

CQL lets you search for specific parts of speech, or for combinations of different parts of speech. For example, if you are interested in the most frequent adjectives in the BASE Corpus (British Academic Spoken English), you will need to formulate a CQL query using the tags of the corpus's tag set. CLAWS7 tag set: http://ucrel.lancs.ac.uk/claws7tags.html

Sketch Engine: CQL
• Go to Concordance
• Select CQL from the menu
• Formulate your CQL query
• Select TAG from the drop-down menu on the right side (Default attribute)
• Click Make Concordance

Sketch Engine: CQL
How to formulate a CQL query:
1. If you are searching for a part of speech, you will need to know which tag set your corpus has been annotated with, e.g. CLAWS5 or CLAWS7.
Looking for adjectives:

CLAWS5

[tag="AJ0"] finds all general adjectives

CLAWS7

[tag="JJ"]

Don't forget the square brackets and the quotation marks!

CLAWS5

[tag="AJ.*"] finds all adjectives, including the superlative and comparative forms (.* means any sequence of characters)

You can also search for words:

[word="cat"]
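In CQL, the value inside the quotation marks is a regular expression that must match the whole attribute value: "." stands for any single character and "*" for zero or more repetitions of what precedes it, so ".*" means any sequence of characters. The sketch below imitates that whole-value matching with Python's `re.fullmatch`; Python's regex dialect is an assumption standing in for Sketch Engine's, though the two agree for simple patterns like these. The helper name is hypothetical.

```python
import re


def cql_value_matches(pattern: str, value: str) -> bool:
    """True if the regex matches the WHOLE attribute value, as in a CQL [attr="..."] test."""
    return re.fullmatch(pattern, value) is not None


claws5_adjective_tags = ["AJ0", "AJC", "AJS"]  # general, comparative, superlative

print([t for t in claws5_adjective_tags if cql_value_matches("AJ0", t)])   # ['AJ0']
print([t for t in claws5_adjective_tags if cql_value_matches("AJ.*", t)])  # ['AJ0', 'AJC', 'AJS']
print(cql_value_matches("cat", "cats"))     # False: "cat" must match the whole word
print(cql_value_matches(".*ed", "walked"))  # True
print(cql_value_matches(".*ed", "bed"))     # True: spelling alone cannot pick out verbs
```

The last line is why a query for verbs ending in -ed combines a word pattern with a tag pattern rather than relying on the spelling alone.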

Sketch Engine: CQL British National Corpus

• Tags for parts of speech (CLAWS5) available at: http://trac.sketchengine.co.uk/wiki/tagsets/bnc

Explanation

A. All general adjectives
B. All adjectives preceding the word woman
C. All phrases starting with a general adjective followed by any noun
D. Instances of phrases starting with an adjective in the superlative form and followed by any word
E. All phrases starting with any adverb followed by an adjective
F. Instances of phrases starting with an adjective in the superlative form and followed by a common noun
G. All verbs ending in -ed (.*ed)

CQL (CLAWS5)

1. [tag="AJ0"]
2. [tag="AJ.*"] [word="woman"]
3. [tag="AJ0"] [tag="NN.*"]
4. [tag="AJS"] [word=".*"]
5. [tag="AV0"] [tag="AJ0"]
6. [tag="AJS"] [tag="NN.*"]
7. [tag="V.*" & word=".*ed"]
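A multi-token CQL query is a sequence of bracketed token conditions that must hold for consecutive tokens, and "&" inside one bracket requires several attribute tests on the same token. The sketch below illustrates that logic on a hypothetical CLAWS5-tagged sample; it is not Sketch Engine's query engine, and the representation of a query as a list of dictionaries, as well as the names `token_matches` and `cql_like_search`, are assumptions made for the example.

```python
import re
from typing import Dict, List

Token = Dict[str, str]
# One dict per [...] in the query; several keys in one dict behave like CQL "&".
Condition = Dict[str, str]


def token_matches(token: Token, condition: Condition) -> bool:
    """Every attribute regex must match the whole attribute value."""
    return all(re.fullmatch(rx, token[attr]) for attr, rx in condition.items())


def cql_like_search(tokens: List[Token], pattern: List[Condition]) -> List[str]:
    """Return every run of consecutive tokens that satisfies the pattern."""
    n = len(pattern)
    hits = []
    for i in range(len(tokens) - n + 1):
        if all(token_matches(tokens[i + j], pattern[j]) for j in range(n)):
            hits.append(" ".join(t["word"] for t in tokens[i:i + n]))
    return hits


# Hypothetical CLAWS5-tagged sample text.
TEXT = [("a", "AT0"), ("very", "AV0"), ("good", "AJ0"), ("idea", "NN1"),
        ("and", "CJC"), ("the", "AT0"), ("best", "AJS"), ("results", "NN2"),
        ("she", "PNP"), ("walked", "VVD")]
TOKENS = [{"word": w, "tag": t} for w, t in TEXT]

print(cql_like_search(TOKENS, [{"tag": "AV0"}, {"tag": "AJ0"}]))   # ['very good']
print(cql_like_search(TOKENS, [{"tag": "AJ0"}, {"tag": "NN.*"}]))  # ['good idea']
print(cql_like_search(TOKENS, [{"tag": "AJS"}, {"tag": "NN.*"}]))  # ['best results']
print(cql_like_search(TOKENS, [{"tag": "V.*", "word": ".*ed"}]))   # ['walked']
```

The four calls correspond to the adverb + adjective, general adjective + noun, superlative + common noun and -ed verb queries in the list above.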

Sketch Engine: British National Corpus
Exercise: identify all instances of phrases starting with an adverb followed by an adjective in the spoken and written parts of the BNC.
SPOKEN

WRITTEN

Sketch Engine: CQL for BASE and BAWE (CLAWS7)

Explanation

A. All general adjectives
B. All adjectives (J.*) preceding the word woman
C. All phrases starting with a general adjective followed by any noun
D. All words starting with confus
E. Instances of phrases starting with an adjective in the superlative form and followed by any word
F. All phrases starting with any adverb (R.*) followed by an adjective
G. All instances of the word cat
H. Instances of phrases starting with an adjective in the superlative form and followed by any noun
I. All verbs (V.*) ending in -ed (.*ed)

CQL (CLAWS7)

1. [word="cat"]
2. [word="confus.*"]
3. [tag="JJ"]
4. [tag="JJ"] [tag="N.*"]
5. [tag="J.*"] [word="woman"]
6. [tag="R.*"] [tag="JJ"]
7. [tag="V.*" & word=".*ed"]
8. [tag="JJT"] [word=".*"]
9. [tag="JJT"] [tag="N.*"]

Sketch Engine: CQL
To list the most frequent phrases starting with an adverb followed by an adjective, click on NODE FORM.
BAWE (written academic)
BASE (spoken academic)
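A NODE FORM list counts how often each matched phrase occurs. Because BAWE and BASE are different sizes, raw counts from the two corpora are easier to compare once normalised to a frequency per million words. A minimal sketch, assuming you already have the matched phrases as strings; the hit lists and corpus sizes you pass in are your own, and the function name is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def node_form_frequencies(hits: List[str], corpus_size: int,
                          top: int = 10) -> List[Tuple[str, int, float]]:
    """Rank matched phrases by raw frequency and add a per-million figure."""
    counts = Counter(hits)
    return [(form, count, count * 1_000_000 / corpus_size)
            for form, count in counts.most_common(top)]


# Placeholder hits and size, for illustration only.
print(node_form_frequencies(["very important", "very important", "really good"], 1000))
# [('very important', 2, 2000.0), ('really good', 1, 1000.0)]
```

Run once per corpus with its own size, the per-million column allows a like-for-like comparison of the written (BAWE) and spoken (BASE) lists, while the raw count column still shows how much evidence each figure rests on.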
