Corpus Linguistics: from lexis to discourse Ramesh Krishnamurthy http://acorn.aston.ac.uk/140605CV-RAMESH%20KRISHNAMURTHY.html
Aston University, Birmingham, UK http://www1.aston.ac.uk/lss/staff-directory/krishnamurthyr
Universität Hildesheim November 14th 2013 This universe has been waiting c 13.8bn years = c 435,485,220bn seconds for this moment; there have been 80646.513* bn periods of 90 mins so far; welcome to period 80646.513*4!
2. Overview Some of the ways in which Corpus Linguistics is helping us to: • Describe the English language • reveal the importance of Collocation • develop Language Learning and Teaching
3. Who is Ramesh? (Google me! :)) • A learner and student of LANGUAGES (Tamil, English, Hindi, Latin, French, German, Sanskrit); Cambridge, London (SOAS) • An analyst and describer of ENGLISH (using a corpus to write/edit dictionaries, grammar/usage books, etc); Birmingham/COBUILD • A CORPUS linguist and researcher (literature, sociology, business, politics, translation); Birmingham, Wolverhampton, EU projects • A stand-up performer/storyteller/comedian/teacher? (language learning, academic writing, discourse analysis); Aston
4. Corpus Linguistics: background (1) • Traditional Linguistics: LANGUAGE = SYSTEM cf Physical Sciences, Maths, Engineering • Linguistics = Phonology, Morphology, Syntax • Evidence = Intuition and Introspection • Focus on: the ideal; the possible – Constituents – Structure (Grammar) – Sentence as Unit
• MEANING? (Semantics = Philosophy)
• WORDS? (Dictionaries = Lexicography)
• HUMAN BEINGS?
• ACTUAL USAGE?
5. Corpus Linguistics: background (2) London University, 1930s: Anthropology >< Linguistics B. MALINOWSKI (Anthropology): Functionalism: (Language) is a mode of action; Context of Situation
J.R. FIRTH (Linguistics): Context of Situation: Participants; Actions (Verbal & Non-Verbal); (physical) Objects; Effects of the Verbal Actions; Modes of Meaning: language event as a whole; levels of analysis... collocation (some words co-occur more often than others)
LANGUAGE = USE: human; activity; in a context; has functions; produces meanings M.A.K. HALLIDAY: Lexico-grammar: lexis = “most delicate grammar”; Lexis as a Linguistic Level; Malinowski’s context of situation = the “Environment of the Text”; Systemic Functional Linguistics: “language serves functions”
6. Corpus Linguistics: background (3) USE = performance; empirical evidence = CORPUS; all levels of language (morphology, words, chunks, text, discourse) J. M. SINCLAIR: Beginning the Study of Lexis (1966); English Lexical Studies (1970) – world’s first computerised corpus of speech (135,000 words), focus on collocation; J.M. Sinclair and M. Coulthard (1975) Towards an Analysis of Discourse; COBUILD (1980>): world’s first corpus-based dictionary, grammar, etc; semantic prosody; Corpus, Concordance, Collocation (1991): Idiom principle and open-choice principle; The Lexical Item (1998): item and environment; verbal environment, textual environment; Trust the Text: Language, Corpus and Discourse (2004)
7. Why use corpora at all?
1. Even “expert speakers” have only a partial knowledge of a language. A corpus can be more comprehensive and balanced.
2. Even expert speakers tend to notice the unusual and think of what is possible. A corpus can show us what is common and typical.
3. Even expert speakers cannot quantify their knowledge of language. A corpus can give us accurate statistics.
4. Even expert speakers cannot remember everything they know. A corpus can store and recall all the information that has been input.
5. Even expert speakers generally find it difficult to make up natural examples. A corpus can provide us with a vast number of real examples.
8. Why use corpora at all? (cont.)
6. Even expert speakers have prejudices and preferences (and every language has its cultural connotations and underlying ideology). A corpus can give you more objective evidence.
7. Even expert speakers are not always available for consultation. A corpus can be permanently accessible.
8. Even expert speakers cannot keep up with language change. A frequently updated corpus can reflect even recent changes in language.
9. Even expert speakers lack authority – they can be challenged by other expert speakers. A corpus can include the actual language use of many expert speakers.
etc etc
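9. Why not use the Web and Search Engines?
• SIZE: Web and Search Engines = Vast; Corpora = Manageable
• PROCESSING SPEED: Web and Search Engines = Slow; Corpora = Fast
• ANALYSES: Web and Search Engines = Coarse-grained, General; Corpora = Fine-grained, Detailed, Specific
• CONTENT RANGE: Web and Search Engines = No Overview; Diffuse, Uncategorized; Corpora = Selected, Documented, Categorized
• CONTENT STABILITY: Web and Search Engines = Volatile/Dynamic: cannot replicate analyses; Corpora = Stable: can replicate analyses
• CONTENT QUALITY: Web and Search Engines = Uncontrolled; Corpora = Controlled by selection
• SOFTWARE: Web and Search Engines = complex, ‘black box’; Corpora = simple, fully documented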
10. Developments in corpus linguistics • Improvements in computer hardware: eg Sinclair (1965, Edinburgh) used Atlas computer at Manchester Uni; COBUILD (1984-1997) used PDP-11, ICL Mainframe, IBM workstations, Sun workstations, PCs; laptops... >Tablet? Cell phone?
• Improvements in computer software: eg word frequency lists, concordances, collocation lists and profiles, n-grams, keywords, POS-tagging, parsing, word sketches, semantics, thesaurus, sentiment, risk, emotions; audio, video; WordClouds and other graphic/visual/interactive displays; AntConc; WordSmith Tools; SketchEngine
• Increase in corpus sizes: Sinclair (1965; 135,000); Brown Uni (1967; 1m);
Birmingham (1986: 18m); BNC (1993: 100m); Bank of English (2002: 512m); GloWbE (2013: 1.9bn)
• Increase in corpus access:
– http://www.uow.edu.au/~dlee/CBLLinks.htm
– http://www.hit.uib.no/corpora/sites.html
– http://www.michaelbarlow.com/
– http://corpus.byu.edu/
– https://www.sketchengine.co.uk/documentation/wiki/Website/LanguageResourcesAndTools
• Increase in coverage of languages & text types: eg English > European, Asian, African languages; eg written, spoken, historical, literature, media, business, language learners, translation, politics, advertisements, emails, blogs, social networks; topic-based corpora (climate change, immigration, feminism, etc)
11. Corpus Tool 1: WORD FREQUENCY (= importance) In Descending Frequency order: Most frequent words
WORD    BCET (1986: 18 Million words)    Bank of English (2002: 418 M-words)
the     1,081,654                        22,849,031
of      535,391                          10,551,630
and     511,333                          9,787,093
to      479,191                          10,429,009
a       419,798                          9,279,905
in      334,183                          7,518,069
that    215,332                          4,175,495
s       (8,570)                          4,072,762
is      (166,691)                        3,900,784
it      198,578                          3,771,509
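A frequency list like the one above can be produced in a few lines. The following is only a minimal sketch: the tokenizer and the sample sentence are assumptions made for illustration, not the procedure used for BCET or the Bank of English. Treating the apostrophe as a separator is what makes "s" show up as a word in its own right, as it does in the table.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens.

    Apostrophes act as separators, so "cat's" yields "cat" and "s".
    """
    return re.findall(r"[a-z]+", text.lower())


def frequency_list(text, top=10):
    """Return the `top` most frequent word types with their counts."""
    return Counter(tokenize(text)).most_common(top)


# Hypothetical sample text, for illustration only.
sample = "The cat sat on the mat. The dog sat on the cat's mat."
for word, count in frequency_list(sample, top=5):
    print(f"{word}\t{count}")
```

On a real corpus the same function would be applied to the concatenated text files; only the size of the input changes.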
12. FREQUENCY: grouping items
RANK  FREQ  TYPE
1     577   the
2     319   and
3     270   a
4     251   to
5     204   she
6     197   of
7     178   I
8     168   Alice
9     167   it
10    166   was
11    156   in
12    143   said
13    107   her
14    101   you

RANK  FREQ  TYPE
23    58    little
68    22    large
125   14    high
146   12    low
250   7     small
429   4     top
402   4     middle
SUB-TOTAL   121

RANK  FREQ  TYPE
26    50    down
30    41    up
173   10    size
191   9     side
203   8     grow
204   8     growing
TOTAL       257
13. FREQUENCY: WordCloud: Politics: BBC website (2008)
14. FREQUENCY: WordCloud > World Population http://en.wikipedia.org/wiki/File:Word_population_tagcloud_2011.png
15. Corpus Tool 1: WORD FREQUENCY (= importance) In Alphabetical order: HOW COMMON OR RARE IS THE ITEM? ARE YOU A ... TEACHER? ... STUDENT? ... LINGUIST?
receivability    5
receivable       147
receivables      182
receival         2
receivals        3
receive          29,942
received         48,309
receiver         3,129
receivers        1,311
receivership     737
receiverships    28
receives         4,705
receiveth        2
receiving        11,340
WHICH ITEMS SHOULD YOU ... TEACH? ... LEARN? ... ANALYSE?
HOW MUCH TIME/EFFORT SHOULD YOU SPEND ON EACH ITEM?
16. Corpus Tool 2: CONCORDANCE (= cotext)
Word class:
however, have trimmed its interest receivable figure by £22
couldn't tell whether accounts receivable had been paid
offset by an increase in accounts receivable. Later, when
Lexical relations:
would routinely tell her they had receivables of thousands
dollars of credit-card receivables it has so fa
The value of debtors (or receivables) has to be p
foreign-currency payables against receivables. This type
collocation:
a glance at Andrei. Replacing the receiver, she spoke hurr
of the dining room, picked up the receiver. `Hello," she s
want it done now!" He replaced the receiver then walked Kol
Lexical domains:
the path of incoming flights. A receiver on an aircraft
room's conversations to a nearby receiver -- even an FM
and causing `static" in on-board receivers and computers.
waveband with other transmitters, receivers and communica
phraseology:
we had anticipated to appoint a receiver who may well
to creditors. The Official Receiver - part of the
Pensioners are keen to bring in the receivers to protect wha
A syndicate of banks called in the receivers over debts tot
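Concordance lines of this kind (keyword in context, with a fixed amount of cotext on either side) can be generated with a short script. This is a sketch under stated assumptions: the function name `kwic`, the character-based window and the sample text are all illustrative, and real concordancers such as AntConc or WordSmith Tools offer sorting and filtering that are not attempted here.

```python
import re


def kwic(text, node, width=30):
    """Return keyword-in-context lines for every word form starting with `node`.

    `width` is the number of characters of cotext kept on each side.
    """
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(node) + r"\w*", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-justify the left cotext so that the node words line up.
        lines.append(f"{left:>{width}} {match.group(0)} {right}")
    return lines


# Hypothetical sample text, for illustration only.
sample = ("She picked up the receiver. He replaced the receiver then "
          "walked away. The banks called in the receivers over debts.")
for line in kwic(sample, "receiver", width=25):
    print(line)
```

Because the pattern matches any form beginning with the node, one query returns receiver and receivers together, which is what lets the reader compare their cotexts.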
17. Corpus Tool 3: COLLOCATION LIST (> ‘chunks’) e.g. hard
A. From frequency... (= co-occurrence)
COLLOCATE  FREQUENCY
to         58444
the        39225
it         37451
and        27875
a          24531
is         18314
s          16947
of         14980
for        11462
work       11378
Too many ‘corpus-frequent’ words
B. To statistics... [e.g. t-score] (= ‘significant’ collocation)
COLLOCATE  FREQUENCY  T-SCORE
it         37451      146.778704
to         58444      136.672099
work       11378      99.794844
very       7297       73.848184
is         18314      64.938617
find       4915       64.513063
working    4347       61.992590
worked     4039       61.399611
s          16947      55.243460
so         5902       49.255196
Calculates ‘actual vs expected’ frequency: ... eliminates many ‘corpus-frequent’ words ... retains genuinely significant ‘corpus-frequent’ words ... promotes other ‘more than expected’ words ...
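The ‘actual vs expected’ comparison can be sketched with one common formulation of the t-score, t = (O - E) / sqrt(O), where O is the observed co-occurrence frequency and E the frequency expected by chance. The slide does not state which formula or span was used, so the span-adjusted expected value and every number below are assumptions chosen only to show the effect.

```python
import math


def t_score(observed, node_freq, collocate_freq, corpus_size, span=8):
    """t-score for a node/collocate pair (one common formulation).

    observed: co-occurrences of the collocate within the span around the node
    node_freq, collocate_freq: corpus frequencies of the two words
    span: total window size in tokens (assumed here: 4 left + 4 right)
    """
    expected = node_freq * collocate_freq * span / corpus_size
    return (observed - expected) / math.sqrt(observed)


# Hypothetical counts in a 1-million-word corpus with a node occurring 1,000 times.
# A very frequent word co-occurs often, but barely more than chance predicts:
print(round(t_score(500, 1000, 60_000, 1_000_000), 2))
# A less frequent word co-occurs far more often than chance predicts:
print(round(t_score(100, 1000, 800, 1_000_000), 2))
```

With these invented figures the frequent word has the higher raw co-occurrence count but the much lower score, which is the demotion of ‘corpus-frequent’ words described above.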
18. Corpus Tool 4: COLLOCATION PROFILE e.g. hard (collocates at each position, most significant first)
Position -3: it have but will i has had would he hit they find going
Position -2: it find is worked s work found was will would been be have
Position -1: is very s a so be it too work worked was working how
NODE: hard
Position +1: to work for working time line on pressed core enough currency times hitting
Position +2: believe get imagine find see say beat tell keep understand come make work
Position +3: how that but what why it heels to them follow up because whether
WITHIN A FEW MINUTES, CORPUS USER WILL NOTICE: LEXICO-GRAMMAR: (impersonal) it, hard+to-infinitive, hard+for/on (sb?); PHRASEOLOGY: hard + time(s), line, core, pressed, currency, hitting; IDIOMATICITY: hard (on the?) heels (of?); PRAGMATICS: advice, excuses, warnings; MODALITY: will/would, degree (very, so, too, how) SEMANTICS: noun objects (work/time/currency); verb sets (perception: believe /imagine /see /understand; practical: get, find, say, beat, tell, keep, come, make)
19. Corpus Tool 5: N-grams = fixed collocation (“Alice in Wonderland”)
1-grams (rank, frequency, item):
1   1527  the
2   802   and
3   725   to
4   615   a
5   545   I
6   527   it
7   509   she
8   500   of
9   456   said
10  395   Alice
11  356   in
12  352   was
13  345   you
14  275   that
15  246   as
2-grams (rank, frequency, item):
1   206   said the
2   127   of the
3   111   said Alice
4   92    in a
5   74    and the
3-grams (rank, frequency, item):
1   47    the Mock Turtle
2   30    I don t
3   28    said the King
4   27    the March Hare
5   21    said the Hatter
4-grams (rank, frequency, item):
1   18    said the Mock Turtle
2   10    she said to herself
3   10    you won t you
4   7     a minute or two
5   7     said Alice in a
9-grams (rank, frequency, item):
1   4     Will you won t you will you won t
2   4     you won t you will you won t you
16-grams (rank, frequency, item):
1   2     Will you won t you will you won t you won t you join the dance
2   1     a body to cut it off from that he had never had to do such a
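Counting n-grams is a small extension of counting single words: slide a window of n tokens across the text and tally each sequence. The sketch below assumes a simple tokenizer that splits on apostrophes (hence "won t" rather than "won't", as in the lists above); the sample line is paraphrased for illustration and is not the corpus text.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(text, n, top=5):
    """Return the `top` most frequent n-grams in the text with their counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(ngrams(tokens, n)).most_common(top)


# Hypothetical sample, for illustration only.
sample = ("Will you, won't you, will you, won't you, "
          "will you join the dance?")
for n in (2, 4):
    print(n, top_ngrams(sample, n, top=3))
```

Longer n-grams become rare very quickly, so a 9-gram or 16-gram that recurs at all is usually a refrain or a quotation, as in the Alice examples.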
20. Corpus Tool 5: N-grams
UK Conservative Party Manifestos: From 1900 to 2001
UK Labour Party Manifestos: From 1900 to 2001
UK Liberal Party Manifestos: From 1945 to 1992
4-GRAM                        RANK  PARTY
the cost of living            6     Z
the cost of living            8     Y
the cost of living            22    X
the rule of law               14    X
the rule of law               37    Z
the rule of law               38    Y
the long term unemployed      16    Z
the long term unemployed      80    Y
the long term unemployed      329   X
the National Health Service   17    Y
the National Health Service   23    X
the National Health Service   123   Z
the men and women             39    Y
men and women of              40    Y
men and women who             79    Z
for men and women             83    Z
men and women of              246   X
for men and women             261   X
21. Corpus Tool 5: N-grams: Michaela Mahlberg (Clusters in DICKENS, 2007)
22. Google Books: N-gram Viewer • Copy-and-paste this URL into a browser: • https://books.google.com/ngrams
23. Corpus Tool 6: Keywords • = the WORDS that have the most significant frequency difference between 2 corpora • NB there is currently much discussion about the best statistical procedures to use in order to achieve this; in particular, current methods do not take account of ‘effect size’
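One way to make the significance versus effect-size point concrete is to compute both for the same word. The sketch below uses log-likelihood as the significance statistic and a log ratio of relative frequencies as a simple effect-size measure. Both are widely used in keyword analysis, but the slide does not say which statistic was applied to the data that follows, so the choice here is an assumption, as are the function names, the smoothing constant and all counts.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood keyness of a word in corpus A compared with corpus B."""
    total = size_a + size_b
    expected_a = size_a * (freq_a + freq_b) / total
    expected_b = size_b * (freq_a + freq_b) / total
    ll = 0.0
    if freq_a > 0:
        ll += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        ll += freq_b * math.log(freq_b / expected_b)
    return 2 * ll


def log_ratio(freq_a, size_a, freq_b, size_b, smoothing=0.5):
    """Binary log of the ratio of relative frequencies (an effect-size measure).

    A zero frequency is replaced by `smoothing` (an assumed constant) so that
    the ratio stays defined.
    """
    a = freq_a if freq_a > 0 else smoothing
    b = freq_b if freq_b > 0 else smoothing
    return math.log2((a / size_a) / (b / size_b))


# Hypothetical counts: the same 4:1 ratio at two different corpus sizes.
print(log_likelihood(40, 1_000, 10, 1_000), log_ratio(40, 1_000, 10, 1_000))
print(log_likelihood(4_000, 100_000, 1_000, 100_000),
      log_ratio(4_000, 100_000, 1_000, 100_000))
```

In this invented example the log ratio is the same in both cases while the log-likelihood grows with the amount of data, which illustrates why a significance score alone says little about the size of a difference.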
24. Corpus Tool 6: Keywords: UK ‘riots’ 2011 (Aug 7>) riot in UK newspaper articles: BEFORE (Aug 1-5: 346); DURING (Aug 10: 500); AFTER (Aug 27: 249) BEFORE (compared to DURING)
DURING (compared to BEFORE)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
 FESTIVAL PLAY MUBARAK THEATRE SPRAOI HIS YEARS TRIAL WATERFORD DIRECTOR COMMISSIONED CONTEMPORARY A AWARD
LONDON POLICE RIOTERS WEDNESDAY BIRMINGHAM LOOTERS LOOTING DAILY RIOTS SHOPS YOUTHS NIGHT SHOP STREETS TELEGRAPH
25. Corpus Tool 6: Keywords: UK ‘riots’ 2011 (Aug 7>) DURING (compared to BEFORE) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
LONDON POLICE RIOTERS WEDNESDAY BIRMINGHAM LOOTERS LOOTING DAILY RIOTS SHOPS YOUTHS NIGHT SHOP STREETS TELEGRAPH
DURING (compared to AFTER) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
POLICE WEDNESDAY # NIGHT LONDON DAILY SAID SHOPS VIOLENCE Â TELEGRAPH WERE VIDEO REPORT FIRE
26. Corpus Tool 6: Keywords: UK ‘riots’ 2011 (Aug 7>) DURING (compared to AFTER) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
POLICE WEDNESDAY # NIGHT LONDON DAILY SAID SHOPS VIOLENCE Â TELEGRAPH WERE VIDEO REPORT FIRE
AFTER (compared to DURING) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
I SATURDAY YOU CARNIVAL A MY IS YEARS THAT HOW HOMELESS PLYMOUTH Ñ OR PRISON
27. COBUILD: INITIAL LEXICAL DISCOVERIES • the most frequent content words: e.g. time, people [Cobuild, 1987] • frequent verbs are mostly used ‘delexically’: e.g. take a bath/an exam, make a decision/mistake, have a rest/an idea [Cobuild, 1987] • see mostly means ‘understand’ (esp. in spoken): e.g. I see / You see • thing, this, that mostly refer to abstract items (event, reason, etc), NOT physical objects: e.g. A strange thing happened… Is that why you had a few days off?... This is why I'm opposed to the plan [Willis, 1990] • -ly adverbs are NOT used in all meanings of the adjective: e.g. lamely for excuses (not walking), crisply with speech verbs [Cobuild; cf ~ in other dictionaries] • of is NOT a preposition: not used in adverbial phrases, but in noun phrases: e.g. slice of bread, waste of time [Sinclair, 1991] • identifying new words: e.g. 1985 corpus = 0 > 1995 corpus = camcorder (1214), virtual reality (458), imaging (463), mobile phone (455), satellite dish (236), laptop (184), videophone (144), palmcorder, smart card, microsurgery, teleworker, email, helipad, hypertext [Cobuild 1995]
28. Effect of corpus on dictionary • INCLUSION/EXCLUSION OF HEADWORDS • TRADITIONAL DICTIONARY ENTRY: en·crust (ĕn-krŭst') also in·crust (ĭn-) tr.v., -crust·ed, crust·ing, -crusts. 1. To cover or coat with or as if with a crust 2. To decorate by inlaying or overlaying with a contrasting material • CORPUS EVIDENCE: encrust 9 encrusts 1 encrusting 13 encrusted 645 • CORPUS-BASED DICTIONARY: • Adjective headword – Stative (not dynamic) – Gradable
29. Collocation • Collocation is the ‘lexical realisation of the situational context’ Sinclair (1987) • often, ‘no particular group of collocates occurs in a structured relationship with the word’ (ibid) [i.e. NOT a grammatical feature] • For many years, I could not understand the meaning of Firth’s statement: ‘one of the meanings of night is its collocability with dark’... • I suddenly realised what I needed to do: check the statement using corpus techniques!
30. dark night
31. Collocation disambiguates near-synonyms
32. Collocation disambiguates parts of speech:
file as VERB
COLLOCATE    FREQUENCY AS COLLOCATE  T-SCORE SIGNIFICANCE
to           669                     18.45
for          187                     8.84
bankruptcy   52                      7.19
complaint    45                      6.69
against      54                      6.64
suit         41                      6.34
charges      39                      6.14
will         66                      5.60
returns      30                      5.43
lawsuit      28                      5.28
a            249                     5.27
file as NOUN
COLLOCATE    FREQUENCY AS COLLOCATE  T-SCORE SIGNIFICANCE
rank         442                     21.00
fact         178                     12.72
tape         150                     12.14
on           398                     11.24
and          934                     11.08
single       96                      9.30
a            801                     9.06
feature      75                      8.54
from         227                     7.68
the          1739                    7.24
archetype    51                      7.14
33. Collocates: Transitivity and Semantic Prosody: build up (2 billion words: http://corpus2.byu.edu/glowbe/ )
1-4 words to the LEFT (VERBS and ADVERBS; NOUNS):
1 HELP, 2 TRYING, 3 GRADUALLY, 4 SLOWLY, 5 TAKES, 6 SLOW, 7 HELPED, 8 PRESSURE, 9 ALLOWED, 10 HELPS, 11 PREVENT, 12 HEAT, 13 HELPING, 14 FLUID, 15 TOXINS, 16 BACTERIA, 17 PLAQUE, 18 ACID, 19 STRESS, 20 ICE, 21 CARBON, 22 DEPOSITS, 23 DUST, 24 CELLS, 25 TENSION, 26 EXERCISE
1-4 words to the RIGHT (NOUNS): [animate] + builds up + Y: POSITIVE
1 PICTURE, 2 CONFIDENCE, 3 STRENGTH, 4 BODY, 5 RESERVES, 6 RELATIONSHIP, 7 PORTFOLIO, 8 SKILLS, 9 BASE, 10 MUSCLES, 11 TRUST, 12 NETWORK, 13 CAPACITY
34. Semantic Prosody • “seemingly neutral words can be perceived with positive or negative associations through frequent occurrences with particular collocations” • J.M. Sinclair (1987) Looking Up – e.g. rot, rain, fog, night, decline, depression, etc + set in
• “consistent aura of meaning with which a form is imbued by its collocates” (Louw, 1993: 157) – e.g. bent on, utterly, symptomatic of, without feeling
• Louw, W. E. (1993) Irony in the text or insincerity in the writer? The diagnostic potential of semantic prosodies. In: Text and technology. In honour of John Sinclair. Ed. M. Baker, G. Francis and E. Tognini-Bonelli, 152—176. Amsterdam: John Benjamins
35. Semantic Prosody of bent on: (from Bill Louw’s 1993 article; new data from http://corpus2.byu.edu/glowbe/ )
Right-hand collocates – and examples of the “seemingly neutral words”:
• DESTROYING
• DESTRUCTION
• MAKING: the few ruling class are bent on making life difficult for the poor
• REVENGE
• GETTING: If you are just bent on getting rid of her and do not care about
• TAKING: SCP-079, an Artificial Intelligence bent on taking over the facility.
• DOING: bent on doing everything possible to introduce a silly culture of
• KILLING
• DOMINATION
• KEEPING: scientists hell bent on keeping their funding gravy train rolling
• CREATING: he is bent on creating a new world where he is supreme ruler
• BRINGING: bent on bringing down Zubiri from the crest of moral ascendancy
• CAUSING: + trouble, mayhem, hysteria, havoc, chaos, confusion, destruction
36. Collocations can be culture-specific • spider & bath: in BNC (British) – not in other European languages? (Hanks 2013: c134-9) • GloWbE: 21: US 1, NZ 1, Singapore 1, GB 18
37. From collocation to prefabricated chunks (the “idiom principle”): naked eye • Sinclair JM. 1996. The Search for Units of Meaning. Textus IX: 75-106
45. new broom: fixed expression reduced to chunk COBUILD Dictionary (1987): broom... 1 A broom is a kind of brush with a long handle. You use a broom for sweeping the floor. 2 Broom is a wild bush with a lot of thorns and tiny yellow flowers which grows on waste ground or sandy ground.
new broom… A person who has just started a new job and who intends to make a lot of changes may be referred to as a new broom; an informal expression.
46. new broom: the development 1. There are examples of new broom meaning a person: ..But he is hardly a new broom... ...Adrian Shinwell, president and new broom at the Scottish Conservative and Unionist Association... 2. One or two retain sweep clean from the proverb: ...The new broom is determined to sweep clean. But she never expected... ...They're still feeling their way, Alec, and new brooms always want to sweep cleaner than the old... 3. But many examples show that new broom can now also refer to the policies or changes carried out by the new appointee: ...Heseltine's new broom has yet to show its bristles. ...lending enthusiastic strength to the hand that pushes the new broom. 4. sweep can be re-introduced as a pun: ...The new broom that swept in with the new season is held by Tony Russ... ...his new broom has already swept into some dark corners of the Asda empire. 5. new can be separated from broom: ...Since taking helm back in March, the new chief has wielded the broom through the business and its debts... 6. new can be omitted altogether and replaced by the person making the changes: ...as the Kinnock broom swept clean in Walworth Road... 7. The nature of the changes can be specified by replacing new: ...Bishko' s rationalisation broom has swept through the 253-strong operation... 8. new broom can be used like an adjective, in front of another noun referring to a person or a policy: ...the company's new broom chief executive... ...as a result of the `new broom' policy of Ian Rock, the managing director...
47. How do we learn L1? How do we teach L2? • We learn our L1 by experiencing thousands of EXAMPLES IN THE LANGUAGE (i.e. INDUCTIVELY), in a wide range of places and situations, from a variety of sources, over a long period of time – focus on PROCESS: L1 language learning, riding a bicycle, driving a car, parenting – Aristotle (350BC/1908, Nicomachean Ethics II,1, Oxford; “It [moral virtue], like the arts, is acquired by repetition of the corresponding acts... the things we have to learn before we can do them, we learn by doing them”
• We teach an L2 using abstract INFORMATION ABOUT THE LANGUAGE - grammar rules, dictionary definitions (i.e. DEDUCTIVELY) - with very few examples, from few sources, mostly in a classroom, over a much shorter time period – focus on PRODUCT: L2 language learning – Willis (2003:1) “what is taught is often not learnt, and learners often ‘learn’ things which have not been taught at all”
48. Why do we learn an L2?
• Changes in goals of language learning:
– knowledge
– religion, trade, politics
– literature, culture, travelling abroad
– study abroad, migration, work (home or abroad), tourism, business (multinational companies), globalised products (planes, electronics, etc), translation, interpreting
• Other changes:
– population increase, population mobility - geographical & social
– economics/globalisation/recession/market forces
– pressures on education systems
– heterogeneous students - age, background, L1, etc
– teaching methods and resources - no longer adequate, sufficient, appropriate
– rise of English as global language
– changes in language relationships, endangered languages
49. Changes in methods and resources • focus: from classroom to real world • materials: from made-up to authentic texts • from passive to active learning • from teacher to student focus • from chalk & paper to computers • Corpora are well-suited to these shifts
50. Advantages of corpora
• exposure to varied authentic materials
• intensive/condensed linguistic experience
• always up-to-date
• constantly raising language awareness
• encouraging learner autonomy and research
• focus on common and typical but shows variations
• lifelong learning
• transferable skills
• flexibility – accessible anywhere, any time
51. LANGUAGE TEACHING: deductive teaching
• http://www.bbc.co.uk/worldservice/learningenglish/grammar/learnit/learnitv230.shtml
• Yee from Hong Kong writes: I'm not sure how the suffix -ness works. Can we add -ness to all types of words to make nouns?
• Roger Woodham replies: -ness is one of a number of noun suffixes. It is used to make nouns from adjectives, although not every adjective can be modified in this way. Here are some common adjectives whose noun forms are made by adding -ness: happy, sad, weak, good, ready, tidy, forgetful. Note the spelling change to adjectives that end in -y:
– Everybody deserves happiness in their life. To be happy is a basic human right.
– There was a lot of sadness in the office when people learned of his illness.
– His readiness to have a personal word with everybody at the funeral was much appreciated.
– He is such a forgetful person. Such forgetfulness cannot be excused.
– If you want to work for such an organisation, you are expected to maintain a high standard of tidiness in your appearance.
52. FREQUENCY: facilitating inductive learning
• http://www.morewords.com/most-common-ends-with/ness/
• from the Enable2k North American word list, which uses “Word Frequencies in Written and Spoken English: based on the British National Corpus” (Lancaster University)
• The most frequently occurring words ending with –ness
• Frequency lists can be used for INDUCTIVE LEARNING:
business        752
witness         80
awareness       72
illness         71
darkness        68
consciousness   52
weakness        44
happiness       34
fitness         32
sickness        24
effectiveness   21
madness         16
sadness         16
goodness        15
thickness       15
bitterness      14
harness         14
kindness        14
wilderness      14
loneliness      12
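A teacher can build a similar list from any collection of texts. The sketch below is only an illustration: the function name and the sample text are assumptions, and it has nothing to do with how the morewords.com list was compiled. It simply ranks every word type that ends in a given suffix.

```python
import re
from collections import Counter


def suffix_frequency(text, suffix, top=10):
    """Rank the word types ending in `suffix` by frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Require at least one letter before the suffix.
    matches = (t for t in tokens if t.endswith(suffix) and len(t) > len(suffix))
    return Counter(matches).most_common(top)


# Hypothetical sample text, for illustration only.
sample = ("Business is business. Her illness caused sadness, "
          "but kindness brought happiness and more kindness.")
for word, count in suffix_frequency(sample, "ness"):
    print(f"{word}\t{count}")
```

A purely form-based search also returns items such as business, witness, harness and wilderness, where -ness is not the adjective-to-noun suffix; noticing that difference is itself a useful inductive task for learners.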
53. CONCORDANCE (1): “gapped concordance” exercise
to make a sturdy coffee ------------------------ with plenty of room for
others who sit round her ------------------------ include the business men
MACPHERSON : Now on the ------------------------ here we've got what looks
a loveseat, a coffee ------------------------ two large cardboard
suitcases lay open on a ------------------------ The silver had already
1818? [p] Answer: From ------------------------ C, we find the year 1816
I lowered my tray ------------------------ from the back of the seat
end of the dark shiny ------------------------ and toasted each other
in the size of a snooker ------------------------ Another room upstairs is
54. more about CONCORDANCE (2): “gapped concordance” exercise
...- so it's generally best to feed a small …………………… twice a day. Similarly, growing an
...fascinate me. You can tell how a …………………… is feeling by his tail. If his tail is up,
...of days later I'm watching TV and this …………………… food commercial comes on an it's the
...Year. For what remains of the Year of the ……………………, the Tiger should remain committed 7
...he was with a woman walking a black ……………………. On neither occasion did the man speak to
...I'm prepared if a policeman sees me with my …………………… off a lead in a public place
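Gapped concordance exercises like the two above can be generated automatically from any text. The following is a minimal sketch: the function name, the window width, the gap string and the sample text are all assumptions made for illustration. Any further occurrence of the target word inside the cotext is masked too, so that the answer does not leak.

```python
import re


def gapped_concordance(text, target, width=35, gap="-" * 12):
    """Return concordance lines for `target` with the word itself blanked out."""
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Mask any other occurrence of the target in the surrounding cotext.
        left = pattern.sub(gap, left).strip()
        right = pattern.sub(gap, right).strip()
        lines.append(f"{left} {gap} {right}")
    return lines


# Hypothetical sample text, for illustration only.
sample = ("You can tell how a cat is feeling by his tail. "
          "Later this cat food commercial comes on.")
for line in gapped_concordance(sample, "cat"):
    print(line)
```

Learners then guess the missing word from the cotext alone, which is the inductive step the exercise is designed to practise.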
55. ERROR ANALYSIS
• Teachers analyse students’ errors (Fei-Yu Chuang 2005)
• a corpus of 50 essays written by Chinese EAP (English for Academic Purposes) foundation students
• 5232 errors identified. Most frequent errors were:
(1) Missing definite article: 10.1%
(2) Bare singular count noun for plural: 8.8%
(3) Redundant definite article: 8.5%
(4) Mis-selection of preposition: 6.1%
(5) Lexical misconception: 5.8%
(6) Wrong tense and aspect: 3.8%
(7) S-V non-agreement: 2.4%
(8) Wrong collocation: 2.1%
(9) Missing ‘a’/‘an’: 2.0%
(10) Comma splice: 2.0%
56. TEACHING • Language Learning & All Subjects • From National Education Planning to Individual Institutions and Teachers • Syllabus/Course design: lists of words/topics required at different levels; domain-specific definitions; common errors; Lexical Syllabus • Coursebooks: design, examples • Software development; spellcheckers; grammar checkers; • Classroom use: produce teaching materials; students improve academic writing • Examination Boards: design and marking • Detecting plagiarism: software compares student texts, academic articles, internet sources
57. How can corpora help? THE MODEL
• Input Syllabus
– CORPUS 1: Expert/Input Material (= ‘target’)
• Drafting academic writing
• Feedback
• Editing and revision
• Assessment
– CORPUS 2: Student/Output Material
– Contrastive Corpus Analyses of Input and Output
• Revised Syllabus
58. A PEDAGOGICAL CORPUS (Kosem & Krishnamurthy 2007)
[Diagram: the corpus is organised by MODE (SPOKEN, WRITTEN), LEVEL (expert, learner), GENRE (spoken: lecture, discussion, etc; written: article, essay, etc) and DOMAIN (chemistry, education, etc)]
59. TEACHING: ACADEMIC WRITING (Kosem & Krishnamurthy 2007) General English: say = “speak” http://acorn.aston.ac.uk/
Academic English: say = “write” http://acorn.aston.ac.uk/
argue = “quarrel” (General English); = “propose/claim” (Academic English) http://corpus.byu.edu/bnc/
60. How can students improve their writing skills?
• Focus on Product = Neglect of Process?
– Exam results, League tables
– Marking systems (class distribution)
– Equality (irrespective of motivation/performance)
– Increasing instrumentality in attitudes to education
– Grenfell and James (2004:510): academic products structure practice
1. Read more journal articles and books? – student time; slow
2. Receive more instruction? – teacher time; slow
3. Receive more corrections/feedback? – students don’t read/understand/process (passive)
4. Corpora? – Learner autonomy; active/inductive/‘discovery’ learning – student time; quicker?
61. A CASE STUDY
• CONTEXT: Computer Science Student; 3rd year = placement with me; Chinese native-speaker; came to UK in 2002; studied English for 9 months; 2 years of A-levels (Maths, Chinese, Physics); submitted weekly 1-page report on his work (corpus programming)
• AIMS: help him improve his English & write better reports; test the corpus system & improve the software; understand pedagogic implications of this method
• PROCEDURE: started informally; worked well; started to preserve data
– RAMESH: 2-3 minutes highlighting items (‘errors’) in green
– STEVEN: 5-10 minutes correcting 10-15% ‘silly mistakes’; 30 minutes checking corpus, correcting 60-70% other errors
– R&S: 15 minutes discussing 15-20% ‘complex’ items; 15 minutes discussing Chinese/English, corpus software, search procedures
• EXAMPLES of highlighted items: – I will take a deep look into it next week… I replied him… He was not an expert with MySQL… The PHP engine on the server might out put an error message.
62. Corpus & discourse • language is the core of human activity, and the basis for decisions; we are presented with varying hypotheses and scenarios from science, politics, business, society, etc; but language/discourse is crucially involved in the decisions we make and the ways in which they are implemented
63. After Smith, N. V. (2002) How to be the centre of the universe. In Language, Bananas and Bonobos: Linguistic Problems, Puzzles and Polemics. Oxford: Blackwell [http://www.llas.ac.uk/resources/gpg/98
Language & Linguistics
64. Corpus Linguistics enables:
• quantitative analyses of texts
• a more objective basis for qualitative analyses
• more reliable/verifiable statements
• more generalisable statements
• new insights into basic themes and threads
• discovery of aspects missed by qualitative approaches alone, e.g.
– https://www.upworthy.com/the-one-video-iguarantee-youll-watch-twice?c=ufb1
65. FEMINISM
Corpus Analysis: Bank of English
Collocation list by t-score
COLLOCATE  FREQUENCY  T-SCORE
of         1211       16.568256
women      204        13.460741
and        965        12.321319
radical    131        11.360602
feminism   110        10.471986
has        248        10.450125
is         415        9.037262
wave       70         8.253780
lesbian    64         7.905844
political  76         7.788159
post       65         7.620464
feminist   58         7.556627
anti       55         7.044710
socialist  42         6.375852
her        123        6.322697
modern     45         6.261655
men        55         6.019111
about      115        5.981272
politics   40         5.944469
new        96         5.725465
second     54         5.723481
a) political: radical, political, post, anti, socialist, politics, movement, Marxism, therapy, theory, liberal, minority, gender, black b) sexual: lesbian, gay, lesbianism c) qualitative/evaluative: wave, modern, new, social, equality, backlash, liberation, contemporary (?) d) western (?) e) second (?), scale, rise f) 1970s
66. FEMINISM Corpus Analysis: IDS corpus, Mannheim
Collocates of ‘Feminismus’ categorised into 8 thematic clusters (Category: Collocates)
• political movements/ideas: radikal (8) Ökologie (16) kämpferisch (8) Sozialismus (11) Vertreter (11) militant (7) Kommunismus (10) Marxismus (3) existieren real (6) extrem (3) Pazifismus (3) Ideologie (6) politisch (15) Nationalismus (3) Antisemit (3) Revolution (3)
• sex, gender roles/body: Frau (119) Machismo (10) Sexismus (10) Geschlecht (11) männlich (14) feminin (5) weiblich (12) sexual (6) schwul (4) sexuell (3) sexy (3) Weib (3) Mutter (5)
• feminist ideas: Emanzipation (13) Ikone (12) Gleichberechtigung (20) neu (61) feministisch (10) Feministin (6) Wegbereiter (3) Schwarzer Alice (6)
• academic/arts/literature: Pusch (12) Thema (43) Basis (5) akademisch (7) Theorie (6) Geschichte (9) Diskussion (5) Autor (3) denken (5) Philosophie (4) Kampf (5) Werk (4) Ansatz (3) Vortrag (3) schreiben (3) Kritik (3) Kultur (3) Medium (4) Frage (4) lernen (3) Wut (3)
• places: amerikanisch (11) westlich (7) irisch (3)
• time: 70er (7) siebziger Jahr (4) Zeit (5) Jahr (14) Jahrzehnt (13) heute (6)
• religion: Schabbat (3) Bibel (3)
• social trends: Konsumgesellschaft (7) Esoterik (3) sozial (4) Pop (5)
67. Search terms used in Nexis UK database to select articles for inclusion in Climate Change corpus – US and UK: climate change / global warming / greenhouse effect – FR: changement climatique / effet de serre / réchauffement de la planète / réchauffement climatique – GE: Klimawandel / globale Erwärmung / Treibhauseffekt / Klimakatastrophe
68. Climate change in the media over time
[Chart: number of articles / news items per year, 1984 to 2007, for UK, US, GE and FR; vertical axis from 0 to 70,000]
69. (edited) FREQUENCY lists – most frequent content words: ENGLISH
Columns: climate change (US, UK); global warming (US, UK); greenhouse effect (US, UK)
climate
climate
global
global
global
greenhouse
new
change
warming
warming
warming
effect
warming
energy
new
new
greenhouse
years
energy
new
energy
climate
new
energy
change
people
people
people
years
global
years
years
years
years
carbon
new
emissions
mr
president
change
effect
carbon
environmental
global
state
world
climate
world
state
world
bush
energy
energy
climate
people
us
climate
last
dioxide
per
percent
government
environmental
pg
environmental
warming
states
carbon
time
type
scientists
people
president
pg
world
mr
percent
dioxide
greenhouse
emissions
percent
world
change
carbon
uk
states
people
us
world
environment
emissions
time
time
70. (edited) COLLOCATION lists – FR / GE content word collocates (+/-5 words) changement
réchauffement
serre
Changement
Réchauffement
serre
climatique
climatique
effet
contre
contre
gaz
réchauffement
Klimawandel
Erwärmung
Treibhauseffekt
KlimaKatastrophe
réchauffement
Kampf
Globale
Menschen
Drohende
climatique
Menschen
contre
Folgen
Drohenden Globalen
Kampf
Folgen
Kohlendioxid
Verhindern
lutte
lutte
contre
planète
Globalen
Grad
Prozent
Kampf
lutter
planète
lutte
lutte
Globale
Erde
Erde
Wort
lutter
Diskussion
Kampf
Verantwortlich
Welt
réchauffement
planétaire
Deutschland
Globaler
Verstärkt
Jahres
pays
climat
Maßnahmen
Folge
Verursachten
Abzuwenden
Wissenschaft
Auswirkungen
Co2
Weltweiten
Anpassung
Menschen
Ozonloch
lutter conférence
ges conséquences
effets onu
effets
unies
effet
effets
serre
conséquences
serre
responsables
effet
climatique
conséquences
responsables nations climat face
Bedroht global
rapport effets
climat
global
co2
Ende Erdatmosphäre
Atmosphäre
Atmosphäre
Beitrag
Verantwortlich
Klimawandel
Gore Globale Folgen
71. Climate Change: Conclusion
• Commonality across countries
– Similar pattern of media attention
– steep rise in attention after 2005
• Divergence across countries:
– sense of urgency in EU countries
– self-referential discourse, BUT EU looks very much to US, US is self-centred
– different frequency of key terms
– US discourse is dependent on key terms
• GW more dramatizing than CC (‘threat’, ‘action’, ‘fight’)
• No difference in UK
– EU is more dramatizing overall
• Esp. FR and GE: ‘lutte’, ‘combat’, ‘Kampf’, ‘verhindern’, ‘drohend’
• Future research:
– Understand Nexis better
– Better cleaning algorithms; more rigorously defined datasets
– analyse more items in more depth; add search terms revealed by the pilot data: eg Klimaschutz
– identify agency, discourse participants, subtopics
– Widen country sample (Brazil, China, India)