Corpus Linguistics: from lexis to discourse

Corpus Linguistics: from lexis to discourse Ramesh Krishnamurthy http://acorn.aston.ac.uk/140605CV-RAMESH%20KRISHNAMURTHY.html

Aston University, Birmingham, UK http://www1.aston.ac.uk/lss/staff-directory/krishnamurthyr

Universität Hildesheim November 14th 2013 This universe has been waiting c 13.8bn years = c 435,485,220bn seconds for this moment; there have been 80646.513* bn periods of 90 mins so far; welcome to period 80646.513*4!

2. Overview Some of the ways in which Corpus Linguistics is helping us to: • Describe the English language • reveal the importance of Collocation • develop Language Learning and Teaching

3. Who is Ramesh? (Google me! :)) • A learner and student of LANGUAGES (Tamil, English, Hindi, Latin, French, German, Sanskrit); Cambridge, London (SOAS) • An analyst and describer of ENGLISH (using a corpus to write/edit dictionaries, grammar/usage books, etc); Birmingham/COBUILD • A CORPUS linguist and researcher (literature, sociology, business, politics, translation); Birmingham, Wolverhampton, EU projects • A stand-up performer/storyteller/comedian/teacher? (language learning, academic writing, discourse analysis); Aston

4. Corpus Linguistics: background (1)
• Traditional Linguistics: LANGUAGE = SYSTEM cf Physical Sciences, Maths, Engineering
• Linguistics = Phonology, Morphology, Syntax
• Evidence = Intuition and Introspection
• Focus on: the ideal; the possible
– Constituents
– Structure (Grammar)
– Sentence as Unit
• MEANING? (Semantics = Philosophy)
• WORDS? (Dictionaries = Lexicography)
• HUMAN BEINGS?
• ACTUAL USAGE?

5. Corpus Linguistics: background (2) London University, 1930s: Anthropology >< Linguistics B. MALINOWSKI (Anthropology): Functionalism: (Language) is a mode of action; Context of Situation

J.R. FIRTH (Linguistics): Context of Situation: Participants; Actions (Verbal & Non-Verbal); (physical) Objects; Effects of the Verbal Actions; Modes of Meaning: language event as a whole; levels of analysis... collocation (some words co-occur more often than others)

LANGUAGE = USE: human; activity; in a context; has functions; produces meanings M.A.K. HALLIDAY: Lexico-grammar: lexis = “most delicate grammar”; Lexis as a Linguistic Level; Malinowski’s context of situation = the “Environment of the Text”; Systemic Functional Linguistics: “language serves functions”

6. Corpus Linguistics: background (3) USE = performance; empirical evidence = CORPUS; all levels of language (morphology, words, chunks, text, discourse) J. M. SINCLAIR: Beginning the Study of Lexis (1966); English Lexical Studies (1970) – world’s first computerised corpus of speech (135,000 words), focus on collocation; J.M. Sinclair and M. Coulthard (1975) Towards an Analysis of Discourse; COBUILD (1980>): world’s first corpus-based dictionary, grammar, etc; semantic prosody; Corpus, Concordance, Collocation (1991): Idiom principle and open-choice principle; The Lexical Item (1998): item and environment; verbal environment, textual environment; Trust the Text: Language, Corpus and Discourse (2004)

7. Why use corpora at all? 1. Even “expert speakers” have only a partial knowledge of a language. A corpus can be more comprehensive and balanced. 2. Even expert speakers tend to notice the unusual and think of what is possible. A corpus can show us what is common and typical. 3. Even expert speakers cannot quantify their knowledge of language. A corpus can give us accurate statistics. 4. Even expert speakers cannot remember everything they know. A corpus can store and recall all the information that has been input. 5. Even expert speakers generally find it difficult to make up natural examples. A corpus can provide us with a vast number of real examples.

8. Why use corpora at all? (cont.) 6. Even expert speakers have prejudices and preferences (and every language has its cultural connotations and underlying ideology). A corpus can give you more objective evidence. 7. Even expert speakers are not always available for consultation. A corpus can be permanently accessible. 8. Even expert speakers cannot keep up with language change. A frequently updated corpus can reflect even recent changes in language. 9. Even expert speakers lack authority – they can be challenged by other expert speakers. A corpus can include the actual language use of many expert speakers. etc etc

9. Why not use the Web and Search Engines?

                    Web and Search Engines                        Corpora
SIZE                Vast                                          Manageable
PROCESSING SPEED    Slow                                          Fast
ANALYSES            Coarse-grained, General                       Fine-grained, Detailed, Specific
CONTENT RANGE       No Overview; Diffuse, Uncategorized           Selected, Documented, Categorized
CONTENT STABILITY   Volatile/Dynamic: cannot replicate analyses   Stable: can replicate analyses
CONTENT QUALITY     Uncontrolled                                  Controlled by selection
SOFTWARE            Complex, ‘black box’                          Simple, fully documented

10. Developments in corpus linguistics • Improvements in computer hardware: eg Sinclair (1965, Edinburgh) used Atlas computer at Manchester Uni; COBUILD (1984-1997) used PDP-11, ICL Mainframe, IBM workstations, Sun workstations, PCs; laptops... >Tablet? Cell phone?

• Improvements in computer software: eg word frequency lists, concordances, collocation lists and profiles, n-grams, keywords, POS-tagging, parsing, word sketches, semantics, thesaurus, sentiment, risk, emotions; audio, video; WordClouds and other graphic/visual/interactive displays; AntConc; WordSmith Tools; SketchEngine

• Increase in corpus sizes: Sinclair (1965; 135,000); Brown Uni (1967; 1m); Birmingham (1986: 18m); BNC (1993: 100m); Bank of English (2002: 512m); GloWbE (2013: 1.9bn)

• Increase in corpus access:
– http://www.uow.edu.au/~dlee/CBLLinks.htm
– http://www.hit.uib.no/corpora/sites.html
– http://www.michaelbarlow.com/
– http://corpus.byu.edu/
– https://www.sketchengine.co.uk/documentation/wiki/Website/LanguageResourcesAndTools

• Increase in coverage of languages & text types: eg English > European, Asian, African languages; eg written, spoken, historical, literature, media, business, language learners, translation, politics, advertisements, emails, blogs, social networks; topic-based corpora (climate change, immigration, feminism, etc)

11. Corpus Tool 1: WORD FREQUENCY (= importance)
In Descending Frequency order: Most frequent words

Word    BCET (1986: 18 Million words)    Bank of English (2002: 418 M-words)
the     1,081,654                        22,849,031
of      535,391                          10,551,630
and     511,333                          9,787,093
to      479,191                          10,429,009
a       419,798                          9,279,905
in      334,183                          7,518,069
that    215,332                          4,175,495
s       (8,570)                          4,072,762
is      (166,691)                        3,900,784
it      198,578                          3,771,509
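A descending frequency list like the one above can be produced in a few lines. The following is only a minimal sketch: the tokenizer (runs of letters, lower-cased) and the sample sentence are assumptions for illustration, not the procedure used for BCET or the Bank of English.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Return a Counter of lower-cased word tokens in `text`."""
    # Assumed tokenizer: runs of letters only. Apostrophes split words,
    # which is one way a frequency list ends up with a separate "s" entry.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

# Made-up sample text, for illustration only.
sample = "The cat sat on the mat. The cat's mat is the cat's."
freq = word_frequencies(sample)

# Print the list in descending frequency order.
for word, count in freq.most_common():
    print(f"{word}\t{count}")
```

Changing the tokenizer (for example, keeping apostrophes inside words) changes the counts, so frequency lists from different tools are only comparable when the tokenization is known.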

12. FREQUENCY: grouping items

RANK   FREQ   TYPE
1      577    the
2      319    and
3      270    a
4      251    to
5      204    she
6      197    of
7      178    I
8      168    Alice
9      167    it
10     166    was
11     156    in
12     143    said
13     107    her
14     101    you

RANK   FREQ   TYPE
23     58     little
68     22     large
125    14     high
146    12     low
250    7      small
429    4      top
402    4      middle
SUB-TOTAL: 121

RANK   FREQ   TYPE
26     50     down
30     41     up
173    10     size
191    9      side
203    8      grow
204    8      growing

TOTAL: 257

13. FREQUENCY: WordCloud: Politics: BBC website (2008)

14. FREQUENCY: WordCloud > World Population http://en.wikipedia.org/wiki/File:Word_population_tagcloud_2011.png

15. Corpus Tool 1: WORD FREQUENCY (= importance)
In Alphabetical order: HOW COMMON OR RARE IS THE ITEM?
ARE YOU A ... TEACHER? ... STUDENT? ... LINGUIST?

receivability    5
receivable       147
receivables      182
receival         2
receivals        3
receive          29,942
received         48,309
receiver         3,129
receivers        1,311
receivership     737
receiverships    28
receives         4,705
receiveth        2
receiving        11,340

WHICH ITEMS SHOULD YOU ... TEACH? ... LEARN? ... ANALYSE?
HOW MUCH TIME/EFFORT SHOULD YOU SPEND ON EACH ITEM?

16. Corpus Tool 2: CONCORDANCE (= cotext)
Slide callouts: Word class; Lexical relations; collocation; phraseology; Lexical domains

however, have trimmed its interest receivable figure by £22
couldn't tell whether accounts receivable had been paid
offset by an increase in accounts receivable. Later, when
would routinely tell her they had receivables of thousands
dollars of credit-card receivables it has so fa
The value of debtors (or receivables) has to be p
foreign-currency payables against receivables. This type

a glance at Andrei. Replacing the receiver, she spoke hurr
of the dining room, picked up the receiver. `Hello," she s
want it done now!" He replaced the receiver then walked Kol

the path of incoming flights. A receiver on an aircraft
room's conversations to a nearby receiver -- even an FM
and causing `static" in on-board receivers and computers.
waveband with other transmitters, receivers and communica

we had anticipated to appoint a receiver who may well
to creditors. The Official Receiver - part of the
Pensioners are keen to bring in the receivers to protect wha
A syndicate of banks called in the receivers over debts tot
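A concordance of this kind (keyword in context, with the node word in the middle of each line) can be sketched as below. This is an illustrative assumption about how such a display might be built, not the COBUILD software; the function name `kwic`, the context width and the two-sentence sample are all hypothetical.

```python
import re

def kwic(text, pattern, width=30):
    """Return keyword-in-context lines for every regex match of `pattern`."""
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the node words line up in a column.
        lines.append(f"{left:>{width}} {m.group(0)} {right}")
    return lines

# Made-up sample text, for illustration only.
sample = "She picked up the receiver. The banks called in the receivers over debts."
for line in kwic(sample, r"\breceivers?\b"):
    print(line)
```

Sorting the lines by the word to the left or right of the node, which concordancers usually offer, is what makes patterns such as accounts receivable or called in the receivers stand out.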

17. Corpus Tool 3: COLLOCATION LIST (> ‘chunks’) e.g. hard

A. From frequency... (= co-occurrence)

Collocate   Frequency
to          58444
the         39225
it          37451
and         27875
a           24531
is          18314
s           16947
of          14980
for         11462
work        11378

Too many ‘corpus-frequent’ words

B. To statistics... [e.g. t-score] (= ‘significant’ collocation)

Collocate   Frequency   t-score
it          37451       146.778704
to          58444       136.672099
work        11378       99.794844
very        7297        73.848184
is          18314       64.938617
find        4915        64.513063
working     4347        61.992590
worked      4039        61.399611
s           16947       55.243460
so          5902        49.255196

Calculates ‘actual vs expected’ frequency:
... eliminates many ‘corpus-frequent’ words
... retains genuinely significant ‘corpus-frequent’ words
... promotes other ‘more than expected’ words ...
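The ‘actual vs expected’ comparison can be illustrated with one widely used approximation of the t-score, (observed - expected) / sqrt(observed). The slide does not state which formula or span was used, so the sketch below is an assumption, and all the numbers in the example are invented rather than taken from the corpus.

```python
import math

def t_score(observed, node_freq, collocate_freq, corpus_size, span=8):
    """Approximate collocation t-score.

    observed       -- times the collocate occurs within the span of the node
    node_freq      -- corpus frequency of the node word
    collocate_freq -- corpus frequency of the collocate
    corpus_size    -- total number of tokens in the corpus
    span           -- window size in tokens (assumed: 4 left + 4 right)
    """
    # Expected co-occurrences if the two words were distributed independently.
    expected = node_freq * collocate_freq * span / corpus_size
    return (observed - expected) / math.sqrt(observed)

# Invented figures: both collocates co-occur 100 times with the node,
# but one is a very frequent word and the other is not.
print(t_score(100, 1000, 50000, 1_000_000))  # frequent word: below expectation
print(t_score(100, 1000, 500, 1_000_000))    # rarer word: well above expectation
```

With equal raw co-occurrence counts, the score falls for a word that is frequent everywhere and rises for a word that turns up near the node more often than its overall frequency predicts.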

18. Corpus Tool 4: COLLOCATION PROFILE e.g. hard
(three positions to the left of the node, the node, three positions to the right)

-3       -2       -1        NODE   +1         +2           +3
it       it       is        NODE   to         believe      how
have     find     very      NODE   work       get          that
but      is       s         NODE   for        imagine      but
will     worked   a         NODE   working    find         what
i        s        so        NODE   time       see          why
has      work     be        NODE   line       say          it
had      found    it        NODE   on         beat         heels
would    was      too       NODE   pressed    tell         to
he       will     work      NODE   core       keep         them
hit      would    worked    NODE   enough     understand   follow
they     been     was       NODE   currency   come         up
find     be       working   NODE   times      make         because
going    have     how       NODE   hitting    work         whether

WITHIN A FEW MINUTES, CORPUS USER WILL NOTICE:
LEXICO-GRAMMAR: (impersonal) it, hard+to-infinitive, hard+for/on (sb?);
PHRASEOLOGY: hard + time(s), line, core, pressed, currency, hitting;
IDIOMATICITY: hard (on the?) heels (of?);
PRAGMATICS: advice, excuses, warnings;
MODALITY: will/would, degree (very, so, too, how);
SEMANTICS: noun objects (work/time/currency); verb sets (perception: believe /imagine /see /understand; practical: get, find, say, beat, tell, keep, come, make)
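A profile of this shape lists, for each position around the node, the words that occur there most often. The sketch below shows one hypothetical way to build it; the function name, the simple tokenizer and the sample sentences are assumptions, and real tools typically rank each column by a significance score rather than raw frequency.

```python
import re
from collections import Counter

def collocation_profile(text, node, span=3, top=3):
    """Most frequent words at each position from -span to +span around `node`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    columns = {off: Counter() for off in range(-span, span + 1) if off != 0}
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for off in columns:
            j = i + off
            # Note: this simple version lets the window cross sentence boundaries.
            if 0 <= j < len(tokens):
                columns[off][tokens[j]] += 1
    return {off: [w for w, _ in c.most_common(top)] for off, c in columns.items()}

# Made-up sample text, for illustration only.
sample = "It is hard to believe. It was very hard to find. She worked hard for it."
profile = collocation_profile(sample, "hard")
for off in sorted(profile):
    print(f"{off:+d}: {', '.join(profile[off])}")
```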

19. Corpus Tool 5: N-grams = fixed collocation
“Alice in Wonderland”

1-grams
Rank   Freq   N-gram
1      1527   the
2      802    and
3      725    to
4      615    a
5      545    I
6      527    it
7      509    she
8      500    of
9      456    said
10     395    Alice
11     356    in
12     352    was
13     345    you
14     275    that
15     246    as

2-grams
Rank   Freq   N-gram
1      206    said the
2      127    of the
3      111    said Alice
4      92     in a
5      74     and the

3-grams
Rank   Freq   N-gram
1      47     the Mock Turtle
2      30     I don t
3      28     said the King
4      27     the March Hare
5      21     said the Hatter

4-grams
Rank   Freq   N-gram
1      18     said the Mock Turtle
2      10     she said to herself
3      10     you won t you
4      7      a minute or two
5      7      said Alice in a

9-grams
Rank   Freq   N-gram
1      4      Will you won t you will you won t
2      4      you won t you will you won t you

16-grams
Rank   Freq   N-gram
1      2      Will you won t you will you won t you won t you join the dance
2      1      a body to cut it off from that he had never had to do such a
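N-gram lists such as these come from sliding a window of n tokens over the text and counting each sequence. The following is a minimal sketch under assumed tokenization (letters only, lower-cased, so that won't becomes won t, as in the lists above); the sample line is a fragment in the style of the text, not the full corpus.

```python
import re
from collections import Counter

def ngram_counts(text, n):
    """Count every contiguous sequence of n tokens in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # zip over n staggered copies of the token list yields each window once.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

sample = "Will you, won't you, will you, won't you, won't you join the dance?"
for gram, count in ngram_counts(sample, 4).most_common(3):
    print(count, gram)
```

Longer n-grams overlap heavily with shorter ones, which is why the same refrain surfaces in the 4-gram, 9-gram and 16-gram lists.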

20. Corpus Tool 5: N-grams
UK Conservative Party Manifestos: From 1900 to 2001
UK Labour Party Manifestos: From 1900 to 2001
UK Liberal Party Manifestos: From 1945 to 1992

4-gram                        Rank   Party
the cost of living            6      Z
the cost of living            8      Y
the cost of living            22     X

the rule of law               14     X
the rule of law               37     Z
the rule of law               38     Y

the long term unemployed      16     Z
the long term unemployed      80     Y
the long term unemployed      329    X

the National Health Service   17     Y
the National Health Service   23     X
the National Health Service   123    Z

the men and women             39     Y
men and women of              40     Y
men and women who             79     Z
for men and women             83     Z
men and women of              246    X
for men and women             261    X

21. Corpus Tool 5: N-grams: Michaela Mahlberg (Clusters in DICKENS, 2007)

22. Google Books: N-gram Viewer • Copy-and-paste this URL into a browser: • https://books.google.com/ngrams

23. Corpus Tool 6: Keywords • = the WORDS that have the most significant frequency difference between 2 corpora • NB there is currently much discussion about the best statistical procedures to use in order to achieve this; in particular, current methods do not take account of ‘effect size’
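As an illustration of the kind of calculation involved, the sketch below computes log-likelihood, a significance statistic commonly used for keywords, alongside a log ratio, one of the effect-size measures that have been proposed. Neither is claimed to be the procedure behind the keyword lists in these slides; the function names, the zero-frequency substitute of 0.5 and the counts are assumptions for illustration.

```python
import math

def log_likelihood(a, b, c, d):
    """Log-likelihood keyness of a word.

    a, b -- frequency of the word in corpus 1 and corpus 2
    c, d -- total number of tokens in corpus 1 and corpus 2
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    ll = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

def log_ratio(a, b, c, d, zero_sub=0.5):
    """Binary log of the ratio of relative frequencies (an effect-size measure)."""
    rel1 = (a if a > 0 else zero_sub) / c
    rel2 = (b if b > 0 else zero_sub) / d
    return math.log2(rel1 / rel2)

# Invented counts: a word occurring 40 times in one 1,000-token corpus
# and 10 times in another of the same size.
print(log_likelihood(40, 10, 1000, 1000))
print(log_ratio(40, 10, 1000, 1000))
```

The first figure answers "how confident can we be that the difference is not chance?", the second "how big is the difference?"; with large corpora a word can score highly on the first while the second stays small.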

24. Corpus Tool 6: Keywords: UK ‘riots’ 2011 (Aug 7>)
riot in UK newspaper articles: BEFORE (Aug 1-5: 346); DURING (Aug 10: 500); AFTER (Aug 27: 249)

BEFORE (compared to DURING), in rank order:
FESTIVAL, PLAY, MUBARAK, THEATRE, SPRAOI, HIS, YEARS, TRIAL, WATERFORD, DIRECTOR, COMMISSIONED, CONTEMPORARY, A, AWARD

DURING (compared to BEFORE), ranks 1-15:
LONDON, POLICE, RIOTERS, WEDNESDAY, BIRMINGHAM, LOOTERS, LOOTING, DAILY, RIOTS, SHOPS, YOUTHS, NIGHT, SHOP, STREETS, TELEGRAPH

25. Corpus Tool 6: Keywords: UK ‘riots’ 2011 (Aug 7>)

DURING (compared to BEFORE), ranks 1-15:
LONDON, POLICE, RIOTERS, WEDNESDAY, BIRMINGHAM, LOOTERS, LOOTING, DAILY, RIOTS, SHOPS, YOUTHS, NIGHT, SHOP, STREETS, TELEGRAPH

DURING (compared to AFTER), ranks 1-15:
POLICE, WEDNESDAY, #, NIGHT, LONDON, DAILY, SAID, SHOPS, VIOLENCE, Â, TELEGRAPH, WERE, VIDEO, REPORT, FIRE

26. Corpus Tool 6: Keywords: UK ‘riots’ 2011 (Aug 7>)

DURING (compared to AFTER), ranks 1-15:
POLICE, WEDNESDAY, #, NIGHT, LONDON, DAILY, SAID, SHOPS, VIOLENCE, Â, TELEGRAPH, WERE, VIDEO, REPORT, FIRE

AFTER (compared to DURING), ranks 1-15:
I, SATURDAY, YOU, CARNIVAL, A, MY, IS, YEARS, THAT, HOW, HOMELESS, PLYMOUTH, Ñ, OR, PRISON

27. COBUILD: INITIAL LEXICAL DISCOVERIES • the most frequent content words: e.g. time, people [Cobuild, 1987] • frequent verbs are mostly used ‘delexically’: e.g. take a bath/an exam, make a decision/mistake, have a rest/an idea [Cobuild, 1987] • see mostly means ‘understand’ (esp. in spoken): e.g. I see / You see • thing, this, that mostly refer to abstract items (event, reason, etc), NOT physical objects: e.g. A strange thing happened… Is that why you had a few days off?... This is why I'm opposed to the plan [Willis, 1990] • -ly adverbs are NOT used in all meanings of the adjective: e.g. lamely for excuses (not walking), crisply with speech verbs [Cobuild; cf ~ in other dictionaries] • of is NOT a preposition: not used in adverbial phrases, but in noun phrases: e.g. slice of bread, waste of time [Sinclair, 1991] • identifying new words: e.g. 1985 corpus = 0 > 1995 corpus = camcorder (1214), virtual reality (458), imaging (463), mobile phone (455), satellite dish (236), laptop (184), videophone (144), palmcorder, smart card, microsurgery, teleworker, email, helipad, hypertext [Cobuild 1995]

28. Effect of corpus on dictionary • INCLUSION/EXCLUSION OF HEADWORDS • TRADITIONAL DICTIONARY ENTRY: en·crust (ĕn-krŭst') also in·crust (ĭn-) tr.v., -crust·ed, crust·ing, -crusts. 1. To cover or coat with or as if with a crust 2. To decorate by inlaying or overlaying with a contrasting material • CORPUS EVIDENCE: encrust 9 encrusts 1 encrusting 13 encrusted 645 • CORPUS-BASED DICTIONARY: • Adjective headword – Stative (not dynamic) – Gradable

29. Collocation • Collocation is the ‘lexical realisation of the situational context’ Sinclair (1987) • often, ‘no particular group of collocates occurs in a structured relationship with the word’ (ibid) [i.e. NOT a grammatical feature] • For many years, I could not understand the meaning of Firth’s statement: ‘one of the meanings of night is its collocability with dark’... • I suddenly realised what I needed to do: check the statement using corpus techniques!

30. dark night

31. Collocation disambiguates near-synonyms

32. Collocation disambiguates parts of speech:

file as VERB
collocate    frequency as collocate   t-score significance
to           669                      18.45
for          187                      8.84
bankruptcy   52                       7.19
complaint    45                       6.69
against      54                       6.64
suit         41                       6.34
charges      39                       6.14
will         66                       5.60
returns      30                       5.43
lawsuit      28                       5.28
a            249                      5.27

file as NOUN
collocate    frequency as collocate   t-score significance
rank         442                      21.00
fact         178                      12.72
tape         150                      12.14
on           398                      11.24
and          934                      11.08
single       96                       9.30
a            801                      9.06
feature      75                       8.54
from         227                      7.68
the          1739                     7.24
archetype    51                       7.14

33. Collocates: Transitivity and Semantic Prosody: build up (2 billion words: http://corpus2.byu.edu/glowbe/ )

1-4 words to the LEFT (VERBS and ADVERBS; NOUNS)
1 HELP          14 FLUID
2 TRYING        15 TOXINS
3 GRADUALLY     16 BACTERIA
4 SLOWLY        17 PLAQUE
5 TAKES         18 ACID
6 SLOW          19 STRESS
7 HELPED        20 ICE
8 PRESSURE      21 CARBON
9 ALLOWED       22 DEPOSITS
10 HELPS        23 DUST
11 PREVENT      24 CELLS
12 HEAT         25 TENSION
13 HELPING      26 EXERCISE

1-4 words to the RIGHT (NOUNS)
[animate] + builds up + Y: POSITIVE
1 PICTURE       8 SKILLS
2 CONFIDENCE    9 BASE
3 STRENGTH      10 MUSCLES
4 BODY          11 TRUST
5 RESERVES      12 NETWORK
6 RELATIONSHIP  13 CAPACITY
7 PORTFOLIO

34. Semantic Prosody • “seemingly neutral words can be perceived with positive or negative associations through frequent occurrences with particular collocations” • J.M. Sinclair (1987) Looking Up – e.g. rot, rain, fog, night, decline, depression, etc + set in

• “consistent aura of meaning with which a form is imbued by its collocates” (Louw, 1993: 157) – e.g. bent on, utterly, symptomatic of, without feeling

• Louw, W. E. (1993) Irony in the text or insincerity in the writer? The diagnostic potential of semantic prosodies. In: Text and technology. In honour of John Sinclair. Ed. M. Baker, G. Francis and E. Tognini-Bonelli, 152-176. Amsterdam: John Benjamins

35. Semantic Prosody of bent on: (from Bill Louw’s 1993 article; new data from http://corpus2.byu.edu/glowbe/ )

Right-hand collocates, and examples of the “seemingly neutral words”:

DESTROYING
DESTRUCTION
MAKING: the few ruling class are bent on making life difficult for the poor
REVENGE
GETTING: If you are just bent on getting rid of her and do not care about
TAKING: SCP-079, an Artificial Intelligence bent on taking over the facility.
DOING: bent on doing everything possible to introduce a silly culture of
KILLING
DOMINATION
KEEPING: scientists hell bent on keeping their funding gravy train rolling
CREATING: he is bent on creating a new world where he is supreme ruler
BRINGING: bent on bringing down Zubiri from the crest of moral ascendancy
CAUSING: + trouble, mayhem, hysteria, havoc, chaos, confusion, destruction

36. Collocations can be culture-specific • spider & bath: in BNC (British) – not in other European languages? (Hanks 2013: c134-9) • GloWbE: 21: US 1, NZ 1, Singapore 1, GB 18

37. From collocation to prefabricated chunks (the “idiom principle”): naked eye • Sinclair JM. 1996. The Search for Units of Meaning. Textus IX: 75-106

45. new broom: fixed expression reduced to chunk COBUILD Dictionary (1987): broom... 1 A broom is a kind of brush with a long handle. You use a broom for sweeping the floor. 2 Broom is a wild bush with a lot of thorns and tiny yellow flowers which grows on waste ground or sandy ground.

new broom… A person who has just started a new job and who intends to make a lot of changes may be referred to as a new broom; an informal expression.

46. new broom: the development
1. There are examples of new broom meaning a person: ..But he is hardly a new broom... ...Adrian Shinwell, president and new broom at the Scottish Conservative and Unionist Association...
2. One or two retain sweep clean from the proverb: ...The new broom is determined to sweep clean. But she never expected... ...They're still feeling their way, Alec, and new brooms always want to sweep cleaner than the old...
3. But many examples show that new broom can now also refer to the policies or changes carried out by the new appointee: ...Heseltine's new broom has yet to show its bristles. ...lending enthusiastic strength to the hand that pushes the new broom.
4. sweep can be re-introduced as a pun: ...The new broom that swept in with the new season is held by Tony Russ... ...his new broom has already swept into some dark corners of the Asda empire.
5. new can be separated from broom: ...Since taking helm back in March, the new chief has wielded the broom through the business and its debts...
6. new can be omitted altogether and replaced by the person making the changes: ...as the Kinnock broom swept clean in Walworth Road...
7. The nature of the changes can be specified by replacing new: ...Bishko's rationalisation broom has swept through the 253-strong operation...
8. new broom can be used like an adjective, in front of another noun referring to a person or a policy: ...the company's new broom chief executive... ...as a result of the `new broom' policy of Ian Rock, the managing director...

47. How do we learn L1? How do we teach L2? • We learn our L1 by experiencing thousands of EXAMPLES IN THE LANGUAGE (i.e. INDUCTIVELY), in a wide range of places and situations, from a variety of sources, over a long period of time – focus on PROCESS: L1 language learning, riding a bicycle, driving a car, parenting – Aristotle (350BC/1908, Nicomachean Ethics II,1, Oxford): “It [moral virtue], like the arts, is acquired by repetition of the corresponding acts... the things we have to learn before we can do them, we learn by doing them”

• We teach an L2 using abstract INFORMATION ABOUT THE LANGUAGE - grammar rules, dictionary definitions (i.e. DEDUCTIVELY) - with very few examples, from few sources, mostly in a classroom, over a much shorter time period – focus on PRODUCT: L2 language learning – Willis (2003:1) “what is taught is often not learnt, and learners often ‘learn’ things which have not been taught at all”

48. Why do we learn an L2?
• Changes in goals of language learning:
– knowledge
– religion, trade, politics
– literature, culture, travelling abroad
– study abroad, migration, work (home or abroad), tourism, business (multinational companies), globalised products (planes, electronics, etc), translation, interpreting

• Other changes:
– population increase, population mobility - geographical & social
– economics/globalisation/recession/market forces
– pressures on education systems
– heterogeneous students - age, background, L1, etc
– teaching methods and resources - no longer adequate, sufficient, appropriate
– rise of English as global language
– changes in language relationships, endangered languages

49. Changes in methods and resources • focus: from classroom to real world • materials: from made-up to authentic texts • from passive to active learning • from teacher to student focus • from chalk & paper to computers • Corpora are well-suited to these shifts

50. Advantages of corpora
• exposure to varied authentic materials
• intensive/condensed linguistic experience
• always up-to-date
• constantly raising language awareness
• encouraging learner autonomy and research
• focus on common and typical but shows variations
• lifelong learning
• transferable skills
• flexibility – accessible anywhere, any time

51. LANGUAGE TEACHING: deductive teaching
• http://www.bbc.co.uk/worldservice/learningenglish/grammar/learnit/learnitv230.shtml

Yee from Hong Kong writes: I'm not sure how the suffix -ness works. Can we add -ness to all types of words to make nouns?

Roger Woodham replies: -ness is one of a number of noun suffixes. It is used to make nouns from adjectives, although not every adjective can be modified in this way. Here are some common adjectives whose noun forms are made by adding -ness: happy, sad, weak, good, ready, tidy, forgetful. Note the spelling change to adjectives that end in -y:
– Everybody deserves happiness in their life. To be happy is a basic human right.
– There was a lot of sadness in the office when people learned of his illness.
– His readiness to have a personal word with everybody at the funeral was much appreciated.
– He is such a forgetful person. Such forgetfulness cannot be excused.
– If you want to work for such an organisation, you are expected to maintain a high standard of tidiness in your appearance.

52. FREQUENCY: facilitating inductive learning
• http://www.morewords.com/most-common-ends-with/ness/ from the Enable2k North American word list, which uses “Word Frequencies in Written and Spoken English: based on the British National Corpus” (Lancaster University)
• The most frequently occurring words ending with –ness
• Frequency lists can be used for INDUCTIVE LEARNING:

business        752      effectiveness   21
witness         80       madness         16
awareness       72       sadness         16
illness         71       goodness        15
darkness        68       thickness       15
consciousness   52       bitterness      14
weakness        44       harness         14
happiness       34       kindness        14
fitness         32       wilderness      14
sickness        24       loneliness      12
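A list like this can be drawn from any word-frequency table by filtering on the ending and sorting by frequency. The sketch below is a hypothetical illustration: the five -ness figures are copied from the slide, while the entries for happy and sad are invented so that the filter has something to exclude.

```python
def words_ending_with(freq_list, suffix, top=5):
    """Return the `top` most frequent words in `freq_list` that end with `suffix`."""
    hits = [(word, freq) for word, freq in freq_list.items() if word.endswith(suffix)]
    # Sort by descending frequency, then alphabetically to break ties.
    return sorted(hits, key=lambda wf: (-wf[1], wf[0]))[:top]

freqs = {
    "business": 752, "witness": 80, "awareness": 72,  # figures from the slide
    "happiness": 34, "harness": 14,                   # figures from the slide
    "happy": 100, "sad": 30,                          # invented, for illustration
}
for word, freq in words_ending_with(freqs, "ness", top=3):
    print(word, freq)
```

A purely form-based filter also returns items such as witness and harness, which are not adjective + -ness formations; noticing that is itself part of the inductive exercise.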

53. CONCORDANCE (1): “gapped concordance” exercise to make a sturdy coffee ------------------------ with plenty of room for others who sit round her ------------------------ include the business men MACPHERSON : Now on the ------------------------ here we've got what looks a loveseat, a coffee ------------------------ two large cardboard suitcases lay open on a ------------------------ The silver had already 1818? [p] Answer: From ------------------------ C, we find the year 1816 I lowered my tray ------------------------ from the back of the seat end of the dark shiny ------------------------ and toasted each other in the size of a snooker ------------------------ Another room upstairs is

54. more about CONCORDANCE (2): “gapped concordance” exercise
...- so it's generally best to feed a small …………………… twice a day. Similarly, growing an
...fascinate me. You can tell how a …………………… is feeling by his tail. If his tail is up,
...of days later I'm watching TV and this …………………… food commercial comes on an it's the
...Year. For what remains of the Year of the ……………………, the Tiger should remain committed 7
...he was with a woman walking a black ……………………. On neither occasion did the man speak to
...I'm prepared if a policeman sees me with my …………………… off a lead in a public place
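Gapped-concordance exercises of this kind can be generated by blanking the node word in each concordance line. The sketch below is a hypothetical helper, not the tool used to prepare these slides; it assumes the node is a regular noun (so a plural -s is blanked too), and the sample sentences are made up.

```python
import re

def gapped_concordance(sentences, target, gap="------"):
    """Blank out `target` (and its plural in -s) in every sentence that contains it."""
    pattern = re.compile(rf"\b{re.escape(target)}s?\b", flags=re.IGNORECASE)
    # Keep only sentences with a whole-word match, and replace each match with the gap.
    return [pattern.sub(gap, s) for s in sentences if pattern.search(s)]

# Made-up sample sentences, for illustration only.
sentences = [
    "They sat under the old tree near the river.",
    "The trees lost their leaves early this year.",
    "The streets were empty.",
]
for line in gapped_concordance(sentences, "tree"):
    print(line)
```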

55. ERROR ANALYSIS
• Teachers analyse students’ errors (Fei-Yu Chuang 2005)
• a corpus of 50 essays written by Chinese EAP (English for Academic Purposes) foundation students
• 5232 errors identified. Most frequent errors were:

(1) Missing definite article                10.1%
(2) Bare singular count noun for plural     8.8%
(3) Redundant definite article              8.5%
(4) Mis-selection of preposition            6.1%
(5) Lexical misconception                   5.8%
(6) Wrong tense and aspect                  3.8%
(7) S-V non-agreement                       2.4%
(8) Wrong collocation                       2.1%
(9) Missing ‘a’/‘an’                        2.0%
(10) Comma splice                           2.0%

56. TEACHING • Language Learning & All Subjects • From National Education Planning to Individual Institutions and Teachers • Syllabus/Course design: lists of words/topics required at different levels; domain-specific definitions; common errors; Lexical Syllabus • Coursebooks: design, examples • Software development; spellcheckers; grammar checkers; • Classroom use: produce teaching materials; students improve academic writing • Examination Boards: design and marking • Detecting plagiarism: software compares student texts, academic articles, internet sources

57. How can corpora help? THE MODEL
• Input Syllabus
– CORPUS 1: Expert/Input Material (= ‘target’)
• Drafting academic writing
• Feedback
• Editing and revision
• Assessment
– CORPUS 2: Student/Output Material
– Contrastive Corpus Analyses of Input and Output
• Revised Syllabus

58. A PEDAGOGICAL CORPUS (Kosem & Krishnamurthy 2007)
[Diagram: corpus structure]
MODE: SPOKEN; WRITTEN
DOMAIN (chemistry, education, etc), for both modes
GENRE: SPOKEN (lecture, discussion, etc); WRITTEN (article, essay, etc)
LEVEL: expert; learner

59. TEACHING: ACADEMIC WRITING (Kosem & Krishnamurthy 2007) General English: say = “speak” http://acorn.aston.ac.uk/

Academic English: say = “write” http://acorn.aston.ac.uk/

argue = “quarrel” (General English); = “propose/claim” (Academic English) http://corpus.byu.edu/bnc/

60. How can students improve their writing skills?
• Focus on Product = Neglect of Process?
– Exam results, League tables
– Marking systems (class distribution)
– Equality (irrespective of motivation/performance)
– Increasing instrumentality in attitudes to education
– Grenfell and James (2004:510): academic products structure practice

1. Read more journal articles and books?
– student time; slow
2. Receive more instruction?
– teacher time; slow
3. Receive more corrections/feedback?
– students don’t read/understand/process (passive)
4. Corpora?
– Learner autonomy; active/inductive/‘discovery’ learning
– student time; quicker?

61. A CASE STUDY • CONTEXT: Computer Science Student; 3rd year = placement with me; Chinese native-speaker; came to UK in 2002; studied English for 9 months; 2 years of A-levels (Maths, Chinese, Physics); submitted weekly 1-page report on his work (corpus programming) • AIMS: help him improve his English & write better reports; test the corpus system & improve the software; understand pedagogic implications of this method • PROCEDURE: started informally; worked well; started to preserve data – RAMESH: 2-3 minutes highlighting items (‘errors’) in green – STEVEN: • 5-10 minutes correcting 10-15% ‘silly mistakes’ • 30 minutes checking corpus, correcting 60-70% other errors

– R&S: • 15 minutes discussing 15-20% ‘complex’ items • 15 minutes discussing Chinese/English, corpus software, search procedures

• EXAMPLES of highlighted items: – I will take a deep look into it next week… I replied him… He was not an expert with MySQL… The PHP engine on the server might out put an error message.

62. Corpus & discourse • language is the core of human activity, and the basis for decisions; we are presented with varying hypotheses and scenarios from science, politics, business, society, etc; but language/discourse is crucially involved in the decisions we make and the ways in which they are implemented

63. After Smith, N. V. (2002) How to be the centre of the universe. In Language, Bananas and Bonobos: Linguistic Problems, Puzzles and Polemics. Oxford: Blackwell [http://www.llas.ac.uk/resources/gpg/98

Language & Linguistics

64. Corpus Linguistics enables:
• quantitative analyses of texts
• a more objective basis for qualitative analyses
• more reliable/verifiable statements
• more generalisable statements
• new insights into basic themes and threads
• discovery of aspects missed by qualitative approaches alone, e.g.
– https://www.upworthy.com/the-one-video-iguarantee-youll-watch-twice?c=ufb1

65. FEMINISM

Corpus Analysis: Bank of English
Collocation list by t-score

Collocate   Frequency   t-score
of          1211        16.568256
women       204         13.460741
and         965         12.321319
radical     131         11.360602
feminism    110         10.471986
has         248         10.450125
is          415         9.037262
wave        70          8.253780
lesbian     64          7.905844
political   76          7.788159
post        65          7.620464
feminist    58          7.556627
anti        55          7.044710
socialist   42          6.375852
her         123         6.322697
modern      45          6.261655
men         55          6.019111
about       115         5.981272
politics    40          5.944469
new         96          5.725465
second      54          5.723481

a) political: radical, political, post, anti, socialist, politics, movement, Marxism, therapy, theory, liberal, minority, gender, black b) sexual: lesbian, gay, lesbianism c) qualitative/evaluative: wave, modern, new, social, equality, backlash, liberation, contemporary (?) d) western (?) e) second (?), scale, rise f) 1970s

66. FEMINISM
Corpus Analysis: IDS corpus, Mannheim
Collocates of ‘Feminismus’ categorised into 8 thematic clusters

Category: Collocates

political movements/ ideas: radikal (8); Ökologie (16); kämpferisch (8); Sozialismus (11); Vertreter (11); militant (7); Kommunismus (10); Marxismus (3); existieren real (6); extrem (3); Pazifismus (3); Ideologie (6); politisch (15); Nationalismus (3); Antisemit (3); Revolution (3)

sex, gender roles/ body: Frau (119); Machismo (10); Sexismus (10); Geschlecht (11); männlich (14); feminin (5); weiblich (12); sexual (6); schwul (4); sexuell (3); sexy (3); Weib (3); Mutter (5)

feminist ideas,: Emanzipation (13); Ikone (12); Gleichberechtigung (20); neu (61); feministisch (10); Feministin (6); Wegbereiter (3); Schwarzer Alice (6)

academic/ arts/ literature: Pusch (12); Thema (43); Basis (5); akademisch (7); Theorie (6); Geschichte (9); Diskussion (5); Autor (3); denken (5); Philosophie (4); Kampf (5); Werk (4); Ansatz (3); Vortrag (3); schreiben (3); Kritik (3); Kultur (3); Medium (4); Frage (4); lernen (3); Wut (3)

places: amerikanisch (11); westlich (7); irisch (3)

time: 70er (7); siebziger Jahr (4); Zeit (5); Jahr (14); Jahrzehnt (13); heute (6)

religion: Schabbat (3); Bibel (3)

social trends: Konsumgesellschaft (7); Esoterik (3); sozial (4); Pop (5)

67. Search terms used in Nexis UK database to select articles for inclusion in Climate Change corpus
– US and UK: climate change / global warming / greenhouse effect
– FR: changement climatique / effet de serre / réchauffement de la planète / réchauffement climatique
– GE: Klimawandel / globale Erwärmung / Treibhauseffekt / Klimakatastrophe

68. Climate change in the media over time
[Chart: number of articles / news items per year, 1984 to 2007, for UK, US, GE and FR; vertical axis from 0 to 70,000]

69. (edited) FREQUENCY lists – most frequent content words: ENGLISH

climate change                global warming                greenhouse effect
US             UK             US             UK             US             UK
climate        climate        global         global         global         greenhouse
new            change         warming        warming        warming        effect
warming        energy         new            new            greenhouse     years
energy         new            energy         climate        new            energy
change         people         people         people         years          global
years          years          years          years          carbon         new
emissions      mr             president      change         effect         carbon
environmental  global         state          world          climate        world
state          world          bush           energy         energy         climate
people         us             climate        last           dioxide        per
percent        government     environmental  pg             environmental  warming
states         carbon         time           type           scientists     people
president      pg             world          mr             percent        dioxide

Further items (column not recoverable): greenhouse, emissions, percent, world, change, carbon, uk, states, people, us, world, environment, emissions, time, time
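Frequency lists of content words like those on this slide can be produced with a few lines of code. The following is a minimal sketch, assuming simple lowercasing, a regular-expression tokeniser and a tiny illustrative stop list. The slide does not describe the actual preprocessing or software, so `STOPWORDS`, `content_word_frequencies` and the sample sentence are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop list (assumption); a real study would use a fuller one.
STOPWORDS = frozenset({
    "the", "a", "an", "of", "and", "to", "in", "is", "that",
    "on", "for", "it", "as", "are", "was", "be",
})


def content_word_frequencies(text, stopwords=STOPWORDS):
    """Count lowercased word tokens, leaving out the stop-listed words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tok for tok in tokens if tok not in stopwords)


# Invented sample text, not taken from the Nexis data.
sample = ("Global warming is a threat. The climate is changing, "
          "and global emissions of carbon are rising.")
freq = content_word_frequencies(sample)
top = freq.most_common(1)
print(top)
```

Lowercasing before counting would yield entries such as mr, bush and uk in lower case, as in the lists above. Items like pg and type look like database metadata rather than article text, which is one reason a cleaning step before counting matters.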

70. (edited) COLLOCATION lists – FR / GE content word collocates (+/-5 words)

FR node terms: changement climatique, réchauffement climatique, effet de serre
FR collocates (distinct items; column not recoverable): contre, gaz, réchauffement, climatique, lutte, planète, lutter, planétaire, pays, climat, conférence, ges, conséquences, effets, onu, unies, effet, serre, responsables, nations, face, rapport, global, co2

GE node terms: Klimawandel, Erwärmung, Treibhauseffekt, Klimakatastrophe
GE collocates (distinct items; column not recoverable): Kampf, Globale, Menschen, Drohende, Folgen, Drohenden, Globalen, Kohlendioxid, Verhindern, Grad, Prozent, Erde, Wort, Diskussion, Verantwortlich, Welt, Deutschland, Globaler, Verstärkt, Jahres, Maßnahmen, Folge, Verursachten, Abzuwenden, Wissenschaft, Auswirkungen, Co2, Weltweiten, Anpassung, Ozonloch, Bedroht, Ende, Erdatmosphäre, Atmosphäre, Beitrag, Klimawandel, Gore
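The "+/-5 words" in the slide title refers to the window around each occurrence of the node word within which collocates are counted. A minimal sketch of that counting step, assuming a simple word tokeniser and a hand-made stop list, is given below. The function name `window_collocates`, the stop list and the German sample sentences are hypothetical, and the slide does not say which software produced the lists.

```python
import re
from collections import Counter


def window_collocates(text, node, span=5, stopwords=frozenset()):
    """Count words occurring within `span` tokens either side of `node`."""
    tokens = re.findall(r"\w+", text.lower())
    node = node.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - span):i]    # up to `span` tokens before the node
        right = tokens[i + 1:i + 1 + span]   # up to `span` tokens after the node
        counts.update(t for t in left + right
                      if t not in stopwords and t != node)
    return counts


# Invented sample (roughly: "The fight against climate change has begun.
# People feel the consequences, and climate change threatens the earth.")
sample = ("Der Kampf gegen den Klimawandel hat begonnen. Die Menschen spüren "
          "die Folgen, und der Klimawandel bedroht die Erde.")
stop = frozenset({"der", "die", "den", "und", "hat", "gegen"})
counts = window_collocates(sample, "Klimawandel", span=5, stopwords=stop)
print(counts.most_common())
```

This sketch lets the window run across sentence boundaries and matches exact word forms only, so an inflected form such as Klimawandels would not count as the node. Both are design choices that a real study would need to state explicitly.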

71. Climate Change: Conclusion
• Commonality across countries
  – Similar pattern of media attention
  – steep rise in attention after 2005
• Divergence across countries:
  – sense of urgency in EU countries
  – self-referential discourse, BUT EU looks very much to US, US is self-centred
  – different frequency of key terms
  – US discourse is dependent on key terms
    • GW more dramatizing than CC (‘threat’, ‘action’, ‘fight’)
    • No difference in UK
  – EU is more dramatizing overall
    • Esp. FR and GE: ‘lutte’, ‘combat’, ‘Kampf’, ‘verhindern’, ‘drohend’
• Future research:
  – Understand Nexis better
  – Better cleaning algorithms; more rigorously defined datasets
  – analyse more items in more depth; add search terms revealed by the pilot data: eg Klimaschutz
  – identify agency, discourse participants, subtopics
  – Widen country sample (Brazil, China, India)