A corpus-driven study of the language of hotel websites

A corpus-driven study of the language of hotel websites: Semantic fields and phraseology
Winnie Cheng
Department of English, The Hong Kong Polytechnic University
Conference on Corpus Linguistics and ESP 2014
Northwestern Polytechnical University, Xi'an, Shaanxi
17-18 October 2014


Research project
Title: Multimodal analysis of hotel homepages: A comparison of hotel websites across different star categories (4-ZZAV)
Co-Investigator: Dr Amy Suen
Aim: To compare a sample of websites of hotels of different star categories (3-star, 4-star and 5-star) in Hong Kong in terms of:
• Functionality
• Prototypical and hyperlinked structure
• Multi/Hypersemioticity

Purpose of talk
To report on comparative findings of a corpus-driven study of how hotels of different star categories in Hong Kong describe and promote their entities (hotel products and services) in the English introductory texts of their homepages. The study examines the semantic fields (using Wmatrix) and phraseological patterns (using ConcGram) characteristic of the three corpora in order to gain an informed understanding of the ‘aboutness’ of each group of hotels.


Overview of talk
• Corpus linguistics and ESP
• Corpus-based and corpus-driven approaches
• The phraseological tendency of language
• Aboutness
• Semantic categories/fields
• Use of different corpus analytical programs/tools

Corpus linguistics and ESP


Corpus linguistics and ESP
English for Specific Purposes (ESP): RCPCE Profession-specific Corpora
• English for Academic Purposes (EAP): Corpus of Research Articles (CRA)
• English for Professional Purposes (EPP): Hong Kong Financial Services Corpus, Engineering Corpus, Surveying and Construction Corpus


Online RCPCE Profession-specific Corpora
• Hong Kong Corpus of Spoken English (1 million words, prosodically transcribed)
• Hong Kong Corpus of Surveying and Construction Engineering (5.7 million words)
• Hong Kong Engineering Corpus (9.2 million words)
• Hong Kong Financial Services Corpus (7.3 million words)
• Hong Kong Budget Speeches Corpus 1997 – 2010 (176,515 words)
• Hong Kong Policy Address Speeches Corpus 1997 – 2009 (153,198 words)
• Corpus of Research Articles (5.7 million words)
ConcGramOnline©, Chris Greaves


Corpus of Research Articles (CRA)
• 5.6 million words
• 39 disciplines
• For each discipline, 20 articles in 20 popular journals with high impact factors in 2007
• 780 journals
• 16 research article sections


39 disciplines in CRA
1. Accounting & Finance
2. Anthropology
3. Applied Biology & Chemical Technology
4. Applied Linguistics
5. Applied Mathematics
6. Applied Physics
7. Applied Social Sciences
8. Archaeology
9. Building & Real Estate
10. Building Services Engineering
11. Civil & Structural Engineering
12. Computing
13. Design
14. Economics
15. Education
16. Electrical Engineering
17. Electronic & Information Engineering
18. Geography
19. Health Technology & Informatics
20. History
21. History of Art
22. Hotel & Tourism Management
23. Industrial & Systems Engineering
24. Land Surveying & Geoinformatics
25. Law
26. Linguistics
27. Literature
28. Logistics
29. Management & Marketing
30. Mechanical Engineering
31. Music
32. Nursing
33. Optometry
34. Philosophy
35. Politics
36. Psychology
37. Rehabilitation Sciences
38. Sociology
39. Textiles & Clothing


16 sections in CRA (16 sub-corpora)
1. Abstract
2. Application
3. Conclusion
4. Directions
5. Discussion
6. Implications
7. Introduction
8. Limitations
9. Literature Review
10. Method and Results
11. Method, Results and Discussion
12. Results and Discussion
13. Method
14. Recommendations
15. Results
16. Summary

The corpus-based approach 

The corpus is valuable as a source of quantitative evidence, but often “the potential of corpus evidence is not exploited fully … in order not to threaten some existing theoretical positions” (Tognini-Bonelli 2001: 10), i.e.
• “the data is relegated to a secondary position with respect to the theoretical statement proper” (p. 68)
• the theoretical categories/statements are not revised on the basis of corpus evidence


The corpus-based approach
A corpus is used as an inventory of language data (a repository) from which appropriate material is extracted:
• to support intuitive knowledge
• to extract illustrative examples
• to quantify linguistic phenomena
• to verify expectations
• to find proof for existing language theories

The corpus-driven approach
• A corpus is used “beyond the selection of examples to support or quantify a pre-existing theoretical category. Here the theoretical statement can only be formulated in the presence of corpus evidence and is fully accountable to it” (Tognini-Bonelli, 2001: 11), i.e. any conclusions or claims are made exclusively on the basis of corpus observations.
• The approach makes minimal a priori assumptions about the linguistic categories and units that should be used for the analysis (Biber, 2009: 278).

The phraseological tendency of language

Aboutness


Meaning is created
I couldn’t believe that I could actually understand what I was reading. The phenomenal power of the human mind:

According to a researcher at Cambridge University, it doesn’t matter in what order the letters in a word are; the only important thing is that the first and last letters be in the right place. The rest can be a total mess and you can still read it without a problem. This is because the human mind does not read every letter by itself, but the word as a whole. Amazing, huh? Yeah, and I always thought spelling was important!

Meaning is created
… the human mind does not read every word in a clause by itself, but the co-selection of words as a whole.

The phraseological tendency of language
• The way in which words are co-selected by speakers and writers, i.e., the word co-occurrences evident in texts
• To describe fully the meaning and use of language

Aboutness
• ‘aboutness’ (Phillips, 1983): the phraseology of the language contained in a text or a corpus that is specific to a discipline or a profession

Phraseological variation
Clusters/n-grams/bundles/chunks, i.e. patterns of contiguous words such as ‘you know’, ‘in terms of’, ‘a lot of’, ‘work hard’, etc.

But what about patterns with phraseological variation, e.g.
– ‘a lot of business people’ and ‘a lot of different types of people’
– ‘work hard’, ‘work very hard’, ‘hard work’, and ‘hard I had to work’?

How can the phraseological tendency of a language be objectively and formally identified? Is there a means to extract, fully and automatically, phraseologies which exhibit variation from a corpus?

Cheng, W., Greaves, C., Sinclair, J. McH., & Warren, M. (2009). Uncovering the extent of the phraseological tendency: Towards a systematic analysis of concgrams. Applied Linguistics, 30(2), 236-252.
Cheng, W., Greaves, C., & Warren, M. (2006). From n-gram to skipgram to concgram. International Journal of Corpus Linguistics, 11(4), 411-433.

How to uncover phraseologies (1/2)
• n-gram (bigrams, trigrams, etc.): contiguous words which constitute a pattern of use and which recur in a corpus, e.g., in terms of, in terms of the
• skipgram: non-contiguous word co-occurrences of limited membership which constitute a pattern of use and which recur in a corpus, e.g., a lot of business people, a lot of different types of people
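The difference between the two pattern types can be made concrete with a short sketch. The function names are hypothetical, and the skipgram definition used here (n words kept in their original order, skipping up to k intervening words in total) is the common k-skip-n-gram formulation, assumed for illustration rather than taken from Cheng, Greaves and Warren (2006).

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Contiguous n-word sequences, e.g. ('in', 'terms', 'of')."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n, k):
    """n-word sequences in their original order, skipping up to k words in total
    (k-skip-n-gram; an assumed definition for this sketch)."""
    grams = []
    for start in range(len(tokens)):
        # the last word may sit at most n - 1 + k positions after the first
        window = range(start + 1, min(len(tokens), start + n + k))
        for rest in combinations(window, n - 1):
            grams.append(tuple(tokens[i] for i in (start, *rest)))
    return grams


corpus = ["a lot of business people", "a lot of different types of people"]
counts = Counter()
for sentence in corpus:
    # count each distinct skipgram once per sentence
    counts.update(set(skipgrams(sentence.split(), n=4, k=3)))

print(ngrams("in terms of the".split(), 3))
# [('in', 'terms', 'of'), ('terms', 'of', 'the')]
print(counts[("a", "lot", "of", "people")])  # 2
```

Only the skipgram search recovers the shared pattern ‘a lot of … people’; a contiguous 4-gram search finds it in neither sentence.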

How to uncover phraseologies (2/2)

Defining ‘concgram’: all of the permutations of constituency and positional variation generated by the co-occurrence of two or more words (Cheng, Greaves, & Warren, 2006), e.g., ‘work hard’, ‘work very hard’, ‘hard work’, and ‘hard I had to work’.
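The core idea of a two-word concgram, matching two words in either order and with any intervening words, can be sketched in a few lines. This is a simplified illustration and not ConcGram's actual algorithm; the function name and the span of four words either side are assumptions chosen for the example.

```python
def two_word_concgram(tokens, word_a, word_b, span=4):
    """Return the stretch of text covered by each co-occurrence of word_a and
    word_b within `span` words of each other, in either order (positional
    variation) and with any words in between (constituency variation)."""
    word_a, word_b = word_a.lower(), word_b.lower()
    lowered = [t.lower() for t in tokens]
    hits = []
    for i, tok in enumerate(lowered):
        if tok != word_a:
            continue
        for j in range(max(0, i - span), min(len(lowered), i + span + 1)):
            if j != i and lowered[j] == word_b:
                start, end = min(i, j), max(i, j)
                hits.append(" ".join(tokens[start:end + 1]))
    return hits


sentences = ["we work hard", "you work very hard", "it is hard work", "hard I had to work"]
matches = [m for s in sentences for m in two_word_concgram(s.split(), "work", "hard")]
print(matches)  # ['work hard', 'work very hard', 'hard work', 'hard I had to work']
```

A single search for the pair work/hard thus retrieves all four variants listed in the definition, which an n-gram search for ‘work hard’ would not.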

ConcGram 1.0 (Greaves, 2009)
"a search-engine, which on top of the capability to handle constituency variation (i.e. AB, ACB), also handles positional variation (i.e. AB, BA), conducts fully automated searches, and searches for word associations of any size." (Cheng, Greaves, & Warren, 2006: 413)
Main search functions:
• Unique words – to generate a unique word list
• Concgrams – to generate a concgram list
• Concordance search – to search for words
• Concgram search – to search for concgrams (two-word, three-word, up to five-word)
• Exclusion list – to exclude the top 50 words in the BNC


‘because/so’ in the British National Corpus
1. won't know is that she's never bothered to ask because she's not talking so it's okay while there but
2. and you've got to have the front door [unclear] because there's a bar at back so these are special
3. the taxi. He goes well, let me read it. Because, because I'm a complete stranger so I don't have to spend
4. you see, J Julie's likely to do quite a lot because she's got to stay there so you've got to
5. home, Rowan's mother wouldn't let her have it because it was too revealing and so Penny was stuck with
6. the morning Yeah, no it wouldn't be tomorrow because I think my mum's working so Yeah It doesn't
7. No, it's not going to cost her any more, because it's included in the plan, so it's not going to
8. with Chris and Chris insisted that he did it. Because he's got a plan of the site so he wants to know
9. give you a bit of my advice [unclear] on a lead, because er you haven't had the call so you ought to be
10. scratch in Alan's well equipped kitchen. But because Linda has to stop half way through so that other
11. the movement and people need labels. I think , because the society does want to categorise people so
12. 1960s were in the lowest housing class. This was because they generally had low and insecure incomes, so
13. he, was he so naughty to you? so Richard's crying because he'd been hitting him the face. He's howling and
14. get the land. Er, so I just make that point because of the debate last week. Thank you. Thank you.
15. I'll put those down, so let's find some of these because obviously you won't have met them all, maybe.
16. hundreds of years. So he brought his family over because negotiations were taking so long, and he
17. at this time. So it must be that one Mm mm because the other chap comes about half past eight in the
18. that's fine, so I'll have to get it in soon because I won't be able to get him in till about for
19. [unclear] So to get this You had to pay this, because when it came to the end of the quarter, you had
20. worried so I thought well I might as well go up because I shall start to worry and things get out of
21. effort so that they will fear losing their jobs because the alternative jobs are less well paid (see
22. that. So you, you've got to think about those, because if you want to survive, and you also want to go
23. it. So I think that that's an important point, because I do believe that weight is placed by the
24. so then we can talk about lobbying Parliament, because we can't do it without them. We need a focus
25. so before anybody jumps for it, think about it, because it's boring. Now down to business I would like
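Concordance lines such as these are conventionally displayed in key-word-in-context (KWIC) form. The sketch below is a hypothetical helper, not part of ConcGram or any BNC interface: it centres each hit of a node word and, optionally, keeps only the lines in which a second word occurs within a given span, which is roughly how a ‘because/so’ search narrows a concordance. The span and width defaults are arbitrary choices for the example.

```python
def kwic(tokens, node, collocate=None, span=5, width=30):
    """Key-word-in-context lines centred on `node`. If `collocate` is given, a
    line is kept only when that word occurs within `span` words either side."""
    node = node.lower()
    collocate = collocate.lower() if collocate is not None else None
    lowered = [t.lower() for t in tokens]
    lines = []
    for i, tok in enumerate(lowered):
        if tok != node:
            continue
        context = lowered[max(0, i - span):i] + lowered[i + 1:i + 1 + span]
        if collocate is not None and collocate not in context:
            continue
        left = " ".join(tokens[:i])[-width:]      # right-aligned left context
        right = " ".join(tokens[i + 1:])[:width]  # left-aligned right context
        lines.append(f"{left:>{width}} {tokens[i]} {right}")
    return lines


text = "I stayed in because it rained so I read . We left because we were late ."
for line in kwic(text.split(), "because", collocate="so"):
    print(line)
```

Only the first ‘because’ is kept, since ‘so’ does not occur within five words of the second one.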

A sample concordance of ‘political/Hong Kong’ in the Western Media Corpus 2006-2008


‘Asia/world/city’


Semantic categories/fields
Semantic preference: the relation between an individual word or lexical item and semantically related words (Sinclair, 1996, 2004)

Corpus program

Wmatrix: A web-based software tool for corpus analysis and comparison
• Developed by Paul Rayson since 1998
• A web interface to the USAS and CLAWS corpus annotation tools for automatic semantic tagging and part-of-speech (POS) tagging
• Standard corpus linguistic methodologies, e.g.
– Word frequency profiles
– Concordances
– Key grammatical categories and key semantic domains: part-of-speech annotation and semantic content analyses

UCREL* Semantic Analysis System (USAS)
A framework for undertaking the automatic semantic analysis of text
• 21 major discourse fields
• 232 semantic categories
• 453 tagsets

Semantic field: related words and multi-word expressions grouped into a conceptual category

* UCREL: University Centre for Computer Corpus Research on Language, Lancaster University
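To make the idea of a semantic field concrete, the toy sketch below groups words and multi-word expressions under USAS-style tags and counts how often each field occurs, the kind of frequency profile summarised in the semantic tagset tables of this talk. It is not USAS: the real system relies on a large lexicon, a multi-word expression list and context-based disambiguation, whereas the lexicon and the function name here are hypothetical.

```python
from collections import Counter

# Hypothetical mini-lexicon. The tag codes and labels are USAS categories that
# appear in this talk, but these word-to-tag assignments are illustrative only
# and are NOT taken from the real USAS lexicon.
LEXICON = {
    ("the",): "Z5", ("in",): "Z5", ("of",): "Z5", ("and",): "Z5",  # Grammatical bin
    ("hotel",): "H4",               # Residence
    ("rooms",): "H2",               # Parts of buildings
    ("elegant",): "O4.2+",          # Judgement of appearance: Beautiful
    ("hong", "kong"): "Z2",         # Geographical names (multi-word expression)
    ("causeway", "bay"): "Z2",
}
MAX_LEN = max(len(key) for key in LEXICON)


def tag_semantic_fields(tokens):
    """Greedy longest-match tagging: multi-word expressions win over single
    words, and anything not in the lexicon is tagged Z99 (Unmatched)."""
    lowered = [t.lower() for t in tokens]
    tagged, i = [], 0
    while i < len(lowered):
        for n in range(min(MAX_LEN, len(lowered) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in LEXICON:
                tagged.append((" ".join(tokens[i:i + n]), LEXICON[key]))
                i += n
                break
        else:  # no lexicon entry starts at this token
            tagged.append((tokens[i], "Z99"))
            i += 1
    return tagged


tokens = "Elegant rooms in the heart of Causeway Bay , Hong Kong".split()
tagged = tag_semantic_fields(tokens)
field_counts = Counter(tag for _, tag in tagged)
print(tagged)
print(field_counts)
```

Treating ‘Causeway Bay’ and ‘Hong Kong’ as single units is what lets each count once under Geographical names rather than as two unmatched words.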

21 Discourse fields

Source: http://ucrel.lancs.ac.uk/usas/

Constituent Likelihood Automatic Word-tagging System (CLAWS)
• Part-of-speech (POS) tagging
• Latest version: CLAWS4, used to POS-tag c. 100 million words of the British National Corpus (BNC)
• 137 tagsets
• Extracts different word classes automatically
• Assigns a part-of-speech (POS) tag to each word

A corpus-driven study of the language of hotel websites: Semantic fields and phraseology

English for Professional Purposes (EPP)


Three hotel homepage corpora
Hotel category | Number of hotels | Types / tokens
3-star hotels | 147 (60.8%) | 1,571 / 7,386
4-star hotels | 77 (31.8%) | 1,604 / 6,532
5-star hotels | 18 (7.4%) | 1,328 / 4,949
Total | 242 (100%) | 18,867 tokens
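The third column of the table gives type and token counts rather than a ratio. The short sketch below, with a hypothetical function name, derives the type/token ratios from those counts and checks the totals. Because raw TTR tends to fall as a corpus grows, the higher value for the smaller 5-star corpus should be read with caution.

```python
# Hotel counts and type/token figures copied from the table above.
CORPORA = {
    "3-star": {"hotels": 147, "types": 1571, "tokens": 7386},
    "4-star": {"hotels": 77, "types": 1604, "tokens": 6532},
    "5-star": {"hotels": 18, "types": 1328, "tokens": 4949},
}


def type_token_ratios(corpora):
    """Type/token ratio: distinct word forms divided by running words."""
    return {name: c["types"] / c["tokens"] for name, c in corpora.items()}


ttr = type_token_ratios(CORPORA)
total_hotels = sum(c["hotels"] for c in CORPORA.values())
total_tokens = sum(c["tokens"] for c in CORPORA.values())

for name, value in ttr.items():
    print(f"{name}: TTR = {value:.3f}")
print(f"Total: {total_hotels} hotels, {total_tokens:,} tokens")
```

This prints TTRs of 0.213, 0.246 and 0.268 for the 3-, 4- and 5-star corpora, and totals of 242 hotels and 18,867 tokens, matching the table.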


Semantic tagsets
Corpus | No. of semantic tagsets | Number of hotels
3-star corpus | 247 (34.7%) | 147 (60.8%)
4-star corpus | 244 (34.3%) | 77 (31.8%)
5-star corpus | 221 (31%) | 18 (7.4%)


Top 20 semantic tagsets of the 3-star corpus
Rank | Semantic tagset | Frequency
1 | Z5 Grammatical bin | 1899
2 | Z2 Geographical names | 297
3 | Z8 Pronouns | 223
4 | H4 Residence | 201
5 | Z99 Unmatched | 192
6 | M6 Location and direction | 179
7 | M7 Places | 160
8 | A3+ Existing | 150
9 | M1 Moving, coming and going | 133
10 | Z1 Personal names | 111
11 | H2 Parts of buildings | 105
12 | Z3 Other proper names | 92
13 | I2.1 Business: Generally | 88
14 | O4.2+ Judgement of appearance: Beautiful | 81
15 | S8+ Helping | 77
16 | A9- Giving | 77
17 | I2.2 Business: Selling | 75
18 | N1 Numbers | 73
19 | N5.1+ Entire; maximum | 67
20 | H1 Architecture, houses and buildings | 66

Top 20 semantic tagsets of the 4-star corpus
Rank | Semantic tagset | Frequency
1 | Z5 Grammatical bin | 1729
2 | Z2 Geographical names | 240
3 | Z8 Pronouns | 221
4 | Z99 Unmatched | 167
5 | H4 Residence | 153
6 | M6 Location and direction | 134
7 | M7 Places | 134
8 | A3+ Existing | 109
9 | M1 Moving, coming and going | 109
10 | H2 Parts of buildings | 108
11 | O4.2+ Judgement of appearance: Beautiful | 98
12 | N1 Numbers | 96
13 | Z1 Personal names | 93
14 | F1 Food | 80
15 | I2.2 Business: Selling | 75
16 | Z3 Other proper names | 72
17 | S8+ Helping | 70
18 | I2.1 Business: Generally | 65
19 | A5.1+ Evaluation: Good | 56
20 | A9- Giving | 54

Top 20 semantic tagsets of the 5-star corpus
Rank | Semantic tagset | Frequency
1 | Z5 Grammatical bin | 1320
2 | Z2 Geographical names | 183
3 | Z99 Unmatched | 160
4 | Z8 Pronouns | 145
5 | H4 Residence | 125
6 | O4.2+ Judgement of appearance: Beautiful | 110
7 | M6 Location and direction | 105
8 | M7 Places | 103
9 | A3+ Existing | 90
10 | Z1 Personal names | 83
11 | N1 Numbers | 82
12 | F1 Food | 80
13 | H2 Parts of buildings | 73
14 | Z3 Other proper names | 54
15 | H1 Architecture, houses and buildings | 52
16 | M1 Moving, coming and going | 52
17 | A9- Giving | 47
18 | N5.1+ Entire; maximum | 45
19 | T1.3 Time: Period | 44
20 | W3 Geographical terms | 42

Comparison of the top 20 semantic tagsets (N/A = not in that corpus's top 20)
No. | Semantic tagset (Wmatrix) | 3-star rank | 4-star rank | 5-star rank
1 | Z5 Grammatical bin | 1 | 1 | 1
2 | Z2 Geographical names | 2 | 2 | 2
3 | Z8 Pronouns | 3 | 3 | 4
4 | H4 Residence | 4 | 5 | 5
5 | Z99 Unmatched | 5 | 4 | 3
6 | M6 Location and direction | 6 | 6 | 7
7 | M7 Places | 7 | 7 | 8
8 | A3+ Existing | 8 | 8 | 9
9 | M1 Moving, coming and going | 9 | 9 | 16
10 | Z1 Personal names | 10 | 13 | 10
11 | H2 Parts of buildings | 11 | 10 | 13
12 | Z3 Other proper names | 12 | 16 | 14
13 | I2.1 Business: Generally | 13 | 18 | N/A
14 | O4.2+ Judgement of appearance: Beautiful | 14 | 11 | 6
15 | S8+ Helping | 15 | 17 | N/A
16 | A9- Giving | 16 | 20 | 17
17 | I2.2 Business: Selling | 17 | 15 | N/A
18 | N1 Numbers | 18 | 12 | 11
19 | N5.1+ Entire; maximum | 19 | N/A | 18
20 | H1 Architecture, houses and buildings | 20 | N/A | 15
21 | F1 Food | N/A | 14 | 12
22 | A5.1+ Evaluation: Good | N/A | 19 | N/A
23 | T1.3 Time: Period | N/A | N/A | 19
24 | W3 Geographical terms | N/A | N/A | 20

Comparing frequency lists in Wmatrix
Reference corpus: British English 2006 (BE06): 929,862 words from published general written British English. It has the same sampling frame as the LOB and FLOB corpora.
Types of comparisons:
o 3-star corpus & BE06
o 4-star corpus & BE06
o 5-star corpus & BE06
o 3-star corpus & 5-star corpus
o 4-star corpus & 5-star corpus

Screenshot of Wmatrix (comparing frequency lists)


Log-likelihood (LL) cut-off
The default LL cut-off is 6.63. In this study it was raised to 7 because ‘to be statistically significant you should look at items with a LL value over about 7, since 6.63 is the cut-off for 99% confidence of significance.’ (Wmatrix, http://ucrel.lancs.ac.uk/llwizard.html)
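The keyness test behind these comparisons can be sketched as follows. This is a minimal sketch of the standard two-corpus log-likelihood calculation described on the UCREL LL wizard page cited above; Wmatrix's own implementation may differ in detail, and the function names and the frequencies in the usage lines are hypothetical.

```python
import math

LL_CUTOFF = 7.0  # cut-off adopted in this study; 6.63 corresponds to p < 0.01 (1 d.f.)


def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Log-likelihood (G2) for one word or tag across two corpora:
    LL = 2 * sum(O * ln(O / E)) over the study and reference corpora."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # expected frequencies if the item were equally frequent in both corpora
    expected = (size_study * total_freq / total_size,
                size_ref * total_freq / total_size)
    ll = 0.0
    for observed, exp in zip((freq_study, freq_ref), expected):
        if observed > 0:  # the term 0 * ln(0) is taken as 0
            ll += observed * math.log(observed / exp)
    return 2 * ll


def is_key(freq_study, freq_ref, size_study, size_ref, cutoff=LL_CUTOFF):
    """True if the item's LL value exceeds the cut-off."""
    return log_likelihood(freq_study, freq_ref, size_study, size_ref) > cutoff


# Hypothetical item: 50 hits in the 7,386-word 3-star corpus, 100 in the 929,862-word BE06
print(is_key(50, 100, 7386, 929862))             # True
# Same relative frequency (1%) in both corpora gives no evidence of keyness
print(log_likelihood(10, 1000, 1000, 100000))    # 0.0
```

LL itself is non-directional: comparing the two relative frequencies shows whether a key item is over- or under-used in the study corpus.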


Key semantic fields
Study corpus | Reference corpus
3-star hotel corpus | British English 2006 (BE06): 929,862 words from published general written British English, with the same sampling frame as the LOB and FLOB corpora
4-star hotel corpus | BE06
5-star hotel corpus | BE06
3-star hotel corpus | 5-star hotel corpus
4-star hotel corpus | 5-star hotel corpus


Study corpus: 3-star corpus; Reference corpus: BE06 (1/5)
Rank | Item | Tagset
1 | H4 | Residence
2 | Z2 | Geographical names
3 | Z8 | Pronouns
4 | M7 | Places
5 | H2 | Parts of buildings
6 | I2.1 | Business: Generally
7 | O4.2+ | Judgement of appearance: Beautiful
8 | A12+ | Easy
9 | K1 | Entertainment generally
10 | I2.2 | Business: Selling
11 | H1 | Architecture, houses and buildings
12 | Z6 | Negative
13 | Z3 | Other proper names
14 | N3.3- | Distance: Near
15 | A9- | Giving
16 | E3+ | Calm
17 | M3 | Vehicles and transport on land
18 | Q2.1 | Speech: Communicative
19 | P1 | Education in general
20 | M6 | Location and direction

Study corpus: 3-star corpus; Reference corpus: BE06 (2/5)
Rank | Item | Tagset
21 | Y2 | Information technology and computing
22 | A5.1+++ | Evaluation: Good
23 | S1.2.1+ | Informal/Friendly
24 | Q2.2 | Speech acts
25 | M4 | Sailing, swimming, etc.
26 | S8+ | Helping
27 | A1.2+ | Suitable
28 | M1 | Moving, coming and going
29 | A7+ | Likely
30 | F1 | Food
31 | N4 | Linear order
32 | S4 | Kin
33 | A9 | Getting and giving; possession
34 | S6- | No obligation or necessity
35 | A13.3 | Degree: Boosters
36 | F2 | Drinks and alcohol
37 | M8 | Stationary
38 | X5.2+ | Interested/excited/energetic
39 | N5 | Quantities
40 | Q4.3 | The Media: TV, Radio and Cinema

Study corpus: 3-star corpus; Reference corpus: BE06 (3/5)
Rank | Item | Tagset
41 | G1.1 | Government
42 | S9 | Religion and the supernatural
43 | C1 | Arts and crafts
44 | T2++ | Time: Beginning
45 | G2.1 | Law and order
46 | E3- | Violent/Angry
47 | N5.1+ | Entire; maximum
48 | N3.3+ | Distance: Far
49 | W3 | Geographical terms
50 | Q1.2 | Paper documents and writing
51 | A9+ | Getting and possession
52 | B3 | Medicines and medical treatment
53 | Z4 | Discourse Bin
54 | B1 | Anatomy and physiology
55 | A1.8+ | Inclusion
56 | E2+ | Like
57 | A2.1+ | Change
58 | T2+ | Beginning
59 | X2.4 | Investigate, examine, test, search
60 | I1.3- | Cheap

Study corpus: 3-star corpus; Reference corpus: BE06 (4/5)
Rank | Item | Tagset
61 | Z5 | Grammatical bin
62 | S1.2.4+ | Polite
63 | Q3 | Language, speech and grammar
64 | S2.2 | People: Male
65 | S7.1+ | In power
66 | A2.2 | Cause&Effect/Connection
67 | O1.3- | Gasless
68 | T1.1.1 | Time: Past
69 | N5++ | Quantities: many/much
70 | W5 | Green issues
71 | A1.1.1 | General actions / making
72 | A1.1.2 | Damaging and destroying
73 | A8 | Seem
74 | I1.1 | Money and pay
75 | E6- | Worry
76 | X3.4 | Sensory: Sight
77 | T2- | Time: Ending
78 | T1 | Time
79 | S1.2.2- | Generous
80 | A11.1+ | Important

Study corpus: 3-star corpus; Reference corpus: BE06 (5/5)
Rank | Item | Tagset
81 | O4.6+++ | Temperature: Hot / on fire
82 | I1.2 | Money: Debts
83 | B5 | Clothes and personal belongings
84 | N1 | Numbers
85 | O2 | Objects generally
86 | S3.1 | Personal relationship: General
87 | M5 | Flying and aircraft
88 | A1.7+ | Constraint
89 | W2 | Light
90 | A13.1 | Degree: Non-specific
91 | X7+ | Wanted
92 | A4.2+ | Detailed
93 | S8- | Hindering
94 | Q1.1 | Linguistic Actions, States And Processes; Communication
95 | A6.1- | Comparing: Different
96 | T1.3- | Time period: short
97 | A6.1+ | Comparing: Similar

Study corpus: 4-star corpus; Reference corpus: BE06 (1/5)
Rank | Item | Tagset
1 | H4 | Residence
2 | Z2 | Geographical names
3 | O4.2+ | Judgement of appearance: Beautiful
4 | H2 | Parts of buildings
5 | Z8 | Pronouns
6 | M7 | Places
7 | A5.1+++ | Evaluation: Good
8 | I2.2 | Business: Selling
9 | I2.1 | Business: Generally
10 | F1 | Food
11 | Z6 | Negative
12 | M4 | Sailing, swimming, etc.
13 | S1.2.1+ | Informal/Friendly
14 | Q2.2 | Speech acts
15 | Z3 | Other proper names
16 | Q2.1 | Speech: Communicative
17 | F2 | Drinks and alcohol
18 | K1 | Entertainment generally
19 | E2+ | Like
20 | S8+ | Helping

Study corpus: 4-star corpus; Reference corpus: BE06 (2/5)
Rank | Item | Tagset
21 | H1 | Architecture, houses and buildings
22 | A12+ | Easy
23 | O4.2++ | Judgement of appearance: Beautiful
24 | N5 | Quantities
25 | A9- | Giving
26 | C1 | Arts and crafts
27 | A13.3 | Degree: Boosters
28 | E3+ | Calm
29 | Y2 | Information technology and computing
30 | A9+ | Getting and possession
31 | M6 | Location and direction
32 | M1 | Moving, coming and going
33 | A9 | Getting and giving; possession
34 | W3 | Geographical terms
35 | S6- | No obligation or necessity
36 | X5.2+ | Interested/excited/energetic
37 | X2.4 | Investigate, examine, test, search
38 | X3.4 | Sensory: Sight
39 | Q1.2 | Paper documents and writing
40 | P1 | Education in general

Study corpus: 4-star corpus; Reference corpus: BE06 (3/5)
Rank | Item | Tagset
41 | A1.2+ | Suitable
42 | S1.1.3+ | Participating
43 | N4 | Linear order
44 | A1.1.1 | General actions / making
45 | N3.3- | Distance: Near
46 | M8 | Stationary
47 | M3 | Vehicles and transport on land
48 | A3+ | Existing
49 | G1.1 | Government
50 | N3.6 | Measurement: Area
51 | X5.2- | Uninterested/bored/unenergetic
52 | T3- | Time: New and young
53 | Z4 | Discourse Bin
54 | A7+ | Likely
55 | S9 | Religion and the supernatural
56 | S1.2.4+ | Polite
57 | Q3 | Language, speech and grammar
58 | A6.1- | Comparing: Different
59 | S4 | Kin
60 | N5--- | Quantities: little

Study corpus: 4-star corpus; Reference corpus: BE06 (4/5)
Rank | Item | Tagset
61 | N6+ | Frequent
62 | F3- | Non-smoking / no use of drugs
63 | Z7 | If
64 | K2 | Music and related activities
65 | A2.1+ | Change
66 | B1 | Anatomy and physiology
67 | S2.1 | People: Female
68 | G2.1 | Law and order
69 | A1.8+ | Inclusion
70 | O2 | Objects generally
71 | I1.1 | Money and pay
72 | B5 | Clothes and personal belongings
73 | H5 | Furniture and household fittings
74 | A1.1.2 | Damaging and destroying
75 | A8 | Seem
76 | N3.3 | Measurement: Distance
77 | T2++ | Time: Beginning
78 | Q1.1 | Linguistic Actions, States And Processes; Communication
79 | Q4.3 | The Media: TV, Radio and Cinema
80 | Z5 | Grammatical bin

Study corpus: 4-star corpus; Reference corpus: BE06 (5/5)
Rank | Item | Tagset
81 | N3.2+ | Size: Big
82 | I3.2 | Work and employment: Professionalism
83 | I1.2 | Money: Debts
84 | A1.2 | Suitability
85 | A2.2 | Cause&Effect/Connection
86 | G2.1- | Crime

Study corpus: 5-star corpus; Reference corpus: BE06 (1/4)
Rank | Item | Tagset
1 | H4 | Residence
2 | O4.2+ | Judgement of appearance: Beautiful
3 | Z2 | Geographical names
4 | Z8 | Pronouns
5 | M7 | Places
6 | H2 | Parts of buildings
7 | F1 | Food
8 | A5.1+++ | Evaluation: Good
9 | M4 | Sailing, swimming, etc.
10 | O4.2++ | Judgement of appearance: Beautiful
11 | H1 | Architecture, houses and buildings
12 | Z6 | Negative
13 | Q2.2 | Speech acts
14 | Q2.1 | Speech: Communicative
15 | W3 | Geographical terms
16 | A7+ | Likely
17 | Z3 | Other proper names
18 | A9- | Giving
19 | S1.1.3+ | Participating
20 | E3+ | Calm

Study corpus: 5-star corpus; Reference corpus: BE06 (2/4)
Rank | Item | Tagset
21 | K1 | Entertainment generally
22 | C1 | Arts and crafts
23 | S1.2.1+ | Informal/Friendly
24 | N5 | Quantities
25 | A1.2+ | Suitable
26 | I2.1 | Business: Generally
27 | A1.1.1 | General actions / making
28 | N4 | Linear order
29 | M6 | Location and direction
30 | A1.8+ | Inclusion
31 | P1 | Education in general
32 | X5.2+ | Interested/excited/energetic
33 | I3.1 | Work and employment: Generally
34 | Z4 | Discourse Bin
35 | A12+ | Easy
36 | A13.3 | Degree: Boosters
37 | A9 | Getting and giving; possession
38 | A2.2 | Cause&Effect/Connection
39 | B2- | Disease
40 | I2.2 | Business: Selling

Study corpus: 5-star corpus; Reference corpus: BE06 (3/4)
Rank | Item | Tagset
41 | M5 | Flying and aircraft
42 | G2.1 | Law and order
43 | N5.1+ | Entire; maximum
44 | X5.2- | Uninterested/bored/unenergetic
45 | G1.1 | Government
46 | Q1.2 | Paper documents and writing
47 | E3- | Violent/Angry
48 | Z7 | If
49 | S4 | Kin
50 | F2 | Drinks and alcohol
51 | M8 | Stationary
52 | O2 | Objects generally
53 | S9 | Religion and the supernatural
54 | X3.4 | Sensory: Sight
55 | A2.1+ | Change
56 | Z99 | Unmatched
57 | N3.6 | Measurement: Area
58 | T1 | Time
59 | E2+ | Like
60 | X2.2+++ | Knowledgeable

Study corpus: 5-star corpus; Reference corpus: BE06 (4/4)
Rank | Item | Tagset
61 | O1.1 | Substances and materials: Solid
62 | L2 | Living creatures: animals, birds, etc.
63 | A3+ | Existing
64 | M2 | Putting, pulling, pushing, transporting
65 | A8 | Seem
66 | Y2 | Information technology and computing
67 | S8+ | Helping
68 | O4.3 | Colour and colour patterns
69 | S7.2+ | Respected
70 | Q4.1 | The Media: Books
71 | I1 | Money generally
72 | A13.2 | Degree: Maximizers
73 | S6+ | Strong obligation or necessity
74 | N5+ | Quantities: many/much
75 | I1.1 | Money and pay
76 | S1.2.4+ | Polite

Top ten key semantic fields (reference corpus: BE06)
Rank | 3-star | 4-star | 5-star
1 | Residence | Residence | Residence
2 | Geographical names | Geographical names | Judgement of appearance: Beautiful
3 | Pronouns | Judgement of appearance: Beautiful | Geographical names
4 | Places | Parts of buildings | Pronouns
5 | Parts of buildings | Pronouns | Places
6 | Business: Generally | Places | Parts of buildings
7 | Judgement of appearance: Beautiful | Evaluation: Good | Food
8 | Easy | Business: Selling | Evaluation: Good
9 | Entertainment generally | Business: Generally | Sailing, swimming, etc.
10 | Business: Selling | Food | Judgement of appearance: Beautiful

Study corpus: 3-star corpus; Reference corpus: 5-star corpus
Rank | Item | Tagset
1 | O4.2+ | Judgement of appearance: Beautiful
2 | S1.1.3+ | Participating
3 | O4.2++ | Judgement of appearance: Beautiful
4 | F1 | Food
5 | I2.1 | Business: Generally
6 | N3.3- | Distance: Near
7 | M1 | Moving, coming and going
8 | A12+ | Easy
9 | N1 | Numbers
10 | T2- | Time: Ending
11 | N3.3+ | Distance: Far
12 | T1.1.1 | Time: Past
13 | I2.2 | Business: Selling
14 | M3 | Vehicles and transport on land
15 | X5.2- | Uninterested/bored/unenergetic
16 | I1 | Money generally
17 | S6- | No obligation or necessity
18 | O4.6+ | Temperature: Hot / on fire
19 | I3.1 | Work and employment: Generally

Study corpus: 4-star corpus; Reference corpus: 5-star corpus
Rank | Item | Tagset
1 | I2.2 | Business: Selling
2 | A9+ | Getting and possession
3 | O4.2+ | Judgement of appearance: Beautiful
4 | I2.1 | Business: Generally
5 | M1 | Moving, coming and going

Top ten key semantic fields: 3-star and 4-star compared with 5-star
Rank | 3-star vs 5-star | 4-star vs 5-star
1 | Judgement of appearance: Beautiful | Business: Selling
2 | Participating | Getting and possession
3 | Judgement of appearance: Beautiful | Judgement of appearance: Beautiful
4 | Food | Business: Generally
5 | Business: Generally | Moving, coming and going
6 | Distance: Near | N/A
7 | Moving, coming and going | N/A
8 | Easy | N/A
9 | Numbers | N/A
10 | Time: Ending | N/A

Two-word concgrams list of the 3-star corpus


Top 20 two-word concgrams of the 3-star corpus
Rank | Two-word concgram | Frequency
1 | Kong Hong | 222
2 | Hong Kong | 209
3 | Hotel Hong | 101
4 | Hong Hotel | 100
5 | Kong Hotel | 97
6 | Hotel Kong | 94
7 | business hotel | 31
8 | business Hong | 30
9 | Hotel Our | 28
10 | Hong business | 26
11 | Hotel business | 26
12 | Hong Kowloon | 25
13 | Kowloon Hong | 25
14 | Our Hotel | 25
15 | Kong business | 24
16 | Kong Kowloon | 24
17 | Kowloon Kong | 24
18 | located Hong | 23
19 | Bay Causeway | 22
20 | business Kong | 22
Total no. of two-word concgrams: 30,607

Top 20 two-word concgrams of the 4-star corpus
Rank | Two-word concgram | Frequency
1 | Kong Hong | 168
2 | Hong Kong | 151
3 | Hotel Hong | 75
4 | Hong hotel | 74
5 | Kong hotel | 72
6 | Hotel Kong | 67
7 | Bay Causeway | 29
8 | Causeway Bay | 28
9 | business Hong | 21
10 | rooms suites | 21
11 | suites rooms | 21
12 | Bay Hong | 20
13 | Causeway Hong | 20
14 | Hong Bay | 19
15 | Kong business | 19
16 | Kong Bay | 19
17 | Bay Kong | 18
18 | business Kong | 18
19 | Hong business | 18
20 | Hong Causeway | 18
Total no. of two-word concgrams: 27,775

Top 20 two-word concgrams of the 5-star corpus
Rank | Two-word concgram | Frequency
1 | Kong Hong | 174
2 | Hong Kong | 164
3 | hotel Hong | 74
4 | hotel Kong | 72
5 | Hong hotel | 68
6 | Kong hotel | 68
7 | Kowloon Hong | 23
8 | Kowloon Kong | 23
9 | Harbour Plaza | 21
10 | Hong Kowloon | 20
11 | Kong Kowloon | 20
12 | Plaza Harbour | 20
13 | Room Harbour | 20
14 | Harbour room | 19
15 | Hong Place | 18
16 | Langham Hong | 18
17 | Place Hong | 18
18 | Place Kong | 18
19 | Grand Hong | 17
20 | Harbour Hong | 17
Total no. of two-word concgrams: 19,862

Shared two-word concgrams
Hong/Kong, Hong/hotel, Kong/hotel

Unique two-word concgrams
3-star: Hong/business, Kong/business, Causeway/Bay, business/hotel, hotel/our, Hong/Kowloon, Kong/Kowloon, Hong/located
4-star: Hong/business, Kong/business, Causeway/Bay, Causeway/Hong, rooms/suites, Hong/Bay, Kong/Bay
5-star: Hong/Kowloon, Kong/Kowloon, Harbour/Plaza, Harbour/room, Harbour/Hong, Hong/Place, Kong/Place, Hong/Grand, Hong/Langham
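The shared and unique lists can be derived from the three top-20 tables with set operations. The sketch assumes, as a simplification made here rather than a ConcGram setting, that ‘Kong Hong’ and ‘Hong Kong’ are the same word pair and that case is ignored. ‘Unique’ is read as ‘not shared by all three corpora’, which is why Hong/business is listed under both 3-star and 4-star; the function name is hypothetical.

```python
# Top-20 two-word concgrams of each corpus, copied from the tables above.
TOP20 = {
    "3-star": "Kong Hong|Hong Kong|Hotel Hong|Hong Hotel|Kong Hotel|Hotel Kong|"
              "business hotel|business Hong|Hotel Our|Hong business|Hotel business|"
              "Hong Kowloon|Kowloon Hong|Our Hotel|Kong business|Kong Kowloon|"
              "Kowloon Kong|located Hong|Bay Causeway|business Kong",
    "4-star": "Kong Hong|Hong Kong|Hotel Hong|Hong hotel|Kong hotel|Hotel Kong|"
              "Bay Causeway|Causeway Bay|business Hong|rooms suites|suites rooms|"
              "Bay Hong|Causeway Hong|Hong Bay|Kong business|Kong Bay|Bay Kong|"
              "business Kong|Hong business|Hong Causeway",
    "5-star": "Kong Hong|Hong Kong|hotel Hong|hotel Kong|Hong hotel|Kong hotel|"
              "Kowloon Hong|Kowloon Kong|Harbour Plaza|Hong Kowloon|Kong Kowloon|"
              "Plaza Harbour|Room Harbour|Harbour room|Hong Place|Langham Hong|"
              "Place Hong|Place Kong|Grand Hong|Harbour Hong",
}


def word_pairs(listing):
    """Treat each concgram as an unordered, case-insensitive pair of words."""
    return {frozenset(item.lower().split()) for item in listing.split("|")}


pairs = {name: word_pairs(listing) for name, listing in TOP20.items()}
shared = set.intersection(*pairs.values())                # in all three lists
unique = {name: p - shared for name, p in pairs.items()}  # the rest of each list

print(sorted("/".join(sorted(p)) for p in shared))  # ['hong/hotel', 'hong/kong', 'hotel/kong']
print(len(unique["3-star"]), len(unique["4-star"]), len(unique["5-star"]))  # 8 7 9
```

The three shared pairs and the 8, 7 and 9 remaining pairs correspond to the shared and unique lists given for the three corpora.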


Screenshot of three-word concgrams list of the 3-star corpus


Top 20 three-word concgrams of the 3-star corpus
Rank | Three-word concgram | Frequency
1 | Hong Kong Hotel | 100
2 | Hotel Kong Hong | 99
3 | Kong Hong Hotel | 95
4 | Kong Hotel Hong | 92
5 | Hotel Hong Kong | 91
6 | Hong Hotel Kong | 87
7 | Kong business Hong | 30
8 | Hong business Kong | 29
9 | Kong Kowloon Hong | 29
10 | Hong Express Kong | 28
11 | Hong Kowloon Kong | 28
12 | Kong Express Hong | 28
13 | business Kong Hong | 27
14 | Hong Kong business | 26
15 | Hong Kong Kowloon | 25
16 | Kong Island Hong | 25
17 | Kong located Hong | 25
18 | Kong Hong business | 24
19 | Kong Hong Kowloon | 24
20 | Kowloon Hong Kong | 24
Total no. of three-word concgrams: 355,996

Top 20 three-word concgrams of the 4-star corpus
Rank | Three-word concgram | Frequency
1 | Kong hotel Hong | 77
2 | Hong Kong hotel | 74
3 | Hotel Kong Hong | 74
4 | Kong Hong hotel | 72
5 | Hong hotel Kong | 70
6 | Hotel Hong Kong | 67
7 | Bay Hong Causeway | 21
8 | business Kong Hong | 21
9 | Causeway Hong Bay | 21
10 | Kong Bay Hong | 21
11 | Causeway Bay Hong | 20
12 | Kong business Hong | 20
13 | Kong shopping Hong | 20
14 | Bay Causeway Hong | 19
15 | Bay Kong Causeway | 19
16 | Bay Kong Hong | 19
17 | Hong Kong Bay | 19
18 | Kong Hong business | 19
19 | Kong Hong Bay | 19
20 | business Hong Kong | 18
Total no. of three-word concgrams: 305,509

Top 20 three-word concgrams of the 5-star corpus
Rank | Three-word concgram | Frequency
1 | Kong hotel Hong | 80
2 | Hong hotel Kong | 79
3 | hotel Kong Hong | 72
4 | hotel Hong Kong | 69
5 | Hong Kong hotel | 68
6 | Kong Hong hotel | 68
7 | Kong Kowloon Hong | 25
8 | Hong Langham Kong | 24
9 | Hong star Kong | 24
10 | Kong Langham Hong | 24
11 | Hong Kowloon Kong | 23
12 | Kowloon Kong Hong | 23
13 | Kong Grand Hong | 22
14 | Kong star Hong | 22
15 | Kowloon Hong Kong | 22
16 | Hong Grand Kong | 20
17 | Hong Kong Kowloon | 20
18 | Kong Airport Hong | 20
19 | Kong Hong Kowloon | 20
20 | Hong Hotels Kong | 19
Total no. of three-word concgrams: 210,406

Shared three-word concgrams

Hong/Kong/hotel


Unique three-word concgrams
3-star: Hong/Kong/business, Hong/Kong/Kowloon, Hong/Kong/Express, Hong/Kong/Island, Hong/Kong/located
4-star: Hong/Kong/business, Hong/Causeway/Bay, Kong/Causeway/Bay, Hong/Kong/Bay, Hong/Kong/shopping
5-star: Hong/Kong/Kowloon, Hong/Kong/Langham, Hong/Kong/star, Hong/Kong/Grand, Hong/Kong/Airport, Hong/Kong/Hotels

Next step of data analysis
Analysis of sample concordance lines of frequent concgrams, to describe the extended units of meaning of the concgrams using Sinclair’s (1996, 2004) five categories of co-selection of meaning:
1. Collocation
2. Colligation
3. Semantic preference
4. Semantic prosody
5. The core

30 concordance lines for Hong/Kong/hotel in the 3-star corpus


30 concordance lines for Hong/Kong/hotel in the 4-star corpus


30 concordance lines for Hong/Kong/hotel in the 5-star corpus


Conclusion
To report on comparative findings of a corpus-driven study of how hotels of different star categories in Hong Kong describe and promote their entities (hotel products and services) in the English introductory texts of their homepages, by examining the semantic fields (using Wmatrix) and phraseological patterns (using ConcGram) characteristic of the three corpora, in order to gain an informed understanding of the ‘aboutness’ of each group of hotels.

References
Sinclair, J. McH. (1996). The search for units of meaning. Textus, 9(1), 75-106.
Sinclair, J. McH. (2004). Trust the text: Language, corpus and discourse. London: Routledge.
Tognini-Bonelli, E. (2001). Corpus linguistics at work. Amsterdam; Philadelphia: John Benjamins.