Abstract: The Engineering Lecture Corpus - Visualising cross-cultural difference in discourse function
Sian Alsop, Hilary Nesi, Joel Priestley
ICAME 35: Corpus Linguistics, Context and Culture, University of Nottingham, UK; 04/2014
Available in Book of Abstracts: http://www.nottingham.ac.uk/conference/facarts/english/icame-35/book-of-abstracts.aspx
Although powerful, corpus query tools are not the most helpful for certain kinds of investigation into discourse features. The questions that drive the creation of specialised corpora often necessitate the use of specific systems of annotation, which can benefit from specific visualisation techniques. This is certainly the case with the Engineering Lecture Corpus (ELC), a small specialised corpus of lectures from the UK, Malaysia, and New Zealand. The ELC is designed to investigate the discourse of lectures, and to address whether discourse differences exist in lectures delivered in different parts of the world but in the same language medium (English), from the same discipline (Engineering), and at the same level of study (undergraduate). The ELC lectures are annotated for pragmatic features (cf. Simpson-Vlach and Leicher 2006) through inline XML annotation. Chunks of text within the transcriptions have been identified as elements (or discourse functions): explaining, housekeeping, humour, storytelling, and summarising. Subcategories are attributed to some of these; summarising, for example, comprises two types of preview and two types of review. One useful way of identifying and analysing patterns within ELC data is through a dashboard view across categories and cultural subcorpora. We know that "[t]he power of the unaided mind is highly overrated" (Norman 1993: 43) and that computational corpus techniques can reveal certain patterns unseen by the naked eye (Flowerdew 2013: 161). Yet the culture of visualisation has not been widely explored by corpus linguists (Rayson and Mariani 2009). We wanted to see the ELC annotation in a way that addresses the central research question of cultural similarity/difference. Building on an existing D3.js timeline library and the principle of small multiples, we therefore visualised dispersion and duration: where in the lecture each pragmatic function occurs and how many tokens it contains.
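As a rough illustration of how dispersion and duration might be derived from token-level annotation, the sketch below maps each element's token offsets onto a 0 to 1 scale so that lectures of different lengths can share one x axis. It is not the ELC code: the data shape, the field names (`tokenCount`, `start`, `end`), the placeholder lecture, and the helpers `normaliseElements` and `groupBy` are all assumptions made for the example, and the D3.js rendering step is left out.

```javascript
// Hypothetical data shape; the ELC's actual XML schema and scripts are not shown here.
// Each annotated element records token offsets within one lecture transcript.
const lecture = {
  id: "lecture-A", // placeholder identifier, not a real ELC file name
  subcorpus: "UK",
  tokenCount: 8000, // invented length, for illustration only
  elements: [
    { type: "summarising", start: 0, end: 400 },
    { type: "humour", start: 2000, end: 2100 },
    { type: "storytelling", start: 6000, end: 6800 },
  ],
};

// Convert token offsets to proportions of the lecture length.
function normaliseElements(lec) {
  return lec.elements.map((el) => ({
    lecture: lec.id,
    subcorpus: lec.subcorpus,
    type: el.type,
    x0: el.start / lec.tokenCount, // dispersion: where the element begins
    x1: el.end / lec.tokenCount, // where the element ends
    tokens: el.end - el.start, // duration, counted in tokens
  }));
}

// Group spans by a chosen variable ("type" or "subcorpus"),
// giving one group per row in a small-multiples layout.
function groupBy(spans, key) {
  const groups = new Map();
  for (const span of spans) {
    if (!groups.has(span[key])) groups.set(span[key], []);
    groups.get(span[key]).push(span);
  }
  return groups;
}

const spans = normaliseElements(lecture);
const byType = groupBy(spans, "type");
console.log(spans);
console.log([...byType.keys()]);
```

Each resulting span could then be drawn as a bar from `x0` to `x1` on a shared horizontal scale, with one row per lecture, element type, or subcorpus.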
The JavaScript uses normalised start and end indices to plot each category along the x axis, which can be displayed according to variables of element or subcorpus. This allows us to build a picture of what is happening and where in selected lectures, viewing the corpus in terms of any combination of pragmatic category or cultural component. Each lecture is presented in vertical alignment to facilitate comparison. This overview will also be interactive, allowing full text to be accessed via visual representation. Early findings suggest that significant differences occur in lectures cross-culturally. The narrative storytelling and bawdy and ironic humour types, for example, occur significantly more frequently in the UK lectures than in those from Malaysia, whereas self-deprecating humour is most heavily used in the New Zealand component. Summaries of previous lectures cluster strongly towards the beginning of the speech events in general. Patterns of co-occurring summary types are also evident: previews of current lecture content are commonly immediately followed by previews of future content in the subcorpora from both New Zealand and the UK. By visualising our annotation system in this way, we notice that indications of cross-cultural difference and some phasal structuring (cf. Young 1994) are beginning to emerge.

References
Flowerdew, J. (2013) Discourse in English Language Education. New York: Routledge.
Norman, D. A. (1993) Things That Make Us Smart: Defending Human Attributes in the Age of the Machine. Reading, MA: Addison Wesley.
Rayson, P. and Mariani, J. (2009) "Visualising corpus linguistics". In Mahlberg, M., González-Díaz, V. and Smith, C. (eds.) Proceedings of the Corpus Linguistics Conference (CL2009), University of Liverpool, UK.
Simpson-Vlach, R. and Leicher, S. (2006) The MICASE Handbook: A Resource for Users of the Michigan Corpus of Academic Spoken English. Ann Arbor: University of Michigan Press.
Young, L. (1994) "University lectures - macro-structure and micro-features". In Flowerdew, J. (ed.) Academic Listening. Cambridge: Cambridge University Press, 159-176.