
Article

Strategic and unpressured within-task planning and their associations with working memory

Language Teaching Research 1–24 © The Author(s) 2016 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/1362168816684367 ltr.sagepub.com

Shaofeng Li and Mengxia Fu University of Auckland, New Zealand

Abstract This study investigated the comparative effects of strategic and unpressured within-task planning on second language (L2) Chinese oral production and the role of working memory in mediating the effects of the two types of planning. Twenty-nine L2 Chinese learners at a large New Zealand university performed a narrative task after watching a 6-minute silent movie, followed by an operation span test gauging the learners’ working memory capacity. The results revealed that (1) strategic planning enhanced fluency and unpressured within-task planning led to greater accuracy and syntactic complexity, (2) strategic planning facilitated the production of a syntactically transparent structure, while unpressured within-task planning showed an advantage for opaque, complex structures, and (3) working memory was drawn upon in unpressured within-task planning, but barely so in strategic planning. The data show that strategic planning benefits the Conceptualizer while unpressured within-task planning favors the Formulator. The data also suggest that the role of cognitive abilities in task performance is contingent upon the processing demands of different task conditions.

Keywords Second language acquisition, strategic planning, task-based language teaching and learning, within-task planning, working memory

Corresponding author: Shaofeng Li, University of Auckland, Auckland, 1142, New Zealand Email: [email protected]

I Introduction In the methodology of task-based instruction, whether to allow learners to plan the language and content for their speech performance is an option for implementing tasks in the classroom. In the research on task-based instruction, issues surrounding planning have been investigated for the purpose of verifying the postulated theoretical links between task-related variables and second language (L2) production (Robinson, 2011; Skehan, 2014). One active stream of research has centered on the impact of strategic (planning before performing a task) and unpressured within-task planning (planning while performing a task) on L2 speech production. However, the robustness of previous findings is compromised by methodological limitations such as a lack of language-specific measures and failure to control online planning. Furthermore, most studies have been conducted with learners of Indo-European languages, and to date there has been no published research on how the two types of planning affect L2 Chinese oral production. There has also been insufficient research on the role of working memory in mediating the effects of planning, which is somewhat surprising given the importance attached to working memory in Skehan’s (2014) Limited Capacity Model, which is the theoretical framework on which most planning research is premised. While there have been two studies (Ahmadian, 2012b; Gilabert & Munoz, 2010) examining the associations of working memory with within-task planning, no study has investigated how it fares in other planning conditions. This study seeks to contribute to task-based research by examining (1) whether strategic and unpressured within-task planning have differential effects on L2 Chinese oral production, and (2) whether the role of working memory is constrained by the different cognitive demands of the two planning conditions.

II Background 1  Theoretical background Levelt’s (1989) model of speaking has been drawn on as one of the most influential theoretical frameworks of language production. According to Levelt, the production of spoken language involves three stages: conceptualizing the message, formulating the language representation, and articulating the message. Levelt further points out that the speech module also consists of a Speech Comprehension System that ‘listens to’ one’s own internal speech (prior to articulation) and external speech (after articulation), monitors its accuracy and appropriacy, and performs self-repairs. Levelt’s theory, however, was proposed to account for the cognitive processes of speaking in one’s native language, and some of the claims do not perfectly fit into speech production in a second language (Kormos, 2006). For example, whereas formulation and articulation are automatic and parallel in first language (L1) speaking, retrieving the linguistic items that match the conceived message and transforming the phonetic plan into overt speech in a second language are often effortful and are likely to happen in more of a ‘serial and effortful’ (Skehan, 2014, p. 5) than simultaneous and automatic manner. Whereas Levelt’s theory concerns the process of speech production, Skehan’s Limited Capacity Model (2014) sheds more light on the product. In Skehan’s framework, speech performance is divided into three aspects: complexity (the willingness to take risks to use more advanced language), accuracy (the ability to use target-like language), and fluency (the extent to which one speaks without undue pauses and hesitations). Skehan argued that due to their limited cognitive resources, it is difficult for speakers, especially L2 speakers who have limited linguistic resources, to focus on all three aspects. Consequently


a better performance in one area is achieved at the sacrifice of another. One way to alleviate learners’ cognitive burden and enhance their task performance is to allow them some time to plan their speech. Ellis (2009) identified two broad types of planning: pre-task planning and within-task planning. Pre-task planning can be further divided into rehearsal planning, where learners are given the chance to rehearse before the main task, and strategic planning, which allows learners to plan the language and content without rehearsing. Within-task planning occurs when learners are allowed to perform a task without time constraint, and therefore within-task planning can be restricted by imposing a time limit on the task performer. Clearly the occurrence of within-task planning lies on a continuum rather than forming a dichotomy, and human speech, be it in one’s native or second language, always involves a certain degree of online planning. This led Ellis (2009) to make a further distinction between pressured and unpressured within-task planning, with the former referring to online planning that happens during pressured performance and the latter to the kind of planning a learner engages in during performance without time pressure. What is the mechanism through which planning affects task performance? It has been argued that different types of planning may favor different speech processors in Levelt’s model, which in turn affect different aspects of speech performance (Ellis, 2009; Skehan, 2014). For example, during strategic planning, learners often prioritize meaning over form (Sangarun, 2005), so strategic planning eases the burden on the Conceptualizer, which may facilitate fluency. Strategic planning may also lead to more complex language because the messages conceptualized during the pre-task stage must be matched linguistically in subsequent performance.
It can also be hypothesized that during unpressured within-task planning, learners are able to take fuller advantage of the Formulator to retrieve and select advanced linguistic forms and the Speech Comprehension System to monitor the well-formedness of their production. Complexity and accuracy can thus be enhanced by unpressured within-task planning, but it may have detrimental effects on fluency due to the additional time it takes to encode the content and language and monitor their appropriacy. Skehan’s (2014, 2015) model posits a central role for working memory, a cognitive device with limited capacity that performs the dual function of information storage and processing. Working memory impacts on speech production through controlled processing, during which memory resources are drawn on to retrieve, store, and manipulate semantic and linguistic information (Payne & Whitney, 2002). In Levelt’s model (1989), the role of working memory is restricted to message construction, and formulation and articulation, which are ‘underground processes’ (p. 22), happen without awareness and are therefore not subject to conscious processing. However, as noted earlier, while it is perhaps true that formulation and articulation are automatic in L1 oral production, controlled processing appears to be central for lemma retrieval and articulation in L2 oral production, at least before learners reach a highly automatized stage of speech production, which suggests a more crucial role for working memory in L2 production (Kormos, 2006; Skehan, 2014). Although Skehan has repeatedly emphasized the importance of working memory in his Limited Capacity Model, he used the concept of working memory primarily to account for the trade-off between different aspects of task performance. He has not made predictions on the differential effects of working memory under different


planning conditions, and to date there has been no research examining the interface between planning type and working memory. We argue that strategic planning may alleviate learners’ processing burden and reduce the influence of working memory on subsequent performance. Unpressured within-task planning, however, may draw heavily on learners’ working memory resources because learners must utilize all speech processors simultaneously in real-time performance and because they have additional time to monitor their performance. Thus, instead of taking an absolute approach and viewing the role of working memory as fixed across learning conditions, it is necessary to consider its role in situ, that is, its associations with task performance are contingent upon the processing demands imposed on the learner (Robinson, 2011). In this study, we venture to explore these hypotheses with a view to contributing to the theory and pedagogy of task-based learning.

2  Planning and L2 speech performance a  Strategic planning. Strategic planning has been reported as facilitating lexical and structural complexity when (1) learners were allowed longer planning time (Mehnert, 1998), (2) guidance was provided (Foster & Skehan, 1996), (3) higher proficiency learners were involved (Kawauchi, 2005; Ortega, 1999), and (4) global complexity measures were used (Sasayama & Izumi, 2012). Nevertheless, the results relating to complexity are difficult to compare because the studies adopted different units of analysis (e.g. C-unit, T-unit and AS-unit) and the validity of some complexity measures (e.g. mean segmental type–token ratio or MSTTR) is uncertain (Silverman & Ratner, 2002). Previous studies showed inconsistent results regarding the effects of strategic planning on accuracy. It was reported that accuracy could be increased if the task was not cognitively demanding (Skehan & Foster, 1997), planning time was increased (Mehnert, 1998), the grammatical rule of the target linguistic structure was transparent (Ortega, 1999), or the learners’ language proficiency was low (Kawauchi, 2005). Some studies, however, failed to identify any differences between planners and non-planners (Sasayama & Izumi, 2012). Noteworthy is the fact that most previous studies provided little or no information about whether learners’ unpressured within-task planning was restricted. Therefore, the effects of strategic planning could have been confounded by within-task planning. Unlike the conflicting results for complexity and accuracy, strategic planning has been consistently found to be beneficial in enhancing fluency irrespective of the length of planning time (Mehnert, 1998), learners’ language proficiency (Kawauchi, 2005), or the difficulty of the task. b  Strategic planning vs. unpressured within-task planning.  Compared with strategic planning, there has been limited research on the effects of unpressured within-task planning. 
Ahmadian and Tavakoli (2011) discovered that an unpressured within-task planning group produced more accurate and complex language than non-planners, but the non-planners were more fluent. These results were further confirmed in Ahmadian (2012a), where both guided and unguided unpressured within-task planners, who sacrificed fluency, produced more accurate and complex L2 speech than non-planners. Two studies have compared the effects of strategic and unpressured within-task planning on L2 oral production. Yuan and Ellis (2003) showed that both planning types had


a positive influence on syntactic complexity in an oral narrative task. However, there was a trade-off effect between accuracy and fluency: Strategic planning favored fluency as well as lexical variability, while unpressured, within-task planning led to better accuracy. Nakakubo (2011) examined the effects of six combinations of three types of planning – pre-task planning, unpressured within-task planning, and trained within-task planning – on L2 Japanese oral performance. As in Yuan and Ellis (2003), imposing a time pressure increased fluency, but overall little between-group variation was found in the outcome measures, which was attributed to, among other factors, the learners’ high proficiency and the relative ease of the task. c Summary.  Overall there has been much research on strategic planning but very little on unpressured within-task planning and the comparative effects of strategic and unpressured within-task planning. In general, strategic planning showed a positive effect on fluency but mixed effects on accuracy or complexity, which suggests a possible trade-off between the two components and/or a possible influence of the idiosyncratic methods used in different studies, such as the variation in task type, outcome measure, and learner proficiency. The limited research on unpressured within-task planning shows that it enhanced both accuracy and complexity but not fluency, compared with no planning. In terms of the comparative effects of the two planning types, the findings are less clear. Only Yuan and Ellis’s (2003) study showed a somewhat meaningful pattern: Strategic planning led to higher fluency and lexical variety while unpressured within-task planning advantaged accuracy. Previous planning research showed the following issues. First, most studies on strategic planning failed to control online planning or did not report whether it was controlled, which may be partly responsible for the disparities between the findings of these studies. 
Second, most studies investigated learners of English as a second language (ESL), and there is a lack of research on L2 production in less commonly taught languages, such as Chinese, which is typologically different from Indo-European languages. More research is needed on other languages to verify previous findings and yield findings that can be generalized to other settings. Third, as reported by Norris and Ortega (2009), in previous research syntactic complexity has been measured primarily through subordination, and there has been a scarcity of global and phrasal measures. Another limitation concerns the overuse of general measures such as length of utterance or percentage of error-free clauses, and the inclusion of specific linguistic structures as indicators of structural sophistication has been rare. Some studies did include specific linguistic structures; for example, Mochizuki and Ortega (2008) counted the number of relative clauses as a proxy of complexity, and Nitta (2007) included the number of correctly used articles as a measure of accuracy. However, most of these studies included only one structure, and to date there has been no study that investigated multiple linguistic structures.

3  Working memory Baddeley (2007) proposed a componential model where working memory consists of a central executive and three slave systems: a phonological loop, a visuospatial sketchpad, and an episodic buffer. The central executive coordinates different components, controls attentional shifts between meaning and form, and between information retrieval and task


performance, and inhibits irrelevant information. The phonological loop is responsible for storing and rehearsing verbal information. The visuospatial sketchpad deals with visuospatial information such as images, shapes, and locations. The episodic buffer integrates information from the slave systems and long-term memory. Notwithstanding a large body of literature on working memory, there has been limited research on how working memory affects L2 oral production in task-based research. Two studies have investigated the mediating effects of working memory, measured via reading span tests, on L2 narrative production, both of which concern online planning. Ahmadian (2012b) found that working memory capacity was related to speech accuracy and fluency but not complexity in unpressured within-task planning. Gilabert and Munoz (2010) reported a study where ESL learners performed a narrative task with no pre-task planning and no time limit for task performance. It was found that working memory was significantly correlated with fluency and lexical variety, but when the learners were split into high and low proficiency levels, working memory was only predictive of the high-proficiency learners’ lexical variety. However, in Gilabert and Munoz (2010), planning was not a variable, and the learners simply performed the task without a time constraint and without being encouraged to engage in online planning, so the findings were not completely attributable to unpressured within-task planning. Overall these two studies seem to indicate that the role of working memory is evident when learners perform an oral task without time pressure. However, to date there has been no research on the role of working memory in mediating the effects of strategic planning, and no study has investigated the interface between working memory and planning type, that is, whether the role of working memory varies as a function of the different cognitive demands of different planning conditions.

4 Research questions The review of the literature shows that research is needed that (1) controls online planning under the strategic planning condition, (2) adopts improved measures of oral production to increase the robustness of results, (3) examines non-Indo-European languages, and (4) investigates the differential relationships between working memory and oral production under different task conditions. This study is undertaken to address these issues by answering the following research questions:

1. Do strategic planning and unpressured within-task planning have differential effects on L2 Chinese oral production operationalized in terms of complexity, accuracy, and fluency?
2. Does working memory mediate L2 production differently under the two planning conditions?

III Methods 1 Participants The participants were 29 Chinese (Mandarin) majors at a New Zealand university, including 22 Year 2 and 7 Year 3 students (see Supplementary Information for the


participants’ bio-data). They were 6 males and 23 females aged 17–31 (M = 21.3) with diverse L1s (17 English, 5 Korean, 2 Philippine, 1 Samoan, 1 Russian, 1 Vietnamese, 2 Cantonese). The students were assigned to a strategic planning group and an unpressured within-task planning group. To ensure comparability of oral and general proficiency between the two groups so that any difference in their narrative performance is only attributable to the difference in the type of planning they engaged in, scores on their final examination (see Supplementary Information), including a written test and two speaking tests, were used for group assignment. An independent t test showed no significant difference between the two groups of learners in their general proficiency t(27) = .23, p = .39, d = .09, and oral ability t(27) = .41, p = .85, d = .15. Aiming to gauge the learners’ vocabulary and grammar knowledge as well as their writing and reading abilities, the written test consisted of four sections: gap-filling, reading comprehension, translation, and writing. During each speaking test, the learners were required to speak about a topic randomly selected from a list of prompts provided by the teacher. The written and speaking tests, which had been developed by several experienced instructors, had been used for years and were believed to be accurate measures of the learners’ achievements. The written test took around two hours to complete, and the speaking tests lasted approximately 10 minutes in total.
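The group-comparability check described above can be sketched as follows. This is a minimal illustration, assuming a pooled-variance (Student's) independent-samples t test, which is consistent with the reported 27 degrees of freedom for 29 learners, and a Cohen's d based on the pooled standard deviation. The function name, the group sizes (15 and 14), and all scores are hypothetical; they are not the study's data. The p value is omitted because Python's standard library does not provide a t distribution.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_and_d(group_a, group_b):
    """Return (t, df, d) for a pooled-variance independent-samples t test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance weights each group's sample variance by its own df.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    diff = mean(group_a) - mean(group_b)
    t = diff / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    d = diff / sqrt(pooled_var)  # Cohen's d using the pooled SD
    return t, df, d


# Hypothetical final-exam scores (NOT the study's data).
strategic = [72, 65, 80, 77, 69, 74, 83, 61, 70, 78, 66, 75, 71, 79, 68]
within_task = [70, 67, 82, 73, 64, 76, 81, 63, 72, 77, 69, 74, 66, 80]

t, df, d = pooled_t_and_d(strategic, within_task)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

A small t and a small d on the assignment scores would support treating the two groups as comparable before the planning manipulation.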

2 Instruments a Task.  The participants were asked to retell the content of a 6-minute silent movie under two task conditions. The silent movie, called The Pear Story, was created by Chafe (1980) to elicit narrative language samples. It is full of foreground and background events and has proven to be a valid instrument for eliciting different Chinese linguistic features (e.g. classifiers and aspect markers) from L2 Chinese learners in a number of studies (e.g. Erbaugh, 2002). After watching the movie, participants in the strategic planning group were given 10 minutes (Mehnert, 1998) to plan what they would say about the video. In order to ensure participants’ attention was focused on planning within the 10 minutes, they were given a piece of paper to write notes, but in order to avoid scripted speech, they were not allowed to write down whole sentences. The notes were taken away before they started to tell the narrative. To limit rehearsal planning, the learners were told to plan by organizing the information and language that they might use in subsequent performance but not to plan aloud or rehearse the story. To restrict within-task planning during task performance, they were informed that the task must be completed within five minutes. To establish the time limit for the strategic planning group to complete the task, a small-scale pilot study was carried out with three Year 2 students who performed the task without time constraint, and it was found that they were able to complete the task within four minutes. Consequently, five minutes were allowed for the narrative performance to ensure that the learners had enough time to complete the task. One caveat is that allowing five minutes for the strategic planners may have provided them an opportunity for within-task planning.
However, to restrict within-task planning, perhaps what is important is not how much time is allowed but whether the task is performed under pressure, which is enforced by imposing a time limit. In the literature on task planning, imposing


a time limit has been the only way to restrict online planning (Ahmadian, 2012a; Yuan & Ellis, 2003). Unlike the strategic planners, the unpressured within-task planners were required to start their movie description as soon as they finished viewing the video clip, and they were not allowed to plan before they started the narrative. However, in order to encourage online planning, they were informed that they could take as much time as they wanted to think about the content and language of the film while retelling the story. Other requirements for the within-task planning group were the same as those for the strategic planning group. b  The working memory test. To gauge the learners’ working memory capacity, the researchers developed an operation span test using DMDX (Forster & Forster, 2003), software that has been widely used in psychological research to record reaction time to visual and auditory stimuli. An operation span test was used (instead of a reading or listening span test where the processing component relates to sentence comprehension) because it was language neutral and was therefore appropriate for a group of learners with diverse L1 backgrounds. Operation span tests have been widely used in L1 (e.g. Unsworth et al., 2009) and L2 research (Goo, 2012; Yilmaz, 2013) and have been proven to be valid measures of working memory. During the test, each participant was presented with some simple math problems, each followed by an English letter (see the example below). The participant was asked to determine whether the solution for the math problem was correct and to remember the letter in the subsequent display following the math problem. Each math equation consisted of two operations and a solution. The first operation involved simple multiplication or division and the second a simple addition or subtraction. 
The math question in each item aimed to tap the processing component of working memory rather than measure the learners’ math ability and therefore involved very simple calculation. If the participant failed to respond to the math problem within five seconds, the software would show the letter automatically and that item was dropped from the analysis. The letter following each math equation stayed on the screen for one second before the next math problem was displayed. An example math–letter combination in the working memory test:

(3 × 6) − 4 = 6 ? T
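The item logic described above can be sketched as follows. This is a minimal illustration, not the DMDX script used in the study: the class, function names, and response times are hypothetical. It encodes only what the text states, namely a first operation (multiplication or division), a second operation (addition or subtraction), a displayed solution to be judged, a letter to remember, and the rule that items without a response within five seconds are dropped. In the example item, (3 × 6) − 4 equals 14, so the displayed solution 6 should be judged incorrect.

```python
import operator
from dataclasses import dataclass
from typing import Optional

TIME_LIMIT_S = 5.0  # response window stated in the text

OPS = {"*": operator.mul, "/": operator.truediv,
       "+": operator.add, "-": operator.sub}


@dataclass
class Item:
    a: int
    op1: str      # "*" or "/" (first operation)
    b: int
    op2: str      # "+" or "-" (second operation)
    c: int
    shown: int    # solution displayed to the participant
    letter: str   # letter to be remembered


def true_value(item: Item) -> float:
    """Evaluate (a op1 b) op2 c."""
    return OPS[item.op2](OPS[item.op1](item.a, item.b), item.c)


def equation_is_correct(item: Item) -> bool:
    """Whether the displayed solution matches the true value."""
    return true_value(item) == item.shown


def score_judgment(item: Item, said_correct: Optional[bool],
                   rt_s: Optional[float]) -> Optional[bool]:
    """True/False if the judgment was right/wrong; None if the item is dropped."""
    if said_correct is None or rt_s is None or rt_s > TIME_LIMIT_S:
        return None  # no response within the time limit
    return said_correct == equation_is_correct(item)


# The example item from the text: (3 × 6) − 4 = 6 ? T
example = Item(3, "*", 6, "-", 4, shown=6, letter="T")
print(true_value(example), equation_is_correct(example))
print(score_judgment(example, said_correct=False, rt_s=2.1))
```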

There were 12 sets of stimuli in the test, each of which was composed of a different number of math–letter combinations, ranging from 4 to 7 (referred to as ‘span size’). The 12 sets were evenly distributed among span sizes 4, 5, 6, and 7, that is, the stimuli consisted of three sets of each of the four span sizes. Altogether the test had 66 math–letter combinations. At the end of each set, there was a prompt on the screen asking the participant to write down all the letters in the sequence in which they appeared. Each participant’s working memory score consisted of three components: the number of correctly recalled letters, the number of correctly answered math problems, and the


mean reaction time. Plausibility judgment (accuracy in the judgments about the math equations) and reaction time tap the processing component of working memory and letter recall indexes the storage component. Both processing and recall scores were included in the calculation because learners may trade off between processing and recall accuracy, that is, they sacrifice the speed and accuracy of processing to achieve better recall scores (Waters & Caplan, 1996). The scores based on the three measures were transformed into z scores (standardized scores for which the mean is 0 and standard deviation 1), which were averaged to obtain a composite score for each participant (Leeser, 2007). The Cronbach’s alpha – an index of internal reliability – for the three components of the working memory test, was .98, .72, and .82 for reaction time, letter recall, and plausibility judgment respectively.
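The composite score described above can be sketched as follows. This is a minimal illustration under two assumptions that the text does not state: z scores use the sample standard deviation, and the reaction-time z score is sign-reversed so that faster responses raise the composite. The function names and the three participants' values are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite_scores(recall, math_correct, mean_rt, reverse_rt=True):
    """Average the three z scores for each participant.

    reverse_rt=True flips the sign of the reaction-time z score
    (an assumption: shorter reaction times count as better processing).
    """
    z_recall = z_scores(recall)
    z_math = z_scores(math_correct)
    z_rt = z_scores(mean_rt)
    if reverse_rt:
        z_rt = [-z for z in z_rt]
    return [mean(triple) for triple in zip(z_recall, z_math, z_rt)]


# Hypothetical data for three participants (NOT the study's data).
recall = [60, 50, 40]          # correctly recalled letters (max 66)
math_correct = [66, 60, 54]    # correctly judged equations (max 66)
mean_rt = [1.0, 1.5, 2.0]      # mean reaction time in seconds

scores = composite_scores(recall, math_correct, mean_rt)
print(scores)
```

Averaging standardized components puts recall, judgment accuracy, and speed on a common scale, so a learner who trades processing speed for recall is not credited on recall alone.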

3 Procedure During the study, each participant (in the following sequence) (1) completed a background questionnaire, (2) performed the narrative task under the strategic or unpressured within-task planning condition, and (3) took the working memory test. The study lasted about 1.5 hours for each participant and was carried out in a laboratory setting.

4 Analyses a  The AS-unit.  The AS-unit was adopted as the coding unit for the learners’ narrative performances. Foster, Tonkyn, and Wigglesworth (2000, p. 365) defined an AS-unit in English as consisting of ‘an independent clause, or sub-clausal unit, together with any subordinate clause(s) associated with either.’ This unit is applicable to the Chinese language, where syntactic units can be segmented into dependent and independent clauses in similar ways as in English. However, due to the obvious differences between the two languages, the way AS-units were identified in this study had to be adapted. First, because tense in Chinese is mainly encoded through time expressions rather than morphological inflections (Li & Thompson, 1981), the distinction between finite and nonfinite verbs was not considered in identifying a dependent or independent clause. A second unique feature of Chinese that affects the coding of the AS-unit is the topic– comment chain structure (Li, 2005), where a topic, which could be a noun or a noun phrase, is followed by a comment sentence. Thus there are two subjects in the topic– comment chain: a topic subject and a comment subject. In this study, the subject in the comment sentence was considered as a further explanation of the topic, and the topic and the nearest comment sentence were coded as one AS-unit. b  Complexity, accuracy, and fluency (CAF) measures.  Eighteen measures of complexity, accuracy, and fluency (CAF) were utilized to gauge the learners’ task performance (Appendix 1). The decision to include multiple measures for each of the three aspects was made in response to Norris and Ortega’s call (2009) for an integrated view of language production by utilizing an assortment of measures that tap different dimensions of the construct and that reflect the evolving nature of L2 development. Therefore, the multitude of CAF measures included in this study, whose details are provided below,


represents an effort to strive for methodological rigor and testifies to the importance of multidimensional measurement in task-based research.

a Complexity.  Complexity was investigated through lexical and structural measures. Drawing on Skehan (2014), we included two types of lexical complexity indexes: lexical diversity and lexical sophistication.

1. Lexical diversity: D value, an index ‘that is calculated through a series of type–token ratio (TTR) samplings and curve fittings’ (McCarthy & Jarvis, 2007, p. 460), was used as a measure of lexical diversity because of its greater strength for describing lexical characteristics than TTR or mean segmental type–token ratio (MSTTR) (Silverman & Ratner, 2002).
2. Lexical sophistication: This was measured through the percentage of words in HSK II. The HSK (hànyǔ shuǐpíng kǎoshì in Chinese Pinyin) is a validated test that has been recognized as one of the most influential tests of L2 Chinese proficiency (further information is available at http://www.chinesetest.cn). The HSK syllabus, which is composed of six levels, provides a list of target vocabulary for each level of the test. In this study, initially the percentage of words in HSK I was computed, but the results showed no difference between the strategic and unpressured within-task planners. When the words in HSK II were counted, however, it was found that the strategic planners used more sophisticated words than the unpressured within-task planners. As a result, the lexical sophistication of each participant was calculated by dividing the number of words from HSK II by the total number of words of the speech sample.

Structural complexity was gauged through the following six measures, among which (1) concerns phrasal complexity, (2) taps clausal complexity, (3) assesses overall complexity, and (4), (5) and (6) are language-specific grammatical features.

1. Mean length of clauses: the number of characters of an oral narrative divided by the total number of clauses.
2. Percentage of subordination: the number of subordinate clauses divided by the total number of clauses.
3. Mean length of AS-units: the number of characters of a speech sample divided by the total number of AS-units.
4. Use of classifiers: the number of classifiers divided by the number of effective characters, which is the total number of characters minus repair disfluencies such as false starts, repetition, etc. A classifier is a morpheme that is used between a determiner and a count noun. Its function is to categorize objects that share similar physical properties and to mark countability. Classifiers are challenging for L2 learners, and the difficulty is due to (1) the large number of classifiers the language has, (2) the anomalies in the semantic correspondence between a classifier and its accompanying noun, and (3) for speakers of non-classifier languages such as English, the simple fact that a classifier must be used between a determiner and a count noun.


5. Use of the aspect marker le: the frequency of the aspect marker le divided by the number of effective characters. The aspect marker le is grammatically demanding for L2 Chinese learners partly because it has two different variants (Li & Thompson, 1981): the verbal le that encodes completion and the sentence-final le that marks inchoativity or change of state of affairs.
6. Use of the ba construction: the number of times the ba construction was used divided by the total number of characters. The ba construction, which appears in the template S ba O V, deviates from the canonical SVO structure. Syntactically ba serves as a case marker that licenses the movement of a direct object to the preverbal position. Semantically it encodes how a person/matter/object is dealt with/handled/disposed of, and it is used with events that are bounded or telic (i.e. with an endpoint). The acquisition sequence for the ba construction has been found to be positively correlated with learners’ general L2 proficiency (Wen, 2006).

As can be seen, in addition to the general measures validated in previous research, this study included three specific linguistic structures (classifiers, le, and ba), with a view to examining the interface between planning type and the nature of the linguistic structure. The classifier is largely item-based and the related form–meaning mapping is transparent, while the aspect marker le and the case marker ba involve complicated linguistic projections and opaque form–meaning mapping. The three structures are different types of morphemes posited to be activated at different stages of speech production. Myers-Scotton and Jake (2000) distinguished content and system morphemes. Content morphemes such as nouns, most verbs, and descriptive adjectives are activated by ‘the semantic and pragmatic content of the speaker’s intentions’ (Wei, 2000, p.
106), while system morphemes such as determiners, inflectional affixes, and auxiliary verbs are activated to fulfill certain grammatical requirements of the language and to represent the relationships between content morphemes at higher levels of the hierarchy of a linguistic projection. Among system morphemes, some (e.g. the plural of a noun) are activated together with content morphemes at the conceptual level to encode the semantic and pragmatic content of the speaker’s intention, and these are called early system morphemes. Some morphemes are not called upon until later stages where they connect smaller units into larger constituents, and these are called late system morphemes. For example, the possessive of as in the legs of the table connects two noun phrases into a larger linguistic unit, and the morpheme is not activated until the two noun phrases are ready for further linguistic projections. Content and early system morphemes, according to Wei (2002), are activated in the Conceptualizer of the speech production module, while late system morphemes are activated in the Formulator. In Mandarin Chinese, classifiers constitute an early system morpheme because, like the plural form of a noun in English, a classifier is an inherent feature of a count noun and is activated in the Conceptualizer together with the content or concept the noun represents. In contrast, the aspect marker le and the case marker ba are late system morphemes because they are not inseparable from, or exclusive to, particular content morphemes and are ‘structurally assigned at the formulator level to satisfy grammatical requirements’ (Wei, 2000, p. 111). Strategic and unpressured within-task


planning may have differential effects on the production of the three structures depending on whether they favor the Conceptualizer or the Formulator.

b Accuracy. One global and three specific measures were selected for accuracy. The global measure is operationalized as the percentage of error-free clauses, which refers to the number of error-free clauses divided by the total number of clauses. Errors refer to the incorrect use of lexical items and grammatical structures. The target-like use of classifiers, the aspect marker le, and the ba construction served as accuracy measures of language specific structures. Accuracy for the three specific structures was calculated by dividing the number of correct cases for each of the three structures by the total number of attempts at using each structure. Recall that the learners’ attempts at using these three structures (regardless of whether they were accurate), or the mere incidence of the structures, were used as measures of complexity.

c Fluency. Skehan (2014) distinguished three broad categories of fluency measures: (1) breakdowns in the flow of speech such as pauses, (2) speech rate, and (3) repairs in the flow of speech such as reformulations, false starts, and so on, which can be collectively called ‘dysfluencies’. Among the following measures of fluency, two (1 and 2) relate to speech breakdown, three (3, 4, and 5) to speed, and one (6) to repair disfluency.

1. Number of pauses: number of pauses per minute of speech.
2. Mean length of pauses: the total length of pauses divided by the total number of pauses.
3. Speech rate A: number of syllables per minute, which equals the total number of Chinese syllables divided by the total time used to complete the task, with the time in seconds converted to minutes. Because each Chinese character consists of one syllable, the number of Chinese syllables is equal to the number of Chinese characters.
4. Speech rate B: number of effective syllables per minute. Self-repairs (repetitions and reformulations) and metacognitive/self-regulatory comments (e.g. ‘I don’t know how to say “drop” ’) were excluded.
5. Articulation rate: rate of speech excluding length of pauses, which is the total number of produced syllables divided by the total number of seconds for speaking.
6. Number of repair disfluencies per minute: the total number of repair disfluent words divided by the total time used to complete the task, with the time in seconds converted to minutes. Following Foster and Skehan (1996), disfluencies were defined as reformulations, false starts, replacements, repetitions, and L1 use.

c Statistical analysis. Given the large number of dependent variables, a multivariate analysis of variance (MANOVA), rather than a series of independent-samples t tests, was conducted for each of the three sets of measures of complexity, accuracy, and fluency, respectively, to minimize Type I error and take account of the correlations between the outcome measures. Given that each dependent variable only involved two groups and one comparison, the results of the follow-up univariate F tests were used to determine whether significant group differences existed on each outcome measure, and the alpha


level was not adjusted. Cohen’s d was calculated to show the magnitude of group difference. Cohen’s d is primarily based on mean difference and thus overcomes the limitation of the sole reliance on the results of null-hypothesis significance testing, which is sensitive to sample size and has low statistical power when the sample size is small. Based on the standards proposed by Cohen (1988), an effect size of .20 was considered small, .50 was interpreted as medium, and .80 was taken as a large effect. Pearson’s correlation analyses were conducted to identify the relationships between working memory and the learners’ CAF performance. Correlation coefficients were interpreted not only on the basis of the related p values, but also by examining the overall pattern and the magnitudes of the correlations. According to Cohen’s criteria for interpreting the magnitude of a correlation coefficient, .10 is considered small, .30 medium, and .50 large. d  Interrater reliability.  To establish interrater reliability, a native speaker of Chinese who was getting a master’s degree in Applied Linguistics was trained to code the data. Eight samples (four from each group), which made up 27.6% of the data, were randomly selected for coding. The interrater reliability was calculated, and the agreement rate reached 88.97%. Disparities and controversies were discussed and resolved, and the solutions were applied to the rest of the data.
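The correlational analysis and the magnitude benchmarks described above can be illustrated with a short Python sketch. It is not the authors' analysis script: the function names, the "negligible" label for values below the smallest benchmark, and the scores in the usage lines are hypothetical and serve only to show how a coefficient is computed and then read against Cohen's cut-offs for r (.10, .30, .50) and d (.20, .50, .80).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def magnitude(value, small, medium, large):
    """Label the absolute size of an effect against three cut-offs."""
    size = abs(value)
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "negligible"  # label assumed here; not used in the article


def label_r(r):
    """Cohen's benchmarks for a correlation coefficient."""
    return magnitude(r, 0.10, 0.30, 0.50)


def label_d(d):
    """Cohen's benchmarks for a standardized mean difference."""
    return magnitude(d, 0.20, 0.50, 0.80)


# Hypothetical working memory scores and error-free clause percentages.
wm_scores = [42, 55, 38, 61, 47]
accuracy = [30.0, 48.5, 27.0, 52.0, 35.5]
r = pearson_r(wm_scores, accuracy)
print(f"r = {r:.2f} ({label_r(r)})")
```

Keeping the cut-offs for r and d in separate helpers reflects the point made in the text that the two sets of benchmarks differ.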

IV Results

1 Evidence for task validity

In response to Révész’s (2014) call to provide evidence for task validity, we start the results section by examining whether the independent variable ‘planning type’ was successfully manipulated such that task performers in the two groups displayed the expected behaviors. Our data showed that (1) the mean length of pauses of the unpressured within-task planning group was significantly larger than that of the strategic planning group, t(28) = 3.58, p = .01, d = −1.32, and (2) the average length of the speech data (in seconds) for the unpressured planners (M = 430.79, SD = 222.12) was significantly larger than that for the strategic planners (M = 280.47, SD = 73.58), t(28) = 2.48, p = .00, d = −0.92. These results constitute evidence for successful manipulation of task planning, that is, the within-task planners indeed engaged in more online planning and spent more time on task in comparison with the strategic planners.
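For readers who wish to check how an effect size and a t statistic relate to reported means and standard deviations, the following Python sketch may help. It assumes the pooled-standard-deviation form of Cohen's d and group sizes of 15 (strategic) and 14 (unpressured), as listed in Tables 1 to 3; the article does not state which formula was used, so the helper names and the formula choice are assumptions.

```python
from math import sqrt


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference (group 1 minus group 2)."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


def t_from_d(d, n1, n2):
    """Independent-samples t statistic implied by d and the group sizes."""
    return d / sqrt(1 / n1 + 1 / n2)


# Length of speech in seconds: strategic (n = 15) vs. unpressured (n = 14).
d_length = cohens_d(280.47, 73.58, 15, 430.79, 222.12, 14)
t_length = t_from_d(d_length, 15, 14)
print(f"d = {d_length:.2f}, t = {t_length:.2f}")
```

Under these assumptions the result should come out close to the reported d = −0.92 and an absolute t of about 2.48. The negative sign simply reflects subtracting the unpressured group's mean from the strategic group's mean.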

2 Research question 1: Does planning type make a difference in L2 oral production?

a Complexity. The descriptive and inferential statistics about lexical and structural complexity appear in Table 1. It can be observed that in comparison with the strategic planning group, the unpressured within-task planning group showed a greater variety of vocabulary (higher D value), produced more subordinate clauses and longer AS-units, and used the aspect marker le and the ba construction more frequently. However, the strategic planning group used more sophisticated vocabulary, longer clauses, and a higher percentage of classifiers. The MANOVA showed that overall there was a significant difference between the two groups on the measures of complexity, F(8, 20) = 3.54, p = .01. The post hoc univariate analysis showed that the unpressured within-task planners used a significantly larger number of subordinate clauses than the strategic planners, F(1, 29) = 4.12, p = .05; the effect size was close to large, d = −.75. The strategic planners used a significantly larger number of classifiers than the unpressured within-task planners, F(1, 29) = 9.55, p = .01, and the difference was associated with a large effect size, d = 1.15.

Table 1. Descriptive and inferential statistics for complexity.

| Type | Complexity measure | Strategic planning: n | Mean | SD | Unpressured within-task planning: n | Mean | SD | MANOVA: F | p | d# |
|---|---|---|---|---|---|---|---|---|---|---|
| Lexical | D value (a) | 15 | 25.73 | 9.55 | 14 | 27.62 | 8.17 | .33 | .57 | −.21 |
| Lexical | HSK (b) | 15 | 23.53 | 6.63 | 14 | 21.23 | 4.59 | 1.17 | .28 | .40 |
| Structural | Clauses (c) | 15 | 7.97 | .81 | 14 | 7.62 | .83 | 1.37 | .25 | .43 |
| Structural | Subordination | 15 | 22.23 | 8.08 | 14 | 28.54 | 8.67 | 4.12 | .05* | −.75 |
| Structural | AS-units (d) | 15 | 10.35 | 1.47 | 14 | 10.83 | 2.00 | .54 | .47 | −.28 |
| Structural | Classifiers (e) | 15 | 6.26 | 1.99 | 14 | 4.47 | .88 | 9.55 | .01* | 1.15 |
| Structural | le (f) | 15 | .74 | .67 | 14 | 1.13 | 1.37 | .81 | .37 | −.37 |
| Structural | ba (g) | 15 | 2.20 | 3.26 | 14 | 2.97 | 3.31 | .39 | .53 | −.23 |

Note. *p < .05; #effect size = Cohen’s d; a. lexical variety; b. lexical sophistication; c. mean length of clause; d. mean length of AS-units; e. use of classifiers; f. use of the aspect marker le; g. use of the ba construction.

b Accuracy. Table 2 displays the results for accuracy. The descriptive statistics showed that the strategic planners demonstrated a slightly higher accuracy rate in using classifiers and the aspect marker le than the unpressured within-task planners, while the latter showed a greater number of error-free clauses and more accurate use of the ba construction. The MANOVA revealed a significant effect for group when all measures of accuracy were considered, F(4, 24) = 4.85, p = .01. The follow-up univariate analysis showed a significant difference between the two groups in terms of the general accuracy measure, namely percentage of error-free clauses, in favor of the within-task planners, F(1, 28) = 8.91, p = .01, and a large effect size was observed for the difference, d = −1.10.

c Fluency. The results for fluency are presented in Table 3. The descriptive statistics revealed that the strategic planning group showed a greater number of pauses, higher speech rate A, higher speech rate B, and more repair disfluencies, while the unpressured within-task planners demonstrated a longer mean length of pauses and higher articulation rate.
The MANOVA revealed an overall significant group effect for all fluency measures, F(6, 22) = 2.64, p = .04, and the follow-up univariate analysis demonstrated that the within-task planners produced significantly longer pauses than the strategic planners: F(1, 28) = 12.85, p = .00, and a large effect size was found for the difference, d = −1.32.

Table 2. Descriptive and inferential statistics for accuracy.

| Accuracy measure | Strategic planning: n | Mean | SD | Unpressured within-task planning: n | Mean | SD | MANOVA: F | p | d# |
|---|---|---|---|---|---|---|---|---|---|
| Error-free clauses (a) | 15 | 26.77 | 10.53 | 14 | 43.09 | 18.18 | 8.91 | .01* | −1.10 |
| Correctly used classifiers (b) | 15 | 90.13 | 8.50 | 14 | 86.07 | 10.04 | 1.38 | .25 | .44 |
| Correctly used le (c) | 15 | 66.67 | 44.99 | 14 | 49.80 | 46.24 | .99 | .33 | .37 |
| Correctly used ba (d) | 15 | 20.22 | 35.73 | 14 | 32.14 | 46.44 | 3.84 | .06 | −.29 |

Note. *p < .05; #effect size = Cohen’s d; a. percentage of error-free clauses; b. percentage of correctly used classifiers; c. percentage of correctly used le; d. percentage of correctly used ba.
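The percentage-based accuracy measures reported in Table 2 can be sketched in Python as follows. This is an illustrative reading of the definitions given in the measures section, not the authors' coding procedure: the function names, the boolean coding of clauses and attempts, and the sample data are all hypothetical, and returning None when a learner never attempted a structure is a design choice assumed here.

```python
def percentage(part, whole):
    """Return part / whole as a percentage, or None when whole is zero."""
    if whole == 0:
        return None
    return 100.0 * part / whole


def accuracy_profile(clause_is_error_free, attempts_by_structure):
    """Compute the global and structure-specific accuracy measures.

    clause_is_error_free: one boolean per clause (True = no lexical or
        grammatical error).
    attempts_by_structure: maps a structure name to one boolean per attempt
        (True = target-like use).
    """
    profile = {
        "error_free_clauses": percentage(
            sum(clause_is_error_free), len(clause_is_error_free)
        )
    }
    for structure, attempts in attempts_by_structure.items():
        profile[f"correct_{structure}"] = percentage(sum(attempts), len(attempts))
    return profile


# Hypothetical coding of one learner's narrative.
sample = accuracy_profile(
    clause_is_error_free=[True, False, True, True],
    attempts_by_structure={
        "classifiers": [True, True, False, True],
        "le": [True, False],
        "ba": [],  # structure never attempted
    },
)
print(sample)
```

The length of each attempts list is the incidence count that, divided by the number of characters, feeds the corresponding complexity measure, so one coding pass can serve both sets of measures.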

Table 3. Descriptive and inferential statistics for fluency.

| Fluency measure | Strategic planning: n | Mean | SD | Unpressured within-task planning: n | Mean | SD | MANOVA: F | p | d# |
|---|---|---|---|---|---|---|---|---|---|
| Number of pauses | 15 | 7.06 | 2.80 | 14 | 6.44 | 2.12 | .44 | .51 | .25 |
| Mean length of pauses | 15 | 3.19 | .47 | 14 | 4.12 | .88 | 12.85 | .00* | −1.32 |
| Speech rate A (a) | 15 | 78.05 | 31.40 | 14 | 65.05 | 27.72 | 1.39 | .25 | .43 |
| Speech rate B (b) | 15 | 67.52 | 27.73 | 14 | 54.68 | 25.91 | 1.65 | .21 | .48 |
| Articulation rate (c) | 15 | 50.73 | 14.01 | 14 | 53.15 | 10.19 | .28 | .60 | −.20 |
| Number of repair disfluencies per minute | 15 | 16.18 | 11.07 | 14 | 13.30 | 4.46 | .82 | .37 | .34 |

Note. *p < .05; #effect size = Cohen’s d; a. speech rate A (number of syllables/minute); b. speech rate B (number of effective syllables/minute); c. articulation rate (rate of speech excluding length of pauses).
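Four of the fluency measures in Table 3 can be sketched in Python as shown below. The sketch follows the verbal definitions in the measures section under stated assumptions: task time is recorded in seconds and converted to minutes, excluded material (self-repairs and metacognitive comments) is counted in syllables, and the function name, parameter names, and sample values are hypothetical. Articulation rate and the repair disfluency count are left out because their units are not fully specified in the text.

```python
def fluency_measures(total_syllables, excluded_syllables, total_seconds, pause_lengths):
    """Breakdown and speed measures for one speech sample.

    total_syllables: all syllables produced (one per Chinese character).
    excluded_syllables: syllables in self-repairs and metacognitive comments.
    total_seconds: time used to complete the task, in seconds.
    pause_lengths: duration of each pause, in seconds.
    """
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    minutes = total_seconds / 60.0
    n_pauses = len(pause_lengths)
    return {
        "pauses_per_minute": n_pauses / minutes,
        "mean_pause_length": sum(pause_lengths) / n_pauses if n_pauses else 0.0,
        "speech_rate_a": total_syllables / minutes,
        "speech_rate_b": (total_syllables - excluded_syllables) / minutes,
    }


# Hypothetical sample: 300 syllables in 4 minutes, 30 of them excluded.
result = fluency_measures(
    total_syllables=300,
    excluded_syllables=30,
    total_seconds=240,
    pause_lengths=[2.0, 4.0, 3.0],
)
print(result)
```

Speech rate B can never exceed speech rate A for the same sample, which is consistent with the group means in Table 3.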

3 Research question 2: Is working memory correlated with the two planning conditions differently?

Under the unpressured within-task planning condition (Appendix 2), significant positive correlations were found between the learners’ working memory capacity and two accuracy measures: percentage of error-free clauses, r = .55, p = .04, and correct use of classifiers, r = .61, p = .02. In terms of fluency, significant positive correlations were found for speech rate A (number of syllables per minute), r = .62, p = .02, and speech rate B (number of effective syllables per minute), r = .63, p = .02. A significant negative correlation was found for the mean length of pauses, r = −.57, p = .04, which means that learners with higher working memory capacity produced shorter pauses. No significant correlations were found for complexity measures. These results suggest that under the unpressured within-task planning condition, working memory was predictive of the accuracy and fluency, but not complexity, of the learners’ narrative performance.

The results for the strategic planning group (Appendix 3) revealed that out of the 18 correlation coefficients, only one was significant: A positive correlation was found


between the learners’ working memory capacity and their articulation rate, r = .62, p = .01. Overall the results seem to suggest that the mediating effects of working memory were marginal under the strategic planning condition. Also, the results showed one clear pattern: most of the correlation coefficients were in the negative direction for strategic planning, which stood in contrast with the overall positive correlations for unpressured within-task planning. The disparate results point to a possible differential relationship between working memory and task performance under the two planning conditions.
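Because the groups are small, it can help to see how large a correlation must be before it reaches significance. The Python sketch below converts a Pearson coefficient into a t statistic with n − 2 degrees of freedom and compares it with the two-tailed .05 critical value of Student's t. It assumes that every learner in each group (n = 14 and n = 15, as in Tables 1 to 3) contributed to every correlation, which the article does not state explicitly; the function and constant names are hypothetical.

```python
from math import sqrt

# Two-tailed .05 critical values of Student's t, keyed by degrees of freedom.
CRITICAL_T_05 = {12: 2.179, 13: 2.160}


def t_from_r(r, n):
    """t statistic for testing a zero correlation, with n - 2 degrees of freedom."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("at least three observations are required")
    return r * sqrt((n - 2) / (1.0 - r * r))


def is_significant(r, n):
    """Compare |t| with the tabulated critical value for the given sample size."""
    return abs(t_from_r(r, n)) > CRITICAL_T_05[n - 2]


# Coefficients reported for the unpressured within-task planners (n = 14).
for r in (0.55, 0.61, 0.62, 0.63, -0.57):
    print(f"r = {r:+.2f}, t(12) = {t_from_r(r, 14):.2f}, significant: {is_significant(r, 14)}")
```

With 12 degrees of freedom a coefficient needs an absolute value of roughly .53 to be significant at the .05 level, which is why the text interprets the overall pattern and magnitude of the correlations in addition to the p values.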

V Discussion

1 Planning and CAF

The first research question concerns the effects of strategic planning and unpressured within-task planning on the complexity, accuracy and fluency of the learners’ oral production. The results showed that the strategic planners produced a significantly larger number of classifiers and shorter pauses than the unpressured within-task planners. The unpressured within-task planners, however, produced more subordinate clauses and error-free clauses. In the following, the results are interpreted by drawing on the related theories and through comparison with previous findings.

a Complexity. One striking finding is that unpressured within-task planning led to higher incidence of subordination than strategic planning. The only two studies (Nakakubo, 2011; Yuan & Ellis, 2003) that compared strategic and unpressured within-task planning showed inconsistent effects of the two types of planning on syntactic complexity. Nakakubo (2011) found that neither planning type influenced syntactic complexity, while Yuan and Ellis (2003) reported that compared with the non-planning group, both strategic and unpressured within-task planning had positive effects on syntactic complexity but the difference between these two groups was not significant. Our study, however, showed that the unpressured within-task planners produced a significantly greater number of subordinate clauses than the strategic planners. The disparity in the findings could have been caused by the combined effects of variables relating to task procedures (e.g. whether planning is allowed) and variables relating to information manipulation (e.g. whether a task involves information integration). The two types of variables are referred to as resource-dispersing and resource-directing variables respectively in Robinson’s Cognition Hypothesis (2011).
Among the tasks utilized in the three related studies including our study, the one in Nakakubo’s study (2011), where no effect was found on syntactic complexity, contained the least information and was completed in less than two minutes (81.34 and 116.31 seconds by pressured and unpressured within-task planners respectively). Yuan and Ellis (2003) used a task that took longer to complete: 189.29 seconds for the strategic group and 243.57 seconds for the unpressured group. They found that both planning groups showed greater syntactic complexity compared with the non-planning group but there was no significant difference between the two planning groups. The task used in this study took the longest time to complete − 280.47 seconds for the strategic planners and 430.79 seconds for the


unpressured planners – and we found that unpressured within-task planning was more powerful than strategic planning in eliciting the production of subordinate clauses. We would like to make two arguments here. First, it would seem that the effects of planning on syntactic complexity are more evident when the task contains more information and imposes more processing demands on the task performer, who must synthesize the large amount of information and make sense of the connections between the various events in order to make his/her speech more meaningful. When the demand for information manipulation is limited, as in Nakakubo (2011), there are limited opportunities for encoding the information with complex linguistic structures and consequently the effect of planning does not surface. Second, unpressured within-task planning has an advantage over strategic planning in eliciting more complex sentences when the task is cognitively challenging because in the former condition, the learners are allowed to make fuller use of their Formulator to search for more complex structures and compose longer sentences. Under the strategic planning condition, however, due to the time pressure for task completion, learners may prioritize fluency over syntactic complexity and accuracy, a point to be revisited in later sections. Another interesting finding is that the strategic planners used a significantly greater number of classifiers than the unpressured within-task planners, but the latter used the aspect marker le and the case marker ba more frequently. A large positive effect size was found for the strategic planners’ classifier use while negative effect sizes were found for their use of le and ba. The finding that the two planning conditions had differential effects on the production of classifiers on one hand and le and ba on the other fits in well with the distinction between content and system morphemes and between early and late system morphemes (Wei, 2000). 
Recall that we argued that the classifier is an early system morpheme activated together with content morphemes in the Conceptualizer, and le and ba are late system morphemes which are not activated until the Formulator stage, where activated lemmas are organized into larger linguistic units. Therefore, the fact that strategic planning led to more use of classifiers suggests that this type of planning advantages the Conceptualizer, and the higher incidence of le and ba in the unpressured within-task planning condition is consistent with the view that this type of planning is favorable to the Formulator. The conclusion is subject to the caveat that the differences in the production of le and ba are not statistically significant, which might be due to the relatively small sample size.

b Accuracy. Similar to Yuan and Ellis (2003), this study found that the unpressured within-task planners produced a higher percentage of error-free clauses than the strategic planners. It would seem that unpressured within-task planning allowed more time for the learners to make use of the Formulator to search, select, retrieve, and immediately access the syntactic properties relating to the activated lemma to match the preverbal message conceived in the Conceptualizer. Moreover, in the event that the learners failed to retrieve the correct forms to the satisfaction of the monitor (called the Speech Comprehension System in Levelt’s (1989) nomenclature), parts or the whole of the preverbal message were likely sent back from the Formulator to the Conceptualizer to be revised to make it better fit the available linguistic repertoire. Therefore the higher level of accuracy in the unpressured within-task planning was also due


to the fact that there was more communication and coordination between the different components of the speech generator. On the other hand, production was less accurate under the strategic planning condition mainly because it allowed more offline planning – to the advantage of the Conceptualizer – and less online planning – to the disadvantage of the Formulator, which is responsible for selecting linguistic forms for the generated message and is therefore critical for accuracy. Also, while the preverbal messages conceived during the pretask stage may serve as input to be taken advantage of by the Formulator during the subsequent oral performance, this may have had detrimental effects on the accuracy of production because the messages were so well-conceived during the pre-task stage and lacked the adaptability characterizing the preverbal messages improvised during the unpressured within-task planning. Consequently, message took precedence over form, which increased the chances for the occurrence of erroneous forms. Furthermore, due to the time pressure imposed on the strategic planners during the performance stage, the Formulator must have prioritized the conceptual information over the syntactic properties of the activated lemmas when developing a phonetic plan to be articulated, thus sacrificing accuracy for fluency and task completion. c Fluency.  The strategic planners spoke more fluently than the unpressured within-task planners when fluency was measured through the mean length of pauses, and the effect size was the largest among all significant effects. Previous studies found that in comparison with no planning, strategic planning led to greater fluency (Foster & Skehan, 1996; Mehnert, 1998) but unpressured within-task planning resulted in less fluency (Ahmadian, 2012a; Ahmadian & Tavakoli, 2011). 
Yuan and Ellis (2003) compared strategic and unpressured within-task planning in one study and found that strategic planners spoke more fluently than unpressured within-task planners in terms of pruned speech rate. The greater fluency for the strategic planners is likely attributable to two factors: (1) pretask planning mitigated the burden on the Conceptualizer and smoothed formulation and articulation, and (2) the imposed time pressure prompted the learners to accelerate their speech. In contrast, the within-task planners had to organize both content and language, and they also took advantage of the time available to access more advanced language and monitor the accuracy of the language and content, hence sacrificing fluency for accuracy and complexity.

2 Planning and working memory

The second research question asks whether working memory showed differential associations with the learners’ task performance under the two planning conditions because of the different processing demands imposed on the learners’ cognitive resources. The answer is positive. It was found that whereas working memory was significantly correlated with the narrative production of the unpressured within-task planners on two accuracy measures and three fluency measures, it did not show significant correlations with the strategic planners’ performance (except one significant correlation for fluency). All the significant correlation coefficients were above .5 and are therefore considered large based on Cohen’s (1988) benchmarks (note that the benchmarks for d and r are different).


Furthermore, the correlation coefficients for the within-task planning group were overwhelmingly positive while those for the strategic planning group were mostly negative. Overall the results attest to the importance of adopting a dynamic view when considering the functions of working memory, that is, its role may be activated or inhibited depending on the properties of a particular task or condition. a Working memory and unpressured within-task planning.  First and foremost, the positive role of working memory in affecting the learners’ narrative performance in unpressured within-task planning can be ascribed to the match between the functions of this cognitive device and the processing demands of the task condition. Primarily, working memory is a cognitive system responsible for the temporary storage and processing of information in real-time, online processing. These functions are crucial to speech production, which requires storing and holding retrieved information in an accessible state, passing on the information to the Formulator which selects the appropriate lemma, and creating a phonetic plan to be articulated. These processes were made possible in the untimed task condition where the learners were allowed additional time to draw on their working memory resources to plan what to say during task performance. Noteworthy is the finding that working memory was more saliently correlated with accuracy and fluency than complexity, which is consistent with Ahmadian (2012b). Working memory affects accuracy through attention control which coordinates the three processors of speech production and monitors the well-formedness of the output. The impact of working memory on fluency is attributable to its facilitative role in processing efficiency, which was measured through reaction time and veracity judgment in the operation span test. 
During the task performance, those with larger working memory capacities were faster in organizing ideas, accessing the linguistic forms, and articulating the phonetic plan, and therefore produced more fluent speech. While working memory affected accuracy and fluency because of the importance of attention control and speed of processing, it is difficult to explain why it did not mediate complexity. Ahmadian (2012b) argued that this was because working memory was not related to learners’ willingness to take risks, which underlies complexity. However, recall that Gilabert and Munoz (2010) found significant effects for working memory on lexical variety – a measure of complexity – and fluency, but not accuracy, although in their study within-task planning was not rigorously manipulated. The mixed findings suggest the need for more research on the relationship between working memory and linguistic complexity. b Working memory and strategic planning.  Whereas working memory correlated with the narrative performance of the unpressured within-task planners, it did not seem to be associated with the performance of the strategic planners. One contributing factor is perhaps the shifting of message conceptualization from the within-task phase to the pretask stage, thus relieving the burden for the Conceptualizer during task performance. This in turn eased the onerous processing load of working memory and marginalized its role in the learners’ oral production. Another equally, if not more, important factor is that the time pressure for the strategic planners limited their chances of using their working


memory to conduct controlled processing. In other words, the role of working memory capacity was inhibited or minimized in pressured production because the condition that triggered its effect did not exist.

VI Conclusions As the first attempt to investigate the effects of strategic planning and unpressured within-task planning on L2 Chinese oral production and the interaction between planning type and working memory, this study obtained some valuable findings. First, it was found that strategic planning favored the Conceptualizer while unpressured within-task planning facilitated the Formulator. The conclusion is supported by the following results: (1) unpressured within-task planners allocated more attention to form (complexity and accuracy) and strategic planners to meaning (fluency); (2) strategic planners used an early system morpheme (classifiers) more frequently while unpressured within-task planners used two late system morphemes (the aspect marker le and the case marker ba) more frequently. Second, the mediating effects of working memory were more prominent in unpressured within-task planning than in strategic planning. These findings have valuable theoretical and pedagogical implications. Theoretically, the study afforded empirical evidence for an interaction between planning type and the different processing components in Levelt’s speech model. The results pertaining to working memory help clarify the mechanism through which this cognitive variable affects L2 oral production. It was found that learners resorted to their working memory for resources required for conscious processing when planning their speech, and that its role may be activated or inhibited (and facilitative or deleterious) depending on the processing demands of the task (Robinson, 2011). Pedagogically, the results indicate that strategic planning is beneficial for the production of item-based and easy structures. Unpressured within-task planning, however, is ideal for eliciting complex, advanced structures. 
Furthermore, based on the results relating to working memory, it seems reasonable to argue that other things being equal, strategic planning is more suitable for learners with low working memory and unpressured within-task planning is favorable for those with high working memory.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References Ahmadian, M. (2012a). The effects of guided careful online planning on complexity, accuracy and fluency in intermediate EFL learners’ oral production: The case of English articles. Language Teaching Research, 16, 129–149. Ahmadian, M. (2012b). The relationship between working memory capacity and L2 oral performance under task-based careful online planning condition. TESOL Quarterly, 46, 165–175. Ahmadian, M., & Tavakoli, M. (2011). The effects of simultaneous use of careful online planning and task repetition on accuracy, fluency, and complexity of EFL learners’ oral production. Language Teaching Research, 15, 35–59.


Ahmadian, M., & Tavakoli, M. (2013). Investigating what second language learners do and monitor under careful online planning conditions. The Canadian Modern Language Review, 70, 50–75.
Baddeley, A. (2007). Working memory, thought, and action. Oxford: Oxford University Press.
Chafe, W. (1980). The pear stories: Cognitive, cultural and linguistic aspects of narrative production. Norwood, NJ: Ablex.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Ellis, R. (2009). The differential effects of three types of task planning on the fluency, complexity, and accuracy in L2 oral production. Applied Linguistics, 30, 474–509.
Erbaugh, M. (1986). Taking stock: The development of Chinese noun classifiers historically and in young children. In C. Craig (Ed.), Noun classes and categorization (pp. 399–436). Philadelphia, PA: John Benjamins.
Forster, K., & Forster, J. (2003). DMDX: A Windows display program with millisecond accuracy. Behavior Research Methods, Instruments, and Computers, 35, 116–124.
Foster, P., & Skehan, P. (1996). The influence of planning and task type on second language performance. Studies in Second Language Acquisition, 18, 299–324.
Foster, P., Tonkyn, A., & Wigglesworth, G. (2000). Measuring spoken language: A unit for all reasons. Applied Linguistics, 21, 354–375.
Gilabert, R., & Muñoz, C. (2010). Differences in attainment and performance in a foreign language: The role of working memory capacity. The International Journal of English Studies, 10, 19–42.
Goo, J. (2012). Corrective feedback and working memory capacity in interaction-driven L2 learning. Studies in Second Language Acquisition, 34, 445–474.
Kawauchi, C. (2005). The effects of strategic planning on the oral narratives of learners with low and high intermediate proficiency. In R. Ellis (Ed.), Planning and task performance in a second language (pp. 143–166). Philadelphia, PA: John Benjamins.
Kormos, J. (2006). Speech production and second language acquisition. Mahwah, NJ: Lawrence Erlbaum.
Leeser, M. (2007). Learner-based factors in L2 reading comprehension and processing grammatical form: Topic familiarity and working memory. Language Learning, 57, 229–270.
Levelt, W. (1989). Speaking: From intention to articulation. Cambridge, MA: The MIT Press.
Li, W. (2005). Topic chains in Chinese: A discourse analysis and application in language teaching. Munich: Lincom Europa.
Li, C., & Thompson, S. (1981). Mandarin Chinese: A functional reference grammar. Berkeley, CA: University of California Press.
McCarthy, P. M., & Jarvis, S. (2007). vocd: A theoretical and empirical evaluation. Language Testing, 24, 459–488.
Mehnert, U. (1998). The effects of different lengths of time for planning on second language performance. Studies in Second Language Acquisition, 20, 83–108.
Mochizuki, N., & Ortega, L. (2008). Balancing communication and grammar in beginning-level foreign language classrooms: A study of guided planning and relativization. Language Teaching Research, 12, 11–37.
Myers-Scotton, C., & Jake, J. (2000). Testing the 4-M model: An introduction. International Journal of Bilingualism, 4, 1–8.
Nakakubo, N. (2011). The effects of planning on second language oral performance in Japanese: Processes and production. Unpublished PhD dissertation, University of Iowa, Iowa, IA.
Nitta, R. (2007). The focus-on-form effects of strategic and on-line planning: An analysis of Japanese oral performance and verbal reports. PhD dissertation, University of Warwick, Coventry, UK.


Norris, J. M., & Ortega, L. (2009). Towards an organic approach to investigating CAF in instructed SLA: The case of complexity. Applied Linguistics, 30, 555–578.
Ortega, L. (1999). Planning and focus on form in L2 oral performance. Studies in Second Language Acquisition, 21, 109–148.
Payne, J. S., & Whitney, P. J. (2002). Developing L2 oral proficiency through synchronous CMC: Output, working memory and interlanguage development. CALICO Journal, 20, 7–32.
Révész, A. (2014). Towards a fuller assessment of cognitive models of task-based learning: Investigating task-generated cognitive demands and processes. Applied Linguistics, 35, 87–92.
Robinson, P. (2011). Second language task complexity, the Cognition Hypothesis, language learning, and performance. In P. Robinson (Ed.), Second language task complexity: Researching the Cognition Hypothesis of language learning and performance (pp. 3–38). Amsterdam: John Benjamins.
Sangarun, J. (2005). The effects of focusing on meaning and form in strategic planning. In R. Ellis (Ed.), Planning and task performance in a second language (pp. 111–141). Philadelphia, PA: John Benjamins.
Sasayama, S., & Izumi, S. (2012). Effects of task complexity and pre-task planning on Japanese EFL learners’ oral production. In A. Shehadeh & C. Coombe (Eds.), Task-based language teaching in foreign language contexts: Research and implementation (pp. 23–42). Amsterdam: John Benjamins.
Silverman, S., & Ratner, B. N. (2002). Measuring lexical diversity in children who stutter: Application of vocd. Journal of Fluency Disorders, 27, 289–304.
Skehan, P. (Ed.) (2014). Processing perspectives on task performance. Amsterdam: John Benjamins.
Skehan, P. (2015). Working memory and second language performance: A commentary. In Z. Wen, M. Mota, & A. McNeil (Eds.), Working memory in second language acquisition and processing (pp. 189–200). Bristol: Multilingual Matters.
Skehan, P., & Foster, P. (1997). Task type and task processing conditions as influences on foreign language performance. Language Teaching Research, 1, 1–27.
Unsworth, N., Redick, T., Heitz, R., Broadway, J., & Engle, R. (2009). Complex working memory span tasks and higher-order cognition: A latent-variable analysis of the relationship between processing and storage. Memory, 17, 635–654.
Waters, G., & Caplan, D. (1996). The measurement of verbal working memory capacity and its relation to reading comprehension. The Quarterly Journal of Experimental Psychology, 49A, 51–79.
Wei, L. (2000). Unequal election of morphemes in adult second language acquisition. Applied Linguistics, 21, 106–140.
Wei, L. (2002). The bilingual mental lexicon and speech production process. Brain and Language, 81, 691–707.
Wen, X. (2006). Acquisition sequence of three constructions: An analysis of the interlanguage of learners of Chinese as a foreign language. Journal of the Chinese Language Teachers Association, 41, 89–111.
Yilmaz, Y. (2013). Relative effects of explicit and implicit feedback: The role of working memory capacity and language analytic ability. Applied Linguistics, 34, 344–368.
Yuan, F., & Ellis, R. (2003). The effects of pre-task and on-line planning on fluency, complexity and accuracy in L2 monologic oral production. Applied Linguistics, 24, 1–27.


Appendix 1. Measures of complexity, accuracy, and fluency adopted in this study.

Complexity
  Lexical
    1. D value (lexical diversity)
    2. Percentage of words in HSK II [a] or above (lexical sophistication)
  Structural
    1. Mean length of clauses (phrasal structural complexity)
    2. Percentage of subordination (clausal structural complexity)
    3. Mean length of AS-units (overall structural complexity)
    4. Use of classifiers (grammatical complexity)
    5. Use of the aspect marker le (grammatical complexity)
    6. Use of the ba construction (grammatical complexity)

Accuracy
  1. Percentage of error-free clauses (global)
  2. Percentage of the correct use of classifiers (specific)
  3. Percentage of the correct use of the aspect marker le (specific)
  4. Percentage of the correct use of the ba construction (specific)

Fluency
  1. Number of pauses
  2. Mean length of pauses
  3. Speech rate A (number of syllables/minute)
  4. Speech rate B (number of effective syllables [b]/minute)
  5. Articulation rate (rate of speech excluding length of pauses)
  6. Number of repair disfluencies/minute

Note. a. HSK: Chinese proficiency test; b. number of effective syllables: the total number of syllables minus repair disfluencies such as false starts, repetition, etc.
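
The three rate-based fluency measures listed above can be illustrated with a short computational sketch. It is not the procedure used in the study: the function name, its parameters, and the sample values are hypothetical, and the articulation rate is assumed here to be the total number of syllables divided by speaking time net of pauses.

```python
def fluency_measures(total_syllables, repair_syllables, total_seconds, pause_seconds):
    """Return speech rate A, speech rate B, and articulation rate (syllables/minute).

    All parameter names are hypothetical. repair_syllables counts syllables
    produced in repair disfluencies (false starts, repetitions, etc.).
    """
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if not 0 <= pause_seconds < total_seconds:
        raise ValueError("pause_seconds must be in [0, total_seconds)")
    if not 0 <= repair_syllables <= total_syllables:
        raise ValueError("repair_syllables must be in [0, total_syllables]")

    total_minutes = total_seconds / 60
    # Assumption: speaking time is total time minus pause time.
    speaking_minutes = (total_seconds - pause_seconds) / 60
    # Effective syllables: total syllables minus repair disfluencies.
    effective_syllables = total_syllables - repair_syllables

    return {
        "speech_rate_a": total_syllables / total_minutes,
        "speech_rate_b": effective_syllables / total_minutes,
        "articulation_rate": total_syllables / speaking_minutes,
    }


# Invented values: 300 syllables (30 in repairs) in 120 s, 30 s of which are pauses.
sample = fluency_measures(300, 30, 120, 30)
print(sample)  # speech_rate_a 150.0, speech_rate_b 135.0, articulation_rate 200.0
```

In this invented case, speech rate B is lower than speech rate A because repaired material is discounted, while the articulation rate is higher than both because pause time is excluded from the denominator.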

Appendix 2. Correlations between working memory and unpressured within-task planning.

Measures                                          r       p
Complexity
  Lexical
    D value [a]                                  .28     .33
    HSK [b]                                      .47     .09
  Structural
    Clauses [c]                                  .36     .21
    Subordination                                .23     .44
    AS-units [d]                                 .40     .16
    Classifiers [e]                              .06     .83
    le [f]                                       .44     .12
    ba [g]                                       .17     .57
Accuracy
  Error-free clauses [h]                         .55*    .04
  Correctly used classifiers [i]                 .61*    .02
  Correctly used le [j]                          .16     .59
  Correctly used ba [k]                          .17     .57
Fluency
  Number of pauses                              −.30     .29
  Mean length of pauses                         −.57*    .04
  Speech rate A [l]                              .62*    .02
  Speech rate B [m]                              .63*    .02
  Articulation rate [n]                         −.51     .06
  Number of repair disfluencies per minute       .23     .42

Note. *Significant at the 0.05 level; a. lexical variety; b. lexical sophistication; c. mean length of clause; d. mean length of AS-units; e. use of classifiers; f. use of the aspect marker le; g. use of the ba construction; h. percentage of error-free clauses; i. percentage of correctly used classifiers; j. percentage of correctly used le; k. percentage of correctly used ba; l. speech rate A (number of syllables/minute); m. speech rate B (number of effective syllables/minute); n. articulation rate (rate of speech excluding length of pauses).


Appendix 3. Correlations between working memory and strategic planning.

Measures                                          r       p
Complexity
  Lexical
    D value [a]                                  .11     .69
    HSK [b]                                     −.11     .70
  Structural
    Clauses [c]                                 −.45     .09
    Subordination                               −.06     .84
    AS-units [d]                                −.37     .18
    Classifiers [e]                             −.32     .25
    le [f]                                      −.36     .19
    ba [g]                                       .01     .98
Accuracy
  Error-free clauses [h]                        −.13     .66
  Correctly used classifiers [i]                −.22     .44
  Correctly used le [j]                          .33     .24
  Correctly used ba [k]                         −.01     .99
Fluency
  Number of pauses                               .12     .68
  Mean length of pauses                          .14     .61
  Speech rate A [l]                             −.40     .14
  Speech rate B [m]                             −.35     .20
  Articulation rate [n]                          .62*    .01
  Number of repair disfluencies per minute [o]  −.30     .28

Note. *Significant at the 0.05 level (two-tailed); a. lexical variety; b. lexical sophistication; c. mean length of clause; d. mean length of AS-units; e. use of classifiers; f. use of the aspect marker le; g. use of the ba construction; h. percentage of error-free clauses; i. percentage of correctly used classifiers; j. percentage of correctly used le; k. percentage of correctly used ba; l. speech rate A (number of syllables/minute); m. speech rate B (number of effective syllables/minute); n. articulation rate (rate of speech excluding length of pauses); o. number of repair disfluencies per minute.
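
For readers who wish to see how coefficients of the kind reported in Appendices 2 and 3 are obtained, the sketch below computes a correlation and its associated t statistic. It assumes the tabled values are Pearson product-moment correlations, which the tables themselves do not state. The function names and the data are hypothetical and are not the study's scores.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length with at least 3 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        raise ValueError("t is undefined for |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Invented scores for five hypothetical learners (not data from this study).
span_scores = [1, 2, 3, 4, 5]
accuracy_scores = [2, 1, 4, 3, 5]
r = pearson_r(span_scores, accuracy_scores)          # 0.8
t = t_statistic(r, len(span_scores))                 # about 2.31, df = 3
print(round(r, 2), round(t, 2))
```

The p value then follows from comparing t with a t distribution on n − 2 degrees of freedom, which is why a coefficient of a given size can be significant in a larger group and non-significant in a smaller one.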