
An Open for Replication Study: The Role of Feedback Timing in Synchronous Computer-Mediated Communication

Diana C. Arroyo and Yucel Yilmaz
Indiana University

Correspondence concerning this article should be addressed to Yucel Yilmaz, Indiana University, Ballantine Hall 869, Bloomington, IN 47405-7103, United States. E-mail: [email protected]

This study investigated the role of corrective feedback timing in the acquisition of Spanish noun–adjective gender agreement. Forty-five learners completed a communicative task via synchronous computer-mediated communication (SCMC) in one of three groups (immediate, delayed, control). The immediate group received error reformulations immediately after their errors. The delayed group did not receive feedback; however, at the end of the task, they were provided with an electronic document showing the errors they had made during the task with error reformulations. The control group performed the task without receiving feedback. The immediate group outperformed the delayed group in an oral production test, but there were no differences between the two feedback groups in a grammaticality judgment test. All study materials and data are made available through Supporting Information online to encourage future replications.

Introduction

Corrective feedback refers to the reactions that second language (L2) learners receive from their interlocutors, indicating that the learners’ language production is not targetlike. Many L2 acquisition researchers (e.g., Ellis, 1991; Gass & Mackey, 2006; Long, 1996) hold that evidence indicating what is not possible in the target language (i.e., negative evidence), as conveyed through corrective feedback, plays at least a facilitative role in L2 acquisition. Evidence supporting this position has been obtained in studies investigating learners’ immediate responses to feedback, such as repair (Sheen, 2004), and their performance on individualized posttest items (e.g., Loewen & Philp, 2006). Supporting evidence has also been found in experimental studies investigating the relative performance of feedback groups versus non-feedback groups exposed to models of the targeted structures (e.g., Leeman, 2003) or no-feedback control groups (e.g., Yilmaz, 2012). Previous research has also shown that various factors, such as feedback type (e.g., Ellis, Loewen, & Erlam, 2006), type of linguistic target (e.g., Yilmaz, 2012), cognitive individual differences (e.g., Yilmaz & Granena, 2016), task characteristics (e.g., Révész & Han, 2006), and communication mode (e.g., Yilmaz & Yuksel, 2011), change the level of effectiveness of feedback. Another possible moderating factor that has not attracted much attention to date is feedback timing. One can distinguish between at least two kinds of feedback with respect to timing: immediate and delayed. Immediate feedback is provided immediately after learners’ nontargetlike productions. Delayed feedback, on the other hand, is provided at the end of the task, the end of the lesson, or even several days after the lesson. 
An investigation into the role of timing in the effectiveness of feedback is pedagogically important, because it could provide an answer to the question of whether corrective feedback should be provided during a communicative task or be reserved for a later time. Although several
methodologists (e.g., Harmer, 2007) have advocated for the use of delayed feedback in communicative contexts to avoid interrupting the flow of interaction, research to date has mostly concentrated on immediate feedback and has largely ignored the question of whether delayed feedback can be just as effective. In addition, given the theoretical claims—for instance, those made in focus on form literature (e.g., Doughty, 2001)—regarding a link between certain feedback timing conditions and the effectiveness of various cognitive processes (e.g., cognitive comparison), research on feedback timing could help advance theory by shedding light on the cognitive processes that learners engage in when decoding corrective feedback under different timing conditions. The present study therefore aimed to investigate the impact of timing on the effectiveness of feedback provided on Spanish noun–adjective gender agreement errors that arise in text-based synchronous computer-mediated communication. An additional purpose of the current study was methodological. Recently, researchers have called for replication studies to advance scientific knowledge by establishing the generalizability of L2 research findings (e.g., Marsden, Mackey, & Plonsky, 2016; Porte, 2012). Replication research has been valued in other fields, such as psychology, regardless of its outcome. For example, Open Science Collaboration (2015) stated that “replication can increase certainty when findings are reproduced and promote innovation when they are not” (p. 943). Despite these benefits of replication, little replication research has taken place in the area of corrective feedback over the years. This could be related to the fact that neither the instruments used in nor the data produced by the original study are typically available to researchers for replication (Derrick, 2016). 
The present study thus aimed at promoting future replication efforts in this area by including the full set of data and the instruments used in the data collection, all available through Supporting Information online.

Background Literature

Synchronous Computer-Mediated Communication

Text-based synchronous computer-mediated communication (SCMC) can be defined as real-time communication between people using text-based instant messaging software. In this study, we investigated the effects of feedback timing in text-based SCMC because previous research had identified several pedagogical benefits of using text-based SCMC. For example, compared to face-to-face communication, (a) learners participate more in SCMC (e.g., Kern, 1995); (b) the amount of participation is more equally distributed across different learners (Kern; Warschauer, 1997); (c) learners produce more language (e.g., Chun, 1994), and the quality of their production is greater because they use a wider variety of discourse functions (Sotillo, 2000; Warschauer); and (d) SCMC increases motivation and decreases anxiety (e.g., Warschauer).

It has been argued that even though communication in text-based SCMC takes place in the written mode, “the interactional features of the discourse are similar to an oral conversation” (González-Lloret, 2014, p. 289). Features of oral communication, such as short turns, real-time communication, informality of discourse, and many unnoticed grammar errors are also present in SCMC (e.g., Smith, 2003; Sotillo, 2000). Other features, such as longer processing time, visual saliency, and re-readability of messages set SCMC apart from face-to-face communication. It has been argued that these distinctive features of SCMC work as a cognitive amplifier (Warschauer, 1997), creating a suitable condition for learners to notice L2 forms, a process deemed facilitative of L2 acquisition (Schmidt, 2001). These theoretical claims have motivated researchers to explore the extent to which learners focus on the formal aspects of language in SCMC. Research focusing on learner-learner interaction has shown that learners do, in fact, pay attention to form in SCMC, as evidenced by instances in
which they correct their own or other learners’ errors, ask each other the meanings of words, and discuss metalinguistic rules (R. Blake, 2000; Iwasaki & Oliver, 2003; Yilmaz, 2011; Yilmaz & Granena, 2010). In addition, Ziegler’s (2016) meta-analysis has shown that SCMC leads to more L2 gains than face-to-face interaction, but the magnitude of this difference is small (d = .13). It has also been shown that even though SCMC takes place in the written mode, learners can apply the knowledge they gain through SCMC to activities carried out in the oral mode (C. Blake, 2009; Payne & Whitney, 2002; Yilmaz, 2012). This finding is attributed to the fact that language production through SCMC develops through the same cognitive mechanisms that underlie oral speech (e.g., C. Blake; Payne & Whitney).

Reformulations

Previous observational studies (e.g., Lyster & Ranta, 1997) investigating the occurrence of corrective feedback in language classrooms have identified various feedback types (e.g., metalinguistic clues, clarification requests, recasts). These feedback types have been classified into overarching categories based on the properties they share (e.g., implicit/explicit, prompts/reformulations). Reformulations, which are one such category, include feedback types that rephrase learners’ non-targetlike productions into a targetlike form (e.g., recast, explicit correction, metalinguistic correction). They are often contrasted with prompts, which do not provide the targetlike form. Reformulations can vary in implicitness. Those conveying metalinguistic information (e.g., metalinguistic corrections) and/or directly informing learners about the accuracy of their production (e.g., explicit correction) are considered explicit, whereas those that provide neither type of information are considered implicit. 
However, as applied to corrective feedback, the implicit/explicit distinction qualifies the information provided through corrective feedback, not the type of cognitive processes that corrective feedback engages. In
addition, some reformulations are considered immediate because they are provided immediately after the learner’s error, whereas others are considered delayed because they are provided at the end of a task or lesson. The following review mainly focuses on the literature that addresses implicit reformulations (immediate and delayed), as they are the feedback strategies used in the instructional treatment of this study. Recasts (hereafter, immediate reformulations) constitute a type of reformulation that is immediately contingent on learners’ off-target productions.1 Immediate reformulations have generally been classified as an implicit form of feedback (e.g., Yilmaz, 2016) and have received theoretical support from the Interaction Hypothesis (Long, 1996), which suggests that conversational interaction facilitates language acquisition by providing learners with linguistic data indicating what is possible (i.e., positive evidence) and what is not possible (i.e., negative evidence) in the target language. Researchers operating within the cognitive–interactionist research paradigm (e.g., Doughty, 2001; Gass & Mackey, 2006; Long, 2007) have argued that immediate reformulations constitute an ideal form of feedback because they are unobtrusive; that is, they provide negative evidence without compromising the meaning-based nature of the interaction, and they do so without being face threatening. Long (2007) has also noted that the negative evidence presented through immediate reformulations during conversational interaction could be effective because reformulations provide information about the L2 at the precise moment the learner needs it, and the immediate contingency of reformulations on the erroneous learner output allows learners to observe the contrast between the two forms. 
There is evidence from studies investigating native speaker–learner (e.g., Gurzynski-Weiss & Baralt, 2015) or learner–learner (Yilmaz, 2011) interaction through SCMC showing that implicit immediate reformulations are the most frequent feedback type. However, the results
of the studies that investigated the effectiveness of such immediate reformulations were mixed. Sachs and Suh (2007) showed that both textually enhanced and textually unenhanced reformulations led to pretest-to-posttest improvement. Yilmaz (2012) and Yilmaz and Yuksel (2011) investigated the role of communication mode in the effectiveness of implicit reformulations and found that SCMC reformulations led to more gains than reformulations delivered in face-to-face interaction. Baralt (2013) examined the role of task complexity in addition to the role of communication mode in the effectiveness of immediate reformulations. She found that the SCMC group outperformed the control group regardless of task complexity. When feedback was provided during cognitively less complex tasks, SCMC reformulations were more effective than reformulations provided in face-to-face interaction, but when the feedback was provided during more complex tasks, SCMC reformulations were less effective than those provided in face-to-face interaction. Other studies have reported negative results for immediate reformulations. Loewen and Erlam (2006) and Sauro (2009) investigated the relative effectiveness of implicit reformulations versus metalinguistic feedback. Loewen and Erlam found no differences between feedback types, and neither feedback type group outperformed the control group. Sauro’s results also showed no differences between feedback types, and only the metalinguistic feedback group, not the reformulation group, outperformed the control group. Much less attention has been paid to implicit delayed reformulations. Only one study (Bower & Kawaguchi, 2011) has documented the use of implicit delayed reformulations in SCMC contexts. Bower and Kawaguchi (2011) investigated the extent to which Japanese/English tandem partners provided immediate and delayed feedback for their partner’s errors that occurred during SCMC. 
Native speakers of Japanese and English, paired to learn their partner’s language while tutoring their own language, met online three times to perform
discussion tasks with each other. They were instructed to review the logs of their chat sessions and send feedback to their partners using e-mail within a week after the last chat session. The study showed that implicit delayed reformulations were one of the feedback strategies native speakers used when they provided feedback in a delayed fashion. However, the study did not provide any information regarding the effectiveness of delayed reformulations. To summarize, even though the research investigating delayed reformulations is admittedly limited, the existing evidence indicates that implicit delayed reformulations are relevant for online language learning because they are among the feedback strategies native speakers use to correct errors.

Feedback Timing

Even though L2 researchers have paid little attention to the effectiveness of delayed feedback, there is a pedagogical need to determine its effectiveness. Some teacher trainers (e.g., Harmer, 2007) have recommended using delayed feedback in oral activities that require learners to use language communicatively and fluently because they believe that immediate feedback can interrupt the flow of communication. However, it is possible that this type of interruption might be less detrimental in SCMC than in face-to-face communication because “in SCMC interlocutors are free to read or skip messages, or they may read them in an order different from that of not only composing but also posting” (Ortega, 2009, p. 228). In addition, as some researchers (Doughty, 2001; Long, 2007) argue, certain feedback forms, such as implicit reformulations, are less likely to interrupt the flow of the conversation since they do not provide metalinguistic information or require a response from the learner. A more compelling reason to prefer delayed over immediate feedback in SCMC is related to limitations in human resources. 
Some online language programs do not offer whole-group synchronous sessions, making it difficult for teachers to provide immediate feedback. Teachers’
only option to provide immediate feedback is to interact with students individually using chat software. However, these interactions are unlikely because of the time commitment that meeting with each student requires. In such contexts, teachers might prefer to pair students for assignments that ask them to interact with each other (Granena, 2016) and correct their errors later using the recorded chatscripts of the interaction. To determine whether delayed feedback can constitute an alternative to immediate feedback, however, it is necessary to empirically demonstrate the extent to which learners can benefit from delayed feedback. Although the L2 acquisition literature is rich with regard to theoretical perspectives that can be used to justify the provision of immediate or delayed feedback, probably the only theoretical framework that has made an explicit claim about the relationship between feedback timing and the effectiveness of feedback is the focus on form (FonF) perspective. According to this perspective, learners’ attention should be briefly drawn to formal elements of language when the need for it arises during a communicative or meaning-based activity (Long, 1991; Long & Robinson, 1998). The provision of corrective feedback has been viewed as one of the major ways of implementing FonF. Doughty (2001) has claimed that learning through feedback, especially through implicit reformulations, occurs when learners are engaged in a special type of monitoring in which they mentally compare their intention, their output, and the input conveyed through the feedback in their working memory. Doughty has maintained that this process, called cognitive comparison, is subject to the limitation that the elements to be compared cognitively should be held in working memory long enough to enable a comparison. 
She has further argued that there is a cognitive window of opportunity during which this comparison can be made most effectively, which is somewhere around 40 seconds, based on a review of empirical and theoretical literature on short-term and working memory. Doughty has also predicted that “with
regard to the timing of the information to be compared, the most efficient means to promoting cognitive comparison would seem to be provision of immediately contingent recasts [i.e., reformulations]” (p. 253). Doughty and Long (2003) have argued that the timing of feedback is also important in distance language teaching contexts stating that “effectiveness tends to diminish as distance between triggering event and feedback increases” (p. 65). To the best of our knowledge, no studies have investigated the effects of feedback timing on L2 acquisition in the context of text-based SCMC interaction. However, studies have been conducted in other contexts. For example, Shintani and Aubrey (2016) investigated timing effects using feedback that was provided during or after web-based writing tasks. English as a foreign language (EFL) learners carried out two writing tasks through the text-editing tool of Google Docs. The target form was the hypothetical conditional structure in English. The researchers provided feedback to the immediate and delayed groups by reformulating learners’ errors using the comment box function of Google Docs and marking the errors on the original document. The results showed that both experimental groups outperformed the control group in the immediate posttest, but only the immediate feedback group outperformed the control group in the delayed posttest. However, the writing task used in Shintani and Aubrey did not require the researchers to respond to or understand the content of the learner’s output. This lack of communicative interaction might have facilitated the recognition of the corrective function of the researchers’ messages because the researchers sent messages to the learner only with the purpose of correcting his or her error. Two studies have investigated the role of timing in contexts where feedback was provided through face-to-face communication (Li, Zhu, & Ellis, 2016; Quinn, 2014). 
In Li et al., EFL learners were asked to perform two narration tasks. The target form was the English past
passive. The immediate group received a prompt first (“the driver was arrest?”), followed by a reformulation (“the driver was arrested”) after each error. The delayed group received feedback at the end of the second task. First, they were reminded of their error with an explicit statement such as “You said ‘the driver wanted to run away, but he stopped by a policeman.’ Can you say it correctly?” Later, they were provided a reformulation (“he was stopped”). The study showed advantages for the immediate group on one of the outcome measures (i.e., grammaticality judgment test). Quinn investigated the effects of feedback timing on the acquisition of the English passive construction. All learners received a 10-minute mini lesson on the target structure and then participated in three 10-minute communicative tasks. The immediate group received feedback after each error, whereas the delayed group received feedback at the end of each task. The feedback was a hybrid form including both a prompt and a reformulation. In the delayed feedback stage, the researcher tried to elicit the same error learners made during their initial task performance by presenting the stimulus material that triggered the error and by providing a prompt (“Can you try to tell me about this one again?”). Next, he provided a reformulation, regardless of whether learners made the error that they had made during the task. An oral production test, an auditory grammaticality judgment test, and an error correction test were administered in the pretest, immediate posttest, and delayed posttest. The results showed no effect for either timing or feedback. However, the potential of the findings reported by Li et al. and Quinn to inform the role of feedback timing in text-based SCMC contexts is limited because, as previously discussed, text-based SCMC represents a distinct learning environment, with its unique features providing possible advantages for noticing and subsequent L2 acquisition.

The Present Study

The above literature review shows that previous empirical investigations of the role of feedback timing (Li et al., 2016; Quinn, 2014; Shintani & Aubrey, 2016) have produced mixed findings. While Li et al. and Shintani and Aubrey produced positive results for immediate feedback, Quinn showed no difference between the conditions. In addition, for at least three reasons, it is unclear whether the previous research can inform the relative effectiveness of immediate and delayed reformulations provided in response to errors arising during text-based SCMC. First, in none of the previous studies did researchers rely on reformulations alone as their feedback strategy. Instead, they used hybrid feedback strategies including reformulations and some other feature. For example, in Shintani and Aubrey, in addition to the reformulations provided through the comment function of Google Docs, learners’ errors were marked using the text-editing function of the online program. In Li et al. and Quinn, the researchers provided prompts in addition to reformulations. It is possible that these hybrid feedback strategies are more salient than reformulations alone because there is some evidence indicating that reformulations accompanied by other features, such as metalinguistic comments (Sheen, 2004), are more effective than reformulations alone. Second, in some of the previous studies (e.g., Shintani & Aubrey), feedback was not provided during interaction. Feedback provided during a task that requires interaction between interlocutors might be more challenging to interpret than feedback provided during a task that does not require interaction. Finally, some of the previous studies (Li et al.; Quinn) were carried out in face-to-face communication contexts. 
It is not clear whether the results of these studies can be generalized to contexts where feedback is provided during or after text-based SCMC tasks because of the differences between face-to-face communication and SCMC with respect to the availability of processing time and the salience of input due to the permanence and modality (visual vs. auditory) of messages. As some (Yilmaz, 2012) but not all
(Baralt, 2013) previous research has shown, SCMC reformulations can be more salient than face-to-face reformulations, possibly due to the availability of more processing time and the permanence of the messages. All these methodological discrepancies and the mixed findings reported considerably decrease the value of previous research in estimating the role of feedback timing in the current research context. For this reason, new research investigating feedback timing in contexts where reformulations are provided in response to errors arising during text-based SCMC is necessary. Such an investigation will provide a suitable venue to test the prediction of FonF as it relates to the relative effectiveness of immediate and delayed reformulations. If immediate reformulations are found to be more effective than delayed reformulations, this would support Doughty’s (2001) and Doughty and Long’s (2003) prediction that implicit reformulations are most effective when they immediately follow learners’ errors. This research can also shed light on an important pedagogical issue. Given the worldwide increase in online language courses, instructors might be interested to know whether delaying feedback is a viable alternative to providing feedback immediately after errors. No previous study has investigated whether immediate and delayed reformulations would be differentially effective in response to errors that arise during a communicative task carried out via text-based SCMC. The present study thus aims to fill this gap by exploring the relative effects of immediate and delayed reformulations on the acquisition of Spanish noun–adjective gender agreement. Finally, since the current project is the first study to investigate this issue, future replications are necessary to assess the generalizability of its findings. We facilitate future work in this area by providing a detailed description of our methodology and by making our data and data collection tools available to researchers. 
The following research question was addressed: To what extent can feedback timing affect L2
learners’ development of Spanish noun–adjective gender agreement as measured by (a) an oral production test and (b) an untimed grammaticality judgment test?

Method

The current study followed a pretest–posttest–delayed posttest design. The independent variable was feedback timing, and the dependent variable was learners’ development in Spanish noun–adjective gender agreement as measured by oral production and grammaticality judgment tests. Participants were randomly assigned to one of three groups: immediate feedback (n = 15), delayed feedback (n = 15), and no-feedback control (n = 15).

Participants

Forty-five (37 women, 8 men) learners of Spanish whose L1 was English volunteered to participate in this study (M_age = 20.35 years, SD = 2.27, range = 18–28). All participants were undergraduate students who were attending or had attended the courses offered in the Spanish language program at a large Midwest university in the United States. The participants had taken an average of 2.53 semesters of Spanish classes (SD = 0.82) at a university level, and 17% reported having a beginner to intermediate level knowledge of a third language (e.g., Vietnamese, Italian, French, American Sign Language, German, or Hebrew). One of the questions in the background questionnaire asked learners to rate their ability to use the instant text-messaging software used in this study (i.e., Skype) on a scale of 1 to 5. An analysis of variance (ANOVA) comparing learners’ self-ratings across the three experimental conditions revealed no significant differences between the conditions, F(2, 42) = 0.03, p = .97, η² = .002 (M_control = 2.33, SD = 0.81; M_immediate = 2.40, SD = 0.82; M_delayed = 2.40, SD = 0.82). A video retell task asking learners to watch a 90-second video and retell its events in Spanish in one minute was administered as an independent measure of oral proficiency (see Appendix S1 in
the Supporting Information online for a link to the video, the scoring rubric, and interrater reliability information). A one-way ANOVA comparing learners’ proficiency across the three experimental groups (M_control = 5.51, SD = 1.16; M_immediate = 5.56, SD = 1.67; M_delayed = 5.00, SD = 1.24) revealed no differences between the groups, F(2, 42) = 0.83, p = .443, η² = .10, indicating that any possible effect for feedback timing cannot be attributed to proficiency differences between groups.

Target Structure

Spanish noun–adjective gender agreement was the target linguistic structure of the study. In Spanish, nouns can be either masculine (M) or feminine (F). In nouns with overt gender morphology, gender is marked with the ending –o for masculine nouns or –a for feminine nouns. Other nouns are not morphologically marked for gender (e.g., pared, “wall-F”). Gender is also marked on adjectives and determiners through agreement with the noun they modify. Agreement with adjectives can take place within the noun phrase (intraphrasal agreement, as illustrated in Example 1a) or between different phrases (interphrasal agreement, as shown in Example 1b).

Example 1. Spanish Noun–Adjective Gender Agreement

a.  El carro bonito.
    The-M car-M beautiful-M
    “The beautiful car.”

    La casa bonita.
    The-F house-F beautiful-F
    “The beautiful house.”

b.  El libro es rojo.
    The-M book-M is red-M
    “The book is red.”

    La camisa es roja.
    The-F shirt-F is red-F
    “The shirt is red.”
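
The overt-morphology pattern in Example 1 can be sketched as a small check. This is only an illustration and not part of the study's materials. The function names `overt_gender` and `agrees` are hypothetical, and the sketch assumes the simplified rule that a final -o signals masculine and a final -a signals feminine. It gives no verdict for words without overt marking (e.g., pared), and it would misclassify well-known exceptions to the pattern, such as mano, which ends in -o but is feminine.

```python
from typing import Optional


def overt_gender(word: str) -> Optional[str]:
    """Return 'M' or 'F' for a word ending in -o or -a, else None.

    Assumption: the final vowel alone decides gender, which is a
    simplification that ignores exceptions and non-overt marking.
    """
    w = word.lower()
    if w.endswith("o"):
        return "M"
    if w.endswith("a"):
        return "F"
    return None


def agrees(noun: str, adjective: str) -> Optional[bool]:
    """Check noun-adjective gender agreement from overt endings only.

    Returns True or False when both words are overtly marked, and None
    when morphology alone cannot decide.
    """
    noun_gender = overt_gender(noun)
    adjective_gender = overt_gender(adjective)
    if noun_gender is None or adjective_gender is None:
        return None
    return noun_gender == adjective_gender


if __name__ == "__main__":
    print(agrees("carro", "bonito"))       # both end in -o
    print(agrees("guitarra", "amarillo"))  # -a noun with -o adjective
    print(agrees("pared", "roja"))         # noun lacks overt marking
```

Returning no verdict for non-overtly marked nouns mirrors the distinction drawn in this section between overt and non-overt gender marking: for such nouns, gender has to be known lexically rather than read off the ending.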

Gender agreement was selected for the study because although it is highly frequent in the input, it is not easily acquired even by advanced learners of Spanish, especially when learners’ L1 is not a gender-marking language (e.g., Granena, 2014). The acquisition of Spanish noun–adjective gender agreement can be a challenging task for the participants of the present study because their L1 is English, which is a non-gender-marking language. Gender agreement has also been considered problematic for L2 learners because it has low perceptual salience (as it is realized through bound unstressed morphemes) and lacks communicative value (Fernández-García, 1999; Granena; Leeman, 2003). It has been shown that noun–adjective gender agreement is particularly challenging because learners show less accuracy in agreement with adjectives than with determiners (e.g., Bruhn de Garavito & White, 2002). According to some theoretical positions (e.g., Hawkins & Chan, 1997), full acquisition of certain grammatical features, such as gender, is only possible after the critical period is passed if the same feature is instantiated in learners’ L1. Other theoretical positions (e.g., Schwartz & Sprouse, 1996) consider the acquisition of gender agreement possible regardless of whether the learners’ L1 lacks the feature. There is evidence supporting both positions. For example, Franceschina (2005) has shown that proficient late bilinguals of an L1 without gender are less accurate in producing gender agreement than those of a gendered L1, whereas White, Valenzuela, Kozlowska-Macgregor, and Leung
(2004) have demonstrated that learners can reach nativelike accuracy levels in gender agreement regardless of their L1. In addition, there is some evidence suggesting that several linguistic cues might affect learners’ accurate production of gender agreement. For example, Alarcón (2010) has reported that learners are more accurate in producing correct gender agreement with masculine, animate, and overtly marked nouns than with feminine, inanimate, and non-overtly marked nouns.

Materials and Tasks

Treatment Task

Each learner and one of the researchers (hereafter, the experimenter) carried out a one-way information gap task for the treatment (see Appendix S2 for the treatment task). The learner and the experimenter met at a research lab, sat at different computers facing away from each other, and logged onto two different Skype accounts. It was explained to the learner that s/he would interact with the experimenter using the text chat feature of Skype to carry out a language task. Both interlocutors were provided with a Microsoft PowerPoint file including 12 pairs of pictures. The learner’s picture pairs differed from each other with respect to the physical characteristics (e.g., color, shape, size) of four objects (12 × 4 = 48 differences in total). These objects were missing from the experimenter’s pictures, as if they had been cut out. Instead of these objects, the experimenter saw empty boxes (in each picture) and, under each box, three alternative versions of the relevant object varying with respect to size, shape, or color (e.g., blue car, white car, yellow car). The learner was asked to describe the pictures in each slide while focusing on the differences in order to help the experimenter choose the correct version of the object. The target items were balanced for the morphological marking (overt or non-overt) and gender (masculine or feminine) of the noun.


Feedback Treatment Although all learners carried out the treatment task with the experimenter regardless of their group, their errors on the linguistic target were treated according to their group assignment. The immediate and delayed groups received partial reformulations of their erroneous productions, whereas the control group did not receive any feedback. The immediate feedback group received the reformulations of their non-targetlike productions as they were performing the treatment task. To ensure that the feedback was provided within the putative window of opportunity (40 seconds) and with relative consistency, the experimenter provided the feedback always in the turn following the learner’s error and as quickly as possible before the learner could start a new turn. Example 2 illustrates an episode containing immediate feedback (with an asterisk indicating an erroneous utterance and the source of the error marked in bold).

Example 2. Immediate Feedback During the Treatment Task

Learner: *En el imagen 1: La guitarra es amarillo
in the-M picture 1: the-F guitar-F is yellow-M
“In picture 1, the guitar is yellow.”

Experimenter: guitarra es amarilla
Guitar-F is yellow-F
“Guitar is yellow.”

Experimenter: Qué más?
what else
“What else?”

Learner: Uno… el libro es negro
one… the-M book-M is black-M
“(In picture) one, the book is black.”


As can be seen in Example 2, the experimenter sent another message asking the learner to talk about the other parts of the picture immediately after the feedback. This was done to prevent learners from modifying their initial production in their turn after the feedback. Modified output, a discourse feature shown to be related to accurate noticing of L2 feedback in text-based SCMC (Gurzynski-Weiss & Baralt, 2015), was only natural for the immediate group, because when the delayed feedback group received feedback they were not interacting with the experimenter. For this reason, modified output opportunities were blocked during the interaction in order not to give an advantage to the immediate group. The delayed feedback group did not receive any feedback during the treatment task. Instead, the experimenter either asked them to talk about the other parts of the picture (Qué más? “What else?”) or directed them to the next picture (Siguiente imagen, “Next picture”), regardless of the accuracy of their productions. At the end of the task, learners were given a Microsoft Word document that included a list of error-reformulation pairs. The document was prepared by the experimenter during the task by copying and pasting learners’ non-targetlike productions and by adding the targetlike partial reformulation below the line including the error. Learners were given 5 minutes to read the document. Examples 3 and 4 illustrate how the non-targetlike productions of the delayed feedback group were treated during and after the treatment task.

Example 3. Errors in the Delayed Feedback Condition During the Treatment

Learner: *En dos, la guitarra es negro.
in two, the-F guitar-F is black-M
“In (picture) two, the guitar is black.”

Experimenter: Siguiente imagen
Next picture
“Next picture.”

Learner: Imagen uno tiene un libro
Picture one has a-M book-M
“There is a book in picture one.”

Example 4. Delayed Feedback After the Treatment Task

Learner: *Guitarra es negro
guitar-F is black-M
“Guitar is black.”

Experimenter: Guitarra es negra
guitar-F is black-F
“Guitar is black.”

As can be seen in Examples 3 and 4, the delayed reformulation was preceded by the repetition of the erroneous part of the learner’s original production. Even though a delayed reformulation that fully matched the immediate reformulation in Example 2 should only include the partial reformulation of the error without the error itself, such a reformulation would disadvantage the delayed group. It would be highly unlikely for learners to construe such delayed reformulations as negative evidence, as it would be difficult for learners to link the targetlike form to the error they had made during the task. These delayed reformulations would still be potentially useful for learners as positive evidence, but their value as a source of negative evidence (i.e., corrective feedback) would be questionable. Thus, to promote the interpretation of


the delayed reformulations as both negative and positive evidence rather than as positive evidence only, we presented the relevant error before each reformulation. Both the error and the reformulation were preceded with the usernames of the interlocutors chosen during the chat interaction to indicate that the error was the learner’s own error and the line below the error was the experimenter’s response to it. It should be noted that neither in the treatment task nor in the delayed feedback stage did the experimenter use any textual enhancement techniques such as bolding, italicization, or asterisks. Outcome Measures A grammaticality judgment test and an oral production test were used as outcome measures on the pretest, posttest I, and posttest II. The nouns in both measures were balanced for gender (masculine or feminine) and morphological marking (overt or non-overt). Half of the nouns were carried over from the treatment tasks (i.e., repeated items), and the other half were used only in the tests (i.e., novel items). The target test items in the grammaticality judgment test were balanced for grammaticality. Three different versions of each measure were created, and a different version was used at each time (pretest, posttest I, posttest II). The order of these versions was counterbalanced using a Latin square design within each experimental condition to control for possible order effects (see Appendix S3 and Appendix S4 for the testing materials). A spot-the-differences task was used as the oral production test. This task was designed to measure learners’ knowledge of the target form that could be deployed when their attention was primarily on meaning and when monitoring one’s speech was relatively less likely. Ellis (2015) has argued that a task requiring freely constructed responses in oral discourse “has high face validity for language pedagogy as it provides evidence that the instruction has had an effect on the kind of use required for communication” (p. 
434). In the oral production test, learners


were shown sets of two pictures through a PowerPoint file and were instructed to describe how the object in one picture was different from the object in the other picture. To make monitoring less likely, learners were asked to respond as fast as they could, but a specific time limit was not imposed. There were 24 target items and 24 distractor items. The target items were objects that differed from each other with respect to their physical features (i.e., shape, color, size). A grammaticality judgment test was used to measure learners’ knowledge of the target form under conditions allowing learners to plan their responses and requiring them to focus primarily on the linguistic form. In this task, learners were given a sheet including a list of 48 sentences. Of these, 24 were target items and 24 were distractor items. Learners were instructed to decide whether each sentence was grammatically correct or incorrect in Spanish, and (if it was incorrect) to cross out the error and write the correction on the line below it. Procedure The study was conducted in a research lab over two sessions. The first session began with an object-naming task (see Appendix S5) and continued with the pretest, the treatment, and posttest I. Two measures were taken to prevent learners’ lack of vocabulary knowledge from interfering with their performance on the treatment and assessment tasks: (a) All of the vocabulary items that were used in the experiment were selected from learners’ textbooks; and (b) an untimed object-naming task was administered at the beginning of the first session. During the object-naming task, learners were required to say the Spanish word corresponding to the object depicted in a PowerPoint slide and were provided with the correct vocabulary item in the written form regardless of the accuracy of their response. 
In order not to prime learners to overuse the default/masculine form of the adjectives during the experimental stage, the adjectives and determiners were excluded from this task. At the end of this task, learners proceeded to the


experimental stage and took the pretest. The time between pretest and posttest was controlled for all the groups. Pilot testing revealed that the maximum time spent on the treatment task in the immediate group was 35 minutes, compared to 30 minutes for the delayed and control groups. Based on this information and in order to ensure that learners had enough time to complete the task, the immediate group was given 35 minutes to complete the treatment task, whereas the delayed and control groups were given 30 minutes. The 5-minute difference in time allocated to complete the task between the groups was established as the time limit for the delayed group to read the error–feedback pairs at the end of the treatment task. Since the control group did not receive any feedback, a 5-minute waiting period was added to the end of their task before they took posttest I. Learners did not always need to use the time allocated for the treatment, in which case they were asked to wait until the end of the predetermined time period before they moved on to the next stage. Ten days after the first session, the experimenter met with learners again and administered posttest II and a background questionnaire. In each testing time, the oral production test was administered before the grammaticality judgment test. Data Analysis Learners’ audio-recorded oral responses to the oral production test items were first transcribed and then coded for obligatory contexts and the accuracy of noun–adjective gender agreement. Each occurrence of a morphologically marked adjective that appeared in the same utterance as the noun it modified was coded as an obligatory context. Correctly marked adjectives were coded 1, and incorrectly marked ones were coded 0. Coding reliability was established by having two native Spanish-speaking coders listen to and code 10% of the data. 
The percentage agreement between the two coders was 100% for the identification of obligatory contexts and 99.2% for the accuracy of noun–adjective gender agreement. An accuracy rate score was


calculated for each student by dividing the number of accurate responses by the number of obligatory contexts. Grammatical sentences in the grammaticality judgment tests that were accurately identified as correct received one credit. Ungrammatical sentences that were labeled incorrect and properly corrected also received one credit. When the learner labeled an ungrammatical sentence incorrect but corrected a feature other than noun–adjective gender agreement, no credit was given. Two separate accuracy rate scores, one for the grammatical items and one for the ungrammatical items, were calculated by dividing the sum of accurate responses by the total number of items in that category (grammatical or ungrammatical). Reliability coefficients were computed for the grammatical and ungrammatical items of each version of the grammaticality judgment test using Cronbach’s alpha (see Appendix S6 for the relevant data). The coefficients for the grammatical items were very low, indicating that these items did not discriminate well among learners (Version A = .07; Version B = .18; Version C = .14). Therefore, the grammatical item scores were not submitted to inferential statistical analyses. The coefficients for the ungrammatical items were acceptable (Version A = .71; Version B = .62; Version C = .67).2 The following statistical procedures were followed to determine the effect of feedback timing on learners’ development of Spanish noun–adjective gender agreement. First, a one-way ANOVA was computed on pretest scores in order to establish whether there were differences between the groups. Next, a mixed-design ANOVA was carried out, in which group served as a between-subjects factor and time served as a within-subjects factor. 
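The scoring and reliability steps described above can be sketched in a few lines of Python. This is an illustration only, not the authors' analysis script: the function names and the toy response matrix are hypothetical, and Cronbach's alpha is computed with the standard variance-based formula, which the article does not spell out.

```python
from statistics import variance


def accuracy_rate(accurate, obligatory_contexts):
    """Proportion of accurate responses among obligatory contexts (or items)."""
    if obligatory_contexts == 0:
        # Assumption: no score is assigned when there are no contexts to judge.
        return None
    return accurate / obligatory_contexts


def cronbach_alpha(scores):
    """Cronbach's alpha for a learners-by-items matrix of 0/1 credits."""
    k = len(scores[0])                          # number of items
    items = list(zip(*scores))                  # one tuple of credits per item
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var / total_var)


# Hypothetical data: four learners (rows) by three ungrammatical items (columns).
toy = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 1, 0]]
print(accuracy_rate(18, 24))
print(round(cronbach_alpha(toy), 2))
```

Because the same variance estimator is used for the items and for the total scores, the choice between sample and population variance does not change the resulting coefficient.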
To follow up on interaction effects, two gain scores, one for the difference between posttest I and the pretest (Gain 1) and one for the difference between posttest II and the pretest (Gain 2) were computed, and a one-way ANOVA was carried out using each gain score. The results of these analyses are reported


separately for each outcome measure below. The data used for the statistical analyses are available in Appendix S7. Results Before running any statistical tests, all data were checked for normality using the recommended ±1 range for skewness and kurtosis (Phakiti, 2014). The skewness values ranged from −.29 to .62, and the kurtosis values ranged from −1.39 to .68, indicating no severe departure from

normality. Therefore, we assumed a normal distribution and carried out parametric tests. To determine whether the two feedback groups were comparable regarding the amount of feedback they received, an independent-samples t test was carried out on the mean frequency of feedback instances (see Appendix S7 for the relevant data). The test revealed that the delayed group (M = 17.07, SD = 4.33) received a slightly higher number of feedback instances than the immediate group (M = 15.53, SD = 3.23), but this difference was not statistically significant, t(28) = −1.10, p = .281, d = .40, 95% CI [−.90, 1.74]. In addition, our strategy to prevent the immediate group from modifying their output was successful because none of the learners modified their output after receiving feedback. Oral Production Test Descriptive statistics for the oral production test scores appear in Table 1. The ANOVA carried out on the pretest scores did not reveal a statistical difference between the groups, F(2, 42) = 2.07, p = .14, η2 = .09. The mixed-design ANOVA showed that the effect for group was not significant, F(2, 42) = 1.73, p = .19, η2 = .08, but the effect for time, F(2, 84) = 19.57, p < .001, η2 = .27, and the interaction between time and group, F(2, 84) = 5.69, p < .001, η2 = .16, were significant. This significant interaction indicates that changes in scores over time varied among the groups. Two separate ANOVAs carried out on each gain score to follow up on this


interaction revealed significant group effects for Gain 1, F(2, 42) = 9.14, p = .001, η2 = .30, and for Gain 2, F(2, 42) = 7.71, p = .001, η2 = .27. Tukey’s post hoc multiple comparisons indicated that the immediate group significantly outperformed both the control group (Gain 1: Mdiff =.20, p = .001, d = 1.74, 95% CI [.86, 2.52]; Gain 2: Mdiff = .20, p = .002, d =1.38, 95% CI [.55, 2.13]) and the delayed group (Gain 1: Mdiff = .13, p = .022, d = .99, 95% CI [.21, 1.72]; Gain 2: Mdiff = .16, p = .015, d = 1.14, 95% CI [.34, 1.88]). There were no differences between the delayed and the control groups in either gain score (Gain 1: Mdiff = .07, p = .337, d = .52, 95% CI [−0.23, 1.23]; Gain 2: Mdiff = .04, p = .689, d = .28, 95% CI [−.45, .99]). < Place Table 1 near here> Grammaticality Judgment Test Table 2 presents the descriptive statistics for grammaticality judgment test scores using the ungrammatical items only (see Appendix S8 for the descriptive statistics for the grammatical items). The one-way ANOVA carried out on pretest scores on the ungrammatical items revealed no statistical difference between the groups, F(2, 42) = .01, p = .994, η2 < .001, indicating that learners’ knowledge of the target structure was similar at the start of the study. The mixed-design ANOVA showed significant effects for group, F(2, 42) = 6.90, p = .003, η2 = .25, for time, F(2, 84) = 33.91, p < .001, η2 = .39, and the interaction between time and group, F(2, 84) = 6.04, p < .001, η2 = .14. The significant interaction indicates that changes in scores over time varied among the groups. Two separate ANOVAs carried out to follow up on the interaction revealed significant group effects for Gain 1, F(2, 42) = 11.26, p < .001, η2 = .35, and for Gain 2, F(2, 42) = 8.19, p = .001, η2 = .28. 
Tukey’s post hoc multiple comparisons indicated that both the immediate group (Gain 1: Mdiff = .26, p < .001, d = 1.44, 95% CI [.60, 2.20]; Gain 2: Mdiff = .23, p = .002, d =1.45, 95% CI [.61, 2.21]) and the delayed group (Gain 1: Mdiff = .24, p = .001, d =


2.09, 95% CI [1.15, 2.91]; Gain 2: Mdiff = .21, p = .005, d = 1.37, 95% CI [.58, 2.13]) outperformed the control group. There were no differences between the immediate and delayed groups in either gain score (Gain 1: Mdiff = .02, p = .959, d = .06, 95% CI [−.66, .77]; Gain 2: Mdiff = .02, p = .932, d = .11, 95% CI [−.61, .82]). < Place Table 2 near here> Discussion This study set out to investigate the effect of timing on the effectiveness of corrective feedback. Our research question asked about the extent to which feedback timing could affect L2 learners’ development of Spanish noun–adjective gender agreement. To answer this question, we first analyzed oral production test scores. The gains on the oral production test ranged from .04 to .11 points for the delayed group, and from .20 to .24 points for the immediate group. The inferential statistics showed that there was an effect for feedback timing, because the immediate group improved significantly more than the control and delayed groups, both from pretest to posttest I and from pretest to posttest II, whereas the delayed and control groups did not differ from each other in terms of their gains. The magnitude of the difference in gains between the immediate and the delayed group (Gain 1: d = .99; Gain 2: d = 1.14) and the immediate and the control group (Gain 1: d = 1.74; Gain 2: d = 1.38) was always large, according to Plonsky and Oswald’s (2014) benchmarks, where .40 ≤ d < .70 implies a small effect, .70 ≤ d < 1.00 suggests a medium effect, and d ≥ 1.00 indicates a large effect. Next, we analyzed grammaticality judgment test scores. The gains on the grammaticality judgment test ranged from .24 to .26 points for the delayed group, and from .25 to .28 points for the immediate group. The inferential statistics revealed no effect for feedback timing because there were no significant differences between the gains of the two feedback timing conditions,


even though each feedback timing condition improved significantly more than the control group both from the pretest to posttest I and from the pretest to posttest II, with large differences between the gain scores for the immediate group (Gain 1: d = 1.44; Gain 2: d = 1.45) and for the delayed group (Gain 1: d = 2.09; Gain 2: d = 1.37). The d values reported above, which indicate the magnitude of the differences between the feedback and control groups (ranging from 1.37 to 2.09), are larger than the d values reported in Li’s (2010) meta-analysis for implicit reformulations (d = .51) and for immediate posttests (d = .44). This discrepancy might be due to the difference in modality. Implicit reformulations were provided through face-to-face communication in most of the studies synthesized in Li’s study, whereas they were provided through text-based SCMC in this study. The d values reported in this study are also higher than the d value (1.13) reported in Ziegler’s (2016) meta-analysis, which indicates the magnitude of the difference between SCMC interaction and control groups. This difference might be related to the fact that a predetermined feedback type was provided on a specific target structure in this study, whereas neither the interactional feature nor the target structure was controlled in some of the studies (e.g., Payne & Whitney, 2002) synthesized in Ziegler (2016). However, caution must be taken when comparing the effect sizes obtained in this study to the effect sizes reported in Li (2010) and Ziegler (2016) because (a) unlike in the current study, no effect sizes for delayed posttests were reported in Ziegler and no effect sizes for delayed feedback were reported in either Ziegler or Li, and (b) both Li’s and Ziegler’s effect sizes were calculated based on posttest scores, whereas those in the current study were calculated based on gain scores. 
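To make the gain-score effect sizes discussed above concrete, the following sketch computes gain scores and a between-groups Cohen's d, then labels the result with the Plonsky and Oswald (2014) thresholds quoted earlier. It is a hedged illustration: the article does not state which formula for d was used, so the pooled-standard-deviation version is an assumption, and the function names and scores are hypothetical rather than taken from Appendix S7.

```python
from math import sqrt
from statistics import mean, variance


def gain_scores(pre, post):
    """Per-learner gain: posttest accuracy rate minus pretest accuracy rate."""
    return [after - before for before, after in zip(pre, post)]


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (assumed formula)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def benchmark(d):
    """Label |d| using the thresholds cited in the text."""
    size = abs(d)
    if size >= 1.00:
        return "large"
    if size >= 0.70:
        return "medium"
    if size >= 0.40:
        return "small"
    return "below small"


# Hypothetical accuracy rates for two small groups (not the study's data).
immediate = gain_scores([0.50, 0.60, 0.55, 0.45], [0.75, 0.80, 0.70, 0.70])
delayed = gain_scores([0.50, 0.55, 0.60, 0.50], [0.55, 0.65, 0.60, 0.60])
d = cohens_d(immediate, delayed)
print(round(d, 2), benchmark(d))
```

Computing d on gains rather than on raw posttest scores is what separates the values in this study from those in the meta-analyses it is compared with, which is why the comparison is made with caution.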
The advantage found for the immediate feedback on the oral production test might be related to the memory benefits experienced by the immediate group when making a cognitive comparison. As suggested by Doughty (2001), a cognitive comparison would take place within a


cognitive window that is open for about 40 seconds after the feedback as long as the learner could hold a representation of the propositional message, his/her own non-targetlike utterance, and the utterance carrying the feedback. A post hoc analysis using the time stamps of the sent messages revealed that the immediate group received feedback with 12 seconds (SD = 10) of delay on average, whereas the delayed group received feedback with 14 minutes 55 seconds (SD = 09 minutes 59 seconds) of delay on average. This result then confirms Doughty’s prediction about the effectiveness of immediate reformulations. One might wonder, however, how this effect came about despite the fact that the two timing conditions were very similar in the presentation of error–feedback pairs. In both conditions, the error and feedback were in the written mode, adjacent, and available for rereading, which are conditions that are likely to facilitate a comparison through visual inspection and eliminate any need for working memory. However, the finding that the immediate group outperformed the delayed group and that the delayed group did not even outperform the control group indicates that taking advantage of feedback required more than a visual comparison between the two forms on the computer screen. In addition to, or instead of, a visual comparison, it is possible that learners compared the memory traces of their erroneous productions and the information (i.e., negative and positive evidence) they extracted from the feedback. If this was the case, the short delay between the error and feedback in the immediate condition might have allowed the memory traces of the erroneous production to stay active in learners’ working memory until they received the feedback. 
In the delayed feedback condition, however, the memory traces of errors, which might have been active right after the error was committed, may have decayed by the time learners received the delayed feedback, and seeing their errors again on a computer screen in a decontextualized way might not have been enough to activate the memory of having made the


error. In other words, learners may have been unable to link the error re-presented to them in the delayed feedback stage to the error they made during their task performance. Additionally, the advantage found for the immediate feedback over the delayed feedback on the oral production test might relate to the more favorable conditions for hypothesis testing in the immediate condition. Even though both experimental groups had equal opportunities to produce output during the treatment, there were differences between the groups as to whether the feedback could influence these opportunities. The immediate group, depending on when exactly they received the first feedback during the task, had some of their production opportunities before the feedback and some after the feedback. In these opportunities after the feedback, they could use the information provided in the feedback to form and test new hypotheses about how the target form works. The delayed feedback group, however, had all of their production opportunities before they received feedback, and thus could not put their newly formed hypotheses to the test through output production. The finding of this study with regard to the higher effectiveness of immediate feedback in comparison to delayed feedback on the oral production test is in line with the findings of two oral feedback studies (Li et al., 2016; Shintani & Aubrey, 2016) that also reported advantages for immediate feedback, but it diverges from the findings of Quinn (2014), who did not find a difference between immediate and delayed feedback. As mentioned above, previous studies differ methodologically from the current study with respect to communication mode, nature of the treatment tasks, and feedback type. These differences make it difficult to identify the factors that might have contributed to the mixed findings about the role of feedback timing. 
However, bearing this limitation in mind, it is important to mention a design feature that was shared by the studies that found an advantage for immediate feedback but not by the study that did not find


such an advantage. In the studies that found an advantage for immediate feedback (Li et al.; Shintani & Aubrey) including the current study, delayed feedback was given at the end of the treatment task, whereas in the study that did not find such an effect (Quinn), delayed feedback was given at the end of each of the three tasks administered during the treatment. According to Li et al., the provision of feedback between the tasks may have increased the effectiveness of the delayed feedback by serving as a pretask instruction for the subsequent task. Another finding of the study was that the difference between the feedback timing groups changed depending on the type of outcome used to measure learners’ development. This interaction between outcome type and feedback timing might be an indication that the groups’ ability to deploy the knowledge they gained from the treatment was determined by the testing conditions. Since the instructions on the grammaticality judgment test asked learners to consider the accuracy of the test items and to correct the ungrammatical sentences, the task conditions were conducive to directing focal attention toward the target forms and to consciously inspecting the rules by which the forms operate. The instructions in the oral production test, on the other hand, asked learners to describe the differences between two objects as fast as they could. These instructions might have created conditions more favorable for learners’ attention to be primarily on meaning and message creation and less favorable for them to reflect consciously on the target forms. These observations suggest that when task conditions favored a deliberate reflection on forms, both groups were able to deploy their knowledge. However, only the immediate group could deploy the knowledge they gained during the treatment when the task conditions favored a focus on meaning, making it difficult to reflect on forms consciously. 
Given that the delayed group outperformed the control group only on the grammaticality judgment test, not on the oral production test, it could be that delayed feedback has value for L2 pedagogy only to the extent


that performance on tasks that create favorable conditions for conscious reflection is representative of what learners can do with the language in real-life tasks. However, we cannot make any definitive claims about whether or not the knowledge measured by our tests was conscious, because this aspect was not measured in the study. Limitations and Future Research In this study, we found that immediate feedback was more effective than delayed feedback on an oral production test, whereas feedback timing did not have an effect on learners’ performance on a grammaticality judgment test. The generalizability of these results is subject to certain limitations. For example, the scope of this study was limited in terms of linguistic target, feedback type, and communication mode. The linguistic target, Spanish noun–adjective gender agreement, is a manifestation of morphological dependency in which an adjective depends morphologically on the gender of the noun it modifies. In addition, gender agreement is communicatively redundant and perceptually nonsalient. Therefore, the results of this study may be more comfortably generalized to other communicatively redundant and perceptually nonsalient linguistic targets involving morphological dependency. Another limitation of the study was the short duration of the treatment. The treatment task was completed in 35 minutes in both experimental conditions. Future research using longer treatments needs to be conducted to confirm that feedback timing has an effect on L2 learning when feedback is provided for an extended period. It is also important to recognize the specific features of the feedback strategies used in this study. These strategies were implicit because they neither included metalinguistic terminology nor overtly indicated that there was an error in the learner’s production. They also did not prompt learners to self-repair their non-targetlike productions. It is possible that feedback


types that are relatively more explicit (e.g., explicit correction) or the ones that push learners to self-repair their non-targetlike productions (i.e., prompts) might produce different results. Another possible limitation of the study is related to how delayed feedback was operationalized. In this study, we first decided to operationalize immediate feedback through implicit immediate reformulations, and then tried to match the delayed feedback to immediate feedback as closely as possible. The selection of implicit immediate reformulations had implications for how the delayed feedback could be operationalized. Only those feedback strategies that are also implicit and, at the same time, providing input (positive evidence) could be chosen. Therefore, we had roughly two options to operationalize delayed feedback: (a) provide a list of correct forms without the errors associated with them, or (b) provide correct forms with the errors associated with them. We chose the second option under the rationale that the first option would not allow learners to take advantage of the delayed feedback as negative evidence and, in addition, it would give the groups unequal chances in the interpretation of the feedback. If, however, our assumption was wrong, and in fact it was possible to interpret a list of reformulations at the end of a task as negative evidence, then our results might have been confounded with this methodological decision. We have mentioned above that the present study is different from previous research in multiple ways (e.g., feedback characteristics, modality, and nature of treatment tasks) and that these methodological differences make it difficult to draw conclusions about the role of feedback timing. One way to resolve such comparability issues in the future is to promote the use of same materials and procedures across studies. 
By making our materials fully available and procedures transparent, we expect to contribute to the accumulation of data from studies that are methodologically interconnected. As discussed in the introduction, replication research is


necessary for the advancement of scientific knowledge. Replication of this study with different samples is needed to increase certainty in the results reported. Another reason to replicate this study is the above-mentioned methodological decision that concerned delayed feedback. In order to understand the consequences of this methodological decision, a future study can include two delayed feedback conditions, one providing a list of reformulations without providing the errors and another replicating the delayed feedback strategy used in this study. Researchers can also conceptually replicate our study by modifying the design in terms of several key features. For example, instead of implicit reformulations, researchers can use explicit reformulations that reject the error and introduce the reformulation directly (X is incorrect, you should say Y). Such a study can show the extent to which our results are generalizable to more explicit feedback types. It would be advisable for future research to include additional experimental procedures to determine whether learners processed the feedback in the way the researchers intended. For example, in the current study, even though the immediate feedback group received feedback immediately after their errors, it was not possible to know exactly when learners read the feedback. They might have read the messages at a different time from when the feedback appeared on the screen because the software they were using (i.e., Skype) allowed them to look back at previous parts of the interaction. One may argue, therefore, that the lack of enforced immediacy between the error and the time at which the learner read the feedback might have reduced the reliability of the operationalization of immediacy. To remedy this potential problem, future research can use chat software that has the capability of preventing learners from viewing previous parts of the interaction. 
In contexts where the use of such software is not possible, researchers could collect introspective data (e.g., think-aloud protocols) to shed light on the question of whether the lack of enforced immediacy threatens the validity of their results.


Conclusion

To conclude, the present study revealed that delayed feedback was not as effective as immediate feedback on a task (i.e., the oral production test) requiring learners to be accurate while their primary attention was on meaning. Because gains on these types of tasks have been viewed as a better indicator of L2 acquisition (Doughty, 2003; Ellis, 2015), this result can be taken to mean that delayed feedback may not be a good alternative to immediate feedback in settings where text-based SCMC is used. However, delayed feedback remains practically important for practitioners because, in some contexts, limited human resources make the provision of immediate feedback unfeasible. We therefore recommend that future research investigate the factors that increase the effectiveness of delayed feedback. For example, it might be possible to achieve such effectiveness if learners are provided with production opportunities after the feedback stage or if a more salient form of feedback is chosen (e.g., explicit correction or metalinguistic feedback). Investigating the role of such factors in moderating the effectiveness of delayed feedback would eventually help us determine the conditions under which delayed feedback could be an alternative to immediate feedback.

Final revised version accepted 3 April 2018

Notes

1 Since recasts are immediately contingent on learners’ errors by definition (Doughty, 2001; Long, 2007), we thought that it would be inaccurate to refer to the delayed feedback we used in this study as delayed recasts. Therefore, we chose a term that is neutral in terms of immediacy (i.e., reformulations) and have referred to our immediate and delayed feedback strategies as immediate and delayed reformulations.


2 Nunnally (1978) suggested that an alpha value of .60 can be considered acceptable for newly developed measures, whereas .70 should be the threshold in other cases.

References

Alarcón, I. V. (2010). Gender assignment and agreement in L2 Spanish: The effects of morphological marking, animacy, and gender. Studies in Hispanic and Lusophone Linguistics, 3, 267–300. https://doi.org/10.1515/shll-2010-1076
Baralt, M. (2013). The impact of cognitive complexity on feedback efficacy during online versus face-to-face interactive tasks. Studies in Second Language Acquisition, 35, 689–725. https://doi.org/10.1017/S0272263113000429
Blake, C. (2009). Potential of text-based internet chats for improving oral fluency in a second language. The Modern Language Journal, 93, 227–240. https://doi.org/10.1111/j.1540-4781.2009.00858.x
Blake, R. J. (2000). Computer mediated communication: A window on L2 Spanish interlanguage. Language Learning & Technology, 4, 120–136. https://doi.org/10125/25089
Bower, J., & Kawaguchi, S. (2011). Negotiation of meaning and corrective feedback in Japanese/English eTandem. Language Learning & Technology, 15, 41–71. https://doi.org/10125/44237
Bruhn de Garavito, J., & White, L. (2002). The second language acquisition of Spanish DPs: The status of grammatical features. In A. Pérez-Leroux & J. Muñoz (Eds.), The acquisition of Spanish morphosyntax (pp. 153–178). Norwell, MA: Kluwer. https://doi.org/10.1007/978-94-010-0291-2


Chun, D. M. (1994). Using computer networking to facilitate the acquisition of interactive competence. System, 22, 17–31. https://doi.org/10.1016/0346-251X(94)90037-X
Derrick, D. (2016). Instrument reporting practices in second language research. TESOL Quarterly, 50, 132–153. https://doi.org/10.1002/tesq.217
Doughty, C. (2001). Cognitive underpinnings of focus on form. In P. Robinson (Ed.), Cognition and second language instruction (pp. 206–257). Cambridge, UK: Cambridge University Press. https://doi.org/10.1017/cbo9781139524780.010
Doughty, C. (2003). Instructed SLA: Constraints, compensation, and enhancement. In C. Doughty & M. H. Long (Eds.), Handbook of second language acquisition (pp. 256–310). New York, NY: Blackwell. https://doi.org/10.1002/9780470756492.ch10
Doughty, C. J., & Long, M. H. (2003). Optimal psycholinguistic environments for distance foreign language learning. Language Learning & Technology, 7, 50–80. https://doi.org/10125/25214
Ellis, R. (1991). Instructed second language acquisition: Learning in the classroom. Malden, MA: Wiley-Blackwell. https://doi.org/10.1017/s0272263100010536
Ellis, R. (2015). Form-focused instruction and the measurement of implicit and explicit L2 knowledge. In P. Rebuschat (Ed.), Implicit and explicit learning of languages (pp. 417–441). Amsterdam, The Netherlands: John Benjamins. https://doi.org/10.1075/sibil.48.17ell
Ellis, R., Loewen, S., & Erlam, R. (2006). Implicit and explicit corrective feedback and the acquisition of L2 grammar. Studies in Second Language Acquisition, 28, 339–368. https://doi.org/10.1017/S0272263106060141


Fernández-García, M. (1999). Patterns of gender agreement in the speech of second language learners. In J. Gutierrez-Rexach & F. Martinez-Gil (Eds.), Advances in Hispanic linguistics: Papers from the 2nd Spanish Linguistics Symposium (Vol. 1, pp. 3–15). Somerville, MA: Cascadilla. https://doi.org/10.2307/3657915
Franceschina, F. (2005). Fossilized second language grammars: The acquisition of grammatical gender. Amsterdam, The Netherlands: John Benjamins. https://doi.org/10.1075/lald.38
Gass, S. M., & Mackey, A. (2006). Input, interaction and output: An overview. AILA Review, 19, 3–17. https://doi.org/10.1075/aila.19.03gas
González-Lloret, M. (2014). The need for needs analysis in technology-mediated TBLT. In M. González-Lloret & L. Ortega (Eds.), Technology-mediated TBLT (pp. 23–50). Philadelphia, PA: John Benjamins. https://doi.org/10.1075/tblt.6.02gon
Granena, G. (2014). Language aptitude and long-term achievement in early childhood L2 learners. Applied Linguistics, 35, 483–503. https://doi.org/10.1093/applin/amu013
Granena, G. (2016). Individual versus interactive task-based performance through voice-based computer-mediated communication. Language Learning & Technology, 20, 40–59. https://doi.org/10125/44481
Gurzynski-Weiss, L., & Baralt, M. (2015). Does type of modified output correspond to learner noticing of feedback? A closer look in face-to-face and computer-mediated task-based interaction. Applied Psycholinguistics, 36, 1393–1420. https://doi.org/10.1017/S0142716414000320
Harmer, J. (2007). The practice of English language teaching. Harlow, UK: Pearson Longman. https://doi.org/10.1177/003368820103200109


Hawkins, R., & Chan, C. Y. (1997). The partial availability of universal grammar in second language acquisition: The ‘Failed Functional Features Hypothesis’. Second Language Research, 13, 187–226. https://doi.org/10.1191/026765897671476153
Iwasaki, J., & Oliver, R. (2003). Chat-line interaction and negative feedback. Australian Review of Applied Linguistics, 17, 60–73. https://doi.org/10.1075/aralss.17.05iwa
Kern, R. G. (1995). Restructuring classroom interaction with networked computers: Effects on quantity and characteristics of language production. The Modern Language Journal, 79, 457–476. https://doi.org/10.1111/j.1540-4781.1995.tb05445.x
Marsden, E., Mackey, A., & Plonsky, L. (2016). The IRIS Repository: Advancing research practice and methodology. In A. Mackey & E. Marsden (Eds.), Advancing methodology and practice: The IRIS Repository of instruments for research into second languages (pp. 1–21). New York, NY: Routledge. https://doi.org/10.4324/9780203489666
Leeman, J. (2003). Recasts and second language development. Studies in Second Language Acquisition, 25, 37–63. https://doi.org/10.1017/S0272263103000020
Li, S. (2010). The effectiveness of corrective feedback in SLA: A meta-analysis. Language Learning, 60, 309–365. https://doi.org/10.1111/j.1467-9922.2010.00561.x
Li, S., Zhu, Y., & Ellis, R. (2016). The effects of the timing of corrective feedback on the acquisition of a new linguistic structure. The Modern Language Journal, 100, 276–295. https://doi.org/10.1111/modl.12315
Loewen, S., & Erlam, R. (2006). Corrective feedback in the chatroom: An experimental study. Computer Assisted Language Learning, 19, 1–14. https://doi.org/10.1080/09588220600803311


Loewen, S., & Philp, J. (2006). Recasts in the adult L2 classroom: Characteristics, explicitness and effectiveness. The Modern Language Journal, 90, 536–556. https://doi.org/10.1111/j.1540-4781.2006.00465.x
Long, M. H. (1991). Focus on form: A design feature in language teaching methodology. Foreign language research in cross-cultural perspective, 2, 39–52. https://doi.org/10.1075/sibil.2.07lon
Long, M. H. (1996). The role of the linguistic environment in second language acquisition. In W. Ritchie & T. Bhatia (Eds.), Handbook of second language acquisition (pp. 413–468). New York, NY: Academic Press. https://doi.org/10.1016/b978-012589042-7/50003-7
Long, M. H. (2007). Recasts in SLA: The story so far. Problems in SLA (pp. 75–116). Mahwah, NJ: Lawrence Erlbaum.
Long, M. H., & Robinson, P. (1998). Focus on form in classroom second language acquisition. New York, NY: Cambridge University Press.
Lyster, R., & Ranta, L. (1997). Corrective feedback and learner uptake. Studies in Second Language Acquisition, 19, 37–66. https://doi.org/10.1017/s0272263197001034
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York, NY: McGraw-Hill.
Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349. https://doi.org/10.1126/science.aac4716
Ortega, L. (2009). Interaction and attention to form in L2 text-based computer-mediated communication. In A. Mackey & C. Polio (Eds.), Multiple perspectives on interaction (pp. 226–253). New York, NY: Routledge. https://doi.org/10.4324/9780203880852


Payne, J. S., & Whitney, P. J. (2002). Developing L2 oral proficiency through synchronous CMC: Output, working memory, and interlanguage development. CALICO Journal, 20, 7–32. http://www.jstor.org.proxyiub.uits.iu.edu/stable/24149607
Phakiti, A. (2014). Experimental research methods in language learning. London, UK: Bloomsbury Academic. http://dx.doi.org/10.5040/9781472593566.ch-002
Plonsky, L., & Oswald, F. (2014). How big is “big”? Interpreting effect sizes in L2 research. Language Learning, 64, 878–912. https://doi.org/10.1111/lang.12079
Porte, G. (2012). Replication research in applied linguistics. Cambridge, UK: Cambridge University Press.
Quinn, P. (2014). Delayed versus immediate corrective feedback on orally produced passive errors in English (Unpublished doctoral dissertation). University of Toronto, Toronto.
Révész, A., & Han, Z. (2006). Task content familiarity, task type and efficacy of recasts. Language Awareness, 15, 160–179. https://doi.org/10.2167/la401.0
Sachs, R., & Suh, B. (2007). Textually enhanced recasts, learner awareness, and L2 outcomes in synchronous computer-mediated interaction. In A. Mackey (Ed.), Conversational interaction in second language acquisition: A collection of empirical studies (pp. 197–227). Oxford, UK: Oxford University Press.
Sauro, S. (2009). Computer-mediated corrective feedback and the development of L2 grammar. Language Learning & Technology, 13, 96–120. https://doi.org/10125/44170
Schmidt, R. (2001). Attention. In P. Robinson (Ed.), Cognition and second language instruction (pp. 3–32). New York, NY: Cambridge University Press. https://doi.org/10.1017/cbo9781139524780.003


Schwartz, B. D., & Sprouse, R. A. (1996). L2 cognitive states and the Full Transfer/Full Access model. Second Language Research, 12, 40–72. https://doi.org/10.1177/026765839601200103
Sheen, Y. (2004). Corrective feedback and learner uptake in communicative classrooms across instructional settings. Language Teaching Research, 8, 263–300. https://doi.org/10.1191/1362168804lr146oa
Shintani, N., & Aubrey, S. (2016). The effectiveness of synchronous and asynchronous written corrective feedback on grammatical accuracy in a computer-mediated environment. The Modern Language Journal, 100, 296–319. https://doi.org/10.1111/modl.12317
Smith, B. (2003). The use of communication strategies in computer-mediated communication. System, 31, 29–53. https://doi.org/10.1016/S0346-251X(02)00072-6
Sotillo, S. M. (2000). Discourse functions and syntactic complexity in synchronous and asynchronous communication. Language Learning & Technology, 4, 82–119. https://doi.org/10125/25088
Warschauer, M. (1997). Computer-mediated collaborative learning: Theory and practice. The Modern Language Journal, 81, 470–481. https://doi.org/10.1111/j.1540-4781.1997.tb05514.x
White, L., Valenzuela, E., Kozlowska–Macgregor, M., & Leung, Y. K. I. (2004). Gender and number agreement in nonnative Spanish. Applied Psycholinguistics, 25, 105–133. https://doi.org/10.1017/S0142716404001067
Yilmaz, Y. (2011). Task effects on focus on form in synchronous computer-mediated communication. The Modern Language Journal, 95, 115–132. http://dx.doi.org/10.1111/j.1540-4781.2010.01143.x


Yilmaz, Y. (2012). The relative effects of explicit correction and recasts on two target structures via two communication modes. Language Learning, 62, 1134–1169. http://dx.doi.org/10.1111/j.1467-9922.2012.00726.x
Yilmaz, Y., & Granena, G. (2010). The effects of task type in synchronous computer-mediated communication. ReCALL Journal, 22, 20–38. https://doi.org/10.1017/S0958344009990176
Yilmaz, Y., & Granena, G. (2016). The role of cognitive aptitudes for explicit language learning in the relative effects of explicit and implicit feedback. Bilingualism: Language and Cognition, 19, 147–161. https://doi.org/10.1017/S136672891400090X
Yilmaz, Y., & Yuksel, D. (2011). Effects of communication mode and salience on recasts: A first exposure study. Language Teaching Research, 15, 457–477. https://doi.org/10.1177/1362168811412873
Ziegler, N. (2016). Synchronous computer-mediated communication and interaction. Studies in Second Language Acquisition, 38, 553–586. https://doi.org/10.1017/S027226311500025X

Supporting Information

Additional Supporting Information may be found in the online version of this article at the publisher’s website:

Appendix S1. Scoring Rubric and Procedures and Interrater Reliability for the Oral Proficiency Task.
Appendix S2. Treatment Task.
Appendix S3. Testing Materials.
Appendix S4. Testing Materials.
Appendix S5. Object Naming Task.


Appendix S6. Reliability Coefficients.
Appendix S7. T-Test: Mean Frequency of Feedback.
Appendix S8. Descriptive Statistics for the Grammaticality Judgment Test.


Table 1 Descriptive statistics for oral production test scores

           Pretest     Posttest I  Posttest II Gain 1      Gain 2
Group      M     SD    M     SD    M     SD    M     SD    M     SD
Control    .56   .10   .61   .14   .56   .13   .04   .12   .00   .15
Immediate  .49   .10   .73   .12   .69   .12   .24   .11   .20   .14
Delayed    .54   .11   .65   .14   .58   .12   .11   .15   .04   .14
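
As a reading aid for Table 1, the short Python sketch below recomputes the gain columns from the rounded test means. It assumes (the table itself does not say so) that Gain 1 is Posttest I minus Pretest and Gain 2 is Posttest II minus Pretest. The dictionary keys and the function name are illustrative and do not come from the study's materials.

```python
# Means copied from Table 1 (oral production test), two decimals as printed.
# Keys such as "pre" and "post1" are illustrative labels, not the authors' names.
TABLE1_MEANS = {
    "Control":   {"pre": 0.56, "post1": 0.61, "post2": 0.56, "gain1": 0.04, "gain2": 0.00},
    "Immediate": {"pre": 0.49, "post1": 0.73, "post2": 0.69, "gain1": 0.24, "gain2": 0.20},
    "Delayed":   {"pre": 0.54, "post1": 0.65, "post2": 0.58, "gain1": 0.11, "gain2": 0.04},
}


def recomputed_gains(row):
    """Return (gain1, gain2) as posttest mean minus pretest mean.

    Assumption: gains are defined relative to the pretest.
    """
    gain1 = round(row["post1"] - row["pre"], 2)
    gain2 = round(row["post2"] - row["pre"], 2)
    return gain1, gain2


# Absolute difference between recomputed and reported mean gains, per group.
discrepancies = {}
for group, row in TABLE1_MEANS.items():
    g1, g2 = recomputed_gains(row)
    discrepancies[group] = (
        round(abs(g1 - row["gain1"]), 2),
        round(abs(g2 - row["gain2"]), 2),
    )

for group, (d1, d2) in discrepancies.items():
    print(f"{group}: Gain 1 differs by {d1:.2f}, Gain 2 differs by {d2:.2f}")
```

Under this assumption, the recomputed values match the reported gains except for the control group's Gain 1, which differs by .01. A difference of that size is what one would expect if gains were computed per participant before averaging and rounding, although the table alone cannot confirm this.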


Table 2 Descriptive statistics for grammaticality judgment test scores (ungrammatical items)

           Pretest     Posttest I  Posttest II Gain 1      Gain 2
Group      M     SD    M     SD    M     SD    M     SD    M     SD
Control    .20   .14   .20   .13   .25   .13   .00   .11   .05   .12
Immediate  .21   .18   .46   .18   .48   .19   .25   .22   .28   .19
Delayed    .20   .14   .44   .20   .47   .20   .24   .12   .26   .18

