underlying form in Mandarin sandhi tone production

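Classification No.:
Confidentiality level: Public
UDC:
Serial No.: 20090110108

Master's Thesis

On the Role of the Underlying Tonal Representation in the Phonological Encoding Stage of Tone Sandhi Production of the Mandarin Third Tone

Applicant: Chen Xiaocong (陈小聪)
Supervisor and title: Wang Guizhen (王桂珍), Professor
Degree category: Arts
Discipline and specialty: Foreign Linguistics and Applied Linguistics
Training unit: Faculty of English Language and Culture
Degree-granting institution: Guangdong University of Foreign Studies

June 15, 2012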


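Classification No.: ________
UDC: __
Confidentiality level: Public
Serial No.: 20090110108

Master's Thesis, Guangdong University of Foreign Studies

On the Role of the Underlying Tonal Representation in the Phonological Encoding Stage of Tone Sandhi Production of the Mandarin Third Tone

Applicant: Chen Xiaocong (陈小聪)
Supervisor and title: Wang Guizhen (王桂珍), Professor
Degree category: Arts
Discipline and specialty: Foreign Linguistics and Applied Linguistics
Date of submission: April 25, 2012
Date of oral defense: May 31, 2012
Defense committee: Wang Chuming (王初明), Professor (Chair); Zheng Chao (郑超), Professor; Feng Wei (冯蔚), Dr.; Jiang Lin (姜琳), Associate Professor; Zhao Chen (赵晨), Associate Professor
Degree-granting institution: Guangdong University of Foreign Studies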


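Statement of Originality

I solemnly declare that the thesis submitted here presents the research work I carried out, and the results I obtained, under the guidance of my supervisor. To the best of my knowledge, except where specifically marked and acknowledged in the text, the thesis contains no research results already published or written by others, nor any material that has been used to obtain a degree or certificate from Guangdong University of Foreign Studies or any other educational institution. Any contribution made to this research by those who worked with me has been clearly stated and acknowledged in the thesis.

Author's signature:

Date of signature: 20

Authorization for the Use of the Thesis Copyright

The author of this thesis fully understands the regulations of Guangdong University of Foreign Studies concerning the retention and use of theses: the university has the right to retain copies and disks of the thesis, to submit them to the relevant national departments or institutions, and to allow the thesis to be consulted and borrowed. I authorize Guangdong University of Foreign Studies to include all or part of the thesis in relevant databases for retrieval, and to preserve and compile the thesis by photocopying, reduced-size printing, scanning or other means of reproduction.

Author's signature: (handwritten name)

Supervisor's signature: (handwritten name)

Date of signature: 20xx (year) x (month) x (day)

Date of signature: 20xx (year) x (month) x (day)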


On the Role of the Underlying Tonal Representation in the Phonological Encoding Stage of Tone Sandhi Production of the Mandarin Third Tone

Xiaocong Chen

Supervised by Professor Wang Guizhen

Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Arts in Linguistics and Applied Linguistics

Guangdong University of Foreign Studies
May 2012

ACKNOWLEDGEMENTS

It is a great opportunity for me to express my gratitude here to all the teachers who have enlightened me and the friends who have helped me during my past seven years at the Guangdong University of Foreign Studies. The completion of this thesis should be credited to many people in my academic career. Without their help, I would not have got through the obstacles and pains of those gloomy days.

Firstly, I would like to thank my supervisor, Professor Wang Guizhen, whose kindness and tolerance gave me the freedom to explore my interests in diverse fields including phonology, phonetics and psycholinguistics, which broadened my horizons during my three-year postgraduate study. She gave me a number of suggestions on my writing, which led to the improvement of this thesis.

My deepest indebtedness goes to Dr. Li Liang, the most remarkable genius I have ever met in my life. I cannot imagine completing the experiments without his technical support and insightful advice. His persistence in the pursuit of his dreams, his strong will in the face of adversity, and his enthusiasm for acquiring knowledge have greatly moved me. He sets an example of a real researcher for me and reminds me that one is never too old to learn. I strongly believe that he will become a star in the field of linguistics one day.

I also want to express my sincere gratitude to Professor Dong Yanping, my undergraduate research supervisor, who introduced me to the field of psycholinguistics and taught me a great deal about how to become a good researcher when I was an undergraduate student. Her earnestness and dedication to research have had a great impact on my academic study. She gave me many precious suggestions not only on study but also on life, particularly when I felt a little lost. She reminded me that learning to live a life is as vital as learning itself, and that health is a prerequisite for further learning.

My special thanks also go to Dr. Yan Hao, who shared his DMDX tutorials with me, helped me overcome difficulties at the beginning of my learning of DMDX, and offered me good advice and encouragement during my experiments. He is an excellent researcher in my eyes, and I believe that he will make great contributions to psycholinguistics in the future. I would also like to thank Dr. Shuai Lan, who had a delightful discussion with me on the research topic of this thesis, which encouraged me a lot and made me realize that my work did have some value.

My grateful thanks also go to all the participants in my experiments. I also want to thank my best friends Liu Xu and Maisie Wu, who helped me recruit some participants during the experiments. In particular, I have to apologize to Maisie Wu, who suffered a lot from my frenzy during the compilation of the lexical items and was badly hurt by my fierce words last year. I am also grateful to Yanxiu Li for proofreading my thesis and pointing out mistakes. I also thank my best friends Qing Jian, who gave me comments on the first draft of this thesis, and Wenwu Yi, who encouraged me to continue to pursue my PhD studies and assured me that I was on the right track. I also need to thank my roommate, Michael Chen, who had to endure some of my bad habits but still offered me warm comfort and generous help when I was depressed last year. I would particularly like to thank my friends in the Syntax Group. It was fun every time I attended the syntax seminar with them.

I also want to say “thank you” to many teachers: Professor Wen Bingli, whose praise gave me a sense of fulfillment in my academic career; Dr. Lu Shouchu, who first led me into the field of linguistics and inspired me with his profound philosophical ideas; Professor Qi Luxia, who demonstrated that a good personality is invaluable for research work; Professor Huo Yongshou, who illuminated me with the acuity of his philosophical mind; Dr. Feng Wei, who changed my view of the world with her critical thinking and often made enlightening comments, particularly on this thesis; Dr. Liu Ying, who gave me good advice when I was learning English pronunciation, which later cultivated my interest in phonetics; Professor Zheng Chao, who showed me the beauty of English; Professor Yu Xiangming, who taught me the importance of conscientiousness; Professor Hu Zhengmao, who attracted me with his vast range of knowledge; Dr. Wang Yunfeng, who always cheered me up with his smile; Dr. Zengji, who provided financial support for me…

I want to thank my sister, who always shows a deep understanding of my work; my brother, who comforted me when I was down, though we often quarrel; and particularly my Mom and Dad, to whom I owe the most. Though my parents sometimes complained about my choice of linguistics, they still support me today. I have to say sorry to them because I have failed to fulfill their expectations.

My final thanks go to all the committee members of the oral defense and all the teachers who read my thesis and gave me suggestions.

Possibly this thesis will mark the end of my academic career, but I will never regret my choice. Linguistics is such a fascinating discipline that it still drives me crazy. A research life is vivid and pleasant, albeit painstaking sometimes. I wish that I will have the chance to continue it in the future.

Xiaocong Chen
June 11, 2012


ABSTRACT

Tone sandhi occurs abundantly in some Asian tone languages that use pitch to differentiate lexical meanings. It refers to the systematic change of a lexical tone depending on the specific morphological or phonological context. One representative example is the third-tone sandhi in Mandarin, a tone language widely spoken across China nowadays. In Mandarin, there are four lexical tones, and Tone 3, a low-falling-rising pitch when it occurs in isolation, changes systematically when it is followed by another lexical tone. Based on the traditional generative phonological framework proposed by Chomsky and Halle (1968), the change of the Mandarin Tone 3 can be described by two rules (Zhang, 2007). One is the third-tone sandhi rule, which stipulates that the third tone changes into a rising tone when followed by another third tone. The other is the half-third sandhi rule, which specifies that the third tone changes into a low tone when preceding any lexical tone other than the third tone. In other words, the Mandarin Tone 3 has two tonal variants when preceding another lexical tone: the third-tone sandhi variant and the half-third sandhi variant. In the traditional generative phonological framework, it is assumed that the two tonal variants are derived from the same underlying tonal representation. Such a derivational view has influenced some previous proposals on tone sandhi production of Mandarin Tone 3, which contend that the surface tonal variant is serially derived from the underlying tonal representation during on-line tone sandhi production. However, empirical investigations of the on-line phonological encoding of tone sandhi production are scarce.

The current study attempts to examine whether the underlying tonal representation of Mandarin Tone 3 plays a role in tone sandhi production. It is hypothesized that the half-third sandhi variant and the third-tone sandhi variant of Mandarin Tone 3 are serially derived from the same underlying tonal representation during production. Two experiments were conducted to test this hypothesis using a modified implicit priming paradigm, an experimental paradigm used in a number of studies on phonological encoding in speech production (Meyer, 1990, 1991; Janssen, Roelofs, & Levelt, 2002; Cholin, Schiller, & Levelt, 2004). Previous findings on Mandarin word production using this paradigm showed that spoken reaction times were significantly reduced when participants produced a set of words sharing their first syllables and tones (Chen, Chen, & Dell, 2002; Zhang, 2008). According to the hypothesis, response times should also be reduced when participants produce a set of words sharing their first syllables and underlying tonal representations, regardless of their surface tonal values.

However, the results of the two experiments contradicted this prediction. When both the underlying tonal representation and the surface tonal variant were shared, reaction times were the shortest. But sharing only the underlying tonal representation did not lead to stronger priming effects than the situations where neither the underlying tonal representation nor the surface tonal variant was shared. These results do not support the derivational processing view, which assumes that the two surface sandhi variants of Mandarin Tone 3 are constructed from the same underlying tonal representation during on-line production. The current research, together with a recent study (Chen, Chen, & Dai, to appear), indicates that the sandhi variants of Mandarin Tone 3 are stored in the lexicon in terms of their surface tonal values instead of an underlying tonal representation, which is in line with the “surface representation view” mentioned in Zhou and Marslen-Wilson (1997). Given the linguistic data and speech errors, the current experimental results are better explained by an interactive processing view, which holds that both surface sandhi variants are stored in the lexicon and simultaneously activated during production, and that the more highly activated variant, the one fitting the context, is finally selected.

Keywords: tone sandhi, underlying tonal representation, surface tone, Mandarin Tone 3, phonological encoding





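摘要 (CHINESE ABSTRACT)

Tone sandhi is very common in some Asian tone languages. It refers to the regular change that a tone undergoes depending on the conditions of the position in which it occurs. The third-tone sandhi of Mandarin is the most representative case. Mandarin has four tones, and the third tone is a falling-rising tone when pronounced in isolation. When it is followed by another tone, however, its tonal shape changes in a regular way. Within the traditional generative phonological framework established by Chomsky and Halle (1968), the sandhi of the third tone can be described by two rules (Zhang, 2007). One is the third-tone sandhi rule: when a third tone is followed by another third tone, the first third tone becomes a rising tone. The other is the half-third sandhi rule: when a third tone is followed by a tone of any category other than the third tone, it becomes a low-falling tone. In other words, the third tone has two tonal variants when it is followed by another tone. In the traditional generative phonological framework, these two variants are derived from one common underlying tonal representation. Some accounts of the processing of third-tone sandhi production, influenced by this derivational view, likewise hold that during phonological encoding in actual speech production an underlying tone is activated first and the corresponding surface tonal variant is then derived by rule. However, empirical studies of the actual encoding process of tone sandhi production are still scarce.

This study aims to test whether the underlying tone of the Mandarin third tone plays a role in the actual production of tone sandhi. On the basis of the derivational view, it is hypothesized that during tone sandhi production both tonal variants of the third tone are derived from the same underlying tonal representation. To test this hypothesis, two experiments using a modified implicit priming paradigm were conducted. Previous studies of Chinese production using the implicit priming paradigm showed that when the disyllabic words that participants were asked to name shared their first syllable and its tone, the priming effect was largest and reaction times were shortest (Chen, Chen, & Dell, 2002; Zhang, 2008). According to the present hypothesis, if both variants of the third tone are derived from the same underlying tonal representation during production, then a priming effect should arise when the disyllabic words to be named share their first syllable and its underlying tone, even if the surface tonal values differ, and responses should be faster than when the words share their first syllable but differ in both underlying and surface tone.

However, the results of the two experiments did not support this hypothesis. When the underlying tone and the surface tone were both identical, reaction times were shortest and the priming effect was strongest. When the underlying tone was the same but the surface tone differed, reaction times were not significantly shorter than when both the underlying and the surface tone differed, and the priming effects in the two cases did not differ significantly. The results do not support the claim that the two variants of the third tone are derived from the same underlying tonal representation during production. This finding is consistent with the recent study by Chen, Chen and Dai (to appear), and suggests that third-tone sandhi may be stored in the lexicon as surface tonal representations, which agrees with the “surface representation view” mentioned by Zhou and Marslen-Wilson (1997). This study argues that an interactive processing view explains the experimental results better. According to this view, both variants of the third tone are stored in the lexicon, both are activated and compete with each other during tone sandhi production, and the variant that matches the context receives more activation and is finally selected. This view also gives a better account of some examples of third-tone sandhi in actual language use and of the third-tone sandhi speech errors observed in the experiments.

Keywords: tone sandhi, underlying tonal representation, surface tonal representation, Mandarin third tone, phonological encoding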

CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
摘要 (CHINESE ABSTRACT)
CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER ONE INTRODUCTION
    1.1 Research orientation
    1.2 Significance of the research
    1.3 Organization of the thesis
CHAPTER TWO LITERATURE REVIEW
    2.1 Theories of phonological encoding in speech production
        2.1.1 Phonological encoding in DSMSG model
        2.1.2 Phonological encoding in the WEAVER++ model
    2.2 The implicit priming paradigm
        2.2.1 Overview of the implicit priming paradigm
        2.2.2 Implicit priming paradigm with an odd-man-out
        2.2.3 Implicit priming and phonological encoding studies
    2.3 Lexical Tone, tone sandhi and tone sandhi production in Mandarin
        2.3.1 Lexical tone and tonal alternations in Mandarin
        2.3.2 Phonological representation of sandhi tone in mental lexicon
        2.3.3 Studies on the on-line processing of third-tone sandhi production
    2.4 Phonological alternations and speech production models
        2.4.1 A sketch of phonological alternations
        2.4.2 Phonological alternations and speech production models
        2.4.3 Phonological variants: Derived by rules vs. selected among competitions
    2.5 Motivation of the study
CHAPTER THREE EXPERIMENTS
    3.1 Introduction
    3.2 Experiment 1
        3.2.1 Method
        3.2.2 Results
        3.2.3 Discussion
    3.3 Experiment 2
        3.3.1 Method
        3.3.2 Results
        3.3.3 Discussion
CHAPTER FOUR GENERAL DISCUSSION
    4.1 Summary of the findings of the two experiments
    4.2 Explanations
    4.3 Implications for speech production models and phonological variants
    4.4 Limitations and suggestions for future research
CHAPTER FIVE CONCLUSION
REFERENCES
APPENDICES
    Appendix A
    Appendix B

LIST OF TABLES

Table 2-1 Tonal Inventory in Mandarin (adapted from Chao, 2006; Zhou & Marslen-Wilson, 1997)
Table 3-1 Examples of response words and prompts presented in the four-item and three-item conditions from Cholin et al. (2004)
Table 3-2 Examples of response words in the totally constant, underlyingly constant and variable conditions within a four-item and a three-item set in Experiment 1
Table 3-3 Mean reaction times (in ms), standard deviations, percentage error rates (in parentheses), and priming effects (in ms) of the by-participants analyses in Experiment 1
Table 3-4 Mean reaction times (in ms), standard deviations, percentage error rates (in parentheses), and priming effects (in ms) of the by-items analyses in Experiment 1
Table 3-5 Examples of response words in the underlyingly constant and variable conditions within a four-item and a three-item set in Experiment 2
Table 3-6 Mean reaction times (in ms), standard deviations, percentage error rates (in parentheses), and priming effects (in ms) of the by-participants analyses in Experiment 2
Table 3-7 Mean reaction times (in ms), standard deviations, percentage error rates (in parentheses), and priming effects (in ms) of the by-items analyses in Experiment 2

LIST OF FIGURES

Figure 2-1 Fragment of lexical network of the DSMSG model of spoken word production adapted from Roelofs (2003a)
Figure 2-2 The WEAVER++ model in outline (Levelt et al., 1999, Figure 1)
Figure 2-3 Stages of phonological encoding in the WEAVER++ model adapted from Roelofs (1997)
Figure 2-4 Sub-stages of phonological encoding in the WEAVER++ model (Xie, 2007, Figure 3)
Figure 2-5 Liu's (2006) production model for the third-tone sandhi (adapted from Liu, 2006, Figure 15)
Figure 3-1 The correctly triggered RT red line for the syllable “wu” in CheckVocal
Figure 3-2 The correctly triggered RT red line for the syllable “yan” in CheckVocal
Figure 3-3 The correctly triggered RT red line for the syllable “gu” in CheckVocal
Figure 3-4 The correctly triggered RT red line for the syllable “jian” in CheckVocal
Figure 3-5 Pronunciation of the half-third sandhi variant of Tone 3 for the word “gu[3] shi[4]” (股市, “stock market”) from one participant in Experiment 2
Figure 4-1 Mispronunciation of the half-third sandhi variant of Tone 3 as the third-tone sandhi variant for the word “wu[3] jiao[4]” (午觉, “nap”) from one participant in Experiment 1

CHAPTER ONE INTRODUCTION

1.1 Research orientation

The phonological form of a word may have systematic phonetic alternations in different morphological or phonological contexts (Bürki, Alario, & Frauenfelder, 2011; Bürki, Ernestus, & Frauenfelder, 2010; Zhou & Marslen-Wilson, 1997). For example, the alveolar nasal /n/ in the word green /ɡri:n/ may change to the bilabial nasal /m/ when preceding a word beginning with a bilabial sound (such as pencil), or to the velar nasal /ŋ/ when followed by a word starting with a velar sound (such as car). Such phonological alternations involve not only phonological segments but also suprasegmental aspects like pitch. One representative case is the tone sandhi phenomenon in some Asian tone languages, which use pitch to contrast word meanings. Tone sandhi refers to the systematic change of a tone's pitch depending on the specific phonological or morphological context. One well-known example is the third-tone sandhi in the tone language Mandarin, also called Putonghua, the standard language promoted by the Chinese government and spoken widely across China. In Mandarin, there are four major lexical tones, and the third tone, pronounced as a low-falling-rising pitch in isolation, has different alternations according to the tone that follows it. For example, the third tone of the morpheme “jiu[3]”¹ (九, “nine”) turns into a rising tone when preceding another third tone, as in the disyllabic phrase “jiu[3] dian[3]” (九点, “nine o'clock”), whereas the tone of the same morpheme becomes a phonetically low tone when followed by any lexical tone other than the third tone², as in the phrase “jiu[3] tian[1]” (九天, “nine days”). That is, the third tone has two tonal variants when preceding another tone³, depending on the phonological context.

¹ In Mandarin, there are four major lexical tones, and they are represented by the numbers 1, 2, 3 and 4 respectively in this thesis. The number inside the square brackets stands for the tone carried by the syllable. For example, 3 here indicates that the syllable is associated with a third tone in Mandarin.

² In Mandarin, some morphemes bear neutral tones, sometimes termed “toneless”, whose phonetic realizations rely on the particular phonological context. In this study, cases of neutral tones are excluded since they complicate the picture of tone sandhi and fall outside the scope of the current research.

³ The application of the tone sandhi rules is also constrained by syntactic structures, which also falls outside the scope of the current research. The present study only focuses on word production.

In the traditional generative phonological framework proposed by Chomsky and Halle (1968), tone sandhi can be accounted for by assuming that the two tonal variants share the same underlying representation, and that the surface sandhi tone is derived from the underlying representation through phonological rules (Zhang, 2007). Some psycholinguistic studies suggest that the on-line phonological encoding of sandhi tone production undergoes such a derivational process (Wan & Jaeger, 1998; Liu, 2006). The derivational processing view rests on two basic assumptions: firstly, the sandhi tonal variants are stored in the lexicon in terms of the underlying tonal representation; secondly, the underlying tonal representation is activated first, and the sandhi tonal variants are serially computed “on the fly” during production. This suggests that the underlying tonal representation plays a role during on-line production. However, empirical evidence is needed.

In order to investigate this issue, this thesis employs a variant of the implicit priming paradigm (also known as the “form preparation paradigm”), one of the experimental paradigms widely used in phonological encoding studies (Meyer, 1990, 1991; Janssen, Roelofs, & Levelt, 2002; Cholin, Schiller, & Levelt, 2004). Previous findings on Mandarin word production using this paradigm demonstrated that vocal reaction times were reduced when participants produced a set of words sharing their first syllables and tones, as a result of preparing the first tonal syllable in advance (Chen, Chen, & Dell, 2002; Zhang, 2008). With the application of the implicit priming paradigm, the purpose of the current study is to examine whether the underlying tonal representation, as presumed by some previous studies (Wan & Jaeger, 1998; Liu, 2006), plays a role in the phonological encoding stage of tone sandhi production. Based on the derivational processing view, it is predicted that response times will also be reduced when participants produce a set of words sharing their first syllables and underlying tonal representations, regardless of their surface tonal values, because the underlying tonal representation is activated first and can be prepared in advance. This thesis is an attempt to test this prediction.
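The two sandhi rules and the logic of the prediction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the string labels for surface variants, and the tone patterns in the usage note are hypothetical and are not the experimental materials; the condition labels follow the names used in the List of Tables. Treating every set with differing underlying tones as "variable" is a simplification made for the sketch.

```python
from typing import List, Tuple

# A word is (first_syllable, underlying_tone_of_syllable_1, underlying_tone_of_syllable_2).
Word = Tuple[str, int, int]


def surface_variant(first_tone: int, second_tone: int) -> str:
    """Surface tone of the first syllable in a disyllabic word.

    Only the Tone 3 alternation described above is modelled (assumption:
    neutral tones and syntactic constraints are ignored, as in the thesis).
    """
    if first_tone != 3:
        return f"T{first_tone}"      # other tones are left unchanged in this sketch
    if second_tone == 3:
        return "rising"              # third-tone sandhi rule
    return "low"                     # half-third sandhi rule


def classify_set(words: List[Word]) -> str:
    """Label a response set whose words share their first syllable."""
    if len({w[0] for w in words}) != 1:
        raise ValueError("all words in a set must share the first syllable")
    underlying = {w[1] for w in words}
    surface = {surface_variant(w[1], w[2]) for w in words}
    if len(underlying) == 1 and len(surface) == 1:
        return "totally constant"        # underlying and surface tone both shared
    if len(underlying) == 1:
        return "underlyingly constant"   # only the underlying tone is shared
    return "variable"                    # underlying tones differ (sketch simplification)
```

For instance, `classify_set([("jiu", 3, 3), ("jiu", 3, 1)])` yields "underlyingly constant": both words carry an underlying Tone 3, but the first surfaces as rising and the second as low. Under the derivational view, such a set should still show a preparation benefit; that is the prediction tested here.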

1.2 Significance of the research

In spite of a great amount of linguistic and psycholinguistic research on Mandarin tone sandhi, a majority of these studies center on spoken word recognition, phonological representations, or acoustic details of sandhi tones. Chronometric studies directly tackling the planning process of tone sandhi production seem to be scarce. Therefore, this study aims to fill the gap. It is also hoped that this study can shed new light on the mental phonological representations and encoding process of phonological alternations in speech production, which may provide insights into the current speech production models. It also tries to improve the behavioral research design in the investigation of speech production, which can be further replicated in other Chinese dialects to reveal more new findings on tone sandhi production.

1.3 Organization of the thesis

This thesis consists of five chapters. After the brief introduction in Chapter 1, the remaining chapters are arranged as follows.

Chapter 2 provides an extensive review of several issues relevant to the current research. A theoretical overview of phonological encoding in the two most influential speech production models is presented first. The implicit priming paradigm employed in this study is also reviewed. Then the background and empirical studies on Mandarin tones, tone sandhi and the on-line processing of Mandarin tone sandhi production are elaborated, followed by a brief discussion of the problems with phonological alternations within the current speech production models. At the end of the chapter, I put forward the hypothesis and discuss its predictions about the experimental results.

Chapter 3 introduces two implicit priming experiments. Details of the experimental designs, methods, data analyses, and results are reported, together with a discussion of the results after each experiment. I will show that the experimental results, inconsistent with my predictions, fail to support the hypothesis raised in Chapter 2.

Chapter 4 summarizes the major findings of the two experiments and provides explanations for the experimental results. I will discuss the implications for the current speech production models and point out some limitations and directions for future research.

Chapter 5 concludes the whole study.

CHAPTER TWO LITERATURE REVIEW

Since tone sandhi encoding occurs in the phonological encoding stage of speech production, this chapter first reviews theories of phonological encoding in speech production (2.1). The implicit priming paradigm, the experimental technique used in this study, is then introduced (2.2). Next, background on lexical tones and tone sandhi in Mandarin is provided, together with previous research on the on-line processing of tone sandhi production (2.3). Problems with phonological alternations within the existing speech production models are also discussed (2.4). Finally, the motivation of the study is sketched (2.5).

2.1 Theories of phonological encoding in speech production

Most current speech production models assume that speech production consists of two major steps: lemma retrieval and word-form encoding (Dell, 1986, 1988; Levelt, 1989; Levelt, Roelofs, & Meyer, 1999; Roelofs, 1992, 1997, 2003). Phonological encoding is part of the word-form encoding process. Models of speech production differ substantially in the details of phonological encoding. Two of the most influential are the DSMSG model (Dell, 1986, 1988; Dell, Schwartz, Martin, Saffran, & Gagnon, 1997) and WEAVER++ (Levelt et al., 1999; Roelofs, 1992, 1997).

2.1.1 Phonological encoding in the DSMSG model

The DSMSG model is a computational model of speech production, first proposed by Dell (1986, 1988). In this model, the lexicon is assumed to be a network in which nodes are linked by weighted bidirectional connections. The network contains only excitatory connections, with no inhibitory links at all. Nodes in the network are organized into levels, including nodes for conceptual features, words/morphemes, and phonemes (and even phonological features; see Dell, 1986). Speech production begins with the activation of conceptual features. The activation spreads towards the word nodes, and the most highly activated word node is selected. Phonological encoding then starts by spreading the activation to the phoneme nodes, and the most highly activated phonemes are selected. Because links between nodes in the DSMSG model are bidirectional, the model is interactive in nature: activation can spread backward, for example from phoneme nodes to word nodes, so direct feedback from phonological encoding to lemma encoding is available. The whole model is illustrated in Figure 2-1.

[Figure 2-1 Fragment of lexical network of the DSMSG model of spoken word production, adapted from Roelofs (2003a). Node layers shown: conceptual features (ANIMATE, FURRY, FELINE, PET); words (cat, dog, hat, cap); phonemes (/onset h/, /onset z/, /onset k/, /onset d/, /æ/, /coda t/, /coda p/). The word and phoneme layers are labelled "Word-form encoding".]
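The spreading-activation idea described above can be illustrated with a toy network. This is a minimal sketch, not Dell's implementation: the wiring between nodes, the uniform link weights, the spreading rate, and the absence of decay are all assumptions made for illustration, and only the node names are taken from Figure 2-1.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical excitatory links (node_a, node_b, weight); every link is bidirectional.
LINKS: List[Tuple[str, str, float]] = [
    ("FELINE", "cat", 1.0), ("PET", "cat", 1.0), ("PET", "dog", 1.0),
    ("cat", "/onset k/", 1.0), ("cat", "/æ/", 1.0), ("cat", "/coda t/", 1.0),
    ("hat", "/onset h/", 1.0), ("hat", "/æ/", 1.0), ("hat", "/coda t/", 1.0),
    ("cap", "/onset k/", 1.0), ("cap", "/æ/", 1.0), ("cap", "/coda p/", 1.0),
    ("dog", "/onset d/", 1.0),
]


def spread(activation: Dict[str, float], links=LINKS,
           rate: float = 0.5, steps: int = 1) -> Dict[str, float]:
    """Spread activation along bidirectional excitatory links for `steps` rounds."""
    neighbours = defaultdict(list)
    for a, b, w in links:
        neighbours[a].append((b, w))
        neighbours[b].append((a, w))      # same link used in the backward direction
    act = defaultdict(float, activation)
    for _ in range(steps):
        incoming = defaultdict(float)
        for node, value in list(act.items()):
            for other, w in neighbours[node]:
                incoming[other] += rate * w * value
        for node, value in incoming.items():
            act[node] += value            # excitatory only: activation never decreases
    return dict(act)


def select(act: Dict[str, float], candidates: Iterable[str]) -> str:
    """Pick the most highly activated node among the candidates."""
    return max(candidates, key=lambda n: act.get(n, 0.0))
```

Starting from the conceptual features FELINE and PET, `spread({"FELINE": 1.0, "PET": 1.0}, steps=3)` leaves "cat" as the most active word. After the third round the form-related word "hat" has also gained some activation through the shared phonemes /æ/ and /coda t/, which is the feedback from phoneme nodes to word nodes that makes the model interactive.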

2.1.2 Phonological encoding in the WEAVER++ model

The WEAVER++ model is the computational implementation of Levelt's (1989) theory (Levelt et al., 1999; Roelofs, 1992, 1997). It also assumes that the mental lexicon is a network of nodes, consisting of three major strata: the conceptual stratum, the lemma stratum, and the word-form stratum. The word-form stratum includes nodes for morphemes, syllables, and segments. As in Dell's model, nodes are connected only through excitatory connections, and no inhibitory links exist in the network. Lexical access is likewise achieved through activation spreading from the conceptual nodes. According to this model, speech production is a staged process, proceeding from conceptual preparation through lemma selection and word-form encoding to articulation. Word-form encoding can be further subdivided into three ordered stages: morphological encoding, phonological encoding, and phonetic encoding. The whole process of speech production in the WEAVER++ model is displayed in Figure 2-2.

Figure 2-2 The WEAVER++ model in outline (Levelt et al., 1999, Figure 1)

During phonological encoding, phonological information is retrieved from the mental lexicon. This includes the retrieval of the segmental and metrical properties of the morphemes, termed segmental spellout and metrical spellout respectively in the model. Taking the word “sleeping” as an example, after the selection of the morphemes <sleep> and <ing>, the segments /s/, /l/, /i:/, /p/, /ɪ/ and /ŋ/ are spelled out. The metrical frame, including the number of syllables and the stress pattern, is spelled out in parallel (see Figure 2-3). Then a process called segment-to-frame association (see Figure 2-4) occurs, in which the segments are inserted into the metrical frame. The phonological (or prosodic) word, the domain of syllabification, is also created during this process, and segments are successively combined to form phonological syllables. Then, in the stage of phonetic encoding, the phonetic encoder takes the generated phonological syllables as input and retrieves the corresponding syllabic gestural scores stored in the mind. Such scores are learned motor programs which can then be executed by the articulators.

[Figure 2-3: flow diagram. Lexical concept SLEEP(X) → lemma retrieval → LEMMA + diacritic(s) (sleep + progressive) → word-form encoding: morphological encoding → MORPHEME(S) → phonological encoding → PHONOLOGICAL WORD(S) → phonetic encoding → ARTICULATORY PROGRAM [sli:] [pɪŋ].]

Figure 2-3 Stages of phonological encoding in the WEAVER++ model adapted from Roelofs (1997)

[Figure 2-4: diagram. Phonological encoding comprises segment spellout and metrical spellout, followed by segment-to-frame association.]

Figure 2-4 Sub-stages of phonological encoding in the WEAVER++ model (Xie, 2007, Figure 3)
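The spellout and association steps described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the WEAVER++ implementation: the mini-lexicon SEGMENTS, the vowel inventory VOWELS, the function names, and the syllabification rule (the last consonant before a nucleus becomes its onset) are all assumptions introduced for this example, and the metrical frame is reduced to a syllable count.

```python
# Hypothetical mini-lexicon: segmental spellout per morpheme (illustrative only).
SEGMENTS = {"sleep": ["s", "l", "i:", "p"], "ing": ["ɪ", "ŋ"]}
VOWELS = {"i:", "ɪ"}  # assumed nucleus inventory for this example


def segmental_spellout(morphemes):
    """Concatenate the stored segments of the selected morphemes."""
    return [seg for m in morphemes for seg in SEGMENTS[m]]


def associate(segments, n_syllables):
    """Insert segments into a frame of n_syllables, left to right.

    Simplifying assumption: every vowel is a nucleus, and when consonants
    stand between two nuclei the last one becomes the next syllable's onset.
    """
    nuclei = [i for i, seg in enumerate(segments) if seg in VOWELS]
    if len(nuclei) != n_syllables:
        raise ValueError("segments do not fit the metrical frame")
    syllables, start = [], 0
    for k, nucleus in enumerate(nuclei):
        if k + 1 < len(nuclei):
            following = nuclei[k + 1]
            end = following - 1 if following - nucleus > 1 else following
        else:
            end = len(segments)
        syllables.append("".join(segments[start:end]))
        start = end
    return syllables


segments = segmental_spellout(["sleep", "ing"])
syllables = associate(segments, n_syllables=2)  # frame: two syllables
```

Under these assumptions the /p/ of <sleep> ends up as the onset of the second syllable, which illustrates why syllabification is computed over the phonological word rather than stored with each morpheme.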

Unlike Dell’s (1986, 1988) model, the WEAVER++ model postulates that word-form encoding is strictly feedforward after lemma selection; it is a discrete staged model. Thus, direct feedback from the word-form nodes to the lemma nodes is impossible, and feedback is available only through the self-monitoring loops proposed by Levelt (1989). Two important processing mechanisms in the WEAVER++ model also differ from Dell’s model. One is the binding-by-checking principle. In Dell’s model, the word form is selected by choosing the most highly activated nodes, whereas in the WEAVER++ model each node is assumed to have a procedure that checks whether the activated nodes link up correctly to the nodes of the higher level, and the word form is selected through this checking procedure. The other is the suspension-resumption mechanism, which supports the incrementality in speech production advocated by Levelt (1989). Incrementality means that the activation of a fragment of the input can trigger the speech planning process, so there is no need to prepare all the elements of the whole utterance before speaking starts. In the WEAVER++ model, encoding can be triggered when partial information is provided; the process then suspends, and resumes only when further information is given.

2.2 The implicit priming paradigm

2.2.1 Overview of the implicit priming paradigm

The implicit priming paradigm (or form preparation paradigm), initially devised by Meyer (1990, 1991), is one of the experimental techniques most frequently used in research on phonological encoding (e.g. Chen, Chen, & Dell, 2002; Chen & Chen, 2006, 2007; Cholin et al., 2004; Meyer, 1990, 1991; Roelofs & Meyer, 1998; Liu, 2006; Zhang, 2008). The principle underlying this technique is that the execution of an action can be speeded up by knowing certain aspects of the action in advance (Rosenbaum, Inhoff, & Gordon, 1984). In the classic version of this paradigm, participants are first required to learn sets of words, each consisting of three to five pairs of prompt-response associates. For instance, participants need to remember the following associated pairs in a set: place - local, signal - beacon, tree - maple. Participants are then asked to produce the corresponding response word as quickly and accurately as possible when the prompts in a set are presented on the screen randomly and repeatedly. The sets in the experiment fall into two conditions: homogeneous and heterogeneous. In the homogeneous sets, the response words share part of their phonological form (e.g. local, loner, lotus share the first syllable), whereas in the heterogeneous sets, the response words are phonologically unrelated (e.g. local, beacon, maple). The shared phonological part is regarded as the implicit prime. As predicted by the WEAVER++ model, a facilitation effect can be obtained when participants can prepare and buffer the phonological representations of the response words (Roelofs & Meyer, 1998). Thus, words with shared phonological forms in the homogeneous condition are expected to yield shorter voice onset latencies than words in the heterogeneous condition.
Meyer (1990, 1991) found that this was indeed the case when the response words shared their initial segments, and that the priming effects became larger as the number of shared initial segments increased. However, no such effect was found when only the final segments were shared. These results indicate that segments are encoded incrementally, i.e. from left to right, during speech production.
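The logic of the classic paradigm can be illustrated with a minimal Python sketch. It is an illustration only: the function names shared_initial and preparation_effect are hypothetical, spelling is used as a rough stand-in for phonological form, and no real latencies are involved.

```python
from statistics import mean


def shared_initial(words):
    """Longest string-initial portion common to all response words.
    Spelling is used here only as a rough stand-in for phonological form."""
    prefix = words[0]
    for word in words[1:]:
        while not word.startswith(prefix):
            prefix = prefix[:-1]
    return prefix


def preparation_effect(homogeneous_ms, heterogeneous_ms):
    """Mean heterogeneous latency minus mean homogeneous latency;
    a positive value indicates facilitation in the homogeneous sets."""
    return mean(heterogeneous_ms) - mean(homogeneous_ms)


homogeneous_set = ["local", "loner", "lotus"]
heterogeneous_set = ["local", "beacon", "maple"]
prime = shared_initial(homogeneous_set)       # the implicit prime
no_prime = shared_initial(heterogeneous_set)  # nothing is shared
```

The homogeneous set yields a non-empty shared onset that can be prepared in advance, whereas the heterogeneous set yields none; the preparation effect is then simply the latency difference between the two conditions.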

Chen et al. (2002) first applied the classic version of this paradigm to investigate the phonological encoding of Mandarin spoken word production. They found that when the first syllables of the response words in the sets only shared one initial segment, or only shared the same tone but not any segments, no priming effects occurred. In contrast, the priming effects were observed only when the initial syllables of the response words in the sets were shared, or both the initial syllables and their tones in the sets were shared. But the latter produced stronger priming effects than the condition in which only the initial syllables were shared. That is, when the initial syllables were shared, additionally sharing the tonal information produced stronger priming effects. These findings were later replicated by Zhang (2008).

2.2.2 Implicit priming paradigm with an odd-man-out

The odd-man-out implicit priming paradigm (Cholin et al., 2004; Janssen et al., 2002) is a variant of the original paradigm. An odd-man-out is a lexical item whose phonological form differs in a certain phonological feature from the other response words in a homogeneous set. A homogeneous set with an odd-man-out is called a variable set, while a homogeneous set without any odd-man-out is a constant set. For example, a homogeneous set like beacon, beatnik, beaker is a variable set because it contains an odd-man-out, beatnik, whose initial syllable differs from that of the other words in the set (beat vs. bea), although all the words share their initial segments. In contrast, a homogeneous set like beacon, beadle, beaker is a constant set because the initial segments and the initial syllable are the same across the set. Using this paradigm, Cholin et al. (2004) found that words in the constant sets, which shared the initial syllable, yielded stronger priming effects than those in the variable sets, suggesting that the syllable plays a vital role in phonological encoding.

2.2.3 Implicit priming and phonological encoding studies

The implicit priming paradigm has proved quite useful in studies of phonological encoding. A great number of findings have emerged from implicit priming experiments in recent years (e.g. Chen et al., 2002; Chen & Chen, 2006, 2007; Cholin et al., 2004; Meyer, 1990, 1991; Roelofs & Meyer, 1998; Liu, 2006; Zhang, 2008). Chen (2007) argued that the priming effects found in implicit priming should not be attributed to task strategy or memory retrieval, nor should they result from semantic priming or priming of the speech organs. It has been argued that the implicit priming technique can serve to investigate all the stages of word-form encoding (Cholin & Levelt, 2009). Since tone sandhi encoding is part of phonological encoding, this paradigm is assumed to be appropriate for investigating tone sandhi production. Furthermore, the implicit priming paradigm can be employed to examine the extent to which the representations and encoding processes of phonological segments in L1 and L2 are shared (Roelofs, 2003b). As the current study also investigates the representations and encoding processes of two tonal variants of Mandarin Tone 3, this paradigm constitutes a good tool for the current research.

In the present study, the odd-man-out version of implicit priming was used instead of the classic version. One advantage of this variant is that it enables direct comparison of the priming effects between different conditions. Moreover, it allows the systematic manipulation of the phonological property of interest while holding other, irrelevant phonological properties constant (Cholin et al., 2004). It is therefore a good option for the current research, which focuses only on tonal encoding rather than segmental or syllabic encoding. Details of the experimental design and procedure are given in the next chapter.

2.3 Lexical tone, tone sandhi and tone sandhi production in Mandarin

2.3.1 Lexical tone and tonal alternations in Mandarin

Pitch that is used to differentiate lexical meaning is called lexical tone (Ladefoged, 2006; Yip, 2002). Following Chao’s (1930) proposal, a lexical tone can be represented on a five-point scale, in which 1 stands for the lowest pitch and 5 for the highest. The scale corresponds to the fundamental frequency, the acoustic correlate of pitch. In Mandarin, a lexical tone is associated with a syllable, which also corresponds to a morpheme. There are four lexical tones in Mandarin, illustrated with the syllable /ma/ in Table 2-1.

Tone Number   Tonal Value   Description    Example   Character   Meaning
1             55            High level     ma (1)    妈          mother
2             35            High rising    ma (2)    麻          hemp
3             214           Low dipping    ma (3)    马          horse
4             51            High falling   ma (4)    骂          scold

Table 2-1 Tonal inventory in Mandarin (adapted from Chao, 2006; Zhou & Marslen-Wilson, 1997)

Table 2-1 represents only the canonical/citation form of the four tones, i.e. the form a tone takes when a morpheme occurs in isolation. But in connected speech, morphemes are often combined to form words and phrases in Mandarin Chinese. When morphemes are combined, the tones usually remain unchanged, except in phonological contexts that trigger tone sandhi. Tone sandhi refers to the tonal change that occurs when different lexical tones are combined (Chen, 2000; Yip, 2002). There are two sandhi processes in Mandarin that involve the third tone (Zhang, 2007). One is the third-tone sandhi, in which a Tone 3 (tonal value 214) changes to a Tone 2 (tonal value 35) when preceding another Tone 3; the other is the half-third sandhi, in which a third tone 214 is truncated to 21 when occurring before any tone other than the third tone. In linguistic descriptions, the sandhi phenomena can be phrased in terms of rewriting rules within the SPE framework proposed by Chomsky and Halle (1968). Examples are as follows:

Third-tone sandhi: 214 → 35 / __ 214
    shui 214 guo 214 → shui 35 guo 214      ‘fruit’

Half-third sandhi: 214 → 21 / __ {55, 35, 51}
    shui 214 zai 55 → shui 21 zai 55        ‘flood’
    shui 214 ping 35 → shui 21 ping 35      ‘level’
    shui 214 dao 51 → shui 21 dao 51        ‘rice’
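The two rewriting rules can be written out as a small Python function. This is a minimal sketch under stated assumptions: it covers disyllabic words only, it takes the citation tonal values of the four lexical tones as input, and the names T3 and surface_tones are introduced here purely for illustration.

```python
T3 = "214"  # citation (underlying) value of Tone 3


def surface_tones(first, second):
    """Surface tonal values of a disyllabic word, given citation values.
    Assumes both inputs are among the four citation tones 55, 35, 214, 51."""
    if first == T3 and second == T3:
        return "35", second   # third-tone sandhi: 214 -> 35 / __ 214
    if first == T3:
        return "21", second   # half-third sandhi: 214 -> 21 / __ {55, 35, 51}
    return first, second      # no sandhi context: tones unchanged


examples = {
    "shui guo": surface_tones("214", "214"),
    "shui zai": surface_tones("214", "55"),
    "shui ping": surface_tones("214", "35"),
    "shui dao": surface_tones("214", "51"),
}
```

Both rules inspect the citation value of the following syllable, so the surface form of the first syllable cannot be fixed until the second syllable's tone is known.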

It has been argued that the half-third sandhi rule is optional, and that it does not result in a different tone (Zhou & Marslen-Wilson, 1997). That is, the tone 21 does not differ perceptually from the canonical form 214. Thus, the third tone can be considered unchanged when the half-third sandhi rule is applied. In contrast, the third-tone sandhi is “phonologically conditioned and obligatory” (Zhou & Marslen-Wilson, 1997); the resultant tone is perceptually indistinguishable from Tone 2 and, in turn, differs significantly from the canonical form of Tone 3 in perception. Nevertheless, the third tone can be regarded as having two tonal variants in non-final position (i.e. when it is followed by another tone), pronounced as either a Tone 3 or a Tone 2 depending on the adjacent following tone. It should be noted that when the third-tone sandhi rule changes Tone 3 to Tone 2, it may create tonal syllables that do not exist as citation forms in Mandarin. For example, in the word “jiu[3] dian[3]” (九点, nine o’clock), the first tonal syllable changes to “jiu[2]”, which has no corresponding canonical form, because the syllable “jiu” is never associated with Tone 2 in isolation.

2.3.2 Phonological representation of sandhi tone in mental lexicon

In generative phonological accounts, the two phonetic variants of the third tone are assumed to share the same underlying representation in the lexicon, though the specific nature of this underlying form is disputed (Kuo, Xu, & Yip, 2007). Whatever the details of the underlying representation, linguistic accounts widely propose that the sandhi tone is derived through rules such as those exemplified in 2.3.1 (e.g. Chen, 2000; Peng, 2000). However, opposing views are also held. It has been argued that the surface form of the sandhi tone should also be included in the phonological representation in the mental lexicon (Peng, 2000).

Such divergence also exists in studies of spoken word recognition. There are at least three views on the phonological representation of the third-tone sandhi variant, presented in Zhou and Marslen-Wilson (1997). The first is the canonical representation view, which assumes that the tonal variants of the third tone are represented in the lexicon in terms of their canonical, or citation, form. The second is the abstract representation view, which postulates that an abstract form, rather than any particular phonetic form, is shared by the tonal variants of the third tone and stored in the lexicon. The third, the surface representation view, proposes that the sandhi tones are stored in the lexicon in terms of their surface form. The first two views both assume the existence of a shared underlying form for the tonal alternations of the third tone, while the third view admits only the surface form. Zhou and Marslen-Wilson (1997) used auditory priming techniques to test these views. They argued that their findings were compatible only with the canonical representation view or the surface representation view, and provided no support for the abstract representation view. The question of how sandhi tones are phonologically represented thus remains unresolved.

The representational picture is far less clear in the field of speech production, since studies of the on-line processing of tone sandhi production are scarce. But some currently available proposals on tone sandhi production (Chen, 1999; Liu, 2006; Wan & Jaeger, 1998) seem to presuppose the existence of an underlying tonal representation.

2.3.3 Studies on the on-line processing of third-tone sandhi production

Few experimental studies of the on-line processing of Mandarin tone sandhi production have been carried out. One available chronometric investigation was conducted by Deng, Feng and Peng (2003). They found that the planning of third-tone sandhi production was influenced by lexical status (word vs. non-word). In addition, whether the tonal syllable resulting from the third-tone sandhi rule could exist as a citation form also affected sandhi tone production when participants produced nonce words. But their study did not specifically target the role of the underlying tonal representation in tone sandhi production, and their conclusions still need to be confirmed by more empirical studies.

Some previous studies have offered inferential comments on the planning process of tone sandhi production. Chen (1999) suggested that tone sandhi may occur after the syllable/tone spellout and before the phonetic spellout, similar to the view held by Wan and Jaeger (1998). Liu (2006) proposed a similar model to account for Mandarin tone sandhi, based on the WEAVER++ model. In his model, the segmental and tonal properties of morphemes are first retrieved and stored in a phonological buffer. Tone sandhi is then achieved by a checking procedure, in which the tones in the phonological buffer are examined, and modified if the sandhi context is satisfied. The whole process is displayed in Figure 2-5.

Figure 2-5 Liu’s (2006) production model for the third-tone sandhi (adapted from Liu, 2006, Figure 15)

The above accounts all presume that third-tone sandhi production is achieved through a series of discrete steps, and that the sandhi tone is computed on-line. This view can be termed the ‘derivational view’, reminiscent of generative phonological explanations of tone sandhi. In contrast, Deng et al. (2003) suggested that the two tonal variants in non-final position might both be stored and separately activated, which results in competition during production. However, empirical investigations evaluating these arguments are scarce. In general, the derivational view of tone sandhi production appears to be favored by more studies. The derivational view highlights the role of the underlying tonal representation in tone sandhi production, which is the focus of the current study.

2.4 Phonological alternations and speech production models

2.4.1 A sketch of phonological alternations

Phonological alternations here refer to systematic sound variations conditioned by morphological or phonological context, a phenomenon that occurs abundantly in spoken language production. One typical example is place assimilation, in which the place of articulation of a segment varies depending on adjacent segments. For instance, the place of articulation of the final nasal of the English word green /ɡri:n/ varies according to the place of articulation of the following segment. It may become a bilabial nasal /m/ when followed by labial segments (e.g. /p/ or /b/), or change to a velar /ŋ/ when preceding velar segments (e.g. /k/ or /ɡ/). Mandarin tone sandhi is another example of phonological alternation. The third tone has two variants in non-final position depending on the following tone, as seen in 2.3.1. Other phenomena involving contextual variation include dissimilation, lenition, cliticization, deletion, etc. Many phonological alternations, such as assimilation, have clear phonetic foundations. For example, assimilation can be regarded as a result of coarticulation. However, not all phonological alternations have a clear phonetic basis. Mandarin third-tone sandhi is such a case, and it is difficult to attribute to well-known phonetic motivations (Zhang & Lai, 2010). Alternations of this kind may simply reflect language-particular phonotactics.

2.4.2 Phonological alternations and speech production models

Context-dependent alternations pose challenges to the many speech production models that assume the existence of abstract and context-free phonological representations (Dell, 1986, 1988; Levelt, 1989; Roelofs, 1997; Levelt et al., 1999). For instance, as Levelt (1989) pointed out, Dell’s (1986, 1988) model cannot provide an adequate account of assimilation and allophonic variation in the final phonetic execution. Levelt (1989) proposed a two-step procedure to cope with problems such as assimilation. First, the morphemes are inserted into the morphological frame, and the segmental and metrical information is spelled out. Second, after receiving the segmental and metrical spellout, the Prosody Generator concatenates and modifies the segmental strings inside the prosodic frame, and creates the phonological word. Phonological or allophonic variations such as assimilation are also generated during the second step.

In the WEAVER++ model (Roelofs, 1997; Levelt et al., 1999), a phonetic level of encoding is proposed to address the assimilation problem. The phonetic encoding stage in the WEAVER++ model retrieves context-dependent phonetic representations, which consist of abstract gestural scores specifying the articulatory gestures and their temporal relationships. The overlapping of gestural scores automatically leads to assimilation. However, phonological alternations such as tone sandhi cannot be handled in this manner in the WEAVER++ model, because tone sandhi, unlike assimilation, cannot be explained simply by articulatory factors. It seems that additional modification steps need to be added to account for tone sandhi, along the lines proposed by Levelt (1989). This is in line with Liu’s (2006) model, in which steps for checking and modification were added to the original WEAVER++ model in order to ensure the generation of the sandhi tone.

2.4.3 Phonological variants: Derived by rules vs. selected through competition

Traditional speech production models such as that of Levelt et al. (1999) usually assume that phonological variants share an abstract representation, and that variants are derived through phonological or phonetic processes, as in generative phonological accounts. This is expressed explicitly in Bürki et al. (2010):

“Traditional psycholinguistic models are heavily influenced by generative grammar (Chomsky & Halle, 1968), in which words are generally assumed to have only one lexical representation, with their other pronunciation variants being computed by means of phonological or phonetic rules.”

But recent years have witnessed the development of other representational models, such as exemplar models (Johnson, 1997), which argue that all phonetic details are incorporated into the phonological representations. This new representational view has profound implications for speech production models. It is possible that the encoding of word form is not necessarily a discrete derivational process. Instead, it might involve competition among different phonological or phonetic variants (Bürki et al., 2010).

2.5 Motivation of the study

Based on the indications of previous studies, the present study hypothesizes that the two tonal variants of Mandarin Tone 3 are serially derived from a shared underlying tonal representation during on-line word production. According to this hypothesis, the underlying tonal representation of Mandarin Tone 3 is activated first, and the sandhi tonal variants are constructed on-line. Thus, the underlying tonal representation can be prepared in advance. The current research tests this hypothesis with the implicit priming paradigm. As mentioned in 2.2, in implicit priming, advance knowledge of the phonological properties of the initial elements can speed up the response. If the underlying phonological representation plays a role in the encoding process and surface tonal variants are merely derived and constructed “on the fly”, participants should respond faster to sets that share the underlying phonological representation of the initial elements than to sets that do not. Even if the words in a set do not share the same surface phonetic variant, sharing the underlying phonological representation alone should facilitate production. In other words, if the tonal variants of Mandarin Tone 3 share the same underlying representation, spoken latencies will become shorter irrespective of their surface tonal values.

However, sharing only the tonal information caused no priming effects (Chen et al., 2002; Zhang, 2008), as mentioned in 2.2.1. Priming effects occurred only when the initial syllables in the sets were shared, or when both the initial syllables and their tones were shared, and the latter yielded stronger priming effects. Based on these findings, when the initial syllable is held constant across the sets, the hypothesis predicts that sets that additionally share the underlying tonal representation of Mandarin Tone 3 will yield stronger priming effects than those that do not, if the underlying tonal representation plays a role in production. Since the third-tone sandhi variant is assumed to be derived from the same underlying tonal representation of the third tone as the half-third sandhi variant, sets like “wu[3] dao[3]” (舞蹈, dance), “wu[3] kuai[4]” (五块, five blocks), “wu[3] pian[4]” (五片, five pieces) are predicted to lead to larger priming effects than sets like “wu[2] sheng[1]” (无声, quiet), “wu[3] tian[1]” (五天, five days), “wu[3] shuang[1]” (五双, five pairs). Words in the former sets share both the initial syllable and the underlying tonal representation, even though their surface tones are not shared (the first tonal syllable in “wu[3] dao[3]” should change to Tone 2 according to the third-tone sandhi rule), whereas words in the latter sets share only the initial syllable, and neither the surface tone nor the underlying tone. The next chapter reports two experiments conducted to evaluate this hypothesis. Nonce words are excluded from the current study; only real words are considered, because the processing of real words is closer to natural language processing.


CHAPTER THREE EXPERIMENTS

3.1 Introduction

The two experiments in this chapter address the following two questions in turn: (1) Can the third-tone sandhi variant of Mandarin Tone 3 facilitate the production of its half-third sandhi variant? (2) Can the half-third sandhi variant of Mandarin Tone 3 facilitate the production of its third-tone sandhi variant? According to the hypothesis posited at the end of the last chapter, the two surface tonal variants of Mandarin Tone 3 should facilitate the production of each other, since they are assumed to be serially derived from the same underlying tonal representation. In the following sections, each experiment is introduced in turn: its method is presented in full, followed by the results and a discussion of those results.

3.2 Experiment 1

3.2.1 Method

Participants

Twenty-seven participants, including twenty-four Chinese postgraduate students and three office staff members from the Guangdong University of Foreign Studies in China, took part in this experiment. Although they came from different regions of China, they all spoke Mandarin fluently and used it as their main language of communication on campus. All of them had normal or corrected-to-normal vision, and none had any hearing deficit or phonological disorder.

Design and predictions

The odd-man-out version of the implicit priming paradigm was chosen for the present study because it enables the systematic manipulation of one phonological property while keeping other phonological properties constant. In this study, only the tonal information of the initial morphemes was manipulated, while the segmental/syllabic information was held constant. This was crucial for the current investigation. First, the focus of this study was the function of tone in speech production, rather than segmental/syllabic information. Second, the contribution of tone to Mandarin word production can be traced only when the syllable is kept constant (Chen et al., 2002; Zhang, 2008). Given constant segments/syllables, manipulating the tones in a set can reveal the role of tone in Mandarin word production.

                        Word Type: Constant set                       Word Type: Variable set
Prompts                 Four-item set        Three-item set          Four-item set        Three-item set
staan ([to] stand)      lei.den ([to] lead)  lei.den                 ro.ken ([to] smoke)  ro.ken
stond (stood)           lei.dde (led)                                rook.te (smoked)
staander (stand)        lei.der (leader)     lei.der                 ro.ker (smoker)      ro.ker
staande (standing)      lei.dend (leading)   lei.dend                ro.kend (smoking)    ro.kend

Table 3-1 Examples of response words and prompts presented in the four-item and three-item conditions from Cholin et al. (2004)

Instead of presenting the sets in a homogeneous and a heterogeneous condition, as in the traditional version of the implicit priming paradigm, this experiment followed the experimental design of Cholin et al. (2004) and replaced the homogeneous/heterogeneous contrast with a three-item/four-item contrast. In their experiments, the four-item condition was formed by constant sets and variable sets that contained four items. The three-item condition was created by extracting from the four-item sets the three items that shared all the phonological properties; the odd-man-outs in the variable sets were thus dropped in the three-item condition. Examples are given in Table 3-1.

Compared with the traditional version of the paradigm, the advantage of contrasting three-item and four-item conditions is that it increases statistical comparability. In the traditional version, comparisons are made between items in the homogeneous conditions and those in the heterogeneous conditions. But the items compared differ in lemmas and syllable onsets, and are therefore not directly comparable. In the present version of the paradigm, the odd-man-out is excluded from the data analyses. Only the voice onset latencies of the three items in the three-item condition and of their counterparts in the four-item condition are computed. It is predicted that the spoken latencies of the three items in the four-item condition will be longer than those of their counterparts in the three-item condition, owing to the combined influence of the item-number load on working memory and/or the spoiling effect of the odd-man-out. Since the comparisons are made between the same words, the items are more comparable than those in the traditional version of the paradigm. In addition, potential effects of differences in lemmas or syllable onsets on the results are eliminated.

Experiment 1 in this study therefore adopted the above design, with some slight modifications. The experiment had a 2×3 within-subject design. Two factors were investigated: Set Size and Word Type. The factor Set Size had two levels: three-item vs. four-item. The factor Word Type had three levels: “totally constant condition”, “underlyingly constant condition”, and “variable condition”. In the four-item totally constant sets, the initial morphemes of all the words shared the half-third sandhi variant of Mandarin Tone 3. In this condition, both the underlying tonal representation and the surface form were the same for the initial morphemes of all the items. In the four-item underlyingly constant sets, an odd-man-out was included whose initial morpheme was associated with the third-tone sandhi variant of Tone 3, while the remaining three items shared the half-third sandhi variant of Tone 3. The underlying tonal representation was shared in the underlyingly constant sets, but the surface tonal forms differed. In the four-item variable sets, an odd-man-out was added whose initial morpheme carried Tone 2, while the remaining three items again shared the half-third sandhi variant of Tone 3. The odd-man-out in this condition differed from the other words in the set in both underlying and surface tonal form. Examples can be seen in Table 3-2.

| Word Type | Four-item set | Three-item set |
|---|---|---|
| Totally constant condition | wu[3] dong[4] (舞动, “to wave”) | wu[3] dong[4] (舞动, “to wave”) |
| | wu[3] bi[4] (舞弊, “cheating”) | wu[3] bi[4] (舞弊, “cheating”) |
| | wu[3] ting[1] (舞厅, “dance hall”) | wu[3] ting[1] (舞厅, “dance hall”) |
| | wu[3] guan[1] (五官, “sensory organs”) | |
| Underlyingly constant condition | wu[3] can[1] (午餐, “lunch”) | wu[3] can[1] (午餐, “lunch”) |
| | wu[3] jiao[4] (午觉, “nap”) | wu[3] jiao[4] (午觉, “nap”) |
| | wu[3] ye[4] (午夜, “midnight”) | wu[3] ye[4] (午夜, “midnight”) |
| | wu[3] dao[3] (舞蹈, “dance”) | |
| Variable condition | wu[3] shu[4] (武术, “martial art”) | wu[3] shu[4] (武术, “martial art”) |
| | wu[3] li[4] (武力, “force”) | wu[3] li[4] (武力, “force”) |
| | wu[3] duan[4] (武断, “arbitrary”) | wu[3] duan[4] (武断, “arbitrary”) |
| | wu[2] chi[3] (无耻, “shameless”) | |

Table 3-2 Examples of response words in the totally constant, underlyingly constant and variable conditions within a four-item and a three-item set in Experiment 1

Since the odd-man-out is supposed to hamper the priming effects, the spoken latencies of the other three items in four-item variable sets will be greater than those in the totally constant sets. The discrepancy in spoken latencies of the corresponding three items between the three-item and four-item sets, termed “priming effect” here, is also expected to be larger in the variable condition than in the totally constant condition. As to the underlyingly constant condition, if the third-tone sandhi variant and the half-third sandhi variant of Tone 3 share the same underlying representation, and this underlying representation is retrieved first in phonological encoding, as indicated in Liu’s (2006) model, the third-tone sandhi variant of Tone 3 will spoil the priming effects less than the odd-man-out Tone 2 in the variable condition, and the spoken latencies of the remaining three items should be smaller than those in the variable condition but larger than those in the totally constant condition. That is, in the four-item sets, RT (standing for reaction times) (totally constant) < RT (underlyingly constant) < RT (variable). The priming effect (the difference in spoken latencies of the corresponding three items between the three-item and four-item sets) in the underlyingly constant condition is also predicted to lie between the priming effects of the variable condition and the totally constant condition, i.e. PE (standing for priming effects) (totally constant) < PE (underlyingly constant) < PE (variable).
Materials
Twenty-four disyllabic Mandarin words (see Appendix A) were selected as response words for the present experiment. These words were grouped into 6 four-item blocks, with two blocks in each Word Type condition. All the items in each block shared the initial syllable, and two different syllable tokens (wu and yan) were used for the two blocks in every Word Type condition. The same syllable tokens (wu and yan) were chosen for all three conditions because this allowed direct comparisons between the three Word Type conditions. If different syllable tokens were used between different Word Type conditions, as in the experiments of Cholin et al. (2004), it would reduce the comparability between different Word Type conditions, due to the possible influence of differences in syllable onset (Kessler, Treiman, & Mullennix, 2002), syllable length (Liu, 2006) or syllable type (Mooshammer, Goldstein, Nam, McClure, Saltzman, & Tiede, 2012).
In the totally constant condition, all the initial morphemes of the four items shared the half-third sandhi variant of Tone 3. For example, a set like “wu[3] dong[4]” (舞动, “to wave”), “wu[3] bi[4]” (舞弊, “cheating”), “wu[3] ting[1]” (舞厅, “dance hall”), “wu[3] guan[1]” (五官, “sensory organs”) was a totally constant set. In the underlyingly constant condition, there was an odd-man-out with its initial morpheme carrying the third-tone sandhi variant of Tone 3, while the initial morphemes of the other three items in the block shared the half-third sandhi variant of Tone 3. Thus the initial morpheme of the odd-man-out shared the underlying tonal representation with those of the other three items but differed in the surface tonal form. For example, in a set like “wu[3] can[1]” (午餐, “lunch”), “wu[3] jiao[4]” (午觉, “nap”), “wu[3] ye[4]” (午夜, “midnight”), “wu[3] dao[3]” (舞蹈, “dance”), the last item “wu[3] dao[3]” was an odd-man-out because its initial syllable should change from Tone 3 to Tone 2, resulting in a different surface tonal form from the remaining three items, though they all shared the same underlying tonal representation of Tone 3. In the variable condition, the odd-man-out was an item whose initial morpheme was associated with underlying Tone 2, while the initial morphemes of the remaining three items shared the half-third sandhi variant of Tone 3. Thus, the odd-man-out was different from the other three items in terms of both underlying and surface tonal forms. An example was a set like “wu[3] shu[4]” (武术, “martial art”), “wu[3] li[4]” (武力, “force”), “wu[3] duan[4]” (武断, “arbitrary”), “wu[2] chi[3]” (无耻, “shameless”).
The initial morphemes of all the items in the four-item sets were controlled. In the variable blocks, the three items excluding the odd-man-out shared the same initial morpheme with Tone 3, whereas the odd-man-out was manipulated to have a different initial morpheme with Tone 2. In order to keep the design consistent across the three Word Type conditions, in the underlyingly constant blocks, the initial morpheme of the odd-man-out was also made different from those of the other three items.
For example, in an underlyingly constant set like “wu[3] can[1]” (午餐, “lunch”), “wu[3] jiao[4]” (午觉, “nap”), “wu[3] ye[4]” (午夜, “midnight”), “wu[3] dao[3]” (舞蹈, “dance”), the initial morphemes of the first three items were shared while the odd-man-out “wu[3] dao[3]” had a different initial morpheme. In the totally constant blocks, a “pseudo-odd-man-out” was created, with three items sharing the same initial morpheme and the remaining item carrying another homophonous initial morpheme. For instance, in a set like “wu[3] dong[4]” (舞动, “to wave”), “wu[3] bi[4]” (舞弊, “cheating”), “wu[3] ting[1]” (舞厅, “dance hall”), “wu[3] guan[1]” (五官, “sensory organs”), the initial elements of all the words were homophonous but the item “wu[3] guan[1]” (五官, “sensory organs”) contained a different initial morpheme from the other three items. Since Chen and Chen (2006, 2007) adopted the same experimental paradigm as this study and found that morphemes were not involved in Mandarin word production, the difference in morphemes should not affect the spoken latencies; only the phonological properties matter. Thus, the homophonous but different morpheme should exert the same boosting effect on word production as a shared morpheme, and this odd-man-out is counted as a “pseudo” one. Nonetheless, the blocks in all the conditions were made to incorporate an odd-man-out with a different initial morpheme from the remaining three items. This made the three conditions more comparable. Since the odd-man-out in all the conditions had a different initial morpheme, any difference in the spoken latencies of the remaining three items between the three conditions could not result from the difference in the initial morpheme of the odd-man-out, but only from the difference in its phonological properties. The Chinese character for the initial morpheme of the odd-man-out also differed in shape from those of the other three items in a block. Although Chen et al. (2002) found, using the same experimental paradigm as this study, that orthography did not contribute to Mandarin speech production, the Chinese character of the initial morpheme of the odd-man-out was still manipulated so that it did not bear too much orthographical similarity to the characters of the initial morphemes of the other three items in the same block. Since the second syllables of the odd-man-outs in the underlyingly constant blocks were associated with Tone 3, the second syllables of the selected odd-man-outs in the variable blocks also carried Tone 3.
Thus, the tonal combination of the odd-man-outs in the variable blocks was Tone 2 + Tone 3, making it more comparable to the counterparts in the underlyingly constant blocks (Tone 3 + Tone 3). The second syllables of the other chosen items in the whole experiment carried either Tone 1 or Tone 4. A corresponding three-item block was also derived from each four-item block by removing the odd-man-out in each block, as illustrated in Table 3-2. This resulted in two three-item blocks in every Word Type condition, creating 6 corresponding three-item blocks altogether.
Another 24 disyllabic Mandarin words (also see Appendix A) were chosen as prompt items to form an associative pair with each response word. Each response word bore an obvious semantic connection with its prompt. For instance, the response word “wu[3] dao[3]” (舞蹈, “dance”) was associated with the semantically related prompt “chang[4] ge[1]” (唱歌, “sing”). All the prompts were composed only of morphemes with either Tone 1 or Tone 4, so as to prevent the stimuli from helping participants activate Tone 3 or Tone 2. None of the prompts contained the same syllables or rhymes as the corresponding response words, in order to exclude phonological priming from the prompts. The morphemes of each prompt were also kept orthographically distinct from those of its corresponding response word, with the purpose of eliminating the potential impact of orthography. A full list of the prompt-response pairs is presented in Appendix A.
Apparatus
The whole experiment was run on a personal computer (Dell Dimension 2010) with 1 GB internal memory (DDRII 800MHz). All the processes were controlled by the program DMDX 4.0.4.2 (Forster & Forster, 2003) running on an optimized virus-free Microsoft Windows XP operating system. Irrelevant services and programs were shut down during the experiment so that the running of the experiment would not be disturbed. Stimuli were presented on a Dell 18.5-inch LCD monitor approximately 50 cm away from the participants, with a resolution of 1024 × 768 pixels. All the Chinese characters displayed in the experiment were in 24-point Songti (宋体, a common font for the presentation of Chinese characters), white on a black screen.
Participants’ spoken responses were registered by a microphone attached to a SALAR A17 headphone, which fed into a Realtek HD compatible sound card. The sound card functioned as the voice key. An amplitude threshold was preset in the DMDX Test Vox. When participants’ vocal responses exceeded the threshold during the experiment, the sound card was triggered and the reaction times were recorded by DMDX. The measurement of the spoken reaction times, also termed voice onset latencies, started when participants saw the prompt words on the screen, and ended when participants began to produce the corresponding target words. At the same time, every vocal response was recorded by DMDX for further examination, with a sampling rate of 22 kHz and 16-bit quantization. The recordings began when each stimulus was presented, and stopped 1000ms later, creating a sound wave file for each response. A reaction time cue for each stimulus was also written into each corresponding recorded sound wave file by DMDX for further RT verification. The microphone was adjusted to the right position in advance to ensure that the sound card could be appropriately triggered by the voices of participants.
Procedure
Participants were tested individually in a quiet room. The experimenter was seated in the same room, taking notes of hesitations, no responses, wrong responses, stuttering, etc. Before the experiment started, participants first received videotaped instructions on the experiment. Then the experimenter demonstrated a trial to participants. In order to familiarize themselves with the procedure, participants were asked to practice an experiment-unrelated block on their own. Participants received twelve blocks in the experiment, with six four-item blocks and six derived three-item blocks. The four-item condition and three-item condition were presented to participants successively and the order of presentation of these two conditions was counterbalanced across the participants. Within each Set Size condition, blocks of different Word Type conditions were randomly displayed. Every block was made up of three phases: a learning phase, a practice phase, and a test phase.
In the learning phase, participants were presented with four pairs (or three pairs) of prompt-response words. Participants were required to memorize these pairs and instructed that they should say aloud the corresponding response word in the later phases when they saw a prompt. The prompt and response word in each pair were separated by a double hyphen and positioned at the center of the screen. The pairs were displayed on the screen one by one, controlled by participants. When participants had memorized one pair, they pressed the space bar on the keyboard to proceed to the next pair. Every pair was displayed twice in this phase and the time for learning was freely controlled by the participants themselves. After the participants finished the learning, a practice phase was initiated to test the results of their learning. Participants were told to say aloud the corresponding response words to the prompts, which were shown on the screen in random order. If participants gave no responses or wrong responses, they were asked to memorize the pairs again. The test phase began after participants remembered all the words correctly.
In the test phase, participants were asked to respond aloud to the stimuli as accurately and quickly as possible. Each trial began with a fixation “+” denoting the position where the prompt would appear. The fixation lasted 200ms. Then the screen was cleared for a short pause. The durations of the pauses were 500ms, 600ms, 700ms, and 800ms, which were randomly distributed across trials in each block in order to prevent participants from guessing strategically or preparing in advance. After the pause the prompt was displayed at the center of the screen. Simultaneously the voice key (sound card) was activated, for a maximum of 3000ms. The prompt disappeared immediately after the response, and the next trial started after 1000ms. If participants gave no response within 3000ms, a 500Hz warning sound occurred and lasted 200ms, and then the program automatically jumped to the next trial after 1000ms. If participants’ reaction times were more than 1000ms, they would also receive a 500Hz warning sound lasting 200ms after the response, and the next trial began after 1000ms. Meanwhile, DMDX also started to record the vocal responses when the voice key was activated, and stopped recording the response with a delay of 1000ms.
When the voice key was triggered, a reaction time cue was also written into the corresponding recorded wave file, marking the RT registered by the computer in the sound waves, which could be used to further examine whether the RT was triggered correctly or not. Every prompt in each block was randomly repeated four times, with no immediate repetitions of the same prompt. Thus, a four-item block contained 16 trials and a three-item block contained 12 trials, creating 168 trials in total. The whole experiment lasted 40 minutes on average.
Data analyses
The reaction times and vocal responses were both examined in the program CheckVocal (Protopapas, 2007) using the sound wave files recorded by DMDX. Responses were listened to in CheckVocal, and those that contained fillers, hesitations, stuttering, mispronunciations, corrected responses, coughs, and other illegal sounds were coded as errors and marked as wrong responses. Sound files that contained no responses from participants were also marked as errors in the program. In CheckVocal, the recorded sound wave files were also displayed as spectrograms with a window length of 10ms. The mode for wave display was set as 800 pixels (width) × 150 pixels (height). An RT cue written in the sound wave files by DMDX was clearly shown as a red line in the spectrogram, marking the time point at which participants responded. The reaction times were verified with the help of the spectrograms and sound waves. If reaction times were triggered correctly, the red line should be positioned at the beginning of the syllable “wu” or “yan”. If the red line was misplaced, the reaction times were counted as mis-triggered. Mis-triggered reaction times were recalculated by CheckVocal by placing the mis-triggered RT red line in the right position in the spectrograms. For the syllable “wu”, the mis-triggered RT red line was adjusted and placed at the beginning of the first formant of the vowel /u/ in the spectrograms, as shown in Figure 3-1. For the syllable “yan”, the mis-triggered RT red line was adjusted and placed at the beginning of the first formant of the glide /j/, as shown in Figure 3-2. Mis-triggered reaction times were readjusted only when the responses did not contain fillers, hesitations, stuttering, mispronunciations, corrected responses, coughs and other illegal sounds.
After the verification of reaction times, reaction times that were more than 1000ms or less than 200ms were regarded as invalid and thus removed from the reaction time analyses. Reaction times from error-coded responses also did not enter into the analyses. As mentioned above, the odd-man-outs in the four-item blocks were also excluded from the analyses.
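The exclusion criteria above can be summarised in a short sketch. The Python below is illustrative only: the `Trial` record and its field names are hypothetical and are not part of the original analysis, which was carried out with CheckVocal and SPSS. It also assumes that latencies of exactly 200ms or 1000ms are retained, since the text excludes only values below or above these limits.

```python
from dataclasses import dataclass

RT_MIN_MS = 200    # lower cut-off stated in the text
RT_MAX_MS = 1000   # upper cut-off stated in the text


@dataclass
class Trial:
    """Hypothetical record for one verified response."""
    rt_ms: float           # verified voice onset latency in ms
    is_error: bool         # coded as an error in CheckVocal
    is_odd_man_out: bool   # response word is the odd-man-out of a four-item block


def is_valid(trial: Trial) -> bool:
    """Return True if the trial enters the reaction time analyses."""
    if trial.is_error or trial.is_odd_man_out:
        return False
    # Only latencies below 200ms or above 1000ms are discarded (assumed inclusive bounds).
    return RT_MIN_MS <= trial.rt_ms <= RT_MAX_MS


def trim(trials):
    """Keep only the trials that pass every exclusion criterion."""
    return [t for t in trials if is_valid(t)]
```

Applying `trim` before any averaging keeps the exclusion step separate from the computation of condition means.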


Figure 3-1 The correctly triggered RT red line for the syllable “wu” in CheckVocal (figure annotation: first formant of /u/)

Figure 3-2 The correctly triggered RT red line for the syllable “yan” in CheckVocal (figure annotation: first formant of /j/)

Analyses of variance were run with two independent variables: Set Size and Word Type. The analyses included by-participants and by-items analyses. In the by-participants analyses, for each participant, reaction times of the remaining items were averaged across trials within each Set Size condition and within each Word Type condition. The mean reaction times were entered into analyses of variance conducted with the statistical analysis software SPSS 13.0. In the by-items analyses, reaction times of each remaining item were averaged across trials and participants within each Set Size condition and within each Word Type condition. The mean reaction times were also imported into SPSS 13.0 for analyses of variance.
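The two aggregation steps described above differ only in the unit over which trials are averaged. A minimal Python sketch follows; the dictionary keys (`participant`, `item`, `set_size`, `word_type`, `rt_ms`) and the toy latencies are hypothetical and serve only to illustrate the procedure, not to reproduce the SPSS workflow.

```python
from collections import defaultdict
from statistics import mean


def cell_means(trials, unit):
    """Average RTs per (unit, Set Size, Word Type) cell.

    unit is "participant" for the by-participants (F1) analysis
    or "item" for the by-items (F2) analysis.
    """
    cells = defaultdict(list)
    for trial in trials:
        key = (trial[unit], trial["set_size"], trial["word_type"])
        cells[key].append(trial["rt_ms"])
    return {key: mean(rts) for key, rts in cells.items()}


# Toy data (invented values, for illustration only).
toy_trials = [
    {"participant": "P1", "item": "wu3 dong4", "set_size": 4,
     "word_type": "totally constant", "rt_ms": 660},
    {"participant": "P1", "item": "wu3 bi4", "set_size": 4,
     "word_type": "totally constant", "rt_ms": 680},
    {"participant": "P2", "item": "wu3 dong4", "set_size": 4,
     "word_type": "totally constant", "rt_ms": 700},
]

by_participants = cell_means(toy_trials, "participant")
by_items = cell_means(toy_trials, "item")
```

The resulting cell means are the values that would then be submitted to the analyses of variance.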

3.2.2 Results
Analyses of reaction times
In the by-participants analyses, the overall main effect of Set Size was significant (F1 (1, 26) = 6.047, p = .021 < .05). There was also an overall highly significant effect of Word Type (F1 (2, 52) = 7.108, p = .002 < .01). The interaction of Set Size and Word Type was not significant (F1 (2, 52) = 1.002, p = .374 > .05). Further analyses revealed that the main effect of Word Type was highly significant in the four-item condition (F1 (2, 52) = 8.345, p = .001) but no significant effect of Word Type was found in the three-item condition (F1 (2, 52) = 2.293, p = .111 > .05). The main effect of Set Size was significant both in the underlyingly constant condition (F1 (1, 26) = 7.702, p = .01 < .05) and in the variable condition (F1 (1, 26) = 4.963, p = .035 < .05) but not in the totally constant condition (F1 (1, 26) = 1.744, p = .198 > .05). Post-hoc LSD analyses showed that in the four-item condition, there were statistically significant differences between the totally constant condition and the underlyingly constant condition (p < 0.001) and between the totally constant condition and the variable condition (p = 0.004 < .01). No statistically significant difference between the underlyingly constant condition and the variable condition in the four-item condition was found in post-hoc analyses (p = .495 > .05). The priming effects (the difference between voice onset latencies of the four-item condition and the corresponding three-item condition within each Word Type condition) in the totally constant condition were much smaller (672 – 657ms = 15ms) than those in the underlyingly constant condition (705 – 675ms = 30ms) and in the variable condition (699 – 673ms = 26ms). The mean reaction times, standard deviations, error rates, and priming effects of the by-participants analyses are summarized in Table 3-3.

| Word Type | Four-item M | Four-item SD | Four-item %Err | Three-item M | Three-item SD | Three-item %Err | Priming effects |
|---|---|---|---|---|---|---|---|
| Totally constant | 672 | 75 | (2.9) | 657 | 87 | (2.8) | 15 |
| Underlyingly constant | 705 | 69 | (7.5) | 675 | 79 | (5.7) | 30 |
| Variable | 699 | 69 | (5.9) | 673 | 73 | (8.8) | 26 |

Table 3-3 Mean reaction times (in ms), standard deviations, percentage error rates (in parentheses), and priming effects (in ms) of the by-participants analyses in Experiment 1
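The priming effects reported for the by-participants analyses are simple differences between the four-item and three-item condition means. The following Python sketch recomputes them from the means given in Table 3-3; the variable and function names are illustrative and not taken from the original analysis.

```python
# Mean RTs in ms from Table 3-3: (four-item condition, three-item condition).
MEAN_RT_MS = {
    "totally constant": (672, 657),
    "underlyingly constant": (705, 675),
    "variable": (699, 673),
}


def priming_effect(four_item_mean, three_item_mean):
    """Priming effect as defined in the text: four-item minus three-item latency."""
    return four_item_mean - three_item_mean


PRIMING_EFFECT_MS = {
    word_type: priming_effect(four, three)
    for word_type, (four, three) in MEAN_RT_MS.items()
}

print(PRIMING_EFFECT_MS)
# {'totally constant': 15, 'underlyingly constant': 30, 'variable': 26}
```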

| Word Type | Four-item M | Four-item SD | Four-item %Err | Three-item M | Three-item SD | Three-item %Err | Priming effects |
|---|---|---|---|---|---|---|---|
| Totally constant | 673 | 19 | (2.9) | 658 | 24 | (2.8) | 15 |
| Underlyingly constant | 708 | 24 | (7.5) | 674 | 22 | (5.7) | 34 |
| Variable | 702 | 17 | (5.9) | 675 | 17 | (8.8) | 27 |

Table 3-4 Mean reaction times (in ms), standard deviations, percentage error rates (in parentheses), and priming effects (in ms) of the by-items analyses in Experiment 1

In the by-items analyses, the interaction of Set Size and Word Type reached significance (F2 (2, 15) = 4.170, p = .036 < .05). The overall main effect of Set Size was very highly significant (F2 (1, 15) = 101.173, p < .001). The overall main effect of Word Type was nearly significant (F2 (2, 15) = 2.894, p = .086). Further analyses demonstrated that the main effect of Word Type was significant in the four-item condition (F2 (2, 15) = 4.919, p = .023 < .05) but not in the three-item condition (F2 (2, 15) = 1.241, p = .317 > .05). Further t-tests revealed that the differences between the four-item condition and three-item condition were highly significant for all three Word Type conditions (totally constant condition: t (5) = 5.586, p = .003 < .01; underlyingly constant condition: t (5) = 5.379, p = .003 < .01; variable condition: t (5) = 8.119, p < .001). Post-hoc LSD analyses showed that in the four-item condition there were significant differences between the totally constant condition and the underlyingly constant condition (p = .01) and between the totally constant condition and the variable condition (p = .027 < .05). No significant difference between the underlyingly constant condition and the variable condition in the four-item condition was found. The priming effects (the difference between voice onset latencies of the four-item condition and the corresponding three-item condition within each Word Type condition) in the totally constant condition were much smaller (673 – 658ms = 15ms) than those in the underlyingly constant condition (708 – 674ms = 34ms) and in the variable condition (702 – 675ms = 27ms). The mean reaction times, standard deviations, error rates, and priming effects of the by-items analyses are summarized in Table 3-4.
Analyses of error rates
The overall error rate in Experiment 1 was 5.6%. No significant effects were found in the by-items analyses of error rates. In the by-participants analyses of error rates, the overall main effect of Word Type was significant (F1 (2, 52) = 7.403, p = .005 < .01). The interaction of Set Size and Word Type was significant (F1 (2, 52) = 4.160, p = .021 < .05). The main effect of Set Size was not significant (F1 (1, 26) = .251, p = .620 > .05). Further analyses showed that the main effect of Word Type was significant both in the four-item condition (F1 (2, 52) = 5.543, p = .014 < .05) and in the three-item condition (F1 (2, 52) = 7.055, p = .002 < .01). Post-hoc LSD analyses demonstrated that there were significant differences between the totally constant condition and the underlyingly constant condition and between the totally constant condition and the variable condition both in the four-item condition (p = .002 < .01; p = .006 < .01) and in the three-item condition (p = .017 < .05; p = .001). No other effects were found to be significant.

3.2.3 Discussion
As was predicted, the factor Set Size had an effect on the reaction times. The three items in the three-item sets were generally produced faster than their counterparts in the four-item sets. The three-item sets contained fewer items than the four-item sets, thus placing a smaller burden on participants’ working memory during the recall of the words and leading to shorter latencies. The data showed that items in the three-item condition were produced on average 22ms faster than those in the four-item condition.
However, the prediction of the effect of Word Type was not fully met. As was expected, participants responded to the items in the totally constant condition significantly faster than to those in the underlyingly constant condition and the variable condition in the four-item sets. At first glance, the difference might result from differences in the frequency of response words or in the semantic associations of the prompts between the totally constant condition and the other two Word Type conditions, since there were statistically significant differences in error rates between the totally constant condition and the other two Word Type conditions. However, this possibility did not hold up on closer examination, because there were no significant differences in the error rates between the three-item condition and the four-item condition across the three Word Types, and the reaction times of the three Word Type conditions in the three-item sets also displayed no statistically significant difference. Thus, the difference in the reaction times of the items between the totally constant condition and both the underlyingly constant condition and the variable condition in the four-item sets should be due to the spoiling effect of the odd-man-outs in the underlyingly constant condition and in the variable condition in the four-item sets. The odd-man-outs in the underlyingly constant condition and the variable condition both spoiled the priming effects of the whole sets, resulting in larger discrepancies between items in the four-item sets and their counterparts in the three-item sets in the underlyingly constant condition and the variable condition than in the totally constant condition. The results indicated that the implicit priming effects were stronger only when both the surface and the underlying tonal representation were shared.
However, no difference between the spoken latencies of the underlyingly constant condition and the variable condition in the four-item condition was found. This suggested that sharing only the underlying tonal representation did not lead to stronger priming effects, contrary to the earlier predictions. The odd-man-outs in the underlyingly constant condition and in the variable condition caused the same disturbance of the priming effects of the whole sets. This cast some doubt on the role of the underlying tonal representation in tonal encoding.
Experiment 1 only demonstrated that merely sharing the underlying tonal representation of the third tone did not facilitate the production of the half-third sandhi variant of Mandarin Tone 3. Whether the production of the third-tone sandhi variant of Mandarin Tone 3 can be speeded up when the underlying tonal representation of Tone 3 is pre-activated still needs investigation. Thus, Experiment 2 was intended to directly delve into this issue.

3.3 Experiment 2

3.3.1 Method
Participants
Another twenty-three Chinese postgraduate students from the Guangdong University of Foreign Studies in China participated in this experiment. Though they came from different Chinese regions, they could speak Mandarin fluently and they used it as their major language for communication on campus. All of them had normal or corrected-to-normal vision, without any hearing deficit or phonological disorders. None of them had participated in Experiment 1.
Design and predictions
Experiment 2 followed the same design as Experiment 1 with some slight modifications. The experiment was a 2×2 within-subject design, with the same factors Set Size and Word Type as the two independent variables. The factor Set Size still included three-item and four-item conditions. Word Type in this experiment was composed of only two levels: underlyingly constant condition and variable condition. Items except the odd-man-outs in the four-item blocks shared the same initial morphemes. In each Word Type condition, the initial morphemes of the three items excluding the odd-man-outs in the four-item blocks all shared the third-tone sandhi variant of Tone 3. An example was “gu[3] zhang[3]” (鼓掌, “applause”), “gu[3] shou[3]” (鼓手, “drummer”), “gu[3] wu[3]” (鼓舞, “encouragement”), in which all the initial morphemes should change the underlying Tone 3 into a surface Tone 2. In the underlyingly constant four-item blocks, an odd-man-out was incorporated with its initial morpheme associated with the half-third sandhi variant of Tone 3, e.g. “gu[3] shi[4]” (股市, “stock market”). That is, the initial morpheme of the odd-man-out shared the same underlying tonal representation of Tone 3 with the other three items but differed in terms of surface tonal form. In the variable four-item blocks, there was an odd-man-out with its initial morpheme carrying either Tone 1 or Tone 4, different from the initial morphemes of the other three items in both underlying tonal representation and surface tonal representation. Examples are illustrated in Table 3-5.

| Word Type | Four-item set | Three-item set |
|---|---|---|
| Underlyingly constant condition | gu[3] dian[3] (古典, “classical”) | gu[3] dian[3] (古典, “classical”) |
| | gu[3] lao[3] (古老, “old”) | gu[3] lao[3] (古老, “old”) |
| | gu[3] dong[3] (古董, “antique”) | gu[3] dong[3] (古董, “antique”) |
| | gu[3] shi[4] (股市, “stock market”) | |
| Variable condition | gu[3] zhang[3] (鼓掌, “applause”) | gu[3] zhang[3] (鼓掌, “applause”) |
| | gu[3] shou[3] (鼓手, “drummer”) | gu[3] shou[3] (鼓手, “drummer”) |
| | gu[3] wu[3] (鼓舞, “encouragement”) | gu[3] wu[3] (鼓舞, “encouragement”) |
| | gu[4] lü[4] (顾虑, “misgivings”) | |

Table 3-5 Examples of response words in the underlyingly constant and variable conditions within a four-item and a three-item set in Experiment 2

According to Liu’s (2006) model for tone sandhi production, tone sandhi production is a discrete sequential process, and the surface third-tone sandhi variant is derived from the underlying representation of the third tone shared by the half-third sandhi variant. Based on these assumptions, it is predicted that sharing the underlying tonal representation should facilitate third-tone sandhi production. Thus, the spoken latencies for items in the underlyingly constant condition will be smaller than those in the variable condition in the four-item condition, i.e. RT (underlyingly constant) < RT (variable). Since the initial morpheme of the odd-man-out in the underlyingly constant condition shared the same underlying tonal representation as those of the other three items, the odd-man-out in the underlyingly constant condition should spoil the priming effects less than that in the variable condition. Thus, the priming effects (the difference in spoken latencies between the three items in the four-item blocks and their counterparts in the three-item blocks) are also expected to be larger for the variable condition than for the underlyingly constant condition, i.e. PE (standing for priming effects) (variable) > PE (underlyingly constant).
Materials
Sixteen disyllabic Mandarin words were selected as response words. These words were grouped into 4 four-item blocks, with two blocks in each Word Type condition. All the items in each block shared the initial syllable, and two different syllable tokens (gu and jian) were used for the two blocks in every Word Type condition. In each underlyingly constant four-item block, the initial morpheme of the odd-man-out carried the half-third sandhi variant of Tone 3 while the initial morphemes of the remaining three items were associated with the third-tone sandhi variant of Tone 3. An example was “gu[3] dian[3]” (古典, “classical”), “gu[3] lao[3]” (古老, “old”), “gu[3] dong[3]” (古董, “antique”), with an odd-man-out “gu[3] shi[4]” (股市, “stock market”). In each variable four-item block, the initial morpheme of the odd-man-out carried either Tone 1 or Tone 4 whereas the initial morphemes of the other three items shared the third-tone sandhi variant of Tone 3. An example was “gu[3] zhang[3]” (鼓掌, “applause”), “gu[3] shou[3]” (鼓手, “drummer”), “gu[3] wu[3]” (鼓舞, “encouragement”), with an odd-man-out “gu[4] lü[4]” (顾虑, “misgivings”). The three items excluding the odd-man-out in a block all shared the same initial morpheme, and the initial morpheme of the odd-man-out differed from that of the other three items in the block. The second syllables of the odd-man-outs were associated with either Tone 1 or Tone 4.
The Chinese character of the initial morpheme of the odd-man-out was also controlled so that it did not bear much orthographic similarity to the characters of the initial morphemes of the other three items in the same block. The corresponding three-item blocks were derived from the four-item blocks by removing the odd-man-outs. This resulted in two three-item blocks in each Word Type condition, i.e. four three-item blocks in total. Another 16 disyllabic Mandarin words were chosen as prompts, one for each response word. All the prompt-response pairs bore an obvious semantic association. The prompts were made up only of morphemes with either Tone 1 or Tone 4. None of the prompts contained the same syllables or rhymes as the corresponding response words. The morphemes of each prompt were also kept orthographically distinct from those of its corresponding response word. The entire list can be seen in Appendix B.

Procedure and apparatus

Participants received eight blocks in the experiment: four four-item blocks and the four three-item blocks derived from them. The four-item condition and the three-item condition were presented successively, and the order of presentation of these two conditions was counterbalanced across participants. Within each Set Size condition, blocks of the different Word Type conditions were presented in random order. Each prompt in each block was repeated four times in random order, with no immediate repetitions of the same prompt. This created 112 trials altogether. The other procedures and the apparatus were identical to those in Experiment 1. The average duration of Experiment 2 was 25 minutes.

Data analyses

As in Experiment 1, both the reaction times and the vocal responses were examined in the program CheckVocal. Mis-triggered reaction times were recalculated in CheckVocal by moving the RT red line to the correct position in the spectrogram. For the syllable “gu”, the red line was placed at the time point of the first burst of the consonant /k/ in the spectrogram, as shown in Figure 3-3. For the syllable “jian”, the red line was placed at the beginning of the high-frequency noise of the consonant /tɕ/, as shown in Figure 3-4. As mentioned above, the odd-man-outs in the four-item blocks were excluded from the reaction time analyses.
Mis-triggered reaction times were readjusted only when the responses did not contain fillers, hesitations, stuttering, mispronunciations, corrected responses, coughs or other illegal sounds, all of which were coded as errors. After the reaction times had been verified, those longer than 1000 ms or shorter than 200 ms were regarded as invalid and removed from the reaction time analyses. Reaction times from error-coded responses were likewise excluded from the analyses. The other steps of the data analyses were the same as those in Experiment 1.
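The trial count and the exclusion criteria described above can be restated as a short Python sketch. It is illustrative only: the dictionary keys (`rt`, `error`, `odd_man_out`) and the function names are hypothetical and do not reflect the CheckVocal or DMDX output format, and the sample trials are made-up values used solely to exercise the filter.

```python
FOUR_ITEM_BLOCKS = 4    # two per Word Type condition
THREE_ITEM_BLOCKS = 4   # derived by removing the odd-man-outs
REPETITIONS = 4         # each prompt repeated four times per block


def total_trials() -> int:
    """Number of trials implied by the block design."""
    return (FOUR_ITEM_BLOCKS * 4 + THREE_ITEM_BLOCKS * 3) * REPETITIONS


def trimmed_rts(trials, lower_ms=200, upper_ms=1000):
    """Keep RTs that enter the analysis.

    A trial is dropped if it was coded as an error, if it belongs to an
    odd-man-out item, or if its RT is below lower_ms or above upper_ms.
    """
    return [
        t["rt"]
        for t in trials
        if not t["error"]
        and not t["odd_man_out"]
        and lower_ms <= t["rt"] <= upper_ms
    ]


# Made-up trials, for illustration only.
sample = [
    {"rt": 650, "error": False, "odd_man_out": False},   # kept
    {"rt": 150, "error": False, "odd_man_out": False},   # shorter than 200 ms
    {"rt": 1200, "error": False, "odd_man_out": False},  # longer than 1000 ms
    {"rt": 700, "error": True, "odd_man_out": False},    # error-coded response
    {"rt": 640, "error": False, "odd_man_out": True},    # odd-man-out item
]

print(total_trials())       # (4*4 + 4*3) * 4 = 112
print(trimmed_rts(sample))  # [650]
```

The count of 112 follows directly from the design: 16 items in the four-item blocks plus 12 items in the three-item blocks, each presented four times.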

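Figure 3-3 The correctly triggered RT red line for the syllable “gu” in CheckVocal (spectrogram; the label “Burst” marks the first burst of /k/)

Figure 3-4 The correctly triggered RT red line for the syllable “jian” in CheckVocal (spectrogram; the label “High frequency noise” marks the onset of /tɕ/)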

3.3.2 Results

Analyses of reaction times

In the by-participants analyses, the overall main effect of Set Size was significant (F1 (1, 22) = 7.586, p = .012 < .05). However, the overall main effect of Word Type was not significant (F1 (1, 22) = .022, p = .833 > .05). The interaction of Set Size and Word Type was also not significant (F1 (1, 22) = .079, p = .782 > .05). Further analyses showed that the spoken latencies of the three items in the three-item blocks differed significantly from those of their counterparts in the corresponding four-item blocks, both in the underlyingly constant condition (F1 (1, 22) = 4.406, p = .048 < .05) and in the variable condition (F1 (1, 22) = 7.259, p = .013 < .05). The priming effects (the difference between the voice onset latencies of the four-item condition and the corresponding three-item condition within each Word Type condition) in the underlyingly constant condition (686 – 665 = 21 ms) were slightly smaller than the priming effects in the variable condition (686 – 662 = 24 ms). However, this slight difference was not significant, since neither the main effect of Word Type nor its interaction with Set Size was significant. The mean reaction times, standard deviations, error rates, and priming effects of the by-participants analyses are summarized in Table 3-6.

                         Set Size
                         Four-item condition     Three-item condition    Priming
Word Type                M      SD     %Err      M      SD     %Err      effects
Underlyingly constant    686    56     (4.5)     665    70     (4.5)     21
Variable                 686    68     (4.4)     662    81     (3.3)     24

Table 3-6 Mean reaction times (in ms), standard deviations, percentage error rates (in parentheses), and priming effects (in ms) of the by-participants analyses in Experiment 2
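As a worked check of the priming effects reported in Table 3-6, the following Python sketch recomputes them from the by-participants cell means. The priming effect is taken, as defined in the text, to be the four-item mean minus the three-item mean within each Word Type condition; the dictionary layout and the function name `priming_effect` are assumptions made for illustration.

```python
# By-participants mean reaction times (ms) from Table 3-6.
cell_means = {
    "underlyingly constant": {"four_item": 686, "three_item": 665},
    "variable": {"four_item": 686, "three_item": 662},
}


def priming_effect(means: dict) -> int:
    """Four-item mean minus three-item mean, in ms."""
    return means["four_item"] - means["three_item"]


effects = {word_type: priming_effect(m) for word_type, m in cell_means.items()}
print(effects)  # {'underlyingly constant': 21, 'variable': 24}

# The derivational account predicts PE(variable) > PE(underlyingly constant).
# The numerical difference goes in that direction but is only 3 ms.
print(effects["variable"] - effects["underlyingly constant"])  # 3
```

The sketch only reproduces the arithmetic; whether the 3 ms difference is reliable is a matter for the inferential tests reported in the text, which found no significant effect of Word Type.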

In the by-items analyses, the overall main effect of Set Size was significant (F2 (1, 10) = 13.635, p = .004 < .01), but the overall main effect of Word Type was not (F2 (1, 10) = .051, p = .825 > .05). The interaction of Set Size and Word Type was also not significant (F2 (1, 10) = .369, p = .557 > .05). Further t-tests revealed that the difference in the spoken latencies of the three items between the three-item blocks and the corresponding four-item blocks was marginally significant for the underlyingly constant condition (t (5) = 2.353, p = .065) and significant for the variable condition (t (5) = 2.847, p = .036 < .05). The priming effects (the difference between the voice onset latencies of the four-item condition and the corresponding three-item condition within each Word Type condition) in the underlyingly constant condition (687 – 667 = 20 ms) were slightly smaller than those in the variable condition (688 – 660 = 28 ms). However, this difference was also not significant, since neither the main effect of Word Type nor its interaction with Set Size was significant. The mean reaction times, standard deviations, error rates, and priming effects of the by-items analyses are summarized in Table 3-7.

                         Set Size
                         Four-item condition     Three-item condition    Priming
Word Type                M      SD     %Err      M      SD     %Err      effects
Underlyingly constant    687    20     (4.5)     667    25     (4.5)     20
Variable                 688    25     (4.4)     660    12     (3.3)     28

Table 3-7 Mean reaction times (in ms), standard deviations, percentage error rates (in parentheses), and priming effects (in ms) of the by-items analyses in Experiment 2

Analyses of error rates

Overall, the error rate was about 4.2% in Experiment 2. Since none of the effects on error rates were significant, the analyses of error rates are not reported here.

3.3.3 Discussion

As expected, the spoken latencies of the three items were shorter in the three-item blocks than in the four-item blocks. This was consistent with the results of Experiment 1: the three-item blocks contained fewer items, which made them easier for participants to recall. However, no significant effect of Word Type was found. In the four-item condition, the reaction times of the items in the underlyingly constant condition did not differ from those in the variable condition, nor did the priming effects differ reliably between the two conditions. This suggested that sharing only the underlying tonal representation of Tone 3 did not facilitate the production of the third-tone sandhi variant of Tone 3. It seemed that the underlying tonal representation could not be prepared in advance. This did not conform to the earlier prediction for third-tone sandhi production, and it casts some doubt on proposals that the underlying tonal representation plays a role in tone sandhi production.

The results of Experiment 2 contradict some findings of a recent study by Chen, Chen and Dai (to appear), who used the traditional version of the implicit priming paradigm to investigate tone sandhi production. They found that blocks containing words that shared the same surface tonal forms after third-tone sandhi yielded priming effects similar to those of blocks sharing the underlying tonal forms. For example, sets sharing initial syllables with the same surface tones, like “qi[3] wu[3]” (起舞, “dancing”), “qi[3] tao[3]” (乞讨, “begging”), “qi[2] xian[3]” (奇险, “spectacular adventure”), “qi[2] dao[3]” (祈祷, “praying”), had priming effects of similar size to sets sharing initial syllables with the same underlying third tones, like “dao[3] yan[3]” (导演, “director”), “dao[3] gui[3]” (捣鬼, “making mischief”), “dao[3] mei[2]” (倒霉, “having bad luck”), “dao[3] guo[2]” (岛国, “island country”). Strikingly, they also observed similar priming effects when the homogeneous blocks with the same initial syllables consisted of two Tone 2s and two half-third sandhi variants of Tone 3. In contrast, when the homogeneous blocks with the same initial syllables consisted of two half-third sandhi variants of Tone 3 and two Tone 1s, the priming effects were reduced. Based on their data, they suggested that tone sandhi may be an articulatory operation when carried out on-line, and that Tone 2 or the third-tone sandhi variant shares the same initial low dipping part with Tone 3. Hence, Tone 2 or the third-tone sandhi variant can facilitate the production of Tone 3. They drew on previous acoustic studies to support their interpretation. However, Experiment 2 failed to find any facilitation when the sets contained an odd-man-out whose initial syllable carried the half-third sandhi tone, compared with sets containing an odd-man-out whose initial syllable carried Tone 1 or Tone 4. If the interpretation of Chen et al. were correct, the reaction times in the underlyingly constant condition in Experiment 2 should have been faster than those in the variable condition. This was not found in the current research.

Figure 3-5 Pronunciation of the half-third sandhi variant of Tone 3 for the word “gu[3] shi[4]” (股市, “stock market”) from one participant in Experiment 2 (pitch curve)

One possible reason for the conflicting results is an intrinsic problem with the classic version of the implicit priming paradigm. As noted in 2.2.3 in Chapter 2, the classic version of the paradigm used in their study does not compare priming effects directly within the same experiment. The sizes of the priming effects are compared indirectly across different experiments, which probably introduces some unknown factors. In addition, the interpretation of Chen et al. (to appear) might underestimate or neglect the real phonetic values of the half-third sandhi variant of Mandarin Tone 3. Though some previous research suggested that Tone 3 and Tone 2 share some initial similarities (Jongman, Yue, Moore, & Sereno, 2006), the phonetic value of Tone 3 measured or mentioned in the literature in fact refers to the citation form of Tone 3, i.e., the form pronounced in isolation, as in the study of Chen et al. (to appear). When Tone 3 is pronounced as the half-third sandhi variant in non-final position, it is usually realized as a low falling tone, which differs considerably from the third-tone sandhi variant and Tone 2, both of which are usually realized as a high rising tone. This was observed for many participants in the two experiments when they produced the half-third sandhi variant, as shown in Figure 3-5 by a pitch curve extracted with the phonetic analysis software Praat (Boersma & Weenink, 2011). Nonetheless, Chen et al.’s (to appear) study indicated that the sandhi tones might be stored in terms of their surface tonal values instead of the underlying tonal representation, which converges with the results of the two experiments in the current research.


CHAPTER FOUR GENERAL DISCUSSION

4.1 Summary of the findings of the two experiments

The aim of the current study is to apply the implicit priming paradigm to test whether the underlying tonal representation plays any role in the sandhi production of Mandarin Tone 3. Experiment 1 showed that when both the underlying and the surface tonal representations were shared, reaction times were faster than when only the underlying tonal representation was shared, or when neither the underlying nor the surface tonal representation was shared. This indicated that tonal production could be facilitated only when both the underlying and the surface tonal representations were shared. In Experiments 1 and 2, however, sharing only the underlying tonal representation did not lead to stronger priming effects than sharing neither the underlying nor the surface tonal representation. This demonstrated that the underlying tonal representation could not be prepared in advance during production, casting doubt on its role in tonal encoding. According to some previous proposals for tone sandhi production (Chen, 1999; Liu, 2006; Wan & Jaeger, 1998), the third-tone sandhi variant and the half-third sandhi variant of Mandarin Tone 3 are derived from a shared underlying tonal representation, and the process of tone sandhi production is serial in nature. On these proposals, the underlying tonal representation should play a role in tonal encoding: sharing only the underlying tonal representation should lead to shorter vocal reaction times than sharing neither the underlying nor the surface tonal representation. However, the experimental results in this study did not conform to this prediction. The derivational processing view of tone sandhi production therefore did not find support in the current study.

4.2 Explanations

The experimental results indeed raise questions about the psychological reality of the underlying tonal representation. The current results do not uphold the “canonical representation view” and the “abstract representation view” mentioned in Zhou and Marslen-Wilson (1997), both of which assume that the sandhi tones are stored in terms of a shared underlying representation in the mental lexicon. Together with Chen et al.’s (to appear) study, the current research indicates that the sandhi variants of Mandarin Tone 3 are represented in the lexicon by their surface phonetic forms instead of an underlying tonal representation, corresponding to the “surface representation view” in Zhou and Marslen-Wilson (1997). This is also similar to the conclusion drawn from Xu’s (1991) study on the representation of the sandhi variants of Mandarin Tone 3 in short-term memory. For real words that are already stored in the lexicon, it seems that people may simply retrieve their surface tonal values during the on-line processing of tone sandhi production, since the lexical items are stored in terms of their specific phonetic values. There is no need to posit additional computation for the derivation from an underlying tonal representation.

However, it might not be sufficient to assume that tone sandhi production is merely the retrieval of the surface phonetic values of real words. The following facts should also be taken into consideration. Firstly, it is claimed that the tone sandhi rule is productive (Wan & Jaeger, 1998). It can be applied to new items and even nonce words as long as the phonological context of the sandhi rule is satisfied, as demonstrated in the experiments by Deng et al. (2003). Consider the following example from Chen (1999):

Wo[3] you[3] liang[3] dian[3] ren[4]-ding[4]
I have two points recognize
我有两点认定
“I recognize two points.”


In the above example, the second Tone 3 can optionally change to the third-tone sandhi variant or the half-third sandhi variant. This seems to suggest that both sandhi variants of the same morpheme are stored and may be simultaneously activated. Given that tone sandhi is a productive process, it is not plausible to hold that tone sandhi production is simply the retrieval of the surface tonal values associated with lexical items already stored in the mind. Otherwise, it is difficult to explain why the tone sandhi rules can apply to new items or nonce words. Secondly, there is evidence that the half-third sandhi variant is also activated in third-tone sandhi production. Deng et al. (2003) found that some of their participants made errors by pronouncing the third-tone sandhi variant as the half-third sandhi variant. They suggested that this was indicative of the activation of both tonal variants of Mandarin Tone 3 in third-tone sandhi production. Similar errors were also observed in the current experiments, where some participants mispronounced the third-tone sandhi variant as the half-third sandhi variant. Thus, in order to accommodate the linguistic data and the errors, it is better to assume that both tonal variants of Mandarin Tone 3 are stored for the same morpheme in the lexicon, similar to the proposal by Peng (2000). Both variants may be activated during tone sandhi production, which may lead to competition between the two phonetic variants, as suggested by Deng et al. (2003). The selection of a particular variant depends on the phonological context or other factors. The variant that fits the context better will receive higher activation, whereas the activation of the other variant is suppressed or inhibited, as posited by Bürki et al. (2010). Finally, the most highly activated variant wins out and is selected, which is in line with the mechanism of word-form selection in Dell’s (1986, 1988) model.
Surprisingly, these assumptions mesh very well with the basic assumptions of the “allomorph selection hypothesis” posited by Tsay and Myers (1996) in their research into the tone sandhi of Southern Min dialects (see also Zhang & Lai, 2007). Such an explanation is plausible because it accounts well both for the linguistic data and for the speech errors found in the experiments. Since the two sandhi variants of Mandarin Tone 3 are both stored and activated in production, there is also no need to maintain an additional so-called underlying representation in on-line production. The selection of a particular sandhi form is subject only to the specific linguistic context. If the appropriate sandhi variant is not properly activated, for instance when the half-third sandhi variant is inappropriately selected, slips of the tongue will occur and the third-tone sandhi variant of Mandarin Tone 3 will be mispronounced as the half-third sandhi variant.
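The selection mechanism described above can be made concrete with a toy Python sketch. It is not a model proposed or tested in this study: the activation values, the all-or-none context fit, and the `noise` parameter are all hypothetical, and inhibition of the losing variant is left out. The sketch only illustrates the logic that both variants are activated, the variant that fits the following tone receives more activation, and the most highly activated variant is selected.

```python
from typing import Dict, Optional

VARIANTS = ("third-tone sandhi variant", "half-third sandhi variant")


def context_fit(variant: str, following_tone: int) -> float:
    """Hypothetical fit score: 1.0 if the variant suits the context, else 0.0.

    The third-tone sandhi variant suits a following Tone 3; the half-third
    sandhi variant suits any other following tone.
    """
    wants_sandhi = following_tone == 3
    is_sandhi = variant == "third-tone sandhi variant"
    return 1.0 if wants_sandhi == is_sandhi else 0.0


def select_variant(following_tone: int,
                   noise: Optional[Dict[str, float]] = None) -> str:
    """Pick the variant with the highest activation (context fit + noise)."""
    noise = noise or {}
    activation = {
        v: context_fit(v, following_tone) + noise.get(v, 0.0) for v in VARIANTS
    }
    return max(activation, key=activation.get)


print(select_variant(3))  # third-tone sandhi variant
print(select_variant(4))  # half-third sandhi variant

# A slip of the tongue: extra (hypothetical) activation on the wrong variant
# lets the half-third sandhi variant win before a following Tone 3.
print(select_variant(3, noise={"half-third sandhi variant": 1.5}))
```

Under these assumptions, a speech error corresponds to a trial on which the contextually inappropriate variant happens to end up with the higher activation.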

Figure 4-1 Mispronunciation of the half-third sandhi variant of Tone 3 as the third-tone sandhi variant for the word “wu[3] jiao[4]” (午觉, “nap”) from one participant in Experiment 1 (pitch curve)

Introducing competition between the two tonal variants into the on-line processing of tone sandhi production makes a prediction: during the production of the half-third sandhi variant of Mandarin Tone 3, the other sandhi variant might also be activated. If the selection goes awry, the third-tone sandhi variant is expected to be inappropriately selected in place of the half-third sandhi variant. It should therefore be possible to find speakers mispronouncing the half-third sandhi variant of Mandarin Tone 3 as the third-tone sandhi variant. This slip of the tongue, though infrequent, was indeed observed in Experiment 1, as shown in Figure 4-1 by a pitch curve extracted with the acoustic analysis program Praat (Boersma & Weenink, 2011). In Figure 4-1, the initial morpheme of the word “wu[3] jiao[4]” (午觉, “nap”), which should be pronounced as the half-third sandhi variant, was mispronounced by one participant as the third-tone sandhi variant, a rising tone, as reflected by the extracted fundamental frequency curve.

The current experimental results can be viably explained by assuming competition between the two sandhi variants of Mandarin Tone 3 during production. Sharing only the underlying tonal representation did not facilitate production, possibly because no underlying tonal representation was involved in the processing. Instead, the two surface sandhi variants of Tone 3 were activated simultaneously during production. Thus, the so-called underlying tonal representation could not be prepared in advance. Only when both the surface and the underlying tonal representations were shared could the competition between the two sandhi variants be quickly resolved, because only the surface tonal form can be fully prepared, and sharing the same surface form can quickly help speakers identify the exact phonetic values and suppress the other variant before articulation. Thus, tonal encoding could be speeded up, leading to shorter voice onset latencies. In sum, the current experiments support the view that the surface tone sandhi variants are stored in the lexicon. But tone sandhi production may not be merely the retrieval of surface tonal values, considering the linguistic data and the speech errors. Both tonal variants may be activated during on-line processing, and the variant that receives higher activation will be selected in the end.

4.3 Implications for speech production models and phonological variants

Traditional production models assume that if a word has more than one phonological variant, only one variant is stored in the lexicon and the other variants are derived by rule. As mentioned before, such a view is influenced by the traditional generative framework (Chomsky & Halle, 1968). For Mandarin tone sandhi, similar views are held in some proposals (Chen, 1999; Liu, 2006; Wan & Jaeger, 1998), which assume that the third-tone sandhi variant is derived from the underlying tonal representation of Mandarin Tone 3. However, the current study indicates that this may not be the case. It is probable that both tonal variants are stored in the lexicon and that one of them is selected through competition during on-line production. Such a view may hold true not only for suprasegmental processing like tonal production but also for segmental processing. For example, recent studies on the production of French schwa deletion (Bürki et al., 2010, 2011) show that speakers can store more than one phonological variant in the lexicon, and that it is not necessary to derive one variant from another by rule. Therefore, the production of phonological or phonetic variants may not involve additional on-line computation. Instead, the variants are possibly pre-stored and selected from the lexicon during on-line production. Strikingly, this echoes the view proposed in Bresnan’s (1978, 2001) lexical-functional grammar, which contends that the different morphological forms of a word and their fitting syntactic contexts should be included in the mental lexicon and that no transformational syntactic rules are needed. In the same vein, the different phonological or phonetic variants of a word form are probably all stored in the lexicon, and there is no need to posit additional phonological or phonetic rules in on-line processing.

Furthermore, competition between different variants may need to be introduced into phonological encoding in current speech production models, as argued by Wheeldon (1999). The selection of a particular phonological variant depends on how well the variant fits the phonological context. The incorporation of competition into on-line processing can be termed the “interactive view”. Compared with the previously hypothesized derivational view, this interactive view seems more compatible with the experimental results of the current study. It is also in tune with some newly developed models such as exemplar models (Johnson, 1997), which assume that every phonetic variant is stored in the lexicon. If all the phonetic variants are stored in the lexicon, speech production will involve competition among a number of phonetic variants. The variant that fits the context best will finally win out and be selected. This has implications for current major theories of phonological encoding such as the WEAVER++ model (Levelt et al., 1999), which may need to take competition and multiple phonological representations into account.

However, caution is needed in generalizing this interactive account to all phenomena of phonological variation. Assimilation is one case that is still better handled by the derivational view. If all the assimilated variants were stored in the lexicon, as suggested by exemplar models, a large number of variants would have to be stored. Selecting a particular variant from among a large number of phonetic variants would place a great burden on processing. Furthermore, assimilation usually has a clear articulatory basis. Thus, assimilation is better accounted for by derivation from abstract underlying representations, resulting from the automatic overlapping of abstract gestural scores. In contrast, tone sandhi, unlike assimilation, involves only two phonetic variants. Moreover, unlike assimilation, the third-tone sandhi rule cannot be ascribed to clear phonetic foundations and may be a manifestation of the ad hoc phonotactic rules of particular languages. Thus, language-specific phenomena like tone sandhi may be more effectively explained by the interactive view.

4.4 Limitations and suggestions for future research

This study provides insights into the production process of tone sandhi and the representations of the Mandarin third tone. However, it has limitations in several respects, and future research is needed to confirm the current results. Firstly, speech production in the laboratory might differ from production in natural settings. Therefore, more investigations based on spoken corpora should be conducted in order to gain a full picture of tone sandhi production. Secondly, the current study uses only one type of experimental paradigm. Different experimental paradigms should be used in the future to confirm the current results. Thirdly, the results of the current study are confined to the participants in these experiments. The experiments should be replicated with other groups of Mandarin speakers. Fourthly, the current investigation focuses only on the processing of real words. Thus, the results need further confirmation using nonce words in future studies. Fifthly, the current research did not conduct a large-scale acoustic analysis of the recorded data, which might reveal further findings. Future chronometric studies could be combined with acoustic analysis so as to gain a deeper understanding of the tone sandhi process. Finally, the inconsistency between Experiment 2 and Chen et al.’s (to appear) research also deserves further examination.

Moreover, the current study focuses only on word production. However, the application of the tone sandhi rules is often constrained by syntax (Chen, 2000; Yip, 2002). Thus, further investigation into the interaction of syntactic information and tone sandhi during on-line production should also be conducted. The experimental paradigm used in the current research can also be extended to other research questions concerning tonal representations and tonal production in Mandarin and other Chinese dialects. For example, the controversy over the representation of the third-tone sandhi variant in Mandarin, that is, whether or not it shares the same representation as Mandarin Tone 2, could be tested with the implicit priming paradigm. Whether the two tonal mergers share the same phonological representation in the lexicon can also be addressed with this experimental paradigm. Therefore, the current experimental paradigm constitutes a promising tool for future studies.


CHAPTER FIVE CONCLUSION

In conclusion, this study uses implicit priming experiments to investigate the role of the underlying tonal representation in tone sandhi production of Mandarin Tone 3. It shows that sharing only the underlying tonal representation, but not the surface form, did not facilitate production. Production benefited only when both the underlying and the surface tonal representations were shared. This does not support proposals which assume that the third-tone sandhi variant and the half-third sandhi variant of Mandarin Tone 3 are both serially derived from a shared underlying tonal representation during tone sandhi production. Such a derivational view cannot explain the data reported in this study. The experimental results may be better explained by an interactive view, which assumes that both sandhi variants of Mandarin Tone 3 are stored in the lexicon and activated, and that they compete with each other during tonal production. Tone sandhi occurs abundantly, and with considerable complexity, in different Chinese dialects. However, the processing mechanisms behind tone sandhi are still far from clear. This study is a preliminary attempt to investigate tone sandhi phenomena with on-line experiments. Further research into the production of tone sandhi, not only in Mandarin but also in other Chinese dialects, is required, and it may shed new light on tonal processing and speech production models.


REFERENCES

Boersma, P., & Weenink, D. (2011). Praat: Doing phonetics by computer (Version 5.1) [Computer program] (Vol. 2011.3.1).
Bresnan, J. (1978). A realistic transformational grammar. In J. Bresnan, M. Halle & G. A. Miller (Eds.), Linguistic Theory and Psychological Reality (pp. 1-59). Cambridge, MA: MIT Press.
Bresnan, J. (2001). Lexical-functional Syntax. Oxford: Blackwell.
Bürki, A., Alario, F. X., & Frauenfelder, U. H. (2011). Lexical representation of phonological variants: Evidence from pseudohomophone effects in different regiolects. Journal of Memory and Language, 64(4), 424-442.
Bürki, A., Ernestus, M., & Frauenfelder, U. H. (2010). Is there only one "fenêtre" in the production lexicon? On-line evidence on the nature of phonological representations of pronunciation variants for French schwa words. Journal of Memory and Language, 62, 421-437.
Chao, Y. (1930). A system of tone letters. Le Maître Phonétique, 45, 24-47.
Chen, J. (1999). The representation and processing of tone in Mandarin Chinese: Evidence from slips of the tongue. Applied Psycholinguistics, 20, 289-301.
Chen, J., Chen, T., & Dai, Y. (to appear). Cognitive mechanisms and locus of tone sandhi during Chinese spoken word production. Paper presented at the Third International Symposium on Tonal Aspects of Languages, Nanjing Normal University, Nanjing, China.
Chen, J., Chen, T., & Dell, G. S. (2002). Word-form encoding in Mandarin Chinese as assessed by the implicit priming task. Journal of Memory and Language, 46, 751-781.
Chen, J., & Chen, T. (2007). Form encoding in Chinese word production does not involve morphemes. Language and Cognitive Processes, 22(7), 1001-1020.
Chen, M. (2000). Tone Sandhi: Patterns Across Chinese Dialects. Cambridge: Cambridge University Press.
Chen, T. (2007). Morphological encoding in Chinese word production. Unpublished doctoral dissertation, National Chung Cheng University, Taibei.
Chen, T., & Chen, J. (2006). Morphological encoding in the production of compound words in Mandarin Chinese. Journal of Memory and Language, 54, 491-514.
Cholin, J., Schiller, N. O., & Levelt, W. J. M. (2004). The preparation of syllables in speech production. Journal of Memory and Language, 50, 47-61.
Cholin, J., & Levelt, W. J. M. (2009). Effects of syllable preparation and syllable frequency in speech production: Further evidence for syllabic units at a post-lexical level. Language and Cognitive Processes, 24(5), 662-684.
Chomsky, N., & Halle, M. (1968). The Sound Pattern of English. New York: Harper & Row.
Dell, G. S. (1986). A spreading-activation theory of retrieval in sentence production. Psychological Review, 93, 283-321.
Dell, G. S. (1988). The retrieval of phonological forms in production: Tests of predictions from a connectionist model. Journal of Memory and Language, 27, 124-142.
Dell, G. S., Schwartz, M. F., Martin, N., Saffran, E. M., & Gagnon, D. A. (1997). Lexical access in aphasic and nonaphasic speakers. Psychological Review, 104, 801-838.
Deng, Y., Feng, L., & Peng, D. (2003). Research on articulatory rule of third tone sandhi of Standard Chinese in different contexts. Acta Psychologica Sinica, 35(6), 719-725.
Forster, K. I., & Forster, J. C. (2003). DMDX: A Windows display program with millisecond accuracy. Behavior Research Methods, Instruments, & Computers, 35(1), 116-124.
Janssen, D. P., Roelofs, A., & Levelt, W. J. M. (2002). Inflectional frames in language production. Language and Cognitive Processes, 17, 209-236.
Johnson, K. (1997). The auditory/perceptual basis for speech segmentation. Ohio State University Working Papers in Linguistics, 50, 101-113.
Jongman, A., Yue, W., Moore, C. B., & Sereno, J. A. (2006). Perception and production of Mandarin Chinese tones. In P. Li, E. Bates, L. Tan & O. Tzeng (Eds.), The Handbook of East Asian Psycholinguistics: Volume 1, Chinese (pp. 209-217). Cambridge: Cambridge University Press.
Kessler, B., Treiman, R., & Mullennix, J. (2002). Phonetic biases in voice key response time measurements. Journal of Memory and Language, 47(1), 145-171.
Kuo, Y., Xu, Y., & Yip, M. (2007). The phonetics and phonology of apparent cases of iterative tonal change in Standard Chinese. In C. Gussenhoven & T. Riad (Eds.), Tones and Tunes (Volume 2): Experimental Studies in Word and Sentence Prosody. Berlin: Mouton de Gruyter.
Ladefoged, P. (2006). A Course in Phonetics (5th ed.). Boston: Thomson.
Levelt, W. J. M. (1989). Speaking: From Intention to Articulation. Cambridge: MIT Press.
Levelt, W. J. M., Roelofs, A., & Meyer, A. S. (1999). A theory of lexical access in speech production. Behavioral and Brain Sciences, 22, 1-38.
Liu, G. (2006). Word-form encoding in Mandarin Chinese word production: The roles of the syllable and the prosodic frame. Unpublished MA thesis, National Chung Cheng University, Taibei.
Meyer, A. S. (1990). The time course of phonological encoding in language production: The encoding of successive syllables of a word. Journal of Memory and Language, 29, 524-545.
Meyer, A. S. (1991). The time course of phonological encoding in language production: Phonological encoding inside a syllable. Journal of Memory and Language, 30, 69-89.
Mooshammer, C., Goldstein, L., Nam, H., McClure, S., Saltzman, E., & Tiede, M. (2012). Bridging planning and execution: Temporal planning of syllables. Journal of Phonetics, 40(3), 374-389.
Peng, S. (2000). Lexical versus 'phonological' representations of Mandarin sandhi tones. In M. B. Broe & J. B. Pierrehumbert (Eds.), Acquisition and the Lexicon (pp. 152-167). Cambridge: Cambridge University Press.
Protopapas, A. (2007). CheckVocal: A program to facilitate checking the accuracy and response time of vocal responses from DMDX. Behavior Research Methods, 39(4), 859-862.
Roelofs, A. (1992). A spreading-activation theory of lemma retrieval in speaking. Cognition, 42, 107-142.
Roelofs, A. (1997). The WEAVER model of word-form encoding in speech production. Cognition, 64, 249-284.
Roelofs, A. (2003a). Modeling the relation between the production and recognition of spoken word forms. In N. O. Schiller & A. S. Meyer (Eds.), Phonetics and Phonology in Language Comprehension and Production (pp. 115-158). Berlin: Mouton de Gruyter.
Roelofs, A. (2003b). Shared phonological encoding processes and representations of languages in bilingual speakers. Language and Cognitive Processes, 18(2), 175-204.
Roelofs, A., & Meyer, A. S. (1998). Metrical structure in planning the production of spoken words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24(4), 922-939.
Rosenbaum, D. A., Inhoff, A. W., & Gordon, A. M. (1984). Choosing between movement sequences: A hierarchical editor model.
Journal of Experimental Psychology: General, 113, 372-393. Tsay, J., & Myers, J. (1996). Taiwanese tone sandhi as allomorph selection. Proceedings of the Berkeley Linguistic Society, 22, 394-405. Wan, I., & Jaeger, J. (1998). Speech errors and the representation of tone in Mandarin Chinese. Phonology, 15, 417-461. - 59 -

Wheeldon, L. R. (1999). Competitive processes during word-form encoding. Behavioral and Brain Sciences, 22, 59-60. Xie, Y. (2007). Onset preparation effect in Mandarin Chinese speech production. Unpublished MA. thesis, National Chung Cheng University, Taibei. Xu, Y. (1991). Depth of phonological recoding in short-term memory. Memory & Cognition, 19(3), 263-273. Yip, M. (2002). Tone. Cambridge: Cambridge University Press. Zhang, J. (2007). A directional asymmetry in Chinese tone sandhi systems. Journal of East Asian Linguistics, 16, 259-302. Zhang J., & Y. Lai. (2007). Two Aspects of Productivity in Taiwanese Double Reduplication. Paper presented at the 15th Annual Meeting of the International Association of Chinese Linguistics and 19th Annual North American Conference on Chinese Linguistics, Columbia University, New York. Zhang, J., & Lai, Y. (2010). Testing the role of phonetic knowledge in Mandarin tone sandhi. Phonology, 27, 153-201. Zhang, Q. (2008). Phonological Encoding in Monosyllabic and Bisyllabic Mandarin Word Production: Implicit Priming Paradigm Study. Acta Psychologica Sinic, 40(3), 253-262. Zhou, X. L., & Marslen-Wilson, W. (1997). The abstractness of phonological representation in the Chinese mental lexicon. In H. Huang (Ed.), Cognitive Processing of Chinese and Related Asian Languages (pp. 3-26). Hong Kong: The Chinese University Press.


APPENDICES

Appendix A Materials for Experiment 1

| Condition | Prompts | Four-item set | Three-item set |
|---|---|---|---|
| Totally Constant condition | shu[4] zhi[1] (树枝, "branch") | wu[3] dong[4] (舞动, "to wave") | wu[3] dong[4] |
| Totally Constant condition | qi[1] pian[4] (欺骗, "deceive") | wu[3] bi[4] (舞弊, "cheating") | wu[3] bi[4] |
| Totally Constant condition | ju[4] yuan[4] (剧院, "theater") | wu[3] ting[1] (舞厅, "dance hall") | wu[3] ting[1] |
| Totally Constant condition | bu[4] wei[4] (部位, "body parts") | wu[3] guan[1] (五官, "sensory organs") | |
| Totally Constant condition | tiao[1] ti[4] (挑剔, "hypercritical") | yan[3] guang[1] (眼光, "judgment") | yan[3] guang[1] |
| Totally Constant condition | pei[4] dai[4] (佩戴, "wear") | yan[3] jing[4] (眼镜, "spectacles") | yan[3] jing[4] |
| Totally Constant condition | xuan[4] yun[1] (眩晕, "dizzy") | yan[3] hua[1] (眼花, "blurred vision") | yan[3] hua[1] |
| Totally Constant condition | gui[1] na[4] (归纳, "inductive") | yan[3] yi[4] (演绎, "deductive") | |
| Underlyingly Constant condition | mian[4] bao[1] (面包, "bread") | wu[3] can[1] (午餐, "lunch") | wu[3] can[1] |
| Underlyingly Constant condition | xiu[1] xi[1] (休息, "rest") | wu[3] jiao[4] (午觉, "nap") | wu[3] jiao[4] |
| Underlyingly Constant condition | tian[1] hei[1] (天黑, "become dark") | wu[3] ye[4] (午夜, "midnight") | wu[3] ye[4] |
| Underlyingly Constant condition | chang[4] ge[1] (唱歌, "sing") | wu[3] dao[3] (舞蹈, "dance") | |
| Underlyingly Constant condition | pao[4] dan[4] (炮弹, "bomb") | yan[3] hu[4] (掩护, "shield") | yan[3] hu[4] |
| Underlyingly Constant condition | bao[4] lu[4] (暴露, "expose") | yan[3] gai[4] (掩盖, "cover up") | yan[3] gai[4] |
| Underlyingly Constant condition | nei[4] xin[1] (内心, "heart") | yan[3] shi[4] (掩饰, "conceal") | yan[3] shi[4] |
| Underlyingly Constant condition | shuo[1] hua[4] (说话, "talk") | yan[3] jiang[3] (演讲, "make a speech") | |
| Variable condition | jing[4] ji[4] (竞技, "athletics") | wu[3] shu[4] (武术, "martial art") | wu[3] shu[4] |
| Variable condition | jun[1] shi[4] (军事, "military") | wu[3] li[4] (武力, "force") | wu[3] li[4] |
| Variable condition | pian[4] mian[4] (片面, "biased") | wu[3] duan[4] (武断, "arbitrary") | wu[3] duan[4] |
| Variable condition | gao[1] shang[4] (高尚, "lofty") | wu[2] chi[3] (无耻, "shameless") | |
| Variable condition | yue[4] qi[4] (乐器, "instrument") | yan[3] zou[4] (演奏, "play") | yan[3] zou[4] |
| Variable condition | jin[4] hua[4] (进化, "evolution") | yan[3] bian[4] (演变, "evolve") | yan[3] bian[4] |
| Variable condition | gong[1] shi[4] (公式, "formula") | yan[3] suan[4] (演算, "compute") | yan[3] suan[4] |
| Variable condition | ban[4] shi[4] (办事, "work") | yan[2] jin[3] (严谨, "meticulous") | |

Appendix B Materials for Experiment 2

| Condition | Prompts | Four-item set | Three-item set |
|---|---|---|---|
| Underlyingly Constant condition | yin[1] yue[4] (音乐, "music") | gu[3] dian[3] (古典, "classical") | gu[3] dian[3] |
| Underlyingly Constant condition | jian[4] zhu[4] (建筑, "building") | gu[3] lao[3] (古老, "old") | gu[3] lao[3] |
| Underlyingly Constant condition | pai[1] mai[4] (拍卖, "auction") | gu[3] dong[3] (古董, "antique") | gu[3] dong[3] |
| Underlyingly Constant condition | zheng[4] quan[4] (证券, "securities") | gu[3] shi[4] (股市, "stock market") | |
| Underlyingly Constant condition | shu[1] mian[4] (书面, "written") | jian[3] tao[3] (检讨, "self-criticism") | jian[3] tao[3] |
| Underlyingly Constant condition | zi[1] liao[4] (资料, "materials") | jian[3] suo[3] (检索, "search") | jian[3] suo[3] |
| Underlyingly Constant condition | gao[4] fa[1] (告发, "accuse") | jian[3] ju[3] (检举, "prosecute") | jian[3] ju[3] |
| Underlyingly Constant condition | fu[4] zhi[4] (复制, "copy") | jian[3] qie[1] (剪切, "cut") | |
| Variable condition | guan[1] zhong[4] (观众, "audience") | gu[3] zhang[3] (鼓掌, "applause") | gu[3] zhang[3] |
| Variable condition | yue[4] shi[1] (乐师, "musicians") | gu[3] shou[3] (鼓手, "drummer") | gu[3] shou[3] |
| Variable condition | huan[1] xin[1] (欢欣, "happily") | gu[3] wu[3] (鼓舞, "encouragement") | gu[3] wu[3] |
| Variable condition | dan[1] xin[1] (担心, "worried") | gu[4] lü[4] (顾虑, "misgivings") | |
| Variable condition | zeng[1] shou[1] (增收, "harvest") | jian[3] chan[3] (减产, "drop in production") | jian[3] chan[3] |
| Variable condition | suan[4] shu[4] (算术, "arithmetic") | jian[3] fa[3] (减法, "subtraction") | jian[3] fa[3] |
| Variable condition | shu[4] liang[4] (数量, "number") | jian[3] shao[3] (减少, "reduce") | jian[3] shao[3] |
| Variable condition | yi[4] zhi[4] (意志, "will") | jian[1] ding[4] (坚定, "firm") | |