

Development and Validation of the Australian Racism, Acceptance, and Cultural-Ethnocentrism Scale (RACES): Measuring Racism in Australia

Kaine Grigg BSocSc(Psych), BAppSc(Psych)(Hons)

School of Psychological Sciences
Faculty of Medicine, Nursing, and Health Science
Monash University

Supervisor: Professor Lenore Manderson

Submitted in partial fulfilment of the requirements of the degree of Doctor of Psychology in Clinical Psychology Specialising in Forensic Psychology

July 2014


Our mission is to confront ignorance with knowledge, bigotry with tolerance, and isolation with the outstretched hand of generosity. Racism can, will, and must be defeated.
Kofi Annan, March 1999

Copyright Notices

Notice 1

Under the Copyright Act 1968, this thesis must be used only under the normal conditions of scholarly fair dealing. In particular no results or conclusions should be extracted from it, nor should it be copied or closely paraphrased in whole or in part without the written consent of the author. Proper written acknowledgement should be made for any assistance obtained from this thesis.

Notice 2

I certify that I have made all reasonable efforts to secure copyright permissions for third-party content included in this thesis and have not knowingly added copyright content to my work without the owner's permission.

Table of Contents

Summary
Acknowledgements

Chapter 1: Introduction
    Introduction
    Understanding Race
    Defining Racism
    Psychological Theories of Racism
    Racism in Australia
    Impacts of Racism
    Measurement of Racism
    Mechanisms of Attitudes

Chapter 2: Methods
    Methods
    Research Setting
    Research Development Project Partners
    Participants
    Recruitment
    Research Procedure
    Analytical Techniques and Research Design
    Achieved Research Timetable
    Scale Development Procedure
    Scale Refinement and Pilot Testing Procedure
    Scale Reliability and Validity Testing

Chapter 3: Measures of Explicit Racist Attitudes – A Literature Review
    Measures of Explicit Racist Attitudes – A Literature Review
    Method
    Results
    Characteristics of Studies Documenting Explicit Measures of Racist Attitudes
    Results of Studies Documenting Explicit Measures of Racist Attitudes
    Results of Most Cited Studies Documenting Explicit Measures of Racist Attitudes
    General Discussion
    Limitations
    Conclusion

Chapter 4: Submitted Research Articles

Chapter 4.1: “Just a Joke”: Young Australian Understandings of Racism
    Abstract
    1. Introduction
    1.1. Rationale and Aim
    2. Method
    3. Results
    3.1. Group versus Individual
    3.2. Actions versus Beliefs
    3.3. Exceptions, Exclusions, and Minimisation
    4. Discussion
    4.1. Conclusion

Chapter 4.2: Developing the Australian Racism, Acceptance, and Cultural-Ethnocentrism Scale (RACES)
    Abstract
    Developing the Australian Racism, Acceptance, and Cultural-Ethnocentrism Scale (RACES)
    Study 1: Item Development
    Study 2-4 Preliminary Data Analysis
    Study 2: Principal Components and Exploratory Factor Analyses
    Materials and Methods
    Results
    Study 3: Exploratory and Confirmatory Factor Analyses
    Materials and Methods
    Results
    Study 4: Item Response Theory Analyses
    Materials and Methods
    Results
    Study 5: Convergent and Discriminant Validity
    Materials and Methods
    Results
    General Discussion

Chapter 4.3: Validating the Australian Racism, Acceptance, and Cultural-Ethnocentrism Scale (RACES): Item Response Theory Findings
    Abstract
    Validating the Australian Racism, Acceptance, and Cultural-Ethnocentrism Scale (RACES): Item Response Theory Findings
    Method
    Participants and Procedure
    Measures
    Model Selection
    Response Category Variability
    Unidimensionality
    Results
    Unidimensional Model Fit
    Unidimensional Scale Information
    Multidimensional Model Fit
    Multidimensional Scale Information
    Discussion

Chapter 4.4: Building Harmony: Racism Reduction in Australian Schools
    Abstract
    Building Harmony: Racism Reduction in Australian Schools
    Method
    Research Setting
    Participants and Procedure
    Measures
    Analytical Techniques
    Results
    Building Harmony Evaluation: Preliminary Data Analysis
    Building Harmony Evaluation Findings
    Evaluation of Variable Relationships: Preliminary Data Analysis
    Evaluation of Variable Relationships
    Test-Retest Assessment Evaluation: Preliminary Data Analysis
    Test-Retest Reliability Evaluation
    Test-Retest Correlation Evaluation
    Discussion

Chapter 4.5: Is there a Relationship between Psychopathic Traits and Racism?
    Abstract
    Is there a Relationship between Psychopathic Traits and Racism?
    Method
    Participants and Procedure
    Measures
    Results
    Discussion
    Conclusion

Chapter 5: Discussion
    Discussion
    Potential Uses of RACES
    Results Summary
    Recommendations
    Conclusion

References

Appendices
    Appendix 1 Advertising Material
    Appendix 2 Human Research Ethics Material
    Appendix 3 Informed Consent Material
    Appendix 4 Survey Material
    Appendix 5 Supplementary Tables and Figures
    Appendix 6 Supplementary Publications

Summary

Existing Australian measures of racist attitudes focus on single groups and have generally not been validated across the lifespan. To redress this, a measure of racial, ethnic, cultural, and religious acceptance – the Australian Racism, Acceptance, and Cultural-Ethnocentrism Scale (RACES) – was developed and validated with children, adolescents, and adults. Understandings of and experiences with racism were critically examined using data from semi-structured interviews and focus groups with 30 high school attendees aged 14-22 years, conducted from December 2011 to January 2012 in Victoria, Australia. Data demonstrated the ambiguity of racism, while confirming that Australian youth utilise a reasonably consistent and sophisticated explanatory model to conceptualise, explain, and classify racism. The interview and focus group data were used to develop RACES items, and the preliminary instrument was subsequently pilot tested with eight children. Expert advice and cognitive interviewing techniques ensured the item content was comprehensive, comprehensible, and relevant. RACES was utilised throughout the implementation of a Victorian anti-racism and pro-diversity initiative, Building Harmony in the Growth Corridor (Building Harmony), which was implemented from March to September 2012 with 296 primary school children. RACES enabled an evaluation of the efficacy of this initiative, which was one of few racism prevention interventions to respond proactively to potential diversity issues as new populations arrive within an identified area. The instrument was also disseminated to 402 adolescents and adults in the Australian community from April 2012 to April 2013. Subsequent work aimed to provide the first exploration of the relationship between psychopathic personality traits and racist attitudes, as both are anti-social and share several commonalities, including in their development and manifestation.

RACES was refined, and its reliability and validity were empirically investigated, with data modelled and analysed utilising both Classical Test Theory and Item Response Theory. Psychometric properties, including content, construct, factorial, convergent, discriminant, and predictive validity, in addition to internal consistency and test-retest reliability, were each explored. The analyses provided strong support for the instrument as a robust measure of racist attitudes in the Australian context and for the overall reliability and validity of the 24-item RACES across primary school children, adolescents, and adults. The results indicate that RACES is a three-dimensional scale of Accepting Attitudes (12 items), Racist Attitudes (8 items), and Ethnocentric Attitudes (4 items), in addition to a 10-item measure of social desirability, each a reliable and valid scale independently. The instrument is the first Australian measure of general racist attitudes towards all racial, ethnic, cultural, and religious groups to be empirically validated across the lifespan. It is hoped that RACES will be utilised to assess and consequently enhance the efficacy of anti-racism and pro-diversity initiatives to assist with the reduction of racism throughout the Australian community.

Monash University Declaration for thesis based or partially based on conjointly published or unpublished work

General Declaration

In accordance with Monash University Doctorate Regulation 17.2 Doctor of Philosophy and Research Master’s regulations the following declarations are made:

I hereby declare that this thesis contains no material which has been accepted for the award of any other degree or diploma at any university or equivalent institution and that, to the best of my knowledge and belief, this thesis contains no material previously published or written by another person, except where due reference is made in the text of the thesis.

This thesis includes five original papers submitted for publication in peer reviewed journals. Two reports and five presentations completed throughout the candidature are also included in the appendices. The core theme of the thesis is the development and validation of a measure of racist attitudes. The ideas, development and writing up of all the papers in the thesis were the principal responsibility of me, the candidate, working within the School of Psychological Sciences under the supervision of Professor Lenore Manderson.

In the case of Chapter 4 my contribution to the work involved the following:

Thesis chapter | Publication title | Publication status* | Nature and extent of candidate’s contribution
4.1 | “Just a Joke”: Young Australian Understandings of Racism | Submitted | 75% of all components
4.2 | Developing the Australian Racism, Acceptance, and Cultural-Ethnocentrism Scale (RACES) | Submitted | 75% of all components
4.3 | Validating the Australian Racism, Acceptance, and Cultural-Ethnocentrism Scale (RACES): Item Response Theory Findings | Submitted |
4.4 | Building Harmony: Racism Reduction in Australian Schools | Submitted |
4.5 | Is there a Relationship between Psychopathic Traits and Racism? | Submitted |

I have not renumbered sections of submitted or published papers in order to generate a consistent presentation within the thesis.

Signed:
Date: 01/07/2014

Acknowledgements

I am indebted to all project participants for their valuable contributions and positive responses. Without the volunteering of their time, mostly without any reimbursement other than the knowledge that their input would be utilised in a bid to reduce the impact of racism in Australia, this research could not have begun, let alone have been completed.

Beyond the participants, this venture would not have been possible without the support and commitment of a number of additional people and organisations. First and foremost, I thank my research supervisor, Professor Lenore Manderson. Without Lenore’s guidance, support, and expertise throughout the project, I have no doubt that it would not have been completed, nor would the research have had the same impact without her influence and knowledge. I will be forever grateful that I had the opportunity to be supervised by such a skilled and professional supervisor, researcher, and academic. It is clear why she is a leader in her field.

I also want to acknowledge Professor James Ogloff’s support and guidance throughout selected parts of the project. Jim offered his time when needed without the requirement of formal recognition as a supervisor. Presenting my research proposal to the Monash University Human Ethics Committee was overwhelming and without Jim at my side I am unsure how I would have handled the situation and what outcome may have been delivered. His expertise in psychopathy and quantitative data analysis was integral to the later components of the research. The statistical advice of Professor Grahame Coleman, Melbourne University and Monash University, Professor Ray Adams, Melbourne University, and Associate Professor John Reece, RMIT University, was similarly integral to parts of the project.

I wish to extend my gratitude to the Building Harmony in the Growth Corridor (Building Harmony) strategy partners and supporters, without whom the significant impact on local communities stemming from this project would not have been possible. The ongoing support of my research venture also deserves acknowledgement: I thank Windermere Child and Family Services, the Australian Community Foundation, Sunshine and Crocodiles Pty Ltd, Cardinia Shire Council, and Southern Integrated CALD Child and Family Services Network. I would also like to acknowledge the Building Harmony Executive and Steering Committee members (Stephen Sparrow, Serap Ozdemir, Yasemin Soydas, Doug Bailey, Sue Nelson, Masouda Keshtiar, Rachel George, Simon Greely, David Gleeson, and Judy Linossier), as well as Tim Cooper and Carrolyn Aguis from the Shire of Cardinia for their valuable contributions. Special mention goes to Serap Ozdemir for her continued support for involving my research in the Building Harmony project: Serap’s tenacity and persistence drove the strategy from humble beginnings to a community-changing initiative.

Principal Tanya Roberts and Primary Welfare Officer Cindy Healy from Pakenham Consolidated School also deserve mention for their strong support of the research, via the involvement of their school pupils as a control group. The sacrifice of school time and resources, with the only immediate direct reward being to contribute to a potentially significant local initiative, was an amazing input which improved the meaningfulness of the investigation substantially.

I especially thank the participating primary schools and government departments who contributed to this project:

• Berwick Grammar School
• Maranatha Christian School
• Minaret College
• Officer Primary School
• Pakenham Consolidated Primary School
• St Brigid’s Catholic Primary School
• Department of Education
• Victoria Police

I would also like to thank each and every organisation and media outlet that assisted with the promotion of the research. The public interest that the project generated was overwhelming and gives hope that one day, as a community, we will be able to reduce societal levels of racism in a meaningful way and so improve the lives of those impacted by this insidious phenomenon.

Finally, I need to acknowledge the support of my friends and family, particularly my wife Eleanor. After being predominantly absent and uninvolved for four years, I must note my appreciation of the ongoing tolerance for the sometimes difficult circumstances this doctorate has brought to my family, friends, and partner. I hope that we are able to continue to deepen our friendships and relationships into the future, and that I am able to provide the same support I received to each of you, whenever it is needed.

Kaine Grigg
Monash University
July 2014

Chapter 1: Introduction
    Introduction
    Understanding Race
    Defining Racism
    Psychological Theories of Racism
    Racism in Australia
    Impacts of Racism
    Measurement of Racism
    Mechanisms of Attitudes

Introduction

Australians live in a country with unprecedented racial, ethnic, cultural, religious, and linguistic diversity, an artefact of its establishment post-1788 upon a platform of immigration and, from the last decades of the 20th century, policies of multiculturalism. A by-product of this diversity has been increasing reports of racist attitudes and incidents (Dunn, Forrest, Pe-Pua, Hynes, & Maeder-Han, 2009). Globally, racism research has grown substantially over the past decade, showing positive associations with an array of negative mental health outcomes. Perceived racism has pervasive negative physical and psychological effects in various minority racial and Indigenous groups (Chou, Asnaani, & Hofmann, 2012; Harrell, Hall, & Taliaferro, 2003; Paradies, 2006b; Pascoe & Richman, 2009; Williams, Neighbors, & Jackson, 2008). Most explorations examine the effects of racism by concentrating on the victims, and less research addresses the factors that produce racism or explores questions related to low levels of acceptance of diverse groups.

Several measures of racist attitudes exist, but many concentrate on anti-African attitudes and are validated only for US populations. Some measures of racism exist in Australia, but these have not been empirically developed and validated in Australia, or they focus on single racial or age groups. No existing scale is capable of objectively evaluating the levels of general racist attitudes in individuals or groups in an Australian context, and hence the effectiveness of racism-reduction programs cannot be assessed quantitatively. Moreover, there is a need for a general measure of racial, ethnic, cultural, and religious acceptance to be constructed that follows an accepted scientific process of scale development, and for the instrument to be appropriately validated for the Australian population. The work detailed within this dissertation aimed to address this gap.

Below, I provide an overview of the structure of the thesis, in the context of a discussion of important concepts related to racism, theories and potential impacts of racism, racism in the Australian context, the mechanisms of attitudes, how racism is measured and the importance of doing so. I then offer a summary of the rationale, aims, and hypotheses that guided the research. Next I provide a general overview of scale development and explain the methodology utilised for this study. The following chapter consists of a literature review examining the properties of existing measures of racist attitudes. The subsequent section comprises five research articles describing the study results, from the development to the validation of the novel scale. The final chapters discuss the findings and limitations of the research, identify areas that require future exploration, and reflect on the potential implications of the present investigation. All research materials are presented in the Appendices. Advertising materials utilised throughout the recruitment phase are presented in Appendix 1. All human research ethics material is presented in Appendix 2. The explanatory statements and consent forms utilised at each stage of the data collection are provided in Appendix 3. Each of the data collection materials, including the semi-structured interview schedules and the surveys implemented in the study, is shown in Appendix 4. Supplementary results and materials are presented in Appendices 5 and 6.

Understanding Race

Since the introduction of the concept in the scientific literature in 1749, race has become a common demographic variable, especially in health-related research (Gabard & Cooper, 1998). The US Census Bureau has collected data by race since the first decennial census in 1790, vital statistics in the US have been published by race since 1940, and the presentation of data by race is routine in introductory science courses and textbooks (Jones, 2001).
With most of the world’s research and publications originating from the US, the consequent influence of the US globally means these trends are worldwide; every area of science is deeply affected by the idea that the human species is divided into races (Cooper & David, 1986). Yet, an adequate theoretical construct for race remains to be widely accepted. From a biological perspective, the notion of race is an attempt to extend taxonomy beyond the classification of species, with a subspecies considered a geographically circumscribed population with distinct genetic differentiation (Cooper & David, 1986; Templeton, 1998). Homo sapiens show only modest levels of variation among populations; human races therefore do not exist under traditional conceptualisations. Indeed, most human genetic diversity exists as differences among individuals within populations rather than between populations (Blakey, 1999; Phinney, 1996). Recent studies of the genetic basis of human diversity, using molecular biology and DNA analysis, further dismantle race as a valid construct (Morgan, 2002). Only 15.6% of genetic variation has been found to differentiate the major human ‘races’, with differences only in relative frequencies of single traits, no demonstrated suite of discrete trait gene differences, and the independent distribution of each trait (Blakey, 1999; Cooper & David, 1986; Gabard & Cooper, 1998; Templeton, 1998). There is considerable overlap in genetic inheritance, making it essentially impossible to classify the human species into separate biological categories with firm boundaries (Phinney, 1996; Williams, 1997). Given the above, many academics have argued that race as a biological or scientific notion should be abandoned completely. Consequently, alternate understandings of race have been proposed. In addition to the genetic construct, at least two other notions of race in contemporary society interact with the biological in a dynamic and fragile relationship (Gabard & Cooper, 1998).
First is the social construct, which is not only an important component of self-identity, but also a powerful influence on health outcomes. The second is the political construct, which has moved from a focus on exclusion to inclusion, and in numerous ways shapes the social construct. In both everyday and academic use, race is utilised as a representation of an interface of socioeconomic status, ethnicity, culture, religion, and genetic endowment (Gabard & Cooper, 1998; Jones, LaVeist, & Lillie-Blanton, 1991; Paradies, 2006a). The interaction of these factors is the fundamental cause of racial differences, with distinct combinations more salient depending upon the context, outcome, or research question (Williams, 1997). Generally, the social construction of race may be diminished by the biological and scientific information about the concept of race and at least partially contained through policy. For example, a definition of race based on externally visible biological features gains importance only when society considers it to be socially and culturally significant, it is supported by the academic community, or it is politically sanctioned (Sanson et al., 1998). Race is therefore socially constructed, politically manipulated, and perpetuated by scientific researchers, despite not being biologically determined (Reynolds, 1992). Regardless of whether race measures purely political and social factors or is an amalgamation of biological, political, and social variables, it does predict health outcomes and demonstrates the consequences of the globally race-conscious society by producing profound biological manifestations through increased stress and decreased access to essential services (Gabard & Cooper, 1998). Notions of race are frequently promoted and drawn upon to support inequitable relationships among not only racial but also ethnic, cultural, and religious groups (Gabard & Cooper, 1998; Williams, 1997), as reflected by a substantial and growing literature on the racialisation of ethnicity, culture, and religion (Dunn, Burnley, & McDonald, 2004; Dunn, Klocker, & Salabay, 2007; Imhoff & Recker, 2012; Love, 2009).
The mere existence of racial categories propagates racial differences, supports the fragmentation of society, and enhances the philosophy of biological determinism (Fullilove, 1998; Williams, 1997).

Defining Racism

Regardless of the validity and most accurate definition of race, belief in the existence of races means that racism will be present; without race there could be no racism. Racism is a persistent and widely prevalent social problem. Akin to the concept of race, despite its pervasiveness in everyday language, there is no academic consensus on the meaning of racism, and most research fails to adequately define racism before operationalising and measuring it (Brondolo, Gallo, & Myers, 2009; Paradies, 2006a). Theories of racism vary from those understanding racism as inevitable, to those attributing racism to personality characteristics, to those proposing racism to be a product of societal norms and discourse. Racism has also been studied from both the perspective of the reflecting perpetrator and of the perceiving victim. In this sense, perceived racism refers to the subjective experience of racism, prejudice, stereotyping, or discrimination (Clark, Anderson, Clark, & Williams, 1999). Here, however, my focus is solely on the perpetration of racism and the attitudes underlying such offences. In its most basic form, racism refers to differential treatment due to the perceived racial membership of an individual or group based on stereotypic characteristics (Paradies, 2006b; Phelan, Link, & Dovidio, 2008). The term can also be applied more extensively to cultural and ethnic group differences, and contemporary definitions have expanded to include religious affiliation (Contrada et al., 2001; Dunn, et al., 2007). Broadly, racism can be defined as any behaviours, beliefs, and attitudes that underlie inequalities across groups and disadvantage minority racial groups or advantage dominant groups (Paradies, 2013). Racism is grounded in pervasive assumptions of the inherent superiority of some, and inferiority of other, groups, based on cultural differences in values, norms, and behaviours (Sanson, et al., 1998).
Racism is reflected by the inequitable distribution of opportunity, benefit, or resources across racial groups, and is perpetuated by deeply rooted historical, social, cultural, and power inequalities in society (Paradies, 2006a; Sanson, et al., 1998). Central to racism is the ability of dominant groups to exercise power or authority systematically to mistreat others (Sanson, et al., 1998). Racism is therefore contingent on access to social, economic, and political power and is an outcome of the dominance of one particular group over others (Link & Phelan, 2001; Sanson, et al., 1998). Although both powerful and powerless groups may stereotype, negatively evaluate, and treat others unfairly, because the former control access to resources, their beliefs, attitudes, and actions widely prevail (Fiske, 1993; Link & Phelan, 2001). Racism can be operationally defined as beliefs, attitudes, acts, or systemic provisions that target, exclude, or disparage individuals or groups because of phenotypic qualities or racial, ethnic, cultural, or religious group affiliation (Brondolo, et al., 2009; Clark, et al., 1999). Unlike other conceptualisations that view racism as a relationship only from members of dominant to oppressed groups, this definition acknowledges that racism is not unidirectional. Both dominant and minority groups can be targeted, and denigration may occur by members of a different group (inter-racial racism) and by members of the same group (intra-racial racism). The most expansive definitions assume all inequality among groups exists because of current or past racism. The narrowest definitions restrict racism only to acts intended to harm the target group. In most accounts, prejudice is the core motivator of racist behaviour, but such behaviour can have nonprejudicial causes, and the perpetrators of racism may not be aware of how their beliefs about race influence their judgments and actions (Quillian, Cook, & Massey, 2006). Racism is expressed through stereotypes, prejudice, and discrimination, respectively cognitive, affective, and behavioural components of racism.
Racist beliefs are viewed as cognition (i.e., stereotyping); racist emotions and attitudes as affect (i.e., prejudice); and the enactment of racist laws, norms, and practices as behaviour (i.e., discrimination) (Paradies, 2006a; Quillian, et al., 2006). These terms are often used interchangeably, although they may represent distinct, but related, concepts. In this dissertation, I consider racism a specific form of any stereotyping, prejudice, and discrimination based upon racial, ethnic, cultural, or religious group membership. Racism can be interpersonal, intraindividual, or systemic; inter-racial or intra-racial; legal or illegal; direct or indirect; overt or covert; blatant or subtle; ambiguous or specific; intentional or unintentional; and it can occur through action or inaction (Harrell, 2000; Krieger, 1999; Paradies, 2006a). Racism can be manifested at all levels of society and through various mechanisms including interacting institutions (governments, legislature, media, organisations), in addition to intergroup, intragroup, and interpersonal exchanges, and intrapersonal attitudes, beliefs, and feelings (Sanson, et al., 1998). Internalised, interpersonal, and systemic racism are the three most studied forms. Internalised racism can be defined as acceptance, incorporation, or adoption of negative messages of racist attitudes, beliefs, or ideologies by members of marginalised groups about the inferiority of one’s own racial group (Jones, 2000; Paradies, 2006a). Internalisation of a debased status and sense of oppression can lead people to embrace denigrating views and judgements about themselves, their own identity, and others in the racial group (Phinney & Ong, 2007; Sanson, et al., 1998). This process is reinforced by the dominant group’s own symmetrical process of internalised dominance (Paradies, 2006a). Interpersonal racism encompasses any racist interaction between individuals or groups; it is perpetrated by persons and includes any act that reflects racist attitudes and beliefs.
Such personally mediated racism includes differential assumptions about the abilities, motives, and intents of others due to racial group membership and consequent differential responses and actions (Jones, 2000). Interpersonal racism can be further broken down into inter-racial racism and intra-racial racism. Inter-racial racism is more common and occurs when racism is transmitted to, or from, distinct groups. In contrast, intra-racial racism occurs when a group or individual targets members of their own racial group. Interpersonal racism can range from significant and blatant attacks to racial microaggressions: everyday verbal, behavioural, or environmental degradation, slurs, and insults that deliver messages of hostility, derogation, or negativity (Sue et al., 2007). Some authors propose that all inter-racial encounters are prone to the manifestation of racial microaggressions, whether intentional or unintentional, which occur between and within minority groups (Sue, et al., 2007). To date, the study of interpersonal racism has focused on negative relations between groups. An alternate approach proposes a more hopeful paradigm that humans are equally capable of nonprejudiced thought and action (Phillips & Ziller, 1997). Nonprejudice, or acceptance as it is considered in this dissertation, has been conceptualised as an interpersonal orientation whereby similarities rather than differences between an individual and diverse others are attended to, accentuated, and interpreted (Phillips & Ziller, 1997), so acknowledging commonalities and similarities (cognitive integration) rather than concentrating on categorisation and variation (cognitive differentiation). However, it must also be acknowledged that the perception of outgroup similarity, in some circumstances, can lead to prejudice (Gabarrot, Falomir-Pichastor, & Mugny, 2009; Pedersen & Thomas, 2013). Systemic, institutional, structural, societal, or organisational racism is defined as the racist production of, control of, and access to material conditions, information, power, and resources within a society (Jones, 2001; Paradies, 2006a).
Social systems, structures, and organisations create and implement policies and practices that directly or indirectly target specific racial, ethnic, cultural, or religious groups to produce, reproduce, and maintain existing inequalities and disparities, leading to differential access to societal goods, services, and opportunities (Jones, 2000; Paradies, 2006a; Utsey, Ponterotto, & Porter, 2008). Within systemic racism, cultural racism pertains to value systems that allow and support discriminatory actions against specific racial, ethnic, cultural, or religious groups. Institutionalised racism often involves the uncompromising incorporation of minority groups and their ideals within the dominant group and the subversion of minority groups, resulting in reduced recognition of their underlying needs (Sanson, et al., 1998). Structures and processes designed to serve the rights and needs of the dominant group often simply fail to meet adequately those of minority groups. Although in Australia existing services are technically available to everyone regardless of background or group affiliation, the failure to provide appropriate services or address underlying problems of service accessibility represents an ongoing and entrenched form of institutional racism.

Psychological Theories of Racism

Because racism is a complex social issue, a variety of explanations, perspectives, and theories of racism have been advanced, with evolving hypotheses appearing in distinct phases and forms in different epochs and localities. Correspondingly, but to a lesser extent, alternative measurement tools and techniques have been developed and utilised to assess the prevailing understanding of racism. A historical analysis focussing on explanations of racism suggests that the prevalence and emergence of different theoretical orientations, and distinct theoretical explanations, have shifted in response to wider historical and social factors and the dominant psychological paradigm (Duckitt, 1992). Social circumstances and historical events interact with the evolution of knowledge, theory, and investigative techniques, encouraging consideration of distinct issues and questions throughout each period.
Alternate philosophies of racism have concentrated on varying causal factors, with no adequate general theories or integrative frameworks yet able to provide a comprehensive and complete explanation of racism and its causes (cf. Duckitt, 1992). Broadly, racism research can be viewed as stemming from three waves. In the first wave racism was assumed to reflect psychopathology, and in the second it was viewed as rooted in normal processes. The third and current wave emphasises the multidimensional nature of racism, taking advantage of new technologies and techniques to examine processes not previously measurable (Dovidio, 2001). An overview of existing theories is detailed below. Racism has attracted the attention of psychologists only since the 1920s (Milner, 1983; Samelson, 1978). Prior to this time, during the era of White domination and Western rule of colonial peoples, White superiority was firmly established and accepted by Western communities, with antipathy towards minority groups considered an inevitable and natural response to their inferiority and deficiency (Duckitt, 1992). Racism subsequently came to be represented as irrational, unjustified, and psychopathological into the 1950s, a hazardous aberration from normal and rational thought processes (Dovidio, 2001). Much early work on racism attempted its quantification by using simplistic social distance measures (e.g., Bogardus, 1933) and the categorisation of stereotypes by racial group (e.g., Katz, 1933). Racism then became viewed as an unconscious defence mechanism by psychodynamic accounts, which purported that the foundation of racism stemmed from an individual’s intrapsychic and unconscious conflicts. Racism was understood as an expression of universal psychological processes against minority groups, which diverted inner conflicts, hostilities, tension, and other problems arising either within the individual, or from external frustrations, threats, deprivations, and other environmental stressors (Duckitt, 1992). Understandings of racism progressed and began to attribute it to a manifestation of an inner need produced by underlying pathological personality structures.
The most influential and recognised theory in this era was that of the authoritarian personality (Adorno, Frenkel-Brunswik, Levinson, & Sanford, 1950), which argued that parent-child relationships with harsh and punitive parental discipline led children to form a personality prone to the development of racist attitudes. At this time, measures of related concepts such as right-wing authoritarianism were commonly utilised as a proxy for racist attitudes (e.g., Adorno, et al., 1950). Despite the prevailing belief that a vast majority of individuals were non-racist, there was continued evidence of discrimination and consequent racial disparities into the 1960s, which focussed attention on racism as normative and embedded within social and cultural contexts (Dovidio, 2001; Duckitt, 1992). Persistent racial inequalities in health, employment, and education were frequently drawn upon to demonstrate the impacts of ongoing societal racism. Racism was explicated as an expression of group interests by intergroup approaches such as Realistic Group Conflict Theory (Sherif & Sherif, 1969) and Social Identity Theory (Tajfel & Turner, 1979), which highlighted the consequences of ingroup identification and subsequent intergroup differentiation. Integrated Threat Theory (Stephan, Stephan, Demitrakis, Yamada, & Clason, 2000) combined earlier intergroup theories by classifying group threats into four major types: realistic threat, symbolic threat, intergroup anxiety, and negative stereotypes, each of which was proposed to act as an antecedent to racism. During the 1970s, racism became regarded as an inevitable consequence of ordinary categorisation processes (i.e., stereotyping) driven by the limited cognitive capacity of the human brain (Devine, 1989). The core contention was that, provided stereotypes endure, racism will continue to exist (Brigham, 1971). By this time, scales of explicit (i.e., self-report) attitudes towards various minority groups had begun to proliferate (e.g., Indigenous Australians, Larsen, 1978; American Jews, Middleton, 1976; Black South Africans, Pettigrew, 1958; African Americans, Woodmansee & Cook, 1967). In the post-1970s, contemporary understandings of racism emerged. Modern forms of racism are generally seen as covert, ambivalent, and multifaceted (cf.
Sniderman, 1991). Theories of racial ambivalence began to develop that were characterised by the mutual coexistence of both positive and negative racial attitudes and of subtle and unintentional forms of bias. Each theory of racial ambivalence was directly assessed by focussed evaluations utilised as proxies for generalised racist attitudes, including symbolic racism (Kinder & Sears, 1981), modern racism (McConahay, 1983), aversive racism (Gaertner & Dovidio, 1977; Kovel, 1970), and later subtle prejudice (Pettigrew & Meertens, 1995) and colour-blind racism (Neville, Lilly, Lee, Duran, & Browne, 2000). Stemming from these understandings of modern racism, cognitive psychology offered an important distinction between implicit and explicit attitudes, with implicit attitudes proposed to lack conscious awareness, be unable to be directly perceived, be unintentionally and automatically activated by the presence of an attitude object, and therefore to require indirect measurement (Dovidio, 2001; Greenwald, McGhee, & Schwartz, 1998). Specialised tools to evaluate implicit racism were developed and utilised to quantify an individual’s ‘true’ racist attitudes (Greenwald, et al., 1998). More recently, critical psychological or discourse analytic methods have introduced an appreciation of racism as a product of societal dialogue, an emphasis that proposes racism to be beyond simple cognitions. Such approaches examine racism as common discursive techniques that rationalise, justify, legitimate, and maintain existing power asymmetries (i.e., majority group dominance and minority group oppression) and inequalities in both formal dialogue, such as political rhetoric, and in everyday informal talk (Augoustinos, Tuffin, & Every, 2005; Augoustinos, Tuffin, & Sale, 1999; Essed, 1991). In summary, shifts in the way in which racism has been understood and measured have occurred, characterised by distinct theoretical orientations, research emphases, and investigative questions emerging in response to specific historical events and social circumstances.
These diverse approaches are best interpreted as representing attempts to answer fundamentally dissimilar questions about the nature, cause, and measurement of racism. Although distinct, these fundamental psychological perspectives are most meaningfully appreciated as valid and compatible approaches to different aspects of this phenomenon, which have interacted as new analytical techniques have been developed (Duckitt, 1992).

Racism in Australia

The expression of racism and its components, stereotypes, prejudice, and discrimination, is believed to have changed over the past century from overt to more covert and subtle forms. However, racism remains prevalent in Australia and many other countries. Although similar in numerous respects to other Western nations such as the US, racism in Australia is unique and direct extrapolation of international research is therefore problematic (Pedersen, Beven, Walker, & Griffiths, 2004). Because these two groups are a prevalent focus in the research literature, some key differences are highlighted by examining Aboriginal Australian versus African American history and experience. Aboriginal Australians are Indigenous and African Americans were introduced; genocide was overtly enacted against Indigenous Australians and enacted less explicitly, through genocidal oppression, against African Americans; the relative size of the two populations is vastly different, with African Americans constituting around 13% of the US population and Indigenous Australians around 3% of the Australian population; the visibility of African Americans is much greater than for many Indigenous Australians; traditional ownership land rights were significant for Indigenous Australians but not relevant for African Americans; and Indigenous Australians experience much poorer health indicators than African Americans (Walker, 2001). This issue becomes more problematic when dissimilar patterns of cultural diversity across the two countries are examined. Despite these distinctions, Australian researchers have often imported and utilised US concepts uncritically (Pedersen, et al., 2004; Walker, 2001).
As a result, academic understandings and many investigative tools have not been developed in, or for, an Australian context; most contemporary research still assumes, at least tacitly, that racism is embodied by the attitudes of White Americans towards African Americans. Current racism research is therefore limited in terms of generalisability, validity, and utility because of the lack of empirical examinations of racist attitudes in Australia (Dunn, et al., 2004). Moreover, explorations of racism in Australia have predominantly been conducted with a view of racism existing only between White non-Indigenous Australians and Indigenous Australians, with limited work with non-Indigenous ‘Others’ (Tuffin, 2008). Indeed, the first systematic investigation of implicit and explicit racist attitudes in an Australian minority group did not occur until relatively recently (McGrane & White, 2007). Australia’s history of colonisation is important in this context, because early characterisations of Indigenous people provided the foundations for contemporary racist practices and Australian history has been essentially indivisible from the issues of race and racism (McCreanor, 1993; Tuffin, 2008). Yet racism is dynamic and ever-changing, with different groups suffering from racism at different stages and in distinct fashions after their arrival in Australia. Pedersen, Clarke, Dudgeon, and Griffiths (2005) describe the historical progression of racism in Australia as successively targeting Yugoslavs, Italians, Asians, Arabs, and Afghans. The past decade would most appropriately also include Indians and Africans, who are widely reported in the media as key out-groups in modern Australian society. Racism has been a feature of Australia’s societal systems since the first British settlers in 1788, systematically disenfranchising Indigenous populations and other minority groups. The White Australia Policy, which restricted non-White European immigration to Australia, mandating racial discrimination in immigration, was implemented at Federation in 1901 and continued as a formal policy until its gradual abolition from 1949 to 1973. From the late 1960s official government policy moved from assimilation to multiculturalism.
Despite the recognition of Indigenous Australians in the national census with the 1967 referendum, the subsequent introduction of an official policy of multiculturalism in 1972, and legislation outlawing racial discrimination, racism is still prevalent and minimal progress in addressing racism has occurred since the 1970s (Jayasuriya, 2002). The Stolen Generations, an artefact of an atrocious socio-political intervention implemented from 1937 to 1969 whereby Aboriginal children were forcibly removed from their families and placed with White non-Indigenous families, and the subsequent political failure to formally acknowledge and apologise for such government-endorsed undertakings until 2008, provide a key example of racism perpetrated against Australia’s Indigenous peoples which continues to encourage and perpetuate their disadvantage, ostracism, and degradation (Lecouteur & Augoustinos, 2001). There has also been considerable racism towards Asian populations in Australia centring on their freedom to apply for migrant status, with public debates gaining political salience in response to Pauline Hanson’s right-wing anti-immigration One Nation party and the associated perpetuated rhetoric of xenophobia from 1996 to 2002 (Tuffin, 2008; Wu, 1999). In 2005, a series of purportedly racially motivated assaults in Sydney ignited widespread violence, with 5,000 people involved in racial violence on the beaches of Cronulla in Sydney (‘the Cronulla Riots’) (Poynting, 2006). Less than five years later in Melbourne in 2009, thousands of international students protested in response to alleged assaults and robberies targeting Indian students (Dunn, Pelleri, & Maeder-Han, 2011). Today Australians continue to witness enduring racialised political rhetoric concerning humanitarian aid to refugees seeking asylum. Moreover, the adverse impacts of recently proposed amendments to the Racial Discrimination Act 1975 are widely feared to include reduction of protection for Australia’s most vulnerable racial, ethnic, cultural, and religious populations in addition to widespread negative community health outcomes (Australian Psychological Society, 2014).
Issues related to racism remain highly prevalent in the current Australian political and social landscape.

Whilst multiculturalism has been officially in place for over 40 years, migrants and Indigenous peoples are nonetheless consistently exposed to racism in the Australian community. There are similarities and differences in the experiences of the various minority groups in Australia, with the attitudes towards novel groups impacted by varying characterisations of non-Australian versus Australian (Sanson, et al., 1998). The underrepresentation of minority group members in the media, reinforcement of negative stereotypes in the reporting of minority group conflicts, continuation of restrictive immigration policies, and limitations in access to education and employment for minority group members, as well as the continued exclusion of many Indigenous people from access to adequate standards of health, education, employment, housing, and basic infrastructure, are all visible outcomes of racism (Sanson, et al., 1998). Racist dialogue similarly continues to infiltrate our principal establishments, systems, and public debate on issues such as immigration, reconciliation, and Indigenous rights (Tuffin, 2008). Despite differences in the exposure to, and experience of, racism by distinct minority groups, the experience of Indigenous Australians is remarkable. Racism in the wider community against Indigenous Australians remains high, with up to 97% of Indigenous Australians reported to be victims of racism (Cunningham & Paradies, 2013; Ferdinand, Paradies, & Kelaher, 2012). On most indices including education, employment, health, and incarceration rates, Indigenous Australians persist as the most disadvantaged group within Australian society (Sanson, et al., 1998). Despite this grim picture of the Australian racial landscape, Australia has also had leaders such as Nicky Winmar and Adam Goodes, Indigenous Australian Football League players, who publicly stood up to racial vilification in 1993 and 2013 respectively.
Winmar was subsequently embodied as an anti-racist champion and an unofficial icon for all people fighting racism in Australia; Goodes was named Australian of the Year in 2014. There are also a great number of research projects related to racism and many anti-racism and pro-diversity ventures being implemented throughout Australia. With the increasing acknowledgement and awareness of racism in the popular media and the general population, there is hope that racism in Australia can be reduced if not prevented from developing.

Impacts of Racism

Racism, stereotyping, prejudice, and discrimination are pervasive and persisting challenges for all people. Race as a construct has no scientific basis, but racism continues to permeate every facet of contemporary society (Morgan, 2002). It is deeply rooted in our global community and has multiple cumulative deleterious effects on the health of all Australians. Positive social contact is essential for social, psychological, and physiological health and development throughout the lifespan; individuals who perceive social isolation, ostracism, or rejection are susceptible to a range of behavioural, emotional, and physical problems, in addition to pervasive negative educational, economic, and social outcomes (Clark, et al., 1999; Kurzban & Leary, 2001). Relatively few studies have assessed the impact of intragroup racism, but both inter-racial and intra-racial racism are acknowledged to be significant stressors, with the available evidence suggesting that perception of either form of racism exerts a significant negative effect on the well-being of victims (Clark, et al., 1999). Racism research consistently shows positive associations between racist attitudes, negative mental health outcomes, poorer physiological outcomes, and general psychopathology. The link between poorer physical and mental health and self-reported perceptions or experiences of racism has been well documented and there is robust evidence that victims of racism are at greater risk of developing a range of health problems (Pachter & Coll, 2009; Paradies, 2006b; Priest et al., 2012).
The most compelling and consistent results have shown racism to be related to increased stress responses, psychological distress, depression, anxiety, and stress, in addition to decreased self-esteem. Although several moderators of the negative impact of racism have been found, racism is also a key influence on common psychiatric conditions such as mood, anxiety, eating, and substance use disorders. The pervasive effects of racism have been demonstrated in various minority racial groups in North America, South America, Africa, and Europe; and Indigenous groups from New Zealand, Australia, and Africa (Chou, et al., 2012; Harrell, et al., 2003; Paradies, 2006b; Pascoe & Richman, 2009; Williams, et al., 2008). Racism has negative cognitive, behavioural, affective, relational, and developmental effects: increasing anxiety, depression, emotional distress, self-defeating thoughts, avoidance behaviours, and medical complications, while reducing academic and social development, self-esteem, and self-efficacy in both children and adults nationally, globally, historically, and contemporarily (American Psychological Association, 2004; Pachter & Coll, 2009; Paradies, 2006b; Priest, et al., 2012; Williams, et al., 2008). Racism also has harmful effects on society as a whole; when groups are relentlessly depicted as problematic and undesirable, internalisation and belief of these deleterious stereotypes has destructive psychological and social consequences for entire communities (American Psychological Association, 2004; Sanson, et al., 1998). Notably, these impacts can be the result of either the direct perception of racism or the vicarious experience of racism that is directed towards peers, relatives, or other group members. Moreover, racism negatively impacts members of the dominant group directly by perpetuating distorted conceptualisations about both dominant and non-dominant groups, promoting anxiety and fear when in, or anticipating, the presence of non-dominant group members, potentially triggering acts of hostility and aggression (American Psychological Association, 2004; Bastian et al., 2013).
Perhaps more importantly, child-specific research has documented the devastating psychological and physical effects of racism, suggesting that it penetrates and potentially damages or debilitates individuals’ sense of self, belonging, and security, through the internalisation of messages from racist behaviour and the resultant sense of powerlessness (Howarth, 2009). For children and young people, racism is associated with a range of outcomes including anxiety, depression, and hopelessness; increased alcohol, tobacco, and drug use; lowered self-esteem, self-worth, self-efficacy, and life satisfaction; anger, conduct problems, delinquency, and Attention Deficit Hyperactivity Disorder; and indicators of metabolic and cardiovascular disease including poor metabolic control/dietary adherence and insulin resistance (Pachter & Coll, 2009; Priest, et al., 2012). Similarly, caregiver-reported racism has been associated with child preterm birth, low birth weight, and depression; heightened caregiver stress about progeny experiencing racism; higher levels of uninvolved parenting and interpersonal sensitivity; and reduced maternal support and overall satisfaction with child rearing (Priest, et al., 2012). Various pathways and mechanisms of action for the impacts of racism have been proposed, with a key theory suggesting perceived racism acts as a toxic psychosocial stressor resulting in psychological and physiological stress responses including anger, paranoia, anxiety, helplessness-hopelessness, frustration, resentment, and fear (Armstead, Lawler, Gorden, Cross, & Gibbons, 1989; Bullock & Houston, 1987; Clark, et al., 1999; Pachter & Coll, 2009). Perception of an environmental stimulus as racist is purported to result in exaggerated psychological and physiological stress responses that are influenced by a complex interaction of sociodemographic, psychological, and behavioural factors, and coping responses, which influence a plethora of negative outcomes (Clark, et al., 1999; Priest, et al., 2012).
Physiological responses following exposure to psychologically stressful stimuli, including the perception of racism, involve immune, neuroendocrine, and cardiovascular functioning that may increase susceptibility to an array of adverse health effects (Anderson, McNeilly, & Myers, 1991; Cacioppo, 1994; Cohen & Herbert, 1996; Herd, 1984). It is also possible that racism may be detrimental to health even when it is not perceived as a stressor (Clark, et al., 1999). Racism is a complex multidimensional construct that requires careful conceptualisation, characterisation, and modelling, yet the study of racism and health is still limited, particularly for child health. Clearly, conceptual clarity is required to disentangle, through rigorous, theory-driven, and empirical research, the complex means by which racism acts as a determinant of health (Paradies, 2006a; Williams, 1997). Such improved knowledge will better guide policies and actions aimed at reducing the health impacts on those aggrieved by this invidious phenomenon (Krieger, 1999; Paradies, 2006a). Given that racism occurs at all levels from macro to intraindividual, it is important to understand how it occurs, to allow interventions to counteract racism to be focused at each level independently and interdependently. Due to the unique challenges faced by the Australian population, it is important that tailored racism-reduction programs are developed and disseminated throughout Australia. Most significantly, such initiatives must be evaluated accurately and improved accordingly.

Measurement of Racism

Despite most recommendations stemming from Allport’s (1954) seminal, preeminent, and influential work on prejudice and racism being widely applied in theory and practice, his proposal that social science can aid in racism reduction by accurately evaluating the outcomes of anti-racism programs seems to have been predominantly ignored within Australia. Internationally, various measures of racist attitudes exist, but these generally concentrate on anti-African attitudes and are validated only for US populations. Since differences in context and cultural milieu preclude direct extrapolation of US findings to Australia (Pedersen, et al., 2004), several Australian scales have been developed.
However, these either concentrate on one group (e.g., Indigenous Australians; Pedersen, et al., 2004) or have not been empirically developed and appropriately validated (e.g., Dunn & Geeraert, 2003). This gap is especially apparent for Australian youth, where research still relies on measures of social distance and stereotyping (e.g., Doyle & Aboud, 1995; Walker & Crogan, 1998), adapted but not appropriately validated non-Australian measures (e.g., White & Gleitzman, 2006), or complex surveys of questionable reliability (e.g., White & Abu-Raya, 2012). A dearth of instruments for accurately measuring racism across groups in Australia therefore currently exists. There are no appropriate tools for quantitatively assessing the impact of multi-group racism-reduction and pro-diversity strategies, and hence the programs, projects, and initiatives being implemented throughout Australia to reduce racism and increase acceptance of difference cannot be adequately evaluated. It is essential that racism is able to be quantified accurately, so that the effectiveness of racism reduction strategies can be comprehensively evaluated and subsequently enhanced. Although a change in one’s beliefs or attitude toward a stereotyped group may or may not be reflected in a change in the corresponding evaluations of, or behaviours toward, members of that group (Devine, 1989), attitude change is considered an essential component of reducing community levels of racism. Measurement has therefore been a fundamental issue in discussions of changing racial attitudes and accordingly community levels of racism (Quillian, et al., 2006). Quantifying racism is challenging and requires an understanding of each aspect of the multifaceted concept, from its development, perpetration, perception, reinforcement, and aetiology to the potential reactions and responses to exposure to racism.
Racism has traditionally been evaluated via self-report surveys, which generally assess attribution of group traits (stereotypes), group evaluations (prejudice), and differential behaviour toward in-group and out-group members (discrimination) via explicit endorsement of attitudes (Hewstone, Rubin, & Willis, 2002). However, appraisal of these components is often empirically dissociated, with modest to weak overall relationships between measures (Hewstone, et al., 2002). Moreover, even when it is not illegal, the strong normative prohibition and social desirability norms against racism likely result in underreporting (Crosby, Bromley, & Saxe, 1980; Quillian, et al., 2006). Hence, although self-reports from perpetrators can provide insight into racism, they are not necessarily accurate proxies for actual discrimination. Similarly, although accounts from victims about their experiences face fewer social desirability problems, they may capture perceptions rather than reality (Quillian, et al., 2006). More generally, self-reports are influenced by a range of additional psychological and sociodemographic factors that require further investigation (Major, Quinton, & McCoy, 2002). In contrast, indirect measurement approaches infer that exposure to racism is associated with outcomes by eliminating (through design or analysis) other potential causes, or by highlighting differences by race. Indirect approaches are frequently utilised when racism is imperceptible, otherwise unable to be expressed by individuals, or incapable of comprehensive or direct appraisal, such as various forms of systemic and internalised racism (Krieger, 1999; Paradies, 2006a). Nonetheless, there are noteworthy methodological issues accompanying indirect methods, such as the potential for inadequate scope and substandard evaluation of confounding factors, as well as restrictions in capacity to investigate prospective interactions between racism and alternate variables (Bonilla-Silva, 1997). The limitations of self-report and indirect approaches led to the development of implicit measures of racist attitudes. The promise of implicit instruments is to assess the true extent of unintentional and unconscious individual bias, of which even well-intentioned and consciously non-racist people are largely unaware due to pressures to conform to social expectations and cultural norms (Hewstone, et al., 2002).
Explicit attitude measures invoke two general processes to assess an attitude: (a) respondents reflect consciously on their own attitudes and (b) respondents report their attitudes based on this self-reflection (Blanton & Jaccard, 2008). Conversely, implicit measures bypass the conscious mind by invoking two alternate processes: (a) unobtrusive activation of an automatic attitude so it is not consciously perceived and (b) unobtrusive assessment of the attitude so it cannot be consciously obscured (Blanton & Jaccard, 2008; Hewstone, et al., 2002). Hence, implicit racism is conceptually distinct from explicit racial attitudes: explicit measures are unable to directly tap the unconscious and therefore implicit racism cannot be assessed by questionnaires, qualitative interviews, or other methods that depend upon the surface meaning of participant responses (Blanton & Jaccard, 2008; Quillian, et al., 2006). Although dozens of methods for implicitly assessing attitudes have been proposed, priming and latency methods are the most prominent. Priming is an experimentally based method in which participants are shown a racially linked word or image (the prime) or they engage in a task that is designed to prime an attitude outside of conscious awareness before beginning another task; performance is then compared with those primed with a neutral term or not primed at all (Blanton & Jaccard, 2008; Quillian, et al., 2006). Latency methods, the most preeminent of which is the Implicit Association Test (IAT), rely on mental categorisations. The IAT evaluates the response latency for people to classify race-related stimuli into different evaluative categories, with the latency purportedly reflecting an individual’s underlying attitude (Blanton & Jaccard, 2008).
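To make the logic of a latency method concrete, the sketch below compares classification times under two conditions. It is a simplified illustration only and not the published IAT scoring procedure: the condition labels (`compatible`, `incompatible`), the sample latencies, and the choice to standardise by the standard deviation of all trials are assumptions introduced here.

```python
from statistics import mean, stdev


def latency_effect(compatible_ms, incompatible_ms):
    """Compare classification latencies (milliseconds) across two conditions.

    Returns the raw difference in mean latency and that difference divided
    by the standard deviation of all trials. Both condition labels and the
    standardisation are illustrative assumptions, not the IAT algorithm.
    """
    raw_difference = mean(incompatible_ms) - mean(compatible_ms)
    all_trials = list(compatible_ms) + list(incompatible_ms)
    standardised = raw_difference / stdev(all_trials)
    return raw_difference, standardised


# Hypothetical latencies for one participant (invented for illustration).
compatible = [600, 650, 700]
incompatible = [800, 850, 900]
raw, standardised = latency_effect(compatible, incompatible)
print(f"raw difference: {raw} ms, standardised: {standardised:.2f}")
```

A larger positive value would indicate slower classification in the second condition; whether such a difference reflects an underlying attitude is precisely the question of interpretation taken up in the discussion that follows.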
Although the distinction between implicit and explicit racism is pervasive in social psychology, with implicit instruments becoming increasingly popular and widely used, controversy remains about what is being measured, with critics arguing that these instruments merely capture common and deeply ingrained social stereotypes strongly associated with particular groups rather than implicit racist attitudes (Blanton & Jaccard, 2008; Devine, 2001). Although implicit attitude measures ostensibly assess unconscious mental processes, the methods for establishing that a given response reflects the workings of an unconscious attitude are not well developed (Blanton & Jaccard, 2008). Tentative interpretations and robust empirical criteria are imperative when new technologies are gaining acceptance, yet many researchers have made unjustifiably vigorous claims about the merits of implicit instruments. The developers of the IAT purport that it can be used to tap unconscious racism or real attitudes that self-report measures cannot assess, and a large meta-analysis recently concluded that the IAT’s predictive validity was significantly superior to self-report instruments (Greenwald, Poehlman, Uhlmann, & Banaji, 2009). However, there were limitations to the analysis, such that the aforementioned claims are unconvincing (Blanton & Jaccard, 2008). Measurement and control of explicit attitudes in the included studies was universally weak; the effects of implicit measures on racism-related criteria appeared to be moderated by multiple factors; the relationship strength varied considerably across studies; and the analysed research reports outlined methodological limitations that questioned strong assertions being drawn from their data. More realistic interpretations suggest that a lack of conceptual correspondence between implicit and explicit measures reduces the strength of their association (Gawronski & LeBel, 2008; Hofmann, Gawronski, Gschwendner, Le, & Schmitt, 2005). Additional threats to the validity of implicit instruments which are yet to be sufficiently addressed include individual differences in processing speed, association-strength correlates, reliability, metric meaning, and generalisability (for review see Blanton & Jaccard, 2008).
Implicit attitude measures clearly face a wide range of new validity challenges that are unfamiliar to investigators and require greater attention in forthcoming years; although the validity of self-reports can be vulnerable to social desirability biases and a range of response artefacts, decades of research have strengthened methods for minimising and addressing such issues. Due to the major concerns about the quantification of implicit racist attitudes and the general lack of understanding of what implicit instruments actually measure, for the purpose of the present study, the development of a self-report scale was considered the most appropriate course of action. Despite the limitations of explicit measures, the capacity for paper-and-pencil surveys to be widely disseminated, easily completed, and quickly scored makes their use attractive for research teams needing to evaluate the effectiveness of community-wide anti-racism or pro-diversity initiatives. It was therefore considered important to design and validate an explicit scale of racist attitudes for its effectiveness and usefulness to be most widespread in the Australian context.

Mechanisms of Attitudes

In the research reported here, I aimed to develop and validate an attitudinal measure of racist attitudes for use as an evaluation tool for anti-racism and pro-diversity strategies attempting to change racist behaviours, attitudes, and cognitions. It is therefore important to provide an outline of the mechanisms of attitudes and their relationship to behaviours and cognitions. The social psychological study of attitudes has been one of the central themes of psychology as a discipline for decades and attempts to measure attitudes have been made for almost a century (Allport, 1954; DeFleur & Westie, 1963; Rosander, 1930). The validity of the concept of attitude has been debated for a similar period of time. An attitude is an enduring pattern of evaluation toward a psychological object, an individual’s tendency to act or react in a certain way (Colman, 2003). As a disposition toward a particular entity, an attitude can be classified as positive or negative, depending upon whether the individual feels favourable or unfavourable toward the entity (Ajzen & Fishbein, 1977; Dejaeghere & Hooghe, 2012; Insko & Schopler, 1967).
The structure of an attitude can be further broken down into its components: affect, behaviour, and cognition (Baron, Byrne, & Suls, 1989). The affective response expresses an individual’s degree of preference for an entity. The behavioural intention is the tendency of an individual to behave in a certain manner toward an entity. The cognitive response is the evaluation of the entity constituted by an individual’s beliefs about the entity. When entities have attitudes directed toward them they are referred to as “objects of affective significance” (Insko & Schopler, 1967, p. 362). Alternate theories consider attitudes as the affective component distinguished from cognitions and behaviour, with most definitions fundamentally linking attitudes to behaviour (DeFleur & Westie, 1963). In this sense cognitions are considered beliefs about, or perceptions of, the relationship between two objects of affective significance (Insko & Schopler, 1967). Behaviour is any goal-directed activity, which can be classified as positive or negative depending upon how an individual evaluates it (Insko & Schopler, 1967). Attitudes are linked to behaviour via cognitions and it is the basic tenet of triadic consistency theory that there is a propensity for attitudes, cognitions, and behaviours to be consistently related; there is a tendency for an individual holding certain beliefs and attitudes to engage in certain behaviours (Insko & Schopler, 1967; Stephan & Stephan, 1996). For instance, when an individual holds a negative stereotype that is linked to negative affect, such as intergroup anxiety, the result can be a negative behavioural response, such as discrimination (Stephan & Stephan, 1996). Regardless of whether attitudes, affect, cognition, and behaviour are considered part of the same concept, or distinct but related constructs, most research considers each to be intrinsically related in some manner. A key theory in the field of attitude and behaviour relations is the theory of planned behaviour.
This theory suggests that if individuals evaluate a behaviour as positive (attitude), believe significant others desire performance of the behaviour (subjective norm), and have confidence in undertaking the behaviour (perceived behavioural control), this results in a higher intention (motivation) and the individual is subsequently more likely to undertake the behaviour (Ajzen, 1988; Ajzen & Fishbein, 1980; Armitage & Conner, 2001). Several meta-analytic reviews provide compelling empirical support for the theory of planned behaviour and, at present, it is arguably the most dominant model of attitude-behaviour relations (Armitage & Christian, 2003; Armitage & Conner, 2001). Due to the inherent relationship between attitudes, beliefs, and behaviour, it is of vital importance in the reduction of racist behaviour (discrimination) and racist cognition (stereotypes) to reduce racist attitudes or affect (prejudice). Although it is imperative that interventions targeting racism are implemented across all planes, focusing on attitude change and consequently cognitive and behaviour change is arguably a necessary cornerstone for anti-racism approaches. Attitude change is understood to be shaped by a complex interaction of many variables and often takes significant time and effort (Forgas, Cooper, & Crano, 2010). Moreover, the more entrenched the attitude, the more difficult it is to change. It is therefore integral that racist attitudes can be measured accurately in order to evaluate and improve the effectiveness of anti-racism and pro-diversity initiatives to reduce the racist attitudes, cognitions, and behaviours of participants.

Chapter 2: Methods

Methods
Research Setting
Research Development Project Partners
Participants
Recruitment
Research Procedure
Analytical Techniques and Research Design
Achieved Research Timetable
Scale Development Procedure
Scale Refinement and Pilot Testing Procedure
Scale Reliability and Validity Testing

Methods

The research reported here explored conceptualisations of racism, as experienced by young Victorians from diverse racial and cultural backgrounds, and utilised these data to examine racism in children, adolescents, and adults from around Australia. In this chapter, I describe the multidisciplinary processes involved in defining participant experiences and understandings and the methodology for drawing upon these data to develop and validate a racism measurement tool. The initial stages of the research aimed to develop and construct a measure of racial, ethnic, cultural, and religious acceptance, to be used as a proxy measure of racist attitudes. A draft measure was based on data from in-depth semi-structured interviews and focus groups about racism with young people from various racial, ethnic, cultural, and religious backgrounds, in addition to a comprehensive review of the relevant literature. The measure was then refined and pilot tested. The secondary stages of the research were designed to validate the psychometric properties of this measure (factorial validity, internal consistency, test-retest reliability, convergent validity, discriminant validity, and predictive validity). The following hypotheses guided my work. First, I hypothesised that the final measure would discriminate between (1) individuals involved in an anti-racism initiative and individuals not involved in an anti-racism initiative; (2) individuals involved in the justice system and individuals not involved in the justice system; and (3) individuals prior to involvement in an anti-racism initiative and the same individuals after involvement in an anti-racism initiative. I hypothesised that the measure would be moderately correlated with a current measure of racism. No existing Australian instrument quantifying general racist attitudes utilises a scale which provides a total racist attitude score on a continuum.
However, a 10-item survey has been utilised throughout Australia in numerous studies (Dunn & Geeraert, 2003), and therefore it was considered the most appropriate instrument for comparative purposes. I anticipated that higher levels of racist attitudes would be related to undesirable personality traits such as a lack of empathy and remorse, shallow emotions, egocentricity, and deceptiveness. Higher levels of these personality traits are related to anti-social behaviour and a variety of psychopathology (Salekin & Frick, 2005). The measure used as a comparison was the Minnesota Temperament Inventory (MTI; Loney, Taylor, Butler, & Iacono, 2007), a 19-item research-based measure of these personality traits collectively known as psychopathy. The 13 meaningful items, as suggested by the development factor analysis and utilised in subsequent research, were used for the purposes of the current research (Loney, et al., 2007; Neumann, Wampler, Taylor, Blonigen, & Iacono, 2011). I hypothesised that higher levels of racist attitudes would be related to higher levels of social, emotional, and behavioural problems. The measure utilised for this comparative purpose was the Strengths and Difficulties Questionnaire (SDQ; Goodman, 1997), a 25-item screening questionnaire designed for use with 3- to 16-year-olds. This instrument assesses emotional problems; conduct problems; hyperactivity/attention problems; peer relationship problems; and prosocial behaviour. Finally, I hypothesised that the measure would function comparably across children, adolescents, and adults. The preliminary aim of the research was to validate the measure to ensure that it was appropriate for use in evaluating an anti-racism initiative. My research was conducted at the time of a project being undertaken by Windermere Child and Family Services (hereafter Windermere) and the Cardinia Shire Council, implemented in five schools in the south eastern region of Melbourne - the Building Harmony in the Growth Corridor project (henceforth Building Harmony). If the instrument proved reliable and valid to evaluate this project, I anticipated that it could be disseminated to the scientific community and subsequently utilised in schools around Australia. As described in this thesis, a suitable tool was developed and validated to evaluate the effectiveness of racism-reduction programs, by assessing the racial, ethnic, cultural, and religious acceptance of a group prior to, and after, implementation of a racism-reduction program. The future appraisal of such programs will ensure that more focused and effective racism-reduction and pro-diversity initiatives can be implemented, and community levels of racism may be subsequently reduced. After the tool was subjected to appropriate validation within Building Harmony, further validation was undertaken with adolescents and adults. The result is Australia’s first reliable and valid measure of racial, ethnic, cultural, and religious acceptance for use with multiple groups across the lifespan.

Research Setting

The main research setting was a south eastern growth corridor region of Melbourne, specifically Officer, a small town in Cardinia Shire approximately 55 km from Melbourne city. The main component of the research involved five local primary schools participating in a racism-reduction initiative (Building Harmony) and an additional local primary school not actively participating in the strategy. Supplementary participants were recruited from the community throughout the region, Victoria, and around Australia. Building Harmony is an anti-racism and pro-diversity initiative implemented annually by Windermere and the Cardinia Shire Council. The strategy was first undertaken in 2009 with the aim of building the capacity of young people from diverse backgrounds, and their families, to live harmoniously in the emerging community of Officer, with the goal of discouraging crime based on race and encouraging acceptance of diversity through preventative interventions.
The scheme was informed by a two-year research project entitled Building Family and Community Resilience in Cardinia Growth Corridor: A Case Study of Officer 2010-2011, with the key recommendations identifying the need to build and foster healthy communities through the early engagement of racially, culturally, ethnically, and religiously diverse individuals and families who had newly moved into the new growth corridor (Ozdemir, Soydas, & Grigg, 2012). Since inception, Building Harmony has had many successful outcomes. Building Harmony is unique, as it is one of very few Australian prevention projects to respond proactively to potential diversity issues as new populations arrive within an identified area. In 2011 it was awarded the Rotary Community Services Award and in 2012 it was awarded the APEX Institute Gippsland Multicultural Services Education Award (Ozdemir, et al., 2012). The delivery of the Building Harmony project and the involved activities were based on the prevailing recommendations from the research literature. Most importantly, the intervention was developed and refined over multiple years in consultation with the participating schools and therefore the implemented activities were appropriate both for the social and emotional developmental level of participants and for the wider demographic context of the involved schools. Taking into account local needs and the community context is integral in the delivery of an effective racism reduction intervention (Cotton, 1993; Greco, Priest, & Paradies, 2010; Pedersen, Walker, Paradies, & Guerin, 2011; Sanson, et al., 1998). A major component of the intervention was a student leadership program whereby high-achieving students from each participating school were selected by their school principal, trained to actively participate in the delivery of project activities, and taught basic conflict resolution skills, which aimed to assist them to understand and respond to inter-racial conflict within their respective schools.
This was based on recommendations from the literature that encourage the implementation of a peer education system with peers as co-facilitators (Sanson, et al., 1998) and from other authors that highlight the importance of effective problem solving skills in dealing with intergroup conflict (Caughy, O’Campo, Randolph, & Nickerson, 2002; Eisenberg & Miller, 1990; Hughes et al., 2006). Similarly important, although not directly assessed in the present research, was the provision of standardised cross-cultural training to teaching staff across the participating schools, noted previously as integral to the modelling of appropriate beliefs, attitudes, and behaviour about distinct groups transmitted to students (Sanson, et al., 1998). With regard to specific intervention activities, a cultural visual arts activity was undertaken by participants whereby students worked together across racial, ethnic, cultural, and religious boundaries to create a shared piece of artwork. The implementation of this activity was based on the abundant literature on intergroup contact that purports that the most effective form of such contact is when it is sanctioned by authorities; cooperative rather than competitive along racial, ethnic, cultural, or religious lines; with groups sharing equal status and endeavouring for similarly valued superordinate goals in activities that require desegregation (Crisp & Abrams, 2009; Gaertner, Mann, Dovidio, Murrell, & Pomare, 1990; Hewstone & Swart, 2011; Pettigrew, 1998). Akin to the learning delivered to the student leadership group outlined above, a ‘Say “No” To Bullying’ day was conducted to enhance remaining participant understandings of, and confidence in utilising, effective problem solving skills. A theatre games day which aimed to challenge students through various games and activities, by building skills in team work, respect, support, and co-operation across racial, ethnic, cultural, and religious group boundaries drew upon similar principles, in addition to recommendations that interventions be delivered comprehensively and reinforce learning over multiple events (Cotton, 1993; Sanson, et al., 1998).
A shared literature activity was also utilised, which involved the presentation of a personal journey by a minority group member and a corresponding book that students studied within their classes and subsequently discussed across schools. The book 'Refugee – My Australian Story' by Alan Sunderland was utilised for this purpose and Endal Katchew, an Ethiopian refugee, led a presentation and discussion about his migration experiences with all participating students. Employing facilitators of diverse backgrounds to deliver learning is acknowledged as an important aspect of anti-racist education that can assist to break down existing stereotypes (Cotton, 1993; Greco, et al., 2010; Hewstone, et al., 2002; Sanson, et al., 1998). The Building Harmony initiative was therefore a distinctive, notable, and important site for the core component of my doctoral research.

The City of Casey and Cardinia Shire, neighbouring localities of the City of Greater Dandenong, are two of the most rapidly growing residential areas of Melbourne and Australia (Cardinia Shire Council, 2012; City of Casey Council, 2012). Population forecasts suggest that Casey and Cardinia will respectively experience 60% and 90% population increases over the next 20 years (Cardinia Shire Council, 2012; City of Casey Council, 2012). The change is even more exceptional if we examine data from 2006, when the City of Casey forecast a 102% increase in population from 2006 to 2036 and the Cardinia Shire forecast a 166% increase in population from 2006 to 2031 (Cardinia Shire Council, 2012; City of Casey Council, 2012). This contrasts with Victorian and Australian growth forecasts of 30% and 32% respectively from 2006 to 2026 (Australian Bureau of Statistics, 2008). This significant change in population will be accompanied by increasing cultural diversity. Although the City of Casey is of similar diversity to Victoria and Australia more generally, the population of the Cardinia Shire is predominantly of Anglo-Australian background (Australian Bureau of Statistics, 2012a, 2012b), with a 30% self-identified Australian background population, compared to the City of Casey, Victoria, and Australia at 22%, 23%, and 25% respectively. Moreover, 79% of the inhabitants of the Cardinia Shire are Australian born, compared to the City of Casey, Victoria, and Australia at 61%, 69%, and 70%.
Finally, both parents of 61% of the residents of the Cardinia Shire were born in Australia, in contrast to the City of Casey, Victoria, and Australia at 38%, 50%, and 54%. The cultural uniformity that exists within the Shire is expected to change dramatically with the increase in population, with an enormous inflow of culturally and linguistically diverse migrants and residents of neighbouring localities (Cardinia Shire Council, 2012). An unwanted by-product of such increased cultural diversity can be increased levels of racism and intergroup antipathy (Barlow et al., 2012; Dhont, Van Hiel, De Bolle, & Roets, 2012; Dunn, 2008; Hewstone & Swart, 2011; Pettigrew & Tropp, 2006). Allport (1954), in his seminal work on prejudice and racism, suggested that migration of a visibly different group into a given area can increase the likelihood of conflict. He added that the likelihood of conflict is greater the larger the group and the more rapid the influx. This proposition is reinforced by findings that indicate 10-33% of Australians have experienced some form of racism, a figure that surges to around 50% if we examine only non-English speaking background minority groups (Dunn, et al., 2009). Given the above, the Cardinia Shire was considered a highly relevant setting to explore racism in Australia.

Research Development

Project Partners

The research presented within this dissertation was conducted as part of the degree requirements for a Doctor of Psychology in Clinical Psychology specialising in Forensic Psychology. Multiple parts of the inquiry that I describe in this thesis were conducted in collaboration with Windermere. Windermere was established in 1851 and has evolved to provide a broad range of programs across multiple service sites throughout the southern Melbourne metropolitan region. Windermere aims to improve the wellbeing of children, families, and communities by enhancing their potential, building resilience, and connecting them to the community. It is a leading community-based agency, with a long history of working in collaboration with philanthropists, all levels of government, and other service providers.

The research component of the Building Harmony project, which formed the crux of the validation stages of the research, involved several schools. Over a period of several months, following initial discussions with Windermere and multiple meetings with the Building Harmony Executive, a partnership between the Building Harmony project and my research venture was forged. Building Harmony had a clear need for an appropriate evaluation tool to assess the efficacy of the program activities in reducing the racist attitudes of participants. Correspondingly, I had a need for access to participants involved in an anti-racism or pro-diversity initiative in order to provide validity evidence for the novel measure. The partnership enabled the Building Harmony Executive to run the project as intended and without compromise, with the data collected, collated, analysed, and presented in an annual report in exchange for the unrestricted utilisation of relevant data in the present dissertation. The schools already involved in Building Harmony activities endorsed my research and agreed to continue their ongoing participation in the project. However, to ensure the robustness of the appraisal, there was also a need for a control group school not already involved in the Building Harmony strategy. After reaching out to local primary schools with the support of the Building Harmony Executive, Pakenham Consolidated School provided access to their students for this purpose. The geographic context of the schools is notable. Whilst Pakenham Consolidated School is located in Pakenham, around five kilometres from the base of the Building Harmony project in Officer, the other schools involved in the initiative were within a two-kilometre radius of each other, with three of the schools located on the same street. The schools involved in the Building Harmony activities comprised two non-denominational Christian schools, one Catholic school, one Islamic school, and one government-funded school.
Of these schools, Berwick Grammar School is an independent and non-denominational school with single gender (male) education from Grade 5 to Year 12. Maranatha Christian School is a co-educational non-denominational school for Christian families with classes from Preparatory to Grade 6. St. Brigid's Catholic Primary School is a Catholic parish school that comprises pupils from Preparatory to Grade 6. Minaret College is a co-educational independent Islamic school providing curriculum for children from Preparatory to Year 12. Officer Primary School is a local government school (the only government school involved in Building Harmony activities) with students from Preparatory to Grade 6. The control group school was a local government-funded school, Pakenham Consolidated School, with students from Preparatory to Grade 6.

Participants

Various participants were recruited for the wider research project, with the aim that each stage of the research would inform the characteristics of target participants. Participant characteristics are provided in greater detail in the relevant results chapters following, but a brief overview is offered here. Participants were selected to acquire a sample that typified the phenomenon under investigation (Patton, 1999). In this context, my initial objective was to understand racism from the perspective of school-aged youths. For this purpose, interviews were conducted with young Victorian individuals aged 15 to 20 years who were currently attending school and were from diverse racial, ethnic, cultural, and religious backgrounds. Two focus groups were completed, each with additional youth of the same demographic characteristics. A pilot test was undertaken with eight Victorian primary school children. Pre-test, post-test, and test-retest assessments were administered to 116 Victorian Grade 5 and Grade 6 primary school children involved in the Building Harmony anti-racism initiative. A local Victorian control group school provided 180 Grade 5 and Grade 6 children for pre-test and post-test as a comparison with Building Harmony participants.
To ensure that the developed measure was useful for adolescents and adults as well as children, a community sample comparison was undertaken with 402 participants (aged 15 years or over) recruited from around Australia. The inclusion of different populations aimed to gather support for the novel measure across children, adolescents, and adults.

Recruitment

Participants for the interviews, focus groups, and pilot test were recruited via advertising through Southern Integrated Culturally and Linguistically Diverse (CALD) Child and Family Services Network organisations, a network managed by the partner organisation (Windermere). The network consists of approximately 80 child and family service organisations that provide care and support to people in the southern region of Melbourne, including government agencies and departments (e.g., Victoria Police, Centrelink, Department of Human Services, and TAFE colleges), peak NGOs (e.g., Catholic Care and Anglicare), and local frontline agencies (e.g., Wellsprings for Women and Hanover). Participants for the pre-test, post-test, and test-retest were already involved in Building Harmony. Participants from the control group school were recruited through a local primary school not involved in the Building Harmony project. Participants for the community sample comparison were recruited via advertising in local print and in online newspapers nationwide, in local schools, via the radio, and through the Southern Integrated CALD Child and Family Services Network organisations.

Research Procedure

The general procedure of this research was based predominantly on the comprehensive and detailed guidelines provided in DeVellis (2012), on other scale development textbooks (e.g., Furr, 2011; Loewenthal, 2001), general statistical textbooks (e.g., Salkind, 2006; Tabachnick & Fidell, 2007), published peer-reviewed psychological scale development guidelines (e.g., Clark & Watson, 1995; Smith & McCarthy, 1995), partly on recommendations made by authors who have previously developed measures of attitudes, and partially on recommendations made by various psychology statisticians (Professor Grahame Coleman, Melbourne University and Monash University; Professor Ray Adams, Melbourne University; and Associate Professor John Reece, RMIT University). Recommendations from the various aforementioned sources were integrated into the following stages in the scientific process of constructing and validating a scale to provide a clear guide about the step-by-step process undertaken throughout the research.

1. Comprehensively review the literature to deeply understand the concept that is to be measured and evaluate previous attempts at quantification of the concept.
2. Conduct interviews and/or focus groups with individuals from a population with knowledge relevant to the concept that is to be measured to add a lay perspective to academic understandings of the construct.
3. In consultation with experts in the field and based on the above, develop a preliminary set of questions which potentially evaluate the concept that is to be measured.
4. Conduct interviews and/or focus groups with the initial set of questions to evaluate their clarity and comprehensiveness.
5. Refine the measure based on comments and feedback from the interviews and/or focus groups.
6. Have the measure reviewed by experts in the field and refine as required.
7. Pilot test the measure with an appropriate population to ensure each question is understood as intended and amend as required.
8. Assess the factor structure of the measure to evaluate whether the measure taps a unitary construct or if it is made up of multiple components.
9. Assess the internal consistency of the measure and subscales to evaluate how well each item relates to each other item in the scale.
10. Assess the test-retest reliability of the measure to evaluate how stable the measure is over time.
11. Assess the convergent validity of the measure to evaluate how strongly the measure is related to concepts it would be expected to be related to, or alternatively how strongly results from two groups which would be expected to have similar results are related.
12. Assess the discriminant validity of the measure to evaluate how strongly the measure is related to concepts it would be expected to not be related to, or alternatively how strongly results from two groups which would be expected to have different results are related.
13. Assess the predictive validity of the measure to evaluate how well the measure is able to predict future outcomes.
14. Repeat steps 8-13 with alternate populations and research teams to confirm results.

Analytical Techniques and Research Design

Psychological research has tended to focus on intergroup and intraindividual levels of racism, utilising both quantitative and qualitative research methods. This research has led to calls for social scientists to devote increasing energy to researching the dynamics of racial attitudes and their underlying emotional, cognitive, and developmental influences (Utsey, et al., 2008). Clearly, comprehensive understandings of racism are core to the development of efficacious anti-racism and pro-diversity strategies. Prior qualitative and other language-based (e.g., discourse analytic) research argues that racism continues to support, perpetuate, and legitimate prevailing inequalities and reinforces contemporary conceptualisations of racist attitudes as subtle, flexible, ambivalent, and entrenched in our societal values and norms (Sanson, et al., 1998). Quantitative research has accordingly detailed the significant detrimental impacts of such racist attitudes and the consequent racist behaviour (Paradies, 2006b). As there is stronger validity in utilising multiple methods (Mullan, Todd, Chatzisarantis, & Hagger, 2014; Tuffin, 2008), both qualitative and quantitative research was required to provide a complete understanding of racism, its development, and its impacts.

Combining approaches is particularly advantageous when examining novel, or less well understood, research areas because a mixed-methods approach strengthens the overall research by reducing the impact of limitations inherent in the individual research designs (Geurts & Roosendaal, 2001; Mullan, et al., 2014). Qualitative methods provide data that can deliver deep and valuable insights into racism as a construct, its development, and its experience, whereas quantitative methods allow for the statistical exploration of racism, its prevalence, differences among groups, and its relation to other concepts. Before racism could be quantitatively examined, qualitative research had to inform its measurement. Drawing upon a mixed-methods approach, in which initial interviews, focus groups, and pilot testing elicited questionnaire content that subsequently led to the use of correlational and experimental designs, was considered integral to enhancing the validity of the current research. Validity has long been considered the most crucial facet of psychometric standards (Furr, 2011; Furr & Bacharach, 2008; Parry & Crossley, 1950). Regardless of the strength of the design, response and participation rate, method of analysis, or researcher expertise, if a measure is not valid for the purpose for which it is utilised, the results are meaningless. Importantly, a measure in and of itself cannot be valid for all purposes: the manner in which the instrument is utilised and how responses are interpreted lead to a determination of whether the measure is valid for a particular purpose (Furr & Bacharach, 2008). Within the area of racism research, many studies purport to have established the validity of employed instruments, but results are frequently derived from measures with weak psychometric properties.
This limitation is especially evident when exploring the relationship between health and perceived racism among children, with a majority of studies relying upon tools that had been validated for use with adults or youth from another age group (Pachter & Coll, 2009). Questions have also been raised about the psychometric properties of measures used in adult perceived racism research (Paradies, 2006b). As highlighted and discussed in greater detail in Chapter 3, these methodological issues are highly prevalent in the measurement of explicit racist attitudes. The methodological and ethical implications of drawing interpretations from instruments with significant limitations are concerning and lead to questions of the validity of the reported results. The meaning and appropriateness of a scale in one context differ significantly when results are derived from a different context or population: clearly, psychometric standards must be re-examined before a measure is disseminated to a population distinct from the one with which it was developed (Furr, 2011). There are also ethical implications associated with the improper use of measurement tools, particularly in research with children or in relation to sensitive topics such as racism. Inappropriate instrument usage leads to invalid results, specifically under- or over-reporting of attitudes or behaviours. This can result in under-reporting of negative and harmful attitudes and behaviours, or in the over-labelling of individuals as holding racist attitudes. Both situations have momentous implications for the reduction of racism in society, reinforcing the ethical responsibility of researchers to adhere to strict standards of practice. Achieving a robust standard of validity in a study of racism is imperative both in terms of data quality and ethical integrity. The present research was largely focused on quantitative assessment of racist attitudes. However, interpersonal relationships and attitudes can be difficult to access and assess using this approach. The employment of qualitative methods including interviews, focus groups, and cognitive interviewing during the development and piloting process not only provided context to the quantitative data but also served to build construct validity for the newly developed measure. The implementation of these methods is detailed in Chapter 4.1.
Although there is now considerable qualitative research on racism and racist discourse in Australia (Augoustinos, et al., 2005; Pedersen, Dunn, Forrest, & McGarty, 2012), most work has examined, through quantitative methods, the prevalence, causes, and effects of racist attitudes of White towards Black Americans. Comparatively little attention has been devoted to common-sense understandings and lived experiences (Figgou & Condor, 2006). Differences in context and cultural milieu preclude direct extrapolation of US findings to Australia (Pedersen, et al., 2004), and there has been little work on the experiences of and attitudes underpinning racism in Australia (Pedersen, et al., 2012). Still less investigation has addressed these attitudes in young people (cf. McLeod & Yates, 2003; Poole, 1975). It was therefore considered important for further qualitative research to be conducted to examine the complexities of racism across multiple racial, ethnic, cultural, and religious groups among young Australians. Interviews were utilised as the primary method of initial data collection; this method is a key technique for undertaking qualitative research (Dilley, 2000). Due to the inherent disconnect between adult and youth conceptualisations and experiences of racism, it was vital for age-appropriate methods to be drawn upon to inform the qualitative data collection (Greene & Hogan, 2005). Put simply, in order to gather accurate information about youth, direct and youth-centred methods must be utilised. The qualitative interview is widely accepted as youth-friendly and enables collection of rich and nuanced data from participants (Irwin & Johnson, 2005). However, concerns have been raised by some academics about the ability of children to understand and interpret the questions being posed in the manner desired by the researcher (Short & Carrington, 1996). Even researchers who have undertaken successful interviews with children under 10 years of age have noted that conceptualisations of racism in youth became more elaborate with age (McKown, 2004).
Moreover, some investigators have expressed doubt that children's focus groups are efficacious and useful, whilst others have characterised them as unproductive and misrepresenting of the attitudes of participants (Kennedy, 2001; Palmer, 1990). Yet young children learn about race in complex, multifaceted, and interesting ways (Riggs & Augoustinos, 2006), and children and youth have come to be viewed as possessing a level of competence that allows them to understand, process, and articulate their experiences for the purposes of racism research (Singh, 1996). Additionally, previous qualitative research utilising both interviews and focus groups has successfully been completed with youths of various socioeconomic and racial, ethnic, cultural, and religious backgrounds (Dulin-Keita, Hannon, Fernandez, & Cockerham, 2011). Therefore, to enhance the interview data, focus groups were utilised to provide an additional layer of understanding not able to be gathered in one-on-one interviews. The ideal number for effective focus group analysis ranges from four to twelve people (Seal, Bogart, & Ehrhardt, 1998). Furthermore, focus groups involving adolescents have been recommended to contain no more than six to eight participants (Morgan, 1998). The number of participants in each focus group was therefore designed to be between four and eight. In following the path set by Riggs and Augoustinos (2005), I believed it important to be identified and identifiable as an Australian born non-Indigenous person throughout the abovementioned qualitative component of my research. Such identification is important in the context of cross-cultural face-to-face work, such as that conducted as a core element of this research, due to the potential for respondent and interviewer bias (Leal & Hess, 1999). Although it is impossible to know how participants would have responded had I been of an alternate racial, ethnic, cultural, or religious affiliation, it is probable that my presentation as a non-Indigenous Australian investigating racism in Australia impacted upon participant responses in some manner. Indeed, previous research has demonstrated the effect that the background of the interviewer has on participants of distinct backgrounds (Davis & Silver, 2003).
I also echo Solorzano, Ceja, and Yosso (2000) in my belief that examining and exploring racism through interviews and focus groups would enable victims of racism to become empowered. Indeed, several interview participants commented on the positive experience of being interviewed about their experiences with racism. It was similarly clear in each focus group that group members had their opinions and experiences reinforced and validated, but also challenged and contradicted, therefore deepening their understanding of their own experiences, attitudes, and beliefs, whilst correspondingly normalising their experiences and consequent responses. Importantly, no participant became noticeably distressed, lodged a complaint, or sought assistance or counselling after participation in the qualitative component of the research, despite the discussion of sensitive, frequently unpleasant, and often negative experiences. Chapter 4.1 describes the interview and focus group procedure and outcomes in greater detail and Appendix 4 presents the semi-structured interview schedules utilised throughout the qualitative component, but an overview is provided here. All interviews were held in a private consultation room on the university campus (Clayton or Caulfield campus, respectively 25 km and 12 km south east of Melbourne city) or at the head office of Windermere in Narre Warren, 45 km south east of Melbourne city. All focus groups were conducted in private consultation rooms: one at Monash University Clayton campus (FG1), one at Leongatha Secondary College, 130 km south east of Melbourne city (FG2), and the third at the Cardinia Shire Council local government offices, 62 km from Melbourne city (FG3). Participants predominantly engaged well, warmed quickly to my questioning, and appeared to present open and honest accounts of their conceptualisations and experiences. Once data were collected, I transcribed them and analysed the transcripts, guided by conventional content analysis. Subsequent reviews of the selected quotes and salient themes with my research supervisor ensured that the quotations were representative of the overall data.
The qualitative data drawn from the interviews and focus groups were supplemented by an extensive and comprehensive review of the racism literature to create the preliminary measure. Over 500 statements utilised in previous predominantly adult-oriented research were examined to ensure that all prior conceptualisations of racism were considered in developing the current measure. Once developed, the preliminary measure required pilot testing. In order for a questionnaire to have validity, respondents must interpret the item questions in the way that the researchers intend (Furr & Bacharach, 2008). This objective can prove challenging, particularly when designing a quantitative measure for use across children, adolescents, and adults. The inherent disconnect between the social world of the adult and the child can result in verbal, developmental, and cultural oversights in the research design process. Cognitive interviewing, a direct study of the question-and-answer process, helps to build validity by asking respondents to speak their thoughts aloud as they complete survey items (Collins, 2003; Willis, 2005). Drawing upon this process, one-on-one cognitive interviews were undertaken with children as they pilot tested the measure. This strengthened the content validity of the questions by allowing exploration of how child respondents understand the items and confirming they interpreted the questions in the intended manner, consistent with adolescents and adults (Beatty & Willis, 2007). Inconsistent reliability and validity among existing measures of racism (appraised in Chapter 3) signal the need for a more rigorous standard of assessment when developing and validating such tools; this was considered integral when developing the present survey. In addition to the construct and content validity provided by drawing upon existing measures via a literature review complemented by the experiences and conceptualisations of real people, validity was further built through a variety of methods including convergent, discriminant, predictive, and factorial validation.
The secondary quantitative components of the research, detailed in Chapters 4.2, 4.3, 4.4, and 4.5, were dictated by accepted cut-off points and guided by a robust methodological approach to instrument design and validation.
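
Two of the reliability checks named in the research procedure, internal consistency and test-retest reliability, can be illustrated with a minimal sketch. The code below assumes Cronbach's alpha as the internal consistency index and a Pearson correlation between total scores as the test-retest coefficient; both are common Classical Test Theory choices rather than a statement of the exact statistics reported in later chapters. The response data, variable names, and function names are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items table of scores."""
    k = len(responses[0])                      # number of items
    item_columns = list(zip(*responses))       # one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


def pearson_r(x, y):
    """Pearson correlation, usable here as a test-retest coefficient."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / (ss_x * ss_y) ** 0.5


# Hypothetical data: five respondents answering four items on a 1-5 scale.
time_1 = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 5, 4, 4],
    [5, 4, 5, 5],
]
# Hypothetical total scores for the same respondents at a later session.
retest_totals = [7, 9, 12, 17, 20]

alpha = cronbach_alpha(time_1)
retest_r = pearson_r([sum(row) for row in time_1], retest_totals)
print(f"alpha = {alpha:.2f}, test-retest r = {retest_r:.2f}")
```

Any cut-off applied to these coefficients (for example, the frequently cited rule of thumb of .70 for alpha) is a convention that varies between sources and should be taken from the relevant results chapter rather than from this sketch.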

Within the social science literature, Furr and Bacharach (2008) argue that there is a general lack of systematic rigor when establishing the reliability and validity of a measure. To ensure robust reliability and validity of the developed measure, I relied upon both Classical Test Theory (CTT) and Item Response Theory (IRT) approaches. Quantitative measure development studies have traditionally relied on CTT's statistical tests and rules of measurement as the gold standard for good practice. This is partially due to the historically pervasive use of CTT, the fact that it is widely taught in universities, and the reliance on relatively weak theoretical assumptions which allow CTT to be easily implemented in a variety of testing situations (Fan, 1998; Sharp, Goodyer, & Croudace, 2006). In contrast, IRT is a method that has been increasingly recognised as theoretically and methodologically sophisticated and that is particularly valuable in instrument development. Over the past few decades, IRT has grown progressively popular among psychometricians due to a strong theoretical base and methodological rigor. Recently, many individual intelligence tests, personality tests, attitude measures, and behavioural ratings have begun to adopt IRT models, making it the favoured approach in the fields of educational and psychological assessment (Embretson & Reise, 2000). Despite the many theoretical and methodological benefits, IRT continues to receive comparatively little attention within the mainstream psychology and social science literature and it remains uncommon in the development of child and adolescent measures (Sharp, et al., 2006). Moreover, to my knowledge, an IRT approach has never before been utilised to develop a published instrument in the field of youth racism. IRT emphasises that both the qualities of the individual and their relationship to the construct, and the qualities of the item, are key influences on item responses (Furr & Bacharach, 2008).
The core underlying theory is that there is a differential effect of item 'difficulty' on individuals at different trait levels (Furr & Bacharach, 2008). Assuming all items represent the same construct, 'difficult' items will be answered 'correctly' less often than will 'easy' items (Embretson & Reise, 2000). When given a test item, there is a probability that an individual will answer the question in a positive way, depending on their trait level. In the instance of a typical test item, the probability will be small for individuals with a low trait level and will be large for individuals with a high trait level. For example, on a hypothetical measure of racist attitudes, of the two items “I hate people from other backgrounds” and “I have some minor racist tendencies,” the former is considerably more 'difficult' to endorse and would be expected to be endorsed only by individuals high on the trait of racism. Conversely, the latter item would be expected to be endorsed by individuals much lower on the trait of racism as well as those moderate and high on the trait of racism. Therefore, endorsement of each item provides different information about individuals with differing levels of the underlying trait of racism. In contrast, CTT tends to ignore such factors by treating each item as having the same 'difficulty' and ignoring differing response patterns of individuals of distinct trait levels. Consequently, IRT can be utilised to perform advanced analytical techniques which evaluate the differential effects of item 'difficulty' and individual trait level that are not otherwise available within a CTT framework. The IRT approach also considers the fact that a particular test might have stronger psychometric qualities for some participants than for others (Furr & Bacharach, 2008). The capability to discriminate between individual test items as well as between individual test takers distinguishes IRT from CTT and allows for a more nuanced and sensitive analysis of quantitative data (Furr, 2011). The IRT approach is considered superior to CTT for several reasons.
While in CTT, longer tests are always more reliable due to the nature of internal consistency, shorter measures in IRT can be more reliable (Embretston & Reise, 2000). As the present research aimed to develop an instrument able to be widely disseminated and utilised by children, adolescents, and adults, a brief questionnaire was desirable. Lengthy paper and pencil surveys

have been demonstrated to be less child-friendly, can increase drop-out and incompletion rates, and are less likely to be widely utilised regardless of test-taker age (Morrow & Richards, 1996). In IRT, a participant’s true position on the latent variable is not dependent on the specific set of items administered, while the measurement properties obtained are not dependent on the sample studied, eliminating the circular dependency of CTT and allowing for more robust test reliability. Furthermore, IRT allows the concepts of context and variance to be constructive rather than destructive in measurement development. Rather than generalise or summarise standard error, IRT allows the development of instruments and scales that maximally differentiate participants, either across the entire range of the latent trait or on a particular area of the latent trait (Embretson & Reise, 2000). IRT accepts that there exists a relationship between person properties and item properties, enabling the impact of context to be recognised within quantitative measures. I therefore determined that the use of an IRT approach in the analysis of quantitative data would contribute to robust validity. As noted above, IRT is often considered to be psychometrically superior to CTT methods and is capable of improving the precision and validity of psychological measurement (Furr & Bacharach, 2008; Reise, Ainsworth, & Haviland, 2005). Nevertheless, both IRT and CTT methods have significant advantages and limitations, with certain statistical approaches more advantageous than others depending on the research purpose (DeVellis, 2012). Indeed, neither IRT nor other statistical approaches such as CTT are inherently superior, which has led to a recommendation that IRT and CTT be used interdependently to evaluate psychometric properties (Embretson & Hershberger, 1999).
To ensure that the novel measure had undergone the most complete exploration of its psychometric properties, I therefore considered it appropriate to refine the measure by initially utilising CTT and subsequently examining the measure using IRT analysis to confirm factorial validity.
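The relationship between trait level and item ‘difficulty’ described above can be illustrated with a minimal sketch of the simplest dichotomous Rasch model, in which the probability of endorsing an item depends only on the difference between the person’s trait level and the item’s difficulty. This is an illustration only, not the analysis used in this research (which drew on a polytomous Rating Scale Model); the function name and the difficulty values assigned to the two example items are assumptions chosen for demonstration.

```python
import math


def endorsement_probability(trait_level: float, item_difficulty: float) -> float:
    """Dichotomous Rasch model: P(endorse) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(trait_level - item_difficulty)))


# Hypothetical item difficulties in logits (assumed values, not estimates)
EASY_ITEM = -1.0  # e.g. "I have some minor racist tendencies"
HARD_ITEM = 2.0   # e.g. "I hate people from other backgrounds"

# Compare endorsement probabilities for low, moderate, and high trait levels
for theta in (-2.0, 0.0, 2.0):
    p_easy = endorsement_probability(theta, EASY_ITEM)
    p_hard = endorsement_probability(theta, HARD_ITEM)
    print(f"theta={theta:+.1f}  P(easy)={p_easy:.2f}  P(hard)={p_hard:.2f}")
```

At every trait level the ‘difficult’ item is less likely to be endorsed than the ‘easy’ item, and both probabilities rise with trait level, which is the differential information that CTT total scores do not capture.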

Evaluation of the factor structure of the measure was initially performed utilising CTT techniques including an exploratory Principal Components Analysis (PCA), an Exploratory Factor Analysis (EFA), and a Confirmatory Factor Analysis (CFA). Additional evaluation of the factor structure of the measure was subsequently performed utilising IRT techniques including a unidimensional Rating Scale Model (RSM) Rasch Analysis and a multidimensional RSM Rasch Analysis. Various additional CTT techniques were utilised throughout to examine the internal consistency, test-retest reliability, convergent validity, discriminant validity, and predictive validity of the new tool. The specific analyses drawn upon at each stage of the research are discussed in detail in Chapters 4.2, 4.3, 4.4, and 4.5, but a synopsis of each analytical technique is supplied below.

Evaluation of internal consistency demonstrates how well each item relates to each other item in the scale. Evaluation of the internal consistency of the measure was performed at each stage of data collection using Cronbach’s Alpha in addition to inspection of corrected item-total correlations. Nunnally (1978) has indicated that an acceptable Cronbach’s Alpha is above .70. DeVellis (2012) has indicated that acceptable item-total correlations are above .20.

Assessment of test-retest reliability demonstrates how stable the measure is over time. Evaluation of test-retest reliability was performed via the use of Pearson’s r correlations and paired samples t-tests. Correlations were expected to be in the very high to nearly perfect range (.70-.90; Cohen, 1988) and the paired samples t-tests were expected to show no significant difference between the scores at time one and time two.

Assessment of convergent validity demonstrates how strongly the measure is related to concepts it would be expected to be related to or alternatively how strongly results from two groups which would be expected to have similar results are related.
It can be built by comparing the results of the newly developed quantitative instrument with results from

reputable measures of the same construct. Equally, such validity can also be appraised by comparing the results of two similar groups on the novel measure. Evaluation of convergent validity was performed via the use of Pearson’s r correlations. For evaluations of convergent validity, correlations were expected to be in the upper end of the moderate to very high range (.30-.70; Cohen, 1988). No assessment of convergent validity via the examination of the results of two similar groups was undertaken.

Assessment of discriminant validity demonstrates how strongly the measure is related to concepts it would be expected to not be related to, or alternatively, the strength of the relationship of results from two groups which would be expected to have different results. Such validity can be built by comparing the results of the developed quantitative measure with results from reputable measures of a distinct construct. Similarly, such validity can also be appraised by comparing the results of two distinct groups on the novel measure. Evaluation of discriminant validity was performed via the use of Pearson’s r correlations, Analyses of Variance (ANOVAs), Analyses of Covariance (ANCOVAs), and independent samples t-tests. For evaluations of discriminant validity, correlations were expected to be in the lower end of the moderate to very high range (.30-.70; Cohen, 1988) and the ANOVAs, ANCOVAs, and independent samples t-tests were expected to show a significant difference between the scores of two groups.

Assessment of predictive validity demonstrates the degree to which a measure can predict or determine performance or behaviours in the ‘real world’ or how well an instrument is able to predict future outcomes. Evaluation of predictive validity was performed via the use of Pearson’s r correlations and a paired samples t-test.
Correlations were expected to be in the moderate to very high range (.30-.70; Cohen, 1988) and the paired samples t-test was expected to show a significant difference between scores at pre-intervention and post-intervention.
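The internal consistency statistics described in the synopsis above can be sketched in a few lines. The following is a minimal illustration, assuming complete responses arranged as one list of item scores per respondent; the function names and the small data set are hypothetical and are not drawn from the research data. Cronbach’s Alpha is computed as k/(k - 1) multiplied by one minus the ratio of summed item variances to total score variance, and each corrected item-total correlation is the Pearson’s r between an item and the sum of the remaining items.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / (ss_x * ss_y) ** 0.5


def cronbach_alpha(responses):
    """responses: list of respondents, each a list of item scores."""
    k = len(responses[0])
    items = list(zip(*responses))  # one tuple of scores per item
    item_variance_sum = sum(variance(scores) for scores in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def corrected_item_total(responses):
    """Correlation of each item with the total of all other items."""
    items = list(zip(*responses))
    totals = [sum(row) for row in responses]
    return [
        pearson_r(scores, [t - s for t, s in zip(totals, scores)])
        for scores in items
    ]


# Hypothetical responses: five respondents, four items scored 1-5
responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 4, 5, 4],
    [5, 4, 5, 5],
]

alpha = cronbach_alpha(responses)
item_totals = corrected_item_total(responses)
print(f"alpha = {alpha:.2f} (criterion: above .70)")
print("items below the .20 criterion:",
      [i for i, r in enumerate(item_totals) if r < 0.20])
```

The same `pearson_r` helper applies to the test-retest and validity correlations, where the two sequences would be scores at time one and time two, or scores on the new measure and on a comparison measure.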

Achieved Research Timetable

Ethics approval to commence the research was granted by the Monash University Human Research Ethics Committee (MUHREC) in November 2011 (all human research ethics material is presented in Appendix 2), enabling me to undertake semi-structured interviews and focus groups. Interviews and focus groups were completed December 2011 - January 2012 and January 2012 - February 2012 respectively. Ethics approval for the remaining stages of the investigation was granted by the Department of Education and Early Childhood in March 2012 and subsequently by MUHREC in April 2012. Pilot tests were conducted in early April 2012. The preliminary measure was subsequently implemented in the Building Harmony project at the end of April 2012. Due to timeframes outside my control, the implementation of the Building Harmony project activities could not be delayed past the end of March 2012. The pre-test assessment was therefore executed one month after the planned implementation date and after the commencement of the Building Harmony activities. This delay partially invalidated the pre-test assessment and the appraisal of the Building Harmony activities. The post-test assessment was implemented as planned in September 2012. Due to the initial delay, the implementation of the test-retest assessment was also postponed from the 1st-2nd quarter of 2012 and implemented in September 2012, but there is no reason for this change to have impacted the test-retest results. Correspondingly, data collection for the community sample participants commenced in April 2012 and was finalised by the end of March 2013. Throughout 2012, preliminary analysis of the qualitative and quantitative data was completed. Final analysis of the qualitative data was completed by the end of 2012. Final analysis of the quantitative data commenced after data collection had been completed at the beginning of April 2013 and was completed by the end of 2013.

Scale Development Procedure

Sensitive item construction is central to the development of explicit measures of racist attitudes, especially where historical and social contexts impact upon the expression of racism (Echebarria-Echabe & Guede, 2007). An extensive item construction, selection, and refinement process was therefore undertaken. The full scale development process, including the refinement and pilot testing procedure, is outlined in Chapter 4.2. Based on interviews with 13 youths aged 15-20 years and a comprehensive review of the literature, an initial item pool was developed. Salient conceptualisations of, and experiences with, racism for young people in Australia were examined to ensure that the measure would be relevant to the Australian context. Exploring how racism was understood and experienced by individuals of diverse racial, ethnic, cultural, and religious backgrounds within the Australian context was considered to be essential due to prior research findings that have demonstrated the differentiated conceptualisation that individuals of different backgrounds develop and express (McKown, 2004). For the purpose of my research, I considered ‘race’ to include racial, ethnic, cultural, and religious background. Accordingly, racism was deemed to be any differential treatment based upon racial, ethnic, cultural, or religious background. Conversely, racial acceptance was regarded to be the equal treatment of individuals and groups regardless of racial, ethnic, cultural, or religious background. I believed it important to explore both positive (accepting) and negative (racist) attitudes, which have been found to be functionally independent (i.e., positive attitudes are stronger predictors of positive behaviours and negative attitudes are stronger predictors of negative behaviours) (Pittinsky, Rosenthal, & Montoya, 2011) and conceptually distinct (Phillips & Ziller, 1997).
Low- and high-prejudiced persons have been distinguished by the ability of low-prejudiced individuals to be aware of racial, ethnic, cultural, and religious stereotypes, yet disregard them in preference for their consciously controlled and activated,

stereotype-inconsistent beliefs (Devine, 1989). Despite these and similar findings, the most commonly used scales are designed to measure racism as a unipolar concept (e.g., Pettigrew & Meertens, 1995; Quillian, 1995; Wittenbrink, Judd, & Park, 1997). Not only do these instruments fail to directly measure low racism, but acceptance is often not even in the realm of consideration (Phillips & Ziller, 1997). There is tenuous evidence that acceptance consists of one underlying construct and lies in a different, yet related, domain from racism, with the former judging minority and non-minority individuals on an equal basis, showing little evidence of discrimination based on race, and the latter utilising race as a basis for discrimination (Phillips & Ziller, 1997). Conversely, other investigations have proposed acceptance to be closely linked to lack of racism (Berry & Kalin, 1995). Akin to prior findings on low-prejudice individuals, accepting individuals have been found to be aware of racial differences, but have an overriding tendency for selectively attending to and accentuating similarities (Phillips & Ziller, 1997). The present research was informed by these findings and I considered racism and acceptance to function along a single bidirectional continuum. I therefore attempted to measure both acceptance and racism. To ensure that the research project was strengths-based and focused on improving intergroup relations, rather than concentrating on the deficiencies of current intergroup relations, measuring increases in acceptance was prioritised and treated as diametrically opposed to decreases in racism (i.e., increases in acceptance were considered to indicate decreases in racism). Another major concern was social desirability.
Socially desirable responding is considered important to assess and is often included in addition to the primary measure of interest when scales address potentially uncomfortable or provocative topics (Anastasi & Urbina, 1996; Loewenthal, 2001). More specifically, in attempting to quantify constructs related to racism and acceptance, previous research has highlighted that responses may be

contaminated by socially desirable responses (Janus, 2010; Phillips & Ziller, 1997). Self-reports can be influenced by a range of additional psychological and sociodemographic factors that are not well understood (Major et al., 2002). Alternative approaches include indirect or implicit measures. However, there are clear methodological limitations to such approaches (Blanton & Jaccard, 2008; Bonilla-Silva, 1997). I amended a shortened version of the Marlowe-Crowne Social Desirability Scale (MCSDS; Strahan & Gerbasi, 1972) for use with Australian youth (MCSDS-A) and included it in the current survey to assess and allow adjustment for self-presentation bias.

Scale Refinement and Pilot Testing Procedure

The initial item pool was presented to two experts in the racism field for review of appropriateness, comprehensiveness, redundancy, and clarity of the items. Several items were re-written, removed, and added. The initial item pool was then reduced by the research team to 40 statements covering the following themes:

• Comfort: three items
• Safety: three items
• Security: two items
• Acceptance: four items
• Judgement: three items
• Treatment: five items
• Friendships: three items
• Understanding: two items
• Racist Acts: three items
• Accepting Acts: three items
• Structural Racism Beliefs: three items
• School Related: three items
• Existence of Racism: one item
• Assimilation Beliefs: two items

At this stage the preliminary scale contained 15 items with higher scores indicating greater acceptance and 25 items with higher scores indicating less acceptance. Items were reworded to ensure a balance between items on which higher scores indicated greater acceptance (lower racism) and items on which higher scores indicated less acceptance (higher racism). The preliminary scale of 40 items and 10 social desirability items was then randomised using a random number generator. The preliminary scale was presented to focus groups (three focus groups with a total of 17 participants) of youths aged 14-22 years for review of appropriateness, comprehensiveness, redundancy, clarity, and developmental appropriateness of the items. Based on responses from focus group participants, a preliminary scale reliability analysis was performed and indicated that no items necessitated removal. Focus group data fed into the development of the preliminary items, but participant understandings essentially confirmed the conceptualisations established from the individual interviews and no entirely new items were added based on focus group data. The preliminary scale was presented to six experienced primary school principals for review of the developmental appropriateness of the items. No changes were recommended to items or instructions. The scale was also provided to an experienced clinical child psychologist for review. Introductory instructions were expanded and simplified to ensure understanding for young children. No items were recommended to be removed or re-written. The preliminary scale was pilot tested with eight children aged 9-12 years for review of clarity and developmental appropriateness of the items. Cognitive interviewing techniques (Collins, 2003; Willis, 2005) were utilised to ensure that each item was understandable for the target age group and that the intended meaning of each item was

maintained when young children responded to the survey. Participants were asked to read each question and verbalise their thoughts. Participants were then asked to explain what the question was asking and what the definitions of any key terms or words were. Participants did not read questions aloud, as this was not how future testing was intended to occur. Based on the focus groups and pilot tests, no items were removed, but some were re-worded.

Scale Reliability and Validity Testing

The scale was implemented at pre-test, post-test, and test-retest of the Building Harmony initiative. Implementation also occurred at pre-test and post-test with a control group school. Results provided evidence for the validity of the measure by discriminating between Building Harmony participants and the control group school. The test-retest was implemented to assess stability of the measure. The survey disseminated at this stage of the research also included the Strengths and Difficulties Questionnaire (SDQ; Goodman, 1997), a measure of social, emotional, and behavioural strengths and difficulties. This instrument was utilised to examine relationships between acceptance and social, emotional, and behavioural strengths and difficulties, which provided evidence for the convergent validity of the measure. The scale was correspondingly disseminated to individuals aged 15 years and older, together with questions about the individual’s offence history, an existing measure of racist attitudes (Dunn & Geeraert, 2003), and the Minnesota Temperament Inventory (MTI; Loney et al., 2007), a measure of temperament and psychopathic traits. The relationships between the measure of temperament, the existing measure of racist attitudes, and the measure of acceptance demonstrated further evidence for convergent validity. Based on the data from both the Building Harmony project and the community sample, evaluation of the latent structure of the measure was undertaken using both CTT and IRT.
These analyses ensured that only the most meaningful items were included in the final

measure. After data collection had ceased, subsequent analyses were conducted, initially to refine the new scale and then to establish its reliability and validity. In the following two chapters, I outline the main results of the present inquiry gathered through the abovementioned process. Chapter 3 provides an evaluation of the current state of the research evidence by examining the characteristics of existing measures of explicit racist attitudes. Chapter 4.1 consists of a research article that summarises the qualitative component of the research and presents the conceptualisations and experiences of racism offered by Victorian youth obtained from interviews and focus groups. Chapter 4.2 comprises a research article that describes the development of the measure and its preliminary validation, including internal consistency and factorial and convergent validity, in children, adolescents, and adults from around Australia. Chapter 4.3 encompasses a research article that provides further evidence for the factorial validity of the novel scale drawing upon IRT in children, adolescents, and adults from around Australia. Chapter 4.4 consists of an examination of the efficacy of an anti-racism and pro-diversity initiative implemented in Victorian primary schools, which provides support for the internal consistency and test-retest reliability of the measure as well as convergent, discriminant, and predictive validity in children. Finally, Chapter 4.5 details an examination of the relationship between psychopathic personality traits and racist attitudes in adolescents and adults from the Australian community, providing further evidence for convergent and discriminant validity.

Chapter 3: Measures of Explicit Racist Attitudes – A Literature Review

Measures of Explicit Racist Attitudes – A Literature Review
Method
Results
Characteristics of Studies Documenting Explicit Measures of Racist Attitudes
Results of Studies Documenting Explicit Measures of Racist Attitudes
Results of Most Cited Studies Documenting Explicit Measures of Racist Attitudes
General Discussion
Limitations
Conclusion

Measures of Explicit Racist Attitudes – A Literature Review

Attitudes inform all racism research. An attitude is an enduring pattern of evaluation toward a psychological object, an individual’s tendency and mental readiness to act or react in a certain way (Colman, 2003; DeFleur & Westie, 1963). An attitude is composed of affect, behaviour, and cognition. A racist attitude can therefore be defined as an enduring pattern of feeling, thinking, and behaving negatively towards an individual or a group due to their racial, ethnic, cultural, or religious group membership. Various studies have shown notable relationships between attitudinal measures and overt behaviour dependent upon important moderators, and this appears to be common to both high and low prejudice individuals (Forgas et al., 2010; Warner & DeFleur, 1969). The end goal of racism research is to reduce racist attitudes and, consequently, racist behaviour. Racism research is therefore based on the assumption that racist attitudes can be changed and that such attitudes can be measured. Racism research has historically concentrated on two alternative and distinct methods of measurement. The majority of investigations examine the effects of racism by concentrating on victims of perceived racism. Such inquiries evaluate the frequency and intensity of racist events an individual has experienced (for reviews see Bastos, 2011; Harrell et al., 2003; Paradies, 2006b; Pascoe & Richman, 2009; Quillian et al., 2006; Williams et al., 2008). Less attention has been given to exploring racism centred on the level of racist attitudes held by an individual. Research into measures of racist attitudes has been concentrated in the US, and has focussed on attitudes towards African Americans. This is problematic for societies outside the US when US measures are used in the absence of locally validated measures.
Due to their specificity, existing scales may not necessarily be relevant, generalisable, valid, or useful in alternate settings. Despite an abundance of developed instruments, most explicit measures of racist attitudes have not been appropriately validated. The value of evaluating the existing

literature is to expose the weaknesses and identify the strengths of racism research conducted to date. This provides a foundation for the development of psychometrically reliable and valid measures that are superior to existing instruments and, more practically, informs policy and government programs aimed at reducing racist attitudes and increasing racial, ethnic, cultural, and religious acceptance.

Method

A search of the Web of Knowledge database search engine was conducted using the terms: Topic=(measur* OR survey OR questionnaire) AND Topic=(raci*) from the earliest records in the available databases initially until the end of 2010. This was updated on multiple occasions and was finally extended until the end of June 2013 to ensure that the most up-to-date data informed all components of the overall research, from the initial to the later stages. This was refined to include only English language journal articles in the following categories: sociology, psychology, psychology social, psychology educational, psychology multidisciplinary, psychology applied, psychology experimental, psychiatry, social work, social sciences interdisciplinary, social issues, religion, and ethnic studies. The inclusion criteria for this review required studies to be: (i) empirical; (ii) quantitative; (iii) a questionnaire/interview with multiple forced choice options; (iv) the first published use of an explicit measure of racist attitudes and/or the development study of such a measure; and (v) a document other than a book, thesis/dissertation, or conference proceeding. A sequential process of examining the title, abstract, and main text of each article was undertaken, with documents excluded at each stage. The reference lists of all articles meeting the inclusion criteria, and those of previous conceptual and statistical review articles, were also examined for further relevant studies, which were in turn acquired and checked against the above inclusion criteria.
The process of inclusion is displayed in Figure 1 below.

Search Results
• Web of Knowledge databases initial search results: 13,940
• Web of Knowledge databases initial refinement search results: 12,904
• Web of Knowledge databases second refinement search results: 3,239

Exclusion Process
• Documents excluded after titles inspected: approximately 2,000
• Documents excluded after abstracts inspected: approximately 1,000
• Documents excluded after main text inspected: approximately 200

Additional Documents
• Additional documents included: approximately 75

Figure 1. Process of inclusion of articles in current review (final number selected 113).
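The counts in Figure 1 can be reconciled with the final number of selected articles through a simple tally. The sketch below is illustrative only: the variable names are hypothetical, and because the exclusion and addition counts are reported as approximations, the tally is expected to land near, rather than exactly on, the reported final figure of 113.

```python
# Counts as reported in Figure 1; exclusion and addition figures are approximate
refined_search_results = 3239
approx_excluded = {"titles": 2000, "abstracts": 1000, "main text": 200}
approx_additional_documents = 75
reported_final = 113

# Apply each exclusion stage in sequence
remaining = refined_search_results
for stage, count in approx_excluded.items():
    remaining -= count
    print(f"after {stage} inspected: about {remaining} documents remain")

# Add the documents identified outside the database search
estimated_final = remaining + approx_additional_documents
print(f"estimated final: about {estimated_final} (reported: {reported_final})")
```

The small gap between the estimated and reported totals is consistent with the rounding implied by the word “approximately” in the figure.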

Although racism, by definition, is the differential treatment of an individual or group due to racial group membership, instruments were included in the current review if they assessed racism but did not precisely conform to this definition. For example, the categories ‘Black’ and ‘White’ occur in several studies presenting measures of explicit ‘racist’ attitudes (e.g., Brigham, 1993; Katz & Hass, 1988), although neither ‘Black’ nor ‘White’ is a specific racial label. The category ‘White’ is predominantly used to denominate an individual of European origin. Conversely, the classification ‘Black’ is principally used in US studies to label an individual of African origin. The definition of a measure of explicit racist attitudes was expanded to include these various studies. Instruments assessing attitudes toward specific religious groups (e.g., Muslims, Jews, etc.) were also considered to assess racism for the purpose of the current review due to contemporary racialisation of religion. Conversely, instruments which presented scales only concerned with social distance (e.g., Bogardus, 1933) were excluded from the analysis. The decision to omit scales of this

type was made because, although social distance measures have traditionally been considered an accurate proxy measure of racist attitudes, they are inaccurate when utilised alone (e.g., Gallois, Callan, & Parslow, 1982; Jackman, 1973). Moreover, they do not necessarily reflect the racist attitudes of the participant. Social distance is also often difficult to quantify (Qian, 2002) and such items are no longer considered acceptable for use as the sole measure of racist attitudes (e.g., Byrnes & Kiger, 1988; Hagendoorn & Sniderman, 2001). Instruments which only assess the differential categorisation of negative and positive adjectives to distinct races (e.g., Blake & Dennis, 1943; Katz, 1933; Wolsko, 2000) were also discounted. These instruments principally utilise positively valenced personality traits such as ‘honest’ and ‘friendly’ and negatively valenced traits such as ‘dishonest’ and ‘mean’; participants subsequently note which traits are most common to distinct groups, with differential categorisation providing evidence of stereotypical beliefs. However, these tools do not measure a broad spectrum of racial attitudes and appear only to assess personality stereotypes, which may be held without underlying racist attitudes (Islam & Jahjah, 2001). Furthermore, most studies that have explicitly examined the stereotype-prejudice relationship have reported low correlations (Stephan & Stephan, 1996). Research focussing on specific aspects of racial attitudes, such as social distance and differential categorisation, has been increasingly uncommon, although some researchers continue to use these measures in tandem with more sophisticated instruments (e.g., Hagendoorn & Sniderman, 2001). This suggests a systemic improvement in the field of racism research towards more complex understandings of racist attitudes. Some instruments are presented in publications as modified scales and might be considered to be novel measures.
However, some ‘modified’ tools have retained the precise core construct measured by the initial scale, or have kept the original wording of items; these instruments were also omitted from the current review. For instance, Neville and Furlong

(1994, p. 373) presented “modified” scales, but the only modification was replication of the items with alternate ethnicities to the original scales. Similarly, Weigel and Howes (1985, p. 127) presented a “hybrid” scale of two previously published measures. Due to a lack of innovation and originality such instruments were also excluded from analysis to avoid analysing equivalent measures multiple times. Purpose-specific tools were also discounted from the current review. These included surveys designed to assess attitudes towards native English speaking Thais compared to non-native English speaking Thais in a teaching environment (Todd & Pojanapunya, 2009); orientation towards learning in college students (Pascarella, Edison, Nora, Hagedorn, & Terenzini, 1996); attitudes towards desegregation (Greeley & Sheatsley, 1971; Hyman & Sheatsley, 1956); attitudes towards promoting or firing a White or Black employee (Schaefer, 1975); and attitudes towards asylum seekers in Australia (Pedersen, Attwell, & Heveli, 2005). Although these instruments may capture information about individual racist attitudes, they were considered too specific for inclusion. Several publications also examined perceived racial threats (i.e., economic or social threat by another racial group) (e.g., Kinder & Sears, 1981; Stephan, Diaz-Loving, & Duran, 2000) or intergroup anxiety (e.g., Stephan & Stephan, 1992), which would be expected to be strongly related to explicit racist attitudes. However, these were also inappropriate for inclusion because they could only be utilised in the precise situations for which they were designed. In determining inclusion, therefore, only instruments that attempted to assess a consistent and broad attitude of racism were included. Tools requiring the administration of a semi-structured interview or of a video scenario (e.g., Brabeck et al., 2000; Sirin, Brabeck, Satiani, & Rogers-Serin, 2003) were also discarded, as these are more complex to administer and are not comparable to simple questionnaires.
For instance, the use of props to facilitate evaluation of racist attitudes, often

utilised in studies with children, has been criticised as assessing in-group preference rather than prejudice or racism (Nesdale, 2007). Measures of implicit racism, as used in Greenwald (1998), have been argued to measure a distinct automatic component of racist attitudes and therefore may not accurately quantify more consciously controlled explicit racist attitudes (Devine, 1989). There is abundant research on the implicit measurement of racist attitudes and perceived racism but the reliability and validity of these measures are questionable (Cunningham, Preacher, & Banaji, 2001; Hewstone et al., 2002). Further, the study of the explicit measurement of racist attitudes is a fast-growing body of research in its own right and was therefore the sole focus of the present review.

Results

Data regarding study characteristics were collated and basic descriptive information was calculated using Microsoft Excel 2010. Basic descriptive information comprising the characteristics of each study is displayed in Tables 1-3 below. Additional assessment of the adequacy of the presented measures is displayed in Table 4 following. A formal meta-analysis was not conducted due to the heterogeneity of reviewed studies. Publication bias (i.e., publication of only reliable or valid measures) was not formally assessed in this review. Nevertheless, it appears unlikely that bias was a significant issue due to the overall poor psychometric qualities of published measures.

Characteristics of Studies Documenting Explicit Measures of Racist Attitudes

The characteristics of studies documenting explicit measures of racist attitudes are discussed in the following section (see Table 1 below).

Year of publication. Studies of implicit and perceived racism have grown considerably in recent years (for reviews see Bastos, 2011; Harrell et al., 2003; Paradies, 2006b; Pascoe & Richman, 2009; Quillian et al., 2006). In contrast, almost half of the

67 measures of explicit racism were developed more than 30 years ago, with only approximately 5% of all measures published in the past four years. The number of published studies presenting a new measure of racist attitudes did not rise incrementally from 1985, despite increasing and widespread community interest in racism since this time. If the scientific community had managed to develop a perfect (or near perfect) instrument for measuring racist attitudes, the lack of accelerated growth in this area might be justified. However, as discussed below, no reviewed instruments met the minimum criteria for each subtype of reliability and validity, or enables universal assessment of racist attitudes across racial group, age, and gender. Study design. Explicit measures of racism have been utilised within various study designs. Experimental designs, the „gold standard‟ in scientific research and the only design able to establish a cause and effect relationship, only accounted for 9% of published research on new measures of explicit racist attitudes; a further 17% utilised quasi-experimental designs. Cross-sectional designs, least useful to attribute cause and effect relationships, but important for assessment of concurrent validity, were most common (83%). No development study utilised a longitudinal design. Some investigations relied upon both cross-sectional and experimental designs in alternate sections (e.g., Grossarth-Maticek, Eysenck, & Vetter, 1989; Ho & Jackson, 2001; Katz & Hass, 1988; McConahay, Hardee, & Batts, 1981), so ensuring development by the strongest study design, simultaneously enabling the attribution of cause and effect and assessment of concurrent validity. Where a research article presented results of two or more studies of different design, the instrument was categorised as each of the applicable study designs. Sampling procedure. 
The ideal sampling procedure in scientific research is a randomly selected representative sample of the population being studied. However, it is often impossible to utilise a nationally representative sample for the cost of implementing such

68 sampling procedures or inability to access specific populations (e.g., prisoners). Convenience sampling was predominantly used (68%), followed by representative sampling (36%). Where multiple studies utilising distinct sampling procedures were presented in a single report (e.g., Griffiths & Pedersen, 2009; Henry & Sears, 2002), the measure was categorised as utilising each of the applicable sampling procedures. In some research a probability sampling method was only utilised for a single city or state (e.g., Seltzer, Frazier, & Ricks, 1995), and in such cases, the study was considered to be convenience sampled. Sample size. Sample size is important for research which involves the utilisation of advanced analyses such as factor / principal components analysis, common analyses in measure development and critical to the confirmation of the factorial validity of novel scales. Total sample size was recorded for this variable. For example, Griffiths and Pedersen (2009) conducted three studies with different samples and so the individual sample sizes (15, 210, and 223) were summed to give the aggregate (458). The most common sample size for research presenting an explicit measure of racist attitudes was between 200 and 999 (42%). Few studies utilised total samples of less than 100 participants (14%), but many did not meet the minimum required sample sizes for factor analysis (i.e., 5-10 participants per item or 100200 participants for a 20 item scale; Tinsley & Tinsley, 1987). In addition, the larger the sample the more statistically powerful the research, due to the reduction of error in the statistical analysis (DeVellis, 2012). Region of study. As already noted research into implicit and perceived racism has traditionally and predominantly been conducted in the US. Most research included in the present review of explicit racist attitudes was undertaken in North America (66%). This contrasts with scales measuring perceived racism (96%; Bastos, 2011). 
Outside of the US, most published research was conducted in Australia (17%), Europe (14%), and Africa (3%). No measures published within Asia or South America met the criteria for inclusion. Consequently, the literature on which we base our understandings of racism and explicit racist attitudes is mostly US based and not necessarily generalisable to other cultures. Moreover, the majority of the world’s population has not had a measure designed and validated for use in their specific context.

Instrument administration. In circumstances in which the method of instrument administration was not explicitly stated (e.g., Brigham, 1977, 1993; Byrnes & Kiger, 1988; Gaertner & Dovidio, 1977; Woodmansee & Cook, 1967), all relevant information was assessed and attempts were made to identify which measures were self-report and which were interviews or similarly experimenter-administered tools. As with research into perceived racism (Bastos, 2011; Paradies, 2006b), the dominant instrument administration method to measure explicit racist attitudes was self-administration (72%). This may have enabled greater reporting of explicit racist attitudes, given that they may be underreported to an experimenter in person or via telephone due to the well-established effects of proximity on conformity to social norms (Milgram, 1963). Self-reporting may also reassure participants of anonymity and so encourage reports of negative attitudes, even if respondents feel uncomfortable holding such attitudes or beliefs. Moreover, the use of self-administered tools allows the researcher to reach a much wider audience, which might not be available in experimenter-administered circumstances.

Length of final instrument. When actual instruments were not included in the research report and the exact number of items was unclear (e.g., Hagendoorn & Sniderman, 2001), I attempted to contact the developer to request a copy of the instrument. If this was not forthcoming, I classified the length of the final instrument in the most appropriate category based on information contained in the report. Only one measure could not be so classified (i.e., Frenkel-Brunswik, 1948).
For instruments that included social desirability or other non-racism items (e.g., White & Abu-Raya, 2012), the total scale length only included those items relevant to the construct of racism. The ideal length of a scale depends on balancing how many items participants are willing to answer and the reliability of the overall scale. The majority of studies of measures of perceived racism contain a final instrument with nine items or fewer (Paradies, 2006b). Conversely, studies of measures of explicit racist attitudes most commonly consisted of 10-19 items (42%). As the number of items increases, so does the reliability of the scale (DeVellis, 2012). It is therefore integral to ensure that the reliability of the instrument is at an adequate level, while allowing the researcher to collect information quickly with minimal effort required from individual participants to complete the measure. As such, the optimal number of items may vary between participant groups and between scales; the ideal length of an instrument cannot be broadly prescribed. However, scales of fewer than ten items are arguably likely to have poor reliability; conversely, scales of more than 20 items may suffer from participant withdrawal or non-completion. While the latter is less probable if only one scale is administered, most studies included a battery of instruments of which the racism measure was one.
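The relationship between scale length and reliability noted above can be illustrated with the Spearman-Brown prophecy formula. This formula is a standard psychometric result but is not cited in the reviewed text, so it is introduced here as an assumption; the function name `spearman_brown` and the example reliability of .70 are hypothetical. The sketch also assumes that added or removed items are parallel to the existing ones, which real items only approximate.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict reliability after changing scale length by `length_factor`.

    Assumes the added (or removed) items are parallel to the existing ones,
    an idealisation that real items only approximate.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical 10-item scale with a reliability of .70
ten_item_reliability = 0.70
doubled = spearman_brown(ten_item_reliability, 2.0)  # lengthened to 20 items
halved = spearman_brown(ten_item_reliability, 0.5)   # shortened to 5 items

print(f"20 items: {doubled:.2f}")  # 20 items: 0.82
print(f"5 items: {halved:.2f}")    # 5 items: 0.54
```

Under these assumptions, doubling the hypothetical scale lifts its predicted reliability from .70 to about .82, whereas halving it drops the prediction to about .54. This is consistent with the caution above regarding scales of fewer than ten items.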

Table 1
Characteristics of Studies Documenting Explicit Measures of Racist Attitudes

Characteristic                        No. of studies    % of total studies

Year of publication
    < 1985                                  49                43.36
    1985-1989                                8                 7.08
    1990-1994                               11                 9.73
    1995-1999                               14                12.39
    2000-2004                               13                11.50
    2005-2009                               12                10.62
    2010-2013                                6                 5.31
Study design
    Cross-sectional                         94                83.19
    Longitudinal                             -                    -
    Quasi-experimental                      19                16.81
    Experimental                            10                 8.93
Sampling procedure
    Representative/probability              41                36.28
    Convenience                             77                68.14
Sample size
    n < 100                                 16                14.16
    100 ≤ n < 200                           18                15.93
    200 ≤ n < 1000                          47                41.59
    n ≥ 1000                                32                28.32
Region of study
    North America                           75                66.37
    Australia                               19                16.81
    Europe                                  16                14.16
    Africa                                   3                 2.65
    Other^a                                  -                    -
Instrument administration
    Self-administered                       81                71.68
    Experimenter administered               32                28.32
Length of final instrument^b
    < 10 items                              36                31.86
    ≥ 10 items; < 20 items                  48                42.48
    ≥ 20 items                              28                24.78

Note. ^a No published measures were developed in alternative regions. ^b Length of one instrument was unable to be determined (Frenkel-Brunswik, 1948).

The target groups and populations within studies documenting explicit measures of racist attitudes are discussed in the following section (see Table 2 below).

Instrument target population. A particular racial/ethnic group was considered to be the instrument target population if it was the sole group for which the measure was designed. If more than one racial/ethnic group was targeted, no specific group was targeted, or the same items were utilised for alternate target groups (e.g., Islam & Jahjah, 2001; White & Abu-Raya, 2012), the target group was considered to be mixed. As noted above, racism research has historically focused on the interactions between African and White Americans, and measures of racist attitudes have similarly concentrated predominantly on the attitudes of White towards African Americans. To date, more than one-third (39%) of all developed instruments have focused purely on attitudes toward individuals of African descent, usually African Americans. This is consistent with early understandings of racism as a unidirectional product of White toward Black interactions (Hyman & Sheatsley, 1956). As a result, a significant proportion of explicit racism measures are designed and validated for a small percentage of one country’s population, and so are of doubtful generalisability to alternate groups, cultures, or societies. More recently, racism has been acknowledged as multifaceted and not bound by the race of the perpetrator or the victim (Ponterotto & Pedersen, 1993). Therefore, a need exists for instruments that can be utilised across racial, ethnic, cultural, and religious groups; although almost one-third of current measures are designed to be used with multiple groups, this area demands additional attention.

Study target population.

Racial/ethnic group. A particular racial/ethnic group was considered to be the target if a specific group was recruited for the study. If more than one racial/ethnic group was recruited, the target population was considered to be mixed. Similarly, if no target was specified the sample was considered to be mixed, unless a target population could be inferred from the instrument or other research characteristics (e.g., Warner & DeFleur, 1969). As expected, the most abundant specific target population was identified as White (43%), limiting the generalisability of many racism research tools because they have been validated only for White Americans. However, a similar number of measures were tested using a target population of mixed racial/ethnic background (50%). As discussed above, it is important that explicit racist attitudes are examined by viewing any group or person as potentially either perpetrator or victim of racism for the results to be generalised across countries and cultures.

Target sample. Explorations of racism have principally relied upon samples of undergraduate university students due to the high accessibility of this population. Consequently, the majority of participants involved in racism research are within a restricted age bracket and results may not be generalisable to the wider population. This also applied to research into measures of explicit racist attitudes: 50% of studies relied entirely upon university students for their sample, although many utilised community samples (36%). A small number (7%) targeted both community and university students (e.g., Imhoff & Recker, 2012; Neville, et al., 2000).

Age. In circumstances in which the age of the participants was not reported, I attempted to infer the appropriate age bracket from other relevant information. For example, if a study stated that participants were university students, but did not report descriptive statistics for age (e.g., Brigham, 1993; Byrnes & Kiger, 1988; Gaertner & Dovidio, 1977; Katz & Hass, 1988; Lopez, Gurin, & Nagda, 1998), the participants were assumed to be adults aged 18-60 years, despite possible outliers. This allowed classification of the sample in the most appropriate and accurate manner available, rather than excluding any research which did not report descriptive statistics for age.

Likewise, in studies in which the general adult population was assumed to be tested but this was not explicitly stated (e.g., McClain et al., 2006; Pedersen, et al., 2004), the sample was considered to include all adults 18 years or older. Moreover, in studies that crossed two or more age brackets, such as children and adults (e.g., Parham & Helms, 1981; Ponterotto et al., 1995), all applicable age groups were considered to be the target sample. By a wide margin, the most studied population was adults aged 18-59 years (96%); consequently, abundant data exist for adults regarding explicit measures of racist attitudes. Conversely, few measures exist for children (3%) or adolescents (15%). Future research therefore needs to concentrate on developing instruments for these largely ignored age groups to ensure that we are able to obtain a comprehensive understanding of racism across the lifespan.

Gender. Where gender was not reported in the study descriptive statistics (Brigham, 1993; Steckler, 1957; Wittenbrink, et al., 1997), the gender of the participants was assumed to be mixed and of approximately equal proportion, although this may be inaccurate in earlier studies when women were less likely to attend higher education institutions. Similarly, in research such as McConahay et al. (1981), where multiple reported experiments involved a single gender only and another did not specify gender, the gender of participants was assumed to be male. Research on perceived racism has primarily been of mixed-gender samples (Paradies, 2006b). Similarly, research presenting measures of explicit racist attitudes has focused on mixed-gender samples (93%). This is the only appraised characteristic that does not need to be significantly improved to enhance the generalisability of explicit racism research.

Socio-economic status and education. Socio-economic status (SES) and education of the target sample were not evaluated due to the lack of presentation of relevant descriptive statistics in the vast majority of research. When this information was presented, it was generally of poor quality and precluded detailed analysis. This area should be investigated further, to ensure that the measures developed are valid across all levels of SES and education and to evaluate whether the instruments differentiate between distinct strata of SES and educational attainment.

Table 2
Target Groups and Populations of Studies Documenting Explicit Measures of Racist Attitudes

Characteristic                        No. of studies    % of total studies

Instrument target group
    African                                 44                38.94
    Asian                                    6                 5.31
    Indigenous                               6                 5.31
    Jewish                                   6                 5.31
    White                                    5                 4.42
    Arabic                                   4                 3.54
    Muslim                                   4                 3.54
    Other^a                                  4                 3.54
    Mixed                                   34                30.09
Study target population
  Racial/ethnic group
    White                                   49                43.36
    African                                  6                 5.31
    Other^b                                  2                 1.77
    Mixed                                   56                49.56
  Target sample
    University                              57                50.44
    Community                               41                36.28
    Schools                                  5                 4.42
    Other^c                                  2                 1.77
    Mixed                                    8                 7.08
  Age
    < 12 years                               3                 2.65
    ≥ 12 years; < 18 years                  17                15.04
    ≥ 18 years; < 60 years                 108                95.58
    ≥ 60 years                              43                38.05
  Gender
    Male only                                5                 4.42
    Female only                              3                 2.65
    Mixed                                  105                92.92

Note. ^a Alternate target groups examined include Latinos (Case, 2007), Moroccans (Siegman, 1961), and Americans (Perlmutter, 1954). ^b No published measures were developed with alternate racial/ethnic target populations. ^c Alternative target sample populations include seamen (Brophy, 1945) and women from housing projects (Wilner, 1952).

The scale psychometrics assessed in each study documenting an explicit measure of racist attitudes are discussed in the following section (see Table 3 below).

Assessment of factor/component structure. As discussed above, the assessment of the factor structure of any newly developed instrument is important to ensure that it conforms to the theoretical assumptions upon which the development of the measure has been based (DeVellis, 2012). Almost half of the studies reviewed (49%) conducted factor/principal components analysis on their data. Future research should ensure that the factor/component structure is assessed, due to its importance in determining whether the items conform to the theoretical position of the measure developer.

Assessment of reliability. Reliability in scale development implies that the instrument under evaluation performs consistently and predictably (DeVellis, 2012). Scales can be considered to be reliable if the items within the scale are reliable and they measure a common concept (DeVellis, 2012). Several forms of reliability may be assessed in different circumstances, and each provides different evidence for the accuracy and consistency of the assessed instrument. One major advantage of designing, developing, and utilising reliable instruments is that they are more statistically powerful than less reliable instruments. As with increasing sample size, this is because the more reliable the scale, the less error it contributes to the statistical analysis (DeVellis, 2012). Thus, research that utilises reliable measures is better able to detect smaller effects than research relying upon unreliable instruments. This point is especially salient when examining concepts such as attitudes, which often do not, and are not expected to, experience a large change throughout an assessment period.

Internal consistency. Internal consistency was considered to be assessed if either the internal consistency of the entire scale was evaluated, or the internal consistency of each factor/component of the instrument was assessed individually. Internal consistency reliability is one of the most common forms of reliability assessed in scale development, due to traditional measurement models which aimed to measure a single concept (DeVellis, 2012). Most reviewed research assessed the internal consistency of the presented measure (69%). Although this result is promising, all scales should have evaluated their internal consistency prior to dissemination. Future research needs to ensure that internal consistency is assessed for all measures of explicit racist attitudes.

Parallel forms. If two equivalent parallel, or alternate, forms of a scale exist, the correlation between the two forms can be examined after the completion of both by the same individuals, whether participants or raters (DeVellis, 2012). Assessment of this correlation is referred to as parallel forms reliability. In the majority of scale development studies, however, a parallel form of the scale was not available and parallel forms reliability could not be evaluated in the standard way. Split-half reliability of a single scale can nevertheless be evaluated as a pseudo-alternative. Examination of split-half reliability consists of dividing a set of items from a scale into two sub-sets and assessing the correlation between these two sub-sets (DeVellis, 2012). Although several reviewed studies provided split-half estimates (e.g., Frenkel-Brunswik, 1948; Gough, 1950; Neville, et al., 2000), for the purpose of the current review these were considered to be evidence of internal consistency rather than alternate forms reliability. Despite the importance of designing, developing, and validating a parallel form of an instrument, this process is rarely completed due to time and cost limitations. In racism research there are consequently no published and evaluated parallel forms of an explicit measure of racist attitudes.
Although Ash (1954) produced two parallel forms, only one form of the scale underwent validation; similarly, Stephan, Ybarra, Martinez, Schwarzwald, and Tur-Kaspa (1998) produced a single form validated across multiple target groups, but without assessment of parallel forms reliability. If an instrument of racist attitudes is to become globally widespread in the future, it is important to develop and assess a parallel form to ensure that individuals do not experience practice effects.

Inter-rater. In some instances scales are administered and scored by the experimenter, as opposed to self-report scales, which are completed by the participant. When an experimenter scores a scale it is important to ensure that different raters score the same items in an equivalent way when assessing each participant. This ensures that the scores being achieved on a scale reflect the characteristics of the observations and not the characteristics of the raters (DeVellis, 2012). Various studies (28%) used experimenter-administered questionnaires or interviews but none assessed inter-rater reliability. The results produced using these measures therefore cannot be conclusively attributed to the attitudes of participants rather than to the characteristics of the raters.

Test-retest. Test-retest reliability examines the constancy of scores over time (DeVellis, 2012). The chronological stability of scales is important when the instrument is utilised to compare the results of a group of individuals over two or more time points, including to measure actual change in the level of the construct being assessed. Few published articles assessed the test-retest reliability of the presented measure (12%); hence most measures could not be utilised to assess a change in attitudes without further evaluation.

Almost three-quarters of research assessed the reliability of the instruments in one way or another, but usually only the internal consistency of the instrument. Assessment of the internal consistency of a measure only assesses how well each item relates to each other item within the scale. For explicit measures of racist attitudes to be useful in the real world (i.e., to evaluate and enhance anti-racism programs), test-retest reliability must also be assessed and shown to be adequate.
Test-retest reliability is extremely important when assessing changes in attitudes, as unstable measures may indicate attitude change when there is no actual change in attitudes. Thus, the majority of explicit racist attitude instruments are unable to definitively evaluate change in racist attitudes over time. Supplementary tests of reliability, such as parallel forms and inter-rater reliability, were not assessed for any instrument and this should also be addressed in future research.

Assessment of validity. The underlying validity of a newly developed scale is exceedingly important, as this indicates how well the instrument measures what it purports to measure, including whether the variable under study is the underlying cause of the variation between items (DeVellis, 2012). Scale validity is inferred in three main ways: (1) examination of the non-statistical development and selection of items, (2) examination of the scale’s ability to predict definite events, and (3) examination of the scale’s relationship to other constructs. The majority of reviewed research assessed at least one type of validity (96%). However, few studies assessed a range of types of validity for the measures presented.

Content. Content validity refers to the adequacy of the items contained within the scale in reflecting the conceptual construct being measured (DeVellis, 2012). In a scale assessing a finite number of potential items, content validity is achieved by randomly choosing items from the pool of total items. However, this characteristic is not easily assessed in scales designed to measure complex concepts such as attitudes. One common way to evaluate the content validity of such constructs is to have the scale reviewed by an expert (or experts) in the field of study (DeVellis, 2012). Alternatively, statements may be produced via initial qualitative research and subsequently selected by the research team (e.g., Orpen, 1971). This dual evaluation allows multiple viewpoints to examine the draft measure and leads to the (1) deletion of items which are least relevant, (2) addition of items relating to parts of the construct that may have been overlooked initially, and (3) revision of items that are unclear.
Content validity is assessed to some extent in all scale development studies, as it is illogical to base the items contained in an instrument on a construct unrelated to the one purported to be measured. Nonetheless, the direct and targeted evaluation of content validity was only explicitly addressed in a few studies (19%). Most of these studies expressly assessed the content validity of the instrument’s items utilising the opinions of two or more researchers outside of the core research group. Content validity was among the least frequently assessed (and reported) categories of validity. A significant increase in studies explicitly evaluating content validity is required to ensure that the instruments being developed are sampling the appropriate content area.

Criterion-related. Criterion-related validity requires the examination of the empirical relationship between the scale being assessed and a criterion (DeVellis, 2012). The two major categories of criterion-related validity are concurrent validity, which includes discriminant validity and convergent validity, and predictive validity. Criterion-related validity was considered to be evaluated in studies that assessed any one type of criterion-related validity, as explained below.

Predictive. Predictive validity is the extent to which a test score is able to predict a score on a criterion measure assessed in the future (Salkind, 2006). Examination of predictive validity was conducted in only a minority of studies (19%). The restrictions on the assessment of predictive validity are often difficult to overcome in the real world of scientific research, where budgets are limited and sufficient numbers of participants are challenging to recruit. Unlike other types of validity that can be assessed at one time point, the assessment of predictive validity requires participants to be evaluated at multiple time points. Thus the minimum requirement is to allow for at least two periods of testing with participants willing to be assessed at least twice. The assessment of predictive validity of the explicit measures of racist attitudes was limited by the fact that most of the reviewed studies employed purely cross-sectional designs.

Concurrent. Concurrent validity examines how well the measure under examination is related to a criterion that is measured at the same time. Concurrent validity was considered to be evaluated in studies in which any one type of concurrent validity (i.e., convergent or discriminant) was assessed, as explained below.

Convergent. In the case of convergent validity the strength of the relationship between the instrument being developed and another theoretically related concept is examined, with the expectation that they will be significantly positively correlated (DeVellis, 2012). Convergent validity can also be assessed by evaluating two theoretically similar groups, with the expectation of similar results among the two groups (i.e., that the results of the two groups on the construct of interest will converge). Convergent validity was assessed in most studies (74%). However, one major problem with the examination of convergent validity in the reviewed research was the potential confounding of the examined relationships. For instance, Brigham (1993), when developing the items for both the Blacks’ Attitude Towards Whites scale (ATW) and the Whites’ Attitude Towards Blacks scale (ATB), drew upon several previously created scales of racism (e.g., Multifactor Racial Attitude Inventory (MRAI): Brigham, Woodmansee, & Cook, 1976; Symbolic Racism Scale (SRS): Kinder & Sears, 1981; Modern Racism Scale (MRS): McConahay, et al., 1981). Initially, this approach seems logical and valid. However, when validity is evaluated by inspecting correlations between the newly developed measures (ATW and ATB) and the older scales (MRAI; MRS; SRS), major confounds potentially increase the correlations between the novel and established measures. Consequently, the apparent validity of the developed measure is inflated and may be considered to be adequate when it merely reflects the relationship between similar and sometimes identical items (e.g., “Over the past few years, blacks have gotten more economically than they deserve” from ATB scale cf. “Over the past few years, blacks have gotten more than they deserve” from SRS). Despite this concern, it is important for measures under development to have their convergent validity assessed. It may be more beneficial to examine the relationships between related constructs, such as far right-wing political orientation (e.g., Griffiths & Pedersen, 2009), rather than relying on older measures of an identical construct. This is especially important when purportedly redundant measures have been utilised in the development stages of a novel instrument.

Discriminant. Discriminant validity assesses the existence of a relationship between the measure under examination and a theoretically unrelated concept, with the expectation that the two constructs will be either non-significantly correlated or significantly negatively correlated (DeVellis, 2012). Discriminant validity can also be assessed by evaluating two theoretically dissimilar groups, with the expectation of divergent results among the two groups (i.e., that the construct of interest will discriminate between the results of the two groups). Discriminant validity was assessed in more studies than any other type of validity (75%). As with convergent validity, the examination of the relationship between measures of different, and purportedly unrelated, concepts provides essential information about the validity of a newly developed instrument. Unlike convergent validity, however, there is less chance of confounds inflating correlations.

Table 3
Psychometrics Assessed in Studies Documenting Explicit Measures of Racist Attitudes

Psychometric property                     No. of studies    % of total studies

Assessment of factor/component structure
    Yes                                         55                48.67
    No                                          58                51.33
Assessment of reliability
    Internal consistency                        78                69.03
    Test-retest                                 13                11.50
    Parallel forms                               -                    -
    Inter-rater                                  -                    -
    None                                        28                24.78
Assessment of validity
    Content                                     21                18.58
    Criterion-related (overall)                 88                77.88
        Predictive                              22                19.47
        Concurrent (overall)                   105                92.92
            Convergent                          84                74.34
            Discriminant                        85                75.22
    None                                         4                 3.54

Note. Reliability and validity categories are not mutually exclusive.

85 Results of Studies Documenting Explicit Measures of Racist Attitudes The following section presents an evaluation of the statistical adequacy of instruments based on the assessment of the subtypes of reliability and validity presented in the published studies (see Table 4 below). To ensure that the data analyses offered below are objective, I have relied upon recommended cut-offs and definitions regularly utilised throughout the scientific research methods and scale development literature (e.g., Cohen, 1988; DeVellis, 2012; Nunnally, 1978; Salkind, 2006). Factor/component structure. There is often a large variance in the interpretation of factor/principal components analyses and a lack of firm guidelines for assessing the factorial structure of any instrument. Due to the inherent subjectivity in evaluating the factor/component structure of newly developed scales, the structure of an instrument was considered to be adequate if it had been assessed and if the authors had declared the final instrument to be appropriate based on pre-determined criteria. Conversely, if the authors had stated that the structure was inappropriate and they had subsequently failed to remove, add, or re-write, specific items to improve the measure, it was deemed to be inadequate. If no factor/principal components analysis had been conducted, the structural adequacy of the scale was considered unknown. Of instruments that had their factor structure evaluated (49%), few failed to justify the final structure adequately (4% of reviewed studies), and 42% of studies presented final measures with appropriate factor structure. However, as the examination of factor/component structure and subsequent improvement of instruments is necessary to ensure that a newly developed scale is appropriate, this needs to become more frequently assessed in future research. Instrument reliability.

Internal consistency. Internal consistency was considered to be adequate if the internal consistency of the entire scale was evaluated and was above the minimum accepted criterion outlined below. It was also judged to be adequate if each factor/component of the instrument was assessed individually and the average across factors/components was above the minimum accepted criterion (e.g., Wittenbrink, et al., 1997). Internal consistency is conventionally assessed by examination of Cronbach's (1951) coefficient alpha (α). Cronbach's alpha is equivalent to the average of all possible split-half reliabilities; therefore, split-half reliability was also considered to demonstrate internal consistency. The generally accepted minimum level of α for use in scale development is .70 (DeVellis, 2012; Nunnally, 1978). DeVellis (2012) additionally suggests that a level of .80-.90 is very good; that above .90 is too high and shortening the scale should be considered; and that above .85 is adequate for group comparison. Most reviewed studies (63%) reached the widely accepted minimum criterion for internal consistency reliability and few (7%) failed to satisfy it. As with the examination of factor/component structure, the great majority of the studies that assessed internal consistency presented final measures with adequate reliability. Nonetheless, as the examination of internal consistency and subsequent improvement of instruments is also necessary to ensure that a newly developed scale is appropriate, internal consistency needs to continue to be widely assessed in future work.

Test-retest. Test-retest reliability was considered to be adequate if it met the minimally accepted level of .70 (Loewenthal, 2001; Ponterotto, 1996; Salkind, 2006). The main purpose of developing measures of explicit racist attitudes is to allow for the assessment of changes in such attitudes. For instance, tools may be administered before and after an anti-racism initiative to evaluate whether the program has been effective in reducing racist attitudes and should therefore be more widely disseminated.
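As an illustration only, the two reliability checks described above can be sketched in Python. The sketch computes coefficient alpha from item-level responses and a test-retest correlation from total scores at two time points, then compares each with the .70 minimum. All data, function names, and variable names are hypothetical and are not drawn from any reviewed study; the Pearson correlation is assumed here as the test-retest statistic.

```python
from statistics import mean, variance

MINIMUM_RELIABILITY = 0.70  # cut-off cited in the text for alpha and test-retest


def cronbach_alpha(responses):
    """Coefficient alpha for rows of respondents and columns of items.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    item_variances = [variance(item) for item in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    mean_1, mean_2 = mean(first), mean(second)
    covariation = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    spread_1 = sum((a - mean_1) ** 2 for a in first) ** 0.5
    spread_2 = sum((b - mean_2) ** 2 for b in second) ** 0.5
    return covariation / (spread_1 * spread_2)


def is_adequate(coefficient, minimum=MINIMUM_RELIABILITY):
    """True when a reliability coefficient meets the minimum criterion."""
    return coefficient >= minimum


# Hypothetical data: five respondents answering a three-item scale.
responses = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 5]]
alpha = cronbach_alpha(responses)

# Hypothetical total scores for the same respondents at two time points.
time_1 = [4, 7, 10, 13, 15]
time_2 = [5, 7, 9, 14, 15]
retest_r = pearson_r(time_1, time_2)

print(f"alpha = {alpha:.2f}, adequate: {is_adequate(alpha)}")
print(f"test-retest r = {retest_r:.2f}, adequate: {is_adequate(retest_r)}")
```

With real data, a coefficient above .90 would also prompt the question raised by DeVellis (2012) of whether the scale could be shortened.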

Attitude change is understood as a complex interaction of many variables and often takes significant time and effort (Forgas, et al., 2010). Moreover, the more entrenched the attitude, the more difficult it is to change. Consequently, instruments that measure strongly held concepts, such as the racist attitudes of an individual, need to be demonstrably stable over time. For those instruments that were appropriately assessed, the evaluation of test-retest reliability generally indicated that the explicit measures of racist attitudes were relatively stable. However, almost 90% of the measures presented were not evaluated for temporal stability.

Parallel forms. No reviewed publication developed, presented, and evaluated an alternate form of a measure of explicit racist attitudes. Therefore, parallel forms reliability was not examined. This lack of evaluation is inadequate, as discussed above, and should be addressed in future investigations.

Inter-rater. Although several studies utilised experimenter assessment methods for evaluating explicit racist attitudes, no study examined the inter-rater reliability of the presented measures. Therefore, inter-rater reliability was not examined in this analysis. Again, this lack of evaluation is inadequate, as discussed above, and should be improved in forthcoming inquiries.

Instrument validity. For an instrument to be considered generally valid, it must have presented an adequate assessment of at least two types of reliability (i.e., two of internal consistency, test-retest, parallel forms, and inter-rater) and two sub-types of validity (i.e., two of content, predictive, convergent, and discriminant). Similarly, an instrument was considered generally invalid if it failed to meet minimum criteria for two types of reliability and validity. Due to the unknown nature of the underlying latent factor structure of racism and the varying conceptualisations of racist attitudes in these measures, acceptable assessment of factor structure was not considered to be a core criterion for the measure to be valid in this review.
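The general validity criterion described above is a simple counting rule, which can be sketched as follows. The set names and the function name are hypothetical; the sketch covers only the rule for classifying an instrument as generally valid, not the rule for classifying it as generally invalid.

```python
# Property names follow the categories used in this review.
RELIABILITY_TYPES = {"internal consistency", "test-retest", "parallel forms", "inter-rater"}
VALIDITY_SUBTYPES = {"content", "predictive", "convergent", "discriminant"}


def is_generally_valid(adequate_properties):
    """True when at least two reliability types and at least two
    validity sub-types were adequately assessed.

    adequate_properties: set of property names judged adequate.
    """
    reliability_count = len(adequate_properties & RELIABILITY_TYPES)
    validity_count = len(adequate_properties & VALIDITY_SUBTYPES)
    return reliability_count >= 2 and validity_count >= 2


# Hypothetical instrument with two adequate reliability types and
# two adequate validity sub-types.
example = {"internal consistency", "test-retest", "convergent", "discriminant"}
print(is_generally_valid(example))
```

An instrument with adequate internal consistency alone, however strong its validity evidence, would not satisfy this rule, which is why so few reviewed measures meet it.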

Only five of the measures reviewed (4%) met this general validity criterion: Neville et al. (2000) (CoBRAS: 20-item measure of colour-blind racial attitudes developed with US university and community participants); Imhoff and Banse (2009) (29-item measure of explicit anti-Semitism developed with German university participants); Wang et al. (2003) (SEE: 31-item measure of ethnocultural empathy developed with US university participants); and two measures published by Siegman (1961) (NP and AP scales: respectively a 19-item measure of prejudice against Moroccan immigrants and a 17-item measure of prejudice against the local Arab population, developed with Israeli university participants).

Content. Examination of content validity is difficult to categorise as adequate or inadequate, due to the individual differences in expectations held by distinct experimenters. Therefore, if the validity of the items was expressly assessed and considered appropriate by more than one researcher, the content validity of the instrument was judged to be adequate. Appraisal of content validity is important to ensure that all items are sampling the construct of interest. Moreover, each newly developed measure must have its items evaluated and, if the content of the construct does not appear to be adequately assessed by the instrument, items must be removed, reworded, or added as appropriate. Due to the need to evaluate content validity in the initial stages of development for any tool, content validity was probably assessed in all reviewed studies to some extent, but remained unreported. Nevertheless, the procedure for the assessment of content validity needs to be expressly stated in research reports, so that other investigators are able to evaluate whether the instrument content has been evaluated sufficiently rigorously prior to dissemination.

Criterion-related. Overall assessment of criterion-related validity was categorised as adequate if two of the three types of criterion-related validity were assessed and considered to be satisfactory. Most studies (57%) demonstrated satisfactory criterion-related validity. Evidence for criterion-related validity shows that the instrument under scrutiny measures the underlying construct purported to be measured. In almost half of the reviewed studies, either the evidence presented indicated that the instrument did not adequately measure racist attitudes, or it was unknown whether it did. The assessment of at least two subtypes (and preferably all three subtypes) of criterion-related validity would allow much greater confidence in determining whether the instrument actually measures what it is purported to measure. It is essential that future racism research address this concern.

Predictive. Whether the predictive validity of a measure was categorised as adequate depended on the expected relationship between the measure and the concept it was predicting. For those constructs that were expected to be positively related to the instrument, the same principles for determining adequacy were used as detailed below for convergent validity. Conversely, for those constructs that were expected to be non- or negatively related to the instrument, the same principles for determining adequacy were used as detailed below for discriminant validity. Predictive validity was assessed in only 19% of the studies, but 71% of these were able to demonstrate the predictive value of their instrument (see Table 4). Although attitudes do not necessarily predict behaviour precisely, there is a definite link between the two (Forgas, et al., 2010). The ability of a measure of racist attitudes to predict future events is therefore vital. More than one-quarter (29%) of instruments that had their predictive validity assessed failed to meet minimum criteria. There is thus doubt that instruments in which predictive validity was not directly assessed are capable of accurately predicting future outcomes.

Concurrent. Similarly, the overall assessment of concurrent validity was categorised as adequate if one of the two categories of concurrent validity was assessed and considered to be satisfactory. Most studies (85%) demonstrated adequate concurrent validity.
This establishes some solid evidence that the measures reviewed are, as purported, actually measuring racist attitudes. However, future research needs to ensure that both types of concurrent validity are assessed to strengthen the evidence that the tool is tapping into an individual's racist attitudes.

Convergent. More specifically, the assessment of the convergent validity of the measures was categorised as adequate in two circumstances. First, if the measure was compared to another measure of the same, or a highly related, concept, the expectation was that the measures would have a minimum significant positive correlation of .50. Secondly, if the measure was compared to a measure of a different, but moderately related, concept, the expectation was that the measures would have a minimum significant positive correlation of .30. However, these were not firm guidelines, as subjective judgement was required about the importance and the probability of examined relationships, and further consideration about the prominence of each examined relationship was necessary if multiple conflicting correlations were presented. The same applied for group comparisons. There are no specific guidelines available for these results to be assessed, but the magnitude of these correlations was based on Cohen's (1988) classifications of correlations as 'large' if above .50 and 'medium' if above .30. The results for convergent validity were perhaps the most positive validity results examined. Seventy per cent of studies demonstrated adequate convergent validity. This provides good evidence that the evaluated instruments are measuring what they purport to measure. However, as stated above, some of these correlations may have been artificially inflated due to the recycling of similar items across instruments. Despite this, the evaluation of convergent validity should continue along current lines, with one improvement being the use of tools measuring similar, but distinct, constructs to evaluate convergent validity.

Discriminant. Discriminant validity of the measures was categorised as adequate if the tool was compared to another measure of a distinct concept and the resulting correlation was either non-significant, or significant and negative. The same applied for group comparisons. No definitive guidelines were available for the interpretation of discriminant validity, and I therefore set the aforementioned criteria to enable the assessment of discriminant validity objectively. A large proportion of the studies that evaluated the discriminant validity of the survey in question showed adequate validity (67%). The assessment of discriminant validity can be difficult to classify. For example, it is not always clear which constructs are expected to be negatively related to the construct under study and which are assumed to have no relationship or, alternatively, which groups can be identified as distinct enough on the variable of interest to display discriminant scores on the instrument. Regardless, as discriminant validity provides strong evidence for the validity of the instrument under scrutiny, it is important that it is assessed and assessed well.

Region. A tool was considered to be valid across region if at least one examination of the measure was conducted in more than one region. Moreover, the evaluations of the measure in two or more regions must have shown the instrument to be valid. As no studies reviewed here were conducted in more than one region and shown to be valid, no current measure of explicit racist attitudes can be categorised as valid across regions. A direction for future research is to validate such measures using multi-country studies to ensure their generalisability regionally and globally.

Racial/ethnic group. Validity across racial/ethnic group was considered to be adequate if the target sample comprised more than three distinct racial/ethnic groups and the predominant group consisted of less than 85% of the total sample. Alternatively, if different sub-samples each consisting of a single racial/ethnic group were examined, the instrument was considered to be valid across racial/ethnic group. Additionally, the measure must have been deemed valid in each examination. Only three measures (3%) were demonstrably valid across racial/ethnic group. As it is now accepted that measures applicable to several racial/ethnic groups are required, this result is disappointing and needs to be addressed.

Age. A measure was deemed valid over the lifespan if the total sample (or subsamples) bridged children (< 12 years), adolescents (12 to < 18 years), adults (18 to < 60 years), and late adults (> 60 years). Few studies provided age-related descriptive statistics; therefore, a measure tested on a sample spanning each of these age groups was considered to be valid in each of them, regardless of the proportion of the sample taken from each age bracket. For the measure to be considered valid, the assessment of validity must have been judged to be appropriate. Again, future research needs to address whether the measures being developed are flexible in administration and use. Although no measure was tested across every age group, Ponterotto et al. (1995), Quillian (1995), and Pettigrew and Meertens (1995), in research with US, European, and European samples respectively, covered adolescents, adults, and late adults, that is, all but the youngest age group.

Gender. Measures were considered to be valid across gender if the sample consisted of at least 30% of the minority gender, or if the measure was assessed in two different single-gender samples. Once more, the instrument must have been judged to be valid, as described above, in each of these samples. Of the five studies that achieved the general validity criterion, four (4% of all reviewed studies) were assessed as valid across gender. As most instruments were implemented across genders, this statistic would increase if the overall validity of the instruments were enhanced.
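The convergent and discriminant criteria set out earlier in this section reduce to threshold checks on a correlation and its significance, which the following Python sketch illustrates. The function names and the "same"/"moderate" labels are hypothetical, and the sketch deliberately omits the subjective judgement that was applied when studies reported multiple conflicting correlations.

```python
# Minimum positive correlations, following Cohen's (1988) 'large' and
# 'medium' classifications as applied in this review.
CONVERGENT_MINIMUM = {
    "same": 0.50,      # same or highly related concept
    "moderate": 0.30,  # different but moderately related concept
}


def convergent_adequate(r, significant, relatedness):
    """Adequate when the correlation is significant, positive, and at
    least the minimum for the stated degree of relatedness."""
    return significant and r >= CONVERGENT_MINIMUM[relatedness]


def discriminant_adequate(r, significant):
    """Adequate when the correlation with a distinct concept is either
    non-significant, or significant and negative."""
    return (not significant) or r < 0


# Hypothetical correlations reported in a validation study.
print(convergent_adequate(0.55, True, "same"))      # highly related measure
print(convergent_adequate(0.40, True, "moderate"))  # moderately related measure
print(discriminant_adequate(0.10, False))           # distinct concept
```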

Table 4
Psychometric Adequacy of Studies Documenting Explicit Measures of Racist Attitudes

                          Adequate   % of total   Inadeq.   % of total   Unknown   % of total
                                        studies                studies                studies
Factor structure                48        42.48         5         4.42        60        53.10
Instrument reliability
  Internal consistency          71        62.83         8         7.08        34        30.09
  Test-retest                   12        10.62         1         0.88       100        88.50
  Parallel forms                 -            -         -            -       113       100.00
  Inter-rater                    -            -         -            -       113       100.00
Instrument validity
  Content                       21        18.58         -            -        92        81.42
  Criterion-related^a           64        56.64        10         8.85        39        34.51
  Predictive                    15        13.27         6         5.31        92        81.42
  Concurrent^b                  96        84.96         9         7.96         8         7.08
  Convergent                    79        69.91         4         3.54        30        26.55
  Discriminant                  76        67.26        11         9.73        26        23.01
Validity across groups
  Region                         -            -         -            -       113       100.00
  Racial/ethnic group            3         2.65         -            -       110        97.35
  Age                            -            -         -            -       113       100.00
  Gender                         4         3.54         -            -       109        96.46

Note. Inadeq. denotes inadequate psychometric standard. ^a Denotes overall criterion-related validity. ^b Denotes overall concurrent validity.
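The percentages in Table 4 follow directly from the counts and the total of 113 reviewed studies. As a minimal sketch (function and variable names are hypothetical), the conversion, together with the share of adequately rated studies among only those studies that assessed a property, can be computed as follows.

```python
TOTAL_STUDIES = 113  # each row of Table 4 sums to this total


def percentages(adequate, inadequate, unknown, total=TOTAL_STUDIES):
    """Convert one row of counts into percentages of all reviewed studies."""
    if adequate + inadequate + unknown != total:
        raise ValueError("counts do not sum to the total number of studies")
    return tuple(round(100 * n / total, 2) for n in (adequate, inadequate, unknown))


def adequate_share_of_assessed(adequate, inadequate):
    """Percentage rated adequate among studies that assessed the property."""
    return round(100 * adequate / (adequate + inadequate), 2)


# Counts taken from the factor structure and predictive validity rows.
factor_structure = percentages(48, 5, 60)
predictive = percentages(15, 6, 92)
predictive_share = adequate_share_of_assessed(15, 6)

print(factor_structure, predictive, predictive_share)
```

Reporting both figures side by side distinguishes how often a property was assessed from how often it was found adequate once assessed.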

Results of Most Cited Studies Documenting Explicit Measures of Racist Attitudes

The general characteristics, reliability, and validity findings for the top ten measures presented in the most cited articles (according to Web of Science) are presented in Table 5 below. The top ten most cited measures perform relatively well compared to the majority of existing measures. Nonetheless, their psychometric characteristics are, overall, unsatisfactory and none could be recommended as a well validated and reliable measure of explicit racist attitudes. Indeed, none of the most popular scales were shown to be generally valid, highlighting the need for further research in this area to design, construct, and validate a scientifically robust measure of explicit racist attitudes.

Table 5
Characteristics and Adequacy of Most Cited Studies Documenting Explicit Measures of Racist Attitudes

Pettigrew and Meertens (1995)

Quillian (1995)

Wittenbrink et al. (1997)

McConahay, et al. (1981)

476

393

Anti-Immigrant and Racial Prejudice Index

388

318

Times cited

Measure presented

Short name Region

Subtle Prejudice Scale

Blatant Prejudice Scale

-

-

-

Europe

Europe

Europe

Brigham (1993)

149

Blacks' Attitude Towards Whites Scale

Whites' Attitude Towards Blacks Scale

Diversity Scale

Discrim. Scale

Scale of Modern Racism

Scale of Old Fashioned Racism

-

-

-

-

ATW

ATB

North America African 18-60 4 ?




North America White 18-60 20 +

North America African 18-60 20 +

Henry and Sears (2002) 117 The Symbolic Racism 2000 Scale the SR2K scale North America African 18-60 8 +

Target racial group       Mixed  Mixed  Mixed
Age group (years)         12+    12+    12+
No. of items              10     10     7
Factor structure          +      +      +
Instrument reliability
  Internal consistency    +  +  +  +  +  ?  ?  +  +
  Test-retest             ?  ?  ?  ?  ?  +  +  ?  ?
  Parallel forms          ?  ?  ?  ?  ?  ?  ?  ?  ?
  Inter-rater             ?  ?  ?  ?  ?  ?  ?  ?  ?
Instrument validity
  Content                 ?  ?  ?  ?  ?  ?  ?  +  +
  Criterion-related^a     +  +  +  +  +  ?  ?  +  +
  Predictive              ?  ?  ?  ?  ?  ?  ?  ?  ?
  Concurrent^b            +  +  +  +  +  +  +  +  +
  Convergent              +  +  +  +  +  ?  ?  +  +
  Discriminant            +  +  +  +  +  +  +  +  +
Validity across groups    ?  ?  ?  ?  ?  ?  ?  ?  ?

Note. Discrim. denotes Discrimination. + denotes adequate psychometric properties. – denotes inadequate psychometric properties. ? denotes unknown psychometric properties. ^a Denotes overall criterion-related validity. ^b Denotes overall concurrent validity.

+ ? ? ? ? + ? + + + ?

General Discussion

There are many problems with studies measuring explicit racist attitudes. Most investigations presenting the development of instruments of explicit racist attitudes failed to adequately assess both the reliability and validity of the presented measures. Moreover, when tests of reliability and validity were explored, the tools often failed to meet the minimum standards expected of scientific attitude scales. Although most studies directly assessed at least one type of validity, when the assessment of validity was scrutinised further, fewer than 5% of studies addressed a sufficient range of reliability and validity for the instrument to be considered valid.

Limitations

One limitation of the current review is that only initial validation studies of measures of explicit racist attitudes were evaluated. Some of the reviewed measures may have been more adequately appraised and refined in subsequent research, which is not reflected in the present analyses. For example, Ard and Cook (1977) presented a measure that had undergone several iterations after initial presentation in Woodmansee and Cook (1967). The choice was made to review only initial presentations of measures to ensure that measures that had become popularly studied after their original publication were not advantaged over other instruments that had not been widely examined outside of the preliminary validation study. The intensive and rigorous examination of the reliability and validity of a newly developed measure should occur prior to its use and dissemination to the scientific community. Hence, it is more appropriate to evaluate only research that first presents an instrument.

The exclusion of doctoral theses, conference papers, books, and other unpublished material from the current review may also have limited the accuracy and breadth of the results. Rigorously evaluated measures of explicit racist attitudes, with demonstrated high reliability and validity, may exist and have been presented in these 'grey literature' document forms. However, retrieving theses and conference proceedings is problematic because not all documents are available for retrieval and review. I am confident, however, that all relevant peer reviewed journal articles were identified and reviewed.

The exclusion of non-standard questionnaire measures of explicit racist attitudes further limits the results of the present review. For example, research studies that utilised social distance measures, feelings thermometers, or differential categorisations to evaluate racist attitudes were considered to be inappropriate for inclusion in the current research, for reasons discussed above. Each of these measures may have added meaningful data to the present review, but to ensure the uniformity, and so comparability, of results, studies were included only if they presented measures that utilised a standard questionnaire/interview format with multiple forced choice options. Moreover, such instruments are much more capable of accurately and meaningfully measuring explicit racist attitudes.

There is no standard or prescribed approach to assessing or evaluating a newly developed scale. Therefore, the evaluation of various instruments from several regions, published across several decades, is difficult to summarise in a single review. Moreover, the objective evaluation of these scales based on the criteria utilised in the current review does not necessarily clarify which scales are better, or more useful, than any other. Although I attempted to view all instruments objectively and apply unbiased criteria to evaluate each, assessing every tool in this way may not provide the most accurate or useful outcome.

Conclusion

Despite limitations, in this review I have presented a comprehensive examination of the explicit racism literature. As well as describing the basic structure of the available measures of explicit racist attitudes, I have revealed the strengths and significant weaknesses that need to be addressed in future research.
The field of racism research is well established, with over 100 instruments assessing explicit racist attitudes and 24 evaluating perceived racism available to the scientific community (Bastos, 2011). In many circumstances newly developed measures undergo an arduous scientific scale development and validation process, resulting in comparable instruments. Yet, new instruments are often similar to existing measures and add little to the field of racism research. Future studies may therefore be better served by addressing the limitations of the existing measures of explicit racist attitudes through modification and subsequent validation, rather than by designing and developing redundant instruments. Conversely, if scales must be developed because no appropriate measure exists, for instance for use in countries outside of the US, several key points must be addressed.

A review of examinations of perceived racism and health highlights “a dearth of cohort studies, a lack of psychometrically validated exposure instruments, poor conceptualization and definition of racism, [and] conflation of racism with [other variables]” and proposes the need for “large-scale survey vehicles as well as longitudinal studies and studies involving children” (Paradies, 2006b, p. 888). Another suggests the inability to draw “causal conclusions about the relationship between perceived discrimination and physical or mental health because of the cross-sectional designs of most of the research in this area” (Pascoe & Richman, 2009, p. 545). In contrast, a more recent review of measures of perceived racism suggests acceptable psychometric properties for existing instruments, but additionally notes that “despite the fact that discrimination stands as topic of international relevance, 23 (96%) scales were developed within the United States” and advocates for a “universal instrument which would permit cross-cultural adaptations” (Bastos, 2011, p. 4). Similar conclusions can be drawn from the preceding analyses.
The development and subsequent validation of measures of explicit racist attitudes must, first and foremost, be increased to meet the growing demands of the complex and diverse communities around the globe. Such research needs to be conducted with wider samples, and the measures must undergo, and pass, robust examinations of their reliability and validity prior to their dissemination to the scientific community. More studies that use both experimental and cross-sectional designs to evaluate instruments throughout their development are needed to improve the quality of instruments. Similarly, future research needs to validate explicit measures of racist attitudes utilising representative sampling to improve the generalisability of the results. Forthcoming research needs to acknowledge that instruments must be validated on the populations for which they are designed, and should therefore utilise community samples more often. Instruments must undergo additional testing to validate them appropriately before they can be utilised to evaluate levels of explicit racist attitudes. Specifically, despite the restrictions in the assessment of predictive validity, future research should ensure that it is appropriately evaluated. These explorations must also ensure that measures are temporally stable and can therefore be used to assess attitude change. These actions need to be taken in all prospective investigations to ensure that measures are reliable, valid, useful, and widely generalisable. Without instruments possessing these essential characteristics, research that aims to reduce racist attitudes will be ineffective and anti-racism initiatives will continue to elude accurate evaluation.

Chapter 4: Submitted Research Articles

Chapter 4.1: “Just a Joke”: Young Australian Understandings of Racism
Chapter 4.2: Developing the Australian Racism, Acceptance, and Cultural-Ethnocentrism Scale (RACES)
Chapter 4.3: Validating the Australian Racism, Acceptance, and Cultural-Ethnocentrism Scale (RACES): Item Response Theory Findings
Chapter 4.4: Building Harmony: Racism Reduction in Australian Schools
Chapter 4.5: Is there a Relationship between Psychopathic Traits and Racism?

Monash University

Declaration for Thesis Chapter 4.1

Declaration by candidate

In the case of Chapter 4.1, the nature and extent of my contribution to the work was the following:

Nature of contribution: Development, conceptualisation, conduct of research; collation and analysis of results; and writing and submission of manuscript.
Extent of contribution (%): 75%

The following co-authors contributed to the work. If co-authors are students at Monash University, the extent of their contribution in percentage terms must be stated:

Name: Kaine Grigg
Nature of contribution: Development, conceptualisation, conduct of research; collation and analysis of results; and writing and submission of manuscript.
Extent of contribution (%) for student co-authors only: 75%

Name: Lenore Manderson
Nature of contribution: Development, conceptualisation, and supervision of research, and review of manuscript drafts.

The undersigned hereby certify that the above declaration correctly reflects the nature and extent of the candidate’s and co-authors’ contributions to this work*.

Candidate’s Signature
Date: 01-07-2014

Main Supervisor’s Signature
Date: 01-07-2014

Chapter 4.1: “Just a Joke”: Young Australian Understandings of Racism

Abstract
1. Introduction
1.1. Rationale and Aim
2. Method
3. Results
3.1. Group versus Individual
3.2. Actions versus Beliefs
3.3. Exceptions, Exclusions, and Minimisation
4. Discussion
4.1. Conclusion

Submitted to the International Journal of Intercultural Relations 31-03-2014.

Abstract

Lay understandings reflect the lived experience of racism, and our knowledge of these considerations assists with enhancing an appreciation of intergroup relations. In this article we draw on data from semi-structured interviews and focus groups with 30 school attendees aged 14-22 years, conducted from December 2011 to January 2012 in Victoria, Australia, to critically examine their understandings of and experiences with racism. Data demonstrate the ambiguity of racism, while confirming that Australian youth utilise a reasonably consistent and sophisticated explanatory model to conceptualise, explain, and classify racism. Participants described racism through three primary domains: (a) Group versus Individual: racism stems from perceived differences, with individuals stereotyped as belonging to larger groups; (b) Actions versus Beliefs: individuals are classified as racist or non-racist according to their actions and beliefs; and (c) Exceptions, Exclusions, and Minimisation: racism is frequently excused and minimised, reinforcing the status quo and prevailing perceptions of difference. The present research highlights the need for additional exploration of the nuances of racism in Australia from lay perspectives and provides clear evidence of the need to address racism in Australian society. Further developing the evidence base to understand the lived experience of racism in Australia could inform and support the design and evaluation of anti-racism and pro-diversity initiatives. Moreover, we hope that the present data can be drawn upon to inform the development of instruments to more accurately measure racist attitudes in Australian youth.

Keywords: Australia, qualitative methods, racism, youth, understanding, experience

104 “Just a Joke”: Young Australian Understandings of Racism 1. Introduction Racism is any cognition, affective state, or behaviour that advances the differential treatment of individuals or groups due to their racial, ethnic, cultural, or religious background. It is a phenomenon manifested at all levels of society, from government policies to organisational structures, intergroup and interpersonal relations, and intraindividual attitudes (Sanson, et al., 1998). As these planes intersect, racism needs to be understood and addressed at all levels. Systemic racism has typically reflected historical structural inequalities. Although these can be alleviated to an extent by legislation and societal interventions, the underlying mechanisms of interpersonal racism and related intraindividual attitudes are less well appreciated and effective interventions are limited. Racist attitudes, like all attitudes, form from a young age and, depending on the environment, thrive or diminish (Lasker, 1929; Nesdale, 2011). Children as young as four years are racially aware, and the capacity for perception of discrimination is firmly established by adolescence (Aroian, 2012; Nesdale, Griffith, Durkin, & Maass, 2005). There is no consensus, however, on how young people acquire and maintain racist attitudes, or of how youth conceptualise and experience racism. Although there is now considerable quantitative research on racism and racist discourse in Australia (Augoustinos, et al., 2005; Pedersen, et al., 2012), most work has examined, through quantitative methods, the prevalence, causes, and effects of racist attitudes of White towards Black Americans. Comparatively little attention has been devoted to common-sense understandings and lived experiences (Figgou & Condor, 2006). 
Differences in context and cultural milieu preclude direct extrapolation of US findings to Australia (Pedersen, et al., 2004), and there has been little work on the experiences of and attitudes underpinning racism in Australia (Pedersen, et al., 2012). Still less investigation has addressed these attitudes in young people (cf. McLeod & Yates, 2003; Poole, 1975). Further qualitative research is needed to examine the complexities of racism across multiple racial, ethnic, cultural, and religious groups among young Australians (Walton, Priest, & Paradies, 2013).

Australia is an extremely diverse nation: one-quarter of the population was born overseas and an additional one-fifth has at least one parent born overseas (Australian Bureau of Statistics, 2012c). Victoria is Australia's most culturally and linguistically diverse state, with a population originating from over 230 countries, speaking more than 200 languages, and following 120 faiths (Department of Health, 2012). The City of Greater Dandenong, 40 km south-east of the Melbourne CBD and central for participant recruitment for the study on which we draw, is the most culturally diverse municipality state-wide and the second most diverse nationally. Sixty per cent of residents were born overseas and half have heritage where English is not the dominant language (City of Greater Dandenong Council, 2012). Adjacent areas targeted for recruitment are also rapidly growing in population and diversity, and are characterised by swift cultural change (Australian Bureau of Statistics, 2012a, 2012b).

Some 10-33% of Australians have experienced racism, increasing to 50% of individuals from non-English-speaking backgrounds and 90% of Indigenous peoples (Dunn, et al., 2009; Ziersch, Gallaher, Baum, & Bentley, 2011). This is of particular concern given the increasingly consistent findings of research demonstrating serious mental and physical health problems from exposure to racism (Paradies, 2006b; Williams, et al., 2008).

1.1. Rationale and Aim

Enhancing academic understandings of how lay people conceptualise racial, ethnic, cultural, and religious difference and racism more broadly is of critical importance in reducing racism and promoting alternative positive attitudes (Walton, et al., 2013).
The current study was therefore conducted to explore conceptualisations of this phenomenon among young Victorians, so as to enhance our understanding of the dynamics of racism (differential treatment due to perceived racial, ethnic, cultural, or religious group membership) and its converse, racial, ethnic, cultural, and religious acceptance. The potential of qualitative research to be an integral, in-depth, and unobtrusive component of instrument development has been highlighted previously (Walton, et al., 2013). The conceptualisations provided by participants were therefore also intended to inform an attitudinal measure of racial, ethnic, cultural, and religious acceptance currently under construction.

2. Method

Participants were recruited via advertising flyers circulated through Monash University staff mailing lists and members of a network of community service organisations involved with culturally and linguistically diverse populations, with additional participants recruited through snowball sampling. Interested participants contacted the lead author via email and were invited to attend a consultation. Interviews and focus groups allow exploration of the richness and diversity of participants' experiences, whilst permitting social interactions to elicit varying perspectives. These complementary methods have been widely used to investigate racism and related issues in youth aged as young as five years (Aroian, 2012; Connolly, 2000; Kennedy, 2001; McKown, 2004). However, we were concerned that youth under age 16 years may be unable to form a comprehensive and meaningful narrative of racism, and accordingly, we increased our minimum target age. Target participants therefore included youths aged 16 to 20 years and currently attending high school. Thirteen individual interviews and three focus groups, involving 17 (two, seven, and eight respectively) additional students, were conducted, ensuring the emergence of several strong and common themes and allowing theoretical saturation in the context of this study (Guest, Bunce, & Johnson, 2006).
Two formal means of data collection were utilised within the interviews and focus groups: a demographic questionnaire to obtain information on racial/ethnic background, country of birth, gender, and age; and a semi-structured interview schedule (see Appendix 4). The protocol for the interview schedule was developed from previous qualitative and

107 conceptual investigations about racism, and an appraisal of the qualitative components of perceived and perpetrated racism measures. The flexible schedule enabled elicitation of spontaneous input from participants, providing insight into personal experiences of racism. Use of both interviews and focus groups allowed stimulation of commentary that would not be revealed in an individual or group setting alone. Interviews were completed prior to the commencement of focus groups, so that interview data informed group discussions. Focus group discussions drew out disagreements, contradictions, and supplementary information. Questioning in both interviews and focus groups was iterative, with inquiries building on prior responses and, as data collection progressed, addressing salient areas not in the original protocol. The data provided insight into the experience of being a member of a particular racial/ethnic group; the implications of this membership for daily life; and how this influenced the experience and conceptualisation of racism and acceptance. Nine sets of questions aimed to explore understandings and experiences of racism: Defining racism; Racist beliefs and attitudes; Racist actions; Accepting beliefs and attitudes; Accepting actions; Attributing racism to actions; Witnessed racism experiences; Individual racism experiences; and Acknowledgement of difference. All interviews were conducted in a private consultation room on the university campus or at the head office of a collaborating NGO. All focus groups were conducted in private consultation rooms: one at the university (FG1), one at a secondary college (FG2), the third at the offices of the local government authority associated with the study (FG3). Ethics approval was received from Monash University Human Research Ethics Committee. English plain language statements were provided to participants and informed consent was obtained prior to initiation of consultation. 
Participants were offered copies of transcripts; two individuals and participants from two focus groups expressed interest and were provided with the relevant transcripts.

108 Transcription was driven by conventional content analysis due to the aim of the research being to provide a description of the phenomenon of racism in Australian youth. This form of content analysis is most suitable when existing theory, research, or understandings are limited (Hsieh & Shannon, 2005); it was therefore considered to be the most appropriate method for the present study. To enhance the internal validity and rigor of the data, transcription was undertaken by the interviewer rather than an external party. Transcription occurred within one week to ensure content was fresh and clear. Subsequently, the lead author read through the transcripts for an overall understanding of the data and apparently salient themes were noted. The coauthor reviewed the transcripts, highlighting and noting extracts and strong themes. The authors then agreed upon prominent themes and the lead author extracted all significant passages, thematically categorising them. The most prominent and representative quotes were audited and selected for potential inclusion in the research report. Our final review ensured that the quotations were representative of the overall data. 3. Results Thirty Victorian youths (equal numbers male and female), aged 14-22 years (M=16.77, SD=1.76) at the time of their assessment, participated. Nine participants (aged 14, 15 [x6], 21, and 22 years) from the focus groups and one interview participant aged 15 years were outside the a-priori target age; we included them due to their involvement in youth programs attended by other participants. The mean age of interviewees was 16.77 years (SD=1.01; five male and eight female). The mean age for focus group participants was 16.76 years (SD=2.20; 10 male and seven female). Twelve participants (40%) self-identified as Australian; the remainder identified as being of 16 different racial/ethnic backgrounds. Sixteen participants (53%) had at least one parent born outside of Australia.
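The subgroup summaries reported above (13 interviewees, 17 focus group participants) can be recombined as a rough consistency check on the overall sample statistics. The following is a minimal sketch rather than part of the original analysis; the helper name `combine_groups` is hypothetical, and it assumes the reported standard deviations are sample standard deviations (n - 1 denominator).

```python
import math

def combine_groups(groups):
    """Combine (n, mean, sd) summaries of disjoint subgroups into one (n, mean, sd).

    Assumes each sd is a sample standard deviation (n - 1 denominator).
    """
    n_total = sum(n for n, _, _ in groups)
    mean_total = sum(n * m for n, m, _ in groups) / n_total
    # Total sum of squares = within-group part + between-group part.
    ss = sum((n - 1) * sd ** 2 + n * (m - mean_total) ** 2 for n, m, sd in groups)
    sd_total = math.sqrt(ss / (n_total - 1))
    return n_total, mean_total, sd_total

# Values as reported in the Results: (n, mean age, SD).
interviews = (13, 16.77, 1.01)
focus_groups = (17, 16.76, 2.20)

n, mean, sd = combine_groups([interviews, focus_groups])
print(n, round(mean, 2), round(sd, 2))
```

Under these assumptions the recombined sample has n = 30, a mean age of about 16.76 (within rounding error of the reported 16.77, since the subgroup means are themselves rounded), and an SD of about 1.76, in line with the reported overall values.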

109 Three core domains were identified, elaborated into a total of 13 subordinate themes. Pseudonyms, with age and gender, are utilised in quotations to attribute comments while protecting participants‟ identities. 3.1. Group versus Individual The domain Group versus Individual encompassed five subordinate themes, „Others‟ as Victims and Whiteness as Invisible, Values and Beliefs, Intra-Racial Racism, Hierarchy, and Degradation. Each theme related to a framework used to classify individuals as a member of a particular racial, ethnic, cultural, or religious group. Most participants considered it racist to regard individuals as abstractly belonging to a specific group: “If you… have a certain idea about who someone is and you don‟t know that person and you judge them based on their appearance, their background, their culture... that‟s... racism” (Amanthi, F, 18). Conversely, ignoring group membership and acknowledging individuality was considered non-racist because “it‟s important to respect everyone as an individual” (Amanthi, F, 18). Although this focus was on the individual, and between individuals, most respondents defined racism as occurring between groups, with racial, ethnic, cultural, or religious groups “targeting and discriminating against another” (Michael, M, 16). Participants used an „Other‟ and „White‟ dichotomy to abstractly label racial, ethnic, cultural, and religious groups, although this categorisation varied amongst participants. Perpetrators of racism were often described as “the typical white racist stereotype” (Asuntha, F, 17) and those subject to racism generally as non-White minority individuals. Skin-color as the source of power and subjugation was described as internalised: one participant recollected that her sister “started to drink milk because she thought that it would make her skin turn white” (Asuntha, F, 17). 
Racism was associated with differences in beliefs, values, and culture: “[Racists] basically don‟t understand your culture… your roots and what they mean to you” (Bindu, F, 17). Values and beliefs were central to how people were classified; participants reflected on

110 how „racists‟ categorise individuals as members of distinct groups, and how group memberships were used to enact and justify racist behaviour: “[Racists] usually think that the other people are... wrong or that what they say is better” (Michael, M, 16). Although racism was predominantly considered to exist between groups, participants also drew attention to intra-racial racism, whereby cultural stereotypes informed racist behaviour in response to minor intra-group differences: “People don‟t even have to be… a different race to be the subject of racism - you could be the same colour, from the same community, but still be different, and be the subject [of racism]” (FG2). One participant recounted her own experience: “They didn‟t like me because I was only half-Turkish… they would be extra proud of their Turkishness when I was around... you could class that as racism” (Abeba, F, 17). Most participants considered intra-racial racism to be potentially more disorienting and self-eroding than inter-racial racism. One participant described her response to exposure to inter- and subsequently intra-racial racism: “I don‟t fit in… [I‟m] stuck in the middle... not an ethnic… not Australian… Where do I belong?” (Asuntha, F, 17). Respondents identified a hierarchical social and cultural system in Australia, with racism enacted according to this hierarchy by „racist‟ individuals on the basis of beliefs that “their own race or background is superior” (Tenagne, F, 16) and that “the other people‟s way of life is wrong” (Hinni, F, 17). Racist treatment perpetuated and reinforced inequality, implying that the treatment recipient belonged to a “lower class of person” (Michael, M, 16). A key outcome of racist behaviour was the degradation, dehumanisation, or deliberate subordination of groups and individuals: “It was hard on me because I wanted to fit in … everyone wants to feel like they belong” (Abeba, F, 17). This led to people questioning their identity: “Does everyone... 
classify me as black? ... Is that who I am?” (Asuntha, F, 17). Conversely, non-racist and accepting individuals humanised others, being “a nice person – just going up and treating them as a human” (Rajiv, M, 17).

111 3.2. Actions versus Beliefs The domain Actions versus Beliefs encompassed three subordinate themes, Intent versus Perception, Inaction as Accepting, and Acceptance of Difference as Non-Racism. Each theme was associated with the conceptual struggle of determining what constitutes racism. The beliefs of individuals – their “racist motivation” (FG1) – were considered more important than actions when judging if they were racist or non-racist: “It‟s easy to blame the person who carries out the actions… [but people] are racist because they have racist views” (Hinni, F, 17). However, a minority suggested that actions were more important, because “it takes a lot more to act... rather than just thinking it” (Abeba, F, 17), and because an actively racist person is “the bigger threat to society” (Asuntha, F, 17) with the potential to “hurt someone” (Hinni, F, 17). The balance between beliefs and actions was evident also in people who were accepting of difference. Some participants believed that “you can have views… and not be very expressive… [That] doesn‟t make you any less accepting... just… less verbal about expressing it” (Bindu, F, 17). This was because “[some people] are more confident and... believe in themselves... whereas someone who… doesn‟t act... [may] be afraid of repercussions” (Asuntha, F, 17). Others suggested that “little knowledge that has action is greater than a lot of knowledge that remains idle” (Habib, M, 17) because “true nonracist[s]... have a responsibility to stand up... Otherwise you accept racism... [and are] contributing to the racist attitudes of society” (Amanthi, F, 18). Beliefs were generally regarded as more dominant than actions, and the intent of an action more central than the perception of an action. An act was primarily classified as racist if there was intentionality, shaping participants‟ attitudes towards various acts, including speech acts:

112 People... say “Oh that‟s racist” when they have no racist intent whatsoever … you can still do something that is racist and not mean to be racist… It can still be racist to that other person. (Rajiv, M, 17) Intent is important… people will... react in different ways to what they perceive as racist and some people can be like really over-sensitive and stuff… [and] automatically think that something that someone does is racist. (FG1) Participants emphasised the importance of non-racist beliefs over specific actions: “[If] you don‟t… care about the race of a person… just judge them on who they are… actions [don‟t] make… a difference” (Rajiv, M, 17). Participants struggled to name non-racist actions, and most concluded that a lack of racist behaviour constituted acceptance: “Sometimes it‟s not what we do, it‟s what we don‟t do… just treat[ing] everyone normally [is non-racism]” (Amanthi, F, 18). A person who did not undertake any specific acts of acceptance, but did not enact any racist behaviour, was considered non-racist and accepting of difference: [There are] people who will actually stand up for someone who is victimised… who take action when they see a racist situation… they won‟t just be the silent bystander… and then you have other people who aren‟t racist who won‟t say anything. (Abeba, F, 17) Acceptance of difference, openness, and kindness were considered key to being nonracist: “[Non-racists believe that] everyone should be treated equal” (Amanthi, F, 18) and “don‟t prefer… everyone is the same” (Viktor, M, 16). One participant portrayed non-racist individuals as: More open minded... have more of an understanding of what is around them… [and] more willing to accept someone if they are from a different race knowing that you guys are both human and [although] you might have different values, it shouldn‟t matter. (Abeba, F, 17)

113 Another drew upon his religious knowledge: The Quran... says “We have created you in different cultures, in different tribes and different groups so you can interact and know one another, so you know about each other.” We are created so different, not so we can stay different and go against each other - it‟s so we can interact and learn about each other. (Habib, M, 17) 3.3. Exceptions, Exclusions, and Minimisation The domain Exceptions, Exclusions, and Minimisation included five themes that excepted, excluded, or minimised racist behaviour, Racism as a Continuum, Racism as Pervasive, Fear and Ignorance, Multiculturalism, and Discomfort with Other Groups. Potentially racist behaviours were frequently re-evaluated as non-racist because of context or misinterpretation. Respondents offered various excuses for racist behaviour, including ignorance or lack of information, fear, insecurity, upbringing, prior negative experiences, societal pressures, the pervasiveness of racism, the impossibility of anyone being completely non-racist, and “unconscious” action (Habib, M, 17) or innate psychological processes. Some blamed victims of racism for precipitating such attacks: “Refugees… don‟t know… Australian ways and they think that they… are back home and... act differently” (Habib, M, 17). One participant described how past negative interactions lead to “racism based on… previous experience” (Viktor, M, 16). Another participant excused racist humour: “Little jokes… it‟s just a joke… we don‟t mean it and we don‟t harm anyone… I don‟t think that I‟m racist.” (Kamila, F, 15), and one explained that jokes were acceptable if delivered „appropriately‟: I do the whole… intra-racism thing, but it is always as a joke … There is a fine line... The way you say it… if you intend to harm someone, you will obviously say it in an intimidating way, but if you meant it as a joke, then you say it in a comical way. 
(Bindu, F, 17) Others pardoned potentially racist behaviour, suggesting that it can be misinterpreted:

114 You could be the type of person who thinks that everyone is racist, like every bad thing that happens to you in your life is due to racism, or you can be the opposite and think that it‟s just meant to be and nothing is racist. (Asuntha, F, 17) People can use racism… as a scapegoat… like maybe “I didn‟t get a job because of my background” – that‟s not how the world works now, I don‟t think many people, or employers think about it… you got the qualifications, you got the personality, you get the job. (Rajiv, M, 17) Participants proposed two independent categories of “racist” and “non-racist”: “Either you are accepting of other people being from other places, or you‟re not” (Tenagne, F, 16). However, most perceived that racism functioned along a continuum. One person could be completely non-racist, believing in absolute equality and acceptance; another could hold beliefs of extreme racism; most people were positioned between these extremes, with “everyone… racist to a degree” (FG3): There are ... people who are only a little bit racist, but they are... racist when compared to people who typically aren‟t. So it would be difficult to have... “racist” and “not racist”. It would have to be a continuum. (Abeba, F, 17) Racism was considered to be pervasive. Participants maintained that they and others, even non-racist individuals, unavoidably had infrequent racist thoughts: “everyone has said something racist… in their life” (FG3). Participants explained this inherent pervasiveness: Everyone... has an image of a certain race and no one can ever be innocent of never thinking, “Oh, it‟s a typical Indian taxi driver” or typical this, or typical that… we all exhibit racist thoughts to an extent. (Amanthi, F, 18) Yet a small minority argued that “[although] it‟s hard… a bit of an ideal to be completely non-racist… you definitely can” (FG1). Accordingly, occasional racist thoughts or behaviours did not necessarily result in being classified as racist:

There are people who are non-racist… But even those people… once in their entire life have definitely thought something… racist. (Rajiv, M, 17)

Participants believed racism to stem from fear of difference, ignorance, and “insecurity within themselves” (Abeba, F, 17), and to function to “hide the fact that [racists] are scared” (Tenagne, F, 16). Racism was excused where an individual had a bad experience leading to generalised fear and was rationalised when an individual lacked knowledge or understanding:

When they see something foreign they... attack it because they don't understand… why it's different… Kind of like bullying… they need to discriminate against others… so they feel better. (Abeba, F, 17)

Racism is… I wouldn't say hatred… just not understanding… not enough information on different backgrounds, and the only way to cure it is to learn about other cultures. (Rajiv, M, 17)

In noting Australia's cultural diversity – “having most of the races” (Rajiv, M, 17) – participants rejected global racist labels of Australia: “we are not a racist society” (Fianna, F, 16). Participants argued that multiculturalism and racism were mutually exclusive:

We are the cultural… melting pot … and to say something racist… would be... contradictory because we can't deny... [Australia's] diverse population… Saying… or doing something racist… would be... holding us back and… fabricating what Australia is. (Abeba, F, 17)

However, individual racism was also explained by Australian societal influences because “[no] one is born racist… it is something that is bestowed upon kids… someone will tell them when they are young that they are better or worse than... some other racial background” (FG1). Such pressures were associated with systems and institutions that perpetuated a racist, prejudiced, and discriminatory worldview: “Society tends to ignore it and… by society ignoring it, it is accepting it” (Amanthi, F, 18). Participants illustrated how institutions contributed to racism:

116 Schools where the majority of the kids are Australian… racism is very, very, common… [but] racism will never be directly targeted because it is such a sensitive topic and no one wants to classify [or] consider their school to be a racist school with racist kids. (Abeba, F, 17) Participants noted a conundrum that “when people isolate an ethnicity, that ethnicity… will... stick together… Then we judge them on being together, so they can‟t win either way” (Amanthi, F, 18). Many suggested that such segregation and group affiliation was appropriate, justifiable, and so non-racist; for instance, where interactions with others caused discomfort. One participant explained this same-group affiliative need: I stick to people who look like me, because I feel comfortable… [It isn‟t] racism... if they just don‟t feel comfortable… People are not going to talk to other people if they don‟t feel comfortable. (Fianna, F, 16) Such behaviour would be regarded as racist, however, if an individual remained uncomfortable in an extended interaction, or if interactions differed according to racial, ethnic, cultural, or religious background: “It is just a case of getting to know people… if you still feel uncomfortable then it might be racism, but… initially you are always going to feel uncomfortable” (Asuntha, F, 17). Most participants regarded deliberate isolation from people of other backgrounds as ethnocentric and racist, even when the perpetrator was of minority background: “African kids… stick together because they are… afraid of expanding outwards… sometimes it‟s… backwards racism… fighting racism with racism” (Asuntha, F, 17). 4. Discussion Racism is defined and conceptualised in various ways, as illustrated by these everyday understandings of racism in Australian youth. When viewed together, they form interrelated components of a sophisticated explanatory model. 
The abstract classification of an individual subject to racism, in relation to a particular group, is utilised to enact racism. Individuals who are perpetrators of racism are classified as racist based on actions and beliefs. However, various exceptions, exclusions, and the minimisation of racist behaviour perpetuate views of individuals (subject to racism) as shaped by group membership. This process repeats continually from perception of group membership, to classification as racist, to excusing racist behaviour, to perpetuating perceptions of group membership. Although the domains flow bi-directionally, the preceding indicates the dominant progression, as illustrated in Figure 1 below.

[Figure 1: circular diagram of the three domains and their sub-themes. Group versus Individual: 'Others' as Victims and Whiteness as Invisible; Values and Beliefs; Intra-Racial Racism; Hierarchy; Degradation. Actions versus Beliefs: Intent versus Perception; Inaction as Accepting; Acceptance of Difference as Non-Racism. Exceptions, Exclusions and Minimisation: Racism as a Continuum; Racism as Pervasive; Fear and Ignorance; Multiculturalism; Discomfort with Other Groups.]

Figure 1. Overall circular relationship between domains and sub-themes.

Conceptual literature delineates actions and beliefs as independent (Armitage & Christian, 2003). Lay youth and academic understandings both acknowledge the ambiguity of many potentially racist actions, suggesting that behaviours perceived as racist and as having racist intent are distinct from behaviours perceived as racist but without racist intent (Guerin, 2005; Stevens, 2008). Participants advocated acceptance and belief in equality as characteristic of non-racist individuals. However, people offered varying views legitimating racist opinions (see also Augoustinos, et al., 2005); the classification of individuals as non-racist in the absence of racist action suggests tolerance rather than acceptance. This is not equivalent to anti-racist action (Green & Sonn, 2005).

Australian youths' understandings of group and individual attributes suggest an underlying core psychological process (Allport, 1954), with the classification of people as in-group (individual's membership group) or out-group (individual's non-membership group) supporting hierarchies of power (Hewstone, et al., 2002). However, in their interviews, young Australians also illustrated the importance of within-group differences in the form of intra-racial racism (see also MacNaughton, 2001). Social identity, a component of the self-concept derived from group membership, theoretically leads to positive differentiation of in-groups from out-groups (Tajfel & Turner, 1979; Turner, 1986), perpetuating prevailing social structures, racist action, and the promotion of the dominant culture. Social hierarchies consisting of group beliefs of superiority and the inequitable distribution of power are essential for racism (Solorzano, et al., 2000). However, participants presented only vague conceptualisations of, and explanations for, minority group disadvantage, and advocated dominant ideals of merit, hard work, and individuality as core to success (see also Augoustinos, et al., 2005).
Utilising majority values to assess all groups leads to a „White‟ versus „Other‟ binary, with „Others‟ represented by White Australian interpretations rather than minority group introspection (Green & Sonn, 2005). The interrogation of, and challenges to, normative „Whiteness‟, privilege, and associated institutional racism are therefore crucial to effective anti-racism strategies (Green & Sonn, 2005; Hollinsworth, 2012; Riggs & Augoustinos,

120 2005). Fear of distinct groups, their beliefs and differences, leads to the problematisation and moral exclusion of minorities (Pedersen, et al., 2012). Further marginalisation occurs with expectations that English language and Australian values are adopted (Riggs & Due, 2011); such assimilatory beliefs are core to contemporary racism (Quayle & Sonn, 2009). In extremes these forms of fear, exclusion, and out-group rejection lead to depersonalisation, delegitimisation, and dehumanisation (Haslam, 2006; Tileagă, 2007). Such on-going racist interactions degrade one‟s sense of self, with one core effect being the internalisation of racist attitudes (Jones, 2000; Solorzano, et al., 2000). Consistent with prior research, participants questioned their own identity after being subjected to racism. The domain Exceptions, Exclusions, and Minimisation was the one area within which understandings of Australian youth and academia conflicted considerably. Mitigation strategies downplayed racism while exonerating perpetrator(s) and participants of all racial, ethnic, cultural, and religious backgrounds used these techniques to minimise various forms of racism. Victim blaming, classifying minority group ethnocentricity as racist, and ignoring the role of non-dominant group social spaces in providing positive spaces for minority youth are some strategies highlighted in previous research which were utilised by participants (Solorzano, 1998; Solorzano, et al., 2000). Minorities are principally blamed for problems, with little responsibility on the wider community (Morrice, 2007; Riggs & Due, 2010), ignoring that intra-group contact is preferred because of important ethnocultural and linguistic similarities in friendship and subjective belonging, so enhancing youth wellbeing (Correa-Velez, Gifford, & Barnett, 2010; MacInnis & Hodson, 2012; Nesdale, 2011; Riggs & Due, 2010). 
Both minority and majority group participants felt more comfortable with people from their own background, yet questioned whether discomfort was racist. Participants referred to „human nature‟, „societal influences‟, and „ignorance‟ as core to the development of racism (see also Silva, 2012). In addition, they consistently made

121 exceptions and minimised the racist intent of racist behaviours, discounting the detrimental effect on racial interactions of microaggressions and their potentially devastating additive impact (Burdsey, 2011; Solorzano & Bernal, 2001; Sue, et al., 2007). Jokes in particular were dismissed: participants found it difficult to classify jokes as racist because of their ambiguity, pervasiveness, and the capacity of people of all backgrounds to share in the humour (Roberts, Bell, & Murphy, 2008; Stevens, 2008). Although casual racist comments may maintain social peer groups rather than endorsing racism, as our respondents noted, they also marginalise targets (Burdsey, 2011; Guerin, 2003). Participants highlighted the contradictions of Australian multiculturalism and racism. Multiculturalism emerges as conditional and tainted by systemic, cultural, and interpersonal racism, which is deeply engrained within cultural and institutional practices, even where explicitly rejected (Billig, 1995; Howarth, 2004; Wetherell & Potter, 1992). Multicultural attitudes, like all social attitudes and behaviour, are strongly influenced by institutions, structures, and socialisation processes (Louis, Mavor, & Terry, 2003; White & Gleitzman, 2006); media, peers, parents, and educational institutions all inform young people‟s understandings and attitudes (MacNaughton, 2001; Nesdale, 2011; Palmer, 1990; Watt & Larkin, 2010). Parental and peer attitudes are especially influential (Palmer, 1990; Pedersen, Griffiths, Contos, Bishop, & Walker, 2000) and the media is important in legitimating racism through the negative portrayal of minority groups as a societal threat, thus perpetuating social distance, further marginalisation, and acceptance of poor treatment (Quayle & Sonn, 2009; Sulaiman-Hill, Thompson, Afsar, & Hodliffe, 2011). 
Conversely, the media has the potential to encourage social inclusion and raise awareness of multiculturalism and diversity by promoting positive normative messages (Watt & Larkin, 2010).

Contemporary racism has often been suggested to reflect a significant shift from earlier overt racism (Sanson, et al., 1998), and the present data confirm that racism in Australia has changed form. Youth felt uncertain about, and ambivalent towards, classifying an action, thought, belief, or individual as categorically racist. Although none considered themselves to be racist, most admitted to perpetrating racist acts, thoughts, or beliefs, highlighting the difficulty of understanding, targeting, and therefore reducing racism. These results echo previous qualitative research (McLeod & Yates, 2003; Walton, et al., 2013), and provide a complementary and deeper insight into Australian youth conceptualisations of racism. The research literature primarily focuses on perceptions of racism by victims, or on racist attitudes in perpetrators. Our focus on everyday contexts and understandings of racism, outside of the accepted victim/perpetrator dichotomy, highlights the complexity of racism as understood and experienced by Australian youth.

Generalisability is limited, however, given the small sample. Moreover, it is possible that younger participants especially were unable or unwilling to express their inner thoughts and opinions, or that older participants would have presented distinct conceptualisations. However, differences in opinion or conceptualisation were not apparent across ages and there was a strong and consistent recurrence of themes. These observations indicate that our data are robust.

4.1. Conclusion

The present research highlights the need for additional exploration of the nuances of racism in Australia from lay perspectives. Research in varying social, economic, and demographic contexts is required. Further examination of the generalisability of the explanatory model utilised by participants would assist in developing an evidence base to address the lived experience of racism in Australia. Such deeper understandings could subsequently be utilised to inform and support the design and evaluation of anti-racism and pro-diversity initiatives. Moreover, we hope that the present data can be drawn upon to inform the development of instruments to more accurately measure racist attitudes in Australian youth.

Monash University

Declaration for Thesis Chapter 4.2

Declaration by candidate

In the case of Chapter 4.2, the nature and extent of my contribution to the work was the following:

Nature of contribution: Development, conceptualisation, conduct of research; collation and analysis of results; and writing and submission of manuscript.
Extent of contribution (%): 75%

The following co-authors contributed to the work. If co-authors are students at Monash University, the extent of their contribution in percentage terms must be stated:

Name: Kaine Grigg
Nature of contribution: Development, conceptualisation, conduct of research; collation and analysis of results; and writing and submission of manuscript.
Extent of contribution (%) for student co-authors only: 75%

Name: Lenore Manderson
Nature of contribution: Development, conceptualisation, and supervision of research, and review of manuscript drafts.

The undersigned hereby certify that the above declaration correctly reflects the nature and extent of the candidate’s and co-authors’ contributions to this work*.

Candidate’s Signature
Date: 01-07-2014

Main Supervisor’s Signature
Date: 01-07-2014

Chapter 4.2: Developing the Australian Racism, Acceptance, and Cultural-Ethnocentrism Scale (RACES)

Abstract
Developing the Australian Racism, Acceptance, and Cultural-Ethnocentrism Scale (RACES)
Study 1: Item Development
Study 2-4 Preliminary Data Analysis
Study 2: Principal Components and Exploratory Factor Analyses
  Materials and Methods
  Results
Study 3: Exploratory and Confirmatory Factor Analyses
  Materials and Methods
  Results
Study 4: Item Response Theory Analyses
  Materials and Methods
  Results
Study 5: Convergent and Discriminant Validity
  Materials and Methods
  Results
General Discussion

Submitted to the Journal of Community and Applied Social Psychology 26-05-2014.

Abstract

Australian measures of racist attitudes focus on single groups or have not been validated across the lifespan. To redress this, a measure of racial, ethnic, cultural, and religious acceptance – the Australian Racism, Acceptance, and Cultural-Ethnocentrism Scale (RACES) – was developed and validated with children, adolescents, and adults. Interviews and focus groups were conducted with 30 adolescents in Victoria, Australia, to develop the instrument, which was pilot tested with eight children. The novel 34-item scale consists of three subscales (Accepting Attitudes–12 items; Racist Attitudes–8 items; Ethnocentric Attitudes–4 items) and a 10-item social desirability measure. The instrument was tested with 296 school children and 402 adolescents and adults from the Australian community, with data modelled and analysed utilising Classical Test Theory and Item Response Theory.

Keywords: Australia, racism, scale, Item Response Theory, Rasch analysis

Developing the Australian Racism, Acceptance, and Cultural-Ethnocentrism Scale (RACES)

Racism has been consistently linked with negative mental health outcomes in various minority racial/ethnic groups in various societies with immigrant and indigenous populations (Cunningham & Paradies, 2013; Dunn & Geeraert, 2003; Harris et al., 2006; Larson, Gillies, Howard, & Coffin, 2007; Paradies, 2006b; Williams, et al., 2008). Most racism research, however, focuses on the effects of racism on its victims, while overlooking the factors that produce racism and preclude racial, ethnic, cultural, and religious acceptance. Internationally, various measures of racist attitudes exist, but these generally concentrate on anti-Black attitudes and are validated only for US populations. Since differences in context and cultural milieu preclude direct extrapolation of US findings to Australia (Pedersen, et al., 2004), several Australian measures have been developed. However, these either focus on one group (e.g., Indigenous Australians; Pedersen, et al., 2004) or have not been empirically developed and appropriately validated (e.g., Dunn & Geeraert, 2003). No Australian instrument has been developed utilising advanced psychometric analyses such as Item Response Theory (IRT), nor appropriately validated across ages, inhibiting the accurate evaluation of interventions addressing racist attitudes.

This research explored racism as experienced by Australians from diverse backgrounds. Using an accepted scientific process of scale development (DeVellis, 2012), a measure of racial, ethnic, cultural, and religious acceptance was developed. Initial stages explored conceptualisations of racism based on data from in-depth semi-structured interviews and focus groups with young people from various racial/ethnic backgrounds, which were utilised to develop the preliminary items. Secondary stages examined the underlying latent factor structure of the measure across multiple age groups. Final stages validated the psychometric properties of this scale in adolescents and adults. Ethics approval for each stage was provided by Monash University Human Research Ethics Committee.

Study 1: Item Development

Qualitative research was conducted from December 2011 to March 2012 on young Australians’ conceptualisations of, and experiences with, racism, to generate sufficient data to form the basis of a scale (detailed elsewhere; Grigg & Manderson, 2014c). The purpose of the final instrument was to inform anti-racism and pro-diversity initiatives. Items were therefore designed to measure acceptance of difference and racism viewed along a continuum. An initial item pool of 420 statements was reviewed for appropriateness, comprehensiveness, redundancy, and clarity of items; the item pool was reviewed by two experts in the racism field and then reduced to 40 statements covering 14 themes including comfort with, and acceptance of, difference, perceptions of safety with difference, and acts of racism. The preliminary scale contained 15 items with higher scores indicating greater acceptance and 25 items with higher scores indicating lower acceptance. Items were reworded to ensure a balance of positive and negative items, to avoid response bias due to the sensitivity of the attitudes under evaluation (Schriesheim & Hill, 1981; Schweizer & Schreiner, 2010) and to explore both positive (acceptance) and negative (racism) attitudes, which have been found to be functionally independent (i.e., positive attitudes are stronger predictors of positive behaviours and negative attitudes are stronger predictors of negative behaviours) (Pittinsky, et al., 2011) and conceptually distinct (Phillips & Ziller, 1997). A 10-item version of the Marlowe-Crowne Social Desirability Scale (MCSDS; Fischer & Fick, 1993; Strahan & Gerbasi, 1972) was also amended and included in the preliminary scale (MCSDS-A) to assess self-presentation bias in Australia. Socially desirable responding was considered important to assess and is often included in addition to the primary measure of interest when scales address potentially uncomfortable or anxiety-provoking topics (Anastasi & Urbina, 1996; Loewenthal, 2001). This is especially a concern when measuring sensitive concepts, such as those related to racism (Phillips & Ziller, 1997).

The items were randomised, with each eliciting a response on a four-point Likert-type scale, from “Strongly Disagree” to “Strongly Agree” (half reverse scored). The preliminary scale was reviewed by six primary school principals and an experienced clinical child psychologist; introductory instructions were subsequently expanded and simplified. The scale was reviewed by participants in three focus groups (14-22 years, N = 17) (see Grigg & Manderson, 2014c) and a preliminary scale reliability analysis was performed. Cronbach’s Alpha was very high (.94). Four items had low item-total correlations (< .20), but none were removed as all were considered important. The preliminary scale was suitable for children with a Grade 4 reading level (as per Gunning Fog and Flesch-Kincaid Grade Level indexes). The scale was pilot tested with eight children aged 9-12 years for review of item clarity and developmental appropriateness. Cognitive interviewing techniques (Willis, 2005) were utilised to ensure that young children could comprehend the intended meaning and appropriately respond to each question. Participants were asked to (1) read each question, (2) verbalise their thoughts, (3) explain what the question was asking, and (4) define any key terms or words. No items required removal, but some were re-worded.

Study 2-4 Preliminary Data Analysis

Each of the data sets (primary school, combined community, 15-20 years, and 21+ years) utilised in Study 2-4 was collated and cleaned in IBM SPSS Statistics 20. Data were screened for univariate outliers (nil), missing data were identified, and all cases with more than 5% missing data were removed. Regression analysis was utilised to deal with the remaining missing data. Exploratory analyses were performed to assess the distribution of the data.
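As a rough illustration of the preliminary reliability analysis reported for the pilot scale, the following Python sketch reverse scores items on a four-point scale and computes Cronbach's Alpha and item-total correlations. It is not the SPSS procedure used in this research: the function names, the toy responses, and the use of the corrected (item-removed) form of the item-total correlation are assumptions made purely for illustration.

```python
import numpy as np

def reverse_score(responses, reverse_items, low=1, high=4):
    """Flip the listed item columns on a low..high scale (1<->4, 2<->3)."""
    scored = np.asarray(responses, dtype=float).copy()
    scored[:, reverse_items] = (low + high) - scored[:, reverse_items]
    return scored

def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def corrected_item_total(scores):
    """Correlate each item with the sum of all remaining items."""
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    return np.array([
        np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]
        for j in range(scores.shape[1])
    ])

# Hypothetical toy data: 4 respondents x 3 items; item index 1 is negatively worded.
raw = [[1, 4, 1], [2, 3, 2], [3, 2, 3], [4, 1, 4]]
scored = reverse_score(raw, reverse_items=[1])
alpha = cronbach_alpha(scored)
# Flag items below the .20 item-total threshold mentioned in the text.
weak_items = np.where(corrected_item_total(scored) < 0.20)[0]
print(round(alpha, 2), weak_items)
```

Items flagged in this way would be candidates for review rather than automatic removal, in keeping with the decision above to retain all low-correlating items that were considered important.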
Although the Shapiro-Wilk test of normality indicated that each of the 40 measured variables across all data sets was significantly univariately skewed (p < … > 30 (Field, 2009; Games, 1984; Tabachnick & Fidell, 2007). The minimum amount of data for factor analysis was satisfied (Tabachnick & Fidell, 2007). Initial ratios of cases per variable were approximately 3:1 for the primary school data set and 7:1 for the overall community data set (4:1 for 15-20 years and 3:1 for 21+ years). These ratios improved as items were deleted from the preliminary scale (final ratios 8:1, 6:1, and 5:1 respectively).

Study 2: Principal Components and Exploratory Factor Analyses

This component aimed to identify the underlying latent structure of the measure according to Classical Test Theory (CTT).

Materials and Methods

Participants were 296 students enrolled in years five or six at six primary schools from different socioeconomic and cultural backgrounds in a growth corridor in the southeast of Melbourne, Australia. Each school participated as part of the provision of a community-based anti-racism program in September 2012. Only 194 (66%) responses were usable, with 111 males and 82 females aged 10-13 years (M = 11.38, SD = 0.73). Participants were demographically diverse: 25 (13%) were born outside of Australia, as were 54 (28%) of their mothers and 65 (34%) of their fathers.

Results

Principal Components Analysis. Data were examined using PCA to produce an initial empirical summary (Tabachnick & Fidell, 2007). Oblimin rotation was performed with the primary school data set to estimate the number of components, absence of multicollinearity, and factorability of the correlation matrices. Eleven components with initial Eigenvalues above one were extracted. None were internally consistent or well defined by the variables (highest Squared Multiple Correlation .24). Conversely, Communality values were adequate: the smallest was .53, above the recommended minimum of .40 (Costello & Osborne, 2005). Sampling adequacy was acceptable: the Kaiser-Meyer-Olkin measure was .83, above the recommended minimum of .60 (Tabachnick & Fidell, 2007), and Bartlett’s Test of Sphericity was significant (p