PMKT – Revista Brasileira de Pesquisas de Marketing, Opinião e Mídia ISSN: 1983-9456 (Print) ISSN: 2317-0123 (Online) Editor: Fauze Najib Mattar Review system: Triple Blind Review Languages: Portuguese and English Publisher: ABEP – Associação Brasileira de Empresas de Pesquisa
Measurement and Verification Scales: a Comparative Analysis between the Likert and Phrase Completion Scales
Mensuração e Escalas de Verificação: uma Análise Comparativa das Escalas de Likert e Phrase Completion
Submission: Mar/1/2014 - Approval: June/24/2014
Severino Domingos da Silva Júnior Master's and Bachelor's degrees in Business Administration from the Federal University of Paraíba (UFPB). Lecturer at the Union of Higher Education of Campina Grande (UNESC). Member of the research groups GPCiber (Cyberculture) and MEQAD (Quantitative Methods in Business Administration). E-mail:
[email protected] Professional address: Av. Estrela, nº 499 - Centro – 58306-190 - Cidade de Bayeux/PB – Brasil.
Francisco José Costa PhD in business administration from Fundação Getúlio Vargas – FGV-SP. Master and Bachelor's degree in business administration from State University of Ceará - UECE. Professor in the graduate program in business administration at the Federal University of Paraiba-UFPB. E-mail:
[email protected]
Measurement and Verification Scales: a Comparative Analysis between the Likert and Phrase Completion Scales Severino Domingos da Silva Júnior/ Francisco José Costa
ABSTRACT This study analyses the use of Likert and Phrase Completion scales as verification scale options for measuring multiple-item constructs in Marketing and Management research. Initially, a literature review was carried out, which made it possible to compare the nature and the potential of the two measurement alternatives. The construct intrinsic religiosity was used as a comparison reference and, afterwards, a questionnaire was developed containing items from the two types of scales. Data collection generated a sample of 229 respondents. Comparative procedures were conducted on the paired corresponding items and on the aggregated measures, and the results showed that the scales differ from each other when paired items are compared; on the other hand, no significant differences were found in the analysis of the aggregated items. These findings indicate that the choice of scale is the researcher's decision, which may take into account the kind of research and the sample's profile. KEYWORDS: Measurement, Likert scale, Phrase Completion scale.
RESUMO Este estudo se propôs a analisar a utilização de escalas do tipo Likert e Phrase Completion como alternativas de escalas de verificação para mensuração de construtos de múltiplos itens em pesquisas de Marketing e de Administração. Inicialmente, foi realizada uma revisão teórica que contribuiu para a comparação de natureza e de potencialidade das duas alternativas de medição. Foi utilizado o construto religiosidade intrínseca como referência de comparação e, em seguida, foi desenvolvido um questionário contendo itens dos dois tipos de escalas, o qual foi aplicado em uma amostra de 229 respondentes. Foram realizados procedimentos comparativos dos pares de itens correspondentes e das medidas agregadas. Os resultados mostraram que as escalas possuem diferenças entre si na verificação dos pares de itens, entretanto, não houve diferenças significativas na análise dos itens agregados.
Isso indica que a escolha da escala é uma decisão dos pesquisadores e estes poderão levar em conta o tipo de pesquisa e as características dos respondentes. PALAVRAS-CHAVE: Mensuração, escala Likert, escala Phrase Completion.
PMKT – Revista Brasileira de Pesquisas de Marketing, Opinião e Mídia (ISSN 1983-9456 Print and ISSN 2317-0123 Online), São Paulo, Brazil, V. 15, p. 1-15, October 2014 - www.revistapmkt.com.br
1 INTRODUCTION
Measurement is one of the means by which data are accessed and reported in order to understand facts and phenomena of interest. For this reason, measurement is a concern in all sciences, and several studies on the subject have been published in major journals. A more consistent development of quantitative empirical studies is only feasible thanks to advances in measurement theory and practice. Accordingly, measurement theory in the social and behavioral sciences, such as Administration, Psychology and Sociology, has evolved to propose alternatives that measure their variables more adequately, many of which have abstract content (for example, stress, satisfaction, loyalty etc.). The handling of data and measures in these sciences generates information and knowledge that can serve both academic and professional purposes, which signals the need for special care in defining the measures generated. A well-founded set of procedures for the creation and use of scales is therefore indispensable for adequately reaching the objectives of studies and meeting professional demands.
This research aims to contribute to such analyses by comparing two types of scales used in Marketing and Administration research: Likert and Phrase Completion. Over the past six decades, a large number of quantitative studies in Marketing have used the Likert scale in research instruments that measure constructs such as attitudes, perceptions, interests etc. This scale measures people's agreement with statements related to the constructs of interest. However, as will be seen in the next section, this scale has been heavily criticized, to the point that new alternatives were developed (COSTA, 2011). Probably the best alternative suggested so far is the Phrase Completion scale, which measures the construct by embedding its intensity in the scale statement itself, potentially making it easier for respondents to understand and yielding a more reliable and valid measure of what is being investigated (HODGE; GILLESPIE, 2007).
The next section presents the theoretical foundations of verification scales; the third, the methodological procedures of the fieldwork; the fourth, the results of the fieldwork; the fifth, the final considerations of the study; and the sixth and last, recommendations for further studies.
2 THEORETICAL DISCUSSION
Measurement is defined as the assignment of symbols, preferably numeric, to properties of the objects one wants to measure. These symbols serve to quantify or classify certain characteristics. Thus, measurement is a process of representation that relates some aspect of the real world to symbolic systems (MARI, 1996, 1999; FINKELSTEIN, 2003, 2009). Under this concept, measurement can be performed to capture the essence of the object measured, to facilitate the manipulation of data on sets of subjects, or simply to support a better understanding of the attribute. Given such uses, it is not difficult to understand why measurement underpins the academic and professional development of various sciences. Even though most studies on measurement are carried out in the exact sciences (FINKELSTEIN, 2003), the social sciences have been making progress in this area (COSTA, 2011).
Despite much criticism, and despite possible denials that certain variables (such as desire or pleasure, for example) can be measured at all, under the proposed concept the problem lies not in the act of measuring but in the symbolic assignment mechanism adopted by the researcher, bearing in mind that not every measure is meant to quantify. Even in the face of possible limitations, measurement remains the viable mechanism for developing empirical research on abstract constructs.
Especially in the social and behavioral sciences, variables of interest are measured through specific scales, which are constructed so as to adapt to the abstract nature of many of the constructs. Such measurement scales are part of the basic measurement instrumentation and take varied formats, such as Psychology tests, educational exams and tests, or the various scales used in Marketing and Administration. Scales in the social and behavioral sciences, as in all other sciences, are based on specific assumptions and developed for specific models. According to Costa (2011), a measurement scale is composed of a set of indicators, a verification scale and a set of rules:
The indicators are the content elements that ensure the concept of the construct is present in the measurement scale. Examples of indicators are assertions about a particular construct with which a subject expresses agreement; indicators can also be the extreme points of a behavior (for example, hating and loving a given brand).
The verification scale involves the numbers attached to the indicators for their measurement. For example, one can adopt agreement levels from 1 to 5, or use the numbers 1 to 7 between the options of hating and loving a brand.
The rules are the instructions for using the instrument in terms of its application and interpretation. For example, one can define that agreement values between 4 and 5 indicate a high level of the construct under analysis and that values between 1 and 2 indicate a low level of the same construct.
The main interest of this article is the analysis of verification scales. These have been a matter of controversy over the past six decades. For example, on an agreement scale, how many points are best suited for measurement? Is a measurement from 0 to 10 more efficient than one from 1 to 5? Is agreement a good way to measure a construct (for example, one wants to measure satisfaction, but what is measured is agreement with the sentence 'I am satisfied')? One of the central points of discussion is whether or not to use the so-called Likert scale, the most widely adopted in research. This scale is presented in the next subsection, and the alternative Phrase Completion scale in the one after it.
2.1 LIKERT SCALE
The most widely used and debated model was developed by Rensis Likert (1932) to gauge attitudes in the context of the behavioral sciences. The Likert verification scale consists of taking a construct and developing a set of statements related to its definition, for which respondents indicate their degree of agreement. Chart 1 shows an example of this scale for measuring satisfaction with a service on 5 points.
CHART 1 Example of the Likert scale.
I AM SATISFIED WITH THE SERVICE RECEIVED:
1 - Totally disagree | 2 - Partially disagree | 3 - Neither agree nor disagree | 4 - Partially agree | 5 - Totally agree
On this scale, respondents position themselves according to a measure of agreement with the item, and the measure of the construct is inferred from this agreement. Constructs such as self-esteem, depression, ethnocentrism, religion and racism are some examples recurrently measured by Likert scales. The original scale was proposed with five points, ranging from total disagreement to total agreement. However, there are now so-called Likert-type models with variations in the number of points, at the researcher's discretion.
The great advantage of the Likert scale is its ease of handling, since it is easy for a respondent to express a degree of agreement with a statement. Additionally, the confirmation of psychometric consistency in measures that used this scale contributed positively to its application in a wide range of studies (COSTA, 2011).
However, despite its positive points, the Likert scale has significant difficulties (CUMMINS; GULLONE, 2000; RABBIT; EDWARDS, 2007; DAWES, 2008). According to the critics, Likert-type questions require the respondent to examine at least two dimensions: content and intensity. The individual needs to assess the content of the item's proposition and then agree or disagree with the statement, while also considering the intensity of this agreement. Although this may not appear to be a problem in practice, critics claim that this feature greatly increases the cognitive complexity of the scale, especially when the scale has many points (HODGE; GILLESPIE, 2003).
There is also a problem related to labelling the points (for example, it is easy to label 3 points, with 1 - disagree, 2 - neither agree nor disagree, and 3 - agree; however, labelling 10 points on a scale of 1 to 10 is much more complicated). Selecting an adequate number of points is itself complicated.
Empirical studies show that, in multiple-item scales with reflective measurement of the construct, reliability is best in scales whose items are measured with more than 7 points and decreases when the items have fewer than 5 points. However, fewer points seem to make answering easier, so that increasing the number of points gains in psychometric consistency but loses in the confidence of responses. Obviously, this is a weakness of this scale. An option that potentially solves this difficulty is to anchor the extreme levels of agreement at the numerical limits and leave the remaining points as intermediate levels of agreement. This even allows the use of a larger number of points, such as 1 to 10 or 0 to 10, ranges that are widely familiar in countries like Brazil.
There is also the difficulty of deciding between an odd or an even number of points. The usual recommendation is that a scale with an odd number of points is easier to answer because of the midpoint, which would be a neutral level between agreement and disagreement. But labelling this point as neutral is conceptually problematic: on an agreement scale, someone who is neutral holds no agreement at all, whereas the number itself is a central point of agreement (COSTA, 2011; HODGE; GILLESPIE, 2003).
Another difficulty is the possible loss of information associated with the Likert scale (RUSSELL; BOBKO, 1992): defining the points as a range of integers (from 1 to 5 or 1 to 10, for example) makes the answers necessarily discrete, so that points lying between these references are lost (for example, an agreement level between 3 and 4 cannot be expressed by a given subject). Even with statistical procedures appropriate for discrete variables, these problems persist because the assumptions of conventionally applied parametric approaches are violated (such as confirmatory factor analysis, which assumes that the variables are continuous and normally distributed). The loss of information resulting from the restriction of data on the Likert scale could possibly be mitigated in two ways: the first is to measure with more points; the second is to increase the number of verification items by means of multiple-item scales because, once aggregated, the final variable can take a much greater number of values, approaching the condition of a continuous variable.
Finally, we highlight the criticism of Rossiter (2002), who, in his C-OAR-SE model, argues that the use of adverbs (totally disagree, disagree a lot etc.) makes the respondent's positioning even harder. In fact, on a four-point scale (for example, with 1 - strongly disagree; 2 - partially disagree; 3 - partially agree; 4 - totally agree), the distinction between answers 2 and 3 is not very clear; after all, partial disagreement seems to be equivalent to partial agreement.
2.2 PHRASE COMPLETION SCALES
Given the difficulties associated with the Likert scale, new scales were developed to measure constructs, among them the Phrase Completion scales.
This scale was developed by Hodge and Gillespie (2003) precisely as an alternative to resolve the difficulties of the Likert verification scale. Hodge and Gillespie (2003) proposed a standard 11-point scale, from 0 to 10 in a sequence of integers, in which 0 is associated with the absence of the attribute, while 10 relates to the maximum intensity of its presence. Chart 2 presents an example of the Phrase Completion scale, with the same variable as Chart 1 (satisfaction with the service).
CHART 2 Example of the Phrase Completion scale.
MY LEVEL OF SATISFACTION WITH THE SERVICE WAS:
TOO SMALL              MODERATE              TOO BIG
0    1    2    3    4    5    6    7    8    9    10
According to the authors, the fact that this scale has 11 points (from 0 to 10) facilitates interpretation by the respondent since, in general, people are familiar with this reference (in educational assessments, for example). Therefore, the potential response difficulty associated with the number of points dissipates. Additionally, the higher number of points potentially improves the reliability and validity of the scale, without the conventional problems associated with scales of few points.
As indicated, the Likert scale is an indirect approach to measuring a construct: its direct measurement is transferred to a measurement of agreement with a statement associated with the construct. There is, therefore, an intermediate stage of measurement, which introduces a potential weakness into the measurement practice, since information may be lost between the content one wants to measure and its transformation into a statement whose agreement is then assessed (COSTA, 2011). The Phrase Completion scale seeks to overcome precisely this difficulty by measuring the intensity of a given construct directly on the scale itself. It is from this form of application that the scale takes its name. The example in Chart 2 shows this characteristic clearly, because the respondent indicates the level of satisfaction by completing the sentence within the range of 11 points, with three points of reference. Of course, this advantage becomes a disadvantage if the research involves small samples and a quantitative treatment based on frequencies (since the responses are then dispersed across the options without much meaning, and some points may end up with small or zero frequency).
In studies designed to test the psychometric stability of the Phrase Completion scale compared with the Likert scale in constructs measured by multiple reflective items, Hodge and Gillespie (2003, 2007) showed that the Phrase Completion scales resulted in better reliability and better factorial consistency. This suggests that, in addition to being more intuitive and logically constructed, Phrase Completion scales would also have advantages from the point of view of statistical operation. Of course, two studies alone are not sufficient for a statement of this nature, and further efforts are required to reach a firmer position.
However, some research developed using Phrase Completion scales has faced a serious limitation related to the space occupied in research instruments.
In fact, studies that use Likert scales usually present a series of assertions next to the verification scale, thus occupying less space in questionnaires. Phrase Completion scales do not offer this possibility, so that each verification item demands at least three lines. In academic research, for example, in which people's motivation to answer questionnaires is usually very low, an extensive questionnaire is a potential generator of measurement errors. By contrast, in some market research, in which the measurement can be made with only one item and which normally evaluates fewer constructs simultaneously, this ceases to be a problem. Thus, if content validity is maintained and the measurements of the two scales are consistent, academic and market researchers can use whichever format is most appropriate in terms of space and ease of response.
Following the idea of Hodge and Gillespie (2003, 2007) of comparing measures obtained with the two scales, an empirical study was developed here, whose methodological details are given below.
3 FIELD PROCEDURES
As indicated in the introduction, the present study aims to verify the similarities and differences in the results of the Likert and Phrase Completion verification scales. To do this, we selected five items measuring the intrinsic religiosity construct, in the same way they were used in the study by Hodge and Gillespie (2003), with some adaptations of meaning due to translation (although the interest lies in research constructs in Marketing and Administration, it was decided to use the same construct as the cited authors to allow comparative procedures; the results would probably not be very different with other constructs; the items are set out in section 4).
The measurement items of the reference construct (intrinsic religiosity) kept a reflective perspective in relation to the latent construct. It was decided that these items would be presented in statement format for the Likert scale response (in 11 points, from 0 to 10). The items were then adapted to the Phrase Completion response format and separated from the Likert items by a set of other questions to minimize risks associated with the halo effect.
Data were collected by applying questionnaires in crowded places in two state capitals of the Northeast Region of Brazil, such as businesses, churches, shopping malls and cultural events. The questionnaire contained, in addition to the two scales, questions about the respondents' profile (religion, sex, marital status, age and income). Data from 241 respondents were collected and transferred to the SPSS software. Of these, 12 questionnaires were invalidated because of inconsistencies such as unanswered items and insufficient information for the study. Thus, the analysis was carried out with 229 valid questionnaires.
The individuals surveyed showed a relatively balanced gender distribution (54.6% women and 45.4% men); most declared themselves Catholic (76.9%); the predominant marital status was single (71.2%); the most frequent age group was 21 to 30 years (37.7%), and the most frequent household income was approximately $ 2,000 (40.5%). This information indicates good sample heterogeneity, a characteristic that ensures good conditions for data analysis.
Data analysis was performed with descriptive procedures and various parametric and nonparametric tests. Initially, normality tests were run on the variables; then the mean, median and standard deviation of all variables were extracted (for the general measures, skewness and kurtosis were also evaluated).
Procedures were used to compare the measurements of the corresponding variables on each of the two scales and, specifically for the aggregate variables, comparison and bivariate association procedures were applied. At the end of each procedure, the differences and similarities between the measures from each type of scale (Likert and Phrase Completion) were analyzed. All procedures were performed in SPSS 18 and based on the specialized literature (CONOVER, 1999; HAIR et al., 2009; LATTIN; CARROLL; GREEN, 2011).
4 RESULTS
The results of the field study follow. Section 4.1 presents the descriptive results; section 4.2 verifies the psychometric structure; section 4.3 closes with additional analysis procedures.
4.1 DESCRIPTIVE RESULTS
First, the set of ten items was subjected to a normality analysis to verify the hypothesis that the data of each item could come from a normally distributed variable. The nonparametric Kolmogorov-Smirnov test (KS test) was used, and the null hypothesis (which assumes that the sample was drawn from a normally distributed population) was rejected (p < 0.001) for all items, so that normality of the source data should not be assumed for the purposes of the analysis and of some of the techniques applied.
Next, the descriptive measures of each variable were extracted: mean, median and standard deviation. Table 1 presents the results by matching pairs of variables on each verification scale. The measures differ within pairs; in general, in the first three pairs the Phrase Completion means were higher than the means obtained with the Likert scale, whereas in the last two pairs the Likert means were higher than those of Phrase Completion. In four of the five pairs the median values followed the same behavior. Dispersions also varied between the pairs of items (in some pairs the dispersion was greater on the Likert scale and in others on the Phrase Completion scale).
TABLE 1 Descriptive measures.
PAIR | VARIABLE | MEAN | MEDIAN | STANDARD DEVIATION
First | My religious belief affects my life in (from 'no aspect' to 'all aspects'). | 6.94 | 8.00 | 3.09
First | My whole outlook on life is based on my religion. | 6.07 | 7.00 | 2.89
Second | I am aware of the presence of God (from 'never' to 'all the time'). | 9.13 | 10.00 | 1.36
Second | I've always had a strong sense of the presence of God. | 8.81 | 9.00 | 1.58
Third | My religion in relation to my life is (from 'a reason that guides me' to 'the greatest of my life'). | 7.61 | 8.00 | 2.23
Third | I try to faithfully live according to my religious beliefs. | 6.76 | 7.00 | 2.80
Fourth | I read about my religion (from 'never' to 'always'). | 6.15 | 6.00 | 2.73
Fourth | I like to read about my religion. | 6.89 | 7.00 | 2.74
Fifth | The time that I dedicate to religious moments and meditation happens (from 'never' to 'daily, without fail'). | 6.63 | 7.00 | 2.41
Fifth | For me, it's important to have a time dedicated to my thoughts and prayers. | 8.24 | 9.00 | 2.04
Source: Survey data.
To evaluate the consistency of these differences, tests comparing means and medians were adopted. The means of each pair of items were compared using the paired t-test, which is based on Student's t statistic and on the significance associated with the null hypothesis of equal means in the populations that gave rise to the paired sample. Thus, high absolute values of t and significance below 0.05 indicate that the null hypothesis should be rejected. The limitation of this test comes from its parametric character and its assumption of normality of the variables that gave rise to the two paired samples. For this reason, the nonparametric Wilcoxon test for paired data is a suitable alternative, which applies particularly to the median. In this test, according to the algorithm implemented in SPSS, the null hypothesis is that the median of the differences between the two random variables that gave rise to the sample is equal to 0; here too, high absolute values of W and significance below 0.05 lead to rejection of the null hypothesis.
From the results in Table 2, it can be observed that, in both tests, there is no evidence of equality between the variables that gave rise to the sample data, i.e. in all pairs of variables there was a significant difference between the measurements with the Phrase Completion and the Likert scales. This suggests that, from this first perspective (which is only exploratory), the two scales are not interchangeable for the purpose of obtaining measures of position.
TABLE 2 Tests of means and medians.
PAIR | t-TEST STATISTIC | t-TEST P-VALUE | WILCOXON STATISTIC | WILCOXON P-VALUE
First | 4.064 | < 0.001 | -4.428 | < 0.001
Second | 3.700 | < 0.001 | -3.853 | < 0.001
Third | 5.856 | < 0.001 | -5.789 | < 0.001
Fourth | -5.611 | < 0.001 | -5.463 | < 0.001
Fifth | -12.552 | < 0.001 | -9.994 | < 0.001
Source: Survey data.
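The two paired comparisons described above can be sketched as follows. This is a minimal illustration that uses only the Python standard library, not the SPSS procedure used in the study. The function names and the six response pairs are hypothetical, and the Wilcoxon statistic uses a plain normal approximation without a tie correction, so its values and sign convention may differ from SPSS output.

```python
import math
import statistics


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for the differences x - y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1


def wilcoxon_z(x, y):
    """Wilcoxon signed-rank z (normal approximation, no tie correction)."""
    d = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank within the tie
        i = j + 1
    w_plus = sum(r for r, diff in zip(ranks, d) if diff > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mean_w) / sd_w


# Hypothetical answers of six respondents to one pair of items (0 to 10).
phrase_completion = [8, 9, 7, 10, 6, 8]
likert = [7, 7, 7, 8, 5, 9]
t, df = paired_t(phrase_completion, likert)
z = wilcoxon_z(phrase_completion, likert)
print(f"t = {t:.3f} (df = {df}), z = {z:.3f}")
```

With only a handful of observations the normal approximation is rough; for real data an exact test or a statistical package would be preferable.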
On the other hand, as Table 1 indicates, there seem to be trade-offs between the items, and it is reasonable to believe that, in a joint verification, the specific differences between items compensate each other. With this understanding, the items were aggregated as follows: for each scale, an average score was extracted for each respondent (adding up the scores of the scale's items and dividing the sum by 5). Following the same procedure as in the item-level analysis, the normality of the source population was checked first; the KS test again rejected the hypothesis of normality for both variables, although with significance levels closer to the cut-off point (0.026 for the Phrase Completion scale and 0.014 for the Likert scale).
After this procedure, the means, medians and standard deviations of each variable were again extracted, in addition to the measures of skewness and kurtosis (to further explore normality). The values listed in Table 3 confirmed the expectation that the means and medians would be much closer (the medians, in fact, were equal when rounded to two decimal places). The standard deviations were also close, which again indicates a convergence of the aggregate results. As for skewness and kurtosis, by the conventional criteria of the SPSS output, the values show signs of normality in the source population, although the KS test opposed this evidence.
TABLE 3 General descriptive measures and tests - First aggregation.
GENERAL MEASURE | MEAN | MEDIAN | STANDARD DEVIATION | SKEWNESS | KURTOSIS
General measure on the Phrase Completion scale | 7.29 | 7.60 | 1.77 | -0.724 | 0.225
General measure on the Likert scale | 7.35 | 7.60 | 1.88 | -0.581 | -0.504
Tests: t-test, t = -0.814, p = 0.417; Wilcoxon, z = -0.353, p = 0.724
Source: Survey data.
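The aggregation step and the descriptive measures reported in Table 3 can be illustrated with a short sketch. The data below are hypothetical, the function names are invented for the example, and the skewness and kurtosis formulas are the bias-adjusted sample versions commonly reported by statistical packages; that they match the exact computation used in the study is an assumption.

```python
import statistics


def aggregate(rows):
    """Average the item scores of each respondent (one row per respondent)."""
    return [sum(row) / len(row) for row in rows]


def skewness(values):
    """Bias-adjusted sample skewness (needs at least 3 values)."""
    n = len(values)
    m, s = statistics.mean(values), statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)


def kurtosis(values):
    """Bias-adjusted sample excess kurtosis (needs at least 4 values)."""
    n = len(values)
    m, s = statistics.mean(values), statistics.stdev(values)
    fourth = sum(((v - m) / s) ** 4 for v in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# Hypothetical answers of four respondents to five items scored 0 to 10.
responses = [
    [10, 9, 8, 7, 6],
    [5, 5, 5, 5, 5],
    [0, 2, 4, 6, 8],
    [10, 10, 10, 10, 10],
]
scores = aggregate(responses)
print("scores:", scores)
print("mean:", statistics.mean(scores), "median:", statistics.median(scores))
print("sd:", round(statistics.stdev(scores), 3))
print("skewness:", round(skewness(scores), 3), "kurtosis:", round(kurtosis(scores), 3))
```

Because each aggregate score is the mean of five integer items, it can take many more distinct values than a single item, which is why the aggregate variables sit closer to a continuous distribution.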
Here too, both tests were applied to compare the variables, and both the t-test (now more reliable, given the variables' proximity to normality) and the Wilcoxon test showed no evidence of difference between the two variables that gave rise to the sample. The indication is, therefore, that in terms of descriptive measures of position the aggregate variables converge. As an additional exploration, the correlation between the two variables was extracted, and very high values were obtained for both the parametric Pearson correlation coefficient (0.813) and the nonparametric Spearman coefficient (0.805). This indicates that the two measures have a high degree of joint variation, including linear variation (as shown by the Pearson correlation coefficient), which is what is expected if one measure is to replace the other.
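The difference between the two coefficients mentioned above can be made concrete with a small sketch. It uses only the Python standard library, and the function names and data are hypothetical. Spearman's coefficient is computed here as Pearson's coefficient applied to average ranks, which is the usual definition when ties are present.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical aggregate scores: y grows with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

In this example the rank-based coefficient equals 1 because the relation is perfectly monotonic, while Pearson's coefficient is slightly lower because the relation is not linear. Values as close as the 0.813 and 0.805 reported in the study suggest that the association between the two aggregate measures is mostly linear.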
PMKT – Revista Brasileira de Pesquisas de Marketing, Opinião e Mídia (ISSN 1983-9456 Impressa e ISSN 2317-0123 On-line), São Paulo, Brasil, V. 15, p. 1-15, outubro, 2014 - www.revistapmkt.com.br 10
In summary, these results indicate that, for the construct under analysis, when each item is evaluated individually it is not adequate to replace an item measured on one scale with its counterpart on the other scale, because the measures of position and dispersion vary. On the other hand, when the set of items is aggregated, these differences cancel out and the aggregate variables overlap well, so that either one can be used to describe the sample.
4.2 PSYCHOMETRIC STRUCTURE
In this step, the psychometric characteristics of the scales were evaluated with the conventional tools applied to sets of items that are reflective indicators of a latent construct. According to conventional theory, the procedures are (COSTA, 2011): checking factorial consistency (with extraction by the principal components method), assessed by the variance extracted (desirably larger than 0.5) and by the factorial scores (desirably larger than 0.4); and checking internal consistency, as an indication of reliability, assessed by Cronbach's alpha coefficient (desirably larger than 0.6). Table 4 presents the results of the factor extraction. These results show that, regardless of the verification scale, each set of variables remained in a single factor and that, in both cases, the variance extracted was above the reference value of 0.5, with values close to each other (0.556 for the items measured with the Phrase Completion scale and 0.589 for the items measured with the Likert scale). As for the factorial scores, the minimum scores in the two extractions were 0.570 (Phrase Completion scale) and 0.606 (Likert scale), both above the reference value. Given these results, it is possible to state that the items of each verification scale constitute a single factor with good factorial consistency.
TABLE 4 Factor extraction results for the two scales used.
Verification with the Phrase Completion scale; extracted variance = 0.556

| Variables | Score |
|---|---|
| My religious belief affects my life in (from 'no aspect' to 'all aspects'). | 0.608 |
| I am aware of the presence of God (from 'never' to 'all the time'). | 0.570 |
| My religion in relation to my life is (from 'a reason that guides my life' to 'the greatest of my life'). | 0.843 |
| I read about my religion (from 'never' to 'always'). | 0.809 |
| The time that I dedicate to religious moments and meditation happens (from 'never' to 'daily, without fail'). | 0.852 |

Verification with the Likert scale; extracted variance = 0.589

| Variables | Score |
|---|---|
| My whole outlook on life is based on my religion. | 0.777 |
| I've always had a strong sense of the presence of God. | 0.606 |
| I try to faithfully live according to my religious beliefs. | 0.838 |
| I like to read about my religion. | 0.808 |
| For me, it's important to have a time dedicated to my thoughts and prayers. | 0.787 |

Source: Survey data.
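The two psychometric criteria used in this section can be illustrated with a short sketch. It assumes that the scores in Table 4 are the loadings of a single unrotated principal component extracted from standardized items; under that assumption the variance extracted equals the mean of the squared scores, which allows a quick consistency check of the table. The Cronbach's alpha function follows the usual formula, and the response data used with it are hypothetical.

```python
from statistics import variance

def variance_extracted(loadings):
    """Share of variance explained by a single component: mean of squared loadings."""
    return sum(l * l for l in loadings) / len(loadings)

def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` holds one list of k item scores per respondent."""
    k = len(responses[0])
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    total_variance = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Scores as listed in Table 4, in table order.
phrase_completion_scores = [0.608, 0.570, 0.843, 0.809, 0.852]
likert_scores = [0.777, 0.606, 0.838, 0.808, 0.787]

print("Phrase Completion:", round(variance_extracted(phrase_completion_scores), 3))
print("Likert:", round(variance_extracted(likert_scores), 3))

# Hypothetical answers of five respondents to five items, for illustration only.
responses = [
    [8, 9, 7, 6, 10],
    [5, 6, 4, 5, 5],
    [10, 10, 9, 8, 8],
    [7, 8, 8, 7, 5],
    [2, 3, 1, 4, 0],
]
print("alpha:", cronbach_alpha(responses))
```

The recomputed variances agree with the reported values of 0.556 and 0.589 up to rounding of the scores, which also confirms which column of scores belongs to which scale.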
In the internal consistency check, as an indication of reliability, Cronbach's alpha was 0.780 for the items measured with the Phrase Completion scale and 0.820 for the items measured with the Likert scale. Here again, the two values were above the reference value (0.6) and very close to each other. These results indicate that reliability does not change substantially as a result of the verification scale, i.e., the random errors associated with the measurement process remain well controlled in the respective sets of items. This result is contrary to that of Hodge and Gillespie (2003), in which the items in the Phrase Completion format showed the best fit, with greater validity and reliability and less measurement error. However, it should be noted that those authors' research was carried out only with graduate students, a sample with obvious peculiarities. With a more heterogeneous sample, the results here indicate that, from the point of view of psychometric structure, and at least for the verification methods adopted here, there is no problem in using either of the two verification scales.
4.3 ADDITIONAL PROCEDURES
A first additional procedure consisted of extracting a new aggregate measure of the variables, weighting each respondent's score on each item by the factorial score of the corresponding variable (second aggregation). The results, shown in Table 5, are very close to the previous measures (listed in Table 3) for the five measures used (average, median, standard deviation, skewness and kurtosis). The tests of difference between the variables (t-test and Wilcoxon test) also indicated that the two variables do not differ from one another, reaffirming the previous result. The normality test now gave a different result: it indicated normality for the aggregate variable measured with the Phrase Completion scale, although almost at the limit of rejecting the null hypothesis (p = 0.052). Finally, the correlation analysis reaffirmed the previous result, with high correlations for both the Pearson coefficient (0.825) and the Spearman coefficient (0.816).
TABLE 5 General descriptive measures and tests – second aggregation.
| General measures | Average | Median | Standard deviation | Asymmetry | Kurtosis |
|---|---|---|---|---|---|
| General measure on the Phrase Completion scale | 7.19 | 7.48 | 1.83 | -0.731 | 0.268 |
| General measure on the Likert scale | 7.27 | 7.55 | 1.94 | -0.604 | -0.453 |

Tests: t-test: t = -1.203, p = 0.230; Wilcoxon: z = -0.795, p = 0.427.
Source: survey data.
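The second aggregation can be sketched as a weighted average of a respondent's item scores, with the factorial scores as weights. The text does not state how the weighted sum was normalized; dividing by the sum of the weights is an assumption made here, chosen because it keeps the aggregate on the original response scale, which is consistent with the averages in Table 5 being close to those in Table 3. The respondent's answers are hypothetical.

```python
def weighted_aggregate(item_scores, weights):
    """Weighted mean of one respondent's item scores.

    Normalizing by the sum of the weights is an assumption of this sketch,
    not a detail given in the article.
    """
    return sum(w * x for w, x in zip(weights, item_scores)) / sum(weights)

# Factorial scores of the Phrase Completion items as reported in Table 4.
phrase_completion_weights = [0.608, 0.570, 0.843, 0.809, 0.852]

# Hypothetical answers of one respondent to the five Phrase Completion items.
respondent = [8, 9, 7, 6, 10]

print("simple mean:", sum(respondent) / len(respondent))
print("weighted mean:", weighted_aggregate(respondent, phrase_completion_weights))
```

With this normalization, items with higher factorial scores pull the aggregate toward their own value, and a respondent who gives the same answer to every item receives exactly that answer as aggregate score.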
These results reaffirm what was indicated above, i.e., in empirical research there are no major variations in results from using one or the other method of measurement for the purpose of extracting descriptive measures of aggregate variables. A second relevant additional verification analyzed variations by gender (highlighted here because it is the categorical variable that divided the sample into two nearly equal parts). First, the aggregate variables (from the second aggregation method) were submitted to analysis of variance and to the Mann-Whitney U test (nonparametric) to verify whether the variables behaved in the same way under the two forms of verification. The results are given in Table 6. In both tests there is a significant difference (p < 0.05) in the intensity of the variable by gender, regardless of the scale. This is one more indication that the variables behave in parallel on each type of scale.
TABLE 6 Comparisons by gender.

| Variable | Gender | Number | Average | Median | Standard deviation |
|---|---|---|---|---|---|
| General measure on the Phrase Completion scale | Male | 104 | 6.71 | 6.88 | 2.08 |
| | Female | 125 | 7.58 | 7.81 | 1.49 |
| | Total | 229 | 7.19 | 7.48 | 1.83 |
| General measure on the Likert scale | Male | 104 | 6.92 | 6.94 | 2.08 |
| | Female | 125 | 7.57 | 7.87 | 1.77 |
| | Total | 229 | 7.28 | 7.55 | 1.94 |

Tests, Phrase Completion scale: ANOVA: F = 13.655, p < 0.001; Mann-Whitney U: Z = -3.018, p < 0.001.
Tests, Likert scale: ANOVA: F = 6.566, p < 0.05; Mann-Whitney U: Z = -2.370, p < 0.05.
Source: Survey data.
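The one-way analysis of variance reported in Table 6 can be approximately reconstructed from the group sizes, averages and standard deviations listed there. The sketch below assumes that the standard deviations are sample standard deviations (denominator n - 1); because the table values are rounded to two decimals, the recomputed F statistics are expected to be close to, but not identical with, the reported ones.

```python
def anova_f_from_summary(groups):
    """One-way ANOVA from summary data.

    `groups` is a list of (n, mean, sd) tuples, one per group.
    Returns (F, df_between, df_within).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# (n, average, standard deviation) for men and women, taken from Table 6.
phrase_completion = [(104, 6.71, 2.08), (125, 7.58, 1.49)]
likert = [(104, 6.92, 2.08), (125, 7.57, 1.77)]

for label, groups in (("Phrase Completion", phrase_completion), ("Likert", likert)):
    f_value, df_b, df_w = anova_f_from_summary(groups)
    print(f"{label}: F({df_b}, {df_w}) = {f_value:.3f}")
```

With only two groups, the F statistic equals the square of the independent-samples t statistic with pooled variance, so the same summary data would support either test.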
After splitting the sample by gender in SPSS, the t-test and the Wilcoxon test, applied to compare the two variables within each group only, again showed no differences between the two variables, both in the group of men (t-test: t = 1.747, p = 0.084; Wilcoxon: Z = -1.135, p = 0.257) and in the group of women (t-test: t = 0.136, p = 0.892; Wilcoxon: Z = -0.012, p = 0.991). This shows even more strongly that the two measurements behave convergently.
5 CONCLUSIONS AND FINAL THOUGHTS
This research compared alternatives for measuring constructs in Marketing and Management, using items measured with Likert and Phrase Completion verification scales. As reported, the Likert scale is widely used in the social sciences but has characteristics that put its efficiency in doubt; among the alternatives proposed, the Phrase Completion scale seems to be one of the best options. According to the data from this research, conducted with 229 respondents, relatively small differences were observed between the aggregate scales, although there were statistically significant differences in the individual comparisons of pairs of items. This result shows that the items, when analyzed separately, may vary between the measures on the two scales, which initially casts doubt on the possibility of using one scale to replace the other. Therefore, given the nature of the results and the uncertainty inherent in measuring constructs with multiple reflective items, it is not possible to say which of the two scales is better, except through the content and face validation carried out in the preparatory steps of the measurement procedure (COSTA, 2011). On the other hand, when the aggregate data were analyzed, the measurements from the two scales converged in all the checks made: in the comparisons of descriptive measures, in the conventional indications of psychometric consistency, and in the association analyses, both overall and segmented by gender. This shows that, for evaluations of multiple items with aggregation, the two scales produce practically the same results. Thus, the choice of one scale or the other is the researcher's decision, made taking into account the peculiarities of the public or the research interest.
Thus, the Likert scale may continue to be used in academic marketing and management research, even though it measures the examined construct indirectly, because this type of scale is more suitable for long instruments and adapts more easily to a larger number of constructs. In short, as reaffirmed in this study, this verification scale has good psychometric properties, is easy to organize and has an operational advantage as regards the structure of the research instrument.
However, for professional, decision-oriented market research, it is suggested that instruments use the Phrase Completion scale, because its items measure the examined construct directly and thus meet the objectives commonly set by market research. In fact, the Phrase Completion scale is more logical and intuitive, although it takes up more space in the instrument. In those terms, and considering that market research instruments are typically more direct, this scale can be understood to be a better fit, and it loses nothing in terms of psychometric consistency, validity and reliability.
6 RESEARCH LIMITATIONS AND SUGGESTIONS FOR FURTHER RESEARCH
Although the data were assessed carefully and with statistical rigor, the study had some weaknesses regarding the research design, in terms of both access to the sample and sample size. It is therefore recommended that other studies refine the sampling procedure to improve the comparisons. Although the questionnaire was applied with the Likert items and the Phrase Completion items separated by other questions on categorical variables, it is possible that the similarity in content influenced the answers through a halo effect. This may have influenced the convergence of psychometric structure observed. For this reason, other studies are recommended that adopt comparison procedures able to resolve doubts about this kind of effect. The construct used also has characteristics that may explain the results, and variations can be expected with other constructs. For the purposes of the study developed here, this poses no problem for the results, because the content of the construct was not itself an object of analysis, except from the perspective of the content validation of the scale. Still, it is recommended that similar studies be carried out with other types of constructs.
7 REFERENCES
COELHO, P. S.; ESTEVES, S. P. The choice between a five-point and a ten-point scale in the framework of customer satisfaction measurement. International Journal of Market Research, 49 (3), p. 313-339, 2007.
CONOVER, W. J. Practical nonparametric statistics. 3. ed. New York: John Wiley, 1999.
COSTA, F. J. Mensuração e desenvolvimento de escalas: aplicações em administração. Rio de Janeiro: Ciência Moderna, 2011.
CUMMINS, R. A.; GULLONE, E. Why we should not use 5-point Likert scales: the case for subjective quality of life measurement. International Conference on Quality of Life in Cities, 2. Singapore. Proceedings… Singapore: National University of Singapore, 2000.
DAWES, J. Do data characteristics change according to the number of scale points used? An experiment using 5-point, 7-point and 10-point scales. International Journal of Market Research, 50 (1), p. 61-77, 2008.
FINKELSTEIN, L. Widely-defined measurement: An analysis of challenges. Measurement, 42, p. 1270-1277, 2009.
FINKELSTEIN, L. Widely, strongly and weakly defined measurement. Measurement, 34, p. 39-48, 2003.
HAIR JR., J.; BLACK, W. C.; BABIN, B. J.; ANDERSON, R. E.; TATHAM, R. L. Análise multivariada de dados. 6. ed. Porto Alegre: Bookman, 2009.
HODGE, D. R.; GILLESPIE, D. F. Phrase completion scales: a better measurement approach than Likert scales? Journal of Social Service Research, 33 (4), p. 1-12, 2007.
HODGE, D. R.; GILLESPIE, D. F. Phrase completion: an alternative to Likert scales. Social Work Research, 27 (1), p. 45-55, 2003.
LATTIN, J.; CARROLL, J. D.; GREEN, P. E. Análise de dados multivariados. São Paulo: Cengage Learning, 2011.
LIKERT, R. A technique for the measurement of attitudes. Archives of Psychology, 140, p. 1-55, 1932.
MARI, L. Notes towards a qualitative analysis of information in measurement results. Measurement, 25 (3), p. 183-192, 1999.
MARI, L. The meaning of “quantity” in measurement. Measurement, 17 (2), p. 127-138, 1996.
ROSSITER, J. R. The C-OAR-SE procedure for scale development in marketing. International Journal of Research in Marketing, 19 (4), p. 305-335, 2002.
RUSSELL, C. J.; BOBKO, P. Moderated regression analysis and Likert scales: too coarse for comfort. Journal of Applied Psychology, 77 (3), p. 336-342, 1992.