Reconstructing One-Mode Three-way Asymmetric Data for Multidimensional Scaling

Atsuho Nakayama and Akinori Okada

Abstract

Some models have been proposed to analyze one-mode three-way data [e.g. De Rooij and Gower (J Classification 20:181–220, 2003), De Rooij and Heiser (Br J Math Stat Psychol 53:99–119, 2000)]. These models usually assume symmetric triadic relationships. Therefore, when one-mode three-way asymmetric proximity data are analyzed by multidimensional scaling, the asymmetric data are usually transformed into symmetric proximity data. However, symmetrizing asymmetric proximity data loses valuable information about the relationships among objects, so the transformation needs to be designed so that this information is not lost. For one-mode two-way asymmetric data, Harshman et al. (Market Sci 1:205–242, 1982) proposed a method that makes the overall sums of the rows and columns equal. Their method is effective for analyzing data whose row and column sums differ because of external factors. The present study therefore proposes a method that extends the method of Harshman et al. (Market Sci 1:205–242, 1982) to one-mode three-way asymmetric proximity data. The proposed method reconstructs one-mode three-way asymmetric data so that the overall sums of the rows, columns, and depths are equal.

A. Nakayama
Graduate School of Social Sciences, Tokyo Metropolitan University, 1-1 Minami-Ohsawa, Hachioji-shi, Tokyo 192-0397, Japan

A. Okada
Graduate School of Management and Information Sciences, Tama University, 4-4-1 Hijirigaoka, Tama-shi, Tokyo 206-0022, Japan

W. Gaul et al. (eds.), Challenges at the Interface of Data Analysis, Computer Science, and Optimization, Studies in Classification, Data Analysis, and Knowledge Organization, DOI 10.1007/978-3-642-24466-7_14, © Springer-Verlag Berlin Heidelberg 2012


1 Introduction

In multidimensional scaling (MDS), the prototypical data matrix consists of square, symmetric one-mode two-way data representing a set of empirically obtained similarities or other proximities between pairs of objects. The relationships among the objects are shown as a spatial representation in which pairs of objects perceived as highly similar are located close to each other in a space of relatively low dimensionality, such as two or three dimensions. However, a model capable of analyzing asymmetric proximities, where the cell entries satisfy $\delta_{ij} \neq \delta_{ji}$ for $i, j = 1, \ldots, n$; $i \neq j$, is also needed. Asymmetric relationships among objects are common in marketing research, consumer studies, and related fields. Examples include the probability of switching to brand $j$ given that brand $i$ was bought on the last purchase, and consumers' preference for brand $j$ given that they have already chosen item $i$. There are several approaches to analyzing asymmetric proximities. In one approach, asymmetric proximities are symmetrized, for example by additive or multiplicative adjustments of rows and columns; the simplest way is to average the conjugate entries $\delta_{ij}$ and $\delta_{ji}$. In another approach, asymmetric proximities are reconstructed before analysis, for example by a method that removes extraneous size differences (Harshman et al. 1982) or by methods based on entropy (Yokoyama and Okada 2006, 2007). Otherwise, rows and columns are treated as separate points in an unfolding-type MDS analysis, or the asymmetric proximities are analyzed directly by asymmetric models such as those of Okada and Imaizumi (1987), Saburi and Chino (2008), and Zielman and Heiser (1993). For one-mode two-way asymmetric data, Harshman et al. (1982) proposed a method that makes the overall sums of the rows and columns equal over all objects.
Their method reconstructs the data $\delta_{ij}$ so that, for each object $i$, the sum of row $i$ and column $i$, excluding the diagonal element, becomes fixed:

$$\delta_{i\cdot} = \frac{1}{2n}\sum_{j \neq i}(\delta_{ij} + \delta_{ji}) = k. \tag{1}$$

The value of $k$ is usually set to the grand mean of the unadjusted data. An iterative procedure successively adjusts the rows and columns of the data until the mean entry for each segment is equal. The method is effective for analyzing data whose row and column sums differ because of external factors, since removing these differences shows the structural factors of interest more clearly. The present study therefore proposes a method that extends the method of Harshman et al. (1982) to one-mode three-way asymmetric proximity data. The proposed method reconstructs one-mode three-way asymmetric data so that the overall sums of the rows, columns, and depths are equal over all objects.
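The iterative adjustment behind Eq. (1) can be illustrated with a short sketch. It is not Harshman et al.'s published algorithm: it assumes that row $i$ and column $i$ are both multiplied by one positive constant $c_i$ (the device the proposed three-way method also uses), excludes the diagonal as in Eq. (1), sets $k$ by default to the mean of the unadjusted $\delta_{i\cdot}$ values, and uses a damped fixed-point (symmetric matrix-balancing) update. The function name `equalize_two_way` is hypothetical.

```python
import numpy as np


def equalize_two_way(delta, k=None, tol=1e-10, max_iter=10_000):
    """Hypothetical sketch of a Harshman-style adjustment for an n x n asymmetric
    proximity matrix: find c > 0 so that, after entry (i, j) is multiplied by
    c[i] * c[j], every object i has (row-i sum + column-i sum) / (2n) == k,
    with the diagonal excluded as in Eq. (1)."""
    d = np.asarray(delta, dtype=float)
    n = d.shape[0]
    off = d.copy()
    np.fill_diagonal(off, 0.0)                # exclude the diagonal elements
    a = off + off.T                           # a[i, j] = delta_ij + delta_ji (i != j)
    if k is None:
        k = (a.sum(axis=1) / (2 * n)).mean()  # assumption: mean of unadjusted delta_i.
    target = 2 * n * k
    c = np.ones(n)
    for _ in range(max_iter):
        s = c * (a @ c)                       # s[i] = 2n * adjusted delta_i.
        if np.max(np.abs(s - target)) <= tol * target:
            break
        c = np.sqrt(c * target / (a @ c))     # damped fixed-point update (geometric mean)
    adjusted = c[:, None] * d * c[None, :]    # multiply row i and column i by c[i]
    return adjusted, c
```

Usage: `adjusted, c = equalize_two_way(delta)`. The square-root damping is a design choice of this sketch; the undamped update $c_i \leftarrow 2nk / \sum_j a_{ij} c_j$ can oscillate between two states instead of converging.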


2 The Method

Research interest in models of triadic relationships among objects has been increasing. MDS has often been used to analyze two-way proximity data and to obtain spatial representations in which proximities are expressed as distances between two objects. However, such representations have not been precise enough to explain the higher-level phenomena underlying the data, so a model capable of representing relationships among three or more objects is needed. Some models have been proposed to analyze one-mode three-way proximity data among three objects (e.g. De Rooij and Gower 2003; De Rooij and Heiser 2000). Except for De Rooij and Heiser (2000), these models usually assume symmetric triadic relationships. When one-mode three-way asymmetric proximity data $\delta_{ijk}$ among three objects $i$, $j$, and $k$ are analyzed by MDS, they are usually transformed into symmetric data $\delta'_{ijk}$ as follows:

$$\delta'_{ijk} = (\delta_{ijk} + \delta_{ikj} + \delta_{jik} + \delta_{jki} + \delta_{kij} + \delta_{kji})/6. \tag{2}$$
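Eq. (2) can be computed for a whole array at once. The sketch below assumes the data are held as an $n \times n \times n$ NumPy array indexed `delta[i, j, k]`; the function name `symmetrize_three_way` is hypothetical.

```python
import itertools

import numpy as np


def symmetrize_three_way(delta):
    """Eq. (2): average each entry delta[i, j, k] over all six orderings of (i, j, k)."""
    d = np.asarray(delta, dtype=float)
    # Each of the 6 axis permutations of the array supplies one index ordering.
    return sum(np.transpose(d, axes) for axes in itertools.permutations(range(3))) / 6.0
```

The result is invariant under any reordering of its three indices and keeps the grand total of the data, but it no longer distinguishes, for example, $\delta_{ijk}$ from $\delta_{kji}$; this is exactly the asymmetric information that symmetrization discards.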

Symmetrized proximities cannot reflect asymmetric information, so valuable information among the objects is lost when asymmetric proximity data are symmetrized. The transformation therefore needs to be designed so that this information is not lost. The present study proposes a method that extends the method of Harshman et al. (1982) to one-mode three-way asymmetric proximity data. Harshman et al.'s method reconstructs asymmetric one-mode two-way proximity data so that the sums of each row and column, excluding the diagonal element, are equalized. The method we propose reconstructs asymmetric one-mode three-way proximity data so that the sums of each row, column, and depth are equalized; that is, it adds the condition of depth to the method of Harshman et al. (1982). The method reconstructs asymmetric one-mode three-way proximity data as follows:

$$\delta_{i\cdot\cdot} = \frac{1}{6n}\sum_{j}\sum_{k}(\delta_{ijk} + \delta_{ikj} + \delta_{jik} + \delta_{jki} + \delta_{kij} + \delta_{kji}) \tag{3}$$

for each $i$. Unlike in Eq. (1), the diagonal elements are included, because they are thought to carry valuable information about the objects. The proposed method equalizes the sums of each row, column, and depth by multiplying row $i$, column $i$, and depth $i$ by a constant $c_i$. It finds the constants $c_i$ iteratively by the quasi-Newton method under the following conditions:

$$
\begin{aligned}
&\sum_{i}\bigl(c_i c_1 c_1 \delta_{i11} + \cdots + c_i c_1 c_n \delta_{i1n} + c_1 c_i c_1 \delta_{1i1} + \cdots + c_n c_i c_1 \delta_{ni1} + c_1 c_1 c_i \delta_{11i} + \cdots + c_1 c_n c_i \delta_{1ni}\bigr)\\
&\quad= \sum_{i}\bigl(c_i c_2 c_1 \delta_{i21} + \cdots + c_i c_2 c_n \delta_{i2n} + c_1 c_i c_2 \delta_{1i2} + \cdots + c_n c_i c_2 \delta_{ni2} + c_2 c_1 c_i \delta_{21i} + \cdots + c_2 c_n c_i \delta_{2ni}\bigr)\\
&\quad= \cdots\\
&\quad= \sum_{i}\bigl(c_i c_n c_1 \delta_{in1} + \cdots + c_i c_n c_n \delta_{inn} + c_1 c_i c_n \delta_{1in} + \cdots + c_n c_i c_n \delta_{nin} + c_n c_1 c_i \delta_{n1i} + \cdots + c_n c_n c_i \delta_{nni}\bigr)\\
&\quad= \frac{1}{n}\sum_{i}\sum_{j}\bigl(\delta_{ij1} + \cdots + \delta_{ijn} + \delta_{1ij} + \cdots + \delta_{nij} + \delta_{j1i} + \cdots + \delta_{jni}\bigr).
\end{aligned}
\tag{4}
$$
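A short derivation, added here as an aid, shows why Eqs. (3) and (4) express the same condition. Write $R_i$, $C_i$, and $D_i$ for the row, column, and depth sums of object $i$:

$$
\begin{aligned}
R_i &= \sum_{j}\sum_{k}\delta_{ijk}, \qquad C_i = \sum_{j}\sum_{k}\delta_{jik}, \qquad D_i = \sum_{j}\sum_{k}\delta_{jki},\\
\delta_{i\cdot\cdot} &= \frac{1}{6n}\,(2R_i + 2C_i + 2D_i) = \frac{R_i + C_i + D_i}{3n},
\end{aligned}
$$

because relabelling the summation indices $j \leftrightarrow k$ turns $\delta_{ikj}$ into $\delta_{ijk}$, $\delta_{kij}$ into $\delta_{jik}$, and $\delta_{kji}$ into $\delta_{jki}$. Applied to the rescaled entries $c_i c_j c_k \delta_{ijk}$, the $j$-th expression in the chain of Eq. (4) is exactly $R_j + C_j + D_j$, and the right-hand side equals $3T/n$ with $T = \sum_i\sum_j\sum_k \delta_{ijk}$, which is the mean of $R_j + C_j + D_j$ over the objects in the unadjusted data. Hence Eq. (4) makes the rescaled $\delta_{i\cdot\cdot}$ equal to $T/n^2$ for every object.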

The constants $c_i$ that equalize the sums of each row, column, and depth are found iteratively by minimizing the difference between each object's combined row, column, and depth sum, computed after row $i$, column $i$, and depth $i$ have been multiplied by $c_i$, and the grand mean of the unadjusted one-mode three-way asymmetric proximity data.
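The original does not specify the objective function, parameterization, or convergence settings, so the following is only a sketch under assumptions: it minimizes the sum of squared relative deviations of each object's rescaled row + column + depth sum from the right-hand side of Eq. (4), writes $c_i = \exp(x_i)$ to keep the constants positive, and uses SciPy's BFGS routine (a quasi-Newton method) with an analytic gradient. The function name `equalize_three_way` is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def equalize_three_way(delta):
    """Hypothetical sketch: find c > 0 such that, after entry (i, j, k) is
    multiplied by c[i] * c[j] * c[k], every object has the same combined
    row + column + depth sum, equal to the right-hand side of Eq. (4)."""
    d = np.asarray(delta, dtype=float)
    n = d.shape[0]
    # m[j, a, b] = delta_jab + delta_ajb + delta_abj: object j as row, column, depth
    m = d + np.einsum('ajb->jab', d) + np.einsum('abj->jab', d)
    m_sym = m + np.einsum('jab->jba', m)      # for the derivative of the quadratic form
    target = 3.0 * d.sum() / n                # right-hand side of Eq. (4)

    def objective(x):
        c = np.exp(x)                         # assumption: c_i = exp(x_i) keeps c positive
        q = np.einsum('jab,a,b->j', m, c, c)
        s = c * q                             # combined sum for object j after rescaling
        r = s / target - 1.0                  # relative deviation from the target
        g = np.einsum('jab,b->ja', m_sym, c)  # g[j, a] = dq_j / dc_a
        jac = (np.diag(q) + c[:, None] * g) / target  # dr_j / dc_a
        grad = 2.0 * (jac.T @ r) * c          # chain rule: d/dx_a = c_a * d/dc_a
        return r @ r, grad

    res = minimize(objective, np.zeros(n), jac=True, method='BFGS',
                   options={'gtol': 1e-12, 'maxiter': 2000})
    c = np.exp(res.x)
    adjusted = d * c[:, None, None] * c[None, :, None] * c[None, None, :]
    return adjusted, c
```

Usage: `adjusted, c = equalize_three_way(delta)` for an $n \times n \times n$ array. Because the conditions form a square system of $n$ equations in $n$ unknowns, a root finder such as `scipy.optimize.root` would be an equally reasonable choice; BFGS is used here only because the text names a quasi-Newton method.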

3 An Application

We applied the proposed method to data from three consecutive Swedish elections, held in 1964, 1968, and 1970. As an illustration, we use the data set reported by Upton (1978). There are four political parties: the Social Democrats (SD), the Center party (C), the People's party (P), and the Conservatives (Con), listed here from left wing to right wing. The data give the frequencies of the 64 ($4^3$) possible sequences of these four parties at the three time points. One-mode three-way asymmetric proximity data were calculated from the Swedish election data and then reconstructed by the proposed method. For comparison, we also used the one-mode three-way asymmetric proximity data without reconstruction. Both the reconstructed and the non-reconstructed asymmetric proximity data were symmetrized by Eq. (2), and the symmetrized proximity data were analyzed with a generalized Euclidean distance model. In this model, the triadic distance $d_{ijk}$ is defined as

$$d_{ijk} = (d_{ij}^2 + d_{jk}^2 + d_{ik}^2)^{1/2}, \tag{5}$$

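where $d_{ij}$ is the Euclidean distance between points $i$ and $j$, which represent objects $i$ and $j$. The disparities $\hat{d}_{ijk}$ of the triadic distances satisfy the following condition:

$$m_{ijk} < m_{rst} \Rightarrow \hat{d}_{ijk} < \hat{d}_{rst} \quad \text{for all } i < j < k,\; r < s < t, \tag{6}$$

where $m_{ijk}$ denotes the reconstructed, symmetrized one-mode three-way proximity data. Because we use a nonmetric MDS model, the results are not affected by the overall magnitude of the constants $c_i$: multiplying every $c_i$ by the same positive number rescales all entries without changing their rank order.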
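Eq. (5) can be evaluated directly from a configuration. The sketch assumes the configuration is given as an $n \times p$ NumPy array of coordinates, one row per object; the function name `triadic_distances` is hypothetical, and the disparities of Eq. (6) and the stress computation are not included. With the four Swedish parties there are only $\binom{4}{3} = 4$ triples with $i < j < k$.

```python
import itertools

import numpy as np


def triadic_distances(x):
    """Generalized Euclidean triadic distances, Eq. (5), for all triples i < j < k.

    x is an (n, p) array of point coordinates, one row per object."""
    x = np.asarray(x, dtype=float)
    diff = x[:, None, :] - x[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)             # squared dyadic distances d_ij^2
    out = {}
    for i, j, k in itertools.combinations(range(len(x)), 3):
        out[(i, j, k)] = np.sqrt(d2[i, j] + d2[j, k] + d2[i, k])
    return out
```

For the points $(0, 0)$, $(3, 0)$, and $(0, 4)$, the dyadic distances are 3, 5, and 4, so the single triadic distance is $\sqrt{9 + 25 + 16} = \sqrt{50}$.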


These analyses were carried out with maximum dimensionalities of nine through five and a minimum dimensionality of one, and the smallest stress value obtained in each dimensionality was taken as the minimized stress value for that dimensionality. We discuss the two-dimensional configurations here because they make the relationships among the objects easy to understand. The two-dimensional stress value was 0.226 for the reconstructed data and 0.300 for the non-reconstructed data. Figure 1 shows the two-dimensional configuration obtained from the reconstructed transition frequency data, and Fig. 2 shows the configuration obtained from the symmetrized transition frequency data without reconstruction. In Fig. 1, the positions of the parties reflect their characteristics: the parties are arranged clockwise from left wing to right wing, and the relationships among them are represented fairly well. In Fig. 2, however, the positions of the parties do not reflect their characteristics, and the relationships among them are not represented well.

Fig. 1 The two-dimensional configuration obtained from the analysis of reconstructed transition frequency data by the proposed method (horizontal axis: Dimension 1; vertical axis: Dimension 2)

Fig. 2 The two-dimensional configuration obtained from the analysis of symmetrized transition frequency data without reconstructing data

Next, the two-dimensional results of the present method are compared with those of the one-mode three-way asymmetric model of De Rooij and Heiser (2000), who analyzed the same transition frequency data of Swedish respondents. Their results show the same tendency as those of the proposed method: the positions of the objects correspond to those obtained here, with the parties arranged clockwise from left wing to right wing. These results suggest that the proposed method performs well in the present analysis. The analysis of the reconstructed one-mode three-way data revealed relationships among the parties that were not clear in the analysis of the averaged one-mode three-way data. The proposed method may therefore help to clarify the relationships among objects.

4 Conclusion and Outlook

In the above analysis, we applied the proposed reconstruction method to the consecutive Swedish election data and compared the results obtained from the reconstructed data with those from the non-reconstructed data. As noted above, the reconstruction method appears to have revealed the triadic relationships among the objects clearly. However, its effectiveness needs to be examined in greater detail. We therefore also applied the proposed method to transition data among web pages and compared the results of the one-mode three-way proximity data reconstructed by our method with those of the one-mode two-way proximity data reconstructed by the method of Harshman et al. (1982). These analyses were again carried out with maximum dimensionalities of nine through five and a minimum dimensionality of one, and the smallest stress value in each dimensionality was taken as the minimized stress value for that dimensionality. We again discuss the two-dimensional configurations because they make the relationships among the web pages easy to understand. The two-dimensional stress value was 0.572 for the reconstructed one-mode three-way data and 0.373 for the reconstructed one-mode two-way data. Figure 3 shows the two-dimensional configuration of the reconstructed one-mode three-way symmetrized transition frequency data among the web pages. In Fig. 3, there are several groups of pages based on the transitions among them. One group consists of Pages 2, 3, and 13. Pages 2 and 3 explain access analysis, and Page 13 gives the price of the application. These pages are located near the center, and the distances among them are very short, which suggests that they are often browsed together. The second group consists of Pages 12, 14, 15, and 16.
These pages form a group related to the services provided by the company that administers the web site, such as the pages on the application flow, introduction examples, the web site strategy report service, and the access analysis consulting service. The third group consists of Pages 4, 5, 6, 7, and 8. These pages form a group related to the general functions provided by the company, such as site analysis, page analysis, and path analysis. The fourth group consists of Pages 9, 10, and 11. These pages explain advertising effectiveness measurement and the evaluation of search engine optimization effectiveness. These groups appear to reflect general browsing tendencies on the present web site. However, these interpretations cannot be generalized from the results of the present analysis alone, and their validity needs to be checked in future work. Figure 4 shows the two-dimensional configuration of the reconstructed one-mode two-way symmetrized transition frequency data among the web pages. Figures 3 and 4 show similar tendencies, and the same groups appear in both. However, Fig. 4 partially differs from Fig. 3 in the pages on access analysis and on the services provided by the company. In Fig. 3, the access analysis pages, Pages 2, 3, and 13, form one group, whereas in Fig. 4 only Pages 2 and 3 form a group. Pages 12, 14, 15, and 16 form one group in Fig. 3, whereas in Fig. 4 Pages 12 and 13 form one group and Pages 14, 15, and 16 form another. These results provide an important new insight: the analysis of one-mode three-way data revealed relationships among web pages that were not clear in the analysis of one-mode two-way data. Our model clearly identified differences among groups of triadic web pages, which indicates that it produced more detailed representations of the relationships among triads of web pages than the two-way distance model.

Fig. 3 The two-dimensional configuration of reconstructed one-mode three-way symmetrized transition frequency data among web pages (horizontal axis: Dimension 1; vertical axis: Dimension 2)


Fig. 4 The two-dimensional configuration of reconstructed one-mode two-way symmetrized transition frequency data among web pages (horizontal axis: Dimension 1; vertical axis: Dimension 2)

In future studies, we would like to establish the validity of the proposed method by applying it to a variety of one-mode three-way asymmetric proximity data. Only two-dimensional results were discussed in the present analyses because they make the relationships among the objects easy to understand, so we would also like to consider higher-dimensional representations. We would also like to extend the proposed method to the analysis of four-way and higher-way data.

Acknowledgements

We would like to express our gratitude to two anonymous referees for their valuable reviews. We wish to thank the officials of the Data Analysis Competition 2006, administered by the Joint Association Study Group of Management Science, which provided valuable transition data among web pages. We are also greatly indebted to Professor Satoru Yokoyama of Teikyo University for his support and advice in analyzing the data. The major part of the present work by Nakayama was carried out when he was at Nagasaki University. This work was supported by KAKENHI (20700257).

References

De Rooij M, Gower JC (2003) The geometry of triadic distances. J Classification 20:181–220
De Rooij M, Heiser WJ (2000) Triadic distances models for the analysis of asymmetric three-way proximity data. Br J Math Stat Psychol 53:99–119
Harshman RA, Green PE, Wind Y, Lundy ME (1982) A model for the analysis of asymmetric data in marketing research. Market Sci 1:205–242
Okada A, Imaizumi T (1987) Nonmetric multidimensional scaling of asymmetric proximities. Behaviormetrika 21:81–96
Saburi S, Chino N (2008) A maximum likelihood method for an asymmetric MDS model. Comput Stat Data Anal 52:4673–4684
Upton GJG (1978) The analysis of cross-tabulated data. Wiley, Chichester
Yokoyama S, Okada A (2006) Rescaling a proximity matrix using entropy in brand-switching data. Jpn J Behaviormetrics 33:159–166
Yokoyama S, Okada A (2007) Rescaling proximity matrix using entropy analyzed by INDSCAL. In: Decker R, Lenz HJ (eds) Advances in data analysis. Springer, Heidelberg, Germany, pp 327–334
Zielman B, Heiser WJ (1993) Analysis of asymmetry by a slide vector. Psychometrika 58:101–114