Available online at www.sciencedirect.com

Physics Procedia 3 (2010) 1897–1905

www.elsevier.com/locate/procedia

Multi-Level Scaling Properties of Instant-Message Communications

Guanxiong Chen a,b, Xiaopu Han a, Binghong Wang a,c,*

a Department of Modern Physics, University of Science and Technology of China, Hefei 230026
b Department of Physics, The Hong Kong University of Science and Technology, Hong Kong
c Shanghai Academy for Systemic Sciences, Shanghai 200093

Abstract

The statistical properties of human communication behavior are one of the focal areas of Human Dynamics. In this paper, we analyze instant-message data collected from QICQ volunteers and find many non-Poisson characteristics, including the inter-event time distributions of sending and receiving messages, of communication between two friends, and of log-in activities, as well as the distributions of online time and of the number of messages. These distributions not only describe the pattern of human communication activities but also relate to the statistical properties of human behavior in using software. We find that most of the exponents lie between -1 and -2, which indicates that Instant Message (IM) communication behavior differs from non-IM communication behavior; there are many fat-tail characteristics related to IM communication behavior.

© 2010 Published by Elsevier Ltd. Open access under CC BY-NC-ND license.

Keywords: Inter-event distribution; Instant Message Communication; Software Operation; non-Poisson Property; Human Dynamics

1. Introduction

In recent years, the spatio-temporal statistical properties of human behavior have attracted the attention of many researchers [1]. In the past, human behaviors were generally assumed to occur randomly. One consequence of this assumption is that a Poisson process describes the behavior completely, so the distribution of time intervals between events is homogeneous: there are neither long periods of silence nor short bursts of activity. Since 2005, however, the inter-event time distributions of many daily-life behaviors have been studied, such as sending and replying to email and surface mail, and these behaviors were found to differ sharply from the Poisson assumption [2,3]: the inter-event time distributions show a power-law-like fat tail, that is, they are inhomogeneous. Such a fat tail cannot be described by a traditional Poisson process. This surprising result implies that the dynamic mechanism of individual behavior may be very complex. This raises a

* Corresponding author. Tel.: +86-551-3607407; fax: +86-551-3607407. E-mail address: [email protected]

1875-3892 doi:10.1016/j.phpro.2010.07.034


further question: is this non-Poisson character universal across all kinds of human behavior? To address this question, many kinds of human behavior have now been analyzed. Using different data-collection methods, researchers have studied market trading [4,5,6,7], web browsing [8,9], online movie ordering [10], online music listening [11], mobile phone communication [12,13], online games and behavior in virtual communities [14,15], the execution of computer instructions [16], and so on. These behaviors include business activities, entertainment, and daily habits, and touch most fields of human behavior. The empirical studies indicate that the fat-tail property is widespread in human behavior; most of the observed distributions differ from a Poisson process. This shows that the non-Poisson property of the inter-event distribution is universal in many human behaviors, except those controlled by physiological rhythms.

Recently, several mechanisms have been proposed to explain the origin of this non-Poisson property. An important one points out that when people face many tasks, they handle them like a queue of assignments: the task with the highest priority is treated first [2,17,18,19]. Because human behaviors are complex and easily influenced by many factors, researchers have also proposed models other than the queue model. For example, some models consider the impact of human memory [20] and the effect of changing personal preference [21]; another studies the effect of periodic and seasonal human behavior on the non-Poisson mechanism [22]. One recent model explains the character of human behavior in terms of a multi-Poisson distribution [23]. These studies imply that the underlying mechanism of the non-Poisson property in human behaviors may take multiple forms.

Communication activities between different people are an important part of human behavior.
Studying them helps not only in understanding human behavior and the structure of society, but also in optimizing communication systems and observing the evolution of social structures. Non-Poisson characteristics have also been found in many forms of communication, such as e-mail [2], surface mail [3], and mobile phone use [12,13]; there are even several kinds of scaling distributions at different levels [13]. Recently, wider investigations of human behavior and social structure have been carried out on databases from online societies and mobile communications, connecting them with the behavioral characteristics of different users and different relationships [24-26]. Among the means of online communication, internet Instant Message (IM) is an important one. QICQ (QQ for short) is the most popular IM software in China. In this paper, we investigate the statistical properties of the individual behavior of QQ users, including both communication activities and software operations, and find many types of non-Poisson properties.

2. Data and analysis

We collected QQ data (only the time point of each message sent and received) from 9 volunteers. Information on the datasets of the 9 volunteers is shown in Table 1. Datasets A, B, C, D, and E are complete records of communication over the whole collection period; the others include only the times of messages sent to friends. The time ranges of the datasets differ, varying from 1 month to 1 year, and the average frequencies of software operation and the numbers of friends also differ.

Table 1. QQ data from volunteers

Dataset (volunteer)    A      B      C     D     E      F    G     H    I
Number of sendings     12312  10396  2834  2052  18284  986  1036  671  2197
Time range (months)    4.5    12     6     11    11     2    4     1    3
Number of receivers    4      8      5     6     10     2    2     2    9


The temporal pattern revealed by the distribution of intervals between adjacent events is one of the important characteristics of human behavior. Here, the distribution of the interval time between two adjacent communication events or operation activities is the focus of our work. We investigated the inter-event time distributions of communication events at three different levels. As shown in Figure 1, the inter-event time distributions of message sending over all messages are all power-law-like with fat tails. The exponents of these distributions lie between -2.0 and -2.5 for different users. The distributions of the inter-event time between two adjacent sendings from a user to his/her most frequently contacted friend can also be well fitted by power-law distributions, with exponents between -1.5 and -3.0 for different users, as shown in Figure 2. A similar fat-tail inter-event distribution is also observed for message sending within each conversation, as shown in Figure 3, where conversations are separated by silences longer than 10 minutes. These empirical results indicate that such fat-tail properties of inter-event distributions are widespread and cover several levels of IM communication behavior.

Refs. [10] and [13] show an obvious positive correlation between the density of event occurrence and the power-law exponent of the inter-event distribution. We also examined this correlation but did not find a clear correlation in our data. One possible reason is that our samples are not large enough, so the question remains open and needs further study.

[Figure 1: five log-log panels of P(τ) versus τ (s). Fitted slopes: (1) -2.01, (2) -2.07, (3) -2.02, (4) -2.49, (5) -2.01.]

FIG 1. The inter-event time distribution of message sending in log-log plots, where panels (1), (2), (3), (4), (5) are from volunteers A, B, C, D, E, respectively.
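The paper does not give the fitting procedure beyond "simple fitting", so the following is only a minimal sketch of one common way to obtain a slope like those in Figure 1: compute the gaps between consecutive message times, histogram them in logarithmic bins, and fit a straight line in log-log coordinates. The function names, the bin layout (`bins_per_decade`) and the least-squares fit are assumptions made for illustration, not the authors' method.

```python
import math

def inter_event_times(timestamps):
    """Return the positive gaps (in seconds) between consecutive events."""
    ordered = sorted(timestamps)
    # Zero gaps (identical timestamps) are dropped because log(0) is undefined.
    return [b - a for a, b in zip(ordered, ordered[1:]) if b > a]

def log_binned_density(intervals, bins_per_decade=5):
    """Histogram positive intervals into logarithmic bins.

    Returns (bin_centre, density) pairs for non-empty bins, where density is
    the fraction of intervals in the bin divided by the bin width.
    """
    counts = {}
    for tau in intervals:
        k = math.floor(math.log10(tau) * bins_per_decade)
        counts[k] = counts.get(k, 0) + 1
    total = len(intervals)
    points = []
    for k in sorted(counts):
        lo = 10 ** (k / bins_per_decade)
        hi = 10 ** ((k + 1) / bins_per_decade)
        centre = math.sqrt(lo * hi)  # geometric centre of the bin
        points.append((centre, counts[k] / (total * (hi - lo))))
    return points

def loglog_slope(points):
    """Least-squares slope of log10(density) against log10(tau)."""
    xs = [math.log10(x) for x, _ in points]
    ys = [math.log10(y) for _, y in points]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

A typical call would be `loglog_slope(log_binned_density(inter_event_times(times)))`, where `times` is a list of sending times in seconds. Least-squares fits on binned data are sensitive to the bin choice and to the sparsely populated tail, so the resulting slope should be read as a rough estimate.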


[Figure 2: twelve log-log panels of P(τ) versus τ (s). The fitted slopes shown in the panels range from -1.53 to -2.82.]

FIG 2. The distribution of the inter-event time between two adjacent sendings from a user to his/her most frequently contacted friend. Panels (1), (2), (3) are from dataset A; (4), (5), (6) from C; (7), (8) from E; (9) from F; (10) from G; (11) from H; (12) from I.
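The second level of analysis, the intervals between messages sent to one particular friend, can be sketched as follows. The record layout, a list of `(timestamp_seconds, contact_id)` pairs, and the function name are hypothetical; the paper does not describe how its logs are stored.

```python
from collections import defaultdict

def most_frequent_contact_intervals(messages):
    """Intervals between consecutive messages sent to the busiest contact.

    `messages` is an iterable of (timestamp_seconds, contact_id) pairs for
    sent messages. Returns (contact_id, intervals). If several contacts tie
    for the most messages, the one seen first is used.
    """
    by_contact = defaultdict(list)
    for timestamp, contact in messages:
        by_contact[contact].append(timestamp)
    top = max(by_contact, key=lambda c: len(by_contact[c]))
    times = sorted(by_contact[top])
    return top, [b - a for a, b in zip(times, times[1:])]
```

For example, with messages sent at 0 s, 20 s and 50 s to one friend and at 5 s and 60 s to another, the function returns the first friend with the intervals 20 s and 30 s. The resulting intervals can then be binned and fitted in the same way as the overall inter-event times.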

[Figure 3: four log-log panels of P(τ) versus τ (s). The fitted slopes shown in the panels range from -1.79 to -2.10.]

FIG 3. The inter-event distribution of message sending in each conversation. Panels (1), (2) are from dataset A; (3) from C; (4) from E.
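The conversation-level statistics rest on the rule stated above: a silence longer than 10 minutes ends a conversation. A minimal sketch of that segmentation is given below, assuming timestamps in seconds; the function names and the `max_silence` parameter are illustrative choices.

```python
def split_conversations(timestamps, max_silence=600):
    """Group message times into conversations.

    A new conversation starts whenever the gap to the previous message is
    longer than `max_silence` seconds (600 s = 10 minutes).
    """
    conversations = []
    for t in sorted(timestamps):
        if conversations and t - conversations[-1][-1] <= max_silence:
            conversations[-1].append(t)
        else:
            conversations.append([t])
    return conversations

def within_conversation_intervals(timestamps, max_silence=600):
    """Collect inter-event times inside conversations only."""
    intervals = []
    for conv in split_conversations(timestamps, max_silence):
        intervals.extend(b - a for a, b in zip(conv, conv[1:]))
    return intervals
```

By construction, every interval returned by `within_conversation_intervals` is at most `max_silence`, which is why the horizontal axes of Figure 3 stop at a few hundred seconds.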


The statistical character of how people operate computer systems is important for application software development; it is also a hot topic in recent Human Dynamics research. Non-Poisson properties have been found in the execution of computer commands [16]. In this paper, the statistical character of QQ usage is discussed as follows.

We investigate the inter-event time distribution of the logins and logouts of a single user. Our datasets do not include the time of each login and logout, but these can be inferred from long silences in activity. People usually log in to QQ as soon as they start the computer and stay online until they turn the computer off (which implies logging out of QQ), and most people use the computer for less than one day at a time. The time of each login can therefore be obtained roughly by the following method: if the interval between two adjacent message sendings is longer than 86400 seconds (one day), the user is assumed to have logged out during this interval. The two sendings are then assigned to different logins: the time of the second sending is treated as the login time, and the time of the first sending as the logout time. The distributions of the intervals between adjacent login times are shown in Figure 4. They are also power-law-like, with exponents between -1 and -2 for different users. This result implies that non-Poisson properties also exist in the human use of IM software. In addition, our results show that the more frequently people use the software, the larger the absolute value of the fitted exponent.

[Figure 4: three log-log panels of P(τ) versus τ (day). Fitted slopes shown in the panels: -1.90, -1.25, -2.07.]

FIG 4. The inter-event distributions of adjacent login times. The time unit of τ is one day. Panels (1), (2), (3) are obtained from datasets B, D, E, respectively.
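The login-inference rule described above translates directly into code. The sketch below assumes timestamps in seconds and additionally treats the first recorded message as the first login and the last recorded message as the last logout, which the paper does not specify; the function names are hypothetical.

```python
def infer_sessions(timestamps, logout_gap=86400):
    """Infer (login_time, logout_time) pairs from message times.

    A gap longer than `logout_gap` seconds (one day) between two adjacent
    messages is taken as a logout: the earlier message closes one session
    and the later message opens the next.
    """
    ordered = sorted(timestamps)
    if not ordered:
        return []
    sessions = []
    login = ordered[0]  # assumption: the first message marks the first login
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > logout_gap:
            sessions.append((login, prev))
            login = cur
    sessions.append((login, ordered[-1]))  # assumption: last message = last logout
    return sessions

def login_intervals_in_days(sessions):
    """Intervals between adjacent login times, in days."""
    logins = [login for login, _ in sessions]
    return [(b - a) / 86400 for a, b in zip(logins, logins[1:])]
```

Because a new login is only detected after a silence of more than one day, every interval produced by `login_intervals_in_days` exceeds one day, consistent with the horizontal axis of Figure 4 starting at 10^0 days.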


On the other hand, we also find that the length of online time in each login obeys a power-law-like distribution. Because the total number of logins in the dataset is limited, the time unit of the statistics must be chosen carefully: too large a time unit reduces the number of data points, whereas too small a time unit introduces large fluctuations in the distribution curves. Comparing the distribution curves obtained with different unit settings, we found that the statistical property is clearest and the fitting errors are smallest when the time unit is set to 10 minutes. As shown in Figure 5, the fitted exponents are about -1.5.

[Figure 5: three log-log panels of P(τ) versus τ (10 minutes). Fitted slopes shown in the panels: -1.67, -1.37, -1.57.]

FIG 5. The distributions of online time in each login for different users. The time unit of τ is 10 minutes. Panels (1), (2), (3) are obtained from datasets B, D, E, respectively.
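The trade-off between bin width and noise discussed above can be explored with a small helper that bins session lengths in a chosen unit. This is a sketch under stated assumptions: sessions are given as hypothetical `(login, logout)` pairs in seconds, and bin k is taken to cover durations from (k-1) to k units, with the lower edge included. The paper does not state its exact bin edges.

```python
from collections import Counter

def binned_distribution(values, unit):
    """Normalised histogram of non-negative values in bins of width `unit`.

    Bin k (k = 1, 2, ...) covers (k - 1) * unit <= value < k * unit, so the
    bin index stays positive and can be drawn on a logarithmic axis.
    """
    counts = Counter(int(v // unit) + 1 for v in values)
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}

def online_time_distribution(sessions, unit=600):
    """Distribution of session lengths, in units of `unit` seconds (600 s = 10 minutes)."""
    durations = [logout - login for login, logout in sessions]
    return binned_distribution(durations, unit)
```

Calling `online_time_distribution` with several values of `unit` and comparing the fitted slopes and their scatter is one way to reproduce the kind of comparison that led to the 10-minute choice.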


Finally, the distributions of the total number of messages sent in each login also show scaling properties. For the same reason as for the distribution of online time above, the unit of statistics is set to 30 messages to obtain a clear statistical property, as shown in Figure 6. These distributions are also power-law-like, with exponents of about -2.0. Note that this distribution differs from those above: it does not describe a temporal pattern but rather the intensity distribution of people's behavior in operating the software.

[Figure 6: two log-log panels of P(n) versus n. Fitted slopes shown in the panels: -2.09 and -2.01.]

FIG 6. The distributions of the total number of messages sent in each login. The unit of n is 30 messages. Panels (1), (2) are obtained from datasets B, E, respectively.
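The intensity statistic of Figure 6 can be sketched in the same spirit: count the messages in each inferred login and group the counts in units of 30. The one-day gap rule is the one described earlier in the paper; the function names and the convention that bin n holds counts from 30(n-1)+1 to 30n are assumptions.

```python
from collections import Counter

def messages_per_login(timestamps, logout_gap=86400):
    """Number of messages in each inferred login.

    A gap longer than `logout_gap` seconds between adjacent messages starts
    a new login.
    """
    ordered = sorted(timestamps)
    counts = []
    for i, t in enumerate(ordered):
        if i == 0 or t - ordered[i - 1] > logout_gap:
            counts.append(0)  # open a new login
        counts[-1] += 1
    return counts

def count_distribution(counts, unit=30):
    """Normalised histogram of message counts in bins of `unit` messages.

    Bin n holds counts from unit * (n - 1) + 1 up to unit * n (ceiling division).
    """
    binned = Counter(-(-c // unit) for c in counts)
    total = len(counts)
    return {n: binned[n] / total for n in sorted(binned)}
```

For instance, logins with 10, 30, 31 and 95 messages fall into bins 1, 1, 2 and 4, respectively.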


3. Conclusions and discussions

This paper investigates data from the IM software QQ on both communication and software usage. Our empirical statistics indicate that non-Poisson properties are widespread in the activities of IM users at different levels, including the interval distribution of sending and receiving messages, the interval distribution of sendings to a particular friend, the inter-event time distribution of log-ins, the distribution of online time, and the distribution of the total number of messages sent. These scaling properties are found not only in the statistics of message sending but also in the behavior of software operation. They indicate that the behaviors involved in IM communication do not occur randomly.

In general, using simple fitting, the exponents of these distributions lie mainly between -1 and -2, and for some samples between -2 and -3. These values differ from those obtained for email and surface mail, which are -1 and -1.5, respectively [18], but are similar to those of mobile phone communication [12,13]. What short messages and IM have in common is their immediacy and the fact that many sendings are needed to complete one communication. These features are absent from non-instant communications such as email and surface mail. Our results therefore imply that the difference in inter-event distributions between instant communications (IM and short messages) and non-instant communications may be generated by these two properties. Since instant communication by IM and short messages closely resembles natural human conversation, our results suggest that the pattern of human activity in natural speech could have similar properties. Our results also imply that human software-operation activities obey such non-Poisson properties.

In addition, we find an obvious scaling property in the intensity distribution of IM contacts, which suggests that similar scaling properties could be widespread in social systems, although our sample is small. These inferences and conclusions need further development.

4. Acknowledgement

This work is supported by the National Basic Research Program of China (973 Program No. 2006CB705500) and the National Natural Science Foundation of China (Grant Nos. 60744003, 10975126 and 70871082).

5. References

[1] T. Zhou, X.-P. Han, and B.-H. Wang, Towards the Understanding of Human Dynamics, in: Science Matters: Humanities as Complex Systems, M. Burguete and L. Lam (Eds.), World Scientific, 2008.
[2] A.-L. Barabási, Nature 435 (2005) 207.
[3] J. G. Oliveira and A.-L. Barabási, Nature 437 (2005) 1251.
[4] V. Plerou, P. Gopikrishnan, L. A. N. Amaral, X. Gabaix, and H. E. Stanley, Phys. Rev. E 62 (2000) 3023(R).
[5] J. Masoliver and M. Montero, Phys. Rev. E 67 (2003) 021112.
[6] M. Politi and E. Scalas, Physica A 387 (2008) 2025.
[7] Z. Q. Jiang, et al., Physica A 387 (2008) 5818.
[8] Z. Dezső, E. Almaas, A. Lukács, B. Rácz, I. Szakadát, and A.-L. Barabási, Phys. Rev. E 73 (2006) 066132.
[9] B. Goncalves and J. J. Ramasco, Phys. Rev. E 78 (2008) 026123.
[10] T. Zhou, H. A.-T. Kiet, B. J. Kim, B.-H. Wang, and P. Holme, Europhys. Lett. 82 (2008) 28002.
[11] H. B. Hu and D. Y. Han, Physica A 387 (2008) 5916.
[12] J. Candia, M. C. González, P. Wang, T. Schoenharl, G. Madey, and A.-L. Barabási, J. Phys. A: Math. Theor. 41 (2008) 224015.
[13] W. Hong, X.-P. Han, T. Zhou, and B.-H. Wang, Chin. Phys. Lett. 26 (2009) 028902.
[14] T. Henderson and S. Bhatti, Proc. 9th ACM Int. Conf. on Multimedia, p. 212, ACM Press, 2001.
[15] A. Grabowski, N. Kruszewska, and R. A. Kosiński, Phys. Rev. E 78 (2008) 066110.
[16] S. K. Baek, T. Y. Kim, and B. J. Kim, Physica A 387 (2008) 3660.
[17] A. Vázquez, Phys. Rev. Lett. 95 (2005) 248710.


[18] A. Vázquez, J. G. Oliveira, Z. Dezső, K.-I. Goh, I. Kondor, and A.-L. Barabási, Phys. Rev. E 73 (2006) 036127.
[19] A. Gabrielli and G. Caldarelli, Phys. Rev. Lett. 98 (2007) 208701.
[20] A. Vazquez, Physica A 373 (2006) 747.
[21] X.-P. Han, T. Zhou, and B.-H. Wang, New J. Phys. 10 (2008) 073010.
[22] A. Cesar, R. Hidalgo, Physica A 369 (2006) 877.
[23] R. D. Malmgren, D. B. Stouffer, A. E. Motter, and L. A. N. Amaral, Proc. Natl. Acad. Sci. USA 105 (2008) 18153.
[24] N. Eagle, A. Pentland, and D. Lazer, Proc. Natl. Acad. Sci. USA 106 (2009) 15274.
[25] H.-B. Hu and X.-F. Wang, Europhys. Lett. 86 (2009) 10083.
[26] M. Mitrović and B. Tadić, arXiv:0910.2849.
